
Quantitative Methods
Learning Module 1
Basics of Multiple Regression and
Underlying Assumptions

LOS: Describe the types of investment problems addressed by multiple linear regression and
the regression process.

LOS: Formulate a multiple linear regression model, describe the relation between the dependent
variable and several independent variables, and interpret estimated regression coefficients.

LOS: Explain the assumptions underlying a multiple linear regression model and interpret
residual plots indicating potential violations of these assumptions.

Multiple linear regression is a modeling technique that uses two or more independent variables to explain
the variation of the dependent variable. A reliable model can lead to a better understanding of value drivers
and improve forecasts, but an unreliable model can lead to spurious correlations and poor forecasts.

Several software programs and functions exist to help execute multiple regression models:

Software    Programs/Functions

Excel       Data Analysis > Regression

Python      scipy.stats.linregress
            statsmodels.api.OLS
            sklearn.linear_model.LinearRegression

R           lm

SAS         PROC REG
            PROC GLM

STATA       regress
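As a minimal illustration of the Python route in the table, the sketch below fits a multiple regression with statsmodels on hypothetical data (the variable names and values are invented for illustration, not part of the curriculum):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: 60 observations of a dependent variable y and two
# independent variables X1 and X2 (values invented for illustration)
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.2, size=60)

X_const = sm.add_constant(X)          # add the intercept column (b0)
model = sm.OLS(y, X_const).fit()      # estimate the regression by OLS

print(model.params)                   # estimated b0, b1, b2
print(model.summary())                # full regression output
```

Later sketches in these notes reuse the `y`, `X_const`, and fitted `model` objects defined here.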


Uses of Multiple Linear Regression

LOS: Describe the types of investment problems addressed by multiple linear regression and
the regression process.

The complexity of financial and economic relationships often requires understanding multiple factors that
affect the dependent variable. Some examples where multiple linear regression can be useful include:

● A portfolio manager wants to understand how returns are influenced by underlying factors.
● A financial advisor wants to identify when financial leverage, profitability, revenue growth, and changes in market share can predict financial distress.
● An analyst wants to examine the effect of country risk on fixed-income returns.

In all cases, the basic framework of a regression model is as follows:

● Specify a model, including independent variables.
● Estimate a regression model and analyze it to ensure that it satisfies key underlying assumptions and meets the goodness-of-fit criteria.
● Test the model's out-of-sample performance. If acceptable, it can then be used for identifying relationships between variables, testing existing theories, or forecasting.


Exhibit 1 Regression process

The goal is to explain the variation of the dependent variable by using the variation of other, independent variables. The process follows a series of checks:

● Is the dependent variable continuous? If not, use logistic regression; if yes, estimate the regression model.
● Analyze the residuals. Are the assumptions of regression satisfied? If not, adjust the model; if yes, examine the goodness of fit of the model.
● Is the overall fit significant? If not, adjust the model; if yes, ask whether the model is the best of the possible models.
● If it is, use the model for analysis and prediction.

© CFA Institute


The Basics of Multiple Regression

LOS: Formulate a multiple linear regression model, describe the relation between the dependent
variable and several independent variables, and interpret estimated regression coefficients.

Multiple regression is similar to simple regression where a dependent variable, Y, is explained by the
variation of an independent variable, X. Multiple regression expands this concept into a statistical procedure
that evaluates the impact of more than one independent variable on a dependent variable. A multiple linear
regression model has the following general form:

Multiple regression equation


Yi = b0 + b1X1i + b2X2i + ∙∙∙ + bk Xki + εi, i = 1, 2, ..., n

Where:
Yi = The ith observation of the dependent variable Y
Xji = The ith observation of the independent variable Xj, j = 1, 2, …, k
b0 = The intercept of the regression
b1, …, bk = The slope coefficients for each of the independent variables
εi = The error term for the ith observation
n = The number of observations

The slope coefficients, b1 to bk, measure how much the dependent variable, Y, changes in response to
a one-unit change in that specific independent variable. In our equation, the independent variable X1,
holding all other independent variables constant, will change Y by a factor of b1. Here, b1 is called a partial
regression coefficient, or a partial slope coefficient, because it explains only the part of the variation in Y
related to that specific variable, X1.

Note that for any multiple regression equation:

● There are k slope coefficients in a multiple regression.
● The k slope coefficients and the intercept, b0, are all known as regression coefficients.
● There are k + 1 regression coefficients in a multiple regression equation.
● The residual term, εi, equals the difference between the actual value of Y (Yi) and the predicted value of Y (Ŷi). In terms of our multiple regression equation:

Residual term

ε̂i = Yi − Ŷi = Yi − (b̂0 + b̂1X1i + b̂2X2i + ∙∙∙ + b̂kXki)


Assumptions Underlying Multiple Linear Regression

LOS: Explain the assumptions underlying a multiple linear regression model and interpret
residual plots indicating potential violations of these assumptions.

In order to make valid predictions using a multiple regression model based on ordinary least squares (OLS),
a few key assumptions must be met.

Exhibit 2 Multiple linear regression assumptions

Assumption                   Description                                         Violation

Linearity                    Dependent and independent variables have a          Nonlinearity
                             linear relationship

Homoskedasticity             Variance of residuals is constant across all        Heteroskedasticity
                             observations

Independence of errors       Observations are independent of each other;         Serial correlation
                             errors (ie, residuals) are uncorrelated across      or autocorrelation
                             all observations

Normality                    Residuals are normally distributed, with an         Non-normality
                             expected value of zero

Independence of              Independent variables are not random; no exact      Multicollinearity
independent variables        linear relation between independent variables

Statistical tools exist to test these assumptions and the model for overall goodness of fit. Most regression software packages have built-in diagnostics for this purpose.

To better illustrate this, consider a regression to analyze 10 years of monthly total excess returns of ABC
stock using the Fama-French three-factor model. This model uses market excess return (MKTRF), size
(SMB), and value (HML) as explanatory variables.

ABC_RETRFt = b0 + b1MKTRFt + b2SMBt + b3HMLt + εt


The software produced the following set of scatterplots to test the relationship between the three
independent variables:

Exhibit 3 Scatterplots for three independent variables

[Pairwise scatterplots of MKTRF, SMB, HML, and ABC_RETRF; axis values omitted]

© CFA Institute

In the lower set of scatterplots of Exhibit 3, there is a positive relationship between ABC's return and the market risk factor (MKTRF), no apparent relationship between ABC's return and the size factor (SMB), and a negative relationship between ABC's return and the value factor (HML).

In the penultimate row of scatterplots, we can see little relationship between SMB and HML. This suggests independence between the variables, which satisfies the assumption of independence of the independent variables.

Then, compare the predicted values, or Ŷi, with the actual values of ABC_RETRFt in the residual plot in Exhibit 4:


Exhibit 4 Residual plot

[Plot of the residuals against the predicted values; potential outliers indicated with square markers]

© CFA Institute

Exhibit 4 shows the relationship between the residuals and the predicted values. A visual inspection does
not show any directional relationship, positive or negative, between the residuals and the predicted values
from the regression model. This also suggests that the regression’s errors have a constant variance and are
uncorrelated with each other. There are, however, three residuals (square markers) that may be outliers.


Exhibit 5 Regression residuals versus each of the three factors

[Three plots of the regression residuals against MKTRF, SMB, and HML, respectively; axis values omitted]

© CFA Institute


Each plot shows the relationship of the residual output versus the value of each independent variable
to look for directional relationships related to that specific factor. In this example, none of the three plots
indicate any direct relationship between the residuals and the explanatory variables, which suggests
that there is no violation of multiple regression assumptions. Furthermore, in all four graphs, the outliers
identified are the same.

Exhibit 6 is a normal Q-Q plot used to visualize the distribution of a variable compared with a theoretical
normal distribution.

Exhibit 6 Normal Q-Q plot of the residuals

[Q-Q plot of the model residuals against a theoretical normal distribution, with a linear relation superimposed; axis values omitted]

© CFA Institute

In this plot, the red line represents a normal distribution with a mean of 0 and a standard deviation of 1. The green dots are the model residuals fit to a normal distribution, that is, the empirical distribution plotted on the vertical axis of Exhibit 6. These are superimposed over the red theoretical distribution line to visualize how consistent the normalized residuals are with a standard normal distribution. The same three outliers remain, but the rest of the residuals closely align with a normal distribution, which is the desired outcome.
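These diagnostic plots can be reproduced for any fitted model. The following is a sketch using matplotlib and statsmodels, reusing the fitted results object `model` from the earlier sketch (an assumption for illustration, not part of the curriculum example):

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# 'model' is the fitted OLS results object from the earlier sketch
resid = model.resid
fitted = model.fittedvalues

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals versus predicted values (Exhibit 4 style): look for trends or fanning
ax1.scatter(fitted, resid)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Predicted value")
ax1.set_ylabel("Residual")

# Normal Q-Q plot of the residuals (Exhibit 6 style): look for departures from the line
sm.qqplot(resid, line="45", fit=True, ax=ax2)

plt.tight_layout()
plt.show()
```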

Learning Module 2
Evaluating Regression Model Fit and
Interpreting Model Results

LOS: Evaluate how well a multiple regression model explains the dependent variable by
analyzing ANOVA table results and measures of goodness of fit.

LOS: Formulate hypotheses on the significance of two or more coefficients in a multiple regression model and interpret the results of the joint hypothesis tests.

LOS: Calculate and interpret a predicted value for the dependent variable, given the estimated
regression model and assumed values for the independent variable.

Goodness of Fit

LOS: Evaluate how well a multiple regression model explains the dependent variable by
analyzing ANOVA table results and measures of goodness of fit.

The coefficient of determination measures a regression’s goodness of fit, known as the R2 statistic:
how much of the variation in the dependent variable is captured by the independent variables in the
regression. Exhibit 1 shows how a regression model explains the variation in the dependent variable:

Exhibit 1 Regression model seeks to explain the variation of Y

The total variation of Y, the sum of squares total (SST), is split into the part explained by the regression and the part left unexplained:

Sum of squares total:       SST = Σ(Yi − Ȳ)², i = 1 to n      Total variation of Y
Sum of squares regression:  SSR = Σ(Ŷi − Ȳ)², i = 1 to n      Variation of Y explained by the regression
Sum of squares error:       SSE = Σ(Yi − Ŷi)², i = 1 to n     Variation of Y unexplained by the regression

Where:
Yi = Observed value of Y for a particular Xi
Ŷi = Predicted value of Y for a particular Xi
Ȳ = Average value of Y


R² is calculated as:

Coefficient of determination

R² = (Total variation − Unexplained variation) / Total variation

R² = Sum of squares regression / Sum of squares total

R² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)², with both sums running over i = 1 to n

Where:
n = The number of observations in the regression
Yi = An observation of the Y variable
Ŷi = The predicted value of the dependent variable
Ȳ = The average of the dependent variable

A major concern with using R2 in multiple regression analysis is that as more independent variables are
added to the model, the total amount of unexplained variation will decrease as the amount of explained
variation increases. As such, each successive R2 measure will appear to reflect an improvement over the
previous model. This will be the case as long as each newly added independent variable is even slightly
correlated with the dependent variable and is not a linear combination of the other independent variables
already in the regression model.

Other limitations to using R2:

● It does not tell the analyst whether the coefficients are statistically significant
● It does not indicate whether there are biases in the coefficients or predictions
● It can misread the fit due to bias and overfitting

Overfitting can result from an overly complex model with too many independent variables relative to the
number of observations. In such cases, the model does not properly represent the true relationship between
the independent and dependent variables.

Therefore, analysts typically use adjusted R², or R̄², which is adjusted for degrees of freedom and so does not automatically increase whenever another variable is added to the regression.

Adjusted R²

R̄² = 1 − [Sum of squares error / (n − k − 1)] / [Sum of squares total / (n − 1)]

Where: k = Number of independent variables

A few things to note when comparing R² with R̄²:

● If k = 1, R² > R̄².
● R̄² will decrease if the inclusion of another independent variable in the regression model results in only a nominal increase in explained variation (SSR) and R².


● R̄² can be negative (in which case we consider its value to equal 0), while R² can never be negative.
● If R̄² is used to compare two regression models, the dependent variable must be identically defined in the two models, and the sample sizes used to estimate the models must be the same.

Additionally, adding an independent variable whose t-statistic is greater than 1 in absolute value will increase R̄²; adding one whose t-statistic is less than 1 in absolute value will decrease R̄².

There are cases where both the R2 and 72 can increase when more independent variables are added. For
these cases there are several statistics used to compare model quality, including Akaike’s information
criterion (AIC) and Schwarz’s Bayesian information criterion (BIC).

AIC is used to evaluate a collection of models that explain the same dependent variable. Even though this
will generally be provided in the output for regression software, we can also calculate it as:

AIC = n × ln(Sum of squares error / n) + 2(k + 1)

Where:
k = Number of independent variables
n = Sample size

A lower AIC indicates a better-fitting model. Note that AIC depends on the sample size (n), the number
of independent variables (k), and the sum of the squares error (SSE). The term at the end, 2(k + 1), is a
penalty term that increases as more independent variables, k, are added.

Similarly, BIC allows comparison of models with the same dependent variable:

BIC = n × ln(Sum of squares error / n) + ln(n) × (k + 1)

Where:
k = Number of independent variables
n = Sample size

With BIC, there is a greater penalty for having more parameters than with AIC. BIC will tend to prefer smaller models because ln(n) is greater than 2 for all but very small sample sizes. AIC is preferred if the model is for prediction purposes, and BIC is preferred for evaluating goodness of fit.
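The following is a sketch of these goodness-of-fit measures computed directly from the sums of squares using the formulas above; the SSE, SST, n, and k values are hypothetical:

```python
import numpy as np

# Hypothetical inputs: sums of squares, sample size, and number of independent variables
sse, sst = 26.6, 64.0
n, k = 60, 3

r2 = 1 - sse / sst                                   # coefficient of determination
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # adjusted R-squared
aic = n * np.log(sse / n) + 2 * (k + 1)              # Akaike's information criterion
bic = n * np.log(sse / n) + np.log(n) * (k + 1)      # Schwarz's Bayesian information criterion

print(round(r2, 3), round(adj_r2, 3), round(aic, 3), round(bic, 3))
```

Note that regression packages such as statsmodels report likelihood-based AIC and BIC, which can differ from this SSE-based form by an additive constant for a given sample; model rankings on the same data are unaffected.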


The AIC and the BIC alone are not telling, however, and should be compared across models using a
combination of factors. Example 1 shows the goodness-of-fit measures for a model that incorporates five
independent variables (factors):

Example 1 Goodness of fit evaluation

                              R²       Adjusted R²    AIC       BIC

Factor 1 only                 0.541    0.531          19.079    22.903
Factors 1 and 2               0.541    0.531          21.078    26.814
Factors 1, 2, and 3           0.562    0.533          20.743    28.393
Factors 1, 2, 3, and 4        0.615    0.580          16.331    25.891
Factors 1, 2, 3, 4, and 5     0.615    0.572          18.251    29.687

Note that:
● R² increases or stays the same as more factors are added.
● R̄² either increases or decreases as each new factor is added.
● AIC is minimized when the first four factors are used.
● BIC is minimized when only the first factor is used.

Using these results, we would select the four-factor model if we were using it to make predictions, but would use the one-factor model if we were just evaluating goodness of fit.

Testing Joint Hypotheses for Coefficients

LOS: Formulate hypotheses on the significance of two or more coefficients in a multiple regression model and interpret the results of the joint hypothesis tests.

In a multiple regression, the intercept is the value of the dependent variable if all independent variables are
0. The slope coefficient of each of the independent variables is the change in the dependent variable for a
change in that independent variable if all other independent variables remain constant.

Tests for individual coefficients in multiple regression are identical to tests for individual coefficients in
simple regression. The hypothesis structure is the same and the t-test is the same.

For a two-sided test of whether a variable is significant in explaining the dependent variable’s variation, the
hypotheses are:

H0: bi = Bi

Ha: bi ≠ Bi

Where b is the true coefficient for the ith independent variable and B is a hypothesized slope coefficient for
the same variable.


If the hypothesis test is simply to test the significance of the variable’s predictive power, the hypotheses
would be: H0: Bj = 0 and Ha: Bj ≠ 0

There are times to test a subset of variables in a multiple regression, for example, when comparing the
Fama-French three-factor model (MKTRF, SMB, HML) to the Fama-French five-factor model (MKTRF, SMB,
HML, RMW, CMA) to determine which model is more concise or to find the factors that are most useful in
explaining the variation in the dependent variable. In other words, it may be that not all the factors in such a
model are actually required for the model to have predictive power.

The full model, using all independent variables, is called the unrestricted model. This model is compared
with a restricted model, which effectively includes fewer independent variables since coefficients for each
unneeded variable are set to 0. A restricted model is also called a nested model since its independent
variables form a subset of the variables in the unrestricted model.

Unrestricted five-factor model:

Yi = b0 + b1X1i + b2X2i + b3X3i + b4X4i + b5X5i + εi

Restricted two-factor model:

Yi = b0 + b1X1i+ b4X4i + εi

The hypothesis test in this example would be to test whether the coefficients of X2, X3, and X5 are
significantly different than 0. To compare the unrestricted model to the nested model, perform an F-test to
test the role of the jointly omitted variables:

F = [(Sum of squares error Restricted − Sum of squares error Unrestricted) / q]
    / [Sum of squares error Unrestricted / (n − k − 1)]

Where:
q = Number of variables omitted in the restricted model
k = Number of independent variables in the unrestricted model

The F-test determines whether the reduction in the sum of squared errors (SSE) from including the additional variables of the unrestricted model is significant enough to compensate for the decrease in degrees of freedom. In the example shown here, moving from the restricted model (two independent variables) to the unrestricted model (five) costs three degrees of freedom.

● The null hypothesis is that the slope coefficients of the omitted factors all equal 0: H0: b2 = b3 = b5 = 0.
● The alternative hypothesis is that at least one of them is not equal to 0: Ha: at least one of the omitted factors' coefficients ≠ 0.

If the F-statistic is less than the critical value, then we fail to reject the null hypothesis. This means that the added predictive power of the variables omitted in the restricted model is not significant, and the more parsimonious restricted model is preferred.
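The following is a sketch of this comparison in statsmodels, using its compare_f_test method on hypothetical data in which only two of five candidate factors matter:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: five candidate factors, of which only X1 and X4 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 0.2 + 0.8 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=100)

X_full = sm.add_constant(X)                    # unrestricted: all five factors
X_restricted = sm.add_constant(X[:, [0, 3]])   # restricted (nested): X1 and X4 only

unrestricted = sm.OLS(y, X_full).fit()
restricted = sm.OLS(y, X_restricted).fit()

# Joint F-test on the three jointly omitted coefficients (q = 3)
f_stat, p_value, q = unrestricted.compare_f_test(restricted)
print(f_stat, p_value, q)
```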


Exhibit 2 summarizes the desired values of a multiple regression test:

Exhibit 2 Assessing model fit using multiple regression statistics

Statistic                                          Criterion to use in assessment

Adjusted R²                                        The higher the better
Akaike's information criterion (AIC)               The lower the better
Schwarz's Bayesian information criterion (BIC)     The lower the better
t-statistic on a slope coefficient                 Outside the bounds of the critical t-value(s) for the
                                                   selected significance level
F-test for joint tests of slope coefficients       Exceeds the critical F-value for the selected
                                                   significance level

© CFA Institute

Forecasting Using Multiple Regression

LOS: Calculate and interpret a predicted value for the dependent variable, given the estimated
regression model and assumed values for the independent variable.

Predicting the value of the dependent variable in a multiple regression is similar to the prediction process for
a simple regression. However, in the case of multiple independent variables, the predicted value is the sum
of the product of each variable and its coefficient, plus the intercept:

Ŷf = b̂0 + b̂1X1f + b̂2X2f + … + b̂kXkf

For example, given the following formula:

Ŷi = 3.546 + 3.235X1 + 7.342X2 − 7.234X3


Assume the values of X1, X2, and X3 are:

X1 X2 X3

3.8 8.3 5.9

With this information the predicted value of Yi is calculated as:

Ŷi = 3.546 + (3.235 × 3.8) + (7.342 × 8.3) − (7.234 × 5.9) = 34.097
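The same arithmetic as a quick check in Python:

```python
# Predicted value for X1 = 3.8, X2 = 8.3, X3 = 5.9
y_hat = 3.546 + 3.235 * 3.8 + 7.342 * 8.3 - 7.234 * 5.9
print(round(y_hat, 3))  # 34.097
```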


It should be noted that the estimate should include all the variables, even those that are not statistically
significant, since these variables were used in estimating the value of the slope coefficient.

As with simple linear regression, in multiple linear regression there will often be a difference between the actual value and the value forecasted by the regression model. This is the error term, or the εi term of the regression equation: the difference between the predicted value and the actual value. This basic uncertainty of the model is known as the model error.

Models using estimated (rather than observed) independent variables add another source of error: these out-of-sample inputs introduce sampling error on top of the model error.

Learning Module 3
Model Misspecification

LOS: Describe how model misspecification affects the results of a regression analysis and how
to avoid common forms of misspecification.

LOS: Explain the types of heteroskedasticity and how it affects statistical inference.

LOS: Explain serial correlation and how it affects statistical inference.

LOS: Explain multicollinearity and how it affects regression analysis.

Model Specification Errors

LOS: Describe how model misspecification affects the results of a regression analysis and how
to avoid common forms of misspecification.

Model specification refers to the set of variables included in the regression and the regression equation’s
functional form. A good regression model will:

● Be grounded in economic reasoning
● Be concise: each variable included in the model is essential
● Perform well out of sample
● Have an appropriate functional form (for example, if a nonlinear form is expected, then it should use nonlinear terms)
● Satisfy regression assumptions, without heteroskedasticity, serial correlation, or multicollinearity


Misspecified Functional Form


Exhibit 1 illustrates four ways a model’s functional form may fail:

Exhibit 1 Functional form failures

Failure in regression
functional form                  Explanation                                           Possible consequence

Omitted variables                One or more important variables are omitted           Heteroskedasticity or
                                 from the regression                                   serial correlation

Inappropriate form               Ignoring a nonlinear relationship between the         Heteroskedasticity
of variables                     dependent and the independent variable

Inappropriate                    One or more regression variables may need to be       Heteroskedasticity or
variable scaling                 transformed before estimating the regression          multicollinearity

Inappropriate                    Regression model pools data from different            Heteroskedasticity or
data pooling                     samples that should not be pooled                     serial correlation

Omitted Variables
The omitted variable bias is the bias resulting from the omission of an important independent variable.

For example, assume the true regression model is defined as:

Yi = b0 + b1X1i + b2X2i + εi

But the model was estimated as:

Yi = b0 + b1X1i + εi

In this case, the model would be misspecified by the omission of X2. If the omitted variable is uncorrelated with X1, then the residual would be b2X2i + εi. This means that the residual would not have an expected value of 0, nor would it be independent and identically distributed. As a result, the estimate of the intercept would be biased, even if the coefficient on X1 were estimated correctly.

If, however, the omitted variable X2 is correlated with X1, then the error term in the model would be correlated with X1, the estimated coefficient b1 would be biased and inconsistent, and the estimates of the intercept and the residuals would also be incorrect.

Inappropriate Form of Variables


An example is when the analyst fails to account for nonlinearity in the relationship between the dependent
variable and one or more independent variables. The analyst should consider whether the situation
suggests a nonlinear relationship and should confirm nonlinearity by plotting the data. Sometimes,
misspecification can be fixed by taking the natural logarithm of the variable.


Inappropriate Scaling of Variables


Using unscaled data when scaled data is more appropriate can result in a misspecified model. This can
happen, for example, when looking at financial statement data across companies. This misspecification can
be addressed by using common-size financial statements, allowing analysts to quickly compare trends such
as profitability, leverage, and efficiency.

Inappropriate Pooling of Data


Inappropriate pooling of data occurs when a sample spans structural breaks in the behavior of the data,
such as changes in government regulations or a change from a low-volatility period to a high-volatility
period. In a scatterplot, this type of data can appear in discrete, widely separated clusters with little or
no correlation. This can be fixed by using the subsample most representative of conditions during the
forecasting period.

Violations of Regression Assumptions: Heteroskedasticity

LOS: Explain the types of heteroskedasticity and how it affects statistical inference.

Heteroskedasticity occurs when the variance of the error term in the regression is not constant across
observations; it results from a violation of the assumption of homoskedasticity, that is, that there is no
systematic relationship between the regression residuals, or the vertical distances between the data points
and the regression line, and the independent variable.

Heteroskedasticity can result from any kind of model misspecification. Exhibit 2 shows the scatterplot and
regression line for a model with heteroskedasticity. Notice that the regression residuals appear to increase
in size as the value of the independent variable increases.

Exhibit 2 Example of heteroskedasticity (violation of the homoskedasticity assumption)

[Two panels: the linear regression scatterplot and the corresponding residual plot, each plotted against X = Annual household income (USD thousands), the independent variable; axis values omitted]


Consequences of Heteroskedasticity
Heteroskedasticity comes in two forms:

● Unconditional heteroskedasticity occurs when the heteroskedasticity of the variance in the error term is not related to the independent variables in the regression. Unconditional heteroskedasticity does not create major problems for regression analysis, even though it violates a linear regression assumption.
● Conditional heteroskedasticity occurs when the heteroskedasticity in the error variance is correlated with the independent variables in the regression. While conditional heteroskedasticity creates problems for statistical inference, such as unreliable F-tests and t-tests, it can be easily identified and corrected. Conditional heteroskedasticity will tend to find significant relationships when none actually exist and lead to Type I errors, or false positives.

Testing for Conditional Heteroskedasticity


The most common test for conditional heteroskedasticity is the Breusch-Pagan (BP) test. The BP test requires a second regression in which the squared residuals from the original estimated regression equation (serving as the dependent variable) are regressed on the independent variables of the original regression.

If conditional heteroskedasticity does not exist, the independent variables will not explain much of the
variation in the squared residuals from the original regression. However, if conditional heteroskedasticity is
present, the independent variables will explain the variation in the squared residuals to a significant extent.
Because, in this case, each observation’s squared residual is correlated with the independent variables, the
independent variables will affect the variance of the errors.

The test statistic for the BP test is approximately chi-square distributed with k degrees of freedom and is calculated as:

Chi-square test statistic

χ²BP = n × R²

Where:
n = Number of observations
R² = Coefficient of determination of the second regression (the regression when the squared residuals of the original regression are regressed on the independent variables)
k = Number of independent variables

The null hypothesis is that the original regression’s squared error term is uncorrelated with the independent
variables, or no heteroskedasticity is present.

The alternative hypothesis is that the original regression’s squared error term is correlated with the
independent variables, or heteroskedasticity is present.

The BP test is a one-tailed chi-square test, because conditional heteroskedasticity is only a problem if it
is too large.
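The following is a sketch of the BP test using the built-in diagnostic in statsmodels, reusing the fitted results object `model` from the earlier sketch (an assumption for illustration):

```python
from statsmodels.stats.diagnostic import het_breuschpagan

# The squared residuals are regressed on the original regressors internally;
# 'model' is the fitted OLS results object from the earlier sketch
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)

# lm_stat is n * R-squared of that auxiliary regression; compare it with the
# chi-square critical value for k degrees of freedom, or inspect lm_pvalue
print(lm_stat, lm_pvalue)
```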


Example 1 Testing for heteroskedasticity

An analyst wants to test a hypothesis suggested by Irving Fisher that nominal interest rates increase by
1% for every 1% increase in expected inflation. The Fisher effect assumes the following relationship:

i = r + πe

Where:

● i = Nominal interest rate
● r = Real interest rate (assumed constant)
● πe = Expected inflation

The analyst specifies the regression model as: ii = b0 + b1πe + εi.

Since the Fisher effect basically asserts that the coefficient on the expected inflation (b1) variable equals
1, the hypotheses are structured as:

H0: b1 = 1

Ha: b1 ≠ 1

Quarterly data for 3-month T-bill returns, the nominal interest rate, are regressed on inflation rate
expectations over the last 25 years. The results of the regression are:

                            Coefficient    Standard error    t-statistic

Intercept                   0.04           0.0051            7.843
Expected inflation          1.153          0.065             17.738

Residual standard error     0.029
Multiple R-squared          0.45
Observations                100
Durbin-Watson statistic     0.547

To determine whether the data support the assertions of the Fisher relation, we calculate the t-stat for the
slope coefficient on expected inflation as:

t = (b̂1 − b1) / sb̂1 = (1.153 − 1) / 0.065 ≈ 2.35

The critical t-values with 98 degrees of freedom at the 5% significance level are approximately −1.98
and +1.98. The test statistic is greater than the upper critical t-value, so we reject the null hypothesis and
conclude that the Fisher effect does not hold, since the coefficient on expected inflation appears to be
significantly different from 1.

However, before accepting the validity of the results of this test, we should test the null hypothesis that
the regression errors do not suffer from conditional heteroskedasticity. A regression of the squared
residuals from the original regression on expected inflation rates yields R2 = 0.193.


The test statistic for the BP test is calculated as:

Χ2 = nR2 = 100 × 0.193 = 19.3

The critical χ² value at the 5% significance level for a one-tailed test with one degree of freedom is 3.84. Since the test statistic (19.3) is higher, we reject the null hypothesis of no conditional heteroskedasticity in the error terms. Since conditional heteroskedasticity is present in the residuals (of the original regression), the standard errors calculated in the original regression are incorrect, and we cannot accept the result of the t-test above (which provides evidence against the Fisher relation) as valid.

Correcting Heteroskedasticity
With efficient markets, heteroskedasticity should not exist in financial data. However, when it can be
observed, an analyst should not only look to correct for heteroskedasticity, but also understand it and try to
capitalize on it.

There are two ways to correct for conditional heteroskedasticity in linear regression models. The first
is to use robust standard errors, also known as White-corrected standard errors or heteroskedasticity-
consistent standard errors, to recalculate the t-statistics for the original regression coefficients. The other
method is to use generalized least squares, where the original regression equation is modified to eliminate
heteroskedasticity.
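The following is a sketch of the first approach in statsmodels: refitting with heteroskedasticity-consistent (White-type) standard errors leaves the coefficient estimates unchanged and recomputes only the standard errors and t-statistics. It reuses the hypothetical y and X_const from the earlier sketch:

```python
import statsmodels.api as sm

# Refit with heteroskedasticity-consistent (White-type) standard errors;
# 'HC1' is one of several corrections statsmodels offers
robust_results = sm.OLS(y, X_const).fit(cov_type="HC1")

print(robust_results.params)    # unchanged coefficient estimates
print(robust_results.bse)       # robust standard errors
print(robust_results.tvalues)   # t-statistics based on the robust standard errors
```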

Example 2 Using robust standard errors to adjust for conditional heteroskedasticity

The analyst corrects the standard errors from the initial regression of 3-month T-bill returns (nominal
interest rate) on expected inflation rates for heteroskedasticity and obtains the following results:

                            Coefficient    Standard error    t-statistic

Intercept                   0.04           0.0048            8.333
Expected inflation          1.153          0.085             13.565

Residual standard error     0.029
Multiple R-squared          0.45
Observations                100

Compared with the regression results in Example 1, notice that the standard error for the intercept
does not change significantly, but the standard error for the coefficient on expected inflation increases
by about 30% (from 0.065 to 0.085). Further, the regression coefficients remain the same (0.04 for the
intercept and 1.153 for expected inflation).

Using the adjusted standard error for the slope coefficient, the test statistic for the hypothesis test is
calculated as:

t = (b̂1 − b1) / sb̂1 = (1.153 − 1) / 0.085 ≈ 1.8


When we compare this test statistic to the upper critical t-value (1.98), we fail to reject the null
hypothesis since the upper value is greater than the test statistic. The conditional heteroskedasticity in
the data was so significant that the result of our hypothesis test changed once the standard errors were
corrected for heteroskedasticity. We can conclude that the Fisher effect holds since the slope coefficient
of the expected inflation independent variable does not significantly differ from 1.

Violations of Regression Assumptions: Serial Correlation

LOS: Explain serial correlation and how it affects statistical inference.

Serial correlation (autocorrelation) occurs when regression errors are correlated, either positively or
negatively, across observations, typically in time-series regressions.

The Consequences of Serial Correlation


Serial correlation results in incorrect estimates of the regression coefficients’ standard errors. If none of
the regressors, or independent variables, is a lagged value of the dependent variable, it will not affect the
consistency of the estimated regression coefficients.

For example, when examining the Fisher relation, if we were to use the T-bill return for the previous month
as an independent variable (even though the T-bill return that represents the nominal interest rate is actually
the dependent variable in our regression model), serial correlation would cause all parameter estimates
from the regression to be inconsistent.

● Positive serial correlation occurs when:
  ○ a positive residual from one observation increases the likelihood of a positive residual in the next observation, or
  ○ a negative residual from one observation raises the probability of a negative residual in the next observation.
  In either case, positive serial correlation results in a stable pattern of residuals over time.
● Negative serial correlation occurs when a positive residual in one instance increases the likelihood of a negative residual in the next.

Positive serial correlation is the most common type found in regression models. Positive serial correlation
does not affect the consistency of the estimated regression coefficients, but it does have an impact on
statistical tests. It will cause the F-stat, which is used to test the overall significance of the regression, to be
inflated because MSE will tend to underestimate the population error variance.

In addition, it will cause the standard errors for the regression coefficients to be underestimated, which
results in larger t-values. Consequently, analysts may incorrectly reject null hypotheses, making Type I
errors, and attach significance to relationships that are in fact not significant.

Testing for Serial Correlation


The Durbin-Watson (DW) test and the Breusch-Godfrey (BG) test are the most common tests for serial
correlation.


The DW test is a measure of autocorrelation that compares the squared differences of successive residuals
with the sum of the squared residuals. This test is somewhat limited, however, because it only applies to
first-order serial correlation.

The BG test is more robust because it can detect autocorrelation up to a pre-designated order p, where the error in period t is correlated with the error in period t − p. The null hypothesis of the BG test is that there is no serial correlation in the model's residuals up to lag p. The alternative hypothesis is that the correlation of residuals for at least one of the lags is different from zero, meaning serial correlation exists up to that order. The test statistic is approximately F-distributed with p and n − p − k − 1 degrees of freedom, where p is the number of lags.
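Both tests are available in statsmodels. The following is a sketch reusing the fitted results object `model` from the earlier sketch, with a hypothetical lag choice of p = 4:

```python
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Durbin-Watson statistic: values near 2 suggest no first-order serial correlation
dw = durbin_watson(model.resid)

# Breusch-Godfrey test up to lag p (p = 4 here is a hypothetical choice)
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(model, nlags=4)

print(dw, f_stat, f_pvalue)
```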

Correcting Serial Correlation


There are two ways to correct for serial correlation in the regression residuals:

1. Adjust the coefficient standard errors to account for serial correlation: The regression coefficients remain the same, but the standard errors change (see the sketch after this list). This adjustment also corrects for heteroskedasticity. After correcting for positive serial correlation, the robust standard errors are larger than they were originally. Note that the DW statistic still remains the same.
2. Modify the regression equation to eliminate the serial correlation.
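For the first approach, one common implementation is Newey-West (HAC) standard errors, which are robust to both serial correlation and heteroskedasticity. The following is a sketch in statsmodels, reusing the hypothetical y and X_const from the earlier sketch and an assumed lag length:

```python
import statsmodels.api as sm

# HAC (Newey-West) standard errors are robust to serial correlation and
# heteroskedasticity; coefficients are unchanged, standard errors are corrected
hac_results = sm.OLS(y, X_const).fit(cov_type="HAC", cov_kwds={"maxlags": 4})

print(hac_results.bse)       # corrected (larger) standard errors
print(hac_results.tvalues)   # t-statistics based on the corrected standard errors
```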

Example 3 Correcting for serial correlation

The table shows the results of correcting the standard errors of the original regression for serial
correlation and heteroskedasticity using Hansen’s method:

                            Coefficient    Standard error    t-statistic

Intercept                   0.04           0.0088            4.545
Expected inflation          1.153          0.155             7.439

Residual standard error     0.029
Multiple R-squared          0.45
Observations                100
Durbin-Watson statistic     0.547

Note that the coefficients for the intercept and slope are exactly the same (0.04 for the intercept and
1.153 for expected inflation) as in the original regression (Example 1). Further, note that the DW stat is
the same (0.547), but the standard errors have been corrected (they are now much larger) to account for
the positive serial correlation.

Given these new and more accurate coefficient standard errors, test the null hypothesis that the
coefficient on the expected inflation independent variable equals 1. The test statistic for the hypothesis
test is calculated as:

t = (b̂1 − b1) / sb̂1 = (1.153 − 1) / 0.155 ≈ 0.987


The critical t-values with 98 degrees of freedom at the 5% significance level are approximately −1.98
and +1.98. Comparing the test statistic (0.987) with the upper critical t-value (+1.98) we fail to reject
the null hypothesis and conclude that the Fisher effect holds as the slope coefficient on the expected
inflation independent variable does not significantly differ from 1.

Note that this result is different from the result of the test we conducted using the standard errors of
the original regression (which were affected by serial correlation and heteroskedasticity) in Example 2.
Further, the result is the same as from the test conducted on White-corrected standard errors (which
were corrected for heteroskedasticity) in Example 2.

Violations of Regression Assumptions: Multicollinearity

LOS: Explain multicollinearity and how it affects regression analysis.

Multicollinearity occurs when two or more independent variables, or combinations of independent variables, are highly correlated with each other. Multicollinearity can be present even when there is only an approximate (rather than exact) linear relationship between two or more independent variables. This is a particular problem with financial and economic data because linear relationships are common.

Consequences of Multicollinearity
While multicollinearity does not affect the consistency of OLS estimates and regression coefficients, it does
make them inaccurate and unreliable, as it becomes increasingly difficult to isolate the impact of each
independent variable on the dependent variable. This results in inflated standard errors for the regression
coefficients, which results in t-stats becoming too small and less reliable in rejecting the null hypothesis.

Detecting Multicollinearity
An indicator of multicollinearity is a high R2 and a significant F-statistic, both of which indicate that the
regression model overall does a good job of explaining the dependent variable, coupled with insignificant
t-statistics of slope coefficients. These insignificant t-statistics indicate that the independent variables
individually do not explain the variation in the dependent variable, although the high R2 indicates that
the model overall does a good job: a classic case of multicollinearity. The low t-statistics on the slope
coefficients increase the chances of Type II errors: failure to reject the null hypothesis when it is false.
The variance inflation factor (VIF) can quantify multicollinearity issues. In a multiple regression, a VIF exists for each independent variable. Assume k independent variables; regress one of them, Xj, on the remaining k − 1 independent variables to obtain R²j, the portion of Xj's variation explained by the other k − 1 independent variables. The VIF for Xj is:

VIFj = 1 / (1 − R²j)


For a given independent variable, Xj, the minimum VIFj is 1, which occurs when R²j is 0. The minimum VIFj means that there is no correlation between Xj and the remaining independent variables. VIF increases as the correlation increases, so the higher a variable's VIF, the more likely it is that the variable can be predicted using the other independent variables in the model, making it more likely to be redundant.

The following are useful rules of thumb:

● VIFj > 5 warrants further investigation of the given independent variable.
● VIFj > 10 indicates serious multicollinearity requiring correction.

Bear in mind that multicollinearity may be present even when we do not observe insignificant t-stats and a
highly significant F-stat for the regression model.
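The following is a sketch of the VIF calculation with statsmodels, assuming X_const is the design matrix from the earlier sketch with the intercept in column 0:

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF for each independent variable (skip column 0, the intercept);
# X_const is the design matrix from the earlier sketch
vifs = [variance_inflation_factor(X_const, j) for j in range(1, X_const.shape[1])]

# Rule of thumb: investigate values above 5; values above 10 require correction
print(vifs)
```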

Example 4 Testing for multicollinearity

An individual is trying to determine how closely associated her portfolio manager’s investment strategy is
with the returns of a value index and the returns of a growth index over the last 60 years. She regresses
the historical annual returns of her portfolio on the historical returns of the S&P 500/BARRA Growth
Index, S&P 500/BARRA Value Index, and the S&P 500. The results of her regression are:

Regression coefficient              t-statistic

Intercept                           1.250
S&P 500/BARRA Growth Index          −0.825
S&P 500/BARRA Value Index           −0.756
S&P 500 Index                       1.520

F-statistic                         35.17
R²                                  82.34%
Observations                        60

Evaluate the results of the regression.

Solution

The absolute values of the t-stats for all the regression coefficients—the intercept (1.25), slope
coefficient on the growth index (0.825), slope coefficient on the value index (0.756), and slope coefficient
on the S&P 500 (1.52)—are lower than the absolute value of tcrit (2.00) at the 5% level of significance
(df = 56). This suggests that none of the coefficients on the independent variables in the regression are
significantly different from 0.

However, the F-stat (35.17) is greater than the F critical value of 2.76 (α = 0.05, df = 3, 56), which
suggests that the slope coefficients on the independent variables do not jointly equal 0 (at least one of
them is significantly different from 0). Further, the R2 (82.34%) is quite high, which means that the model
as a whole does a good job of explaining the variation in the portfolio’s returns.

This regression, therefore, clearly suffers from the classic case of multicollinearity as described earlier.


Correcting for Multicollinearity


Analysts may correct for multicollinearity by excluding one or more of the independent variables from
the regression model. Stepwise regression is a technique that systematically removes variables from the
regression until multicollinearity is eliminated.

Example 5 Correcting for multicollinearity

Given that the regression in Example 4 suffers from multicollinearity, the independent variable "return
on the S&P 500" is removed from the regression. The results of the regression with only the return on
the S&P 500/BARRA Growth Index and the return on the S&P 500/BARRA Value Index as independent
variables are:

Regression coefficient              t-statistic

Intercept                           1.250
S&P 500/BARRA Growth Index          6.53
S&P 500/BARRA Value Index           1.16

F-statistic                         57.62
R²                                  82.12%
Observations                        60

Evaluate the results of this regression.

Solution

The t-statistic of the slope coefficient on the growth index (6.53) is greater than the critical t-value (2.00), indicating that the slope coefficient on the growth index is significantly different from 0 at the 5% significance level. However, the t-statistic on the value index (1.16) is less than the critical value, so its slope coefficient is not significantly different from 0 at the 5% significance level. This suggests that returns on the portfolio are linked to the returns on the growth index but not closely related to the returns on the value index.

The F-statistic (57.62) is greater than the F-critical value of 3.15 (α = 0.05, df = 2, 57), which suggests
that the slope coefficients on the independent variables do not jointly equal 0. Further, the R2 (82.12%)
is quite high, which means that the model as a whole does a good job of explaining the variation in the
portfolio’s returns.

Removing the return on the S&P 500 as an independent variable in the regression corrected the
multicollinearity problem in the initial regression. The significant relationship between the portfolio’s
returns and the return on the growth index was uncovered as a result.


Exhibit 3 summarizes the violations of the assumptions of multiple linear regression covered in this module,
the problems that result, and how to detect and manage them:

Exhibit 3

Problem               Effect                                              Solution

Heteroskedasticity    Incorrect standard errors                           Use robust standard errors (corrected for
                                                                          conditional heteroskedasticity)

Serial correlation    Incorrect standard errors (additional problems      Use robust standard errors (corrected for
                      if a lagged value of the dependent variable is      serial correlation)
                      used as an independent variable)

Multicollinearity     High R² and low t-statistics                        Remove one or more independent variables;
                                                                          often no solution based in theory

Learning Module 4
Extensions of Multiple Regression

LOS: Describe influence analysis and methods of detecting influential data points.

LOS: Formulate and interpret a multiple regression model that includes qualitative
independent variables.

LOS: Formulate and interpret a logistic regression model.

Influence Analysis

LOS: Describe influence analysis and methods of detecting influential data points.

It is possible for a small number of data points, called influential observations, to bias regression results.

Influential Data Points


There are two categories of observations that can influence regression results:

● A high-leverage point, a data point with an extreme value of an independent variable


● An outlier, an observation of an extreme dependent variable

Both types of influential observations will be significantly different from most of the other sample
observations, but because one is an independent variable, and the other is a dependent variable, they
present in the larger dataset differently.

In Exhibit 1, the triangle on the right side of the plot could be considered a high-leverage point because its
X value does not follow the trend of the other observations, or the other independent variables, and this
observation should be investigated to determine whether it is an influential high-leverage point. To visualize
the impact of omitting this data point, observe the difference between the solid line, representing the
regression without the high-leverage point, and the dashed line, which includes the high-leverage point.


Exhibit 1 High-leverage point

[Scatterplot of Y against X: the high-leverage point (triangle) has an extreme X value; the dashed regression line includes it, the solid line excludes it]

© CFA Institute

Alternatively, the triangle in Exhibit 2 is an outlier: its Y value, or dependent variable, does not follow the
trend of other observations estimated by the regression model. This will result in a large residual. As
with the high leverage point above, this observation should be investigated to determine whether it is an
influential outlier. Also, as in Exhibit 1, the dashed line includes the outlier, while the solid line represents the
slope of the model without the outlier.

Exhibit 2 Outlier

[Scatterplot of Y against X: the outlier (triangle) has an extreme Y value; the dashed regression line includes it, the solid line excludes it]

© CFA Institute

Outliers and high-leverage points are not always a problem in a regression model. In Exhibit 2, even as the
outlier may appear extreme compared with the other observations, the comparative distance between the
dashed line (original regression) and the solid line (regression without the outlier) may not be significant
enough to warrant removing the data point.

Problems can occur, however, if the high-leverage points or outliers tilt the estimated regression line toward those data points, and if these observations materially affect the slope coefficients and goodness-of-fit statistics.


Detecting Influential Points


Scatterplots can be used to identify outliers and high-leverage points in a simple linear regression.
But a multiple linear regression adds complexity and requires quantitative methods to measure the
extreme values.

High-leverage points are identified by a measure called leverage (hii). For any given independent variable,
leverage is the distance between the value of the ith observation of that variable and the mean of that
variable across all observations (n) in the model.

The leverage of the ith data point will be between 0 and 1. The greater the leverage of i, the more distant its
value is from that variable’s mean and the greater its potential influence on the estimated regression.

The sum of the individual leverages for all observations equals k + 1, where k is the number of independent variables; 1 is added to k to account for the intercept. Informally, it is assumed that if an observation's leverage exceeds (k + 1)/n, then it is a potentially influential observation.

The preferred method to identify outliers is to use studentized residuals. The following three-step
process explains:

● Estimate the initial regression model with n observations, then re-estimate it with observations deleted one at a time, so that each time the model is re-estimated on the remaining n − 1 observations.
● Compare the original observed Y values with the predicted Y values resulting from the models with the ith observation deleted (n − 1 observations). The residual between the observed Y value and the Y value predicted by the model with the ith observation removed (e*i) is defined as:

  e*i = Yi − Ŷi

● This residual is then divided by the estimated standard deviation of the residuals, se*, which produces the studentized deleted residual, ti*.

Studentized deleted residual

ti* = e*i / se* = ei / √[MSEi × (1 − hii)]

In the equivalent formula (on the right), the terms are based on the initial regression with n observations.

Where:
e*i = The residual with the ith observation deleted
se* = The standard deviation of the residuals
k = The number of independent variables
MSEi = The mean squared error of the regression model with the ith observation deleted
hii = The leverage value for the ith observation


Studentized deleted residuals are effective for detecting influential outlying Y observations. But not
all influential data points should be removed because they can have significant impacts on coefficient
estimates and the interpretation of the model’s results. Rules of thumb to keep in mind are:

● If |ti*| > 3, then flag the observation as an outlier.
● If |ti*| > the critical t-value with n − k − 2 degrees of freedom, then flag the outlier as potentially influential.

Critical t-values are based on the t-distribution with n − k − 2 degrees of freedom at the specified significance level.

An observation is influential if its exclusion from the sample causes substantial changes in the estimated
regression function. One metric used to identify influential data points is Cook’s distance (Cook’s D or Di),
which measures how much the estimated values of the regression change if i is deleted. Cook’s distance is
calculated as:

Cook's distance

Di = [ei² / ((k + 1) × MSE)] × [hii / (1 − hii)²]

Where:
ei = The residual for observation i
k = The number of independent variables
MSE = The mean squared error of the estimated regression model
hii = The leverage value for observation i

Cook's D is an F-distributed statistic with k + 1 and n − k − 1 degrees of freedom.

Practical guidelines for using Cook’s D are the following:

● If Di > 0.5, the ith observation may be influential and merits further investigation.
● If Di > 1.0, the ith observation is highly likely to be an influential data point.
● If Di > √(k/n), the ith observation is highly likely to be an influential data point.

While Cook’s D can detect influential observations, this often has to be confirmed using visual analysis of
graphs, revised regression results, and studentized residuals.

After detecting influential data points, investigate why they occurred and determine a remedy. If they were
caused by data input errors or inaccurate measurements, then the solution is to correct the erroneous data
and recalculate. If correction is impossible, discard them. If the influential data points are valid, important
explanatory variables may have been omitted and may need to be identified and perhaps included to
ensure that the model satisfies all regression assumptions.
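statsmodels bundles these influence measures. The following is a sketch reusing the fitted results object `model` from the earlier sketch and the flagging thresholds discussed above:

```python
from statsmodels.stats.outliers_influence import OLSInfluence

# 'model' is the fitted OLS results object from the earlier sketch
infl = OLSInfluence(model)

leverage = infl.hat_matrix_diag                 # leverage h_ii for each observation
studentized = infl.resid_studentized_external   # studentized deleted residuals t*_i
cooks_d, _ = infl.cooks_distance                # Cook's D for each observation

k = int(model.df_model)   # number of independent variables
n = int(model.nobs)       # number of observations

high_leverage = leverage > (k + 1) / n   # leverage threshold given in the text
outliers = abs(studentized) > 3          # |t*| > 3 flags an outlier
influential = cooks_d > 1.0              # D_i > 1.0: highly likely influential

print(high_leverage.sum(), outliers.sum(), influential.sum())
```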


Dummy Variables in a Multiple Linear Regression

LOS: Formulate and interpret a multiple regression model that includes qualitative
independent variables.

Dummy variables in regression models help analysts determine whether a particular qualitative variable explains the variation in the model's dependent variable to a significant extent. These variables are most frequently used to label categories or groups and are binary, taking on a value of either 0 or 1. If the model aims to distinguish between n categories, n − 1 dummy variables are used, with the omitted category serving as a reference point for the other categories. This omitted category is used as the base case, or control group. If all n dummy variables were included, the regression would fail because the dummies would sum to 1 for every observation, duplicating the column used to estimate the intercept.

Dummy variables can be applied either to the regression model’s intercept or slope coefficients.

● A dummy intercept would cause a parallel shift in the regression line, either up or down, from the
regression line estimated for the control group.
● A slope dummy affects the slope coefficient and creates an interaction term between the X variable
and the activated condition: in this case, the slope will become more or less steep.

Cases also exist where both the intercept term and the slope change based on the value of the dummy
variable. In these cases, both the slope and the intercept will differ based on the activation of the
dummy variable.
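As a sketch of how this is typically set up in Python with pandas and statsmodels, the snippet below builds n − 1 quarterly dummies with Q4 as the omitted base case; the data are simulated placeholders rather than the figures used in Example 1 below.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated placeholder data: quarterly sales with a stronger fourth quarter
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "quarter": ["Q1", "Q2", "Q3", "Q4"] * 15,
    "sales": rng.normal(loc=2.0, scale=0.5, size=60),
})
df.loc[df["quarter"] == "Q4", "sales"] += 2.5

# Treatment(reference="Q4") creates three dummies and uses Q4 as the omitted category,
# so each slope measures the average difference from fourth-quarter sales
model = smf.ols('sales ~ C(quarter, Treatment(reference="Q4"))', data=df).fit()
print(model.summary())
```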

Example 1 illustrates how to test whether the value of the dependent variable differs significantly between the
groups identified by the dummy variables:


Example 1 Hypothesis testing with dummy variables

Management believes that the company’s sales are significantly different in the fourth quarter compared
with the other three quarters. Therefore, in regressing sales to determine whether seasonality exists,
management used the fourth quarter as the reference point (ie, the fourth quarter represents the omitted
category) in the regression.

The results of the regression based on quarterly sales data (in $ millions) for the last 15 years are:

Coefficient Standard error t-statistic

Intercept 4.27 0.97 4.4021


SalesQ1 −2.735 0.83 −3.295
SalesQ2 −2.415 0.83 −2.91
SalesQ3 −2.69 0.83 −3.241
ANOVA df SS MS
Regression 3 37.328 12.443
Residual 56 26.623 0.4754
Total 59 63.951
Standard error 0.6895
R2 0.5837
Observations 60

First, state the regression equation to understand what the variables actually represent:

Yt = b0 + b1SalesQ1 + b2SalesQ2 + b3SalesQ3 + εt

Quarterly sales = 4.27 − 2.735SalesQ1 − 2.415SalesQ2 − 2.690SalesQ3 + ε

● b0 (4.27) is the intercept term. It represents average sales in the fourth quarter (the omitted category).
● b1 is the slope coefficient for sales in the first quarter (SalesQ1). It represents the average
difference in sales between the first quarter and the fourth quarter (the omitted category). According
to the regression results, sales in Q1 are on average 2.735 million less than sales in the fourth
quarter. Average sales in Q1 equal 4.27 million − 2.735 = 1.535 million.
● Similarly, sales in Q2 average 2.415 million less than sales in the fourth quarter, while sales in Q3
average 2.69 million less than sales in the fourth quarter. Average sales in Q2 and Q3 are 1.855
million and 1.58 million, respectively.


The F-test is used to evaluate the null hypothesis that, jointly, the slope coefficients all equal 0:

H0: b1 = b2 = b3 = 0

Ha: At least one of the slope coefficients ≠ 0

The F-statistic is given in the regression results (26.174). The critical F-value at the 5% significance level
with 3 and 56 degrees of freedom for the numerator and denominator, respectively, lies between 2.76
and 2.84. Given that the F-statistic for the regression is higher, we can reject the null hypothesis that all
the slope coefficients in the regression jointly equal 0.

When working with dummy variables, t-statistics are used to test whether the value of the dependent
variable in each category is different from the value of the dependent variable in the omitted category. In
our example, the t-statistics can be used to test whether sales in each of the first three quarters of the
year are different from sales in the fourth quarter on average.

● H0: b1 = 0 versus Ha: b1 ≠ 0 tests whether Q1 sales are significantly different from Q4 sales.
● H0: b2 = 0 versus Ha: b2 ≠ 0 tests whether Q2 sales are significantly different from Q4 sales.
● H0: b3 = 0 versus Ha: b3 ≠ 0 tests whether Q3 sales are significantly different from Q4 sales.

The critical t-values for a two-tailed test with 56 [calculated as n − (k + 1)] degrees of freedom are −2.0
and +2.0. Since the absolute values of the t-statistics for the coefficients on each of the three quarters
are greater than 2.0, we reject all three null hypotheses (that Q1 sales equal Q4 sales, that Q2 sales
equal Q4 sales, and that Q3 sales equal Q4 sales) and conclude that sales in each of the first three
quarters of the year are significantly different from sales in the fourth quarter on average.

Multiple Linear Regression with Qualitative Dependent Variables

LOS: Formulate and interpret a logistic regression model.

A qualitative dependent variable, or a categorical dependent variable, is an output variable that describes
a binary condition, or category. For example, whether or not a credit card transaction is fraudulent can
be modeled as a qualitative dependent variable (1 = fraud; 0 = no fraud) based on various independent
variables such as the size of the transaction, the location of the transaction, the time of the transaction, etc. In this
scenario, a linear regression model cannot be used to capture the relationship between the variables
because the value that the dependent variable can take could be less than 0 or greater than 1, which is not
empirically possible as the probability of fraud must be between 0% and 100%.

To address these issues associated with linear regression, apply a nonlinear transformation to the
probability of fraud and relate the transformed probabilities linearly to the independent variables. One of the
most commonly used transformations is the logistic transformation, where p is the probability that a condition is
fulfilled or that an event happens. Continuing with the example, p is the probability that a transaction is fraudulent.
Using the value p, the logistic transformation is defined as:

ln[P / (1 − P)]


This is the ratio of the probability that an event happens to the probability that it does not happen, or the
odds of it happening. If the probability of fraud is 0.75, the odds of fraud are 3 to 1. It also shows that the
probability of fraud is 3 times as great as the probability of no fraud. The result of the logistic transformation
is also called the log odds or logit function.

Once the logistic transformation is complete, we can use logistic regression (ie, a logit model) to estimate
the odds of fraud where logistic transformation of the event probability is the dependent variable. The
logistic regression model is given as:

ln[P / (1 − P)] = b0 + b1X1 + b2X2 + b3X3 + ε

The equation can be rearranged to derive the event probability:

P = 1 / (1 + e−(b0 + b1X1 + b2X2 + b3X3))

Logistic regression assumes a logistic distribution of the error term, similar to a normal distribution, but with
fatter tails.

The coefficients of a logistic regression model are often estimated using the maximum likelihood
estimation (MLE) method instead of least squares. This method estimates the coefficients by making it
most likely that the event would occur by maximizing the likelihood function of the data.

The slope coefficients are the change in the log odds that the event happens per unit change in the
independent variable, holding all other independent variables constant. Exponentiating a slope
coefficient gives the odds ratio, which is the ratio of the odds that
the event happens with a unit increase in the independent variable to the odds that the event happens
without the increase in the independent variable. The hypothesis test that a logit regression coefficient is
significantly different from zero is similar to the test in an ordinary linear regression.

Evaluate the overall performance of a logit regression by examining the likelihood ratio chi-square test
statistic. Most statistical analysis packages report this statistic along with the p-value.

Logistic regression does not have a regression-equivalent R2 measure because the model cannot be
fitted using least squares. Instead, a pseudo-R2 is typically used, but must be interpreted with caution. The
pseudo-R2 in logistic regression may be used to compare different specifications of the same model but is
not appropriate for comparing models based on different datasets.
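For illustration, a minimal sketch of estimating a logit model in Python with statsmodels follows; the data are simulated placeholders, not the CEO-award data set used in Example 2 below.

```python
import numpy as np
import statsmodels.api as sm

# Simulated placeholder data: a binary outcome driven by two explanatory variables
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
true_log_odds = -1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_log_odds)))

# Coefficients are estimated by maximum likelihood, not least squares
result = sm.Logit(y, sm.add_constant(X)).fit()
print(result.summary())                  # coefficients, z-statistics, pseudo-R2
print(np.exp(result.params))             # exponentiated slopes are odds ratios
print(result.llr, result.llr_pvalue)     # likelihood ratio chi-square test and its p-value
```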


Example 2 Interpreting a logistic regression

Studies have found that the performance of CEOs (as measured by the firm’s stock return and return on
assets) declines after winning an award. They also find that award-winning CEOs spend more time on
activities outside their companies and underperform relative to non-award-winning CEOs.

A financial analyst is interested in determining which CEOs are likely to win an award. Her sample
consists of observations of company characteristics for each month in which an award is given to CEOs
of companies in the S&P 1500 index in a 10-year period in the 2000s. The analyst employs a logistic
regression for her analysis.

Dependent variable = AWARD

● This is a binary variable that takes on a value of 1 for a CEO winning an award in the award month
and 0 for non-winning CEOs.

The independent variables are:

● BOOK-TO-MARKET: the ratio of the company’s book equity and market capitalization
● LNSIZE: the natural log of the market value of the company’s equity
● RETURN-1TO3, RETURN-4TO6, RETURN-7TO12: the total return during months 1–3, 4–6, and
7–12 prior to the award month, respectively
● LNTENURE: the natural log of the CEO’s tenure with a firm in number of years
● FEMALE: a dummy variable that takes on a value of 1 if the CEO is a female

In order to explain the likelihood of a CEO winning an award, the analyst is examining whether CEOs
of companies with low book-to-market ratios, larger companies, and companies with higher returns in
recent months are more likely to win an award. She is also examining whether female CEOs and older
CEOs are more likely to receive an award. The results of the logit estimation are:

Coefficient Standard error z-Statistic p-Value

Intercept −2.5169 2.2675 −1.11 0.267


BOOK-TO-MARKET −0.0618 0.0243 −2.54 0.011
LNSIZE 1.3515 0.5201 2.60 0.009
RETURN-1TO3 0.3684 0.5731 0.64 0.520
RETURN-4TO6 0.1734 0.5939 0.29 0.770
RETURN-7TO12 0.9345 0.2250 4.15 0.000
LNTENURE 1.2367 0.5345 2.31 0.021
FEMALE 0.8100 0.3632 2.23 0.026
Likelihood ratio chi-square 323.16
Prob > chi-square 0.000
Pseudo-R2 0.226

Note: Returns are expressed as fractions. For example, a return of 10% is entered as 0.10.
Therefore, one unit is 1 or 100%.

Questions

1. Why is the use of the logit model appropriate in this case?


2. Comment on the significance of the independent variables used in the analysis.


3. Suppose that the log odds of a male CEO winning an award, based on the estimated regression
model, are −2.7634. Calculate:
a. The log odds of a female CEO instead of a male CEO winning an award.
b. The odds of a male CEO winning an award.
c. The probability of a male CEO winning an award.
d. The ratio of odds that a female CEO wins an award to the odds that a male CEO wins an award.
4. Given that the total return for a company (with a male CEO) during months 7 to 12 prior to the
award month was 11%, what would be the log odds of its CEO winning an award if the total return
during months 7 to 12 prior to the award month were 12%?

Solutions

1. The analyst should use the logit model, a nonlinear estimation model, because the dependent
variable, AWARD, is a binary dependent variable. Having a binary independent variable does not
make ordinary linear regression inappropriate for estimating the model.
2. LNSIZE and RETURN-7TO12 have a significantly positive relationship with log odds of a CEO
winning an award.

BOOK-TO-MARKET has a significantly negative relationship with log odds of a CEO winning an
award.
RETURN-1TO3 and RETURN-4TO6 are not statistically significant.
The binary variable FEMALE is positive and statistically significant, indicating that female CEOs are
more likely to win an award than male CEOs.
LNTENURE is also positive and statistically significant, indicating that CEOs with a longer tenure
with the company are more likely to win an award.
3.
a. The log odds for a female CEO instead of a male CEO, while other variables are held constant,
are calculated as: −2.7634 + 0.8100 = −1.9534.
b. The odds of a male CEO winning an award are calculated as: e(−2.7634) = 0.0631.
c. We know that P / (1 − P) = 0.06308, where p is the probability of the CEO winning an award.
Therefore, P = 0.05933. This can also be calculated as:

P = 1 / [1 + e−(−2.7634)] = 0.05933

d. The binary variable FEMALE has a slope coefficient of 0.8100. Therefore, the odds ratio for a
female CEO winning an award to a male CEO winning an award is exp(0.8100) = 2.2479.
4. The variable RETURN-7TO12 has a slope coefficient of 0.9345. Therefore, for every 1 unit or 100%
increase in this variable, the log odds increase by 0.9345. Here, the variable increases by 0.01
unit or 1%. Therefore, the log odds would increase by 0.01 × 0.9345 = 0.009345, and the log odds
would be −2.7634 + 0.009345 = −2.7541.
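The arithmetic in solutions 3 and 4 can be reproduced with a few lines of Python, using only the estimates reported above.

```python
import math

log_odds_male = -2.7634      # given log odds for a male CEO
b_female = 0.8100            # FEMALE coefficient
b_return_7to12 = 0.9345      # RETURN-7TO12 coefficient

log_odds_female = log_odds_male + b_female                # -1.9534
odds_male = math.exp(log_odds_male)                       # 0.0631
prob_male = 1 / (1 + math.exp(-log_odds_male))            # 0.0593
odds_ratio_female = math.exp(b_female)                    # 2.2479
log_odds_higher_return = log_odds_male + 0.01 * b_return_7to12   # -2.7541

print(round(log_odds_female, 4), round(odds_male, 4), round(prob_male, 4),
      round(odds_ratio_female, 4), round(log_odds_higher_return, 4))
```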

Logistic regression is widely used in machine learning where the objective is classification. Qualitative
dependent variable models can be useful not only for portfolio management but also for business
management (eg, to evaluate the effectiveness of a particular direct-mail advertising campaign based on
the demographic characteristics of the target audience).

Learning Module 5
Time-Series Analysis

LOS: Calculate and evaluate the predicted trend value for a time series, modeled as either a
linear trend or a log-linear trend, given the estimated trend coefficients.

LOS: Describe factors that determine whether a linear or a log-linear trend should be used with
a particular time series and evaluate limitations of trend models.

LOS: Explain the requirement for a time series to be covariance stationary and describe the
significance of a series that is not stationary.

LOS: Describe the structure of an autoregressive (AR) model of order p and calculate one- and
two-period-ahead forecasts given the estimated coefficients.

LOS: Explain how autocorrelations of the residuals can be used to test whether the
autoregressive model fits the time series.

LOS: Explain mean reversion and calculate a mean-reverting level.

LOS: Contrast in-sample and out-of-sample forecasts and compare the forecasting accuracy of
different time-series models based on the root mean squared error criterion.

LOS: Explain the instability of coefficients of time-series models.

LOS: Describe characteristics of random walk processes and contrast them to covariance
stationary processes.

LOS: Describe implications of unit roots for time-series analysis, explain when unit roots are
likely to occur and how to test for them, and demonstrate how a time series with a unit root can
be transformed so it can be analyzed with an AR model.

LOS: Describe the steps of the unit root test for nonstationarity and explain the relation of the
test to autoregressive time-series models.

LOS: Explain how to test and correct for seasonality in a time-series model and calculate and
interpret a forecasted value using an AR model with a seasonal lag.

LOS: Explain autoregressive conditional heteroskedasticity (ARCH) and describe how ARCH
models can be applied to predict the variance of a time series.

LOS: Explain how time-series variables should be analyzed for nonstationarity and/or
cointegration before use in a linear regression.

LOS: Determine an appropriate time-series model to analyze a given investment problem and
justify that choice.

A time series is a set of observations on a variable made over successive periods of time. In financial analysis, this
can be quarterly observations of a company’s financial results or monthly or daily observations of a stock price.
In this type of analysis, questions arise such as how to model trends or how to forecast
future values based on past values. Several models exist to help analyze these data and address the
challenges that arise.


Linear Trend Models

LOS: Calculate and evaluate the predicted trend value for a time series, modeled as either a
linear trend or a log-linear trend, given the estimated trend coefficients.

A linear trend model is like a regression model where the dependent variable changes by a constant
amount over time. The dependent variable is represented graphically as a straight line, where an upward
line indicates a positive trend and a downward line indicates a negative trend. Linear trends can be
modeled with the following regression equation:

Time Series regression equation

yt = b0 + b1t + εt, t = 1,2,…,T

Where:
yt = The value of the time series at time t (value of the dependent variable)
b0 = The y-intercept term
b1 = The slope coefficient/trend coefficient
t = Time, the independent or explanatory variable
εt = A random error term

With the time series regression equation, ordinary least squares (OLS) regression is used to estimate the
regression coefficients (b̂0 and b̂1), and the resulting regression equation is used to predict the value of the
time series (yt) at time t. In a linear trend model, the independent variable is the time period.

In a linear trend model, the value of the dependent variable changes by b1, which is referred to as the trend
coefficient. In each successive time period, t increases by 1 unit.
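A minimal sketch of estimating a linear trend in Python with statsmodels is shown below; the series is simulated as a placeholder for an observed time series.

```python
import numpy as np
import statsmodels.api as sm

# Simulated placeholder series with a mild downward trend plus noise
rng = np.random.default_rng(3)
T = 360
t = np.arange(1, T + 1)                       # time index t = 1, 2, ..., T
y = 4.0 - 0.01 * t + rng.normal(scale=2.0, size=T)

trend_model = sm.OLS(y, sm.add_constant(t)).fit()
b0, b1 = trend_model.params                   # intercept and trend coefficient
print(b0, b1)
print(b0 + b1 * (T + 24))                     # predicted trend value for a future period
```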


Example 1 Linear trend models


Keiron Gibbs wants to estimate the linear trend in inflation in Gunnerland over time. He uses monthly
observations of the inflation rate, expressed as annual percentage-rate changes, over the 30-year period
between January 1981 to December 2010 and obtains the regression results shown:

Regression statistics

R2 0.0537
Standard error 2.3541
Observations 360
Durbin-Watson 1.27
Coefficient Standard error t-statistic

Intercept 4.2587 0.4132 10.3066


Trend −0.0087 0.0029 −3

Evaluating the significance of regression coefficients

At the 5% significance level with 358 degrees of freedom [360 − (1 + 1)], the critical t-value for a two-
tailed test is 1.972. Since the absolute values of the t-statistics for both the intercept (10.3066) and the
trend coefficient (−3.00) are greater than the absolute value of the critical t-value, we conclude that both
the regression coefficients (b̂0 = 4.2587, b̂1 = −0.0087) are statistically significant.
Estimating the regression equation

Based on these results, the estimated regression equation would be written as:

yt = 4.2587 − 0.0087t

Using the regression results to make forecasts

The regression equation can be used to make in-sample forecasts [eg, inflation for t = 12, December
1981 is estimated at 4.2587 − 0.0087(12) = 4.1543%] and out-of-sample forecasts [eg, inflation for t =
384, December 2012 is estimated at 4.2587 − 0.0087(384) = 0.9179%]. The regression equation also
tells us that the inflation rate decreased by approximately 0.0087%, or the trend coefficient, each month
during the sample period.


The graph shows a plot of the actual time series (monthly observations of the inflation rate during the
sample period) along with the estimated regression line:

[Figure: Monthly inflation rate (%) from 1981 to 2011 plotted with the fitted linear trend line. © CFA Institute]

Notice that the residuals appear to be uncorrelated and unpredictable over time, with no persistent runs
above or below the trend line. Therefore, use of the linear trend to model the time series seems appropriate. However, the
low R2 of the model (5.37%) suggests that its inflation forecasts are quite uncertain, and that a better
model may be available.

Log-Linear Trend Models

LOS: Calculate and evaluate the predicted trend value for a time series, modeled as either a
linear trend or a log-linear trend, given the estimated trend coefficients.

LOS: Describe factors that determine whether a linear or a log-linear trend should be used with
a particular time series and evaluate limitations of trend models.

A linear trend would not be appropriate to model a time series that exhibits exponential growth, or constant
growth at a particular rate, because the regression residuals would be persistent and not uncorrelated, thus
violating a key assumption of OLS. In these situations, a log-linear trend may be more appropriate to fit a
time series that exhibits exponential growth.

A series that grows exponentially can be described using the following equation:

Time series regression equation

yt = eb0 + b1t, t = 1,2, … ,T


Where:
yt = The value of the time series at time t (value of the dependent variable)
b0 = The y-intercept term
b1 = The slope coefficient
t = Time = 1, 2, 3, … , T

In this equation, the dependent variable (yt) is an exponential function of the independent variable, time (t).
We take the natural logarithm of both sides of the equation to arrive at the equation for the log-linear model:

Log-linear regression model

ln yt = b0 + b1t + εt, t = 1,2, … ,T

The equation linking the variables, yt and t, has been transformed from an exponential function to a
linear function (the equation is linear in the coefficients, b0 and b1), so linear regression can model the
series. Since no time series grows at an exact exponential rate, an error term is added to the log-linear
model equation.
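A minimal sketch of fitting a log-linear trend in Python follows: the natural log of the series is regressed on time, and forecasts of the level are recovered by exponentiating. The series is simulated as a placeholder.

```python
import numpy as np
import statsmodels.api as sm

# Simulated placeholder series that grows at a roughly constant rate
rng = np.random.default_rng(4)
T = 60
t = np.arange(1, T + 1)
y = np.exp(4.7 + 0.07 * t + rng.normal(scale=0.1, size=T))

loglin = sm.OLS(np.log(y), sm.add_constant(t)).fit()   # regress ln(y) on time
b0, b1 = loglin.params

forecast_level_t63 = np.exp(b0 + b1 * 63)   # predicted level is e^(b0 + b1*t)
growth_rate = np.exp(b1) - 1                # implied constant per-period growth rate
print(forecast_level_t63, growth_rate)
```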

Example 2 Linear versus log-linear trend model

Samir Nasri wants to model the quarterly sales made by ABC Company over the 15-year period from
1991 to 2005. The graph illustrates quarterly sales data over the period.

[Figure: ABC Company quarterly sales ($ millions), 1991–2006.]


Initially, Nasri uses a linear trend model to capture the data. The results from the regression are:

Regression statistics

R2 0.8443
Standard error 786.32
Observations 60
Durbin-Watson 0.15
Coefficient Standard error t-statistic

Intercept −1,212.46 335.8417 −3.6102


Trend 125.3872 6.3542 19.733

The results of the regression appear to support the use of a linear trend model to fit the data. The
absolute values of the t-statistics of both the intercept and the trend coefficient (−3.61 and 19.73
respectively) appear statistically significant as they exceed the critical t-value of 2.0 (α = 0.05, degrees
of freedom = 58). However, when quarterly sales are plotted along with the trend line, the errors seem to
be persistent, or the residuals remain above or below the trend line for an extended period of time. This
suggests evidence of positive serial correlation. The persistent serial correlation in the residuals makes
the linear regression model inappropriate (even though the R2 is quite high at 84.43%) to fit ABC’s sales
as it violates the regression assumption of uncorrelated residual errors.

[Figure: ABC Company quarterly sales ($ millions) plotted with the fitted linear trend line. Sales remain persistently above the trend line, then persistently below it, and then above it again.]

Since the sales data plot is curved upward, Nasri’s supervisor suggests that he use the log-linear model.
Nasri then estimates the following log-linear regression equation:

ln yt = b0 + b1t + εt, t = 1,2, … , 60


With the log-linear regression results:

Regression statistics

R2 0.9524
Standard error 0.1235
Observations 60
Durbin-Watson 0.37
Coefficient Standard error t-statistic

Intercept 4.6842 0.0453 103.404


Trend 0.0686 0.0008 85.75

The high t-statistics for the intercept and trend coefficient suggest that the regression parameters are
significantly different from zero, so the log-linear model has explanatory power. Further, notice that the
R2 is now much higher than the previous R2 (95.24% versus 84.43%). An R2 of 0.9524 means that
95.24% of the variation in the natural log of ABC’s sales is explained solely by a linear trend.

This suggests that the log-linear model fits the sales data much better than the linear trend model. The
graph below plots the linear trend line suggested by the log-linear regression along with the natural
logs of the sales data. Notice that the vertical distances between the lines are quite small, and that the
residuals are not persistent (log actual sales are not above or below the trend line for an extended period
of time). Consequently, Nasri concludes that the log-linear trend model is more suitable for modeling
ABC’s sales compared with the linear trend model.

[Figure: Natural log of ABC Company quarterly sales plotted with the fitted log-linear trend line, 1991–2006.]


To illustrate how log-linear trend models are used in making forecasts, calculate ABC’s expected sales
for Q3 2006, or Quarter 63 (an out-of-sample forecast).

ln ŷ63 = 4.6842 + 0.0686(63) = 9.0060, so ŷ63 = e9.0060 ≈ $8,151.8 million

Compared with the forecast for Quarter 63 sales based on the linear trend model (−1,212.46 +
125.3872(63) ≈ $6,686.9 million), the log-linear regression model offers a much higher forecast.

Note that if we plot the data from a time series with positive exponential growth, the observations
will form a convex curve. Negative exponential growth means that the observed values of the series
decrease at a constant rate, so the time series forms a concave curve.

An important difference between the linear and log-linear trend models lies in the interpretation of the slope
coefficient, b1:

● A linear trend model predicts that yt will grow by a constant amount (b1) each period. For example, if b1
equals 0.1%, yt will increase by 0.1% (a constant amount, not a growth rate) in each period.
● A log-linear trend model predicts that ln yt will grow by a constant amount (b1) in each period. This means
that yt itself will grow at a constant rate of eb1 − 1 in each period. For example, if b1 equals
0.1%, then the predicted growth rate of yt in each period equals e0.001 − 1 = 0.0010005 or 0.10005%.

Also, in a linear trend model, the predicted value of yt is b̂0 + b̂1t, but in a log-linear trend model, the predicted
value of yt is eb̂0 + b̂1t since eln yt = yt.

Trend Models and Testing for Correlated Errors

LOS: Describe factors that determine whether a linear or a log-linear trend should be used with
a particular time series and evaluate limitations of trend models.

If a regression model is correctly specified, the regression error for one time period will be uncorrelated
with the regression errors for other time periods. One way to determine whether the error terms are
correlated across time periods is to inspect the plot of residuals. A residual plot can indicate that there is
persistent serial correlation in the residuals of the model, but to confirm, especially in cases where the serial
correlation is less obvious, a formal test called the Durbin-Watson (DW) test is used.

In Example 2, the DW statistic for the log-linear trend model equals 0.37. If there were no serial correlation,
the DW statistic would not be significantly different from 2. The null hypothesis of the test is that there is no positive
serial correlation, at the given level of significance (in Example 2, the significance level is 5%).

In this case the critical value (d1) equals 1.55 (k = 1, n = 60). Since the value of the DW statistic, 0.37,
is less than d1, we would reject the null hypothesis and conclude that the log-linear model in fact suffers
from positive serial correlation. Consequently, a different kind of model is needed to represent the relation
between time and ABC Company’s sales.

Note that significantly small values of the DW statistic indicate positive serial correlation, while significantly
large values indicate negative serial correlation.
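The DW statistic itself is easy to compute from the regression residuals; a sketch using statsmodels, with a simulated placeholder series whose errors are positively autocorrelated, follows.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulated placeholder trend regression with positively autocorrelated errors
rng = np.random.default_rng(5)
T = 60
t = np.arange(1, T + 1)
errors = np.zeros(T)
for i in range(1, T):
    errors[i] = 0.8 * errors[i - 1] + rng.normal(scale=0.1)
y = 5.0 + 0.07 * t + errors

fit = sm.OLS(y, sm.add_constant(t)).fit()
print(durbin_watson(fit.resid))   # values well below 2 point toward positive serial correlation
```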


AR Time-Series Models and Covariance-Stationary Series

LOS: Explain the requirement for a time series to be covariance stationary and describe the
significance of a series that is not stationary.

Autoregressive (AR) Time-Series Models


An autoregressive (AR) model is a time series that is regressed on its own past values. These models
acknowledge that current period values are related to past values and look to effectively model this
relationship.

In these models the same variable is on both sides of the equation, where the dependent variable is the
current period we are trying to estimate while the independent variable is a past value. With this in mind, we
drop the normal notation of yt as the dependent variable and use xt. For example, an AR(1) model (first-
order autoregressive model) is represented as:

First-order (AR1) autoregressive model

xt = b0 + b1xt − 1 + εt

Note that this is an AR(1) model, in which only a single past value of xt − 1 is used to predict the current
value of xt. To include more periods, a general pth order autoregressive model is represented as:

pth-order autoregressive model

xt = b0 + b1xt − 1 + b2xt − 2 + … + bpxt − p+ εt
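A minimal sketch of estimating an AR model in Python with statsmodels is shown below; the series is simulated from an AR(1) process as a placeholder.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulated placeholder AR(1) series: x_t = 0.08 + 0.85 x_{t-1} + e_t
rng = np.random.default_rng(6)
T = 200
x = np.zeros(T)
for i in range(1, T):
    x[i] = 0.08 + 0.85 * x[i - 1] + rng.normal(scale=0.04)

ar1 = AutoReg(x, lags=1).fit()           # an intercept is included by default
print(ar1.params)                        # [b0, b1]
print(ar1.predict(start=T, end=T + 1))   # one- and two-period-ahead forecasts
```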

Covariance-Stationary Series
When an independent variable in the regression equation is a lagged value of the dependent variable, as
in an autoregressive time-series model, statistical inferences based on OLS regression are not always
valid. In order to conduct statistical inference based on these models, we must add the assumption that
the time series is covariance stationary. There are three basic requirements for a time series to be
covariance stationary:

● The expected value or mean of the time series must be constant and finite in all periods.
● The variance of the time series must be constant and finite in all periods.
● The covariance of the time series with itself for a fixed number of periods in the past or future must be
constant and finite in all periods.

If an AR model is used to model a time series that is not covariance stationary, the analyst would obtain
biased estimates of the slope coefficients. As such, results of any hypothesis tests would be invalid
and spurious.

One way to determine whether a time series is covariance stationary is by looking at a graphical plot of the
data. The inflation data in Example 1 appear to be covariance stationary. The data seem to have the same
mean and variance over the sample period. On the other hand, ABC Company’s quarterly sales appear
to grow steadily over time, which implies that the mean is not constant and, therefore, the series is not
covariance stationary.

A common issue analysts will encounter with many financial time series is that they are not covariance
stationary. Most will trend in a direction, up or down, over time, rendering a nonconstant mean. A company’s


sales figures can steadily trend in one direction or another over time, while economic data such as a
country’s GDP can also show positive (or negative) trends.

These data sets can often be transformed into covariance stationary series (eg, by first-differencing). Even so,
stationarity in the past does not guarantee stationarity in the future.

Detecting Serially Correlated Errors in an AR Model

LOS: Describe the structure of an autoregressive (AR) model of order p and calculate one- and
two-period-ahead forecasts given the estimated coefficients.

LOS: Explain how autocorrelations of the residuals can be used to test whether the
autoregressive model fits the time series.

An autoregressive model can be estimated using OLS if the time series is covariance stationary and the
errors are uncorrelated.

The DW test can test for serial correlation in trend models, but it cannot be used to test for serial correlation
in an AR model. This is because the independent variables include past values of the dependent variable
and are more likely to be correlated. However, another test based on the autocorrelations of the error term
can be used to determine if the errors in the AR time series model are serially correlated.

This test focuses on the autocorrelations of the error term, ie, the correlations of the residuals with their own past
values. All AR models will have some degree of autocorrelation, but it must be determined whether the
autocorrelation is statistically significant. For this, a t-test is used to determine if the autocorrelations of the
error term are statistically different from zero.

The t-statistic is calculated as the autocorrelation of the errors divided by the standard error of the
residual autocorrelation:

t-statistic = Residual autocorrelation for lag / Standard error of residual autocorrelation

Where:
Standard error of residual autocorrelation = 1/√T
T = Number of observations in the time series

To determine if the model is specified correctly:

● If any of the error autocorrelations are significantly different from 0, the errors are serially correlated
and the model is not specified correctly.
● If all the error autocorrelations are not significantly different from 0, the errors are not serially correlated
and the model is specified correctly.

There are three basic steps for detecting serially correlated errors in an AR time-series model:

● Estimate a particular AR model.


● Calculate the autocorrelations of the residuals from the model.
● Determine whether the residual autocorrelations significantly differ from zero.
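These three steps can be sketched in Python as follows, using statsmodels and a simulated placeholder series.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import acf

# Step 1: estimate an AR(1) model on a simulated placeholder series
rng = np.random.default_rng(7)
T = 60
x = np.zeros(T)
for i in range(1, T):
    x[i] = 0.08 + 0.85 * x[i - 1] + rng.normal(scale=0.04)
fit = AutoReg(x, lags=1).fit()

# Step 2: autocorrelations of the residuals at the first few lags
resid = fit.resid
rho = acf(resid, nlags=4, fft=False)[1:]

# Step 3: t-statistic = autocorrelation / (1 / sqrt(T))
t_stats = rho * np.sqrt(len(resid))
print(np.round(rho, 4), np.round(t_stats, 2))
```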


Example 3 illustrates testing an AR model for serial correlation:

Example 3 Testing for serially correlated errors in an AR time-series model

Jack Wilshire uses a time-series model to predict ABC Company’s gross margins. He uses quarterly data
from Q1 1981 to Q4 1995. Since he believes that the gross margin in the current period is dependent on
the gross margin in the previous period, he starts with an AR(1) model:

Gross margint = b0 + b1(Gross margint − 1) + εt

Results from estimating the AR(1) model:

Regression statistics

R2 0.7521
Standard error 0.0387
Observations 60
Durbin-Watson 1.9132
Coefficient Standard error t-statistic

Intercept 0.0795 0.0352 2.259


Lag 1 0.8524 0.0602 14.159

Notice that the intercept (b̂0 = 0.0795) and the coefficient on the first lag (b̂1 = 0.8524) are highly
significant in this regression. The t-statistic of the intercept (2.259) and that of the coefficient on the first
lag of the gross margin (14.159) are both greater than the critical t-value of approximately 2.0 for a two-
tail test at the 5% significance level with 58 degrees of freedom.

Even though Wilshire concludes that both regression coefficients individually differ from zero (ie,
are statistically significant), he must still evaluate the validity of the model by ensuring that the residuals
from his model are not serially correlated. Since this is an AR model (the independent variables include
past values of the dependent variable), the DW test for serial correlation cannot be used.

Autocorrelations of the residuals from the AR(1) model

Lag Autocorrelation Standard Error t-Stat

1 0.0583 0.1291 0.4516


2 0.0796 0.1291 0.6166
3 −0.1921 0.1291 −1.4880
4 −0.1285 0.1291 −0.9954

The table shows the first four autocorrelations of the residual, along with their standard errors and
t-statistics. Since there are 60 observations, the standard error for each of the residual autocorrelations
equals 0.1291 (calculated as 1/√60). None of the t-statistics is greater than 2.0 (critical t-value) in
absolute value, which indicates that none of the residual autocorrelations significantly differs from 0.


Wilshire concludes that the regression residuals are not serially correlated and that his AR(1) model is
correctly specified. Therefore, he can use OLS to estimate the parameters and the standard errors of the
parameters in his model.

Note that if any of the lag autocorrelations were significantly different from zero (if they had t-statistics
that were greater than the critical t-value in absolute value), the model would be misspecified due to
serial correlation between the residuals. If the residuals of an AR model are serially correlated, the model
can be improved by adding more lags of the dependent variable as explanatory (independent) variables.
More and more lags of the dependent variable must be added as independent variables in the model
until all the residual autocorrelations are insignificant.

Once it has been established that the residuals are not serially correlated and that the model is correctly
specified, it can be used to make forecasts. The estimated regression equation in this example is
given as:

Gross margint = 0.0795 + 0.8524(Gross margint − 1)

Mean Reversion and Multi-Period Forecasts

LOS: Explain mean reversion and calculate a mean-reverting level.

LOS: Describe the structure of an autoregressive (AR) model of order p and calculate one- and
two-period-ahead forecasts given the estimated coefficients.

Mean Reversion
A time series is said to exhibit mean reversion if it tends to fall when its current level is above the mean
and tends to rise when its current level is below the mean. The mean-reverting level, xt, for a time series
is given as:

xt = b0 / (1 − b1)

● If a time series is currently at its mean-reverting level, the model predicts that its value will remain
unchanged in the next period.
● If a time series is currently above its mean-reverting level, the model predicts that its value will
decrease in the next period.
● If a time series is currently below its mean-reverting level, the model predicts that its value will increase
in the next period.

In the case of gross margins for ABC Company in Example 3, the mean-reverting level is calculated as:

0.0795 / (1 − 0.8524) = 0.5386 or 53.86%.

In this case, if the gross margin is currently 50%, the model predicts that next quarter’s gross margin will
increase to 0.5057 or 50.57%. However, if the gross margin is currently 60%, the model predicts that next
quarter’s gross margin will decrease to 0.5909 or 59.09%.


Additionally, note that all covariance stationary time series have a finite mean-reverting level. An AR(1) time
series will have a finite mean-reverting level if the absolute value of the lag coefficient, b1, is less than 1.

Multi-Period Forecasts and the Chain Rule of Forecasting


The chain rule of forecasting is used to make multi-period forecasts based on an autoregressive time-series
model. For example, a one-period forecast (eg, x̂t + 1) based on an AR(1) model is calculated as:

x̂t + 1 = b̂0 + b̂1xt

From there, the two-period forecast is calculated as:

x̂t + 2 = b̂0 + b̂1x̂t + 1

Since we do not know xt + 1 in period t, we must start by forecasting xt + 1 using xt as an input and then
forecast xt + 2 using the forecast of xt + 1 as an input.

Example 4 Chain rule of forecasting

Using the AR(1) model from Example 3, assume that ABC Company’s gross margin for the current
quarter is 65%. Forecast ABC’s gross margin in two quarters.

Solution

First forecast next quarter’s gross margin based on the current quarter’s gross margin:

Gross margint + 1 = 0.0795 + 0.8524(Gross margin t)

= 0.0795 + 0.8524(0.65) = 0.6336 or 63.36%

Using that value, forecast the gross margin in two quarters based on the next succeeding quarter’s
forecasted gross margin:

Gross margint + 2 = 0.0795 + 0.8524(Gross margint + 1)

= 0.0795 + 0.8524(0.6336) = 0.6196 or 61.96%

Note that multi-period forecasts entail more uncertainty than single-period forecasts because each period’s
forecast (used as an input to eventually arrive at the multi-period forecast) entails uncertainty. The more
periods forecast, the greater the uncertainty.
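The chain rule calculation in Example 4, together with the mean-reverting level, can be expressed in a few lines of Python using the coefficients estimated in Example 3.

```python
def ar1_forecast(x_current, b0, b1, periods):
    """Iterate an AR(1) forecast forward the requested number of periods."""
    x = x_current
    for _ in range(periods):
        x = b0 + b1 * x
    return x

# One- and two-period-ahead gross margin forecasts from a current margin of 65%
print(round(ar1_forecast(0.65, 0.0795, 0.8524, 1), 4))   # 0.6336
print(round(ar1_forecast(0.65, 0.0795, 0.8524, 2), 4))   # 0.6195 (Example 4 shows 0.6196 after rounding the one-period forecast)

# Mean-reverting level: b0 / (1 - b1)
print(round(0.0795 / (1 - 0.8524), 4))                   # 0.5386
```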

Comparing Forecast Model Performance

LOS: Contrast in-sample and out-of-sample forecasts and compare the forecasting accuracy of
different time-series models based on the root mean squared error criterion.

One way to evaluate the forecasting performance of two models is by comparing their standard errors. The
standard error for the time-series regression is typically reported in the statistical output for the regression.
The model with the smaller standard error will be more accurate as it will have a smaller forecast
error variance.


When comparing the forecasting performance, or accuracy, of various models, analysts must distinguish
between in-sample forecast errors and out-of-sample forecast errors. In-sample forecast errors are
differences between the actual values of the dependent variable used to fit the model and predicted values
of the dependent variable based on the estimated regression equation. In essence, in-sample forecasts are
the residuals from a fitted time-series model. For instance, in Example 1, the residuals of the regression, or
the differences between actual inflation and forecasted inflation over the January 1981 to December 2010
sample period, represent in-sample forecast errors.

If we were to predict inflation for a month outside the sample period, such as July 2012, based on this
model, the difference between actual and predicted inflation would represent an out-of-sample forecast
error. Out-of-sample forecasts are important because the future is always out of sample and because they
are necessary in evaluating the model’s contribution and applicability in the real world.

The out-of-sample forecasting performance of AR models is evaluated using their root mean squared error
(RMSE), the square root of the average squared forecast error over the out-of-sample data. The model with the lowest RMSE
has the lowest forecast error and hence carries the most predictive power.

For example, a data set includes 35 observations of historical annual unemployment rates. Suppose we use
the first 30 years as the sample period to develop our time-series models: an AR(1) and an AR(2) model to
fit the data.

The remaining 5 years of data from Year 31 to Year 35 (the out-of-sample data) would be used to calculate
the RMSE for the two models, and the model with the lower RMSE would be judged to have greater
predictive power.

Bear in mind that the model with the lower RMSE (more accuracy) for in-sample data will not necessarily
have a lower RMSE for out-of-sample data. In addition to the forecasting accuracy of a model, the
stability of the regression coefficients (discussed in the next LOS) is an important consideration when
evaluating a model.
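A short sketch of the RMSE comparison follows; the out-of-sample values and forecasts below are hypothetical numbers chosen only to illustrate the calculation.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: the square root of the average squared forecast error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Hypothetical out-of-sample unemployment rates (Years 31-35) and two sets of model forecasts
actual = [5.2, 5.5, 5.9, 6.1, 5.8]
forecast_ar1 = [5.0, 5.4, 5.6, 6.4, 6.0]
forecast_ar2 = [5.3, 5.6, 5.8, 6.0, 5.9]

print(rmse(actual, forecast_ar1))   # higher RMSE: less accurate out of sample
print(rmse(actual, forecast_ar2))   # lower RMSE: the preferred model
```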

Instability of Regression Coefficients

LOS: Explain the instability of coefficients of time-series models.

The choice of sample period is a very important consideration when constructing time-series models.

● Regression estimates from time-series models based on different sample periods can be
quite different.
● Regression estimates obtained from models based on longer sample periods can be quite different
from estimates from models based on shorter sample periods.
● Regression models used for one time period may not be appropriate for another time period.

There are no clear-cut rules that define an ideal length for the sample period. Based on the fact that models
are only valid if the time series is covariance stationary, analysts look to define sample periods as times
during which important underlying economic conditions have remained unchanged. It is helpful for the
analyst to look at graphs of the data to see if the series looks stationary.

Even if the autocorrelations of the residuals of a time-series model are statistically insignificant, analysts
cannot conclude that the sample period used is appropriate (and hence deem the model valid) until they
are, at the same time, confident that the series is covariance stationary and that important external factors
have remained constant during the sample period used in the study.


Random Walks

LOS: Describe characteristics of random walk processes and contrast them to covariance
stationary processes.

A random walk is a time series in which the value of the series in one period equals its value in the previous
period plus an unpredictable random error, where the error has a constant variance and is uncorrelated with
its value in previous periods. A random walk is explained in the following equation:

Random walk

xt = xt − 1 + εt

Where:
E(εt) = 0
E(εt2) = σ2 (constant variance)
εt is uncorrelated with all previous error terms

These models are special cases of AR(1) models in which b0 = 0 and b1 = 1. Therefore,
the best forecast of xt is xt − 1, given that the expected value of the error term is 0.

Standard regression analysis cannot be applied to a time series that follows a random walk because AR
models cannot be used to model a time series that is not covariance stationary. A random walk is not
covariance stationary for two reasons. It does not have:

● A finite mean-reverting level. For a random walk, the mean-reverting level is undefined:

b0 / (1 − b1) = 0 / (1 − 1) = 0 / 0

● A finite variance. As t increases, the variance of xt grows with no upper bound, approaching infinity.

Fortunately, a random walk can be converted into a covariance stationary time series through a process
called first-differencing. This process subtracts the value of the time series in the previous period from its
value in the current period. The new time series, yt, is calculated as yt = xt − xt − 1. The first difference of the random walk
equation is given as:

yt = xt − xt − 1 = εt, E(εt) = 0, E(εt2) = σ2, cov(εt, εs) = E(εtεs) = 0 for t ≠ s

As with the random walk formula, the expected error term is 0, expected variance is constant, and the error
term for a given period t is uncorrelated with any previous error terms.

By using the first-differenced time series, we are essentially modeling the change in the value of the
dependent variable, or:

Δxt = xt − xt − 1

From the first-differenced random walk equation, note that:

● Since the expected value of the error term is 0, the best forecast of yt is 0, or that there will be no
change in the value of the current time series, xt − 1.
● The first-differenced variable, yt, is covariance stationary with a finite mean-reverting level of 0,
calculated as 0 / (1 − 0), as b0 and b1 both equal 0, and a finite variance [Var(εt) = σ2].


Therefore, we can use linear regression to model the first-differenced series.

To summarize: By definition, changes in a random walk (yt or xt − xt − 1) are unpredictable. Therefore,
modeling the first-differenced time series with an AR(1) model does not hold predictive value, as b0 and
b1 both equal 0. It only serves to confirm a suspicion that the original time series is indeed a random walk.
Example 5 illustrates this.

Example 5 Determining whether a time series is a random walk

Aaron Ramsey develops the following AR(1) model for the JPY/USD exchange rate (xt) based on
monthly observations over 30 years. He uses the following AR(1) model to estimate the values of xt:

xt = xt − 1 + εt

The results of his regression are:

Regression statistics

R2 0.9852
Standard error 5.8624
Observations 360
Durbin-Watson 1.9512
Coefficient Standard error t-statistic

Intercept 1.0175 0.9523 1.0685


Lag 1 0.9954 0.0052 191.42

Autocorrelations of the residuals

Lag Autocorrelation Standard Error t-Stat

1 0.0745 0.0527 1.4137


2 0.0852 0.0527 1.6167
3 0.0321 0.0527 0.6091
4 0.0525 0.0527 0.9962

Note that:

● The intercept term is not significantly different from 0. The low t-statistic of 1.06 does not allow you to
reject the null hypothesis that the intercept term equals 0 as it is less than the critical t-value of 1.972
at the 5% significance level.
● The coefficient on the first lag of the exchange rate is significantly different from 0: the high t-statistic
of 191.42 allows us to reject the null hypothesis that it equals 0. Moreover, the estimated coefficient
on the first lag, 0.9954, is very close to 1.


Both of these suggest that the JPY/USD exchange rate follows a random walk.

Additionally, we cannot use these t-statistics to determine whether the exchange rate is a random walk
by testing H0: b0 = 0 and H0: b1 = 1, because the standard errors of this AR model would
be invalid if the model is based on a time series that is not covariance stationary. Recall that a random
walk is not covariance stationary.

In order to determine whether the time series is indeed a random walk, we must run a regression on the
first-differenced time series. If the exchange rate is, in fact, a random walk:

● The first-differenced time series will be covariance stationary as b0 and b1 would equal 0.
● The error term will not be serially correlated.

The regression results for the first-differenced AR(1) model for the JPY/USD exchange rate are:

Regression statistics

R2 0.0052
Standard error 5.8751
Observations 360
Durbin-Watson 1.9812
Coefficient Standard error t-statistic

Intercept −0.4855 0.3287 −1.477


Lag 1 0.0651 0.0525 1.240

Autocorrelations of the residuals

Lag Autocorrelation Standard Error t-Stat

1 0.0695 0.0527 1.3188


2 −0.0523 0.0527 −0.9924
3 0.0231 0.0527 0.4383
4 0.0514 0.0527 0.9753

From this output, notice:

● The intercept term (b0) and the coefficient on the first lag of the first-differenced exchange rate (b1)
both individually do not significantly differ from 0. The absolute values of their t-statistics, 1.477 and
1.24, respectively, are lower than the absolute value of tcrit, 1.972, at the 5% significance level.
● The t-statistics for all the residual autocorrelations are lower than the critical t-value, 1.972. None
of the residual autocorrelations significantly differs from 0, which means that there is no serial
correlation.

We can be confident that the JPY/USD exchange rate is a random walk because:

● b0 and b1 for the first-differenced AR(1) model both equal 0.


● There is no serial correlation in the error terms of the first-differenced time series.


Random Walk with a Drift


One variation of a random walk is a random walk with a drift. A random walk with a drift is a time series that
increases or decreases by a constant amount in each period. The equation for a random walk with a drift
is given as:

Random walk with a drift

xt = b0 + b1xt − 1 + εt

Where:
b1 = 1, b0 ≠ 0, and
E(εt) = 0

Unlike a simple random walk (which has b0 = 0), a random walk with a drift has b0 ≠ 0. Similar to a simple
random walk, a random walk with a drift also has an undefined mean-reverting level (because b1 = 1) and
is therefore not covariance stationary. Consequently, an AR(1) model cannot be used to analyze a random
walk with a drift without converting it through first-differencing. The first-difference of the random walk with a
drift equation is given as:

yt = xt − xt − 1

yt = b0 + εt

b0 ≠ 0

The Unit Root Test of Nonstationarity

LOS: Describe implications of unit roots for time-series analysis, explain when unit roots are
likely to occur and how to test for them, and demonstrate how a time series with a unit root can
be transformed so it can be analyzed with an AR model.

LOS: Describe the steps of the unit root test for nonstationarity and explain the relation of the
test to autoregressive time-series models.

One way to determine whether a time series is covariance stationary is by examining a graph that plots the
data. However, there are two, more exact, ways to determine whether a time series is covariance stationary.

The first is to examine the autocorrelations of the time series at various lags. For a stationary time series,
either the time series’ autocorrelations at all lags do not significantly differ from 0, or the autocorrelations
drop off rapidly to 0 as the number of lags grows large. For a nonstationary time series, the
autocorrelations will show neither of these characteristics.

The second, and preferred, approach is to conduct the Dickey-Fuller test for unit root. In the context of
an AR(1) model, the absolute value of the lag coefficient, or b1, must be less than 1 to be covariance
stationary. But if the lag coefficient equals 1, then the time series has a unit root. A time series that has a
unit root is a random walk and therefore is not covariance stationary.

For statistical reasons, simple t-tests cannot be used to test whether the coefficient on the first lag of the
time series in an AR(1) model is significantly different from 1. However, the Dickey-Fuller test can be used
to test for a unit root.


The Dickey-Fuller test starts by converting the lag coefficient, b1, in a simple AR(1)
(xt = b0 + b1xt − 1 + εt) model into g1, which effectively represents b1 − 1, by subtracting xt − 1 from both sides
of the AR(1) equation. This produces:

xt − xt − 1 = b0 + (b1 − 1)xt − 1 + εt

Substituting g1 for (b1 − 1) gives us:

xt − xt − 1 = b0 + g1xt − 1 + εt

● The null hypothesis for the Dickey-Fuller test is that g1 ≥ 0, which implies that b1 ≥ 1, and that the time
series has a unit root, which makes it nonstationary.
● The alternative hypothesis for the Dickey-Fuller test is that g1 < 0, which implies that b1 < 1, and that
the time series is covariance stationary and does not have a unit root.
● The t-statistic for the Dickey-Fuller test is calculated in the conventional manner, but the critical values used in
the test are different: Dickey-Fuller critical values are larger in absolute value than conventional
critical t-values.
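In practice, the (augmented) Dickey-Fuller test is available in statsmodels; a minimal sketch using a simulated random walk as a placeholder follows.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Simulated placeholder random walk: x_t = x_{t-1} + e_t
rng = np.random.default_rng(8)
x = np.cumsum(rng.normal(size=360))

stat, pvalue, usedlag, nobs, crit_values, icbest = adfuller(x)
print(stat, pvalue, crit_values)     # a large p-value: cannot reject the null of a unit root

# First-differencing typically removes the unit root
stat_diff, pvalue_diff, *rest = adfuller(np.diff(x))
print(stat_diff, pvalue_diff)        # a small p-value: reject the null; the differenced series is stationary
```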

Example 6 Using first-differenced data in forecasting

Samir Nasri is convinced, after looking at the data, that ABC’s quarterly sales and the logs of ABC’s
quarterly sales do not represent a covariance stationary time series. He therefore first-differences the log
of ABC’s quarterly sales.

Log difference of ABC Company quarterly sales:

[Figure: Log first differences of ABC Company quarterly sales, 1991–2006.]

The plot shows that the first-differenced series does not exhibit any strong trend and appears to be
covariance stationary. He therefore decides to model the first-differenced time series as an AR(1) model:

ln (Salest ) − (ln Salest − 1) = b0 + b1[ln (Salest − 1) − (ln Salest − 2)] + εt


The results of the regression are:

Regression statistics

R2 0.1065
Standard error 0.0617
Observations 60
Durbin-Watson 1.9835
Coefficient Standard error t-statistic

Intercept 0.0485 0.0152 3.1908


Lag 1 0.3728 0.1324 2.8158

Autocorrelations of the residuals from the AR(1) model

Lag Autocorrelation Standard Error t-Stat

1 −0.0185 0.1291 −0.1433


2 −0.0758 0.1291 −0.5871
3 −0.0496 0.1291 −0.3842
4 0.2026 0.1291 1.5693

Notice that:

● At the 5% significance level, both the regression coefficients (b̂0 = 0.0485, b̂1 = 0.3728) of the first-
differenced series are statistically significant as their t-statistics (3.19 and 2.82, respectively) are
greater than tcrit (2.00) with 58 degrees of freedom.
● The four autocorrelations of the residuals are statistically insignificant. Their t-statistics are
smaller in absolute value than tcrit, so we fail to reject the null hypotheses that each of the residual
autocorrelations equals 0. We can therefore conclude that there is no serial correlation in the
residuals of the regression.

These results suggest that the model is correctly specified and can be used to make predictions of ABC
Company’s quarterly sales.

The value of the intercept (b̂0 = 0.0485) indicates that if sales have not changed in the current quarter (ln
Salest − ln Salest − 1 = 0), sales will grow by 4.85% in the next quarter (ln Salest + 1 − ln Salest).

If sales have changed in the current quarter, the slope coefficient (b̂1 = 0.3728) indicates that in the next
quarter, sales will grow by 4.85% plus 0.3728 times sales growth in the current quarter.

Suppose we want to predict sales for the first quarter of 2006 based on the first-differenced model. We
are given the following pieces of information:

Sales Q4 2005 = Salest = $8,157 million


Sales Q3 2005 = Salest − 1 = $7,452 million
Sales Q1 2006 = Salest + 1 = ?


Our regression equation is given as:

ln Salest − ln Salest − 1 = 0.0485 + 0.3728 (ln Salest − 1 − ln Salest − 2)

Therefore:

ln Salest + 1 − ln 8,157 = 0.0485 + 0.3728(ln 8,157 − ln 7,452)

ln Salest + 1 = 9.0066 + 0.0485 + (0.3728)(0.0904) = 9.0888

Salest + 1 = e9.0888 = $8,855.56 million
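The forecast above can be reproduced in a few lines of Python.

```python
import math

b0, b1 = 0.0485, 0.3728
sales_q4_2005, sales_q3_2005 = 8157.0, 7452.0     # $ millions

# Forecast the log first difference, then convert back to a sales level
growth = b0 + b1 * (math.log(sales_q4_2005) - math.log(sales_q3_2005))
sales_q1_2006 = math.exp(math.log(sales_q4_2005) + growth)
print(round(sales_q1_2006, 1))   # approximately 8,855.8; the example shows $8,855.56 million using rounded intermediates
```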

Moving-Average Time-Series Models

Smoothing Past Values with Moving Averages


Moving averages are calculated to eliminate the short-term fluctuations, or noise, from a time series in order
to focus on the underlying trend. The moving average for a period is the average of that period’s value and
the values over a set number of periods, n. Moving averages are commonly calculated with stock prices
to gauge the impact of large fluctuations, for example, against a 100-day or 200-day moving average. An
n-period moving average is based on the current value and previous n − 1 values of a time series. It is
calculated as:

n-period moving average

(xt + xt − 1 + … + xt − (n − 1)) / n

One of the weaknesses of the moving average is that it always lags large movements in the underlying
data. In this example, a short-term, large movement in a stock price would appear more muted and not be
as identifiable in moving average data. Further, even though moving averages are useful in smoothing out a
time series, they do not hold much predictive value since they give equal weight to all observations.
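A sketch of the calculation with pandas follows; the price series is a simulated placeholder.

```python
import numpy as np
import pandas as pd

# Simulated placeholder daily price series
rng = np.random.default_rng(9)
prices = pd.Series(100 + np.cumsum(rng.normal(scale=1.0, size=300)))

# An n-period simple moving average gives equal weight to the last n observations
ma_100 = prices.rolling(window=100).mean()
ma_200 = prices.rolling(window=200).mean()
print(ma_100.tail(3))
print(ma_200.tail(3))
```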

Moving-Average Time-Series Models for Forecasting


To enhance the forecasting performance of moving averages, analysts use moving-average time-series
models. A moving-average (MA) model of order 1, or MA(1) is given as:

Moving average of order 1 time series model

xt = εt + θεt − 1

The expected error term is 0, has a constant variance, and is uncorrelated with previous error terms.

In this model, θ (theta) is the parameter, and xt is a moving average of εt and εt − 1, which are uncorrelated
random variables with an expected value of 0. Note that, in contrast to the simple moving-average
equation (where all observations receive an equal weight), this moving-average model attaches a weight
of 1 to εt and a weight of θ to εt − 1.


A moving-average model for order q, MA(q), is given as:

Moving average of order q time series model

xt = εt + θ1εt − 1 + … + θqεt − q

Where the error term has an expected value of 0, a constant variance, and is uncorrelated with any
prior error term

To determine whether a MA(q) model fits a time series, examine the autocorrelations of the original
time series (not the residual autocorrelations that are examined in AR models to determine whether
serial correlation exists). For a MA(q) model, the first q autocorrelations will be significant, and all the
autocorrelations beyond that will equal 0.

The time-series autocorrelations can also be used to determine whether an autoregressive or a moving-
average model is more appropriate to fit the data.

● For most AR models, the time series autocorrelations start out large and then decline gradually.
● For MA models, the first q time series autocorrelations are significantly different from 0 and then
suddenly drop to 0 beyond that.

Note that most time series are best modeled with AR models.
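To illustrate the identification rule above, the sketch below simulates an MA(1) series and computes its autocorrelations with statsmodels; the simulated series and the θ value of 0.6 are assumptions used only for illustration.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(12)
eps = rng.normal(0, 1, 500)

# Simulated MA(1) series: x_t = eps_t + 0.6 * eps_{t-1}
x = eps[1:] + 0.6 * eps[:-1]

# For an MA(1) process, only the first autocorrelation should be clearly different from 0;
# autocorrelations beyond lag 1 should drop to approximately 0
print(np.round(acf(x, nlags=4), 3))
```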

Seasonality in Time-Series Models

LOS: Explain how to test and correct for seasonality in a time-series model and calculate and
interpret a forecasted value using an AR model with a seasonal lag.

Seasonality occurs when a time series shows regular patterns of movement within a year, which can
incorrectly imply that an autoregressive model is inappropriate to model a particular time series.

This can be addressed by using seasonal lags in an autoregressive model. The seasonal error
autocorrelation corresponds to the seasonal lag, which is the value of the time series one year before the
current period. For example, with monthly data, the seasonal lag would be the 12th lag of the series.

To detect seasonality in the time series, examine the autocorrelations of the residuals to determine whether
the seasonal autocorrelation of the error term is significantly different from 0. To correct for this, the
seasonal lag is added to the AR model.

Example 7 Seasonality in a time series

Robin Van Damm estimates an AR(1) model based on first-differenced sales data to model XYZ
Company’s quarterly sales for 10 years from Q1 1991 to Q4 2000. He develops the following regression
equation:

ln Salest − ln Salest − 1 = b0 + b1(ln Salest − 1 − ln Salest − 2) + εt


The results of the regression are:

AR(1) model on log first-differenced quarterly sales

Regression statistics

R2 0.1763
Standard error 0.0751
Observations 40
Durbin-Watson 2.056
            Coefficient   Standard error   t-statistic
Intercept   0.0555        0.0087           6.3793
Lag 1       −0.3928       0.1324           −3.7338
Lag 4       0.3591        0.1052           2.8729

The intercept term and the coefficient on the first lag appear to be significantly different from 0, but
the striking thing about the data is that the fourth error autocorrelation is significantly different from 0,
suggesting seasonality. The t-statistic of 2.8729 is greater than the critical t-value of 2.024 (significance
level = 5%, degrees of freedom = 38), so we reject the null hypothesis that the residual autocorrelation
for the fourth lag equals 0. The model is therefore misspecified and cannot be used for forecasting.

Because we are working with quarterly data, the fourth autocorrelation is a seasonal autocorrelation. The
model can be improved, or adjusted for the seasonal autocorrelation, by introducing a seasonal lag as
an independent variable in the model. The regression equation will then be structured as:

ln Salest − ln Salest − 1 = b0 + b1(ln Salest − 1 − ln Salest − 2) + b2(ln Salest − 4 − ln Salest − 5) + εt

Note that this regression equation expresses the change in sales in the current quarter as a function
of the change in sales in the previous quarter and, through a second regression coefficient, the change in
sales four quarters ago.

AR(1) model with seasonal lag on log first-differenced quarterly sales

Regression statistics

R2 0.3483
Standard error 0.0672
Observations 40
Durbin-Watson 2.031
            Coefficient   Standard error   t-statistic
Intercept   0.0386        0.0092           4.1957
Lag 1       −0.3725       0.0987           −3.7741
Lag 4       0.4284        0.1008           4.25


Autocorrelations of the residuals from the AR(1) model with seasonal lag

Lag   Autocorrelation   Standard error   t-statistic
1     −0.0248           0.1581           −0.1569
2     0.0928            0.1581           0.587
3     −0.0318           0.1581           −0.2011
4     −0.0542           0.1581           −0.3428

From the data in the second table, notice that the intercept and the coefficients on the first lag and the
seasonal (fourth) lag of the time series are all significantly different from 0. Further, none of the residual
autocorrelations is significantly different from 0, so there is no serial correlation. The model is therefore
correctly specified and can be used to make forecasts. Also notice that the R2 in the second table (0.3483)
is almost two times the R2 in the first table (0.1763), which means that the model does a much better job of
explaining XYZ Company's quarterly sales once the seasonal lag is introduced.

To make predictions based on the model, we need to know sales growth in the previous quarter (ln
Salest − 1 − ln Salest − 2) and sales growth four quarters ago (ln Salest − 4 − ln Salest − 5). For example, if
the exponential growth rate in sales was 3% in the previous quarter and 5% four quarters ago, the model
predicts that sales growth for the current quarter would equal 4.88%, calculated as:

ln Salest − ln Salest − 1 = 0.0386 − 0.3725(0.03) + 0.4284(0.05) = 0.048845, or about 4.88%
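A minimal Python sketch of this seasonal-lag forecast, using only the fitted coefficients and the growth rates given above:

```python
# Fitted coefficients from the AR(1) model with a seasonal (fourth) lag
b0, b1, b2 = 0.0386, -0.3725, 0.4284

growth_prev_quarter = 0.03        # sales growth in the previous quarter
growth_four_quarters_ago = 0.05   # sales growth four quarters ago

# Predicted sales growth for the current quarter
predicted_growth = b0 + b1 * growth_prev_quarter + b2 * growth_four_quarters_ago
print(round(predicted_growth, 6))  # 0.048845, i.e., about 4.88%
```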

AR Moving-Average Models and ARCH Models

LOS: Explain autoregressive conditional heteroskedasticity (ARCH) and describe how ARCH
models can be applied to predict the variance of a time series.

Autoregressive Moving-Average Models


Autoregressive moving-average (ARMA) models are a more general method of modeling a time series.
They create better forecasts than simple AR models by combining the autoregressive lags of the dependent
variable and moving-average errors. The equation for an ARMA model with p autoregressive terms and q
moving-average terms is:

Equation for ARMA (p, q)

xt = b0 + b1xt − 1 + … + bpxt − p + εt +θ1εt − 1 + … +θqεt − q

Where the error term has an expected value of 0, a constant variance, and is uncorrelated with any
previous error term
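As an illustration, the sketch below fits an ARMA(1,1) model to a simulated series using statsmodels' ARIMA class with no differencing (d = 0); the simulated data and the parameter values used to generate it are assumptions for illustration only.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)

# Simulate a series with one autoregressive term and one moving-average term
n = 300
eps = rng.normal(0, 1, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.05 + 0.4 * x[t - 1] + eps[t] + 0.3 * eps[t - 1]

# Fit an ARMA(p = 1, q = 1) model; order = (p, d, q), and d = 0 means no differencing
model = ARIMA(x, order=(1, 0, 1)).fit()
print(model.params)  # constant, AR(1) coefficient, MA(1) coefficient, error variance
```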


ARMA models are not perfect, however. Some key limitations are:

● The parameters of the model can be very unstable. Slight variations in the data sample or initial
guesses for the model parameters can give widely different estimates of the final parameters.
● There are no set criteria for determining p and q, making these models more of an art than a science.
● Even after a model is selected, it may not do a good job of forecasting.

Autoregressive Conditional Heteroskedasticity (ARCH) Models


Often the variance of the error term is not constant, which makes the standard errors of the regression
coefficients of AR, MA, and ARMA models incorrect.

Heteroskedasticity occurs when the variance of the error term varies with the independent variable. If
heteroskedasticity is present in a time series, one or more of the independent variables in the model may
appear statistically significant when it is not. The opposite of heteroskedasticity, and the ideal condition, is
homoskedasticity.

ARCH models are used to determine whether the variance of the error in one period depends on the
variance of the error in previous periods. In an ARCH(1) model, the squared residuals from a particular
time-series model (AR, MA, or ARMA model) are regressed on a constant and on one lag of the squared
residuals. The regression equation takes the following form:

Regression equation for an ARCH(1) model


ε̂²t = a0 + a1ε̂²t − 1 + ut

The null hypothesis for the model is that the errors have no ARCH (H0:a1 = 0). If the null is rejected, then
the variances of the error terms are dependent on previous terms, and heteroskedasticity is present.

Put differently, if a1 = 0, the variance of the error term in each period is simply a0. The variance is constant
over time and does not depend on the error in the previous period. When this is the case, the regression
coefficients of the time series model are correct, and the model can be used for decision making.

However, if a1 is statistically different from 0, the variance of the error in a particular period depends on the
size of the error in the previous period. If a1 is greater (less) than 0, the variance increases (decreases) over
time. Such a time series is considered ARCH(1), or heteroskedastic, and the time-series model cannot be
used for decision making. The variance of the error in period t + 1 can then be predicted using the following formula:

Error in period t + 1 for an ARCH(1) model

σ̂²t + 1 = â0 + â1ε̂²t

Example 8 Testing for ARCH

Tina Rosicky developed an AR(1) model for monthly inflation over the last 15 years. The regression
results indicate that the intercept and lag coefficient are significantly different from 0. Also, none of the
residual autocorrelations are significantly different from 0, so she concludes that serial correlation is
not a problem. However, before using her AR(1) model to predict inflation, Rosicky wants to ensure
that the time series does not suffer from heteroskedasticity. She therefore estimates an ARCH(1)
model using the residuals from her AR(1) model.


The results of the regression are:

Regression statistics

R2 0.0154
Standard error 12.65
Observations 180
Durbin-Watson 1.9855
            Coefficient   Standard error   t-statistic
Intercept   4.6386        0.892            5.2
Lag 1       0.1782        0.0658           2.708

The regression results indicate that the coefficient on the previous period’s squared residual (a1)
is significantly different from 0. The t-statistic of 2.708 is high enough to be able to reject the null
hypothesis that the errors have no ARCH (H0:a1 = 0).

Since the model does contain ARCH errors, the standard errors for the regression parameters
estimated in Rosicky’s AR(1) model are inaccurate and she cannot use the AR(1) model to forecast
inflation even though the OLS regression coefficients are different from 0 and the residuals do not
suffer from serial correlation.

Rosicky can use her estimated ARCH(1) model to predict the variance of the errors. For example,
if the error in predicting inflation in one period is 1%, the predicted variance of the error in the next
period is calculated as:

σ̂²t + 1 = 4.6386 + 0.1782ε̂²t = 4.6386 + 0.1782(1²) = 4.8168%

If ARCH errors are found to exist, generalized least squares may be used to correct for heteroskedasticity.
But also, if ARCH exists and has been modeled, we can now predict the variance of the errors. In Example
8 the variance of the error for a period was estimated at 4.82% if the error in the period before was 1%.
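For illustration, an ARCH(1) test like the one described above can be sketched in Python by regressing the squared residuals on their first lag; the residual series below is synthetic and stands in for the residuals of a fitted AR, MA, or ARMA model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# 'resid' is a placeholder for the residuals of a fitted time-series model (synthetic here)
resid = rng.normal(0, 1, 180)

sq = resid ** 2
y = sq[1:]                    # squared residual in period t
X = sm.add_constant(sq[:-1])  # constant plus squared residual in period t - 1

arch_test = sm.OLS(y, X).fit()
print(arch_test.params)   # estimates of a0 and a1
print(arch_test.tvalues)  # a significant t-stat on a1 indicates ARCH(1) errors
```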

Regressions with More Than One Time Series

LOS: Explain how time-series variables should be analyzed for nonstationarity and/or
cointegration before use in a linear regression.

Whether linear regression can be used to analyze the relationship between more than one time series, one
corresponding to the dependent variable and the other corresponding to the independent variable, depends
on whether the two series have a unit root. The Dickey-Fuller test is used to make this determination.


There are three key scenarios, with some conditions, regarding the outcome of the Dickey-Fuller tests on
the two series:

● If neither of the time series has a unit root, linear regression can be used to test the relationship
between the two series.
● If only one of the two series has a unit root, the error term in the regression would not be covariance
stationary, and therefore, linear regression cannot be used to analyze the relationship between
the two time series. For example, if we reject the null hypothesis of a unit root for the independent
variable (ie, that series does not have a unit root) but fail to reject it for the dependent variable, the
error terms will not be covariance stationary.
● If both series have unit roots, we must determine whether they are cointegrated: whether a long-term
economic relationship exists between them such that they do not diverge from each other significantly
in the long run. Two scenarios exist:
○ If the series are cointegrated, linear regression can be used: the error term from the regression of
one time series on the other will be covariance stationary, and the regression coefficients and
standard errors will be consistent and can be used to conduct hypothesis tests. However, analysts
should still be cautious in interpreting the results from the regression.
○ If the series are not cointegrated, linear regression cannot be used, as the error term in the
regression will not be covariance stationary.

Testing for Cointegration


To test whether two time series that each have a unit root are cointegrated, perform the following steps:

● Estimate the regression of one time series on the other.
● Test whether the error term (εt) from that regression has a unit root, using the Dickey-Fuller test but with
Engle-Granger critical values.
● The hypotheses are H0: Error term has a unit root versus Ha: Error term does not have a unit root.
● If we fail to reject the null hypothesis, the error term in the regression has a unit root, it is not
covariance stationary, the time series are not cointegrated, and the regression relation is spurious.
● If we reject the null hypothesis, the error term does not have a unit root, it is covariance stationary,
the time series are cointegrated, and therefore, the results of linear regression can be used to test
hypotheses about the relation between the variables.
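A hedged sketch of these steps in Python, using statsmodels' Dickey-Fuller test (adfuller) and Engle-Granger cointegration test (coint) on two simulated series that share a common stochastic trend; the simulated data are assumptions for illustration only.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(1)

# Two series built on the same random-walk trend, so they should be cointegrated
trend = np.cumsum(rng.normal(0, 1, 500))
x = trend + rng.normal(0, 1, 500)
y = 2.0 + 0.5 * trend + rng.normal(0, 1, 500)

# Step 1: Dickey-Fuller test for a unit root in each series (large p-value => unit root)
print('ADF p-value, x:', adfuller(x)[1])
print('ADF p-value, y:', adfuller(y)[1])

# Step 2: Engle-Granger test - regress y on x and test the residuals for a unit root
t_stat, p_value, _ = coint(y, x)
print('Engle-Granger p-value:', p_value)  # small p-value => residuals stationary => cointegrated
```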

If there are more than two time series, apply the following rules:

● If at least one of the time series (the dependent variable or one of the independent variables) has a
unit root and at least one does not, the error term cannot be covariance stationary, so multiple linear
regression cannot be used.
● If all of the variables have unit roots, the time series must be tested for cointegration using
a similar process as outlined previously (except that the regression will have more than one
independent variable).
○ If we fail to reject the null hypothesis of a unit root, the error term in the regression is not covariance
stationary, and we can conclude that the time series are not cointegrated. Multiple regression cannot
be used in this scenario.
○ If we reject the null hypothesis of a unit root, the error term in the regression is covariance stationary
and we can conclude that the time series are cointegrated. However, bear in mind that modeling
three or more time series that are cointegrated is very complex.


Other Issues in Time Series

LOS: Explain how time-series variables should be analyzed for nonstationarity and/or
cointegration before use in a linear regression.

Time series analysis is an extensive and complex topic filled with caveats and if/then scenarios. As in a
linear regression model, uncertainty exists in the resulting forecasts. The analyst must expect that the
uncertainty can be significant and vary across regimes. But methods for identifying uncertainty are identical
to those used to test uncertainty in a linear regression model.

Below is a step-by-step guide to build a model to predict a time series:

Suggested Steps in Time-Series Forecasting


● Understand the investment problem and make an initial choice of model. There are two options:
○ A regression model that predicts the future behavior of a variable based on hypothesized causal
relationships with other variables.
○ A time-series model that attempts to predict the future behavior of a variable based on the past
behavior of the same variable.

● If a time-series model is chosen, compile the data and plot it to see whether it looks covariance
stationary. The plot might show deviations from covariance stationarity, such as:
○ A linear trend
○ An exponential trend
○ Seasonality
○ A significant shift in the time series in the sample period where we observe a change in
mean or variance

● If you find no significant seasonality or a change in mean or variance, then either a linear trend or an
exponential trend may be appropriate to model the time series. In that case, take the following steps:
○ Determine whether a linear or exponential trend seems most reasonable (usually by plotting
the series).
○ Estimate the trend.
○ Calculate the residuals.
○ Use the Durbin-Watson statistic to determine whether the residuals have significant serial correlation.
If you find no significant serial correlation in the residuals, then the trend model is
specified correctly, and you can use that model for forecasting.

● If you find significant serial correlation in the residuals from the trend model, use a more complex
model, such as an autoregressive model. First, however, ensure that the time series is covariance
stationary. Following is a list of violations of stationarity, along with potential methods to adjust the time
series to make it covariance stationary:
○ If the time series has a linear trend, first-difference the time series.
○ If the time series has an exponential trend, take the natural log of the time series and then first-
difference it.


○ If the time series shifts significantly during the sample period, estimate different time-series models
before and after the shift.
○ If the time series has significant seasonality, include a seasonal lag.

● Once the raw time series has been successfully transformed into a covariance-stationary time
series, the transformed series can often be modeled with an autoregressive model. To decide which
autoregressive model to use, take the following steps:
○ Estimate an AR(1) model.
○ Test to see whether the residuals from this model have significant serial correlation.
○ If significant serial correlation doesn’t exist, the AR(1) model can be used to forecast.

● If significant serial correlation in the residuals exists, move to an AR(2) model and test for significant
serial correlation of the residuals of that model.
○ If you find no significant serial correlation, use the AR(2) model.
○ If you find significant serial correlation of the residuals, keep increasing the order of the AR model
until the residual serial correlation is no longer significant.

● Next, check for seasonality using one of two approaches:


○ Graph the data and check for regular seasonal patterns. These show up as large spikes or dips
in the data at regular intervals, typically over the course of a 12-month period.
○ Examine the data to see whether the seasonal autocorrelations of the residuals from an AR model
are significant. For example, if you are using quarterly data, you should check the fourth residual
autocorrelation for significance and whether other autocorrelations are significant. To correct for
seasonality, add a seasonal lag of the time series to the model.

● Test whether the residuals have autoregressive conditional heteroskedasticity. To test for
ARCH(1) errors:
○ Regress the squared residuals from your time-series model on a lagged value of the
squared residual.
○ Test whether the coefficient on the squared lagged residual differs significantly from 0.
○ If the coefficient on the squared lagged residual does not differ significantly from 0, the residuals do
not display ARCH and you can rely on the standard errors from your time-series estimates.
○ If the coefficient on the squared lagged residual differs significantly from 0, use generalized least
squares or other methods to correct for ARCH.

● As a final step, out-of-sample forecasting performance should be tested to determine how the model’s
out-of-sample performance compares to its in-sample performance.

Learning Module 6
Machine Learning

LOS: Describe supervised machine learning, unsupervised machine learning, and


deep learning.

LOS: Describe overfitting and identify methods of addressing it.

LOS: Describe supervised machine learning algorithms—including penalized regression,


support vector machine, k-nearest neighbor, classification and regression tree, ensemble
learning, and random forest—and determine the problems for which they are best suited.

LOS: Describe unsupervised machine learning algorithms—including principal components


analysis, k-means clustering, and hierarchical clustering—and determine the problems for which
they are best suited.

LOS: Describe neural networks, deep learning nets, and reinforcement learning.

What Is Machine Learning?

LOS: Describe supervised machine learning, unsupervised machine learning, and


deep learning.

Defining Machine Learning


Machine learning (ML) seeks to extract knowledge from large amounts of data. Like predictive statistical
models, ML looks to identify an underlying process in the data to make inferences about future values. But
unlike those models, ML doesn’t rely on a specific structure or assumptions needed to validate the model.

ML algorithms (ie, the “machines”) generalize, or learn, from known examples in order to determine an
underlying structure in the data without the restrictions of other statistical models. The emphasis is on
the ability of the algorithm to generate structure or predictions from data without any human help. An
elementary way to think of ML algorithms is that they “find the pattern,” and then “apply the pattern.”

Compared with statistical approaches, ML techniques are better able to handle problems with many
variables (high dimensionality) or with a high degree of nonlinearity.

ML approaches can be divided into three broad categories:

y Supervised learning
y Unsupervised learning
y Deep learning or reinforcement learning


Supervised Learning
In supervised learning, ML algorithms are used to infer patterns between a set of inputs or features (X) and
the desired output or target (Y) by mapping a given input set into a predicted output.

What distinguishes supervised learning from other ML processes is that it requires a labeled data set where
inputs or observations are matched with the associated output. From there, the ML algorithm is trained on
or fit to this data set to infer a pattern between the inputs and output.

Multiple regression is an example of supervised learning: a regression model matches observed values of the
dependent and independent variables and uses them to estimate the parameters of the regression model
that define the relationship between Y and X.

To validate the ML model, the inferred pattern arrived at by the algorithm is used to predict output values,
or Y, based on out-of-sample data. The difference between predicted and actual values of Y is used to
evaluate the predictive value of the model.

Supervised learning can be divided into two categories: regression and classification.

In a regression model, the target variable is continuous (ie, a numerical value). For these problems,
techniques include linear and nonlinear models. Nonlinear models are useful for problems involving large
data sets with large numbers of features, many of which may be correlated. While the term “regression”
is used for this category of ML problems, they are not regression problems as discussed in earlier
learning modules.

In classification problems, the target variable is categorical or ordinal. Classification focuses on sorting
observations into distinct categories. These can be binary, with just two possible categories, or can be
multi-category.

Unsupervised Learning
Unsupervised learning is ML that does not use labeled training data; the inputs (X) that are used for
analysis do not have specified targets (Y). Since the ML program is not given labeled data, it must discover
structure within the data. Unsupervised learning is useful for exploring new data sets as it can provide the
analyst with insights into data sets that may be too big or too complex to visualize. Two types of problems
that lend themselves to unsupervised ML are reducing the dimension of data, dimensional reduction, and
sorting data into clusters, clustering.

Dimensional reduction is used to reduce the set of features to a manageable size while retaining as much
of the variation, or information, in the data as possible. In risk management or quantitative investment
applications, it is useful to identify the most predictive features in a large data set.

Clustering refers to sorting observations into groups, or clusters, such that observations in the same cluster
are more similar to each other than they are to observations in other clusters. The criteria that define these
clusters may or may not be prespecified. Asset managers have used clustering to sort companies into
empirically determined groupings based on fundamental factors rather than conventional groupings based
on sectors or countries.

Deep Learning and Reinforcement Learning


Experts sometimes distinguish additional categories of ML, such as deep learning and
reinforcement learning.

y In deep learning, sophisticated algorithms are trained to do highly complex tasks, such as image
classification, face recognition, speech recognition, and natural language processing.
y In reinforcement learning, a computer learns from interacting with itself (or data generated by the
same algorithm).


Deep learning and reinforcement learning are themselves based on neural networks (NNs, also called
artificial neural networks, or ANNs). NNs include highly flexible ML algorithms that have been successfully
applied to a variety of tasks characterized by nonlinearities and interactions among features.

Exhibit 1 summarizes these ML algorithms and types of target variables:

Exhibit 1 ML algorithm types by target variable

Continuous target variable
y Supervised (target variable), regression: linear; penalized regression/LASSO; logistic; classification and regression tree (CART); random forest
y Unsupervised (no target variable): dimensionality reduction (principal components analysis, PCA); clustering (k-means, hierarchical)

Categorical target variable
y Supervised (target variable), classification: logit; support vector machine (SVM); k-nearest neighbor (KNN); classification and regression tree (CART)
y Unsupervised (no target variable): dimensionality reduction (PCA); clustering (k-means, hierarchical)

Continuous or categorical target variable
y Supervised: neural networks; deep learning; reinforcement learning
y Unsupervised: neural networks; deep learning; reinforcement learning

Evaluating ML Algorithm Performance

LOS: Describe overfitting and identify methods of addressing it.

ML algorithms are not perfect. The resulting model can be overly complex, may be sensitive to particular
changes in the data, and may fit the training data too well after incorporating the noise or random
fluctuations into its learned relationship. Consequently, the model will typically not hold predictive power
when applied to new data. This problem is known as overfitting: the fitted algorithm does not generalize,
or apply well to new data. A model that generalizes well is a model that retains its explanatory power when
predicting out-of-sample data.


The evaluation of any ML algorithm focuses on its prediction error when applied to new data rather than its
goodness-of-fit on the data with which it was trained.

Generalization and Overfitting


To understand generalization and overfitting of an ML model, we divide the data set into three
distinct samples:

y The training sample used to train the model (in-sample data)


y The validation sample for validating and tuning the model
y The test sample for testing the model’s ability to predict well on new data

The training and validation samples are considered to be the in-sample data, while the test sample is used
as out-of-sample data. The performance of the data on the test sample is used to determine whether the
model overfits or underfits the data.

The risk of overfitting increases as a model becomes more complex. A model’s complexity is based on the
number of features, terms, or branches it has, and whether the model is linear or nonlinear.

Exhibit 2 illustrates the concept of model fit:

Exhibit 2 Model fit

Three panels plot labeled observations (Y against X) with a fitted classification boundary: an underfit model, an overfit model, and a good-fitting model.
© CFA Institute

y The left graph shows four errors in this underfit model (three misclassified circles and one
misclassified triangle).
y The middle graph shows absolutely no errors.
y The right graph shows a good-fitting model with only one error, the misclassified circle.

Errors and Overfitting


To quantify the effects of overfitting, the error rates of the model’s in-sample and out-of-sample output
are compared.

y Total in-sample error (Ein) comes from differences between predicted target values based on the
inferred relationship and actual target outcomes within the training sample.
y Total out-of-sample error (Eout) comes from validation or test samples.
y Low or no in-sample error combined with large out-of-sample error suggests poor generalization.


Out-of-sample error can come from three sources:

y Bias error: how well the inferred relationship fits the training data. Algorithms with erroneous
assumptions produce high bias from underfitting and high in-sample error, leading to poor
predictive value.
y Variance error: how much the model’s results change in response to new data from validation and
test samples. Unstable models pick up noise and spurious relationships, resulting in overfitting and
high out-of-sample error.
y Base error: errors from randomness in the data.

A learning curve plots the accuracy rate (calculated as 1 − Error rate) in out-of-sample data against the
amount of training data. These plots are helpful in determining the over- or underfitting of a model as a
function of bias and variance errors.

Exhibit 3 demonstrates how these out-of-sample errors can impact the learning curve:

Exhibit 3 Learning curves

Three panels plot accuracy rate (from 0 to 100%) against the number of training samples: A. High bias error; B. High variance error; C. Good tradeoff of bias and variance error. Each panel shows the desired accuracy rate, the training accuracy rate, and the validation accuracy rate.
© CFA Institute

y If the model is robust, an increase in the size of the training sample should lead to an increase in
out-of-sample accuracy. This implies that error rates experienced in the validation/test samples (Eout)
and in the training sample (Ein) converge toward each other and toward a desired error rate, as in
Exhibit 3 Panel C.
y In an underfitted model (ie, one with high bias error), high error rates cause convergence below
the desired accuracy rate. Adding more training samples does not improve the model, as in
Exhibit 3 Panel A.
y In an overfitted model (ie, one with high variance error), the error rates of the validation sample and
training sample do not converge, as in Exhibit 3 Panel B.

Data scientists try to simultaneously minimize both bias and variance errors while selecting an algorithm
with good predictive value.

Out-of-sample error rates are also a function of model complexity. More complexity in the training set
decreases in-sample error rates (Ein), which reduces bias error, but as complexity increases, out-of-sample
error rates (Eout) rise, which means that variance error increases.


Linear functions are overall more susceptible to bias error and underfitting, as they tend to be too simple,
while nonlinear functions are more likely to have variance error and overfitting, as they tend to be too
complicated. An optimal point of model complexity exists, where the bias and variance error curves
intersect, and in- and out-of-sample error rates are minimized.

A fitting curve, which shows in- and out-of-sample error rates (Ein and Eout) on the y-axis plotted against
model complexity on the x-axis, is shown in Exhibit 4:

Exhibit 4 Fitting curve

The fitting curve plots model error (Ein, Eout) against model complexity: bias error falls and variance error rises as complexity increases, and total error is minimized at the optimal complexity.
© CFA Institute

Finding the optimal point just before the total error rate begins to rise, due to increasing variance error, is
an integral part of the ML process and the key to successful generalization. Beyond that, added complexity
comes at the cost of more errors and reduced generalization, the trade-off between cost and complexity, so
overfitting risk must be mitigated.

Preventing Overfitting in Supervised Machine Learning


Two methods that are generally used to reduce overfitting are complexity reduction and proper
sampling techniques.

Complexity reduction limits the number of features selected during training. This process penalizes
algorithms that are too complex or too flexible by constraining them to include only parameters that reduce
out-of-sample error.

To avoid sampling bias, a cross-validation process is used to estimate the out-of-sample error by detecting
the error in the validation data sample. This is typically done through k-fold cross-validation, where data
in the training and validation samples are randomly shuffled and then divided into k equal subsamples,
using k − 1 samples as training samples and the kth sample as the validation sample. This process is then
repeated k times, and the data are shuffled each time, such that eventually each data point is used in the
training sample k − 1 times and in the validation sample once. The average of the k validation errors (mean
Eval) is then used as an estimate of the model's out-of-sample error (Eout).
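A minimal sketch of k-fold cross-validation using scikit-learn; the data, the model, and the choice of five folds are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))  # hypothetical features
y = X @ np.array([0.4, -0.2, 0.0, 0.1, 0.3]) + rng.normal(0, 0.5, 200)

# k-fold cross-validation with shuffling: each fold serves once as the validation sample
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring='neg_mean_squared_error')

# Average validation error, used as an estimate of out-of-sample error
print(-scores.mean())
```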


Supervised ML Algorithms: Penalized Regression

LOS: Describe supervised machine learning algorithms—including penalized regression,


support vector machine, k-nearest neighbor, classification and regression tree, ensemble
learning, and random forest—and determine the problems for which they are best suited.

Penalized Regression
Penalized regression is used for making better predictions with large datasets by reducing a large number
of independent variables to a more manageable set, avoiding the tendency to overfit the model with
too many explanatory variables. This is particularly helpful when features are correlated, which limits
linear regression.

In ordinary linear regression, the regression coefficients are chosen to minimize the sum of the squared
residuals. In penalized regression, the regression coefficients are chosen to minimize both the sum of the
squared residuals and a penalty term that increases in size with the number of included variables with
nonzero regression coefficients. Thus, only variables with the most predictive power remain in the model.
LASSO (least absolute shrinkage and selection operator) is a commonly used type of penalized regression:
Σ i=1 to n (Yi − Ŷi)² + λ Σ k=1 to K |bk|

Where the penalty term is:

λ Σ k=1 to K |bk|

The greater the number of variables included, the larger the penalty term, so there is a trade-off between
the increase in explanatory power that a variable brings to the model versus the penalty for including it in
the model. Note that:

y Lambda (λ) is a hyperparameter, that is, a parameter whose value must be defined by the researcher
before training begins.
y If λ = 0, the expression is equivalent to an ordinary least squares (OLS) regression.
y The penalty term is added only during the training exercise (the model-building process). Once a
model has been built where each variable plays an essential role, the penalty term is no longer
needed, and the model is evaluated based on the sum of squared residuals generated from the
test sample.

Regularization methods such as LASSO have been applied to both linear and nonlinear models to reduce
overfitting. For example:

y LASSO has been used for forecasting default probabilities in industrial sectors where a multitude of
potential features have been reduced to fewer than 10 variables.
y Regularization methods have been used in portfolio management to get around the problems posed
by strong multicollinearity in asset returns in mean-variance optimization.
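As an illustration of penalized regression, the sketch below fits a LASSO model with scikit-learn on simulated data in which only two of twenty candidate features matter; the alpha value (playing the role of λ) and the data itself are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 20))  # many candidate features, most of them irrelevant
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 1, 500)

# alpha plays the role of lambda: larger values shrink more coefficients to exactly zero
lasso = Lasso(alpha=0.1).fit(X, y)

# Indices of the features that survive the penalty (ideally columns 0 and 3)
print(np.flatnonzero(lasso.coef_))
```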


Support Vector Machine


Support vector machine (SVM) is a powerful ML algorithm that has been used for classification, regression,
and outlier detection. Suppose a simple data set (Exhibit 5) has only two features, x and y, where the
data can be clearly labeled into two groups (triangles and crosses) and separated by several straight lines
(Exhibit 5). These straight lines are known as linear classifiers.

Exhibit 5 Linear classifiers

Panel A shows observations with two features labeled in two groups (crosses and triangles); Panel B shows that the data are linearly separable by several straight lines.
© CFA Institute

In Exhibit 6, the data have n features, which would be graphically represented in an n-dimensional space.
The idea of SVM is to produce the widest possible strip that divides the observations into two groups. SVM
is a linear classifier that determines the hyperplane that optimally separates the observations into two sets
of data points.

Exhibit 6 Support vector machine: margin/penalty trade-off

Two hyperplanes are compared: one with a wider margin that penalizes a misclassified data point, and one with no penalty but a smaller margin.

In Exhibit 6, the straight line in the middle of the strip is known as the discriminant boundary or boundary.
The margin is defined by the observations closest to the boundary, the circled points in this example, and
these observations are called support vectors. Adding data points that are away from the support vectors
will not change the boundary, but adding points close to the hyperplane, or changing the set of support
vectors, can move the margin.

Some observations may fall on the wrong side of the boundary and be misclassified by the SVM algorithm.
Because most real-world data sets are not linearly separable, the SVM uses soft margin classification,
adding a penalty to the objective function for any misclassified observations in the training set. The result
is that the SVM algorithm chooses a discriminant boundary that optimizes the trade-off between a wider
margin and a lower total error penalty.


An alternative to soft margin classification is a nonlinear SVM algorithm, which provides more advanced
separation boundaries (an n-dimensional hyperplane). While the resulting model may reduce the number
of misclassified instances in the training data, it will add to the model’s complexity by introducing more
features and can overfit the data.
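A minimal soft-margin SVM sketch using scikit-learn on simulated, two-feature data; the parameter C, which controls the penalty for misclassified training points, and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Two labeled groups of observations with two features each
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Linear soft-margin SVM: larger C penalizes misclassified training points more heavily
clf = SVC(kernel='linear', C=1.0).fit(X, y)

print(clf.support_vectors_.shape)  # the observations that define the margin
print(clf.predict([[0.2, -0.3]]))  # classify a new observation
```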

K-nearest Neighbor
K-nearest neighbor (KNN) is a supervised learning technique, used most often for classification and
sometimes for regression. This technique classifies a new observation by finding similarities or “nearness”
between the observation in question and existing data. Exhibit 7 shows the scatterplot in Exhibit 6 and adds
a new observation. The diamond that is now on the plot must be classified as a cross or triangle.

Exhibit 7 K-nearest neighbor

Two panels show the new observation with its nearest neighbors identified for k = 3 and for k = 5.

y If k = 3, the diamond will be classified as a triangle (the same category as two of its three
closest neighbors).
y If k = 5, the algorithm will look at the diamond's five closest neighbors, three crosses and two triangles,
and classify the observation based on a simple majority. In this instance the observation will be
classified as a cross.

In the context of an investment problem, suppose a database of corporate bonds classified by credit rating
contains issuer features such as asset size, industry, leverage ratios, and cash flow ratios as well as
features of the bonds themselves like tenor, coupon, and embedded options. Now assume there is a new
bond issue that has no credit rating. KNN can identify the implied credit rating of the new issue based on
issuer and bond characteristics.
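A hedged sketch of this kind of KNN classification with scikit-learn; the issuer/bond features and rating labels below are entirely hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(11)

# Hypothetical issuer/bond features (e.g., leverage, coverage, tenor, coupon) and known ratings
X_train = rng.normal(size=(300, 4))
ratings = rng.integers(0, 3, 300)  # illustrative labels: 0, 1, 2 for three rating buckets

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, ratings)

# A new, unrated issue: the implied rating is the majority vote of its 5 nearest neighbors
new_bond = rng.normal(size=(1, 4))
print(knn.predict(new_bond))
```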

KNN is a straightforward, intuitive model that is nonparametric: it makes no assumptions about the
distribution of the data. Further, it can be used directly for multi-class classification. But a critical challenge
of KNN is defining what it means to be similar (or near). This requires knowledge of the data and an
understanding of the business objectives of the analysis, making defining the model’s parameters highly
subjective. While KNN algorithms tend to work better with a small number of features, the hyperparameter
of the model must be chosen carefully. A few things to keep in mind when deciding on the value of the k
hyperparameter are:

y If k is an even number, there may be ties and no clear classification.


y If k is too small, there may be a high error rate and sensitivity to local outliers.
y If k is too large, it would dilute the concept of nearest neighbors by averaging too many outcomes.


Classification and Regression Tree


The classification and regression tree (CART) supervised ML method can be applied to predict either a
categorical target variable, using a classification tree, or a continuous target variable with a regression tree.
CART is typically applied to binary classification or regression.

The simple binomial tree in Exhibit 8 has three key elements: the root node, the decision nodes, and the
terminal node(s). The root nodes and decision nodes each represent a feature and have cutoff values for
that feature.

Exhibit 8 Classification and regression tree: decision tree example

The tree begins at a root node that splits on whether the debt-to-EBITDA ratio exceeds 3. Each branch leads to decision nodes that split on further features and cutoff values (FCFF growth, interest coverage ratio, debt-to-EBITDA ratio, current ratio), and each path ends at a terminal node labeled either default risk or no default risk.

Note that the same feature can appear several times in a tree in combination with other features. Further,
some features may be relevant only if other conditions have been met.

At each node in the decision tree, the algorithm chooses the feature, and the cutoff value for that feature,
that generate the widest separation of the labeled data in order to minimize classification error. As the data
travel down the decision tree, they are partitioned into smaller and smaller subgroups, reducing the error
within each group. When the classification error within a subgroup no longer changes by a material amount,
the process stops at a terminal node. For a classification problem, the category, or label, of the majority of
the data points at that terminal node is assigned; for regression, the prediction is the mean of the target
values of the observations at that node.


To avoid overfitting, regularization parameters can be added, such as the maximum depth of the tree, the
minimum population at a node, or the maximum number of decision nodes. Alternatively, regularization
can occur via a pruning technique that reduces the size of the tree by removing sections that provide little
classifying power.

CART algorithms are a good alternative to multiple linear regression when there is a nonlinear relationship
between the features and the outcome because the same feature can appear several times in combination
with other features and some features may be relevant only if other conditions have been met.

Typical applications of CART in investment management include, among others, fraud detection, equity and
fixed-income selection, and client communications.
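For illustration, the sketch below trains a small classification tree with scikit-learn on simulated credit data; the features, the labeling rule, and the regularization settings (max_depth, min_samples_leaf) are assumptions, not a prescribed model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(21)

# Hypothetical features: debt-to-EBITDA, FCFF growth, interest coverage
X = np.column_stack([rng.uniform(0, 6, 400),
                     rng.uniform(-0.1, 0.2, 400),
                     rng.uniform(0, 8, 400)])

# Illustrative label: default risk when leverage is high and coverage is low
y = ((X[:, 0] > 3) & (X[:, 2] < 2)).astype(int)

# max_depth and min_samples_leaf are regularization parameters that limit tree complexity
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20).fit(X, y)
print(export_text(tree, feature_names=['debt_to_ebitda', 'fcff_growth', 'interest_coverage']))
```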

Ensemble Learning and Random Forest

LOS: Describe supervised machine learning algorithms—including penalized regression,


support vector machine, k-nearest neighbor, classification and regression tree, ensemble
learning, and random forest—and determine the problems for which they are best suited.

Ensemble learning is a technique that uses the average result from several models to predict a target
value in order to converge upon a more accurate prediction than one based on any single model. Ensemble
learning can be divided into two main categories: voting classifiers and bootstrap aggregation.

Voting Classifiers
A majority-vote classifier applies a training sample to several different algorithms such as SVM, KNN, and
CART. Once all of the results are aggregated, or the votes are collected, the algorithm assigns the new data
point to the predicted label with the most votes.

With these models it should be noted that the more algorithms are used, the greater the accuracy of the
aggregated prediction, until the point where accuracy deteriorates from overfitting. The idea is to look for
variety in the choice of algorithms, modeling techniques, and hypotheses so that the law of large numbers
produces the most accurate prediction.

Bootstrap Aggregating
With bootstrap aggregating, also called bagging, a single ML algorithm is applied to different sets of training
samples from the larger set of training data. Each new bag of data is generated by random sampling
with replacement from the initial training set. With these data samples, the algorithm is then trained on n
independent data sets that will generate n new models. With each new observation, majority vote is used
for classification.

Random Forest
A random forest classifier is a collection of decision trees trained with the bagging method. CART algorithms
are trained using the set of n independent data sets from the bagging process to generate many decision
trees that make up the random forest classifier. For any new observation, the collection of classifier trees—
the random forest—undertakes classification by majority vote.

To add more diversity to the model and create more individualized predictions with a view to improving
overall model prediction accuracy, the algorithm can take several hyperparameters, for example, by varying
the number of subset features in the dataset or by changing the number of trees used, the minimum size
(population) of each node, or the maximum depth of each tree.


Exhibit 9 Random forest classifier

An input is passed to a collection of decision trees (the original tree and trees 2 through n, each built from a bagged training sample and splitting on features F1 through F4); the final output is determined by averaging or by majority vote across the trees.

The process involved in random forest tends to mitigate overfitting risk. This technique also reduces the
noise-to-signal ratio, as errors cancel out across the collection of different classification trees. An important
drawback of random forest is that the model is not easy to interpret and can be considered a black-
box algorithm.

Despite its relative simplicity, random forest is a powerful algorithm with many investment applications,
including strategies for asset allocation and investment selection.
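A minimal random forest sketch using scikit-learn; the data and the hyperparameters shown (number of trees, max_features, max_depth) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 10))  # hypothetical features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 1000) > 0).astype(int)

# n_estimators = number of bagged trees; max_features limits the features tried at each split
forest = RandomForestClassifier(n_estimators=200, max_features='sqrt',
                                max_depth=5, random_state=0)
forest.fit(X, y)

print(forest.predict(X[:3]))        # classification by majority vote across the trees
print(forest.feature_importances_)  # a rough look inside the 'black box'
```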


Unsupervised ML Algorithms and Principal Component Analysis

LOS: Describe unsupervised machine learning algorithms—including principal components


analysis, k-means clustering, and hierarchical clustering—and determine the problems for which
they are best suited.

Principal Components Analysis


Sometimes adding too many features to the analysis may explain the data but also generate random noise
in the dataset. In such cases, dimension reduction is applied in order to reduce the set of features to a
manageable size while retaining as much of the variation or information in the data as possible.

Principal component analysis (PCA) is commonly used to reduce highly correlated features of data into a
few central and uncorrelated composite variables. Composite variables combine two or more variables that
are statistically strongly related to each other. The result is a lower-dimensional view of the structure of the
volatility in the data.

The new, mutually uncorrelated composite variables are known as eigenvectors. An eigenvector’s
eigenvalue represents the proportion of total variance in the initial data that is explained by the eigenvector.
The PCA algorithm orders the eigenvectors according to their eigenvalues, from largest to smallest, and
chooses the ones that add the most value to the analysis or that explain the largest proportion of the
dataset’s variance.

There is a trade-off between a lower-dimensional, more manageable view of a complex data set with a few
principal components selected and the loss of information. Scree plots, which show the proportion of total
variance in the data explained by each principal component, are helpful in determining how many principal
components to retain. In practice, researchers typically retain enough principal components to explain
roughly 85% to 95% of total variance in the initial data set.
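As an illustration of this scree-plot logic, the sketch below runs PCA with scikit-learn on simulated, highly correlated data and counts how many components are needed to explain 90% of total variance; the data and the 90% threshold (within the 85% to 95% range noted above) are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Hypothetical data set: 3 underlying drivers plus 7 correlated, noisy combinations of them
base = rng.normal(size=(250, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 7)) + 0.1 * rng.normal(size=(250, 7))])

pca = PCA().fit(X)
explained = np.cumsum(pca.explained_variance_ratio_)

# Retain enough components to explain 90% of total variance
n_keep = int(np.searchsorted(explained, 0.90) + 1)
print(n_keep, explained[:n_keep])
```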

The main drawback of PCA is that since the principal components are combinations of the data set’s initial
features, they typically cannot be easily labeled or directly interpreted by the analyst. As with random
forests, the end user might see these models as a black-box algorithm.

Dimension reduction is typically performed as part of exploratory data analysis, before training another
supervised or unsupervised learning model. ML models that are quicker to train tend to reduce overfitting
and are easier to interpret if provided with lower-dimensional data sets.


Clustering

LOS: Describe unsupervised machine learning algorithms—including principal components


analysis, k-means clustering, and hierarchical clustering—and determine the problems for which
they are best suited.

Clustering is used to organize data points into similar groups, such that observations in the same cluster
are more similar to each other than they are to observations in other clusters. This property is known as
cohesion. Conversely, observations in different clusters are as dissimilar as possible to other clusters, a
property known as separation.

Clustering has been used by asset managers to sort companies into empirically determined groupings
rather than conventional groupings based on sectors or countries. In portfolio management, clustering
methods have been used for improving portfolio diversification.

In practice, human judgment also plays an important role in using clustering algorithms. Once the set of
relevant features or characteristics is defined, the decision on what it means to be similar in terms of the
acceptable distance between two observations must be made.

K-means Clustering

LOS: Describe unsupervised machine learning algorithms—including principal components


analysis, k-means clustering, and hierarchical clustering—and determine the problems for which
they are best suited.

K-means clustering involves repeatedly grouping observations into a fixed number, k, of nonoverlapping
clusters. The value of k is a hyperparameter that is set by the researcher before learning begins. Each
cluster is defined by its centroid, or its center, and each observation is assigned by the algorithm to the
cluster whose centroid the observation is closest to. A key concept to keep in mind is that once the clusters
are formed, there is no defined relationship between them.

Exhibit 10 illustrates the iterative process that a k-means algorithm follows in clustering data (k = 3) with two
features represented by the vertical and horizontal axes:


Exhibit 10 K-means clustering: iterative process (k = 3)

Step 1: Choose initial random centroids c1, c2, c3, defining the initial three clusters.
Step 2: Assign each observation to the nearest centroid.
Step 3: Calculate new centroids as the average values of the observations in each cluster.
Step 4: Reassign each observation to the nearest centroid (from Step 3).
Step 5: Reiterate the process of recalculating new centroids.
Step 6: Reassign each observation to the nearest centroid (from Step 5), completing the second iteration.

The k-means algorithm will continue to iterate through the data until no observation is reassigned to
a new cluster, or when there is no longer a need to recalculate new centroids. The algorithm has then
converged and populated the final k clusters with their member observations. The k-means algorithm
minimizes intracluster distance (maximizing cohesion) and maximizes intercluster distance (maximizing separation).

Note that the final assignment of observations to clusters can depend on the initial locations of the
centroids. To address this issue the algorithm is run several times using different sets of initial centroids to
add variation. Also, the hyperparameter, k, must be decided before k-means can be run, so the researcher
needs to estimate how many clusters are reasonable for the problem at hand and the data set being
analyzed. To address this, the researcher can iterate through a range of values for k to find the optimal
number of clusters.

The k-means algorithm is fast and works well on very large data sets with hundreds of millions of
observations. It is among the most-used algorithms in investment practice, particularly in data exploration
for discovering patterns or as a method for deriving alternatives to existing industry classifications.
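A minimal k-means sketch with scikit-learn; the simulated two-feature data, the choice of k = 3, and the number of random restarts (n_init) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Hypothetical observations with two features, drawn from three groups
X = np.vstack([rng.normal(loc, 0.5, (100, 2)) for loc in (-2, 0, 3)])

# k is a hyperparameter; n_init reruns the algorithm from several sets of random centroids
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # final centroids
print(km.labels_[:10])      # cluster assignments of the first few observations
```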


Hierarchical Clustering

LOS: Describe unsupervised machine learning algorithms—including principal components analysis, k-means clustering, and hierarchical clustering—and determine the problems for which they are best suited.

In hierarchical clustering, algorithms create intermediate rounds of clusters until a final clustering is
reached. Although more computationally intensive than k-means clustering, hierarchical clustering has
the advantage of allowing the investment analyst to examine alternative segmentations of data of different
granularity before deciding which one to use.

Agglomerative (or bottom-up) hierarchical clustering begins with each observation being treated
as its own cluster. Then, the algorithm finds the two closest clusters, defined by some measure of distance,
or similarity, and combines them into one new larger cluster. This process is repeated iteratively until all
observations are clumped into a single cluster. This method is more useful in identifying smaller clusters.

Divisive clustering (or top-down hierarchical clustering) starts with all the observations belonging to a
single cluster. The observations are then divided into two clusters based on some measure of similarity.
The algorithm then progressively partitions the intermediate clusters into smaller clusters until each
cluster contains only one observation. This method is more useful in identifying large clusters. Hierarchical
clustering is shown in Exhibit 11:

Exhibit 11 Hierarchical clustering

Divisive clustering shown in four stages: Stage 1 (1 cluster), Stage 2 (2 clusters), Stage 3 (4 clusters), and Stage 4 (7 clusters), progressively partitioning observations A through G into smaller clusters.


Dendrograms
Dendrograms are a type of tree diagram for visualizing a hierarchical cluster analysis. For example, in
Exhibit 12 the x-axis represents the clusters while the y-axis shows the distance measure. Each cluster is
represented by the horizontal line, labeled here as the arch. The two vertical lines connecting the arch are
called dendrites. Shorter dendrites identify greater similarities between clusters. The horizontal lines show
the cutoffs for each subset of clusters, moving from two down to the eventual 11.

Exhibit 12 Dendrograms

The dendrogram plots the clusters (A through K) on the x-axis against the distance measure (0 to 0.07) on the y-axis. Each cluster combination is drawn as an arch connected by two dendrites, and horizontal cutoff lines show the groupings at 2 clusters, 6 clusters, and 11 clusters.
© CFA Institute
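For illustration, the sketch below performs agglomerative hierarchical clustering with SciPy and cuts the resulting tree at different levels of granularity, as a dendrogram would display; the simulated data and the 'ward' linkage method are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)

# Hypothetical observations with two features, drawn from three groups
X = np.vstack([rng.normal(-2, 0.4, (20, 2)),
               rng.normal(0, 0.4, (20, 2)),
               rng.normal(3, 0.4, (20, 2))])

# Agglomerative (bottom-up) clustering: 'ward' merges the two closest clusters at each step
Z = linkage(X, method='ward')

# Cut the tree at different levels of granularity, as a dendrogram cutoff would do
print(fcluster(Z, t=2, criterion='maxclust'))  # two clusters
print(fcluster(Z, t=6, criterion='maxclust'))  # six clusters
```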


Neural Networks, Deep Learning Nets, and Reinforcement Learning

LOS: Describe neural networks, deep learning nets, and reinforcement learning.

Neural networks, deep learning nets, and reinforcement learning are all at the center of the artificial
intelligence (AI) revolution. These highly complex algorithms can handle tasks such as image classification,
speech and face recognition, and natural language processing (NLP).

Neural Networks
Neural networks are applied to a variety of complex tasks characterized by nonlinearities and
interactions among variables. They are commonly used for classification and regression in
supervised learning but are also important in reinforcement learning, which can be unsupervised.

Neural networks have three types of layers: an input layer, hidden layers, and an output layer. The neural
network in Exhibit 13 has an input layer (four nodes in this example), a single hidden layer (with five
nodes), and an output layer with one node. In this example, the four nodes of the input layer correspond to
four features used for prediction; this is called a dimension of four. Nodes are sometimes called neurons
because they process information.

Exhibit 13 Neural network

[Figure: a network diagram showing an input layer with four nodes, a single hidden layer with five nodes, and an output layer with one node, with links connecting the nodes of adjacent layers.]

© CFA Institute

Neural networks are thus a more complex version of multiple regression: both use a set of inputs to predict an output. However, where a multiple regression has only two layers (the inputs and the output), a neural network adds one or more hidden layers, where the learning occurs. In the hidden layer, the inputs are transformed nonlinearly into new values and then combined into the target value.

Note that in these models, the inputs are typically scaled to account for differences in units of the data. Also,
the number of features, hidden layers, and outputs are hyperparameters.


To demonstrate, consider the topmost hidden node in Exhibit 13. This node gets four values transmitted via
links from the input layer. Each of these links has a weight assigned to it to represent its importance. These
weights are initially assigned at random. Each node in the hidden layer has two functional parts:

y The summation operator weighs each value received and adds up the weighted values to form the
total net input.
y An activation function increases or decreases the strength of the input.

The final output of the node is then passed on to another hidden layer or, as is the case here, the output
layer node as the predicted value.

The activation function is a hyperparameter decided by the researcher and acts like a dimmer switch on a
light. Outputs that are below the threshold set by the activation function do not trigger and are not passed
to the next node; conversely, outputs that are above the stated threshold are passed on. This process of
transmission is referred to as forward propagation. Processes that work backward through the layers of
the network are called backward propagation.

Learning takes place through an iterative process where adjustments are made to the weights, initially
applied at random, at the nodes. The process repeats until a specific performance measure, comparing the
predicted values to the actual values, is achieved, with the goal of reducing the total error. The degree to
which adjustments are made to the weights is called the learning rate, which is also a hyperparameter set
by the researcher.
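As a minimal sketch of what a single hidden node does (the four input values, the link weights, and the choice of a sigmoid activation are all illustrative assumptions):

```python
import numpy as np

inputs = np.array([0.2, 0.5, 0.1, 0.9])      # four scaled feature values received via links
weights = np.array([0.4, -0.3, 0.8, 0.1])    # link weights, initially assigned at random
bias = 0.05

# Summation operator: weight each value received and add them up (total net input)
net_input = np.dot(weights, inputs) + bias

# Activation function (a sigmoid here) increases or decreases the strength of the signal
# that is passed forward; the choice of activation is a hyperparameter
output = 1.0 / (1.0 + np.exp(-net_input))

print(net_input, output)
```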

This structure—a network in which all the features are interconnected with nonlinear activation functions—
is what allows neural networks to uncover complex nonlinear relationships among features. But while
increasing the number of nodes and hidden layers improves a neural network’s ability to handle complexity,
it also comes with an increased risk of overfitting.

Research has shown that simple neural networks explain and predict equity values better than models
built using traditional statistical methods due to their ability to capture dynamic and interacting variables.
However, the trade-offs in using them are the lack of interpretability and the amount of data needed to train
such models.

Deep Neural Networks


Deep neural networks (DNNs) are neural networks with many hidden layers (at least two, and often far more). These are the foundation of deep learning and have been used in a wide range of AI applications.

Like a neural network, a DNN takes a set of inputs, or features, and passes them to a layer of neurons that
weight each passed feature through nonlinear, mathematical functions. These outputs are passed to the
next layer of neurons, and the process continues with the idea of minimizing a specified loss function.

DNNs typically require substantial time to train. Getting to that goal of a minimized loss function and
ultimately achieving optimal out-of-sample performance requires significant effort in deciding the model’s
hyperparameters. In practice, the number of nodes in the input and the output layers are typically
determined by the characteristics of the features and predicted output. However, there is no set of defined
rules that can be applied to decide which combination of elements, especially the number of hidden nodes
and their connectivity and activation, will get the researcher to that optimal performance.

Reinforcement Learning
Reinforcement learning (RL) is an unsupervised learning process that uses neither direct labeled data for each observation nor instantaneous feedback. Instead, RL is essentially a trial-and-error process in which an algorithm observes its environment, learns by testing new actions, and reuses its previous experiences to achieve a defined outcome. These trial-and-error iterations can number in the millions. Academics and investment managers have applied RL to evaluate the performance of trading rules (the actions) in a specific market (the environment) with the goal of maximizing profits.

Learning Module 7
Big Data Projects

LOS: Identify and explain steps in a data analysis project.

LOS: Describe objectives, steps, and examples of preparing and wrangling data.

LOS: Evaluate the fit of a machine learning algorithm.

LOS: Describe objectives, methods, and examples of data exploration.

LOS: Describe methods for extracting, selecting, and engineering features from textual data.

LOS: Describe objectives, steps, and techniques in model training.

LOS: Describe preparing, wrangling, and exploring text-based data for financial forecasting.

The investment management industry has increasingly used big data to gain an information edge to detect
anomalies and improve forecasts of asset prices, among other uses.

Specific methods are needed to make the data, which are often unstructured, ready for a computer to
process. These methods can be complex, so it is useful for analysts to understand how to transform the
data into a form manageable by a machine learning (ML) algorithm.

The three key characteristics of big data are volume, variety, and velocity:

y Volume refers to the quantity of data.


y Variety is the array of available data sources.
y Velocity is the speed at which the volume of data grows.

Another "V" to consider is veracity. Veracity refers to the reliability of the data and its source. This is a critical
part of the data collection process. Fake data can quickly create a garbage-in-garbage-out problem for the user.

Executing a Data Analysis Project

LOS: Identify and explain steps in a data analysis project.

Data represent assets to modern-day investment managers, and big data analytics are necessary to monetize
these assets. Managers use predictive models structured from ML methods to unlock value in their data assets.

Historically, forecasting methods relied on statistical or mathematical models using traditional structured
data, such as corporate financials and valuation ratios. ML methods can also consider unstructured data
such as topics (ie, what people are talking about) and sentiment (ie, how people feel) from textual big data
(eg, online news articles, internet financial forums, social networking platforms). With unstructured data,
the predictive power of models can be enhanced and offer quick insight into security or market conditions,


so long as these sources are updated on a real-time basis. Therefore, modern-day investment analysts
must understand how unstructured data can be restructured as inputs to ML methods.

Using unstructured data in a ML model requires incorporating it into a model with traditional data. However,
the traditional (structured data) ML model-building steps often do not apply to unstructured data.

The key differences in the process for modeling structured and unstructured data are in the four steps used
for processing unstructured data:

y Text problem formulation: Determine how to formulate the text classification problem and identify
the model's inputs and outputs.
y Data (text) curation: Gather external text data via web services or web spidering (scraping or
crawling) programs that extract raw content.
y Text preparation and wrangling: Clean and preprocess the unstructured data into structured inputs.
y Text exploration: Visualize the text through techniques such as word clouds and text feature
selection and engineering.

Data Preparation and Wrangling

LOS: Describe objectives, steps, and examples of preparing and wrangling data.

LOS: Evaluate the fit of a machine learning algorithm.

The objective of data preparation and wrangling is to clean and organize raw data into a format suitable for
further analyses and training of a ML model. This process can account for most of the project time and is
critical to ensuring the quality of the data before it is used to train the model.

Before the data is collected, the researchers must conceptualize the problem, define the expected
outcomes, decide which data points are needed, and identify the sources of data, both internal and
external. Once the data is collected it must be prepared (ie, cleansed), and wrangled (ie, preprocessed).

Structured Data

Exhibit 1 Model building using structured (traditional) data

1. Conceptualization
2. Data collection
3. Data preparation and wrangling
4. Data exploration
5. Model training


Data Preparation (Cleansing)


Data preparation is the first step to ready the data for modeling. The data are put into a structured format
that is searchable and readable by computers for processing and analysis. Because raw data is rarely
complete or clean, it is necessary to examine, identify, and minimize errors. This step can be expensive and
time consuming because it can involve a high degree of human inspection. Exhibit 2 lists possible errors
in a data set:

Exhibit 2 Structured data preparation (cleansing): possible errors

Incompleteness Missing data

Invalidity Data outside of a meaningful range

Inaccuracy Data not a measure of true value

Inconsistency Conflict with data points or reality

Non-uniformity Data presented in different formats

Duplication Repeated observation

If the resources or time are not available to resolve the errors, the researcher has the option to remove the
data points. This can have varying impacts on the end model, depending on the proportion of data removed.

Data Wrangling (Preprocessing)


Data wrangling involves making data ready for ML model training: dealing with outliers, finding useful variables,
and making transformations such as scaling the data. Exhibit 3 lists common preprocessing transformations.

Exhibit 3 Transformation: structured data wrangling (preprocessing)

Extraction Creation of new variable from existing ones

Aggregation Consolidation of existing similar variables

Filtration Removal of unnecessary rows (observations)

Selection Removal of unnecessary columns (variables)

Conversion Adjustments of data (eg, currency, time zone)

Outliers almost always exist in a data set. They must first be identified, and, after examination, a decision
should be made to remove them or replace them with imputed values. There are a few techniques to
identify outliers. Generally, data points outside of three standard deviations from the mean are considered
outliers. The interquartile range (IQR), ie, the difference between the 3rd and 1st quartiles, can also be used to identify outliers.


Other practical methods for handling outliers include:

y Trimming or truncation: Extreme values/outliers are removed from the data set.
y Winsorization: Extreme values/outliers are replaced with the maximum and minimum values from
the data that are not considered outliers.

Outliers must be removed before scaling, which is the process of adjusting the range of values by shifting
and changing the scale of the data. Some variables (eg, age and income) can have a diversity of ranges
that result in a heterogeneous data set. However, ML model training works best when all variables have
values in the same range to make the data set homogeneous.

The two most common methods of scaling are normalization and standardization, shown in Exhibit 4:

Exhibit 4 Scaling: structured data wrangling (preprocessing)

Normalization: Xi (normalized) = (Xi − Xmin) / (Xmax − Xmin), where Xmin and Xmax are the minimum and maximum values in the full data set

Standardization: Xi (standardized) = (Xi − μ) / σ, where μ is the mean and σ is the standard deviation of the full data set
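A short sketch of the outlier treatment and the two scaling methods, using an invented feature vector (the values and percentile bounds are assumptions for illustration):

```python
import numpy as np

x = np.array([2.0, 3.5, 4.0, 5.0, 6.5, 50.0])   # invented raw feature with one outlier

# Winsorization: replace extreme values with chosen percentile bounds (5th/95th here)
lo, hi = np.percentile(x, [5, 95])
x_wins = np.clip(x, lo, hi)

# Normalization: rescale to [0, 1] using the data set's minimum and maximum
x_norm = (x_wins - x_wins.min()) / (x_wins.max() - x_wins.min())

# Standardization: subtract the mean and divide by the standard deviation
x_std = (x_wins - x_wins.mean()) / x_wins.std()

print(x_wins, x_norm, x_std, sep="\n")
```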

Unstructured (Text) Data

LOS: Describe objectives, steps, and examples of preparing and wrangling data.

Unstructured data are not organized in a format that a computer can readily process. These data can be in
the form of text, images, audio, or video. As such, unstructured data must be transformed into structured
data for analysis and training a ML model.

Exhibit 5 Model building using unstructured (text) data

1. Text problem formulation
2. Data (text) curation
3. Text preparation and wrangling
4. Text exploration
5. Model training


Text Preparation (Cleansing)


Raw text data are sequences of characters that also contain elements crowding the data that are useless
for the researcher. For example, text data on a website may appear clean, but when downloaded for
analysis, the raw text can contain elements such as html tags, punctuation, and white spaces.

Cleansing—removing unneeded characters and elements from the raw text—is the first step in text
processing. The example in Exhibit 6 demonstrates the basic steps for text cleansing:

Exhibit 6 Text cleansing: example

Original text (raw, with HTML tags):
<p>Net sales were $8,514 million, an increase of 5.3%.</p>

Find and remove HTML tags:
Net sales were $8,514 million, an increase of 5.3%.

Find and remove or substitute punctuation:
Net sales were /dollarSign/ 8514 million an increase of 53 /percentSign/ /endSentence/

Find and replace numbers:
Net sales were /dollarSign/ /number/ million an increase of /number/ /percentSign/ /endSentence/

Find and remove extra white spaces:
Net sales were/dollarSign//number/million an increase of/number//percentSign//endSentence/
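A rough sketch of these cleansing steps using regular expressions; the substitution rules and annotation tokens follow the example above and are deliberately simplified assumptions (production pipelines are usually more elaborate):

```python
import re

raw = "<p>Net sales were $8,514 million, an increase of 5.3%.</p>"

# 1) Remove HTML tags
text = re.sub(r"<[^>]+>", "", raw)

# 2) Drop punctuation inside numbers (thousands separators, decimal points)
text = re.sub(r"(?<=\d)[.,](?=\d)", "", text)

# 3) Substitute meaningful punctuation with annotation tokens, drop the rest
text = text.replace("$", " /dollarSign/ ").replace("%", " /percentSign/ ")
text = re.sub(r"\.(\s|$)", " /endSentence/ ", text)   # sentence-ending periods
text = text.replace(",", "")

# 4) Replace numbers with a generic token
text = re.sub(r"\d+", "/number/", text)

# 5) Collapse extra white space
text = re.sub(r"\s+", " ", text).strip()

print(text)
# Net sales were /dollarSign/ /number/ million an increase of /number/ /percentSign/ /endSentence/
```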

Text Wrangling (Preprocessing)


Tokenization is the preprocessing step of breaking down the cleaned text into its elemental words or
characters. A token is equivalent to a word; tokenization splits a given text into separate tokens. In other
words, text is a collection of tokens.

Just like structured data, text data requires normalization. Exhibit 7 shows an example of the steps
in normalization.


Exhibit 7 Normalization steps

Token "Bond" "offerings" "in" "June" "totaled" "currencysign"


Group 1 "billion" "and" "were" "not" "significant"

Lower case

"bond" "offerings" "in" "june" "totaled" "currencysign"


"billion" "and" "were" "not" "significant"

Remove stop words

"bond" "offerings" "june" "totaled" "currencysign"


"billion" "significant"

Stem

Token "bond" "offer" "june" "total" "currencysign"


Group 2 "billion" "signific"

Lemmatization or stemming may be used to convert inflected words into their morphological roots
(or lemmas). Because lemmatization is more computationally expensive and complicated, stemming is a
more common approach.

Stemming and lemmatization work to reduce the number of repeated words that occur in the text as
different variants of the same word, while keeping the context of the original text. Data sets with few
repeated words create sparseness in text data and can make training a ML model more complex.
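The sketch below walks the Exhibit 7 tokens through lowercasing, stop-word removal, and a deliberately crude stemmer; the stop-word list and suffix rules are illustrative assumptions only (in practice, a library stemming algorithm such as NLTK's Porter stemmer would be used):

```python
tokens = ["Bond", "offerings", "in", "June", "totaled", "currencysign",
          "billion", "and", "were", "not", "significant"]

stop_words = {"in", "and", "were", "not", "the", "a", "of"}   # illustrative subset

def crude_stem(word):
    # Naive suffix stripping, standing in for a true stemming algorithm
    for suffix in ("ings", "ing", "ed", "ant", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

normalized = [crude_stem(t.lower()) for t in tokens if t.lower() not in stop_words]
print(normalized)
# ['bond', 'offer', 'june', 'total', 'currencysign', 'billion', 'signific']
```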

Once the clean text data is normalized, a distinct set of tokens is created in a bag-of-words (BOW): a
set of words that does not capture the position or sequence of the words in the original text. For modeling
purposes, however, it is memory-efficient and manageable for text analysis.

The BOW is next used to build a document term matrix (DTM), a structured data table that is widely used
for text data. Each row of the matrix represents a single text file, and each column represents a token, as
shown in Exhibit 8:

Exhibit 8 Example of a document term matrix

bond offer june total billion signific

Text 1 3 2 0 9 12 1
Text 2 2 1 2 5 11 2
Text 3 9 0 1 1 7 1
Text 4 1 6 0 7 1 0
Text 5 11 1 0 4 5 0
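A minimal way to build the BOW vocabulary and a DTM from already-normalized tokens is sketched below; the three mini-documents are invented for illustration (tools such as scikit-learn's CountVectorizer automate this step on larger corpora):

```python
from collections import Counter

# Invented mini-corpus, already cleansed, tokenized, and normalized
docs = [
    ["bond", "offer", "total", "billion", "billion", "signific"],
    ["bond", "june", "total", "billion"],
    ["offer", "june", "billion", "signific"],
]

# Bag-of-words: the distinct tokens across the corpus (order and position are lost)
vocab = sorted({token for doc in docs for token in doc})

# Document term matrix: one row per document, one column per token
dtm = [[Counter(doc)[token] for token in vocab] for doc in docs]

print(vocab)
for row in dtm:
    print(row)
```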


Since BOW does not represent the word sequences or positions, it has limited use for advanced ML
training. N-grams are used to solve this problem. N-grams are word sequences that vary in length, for
example, a one-word sequence is a unigram, a two-word sequence is a bigram, and so on. Exhibit 9
presents an example of N-grams.

Exhibit 9 N-grams: examples

Clean text: Bond offerings in June were not significant

Unigrams: "Bond" "offerings" "in" "June" "were" "not" "significant"

Bigrams: "Bond_offerings" "offerings_in" "in_June" "June_were" "were_not" "not_significant"

Trigrams: "Bond_offerings_in" "offerings_in_June" "in_June_were" "June_were_not" "were_not_significant"

Data Exploration Objectives and Methods

LOS: Describe objectives, methods, and examples of data exploration.

In the data exploration stage, the prepared data is analyzed to understand distributions and relationships
among the features and how they relate to the target outcome. Data exploration involves three key steps:
exploratory data analysis, feature selection, and feature engineering.

In exploratory data analysis (EDA), exploratory graphs, charts, and other visualizations (eg, heat maps
and word clouds) are used to summarize data for inspection. Most statistical software and programs have
generic tools that can quickly show these relationships, but data can also be summarized using descriptive
statistics and more sophisticated measures for project-specific EDA.

Key objectives for EDA include:

y Understanding data properties


y Finding data patterns and relationships
y Establishing basic questions and hypotheses
y Documenting data distribution characteristics
y Planning modeling strategies

Insights taken from the EDA process are then used in feature selection and feature engineering.

EDA with Structured Data


With structured data, EDA can be done on multiple features (multi-dimensional) or on a single feature (one-dimensional). For multi-dimensional data, more advanced techniques, such as principal component analysis (PCA), can be used.


For one-dimensional data, common summary statistics are used, such as mean, median, standard
deviation, quartile ranges, skewness, and kurtosis. Data visualizations can also be created, such as
histograms, density plots, bar charts, and box plots.

Histograms use equal bins of values or value ranges to show the frequency of the data points in each bin,
and thus show the distribution of the data.

One-dimensional visualizations of multiple features are often stacked or overlaid on each other in a single plot for comparison. For example, density plots are smoothed histograms, often overlaid on standard histograms, that help show the distribution of continuous data.

To compare multiple features, multivariate data visualizations include stacked bar charts, multiple box plots,
and scatterplots. Scatterplots are helpful to show the relationship between two variables.

Feature Selection
In the feature selection stage, the researcher selects the most pertinent variables for ML model training in order to simplify the model. Throughout the EDA stage, both relevant and irrelevant features are identified; statistical diagnostics are then used to remove redundancy, heteroskedasticity, and multicollinearity, with the goal of minimizing the number of features while maximizing the predictive power of the model.

Dimensionality reduction identifies the features that account for the greatest variance between observations, reduces the volume of data, and creates new, uncorrelated combinations of features. Both feature selection and dimensionality reduction reduce the number of features, but feature selection keeps a subset of the original features unaltered, whereas dimensionality reduction transforms the data into new composite features.

Feature Engineering
Feature engineering produces new features derived from the given features to help better explain the
data set. A ML model can only perform as well as the data used to train it, and feature engineering can
improve the data by uncovering structures inherent, but not explicit, in the data. Techniques include altering,
combining, or decomposing existing data.

For example, for continuous data, a new feature may simply be the logarithm of another, which is helpful if the data span a large range or if percentage differences are important. Another example is bracketing, ie, assigning a binary value to a data point (such as whether it falls above or below a given level).

For categorical data, new features can combine two features or decompose one feature into many,
converting categorical variables into a binary value; this is known as one hot encoding and is common in
handling categorical data in ML.

Unstructured Data: Text Exploration

LOS: Describe objectives, methods, and examples of data exploration.

LOS: Describe methods for extracting, selecting, and engineering features from textual data.

Exploratory Data Analysis


Text statistics vary by case and are used to analyze and reveal word patterns. A basic text statistic computed from tokens is term frequency (TF), which is the ratio of the number of times a particular token occurs to the total number of tokens in the data set. A collection of text data sets is called a corpus.


Topic modeling is a text data application where the most informative words are identified by calculating
TF. Using the TF, text statistics can be visually comprehended in the same way as structured data, using
methods such as a word cloud, which visualizes the most informative words and their TF values. Exhibit 10
is an example of a word cloud where the size of each word is decided by its TF value.

Exhibit 10 Word cloud interpretation from Alphabet's 20XX 10-K filing

[Figure: a word cloud in which each term's size reflects its TF value; prominent terms include "Total revenue," "Cost of revenue," "Google cloud," "Fair value," "Google search," "Advertising revenue," "Foreign currency," and "Marketable equity security."]
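Term frequency can be computed directly from token counts, as in the minimal sketch below (the tiny corpus is invented for illustration); tokens with the highest TF values would appear largest in a word cloud:

```python
from collections import Counter

corpus_tokens = ["bond", "offer", "bond", "total", "billion", "bond", "offer"]

counts = Counter(corpus_tokens)
total = len(corpus_tokens)

# TF = number of occurrences of a token / total number of tokens in the data set
tf = {token: count / total for token, count in counts.items()}

print(sorted(tf.items(), key=lambda kv: kv[1], reverse=True))
```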

Feature Selection
This stage removes tokens that are not material to the project, starting with stop words, which have high TF values. Taking stop words out of the corpus decreases the vocabulary size (the BOW), which makes the ML model simpler and thus more efficient. More generally, feature selection eliminates noisy features, ie, tokens that detract from or fail to benefit ML model training.

y Frequent tokens strain a ML model in deciding boundaries among texts, which causes model underfitting.
y Rare tokens mislead a ML model into classifying texts that contain rare terms into a specific class,
leading to model overfitting.

To minimize the impacts of these data points, the researcher can use general feature selection methods for
identifying and removing noisy features:

y Frequency measures reduce vocabulary by filtering tokens with very high and low TF values.
y Document frequency (DF) discards noisy features that carry no material information across all texts.
The DF of a token is calculated as the number of documents that contain that token divided by the
number of total documents in the data set.
y Chi-square tests examine the independence of two events: occurrence of the token and occurrence
of the class. The test ranks each token by its usefulness to each class in a text classification problem.
Tokens with higher chi-square test statistics for a given class are more frequently associated and
therefore have a higher discriminatory potential.


y Mutual information (MI) measures how much information a token contributes to a class of texts. An
MI = 0 indicates that the token distribution is identical in all text classes. As the MI value moves toward
1, it indicates that the token tends to occur more often only in a particular text class.

Feature Engineering
As with structured data, the power of a ML model using unstructured data can be improved by feature
engineering. Some techniques for feature engineering with text data are:

y In text processing, numbers are converted into a token such as "/number/." It can be useful to create
new tokens for numbers, with a specific length that may identify their purpose.
y N-grams are discriminative multi-word patterns with their connection kept intact.
y Named entity recognition (NER) is an algorithm that analyzes individual tokens and their surrounding semantics to tag an object class to the token. Exhibit 11 shows the NER tags of the text "CFA Institute was formed in 1947 and is headquartered in Virginia." The NER tags then become a new feature that can improve model performance.

Exhibit 11 NER Example

Token            NER tag         POS tag   POS description
CFA              ORGANIZATION    NNP       Proper noun
Institute        ORGANIZATION    NNP       Proper noun
was                              VBD       Verb, past tense
formed                           VBN       Verb, past participle
in                               IN        Preposition
1947             DATE            CD        Cardinal number
and                              CC        Coordinating conjunction
is                               VBZ       Verb, 3rd-person singular present
headquartered                    VBN       Verb, past participle
in                               IN        Preposition
Virginia         LOCATION        NNP       Proper noun

© CFA Institute

y Parts of speech (POS) uses language structure and dictionaries to tag every token with a corresponding part of speech. Some common POS tags are nouns, verbs, adjectives, and proper nouns. For example, a large number of proper nouns can imply that the text is about people, a specific organization, or a country. POS is also useful for identifying words that can be used as more than one part of speech.


Model Training, Structured versus Unstructured Data, and


Method Selection

LOS: Describe objectives, steps, and techniques in model training.

Once the features are selected, the ML model can be trained. The training process is systematic, iterative,
and recursive and can become fairly complex. The nature of the problem at hand, the input data available,
and the level of performance needed to apply the model dictate that complexity. However, all ML model
training involves three tasks: method selection, performance evaluation, and tuning.

Method Selection
There are no set guidelines for choosing which method to use to fit a model. However, a few factors steer the researcher toward a broader process:

y Supervised or unsupervised learning: Supervised models have a ground truth, or a target dependent variable, that adds structure to the model. They can aim to predict either a continuous value (regression) or a category of the dependent variable (classification). Unsupervised learning aims to reduce the number of features that define the data set or to group data points by similarities not immediately evident in the data.
y Type of data, such as numerical, text, images, or speech
y Size of data, including both the number of instances and features

Further complicating the model selection process are data sets with mixed inputs (eg, both numerical and
text data or both structured and unstructured data). In these cases, the results of one model can be used as
an input for another model.

Performance Evaluation
Measuring a model's performance is a critical step in assessing its goodness of fit. For models that predict
continuous variables, analysis of the error terms is used to measure fit. However, for binary classification
models, there are several techniques available to assess performance. Exhibit 12 shows three of
these techniques.

Exhibit 12 Techniques of ML performance evaluation

Error analysis:
y Actual vs. predicted results (confusion matrix)
y True/false positives/negatives: TP, FP, FN, TN
y Metrics: precision, recall, accuracy, and F1 score

Receiver operating characteristic (ROC):
y Trade-off between false and true positive rates
y Distinct cutoff points and areas under the curve (AUC)
y Greater AUC (closer to 1) means better performance

Root mean square error (RMSE):
y Appropriate for continuous data and regressions
y Measures all prediction errors
y Smaller RMSE means better performance


As with regression models, error analysis can be used to test model performance. Error analysis identifies
true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). A false positive is
called a Type I error, while a false negative is a Type II error. A confusion matrix is used to visualize each
of these four outcomes, as shown in Exhibit 13:

Exhibit 13 Confusion matrix

Actual training results: Class "1" (positive) vs. Class "0" (negative)

Predicted Class "1" (positive):
y Actual Class "1": True positives (TP)
y Actual Class "0": False positives (FP), Type I error
y Total predicted positives: TP + FP

Predicted Class "0" (negative):
y Actual Class "1": False negatives (FN), Type II error
y Actual Class "0": True negatives (TN)
y Total predicted negatives: FN + TN

Total actual positives: TP + FN
Total actual negatives: FP + TN

Using this information, the performance metrics of precision and recall can be used to measure how
well a model predicted each classification. Precision is the ratio of correctly predicted positive classes
to all predicted positive classes. This metric is particularly important when the cost of FPs, or Type I
errors, is high.

Recall measures the ratio of correctly predicted positive classes to all actual positive classes. This is best
used when the cost of FNs, or Type II errors, is high. Recall is the same calculation as the true positive rate:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

Because precision and recall measure the costs of Type I and Type II errors, respectively, there is an
inherent trade-off between the two in business decisions. To reconcile the two, the accuracy and F1 score
can be calculated to assess the model's overall performance.

Accuracy is the percentage of correctly predicted classes out of the total predictions:

Accuracy = (TP + TN) / (TP + FP + TN + FN)

The F1 score is the harmonic mean of precision and recall:

F1 score = (2 × Precision × Recall) / (Precision + Recall)


The F1 score is the more appropriate of the two when there is an unequal class distribution in the data set and a balance between precision and recall is needed.
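The four metrics follow directly from the confusion-matrix counts; the counts below are invented for illustration:

```python
TP, FP, FN, TN = 80, 20, 10, 90   # hypothetical confusion-matrix counts

precision = TP / (TP + FP)                       # sensitive to Type I errors (FP)
recall = TP / (TP + FN)                          # sensitive to Type II errors (FN)
accuracy = (TP + TN) / (TP + FP + TN + FN)
f1_score = 2 * precision * recall / (precision + recall)

print(f"Precision = {precision:.3f}, Recall = {recall:.3f}")
print(f"Accuracy = {accuracy:.3f}, F1 score = {f1_score:.3f}")
```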

Receiver operating characteristic (ROC) involves plotting a curve showing the trade-off between the false
positive rate (x-axis) and true positive rate (y-axis).

Area under the curve (AUC) measures the area under the ROC curve. An AUC of 1.0 indicates a perfect model, while an AUC of 0.5 indicates that the model is no better than random guessing (values below 0.5 are worse than random). The ROC curve becomes more convex with respect to the true positive rate as AUC increases, as shown in Exhibit 14.

Exhibit 14 Area under the curve (AUC) for different receiver operating characteristic (ROC) curves

[Figure: ROC curves plotting the true positive rate (TPR, y-axis) against the false positive rate (FPR, x-axis) for Model X (AUC = 95%), Model Y (AUC = 80%), and Model Z (AUC = 70%); the diagonal line represents a random guess (AUC = 50%), and the curves become more convex as AUC increases.]
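Using scikit-learn (one common choice), the ROC curve points and the AUC can be computed from actual labels and predicted class "1" probabilities; the labels and scores below are invented for illustration:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]                          # hypothetical actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.5]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points tracing the ROC curve
auc = roc_auc_score(y_true, y_score)                # area under that curve

print(auc)   # closer to 1.0 means better performance; 0.5 is random guessing
```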

For a continuous data set, the root mean squared error (RMSE) metric can be used to assess a
model's performance. The RMSE captures all the prediction errors in the data (n) and is mostly used for
regression methods.

The formula for RMSE is:

RMSE = √[Σ(Predictedi − Actuali)² / n]
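A direct implementation of the formula on invented predicted and actual values:

```python
import math

predicted = [10.2, 9.8, 11.5, 10.0]   # hypothetical model predictions
actual = [10.0, 10.0, 11.0, 10.5]     # hypothetical actual values

n = len(actual)
rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
print(rmse)
```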


Tuning

LOS: Describe objectives, steps, and techniques in model training.

Once a model's performance has been evaluated, steps can be taken to improve its performance. A high
prediction error on the training set indicates that the model is underfitting, while higher prediction errors on
the cross-validation set compared with the training set tell the researcher that the model is overfitting.

There are two types of errors when model fitting:

y Bias error is high when the model underfits the training data. This generally occurs when the model is
underspecified and the model is not adequately learning from the patterns in the training data. In these
cases, both the training set prediction errors and cross-validation errors will be large.
y Variance error is high when the model overfits to the training data, or the model is overly complicated.
The training set prediction error will be much lower than on the cross-validation set.

Neither of these errors can be completely eliminated, but the trade-off between them should minimize the aggregate error over the data series. Balance is necessary to find the optimal model, one that neither underfits nor overfits.

Finding the correct model parameters, such as regression coefficients, weights in neural networks
(NNs), and support vectors in support vector machines, is critical to properly fitting a model. The model
parameters are dependent on the training data and are learned during the training process through
optimization techniques.

Hyperparameters are not learned from the training data; they are set by the researcher and govern how the model parameters are estimated.
Examples of these include the regularization term (λ) in supervised models, activation function and number
of hidden layers in NNs, number of trees and tree depth in ensemble methods, k in k-nearest neighbor
classification and k-means clustering, and p-threshold in logistic regression.

Researchers optimize hyperparameters based on tuning heuristics and grid searches rather than
using an estimation formula. A grid search is a method for training ML models through combinations of
hyperparameter values and cross-validation to produce optimal model performance (training error and
cross-validation error are close), which leads to a lower probability of overfitting. The plot of training errors
for each hyperparameter is called a fitting curve, shown in Exhibit 15:


Exhibit 15 Fitting curve

[Figure: training error (Errortrain) and cross-validation error (Errorcv) plotted against the regularization parameter lambda (λ); with slight regularization, Errorcv >> Errortrain (high variance, overfitting), with large regularization both errors are large (high bias, underfitting), and the optimum regularization lies between the two extremes.]

© CFA Institute

When there is little or slight regularization, the model has the potential to "memorize" the training data.
This will lead to overfitting, where the prediction error on the training set is low, but high when the model
is tested on the cross-validation set. In this case the model is not generalizing well, and the variance error
will be high.

Conversely, large regularization will only use a few features, and the model will learn less from the data. In
these cases, the prediction errors on both the training set and the cross-validation set will be high, resulting
in a high bias error.

The optimal solution finds the balance between variance error and bias error. Model complexity is penalized
just enough to select only the most important features, allowing the model to learn enough from the data to
read the important patterns without simply memorizing the data.
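As a minimal sketch of the grid search described above, the code below tunes the regularization hyperparameter of a logistic regression by cross-validation; the synthetic data set, the choice of logistic regression, and the grid values are all assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic training data (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Candidate values for C, the inverse of the regularization strength
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Grid search: fit each hyperparameter combination with 5-fold cross-validation
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```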

If high bias or variance remains after tuning the hyperparameters, the researcher may need to obtain more training examples or reduce the number of features (in the case of high variance) or add features (in the case of high bias). The model then needs to be retuned and retrained. If a model is complex and composed of submodels, ceiling analysis can identify which parts of the model pipeline can improve performance.

Economics
Learning Module 1
Currency Exchange Rates:
Understanding Equilibrium Value

LOS: Calculate and interpret the bid-offer spread on a spot or forward currency quotation and
describe the factors that affect the bid-offer spread.

LOS: Identify a triangular arbitrage opportunity and calculate its profit, given the bid-offer
quotations for three currencies.

LOS: Explain spot and forward rates and calculate the forward premium/discount for a
given currency.

LOS: Calculate the mark-to-market value of a forward contract.

LOS: Explain international parity conditions (covered and uncovered interest rate parity, forward
rate parity, purchasing power parity, and the international Fisher effect).

LOS: Describe relations among the international parity conditions.

LOS: Evaluate the use of the current spot rate, the forward rate, purchasing power parity, and
uncovered interest parity to forecast future spot exchange rates.

LOS: Explain approaches to assessing the long-run fair value of an exchange rate.

LOS: Describe the carry trade and its relation to uncovered interest rate parity and calculate the
profit from a carry trade.

LOS: Explain how flows in the balance of payment accounts affect currency exchange rates.

LOS: Explain the potential effects of monetary and fiscal policy on exchange rates.

LOS: Describe objectives of central bank or government intervention and capital controls and
describe the effectiveness of intervention and capital controls.

LOS: Describe warning signs of a currency crisis.

Foreign Exchange Market Concepts

LOS: Calculate and interpret the bid-offer spread on a spot or forward currency quotation and
describe the factors that affect the bid-offer spread.

An exchange rate represents the price of one currency in terms of another currency. It is stated as the
number of units of a particular currency (the price currency) required to purchase one unit of another
currency (the base currency).


CFA curriculum uses the convention P/B: the number of units of the price (P) currency needed to purchase
one unit of the base (B) currency. For example, suppose the USD/GBP exchange rate is currently 1.5125.
From this exchange rate quote, we can infer the following:

y The GBP is the base currency, and the USD is the price currency.
y 1 GBP will buy 1.5125 USD.
y It will take 1.5125 USD to purchase 1 GBP, or 1 GBP costs 1.5125 USD.
y A decrease in the exchange rate (eg, from 1.5125 to 1.5120) means that 1 GBP will be able to
purchase fewer USD.
○ Fewer USD will now be required to purchase 1 GBP (ie, the cost of 1 GBP has fallen).
○ This decrease in the exchange rate means that the GBP has depreciated (ie, lost value) against the
USD, or equivalently, that the USD has appreciated (ie, gained value) against the GBP.

Just like the price of any product, the price reflected in an exchange rate is the amount of the numerator
currency that can be purchased with one unit of the denominator currency.

Spot exchange rates (S) are quotes for transactions that call for immediate delivery. For most currencies,
immediate delivery means “T + 2” (ie, the transaction is settled 2 days after the trade is agreed upon by
the parties).

In professional FX markets, an exchange rate is usually quoted as a two-sided price. Dealers typically quote
both a bid price (ie, the price at which they are willing to buy) and an offer price (ie, the price at which they
are willing to sell). Bid-offer prices are always quoted in terms of buying and selling the base currency.

Bid-offer quotes in foreign exchange have two main points:

y The offer price is higher than the bid price, which creates the bid-offer spread, a compensation for
providing foreign exchange.
y Requesting a two-sided quote from the dealer allows a choice between whether the base currency
will be bought (ie, paying the offer) or sold (ie, hitting the bid). This choice provides flexibility in
transactions.

In FX, dealers have two pricing levels: one level for clients and another for the interbank market. Dealers
engage in currency transactions among themselves in the interbank market in order to adjust their
inventories and risk positions, distribute foreign currencies to clients, and transfer FX rate risk to willing
market participants. This global network handles large transactions, typically over 1 million units of the base
currency; nonbank entities like institutional asset managers and hedge funds can also access the network.

The bid-offer spread that dealers provide to clients is typically wider than what is observed in the
interbank market.

The bid-offer spread is sometimes measured in points, or pips, which are scaled to the last digit in the spot
exchange rate quote. Exchange rates for most currency pairs (except those involving the Japanese yen)
are quoted to four decimal places. For example, the bid-offer spread in the interbank market for USD/EUR
might be 1.2500–1.2504. This is a difference of 0.0004, or 4 pips, while a dealer’s spread for the same
currency pair may be 0.0006, or 6 pips.

The bid-offer spread in the FX market, as quoted to dealers’ clients, can vary widely among different
exchange rates and can change over time, even for a single exchange rate. The spread size is primarily
influenced by the bid-offer spread in the interbank market, transaction size, and the relationship between
the dealer and the client. A client’s creditworthiness can also be a factor, although, given the short
settlement cycle in the spot FX market, credit risk is not the primary determinant of bid-offer spreads.


The spread size in the interbank market depends on liquidity, which is influenced by the following:

y Currency pair: The liquidity and market participation levels differ between currency pairs. Major pairs
like USD/EUR and JPY/USD usually have high liquidity and narrower spreads due to widespread
market activity. Conversely, more obscure currency pairs (eg, MXN/CHF) have thinner market
participation, resulting in wider spreads.
y Time of day: Liquidity in the interbank FX market is highest when major trading centers like London
and New York overlap, typically from 8:00 a.m. to 11:00 a.m. (New York time). During this period, most
currency pairs are more liquid. The Asian session, starting around 7:00 p.m. (New York time) is less
liquid for most pairs.
y Market volatility: Greater uncertainty in the market—caused by events like geopolitical conflicts,
market crashes, or major data releases—leads to wider bid-offer spreads. During such times, market
participants seek to reduce their risk exposure or charge higher prices for assuming risk.

Arbitrage Constraints on Spot Exchange Rate Quotes

LOS: Identify a triangular arbitrage opportunity and calculate its profit, given the bid-offer
quotations for three currencies.

Bid-offer quotes provided by dealers in the interbank market must adhere to two arbitrage constraints:

y The bid quoted by a dealer cannot be higher than the current interbank offer, and the offer from
another dealer cannot be lower than the current interbank bid. Otherwise, other participants in the
interbank market would be able to earn riskless arbitrage profits by purchasing low and selling high. To
illustrate, assume that the current USD/EUR exchange rate in the interbank market is 1.3802–1.3806.
○ If a dealer quoted a (misaligned) exchange rate of 1.3807–1.3811, other market participants would
buy the EUR in the interbank market (at 1.3806 USD/EUR) and sell EUR to the dealer (at 1.3807
USD/EUR) to make a profit of 1 pip.
○ If a dealer quoted a (misaligned) exchange rate of 1.3797–1.3801, other market participants would
buy the EUR from the dealer (at 1.3801 USD/EUR) and sell EUR in the interbank market (at 1.3802
USD/EUR) to make a profit of 1 pip.

y The cross-rate bid quoted by a dealer cannot be higher than the implied cross rate offered in the
interbank market, while the cross rate offer quoted by another dealer cannot be lower than the implied
cross rate bids in the interbank market.
○ The implied A/C cross rate is derived from exchange rate quotes for two currency pairs, A/B and
C/B. This cross rate must align with the A/B and C/B rates to ensure consistency and prevent
arbitrage opportunities.
○ In FX cross rates, there are two trading methods: (1) using the direct cross rate A/C or (2) using the
A/B and C/B rates. Inconsistent rates prompt arbitrageurs to buy the undervalued currency and sell
the overvalued one, restoring balance.

To illustrate triangular arbitrage involving three currencies, assume the following exchange rate quotes
from a European bank: USD/EUR = 1.3802–1.3806 and GBP/EUR = 0.8593–0.8599.

When working with bid-offer quotes to determine whether to use the bid or offer rate in an exchange rate
quote for a specific transaction:

y first identify the base currency of the exchange rate quote, and then
y determine whether the client is buying or selling the base currency.


In this example, the bid and offer quotes for EUR/USD (with the USD as the base currency) are
calculated as:

(EUR/USD)bid = 1 / (USD/EUR)offer = 1 / 1.3806 = 0.7243

(EUR/USD)offer = 1 / (USD/EUR)bid = 1 / 1.3802 = 0.7245

Therefore, EUR/USD bid/ask rates = 0.7243−0.7245.

To calculate the market-implied bid-offer quote for the GBP/USD cross rate, first examine the transactions
needed to exchange GBP for USD, which involve the EUR/USD and GBP/EUR currency pairs. The
transactions are to:

y sell GBP and buy EUR, and then


y sell EUR and buy USD.

This transaction can be represented by the equation:

GBP/EUR × EUR/USD = GBP/USD

On the right-hand side of the equals sign, “sell GBP, buy USD” is given in the GBP/USD price quote. Since
the aim is to buy the currency in the denominator (USD), the offer rate calculated is for GBP/USD.

On the left-hand side, this breaks down as follows:

y “Sell GBP, buy EUR”—the offer rate for GBP/EUR


y “Sell EUR, buy USD”—the offer rate for EUR/USD

Then, a GBP/USD offer rate is:


(GBP/USD)offer = (EUR/USD)offer × (GBP/EUR)offer = 0.7245 × 0.8599 = 0.6230

Calculating the implied GBP/USD bid rate follows the same process, but with “buy GBP, sell USD,”
which results in:

(GBP/USD)bid = (EUR/USD)bid × (GBP/EUR)bid = 0.7243 × 0.8593 = 0.6224

Therefore, the market-implied bid-offer quote for GBP/USD is 0.6224–0.6230.
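The cross-rate arithmetic above can be reproduced in a few lines; the quotes are those given in the example:

```python
usd_eur_bid, usd_eur_offer = 1.3802, 1.3806
gbp_eur_bid, gbp_eur_offer = 0.8593, 0.8599

# Invert USD/EUR to get EUR/USD (the bid and offer switch sides)
eur_usd_bid = 1 / usd_eur_offer
eur_usd_offer = 1 / usd_eur_bid

# Chain the rates: GBP/USD = EUR/USD x GBP/EUR
gbp_usd_bid = eur_usd_bid * gbp_eur_bid
gbp_usd_offer = eur_usd_offer * gbp_eur_offer

print(f"{gbp_usd_bid:.4f} - {gbp_usd_offer:.4f}")   # 0.6224 - 0.6230
```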

To prevent arbitrage opportunities, the implied bid rate must be lower than the implied offer rate. The
constraints on implied cross rates are like those for spot rates but involve currency pairs. In practice,
any violations of these constraints quickly disappear since traders and algorithms exploit the pricing
inefficiencies.


Forward Markets

LOS: Explain spot and forward rates and calculate the forward premium/discount for a
given currency.

Forward contracts involve exchanging one currency for another on a future date at a pre-agreed
exchange rate. Any currency exchange transaction with a settlement date beyond T + 2 qualifies as a
forward contract.

When determining forward exchange rates, an arbitrage relationship is applied to ensure the return on two
equivalent investments is the same. Bid-offer spreads on exchange rates and money market instruments
should be ignored in the explanation of this relationship in order to keep it simple and clear.

In addition, assuming the domestic currency is the base currency in the exchange rate quote, the exchange
rate notation can be shifted from price/base currency (P/B) to foreign/domestic currency (f/d). This notation
helps to illustrate an investor’s decision-making process when choosing between domestic and foreign
investments, and it shows how arbitrage relationships ensure equalized returns when the investments’ risk
profiles are similar.

An investor with 1 unit of domestic currency for a year has two options:

y Invest in domestic cash at the domestic risk-free rate (id), resulting in (1 + id) at year-end.
y Convert domestic currency to foreign currency at the spot rate (Sf/d), invest for a year at the foreign risk-free rate (if), and end up with Sf/d(1 + if) units of foreign currency. These funds need to be converted back to domestic currency. By using a one-year forward contract with a forward rate denoted as Ff/d, the investor eliminates foreign exchange risk, as the future exchange rate is set at the start of the period. The investment's worth in domestic currency at the end of the year equals Sf/d(1 + if)(1 / Ff/d).

These two risk-free investment alternatives must provide the same return to avoid creating an opportunity
for riskless arbitrage.

(1 + id) = Sf/d(1 + if)(1 / Ff/d)

This explanation is based on a one-year horizon, but it holds for any investment timeline. In this arbitrage
relationship, risk-free assets, typically represented by bank deposits, are quoted using the appropriate
market reference rate (MRR) for each currency. Day count conventions, such as actual/360 or actual/365,
are used to calculate interest. For simplicity, this discussion consistently employs the actual/360-day count
convention, except for GBP, which uses the actual/365 convention. Integrating this day count convention
into the arbitrage formula leads to:

1 + id(Actual/360) = Sf/d[1 + if(Actual/360)](1 / Ff/d)


The equation can be rearranged to isolate the forward rate:

Ff/d = Sf/d × [1 + if(Actual/360)] / [1 + id(Actual/360)]

This equation represents covered interest rate parity, which relies on an arbitrage relationship involving
risk-free interest rates, spot exchange rates, and forward exchange rates. This relationship between
investment options asserts that the covered (ie, currency-hedged) interest rate differential between two
markets is equal to zero.

The forward premium or discount can be calculated using the rearranged covered interest rate
parity equation.

Ff/d − Sf/d = Sf/d × [(Actual/360) / (1 + id(Actual/360))] × (if − id)

The domestic currency trades at a forward premium (Ff/d > S f/d) only when the foreign risk-free interest
rate is greater than the domestic risk-free interest rate (if > id). If it is possible to earn a higher interest rate
in the foreign market than in the domestic market, the forward discount on the foreign currency will offset
the higher foreign interest rate. If not, covered interest rate parity would not be maintained, and arbitrage
opportunities would arise.

Conversely, when the foreign currency is at a higher rate in the forward contract than the spot rate, it is
trading at a forward premium. In this scenario, the foreign risk-free interest rate is less than the domestic
risk-free interest rate.

Furthermore, the premium or discount is proportional to the spot exchange rate (Sf/d) and to the interest rate differential (if − id) between the markets, and it is roughly proportional to the time to maturity (Actual/360).

The covered interest rate parity equation can be expressed using price and base currencies (P/B), a more standard exchange rate quoting convention.

FP/B = SP/B × [1 + iP(Actual/360)] / [1 + iB(Actual/360)]

The forward premium or discount equation can be expressed in the same way:

FP/B − SP/B = SP/B × [(Actual/360) / (1 + iB(Actual/360))] × (iP − iB)


Example 1 Calculating the forward premium (discount)

An analyst gathered the following information:

y Spot USD/EUR = 1.4562


y 180-day risk-free rate (USD) = 1.05%
y 180-day risk-free rate (EUR) = 2.38%

Calculate the forward premium (discount) for a 180-day forward contract for USD/EUR.

Solution

FP/B − SP/B = SP/B × [(180/360) / (1 + iB × 180/360)] × (iP − iB)
= 1.4562 × [(180/360) / (1 + 0.0238 × 180/360)] × (0.0105 − 0.0238)
= −0.00957

Therefore, for the 180-day USD/EUR forward contract, the EUR (base currency) trades at a forward discount of 0.00957 USD per EUR (95.7 pips).
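The Example 1 calculation can be reproduced directly from the forward premium (discount) equation, using the quotes and rates given above:

```python
spot = 1.4562              # spot USD/EUR (price/base)
i_p, i_b = 0.0105, 0.0238  # 180-day USD (price) and EUR (base) risk-free rates
days = 180

premium = spot * ((days / 360) / (1 + i_b * days / 360)) * (i_p - i_b)
print(round(premium, 5))   # about -0.00957, ie, the EUR trades at a forward discount
```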

In professional foreign exchange (FX) markets, FX rates are quoted in points (pips), indicating the
difference between the forward and spot rates (forward premium or discount). The pips are scaled to
correspond with the last digit in the spot quote, typically the fourth decimal place.

Exhibit 1 Sample spot and forward quotes (bid offer)

Maturity Spot rate or forward points

Spot USD/EUR 1.3802 / 1.3806

One month -5.4 / -4.9

Three months -15.8 / -15.2

Six months -36.9 / -36.2

12 months -93.9 / -91.4

It is important to note that:

y The bid rate is always lower than the offer rate.


y In this scenario, negative forward points indicate that the EUR (base currency) is trading at a forward
discount, while the USD (price currency) is trading at a forward premium.
y The absolute value of forward points increases with the time to maturity.


y Quoted forward points are scaled to each maturity and are not annualized, so no adjustment is needed
before adding them to the spot rate to calculate the forward rate.
y To convert forward point quotes into an actual forward exchange rate, divide the number of pips by 10,000 (scaling them down to the fourth decimal place) and add the result to the quoted spot exchange rate. For example, using the quotes in Exhibit 1, the three-month forward bid rate is 1.3802 + (−15.8 / 10,000) = 1.37862.

Spreads in the forward market are influenced by the term of maturity of the contract. Spreads tend to widen
with longer terms due to the:

y reduced liquidity of longer-term contracts,


y greater credit risk in longer-term contracts, and
y greater interest rate risk in forward contracts. Forward rates are based on interest rate differentials.
Longer maturities result in greater duration or higher sensitivity to changes in interest rates.

The Mark-to-Market Value of a Forward Contract

LOS: Calculate the mark-to-market value of a forward contract.

A forward contract is priced to have zero value to either party at contract initiation. In the case of currency
forwards, this no-arbitrage forward price is determined based on interest rate parity. However, once
the counterparties have entered the contract, changes in the forward price (due to changes in the spot
exchange rate or interest rate changes in either of the two currencies) will alter the contract’s mark-to-
market value. The contract then holds positive value for one counterparty and an equivalent negative value
for the other.

Example 2 Valuing a forward contract prior to expiration

An investor purchased GBP 10 million for delivery against AUD in 6 months (t = 180) at an all-in forward
rate of 1.5920 AUD/GBP. Four months later (t = 120), the investor wants to close out their position.

Spot exchange rate and forward points at t = 120


Maturity Spot rate or forward points

Spot AUD/GBP 1.6110 / 1.6115


One month 5.1 / 5.2
Two months 10.3 / 10.5
Three months 15.9 / 16.2
Four months 26.4 / 26.8

Assume that the 60-day risk-free rate at t = 120 is 4.20%.

y What position would the investor take, and on which contract, to effectively close out their forward
position at t = 120?
y Calculate the gain or loss the investor would incur when closing out the forward position at t = 120.


The all-in forward rate is simply the sum of the spot rate and the forward points, appropriately scaled
to size.

Solutions

y The investor initially held a long GBP position in a 6-month contract (t = 180). Four months into this
contract (at t = 120), to close out the forward position, the investor takes an opposite short position
in a GBP 10 million offsetting forward contract that expires in another 2 months (at t = 180).

To sell GBP forward, the relevant exchange rate is the all-in AUD/GBP bid. The appropriate all-in
two-month forward exchange rate is calculated based on the spot rate at t = 120 and the forward
points on the two-month forward bid exchange rate:
1.6110 + (10.3 / 10,000) = 1.61203 AUD/GBP
This means that the investor initially purchased 10 million GBP (for delivery at t = 180) at 1.5920
AUD/GBP and then (at t = 120) sold 10 million GBP (for delivery at t = 180) at 1.61203 AUD/GBP.
The GBP amounts will net to zero at settlement, but the AUD amounts will not, as the forward rate
has changed over the four months. At contract expiration, the investor stands to make a profit
(loss) of:
Profit (loss) = (1.61203 − 1.5920) AUD/GBP × 10,000,000 GBP = AUD 200,300
The investor profits since there was a long position on the GBP and the forward rate increased
(GBP appreciated) during the four-month period from t = 0 to t = 120. It is important to note that the
investor would realize this profit at t = 180 when both forward contracts settle.

To calculate the mark-to-market value of the investor’s position at t = 120 (when the forward position
is effectively closed out through the offsetting contract), it is necessary to discount the settlement
payment for two months (the time remaining until contract expiration) at the two-month discount rate.
Given that the 60-day risk-free rate at t = 120 is 4.20%, the mark-to-market value of the original long
GBP 10 million six-month forward contract two months before settlement is calculated as follows:

AUD 200,300 × 1 / [1 + (0.042 × 60/360)] = AUD 198,907.65
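The same mark-to-market calculation, using the Example 2 inputs:

```python
original_forward = 1.5920    # all-in rate on the initial long GBP 10 million position
offset_forward = 1.61203     # all-in bid rate on the offsetting short position at t = 120
notional_gbp = 10_000_000
rate_60d = 0.042             # 60-day risk-free rate at t = 120

# Gain realized at settlement (t = 180), then discounted back two months
settlement_gain_aud = (offset_forward - original_forward) * notional_gbp
mtm_value_aud = settlement_gain_aud / (1 + rate_60d * 60 / 360)

print(round(settlement_gain_aud, 2), round(mtm_value_aud, 2))   # 200300.0 198907.65
```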

The steps for marking to market a currency forward position are outlined below:

y Create an equal offsetting forward position to the initial forward contract. Ensure that the settlement
dates and notional amounts in both contracts are the same.
y Determine the all-in forward rate for the offsetting forward contract. If the base currency in the
exchange rate quote should be sold in the offsetting contract, use the bid side of the quote. If it should
be purchased, use the offer side.
y Calculate the profit or loss on the net position as of the settlement date.
○ A profit occurs if the currency the investor was long on in the initial forward contract has appreciated;
a loss occurs if that currency has depreciated.
○ A loss occurs if the currency the investor was short on in the initial forward contract has appreciated;
a profit occurs if that currency has depreciated.

y Calculate the present value of the profit or loss (as of the date of initiation of the offsetting contract).
Use the appropriate LIBOR rate and adjust it if necessary.
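
These steps can be reproduced in a short script. The sketch below uses the Example 2 figures; the function
names and layout are illustrative (not part of the curriculum), and the 360-day count follows the example.

```python
# Minimal sketch of the mark-to-market steps above, using the Example 2
# figures. Function names are illustrative; the day count is actual/360.

def all_in_forward(spot: float, forward_points: float) -> float:
    """All-in forward rate = spot rate + forward points / 10,000."""
    return spot + forward_points / 10_000

def forward_mark_to_market(original_rate: float, offset_rate: float,
                           notional: float, annual_rate: float,
                           days_to_settlement: int) -> float:
    """Present value of closing out a long base-currency forward with an
    equal and offsetting short position (value in price-currency terms)."""
    settlement_gain = (offset_rate - original_rate) * notional
    discount_factor = 1 + annual_rate * days_to_settlement / 360
    return settlement_gain / discount_factor

# Long GBP 10 million at 1.5920 AUD/GBP, offset at t = 120 using the
# two-month bid: spot bid 1.6110 plus 10.3 forward points.
offset_rate = all_in_forward(1.6110, 10.3)                     # 1.61203
mtm = forward_mark_to_market(1.5920, offset_rate, 10_000_000, 0.042, 60)
print(round(offset_rate, 5), round(mtm, 2))                    # 1.61203  198907.65
```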


The International Parity Conditions

LOS: Explain international parity conditions (covered and uncovered interest rate parity, forward
rate parity, purchasing power parity, and the international Fisher effect).

Before discussing international parity relations, it is essential to grasp the following concepts:

y Long run versus short run: Parity relations provide estimates of exchange rates in the long run, and
they are typically poor predictors of exchange rates in the short run. Long-term equilibrium values
act as an anchor for exchange rate movements, and short-term exchange rates fluctuate around
those values.
y Expected versus unexpected changes: Expected changes are generally reflected in current
prices (including exchange rates), while unexpected changes introduce risk as they can lead to
more significant price movements. Consequently, investors demand a premium for bearing the risk
associated with unpredictable outcomes.
y Relative movements: Exchange rates are influenced by relative changes in economic factors
across countries, not absolute or isolated changes. Since an exchange rate represents the price of
one currency in terms of another, it is essential to evaluate the inflation rate in one country relative
to the inflation rate in the other country when determining the impact on the exchange rate between
their currencies.

There is no simple formula, model, or theory that can enable investors to accurately forecast exchange
rates. However, the theories that will be discussed offer a framework for developing a perspective on
exchange rates and for understanding some of the forces that influence them.

International Parity Conditions


International parity conditions serve as the foundational principles for most exchange rate models,
and include:

y Covered interest rate parity


y Uncovered interest rate parity
y Forward rate parity
y Purchasing power parity
y The international Fisher effect

These conditions aim to establish connections between expected inflation differentials, interest rate
disparities, forward and spot exchange rates, and anticipated future spot exchange rates, under idealized
circumstances. It is important to note that the conditions often depend on simplifying assumptions such
as negligible transaction costs, universally available information, risk neutrality, and freely adjustable
market prices.

In practice, empirical studies have shown that these parity conditions are rarely met in the short term.
However, they play a crucial role in forming a comprehensive, long-term perspective on exchange rates
and associated risk exposures. The exception to this rule is covered interest rate parity, which is the only
condition directly enforced by arbitrage.


Covered and Uncovered Interest Rate Parity and Forward Rate Parity

LOS: Explain international parity conditions (covered and uncovered interest rate parity, forward
rate parity, purchasing power parity, and the international Fisher effect).

LOS: Describe relations among the international parity conditions.

LOS: Evaluate the use of the current spot rate, the forward rate, purchasing power parity, and
uncovered interest parity to forecast future spot exchange rates.

Covered interest rate parity describes a no-arbitrage condition in which the covered or currency-hedged
interest rate differential between two currencies equals zero. This means that there is a no-arbitrage
relationship among risk-free interest rates, spot exchange rates, and forward exchange rates.

y If the risk-free rate on one currency is higher than on another currency, the currency with the higher
risk-free rate will trade at a forward discount relative to the other currency. The benefit of the higher
interest rate is offset by a decline in the currency’s value.
y If the risk-free rate of the price currency is greater than that of the base currency, the base currency
will trade at a forward premium (ie, the forward exchange rate will be higher than the spot exchange
rate). This implies that the price currency will trade at a forward discount and is expected to depreciate
in the future.
y Conversely, the currency with the lower risk-free rate will trade at a forward premium relative to
the other currency. The benefit of the expected appreciation of the currency is offset by the lower
interest rate.

For covered interest rate parity to hold, zero transaction costs and free capital mobility must be assumed.
It is also assumed that the underlying money market instruments are identical in terms of liquidity, maturity,
and default risk. Covered interest rate differentials are generally close to zero under normal market
conditions, indicating that covered interest parity tends to hold.

Uncovered Interest Rate Parity


Uncovered interest rate parity (UIP) states that the expected return on an uncovered or unhedged foreign
currency (FC) investment should equal that of a comparable domestic currency (DC) investment. UIP
asserts that an investor’s expected return from the following investment options should be equal.

Option 1: Invest the funds at the domestic nominal risk-free rate (iDC) for a particular period of time.

y If 1 unit of DC is invested at iDC for 1 year, the value after 1 year would be (1 + iDC).

Option 2: Convert funds into a foreign currency (at the current spot rate, S_FC/DC), invest them at the
foreign nominal risk-free rate (i_FC) for the same period as in Option 1, and then convert them back into the
domestic currency after 1 year at the expected spot exchange rate 1 year from today (S^e_FC/DC).

y Convert 1 unit of DC into FC today: receive 1 DC × S_FC/DC = S_FC/DC units of FC.
y Invest the S_FC/DC units of FC at the foreign risk-free rate (i_FC). After 1 year, receive S_FC/DC × (1 + i_FC).
y Convert this amount back into DC at the expected spot exchange rate 1 year from today (S^e_FC/DC).
After 1 year, the value of the investment (in DC terms) equals:

[S_FC/DC × (1 + i_FC)] / S^e_FC/DC


Therefore, the uncovered interest rate parity equation (assuming a time horizon of 1 year) is given as:

(1 + i_DC) = (1 + i_FC) × [S_FC/DC / S^e_FC/DC]

This equation can be rearranged to derive the formula for the expected future spot exchange rate:

S^e_FC/DC = S_FC/DC × [(1 + i_FC) / (1 + i_DC)]

The expected percentage change in the spot exchange rate can be calculated as:
%ΔS^e_FC/DC = (S^e_FC/DC − S_FC/DC) / S_FC/DC

The expected percentage change in the spot exchange rate can be estimated as:

%ΔS^e_FC/DC ≈ i_FC − i_DC

Notice that the numerator-denominator rule also applies here. In an A/B exchange rate quote, Country A’s
interest rate is in the numerator and Country B’s interest rate is in the denominator of the uncovered interest
rate parity equation.
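
These relations translate directly into a small helper. The sketch below is illustrative only: the function
names, the 1-year horizon, and the sample rates are assumptions rather than curriculum material.

```python
# Minimal sketch of uncovered interest rate parity with an FC/DC quote
# and a 1-year horizon. The inputs below are hypothetical.

def expected_spot_uip(spot_fc_dc: float, i_fc: float, i_dc: float) -> float:
    """Expected future FC/DC spot rate implied by UIP."""
    return spot_fc_dc * (1 + i_fc) / (1 + i_dc)

def expected_pct_change(spot_fc_dc: float, i_fc: float, i_dc: float) -> float:
    """Exact expected percentage change in the FC/DC spot rate."""
    return expected_spot_uip(spot_fc_dc, i_fc, i_dc) / spot_fc_dc - 1

spot, i_fc, i_dc = 1.2500, 0.03, 0.05
print(round(expected_pct_change(spot, i_fc, i_dc), 4))   # exact: -0.019
print(round(i_fc - i_dc, 4))                             # approximation: -0.02
```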

In covered interest rate parity, the investor locks in the forward exchange rate today and therefore is not
exposed to currency risk.

y If covered interest rate parity holds, the forward premium/discount offsets the yield differential.

In uncovered interest rate parity, the investor leaves the foreign exchange position uncovered (ie,
unhedged) and expects to convert foreign currency holdings back into the domestic currency at the
expected future spot rate.

y If uncovered interest rate parity holds, the expected appreciation/depreciation of the currency offsets
the yield differential.

Example 3 Covered versus uncovered interest rate parity

Consider the following information:

y Risk-free rate on the USD = iUSD = 4%


y Risk-free rate on the GBP = iGBP = 5%
y Current spot USD/GBP exchange rate = S USD/GBP = 1.5025
y Assume that the USD is the foreign currency, and the GBP is the domestic currency.

Compute the 1-year USD/GBP forward rate and the forward premium (discount), assuming that interest
rate parity holds.


Solution

Covered interest rate parity states that the holding period return on an investment in a domestic money-
market instrument and an investment in a fully currency-hedged foreign money-market instrument must
be equal.

If covered interest rate parity holds, we can compute the forward exchange rate today as:

F_USD/GBP = S_USD/GBP × [(1 + i_USD) / (1 + i_GBP)] = 1.5025 × (1.04 / 1.05) = 1.4882 USD/GBP

Then, the forward premium (discount) on the GBP can be computed as:

(F_USD/GBP − S_USD/GBP) / S_USD/GBP = (1.4882 − 1.5025) / 1.5025 ≈ −0.009524 ≈ −0.9524%

According to covered interest rate parity, the GBP would experience a forward discount of approximately
1% against the USD. This forward discount can be attributed to its relatively high interest rate compared
with the USD (5% versus 4%). Conversely, the USD would exhibit a forward premium of around 1%
against the GBP due to its lower interest rate.

Now, compute the expected USD/GBP spot rate in 1 year and the expected change in the spot
exchange rate over the year, assuming that uncovered interest parity is expected to hold.

Solution
Uncovered interest rate parity states that the expected holding period return on an investment in a
domestic money-market instrument and an unhedged investment in a foreign money-market instrument
(against currency risk) would be the same. If uncovered interest rate parity holds, the expected future
spot rate can be computed as follows:

S^e_USD/GBP = S_USD/GBP × [(1 + i_USD) / (1 + i_GBP)] = 1.5025 × (1.04 / 1.05) = 1.4882 USD/GBP

The change in the spot exchange rate for the GBP against the USD over the following year is expected
to be:
(S^e_USD/GBP − S_USD/GBP) / S_USD/GBP = (1.4882 − 1.5025) / 1.5025 = −0.952%
According to uncovered interest rate parity, the GBP is expected to depreciate by about 1% over the
following year, with the spot exchange rate falling from 1.5025 USD/GBP to 1.4882 USD/GBP. This
depreciation is expected as the GBP has a higher interest rate relative to the USD (5% versus 4%).
Simultaneously, the USD is expected to appreciate by approximately 1% over the year against the GBP.
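
Both calculations in Example 3 can be reproduced with a few lines of code (a sketch; the variable names
are illustrative):

```python
# Example 3: USD is the price (foreign) currency, GBP the base (domestic)
# currency in the USD/GBP quote.
i_usd, i_gbp, spot_usd_gbp = 0.04, 0.05, 1.5025

# Covered interest rate parity: the 1-year forward rate
forward = spot_usd_gbp * (1 + i_usd) / (1 + i_gbp)
forward_premium = forward / spot_usd_gbp - 1        # discount on the GBP

# Uncovered interest rate parity: the expected spot rate in 1 year
expected_spot = spot_usd_gbp * (1 + i_usd) / (1 + i_gbp)
expected_change = expected_spot / spot_usd_gbp - 1

print(round(forward, 4), round(forward_premium, 6))        # 1.4882 -0.009524
print(round(expected_spot, 4), round(expected_change, 6))  # 1.4882 -0.009524
```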


It is important to note that:

y Under uncovered interest rate parity, the predicted change in spot rates goes against intuition.
Typically, an increase in interest rates would lead to a currency’s appreciation, but uncovered
interest rate parity suggests the opposite. In this example, although the GBP has a higher interest
rate, if uncovered interest rate parity holds, it implies that the GBP is expected to depreciate against
the USD.
y Uncovered interest rate parity asserts that the expected return on the unhedged foreign investment
is the same as the return on the domestic investment. Nevertheless, the distribution of potential
return outcomes differs. The domestic currency return is known with certainty, whereas the
unhedged foreign investment return could:

○ Equal the domestic currency return: The percentage appreciation of the USD matches the interest
rate differential (1%), as in the second question in this example.
○ Be less than the domestic currency return: The percentage appreciation of the USD is less than
the interest rate differential (1%).
○ Be greater than the domestic currency return: The percentage appreciation of the USD is greater
than the interest rate differential (1%).

Due to the uncertainty associated with the future spot exchange rate, uncovered interest rate parity is
often violated. Investors, who are generally not risk neutral, demand a risk premium to compensate for
the exchange rate risk inherent in leaving their positions unhedged. Consequently, future spot exchange
rates typically do not equal the forward exchange rate. Forward rates, which are based purely on interest
rate differentials to prevent covered interest arbitrage, are therefore poor predictors of future spot
exchange rates.

The uncovered interest parity equation is quite similar to the covered interest parity equation, except that
the expected future spot exchange rate replaces the forward rate. In conclusion:

y Covered interest rate parity is a no-arbitrage condition that uses the forward exchange rate.
y Uncovered interest rate parity is a theory regarding expected future spot rates.

Empirical evidence suggests that:

y Uncovered interest rate parity is not valid in the short and medium terms but tends to work better in the
long term. In the short and medium terms, interest rate differentials fail to explain changes in exchange
rates; this makes forward rates (which are computed based on these differentials) poor predictors of
future exchange rates. In contrast, over the long term, uncovered interest rate parity has more
empirical support.
y Current spot exchange rates are also unreliable predictors of future spot exchange rates due to the
high volatility in exchange rate movements. This indicates that exchange rates do not adhere to a
random walk.

Forward Rate Parity


Forward rate parity is based on both covered and uncovered interest rate parity. When covered interest
rate parity holds (which is generally the case since it is a no-arbitrage condition), the forward premium or
discount roughly matches the interest rate differential.

Forward premium (discount) as a % = (F_FC/DC − S_FC/DC) / S_FC/DC ≈ i_FC − i_DC


If uncovered interest rate parity holds, the expected future spot rate equals the forward rate (S^e_FC/DC = F_FC/DC),
and the expected change in the spot exchange rate roughly equals the interest rate differential.

Expected % change in spot exchange rate = %ΔS^e_FC/DC ≈ i_FC − i_DC

Therefore, if both covered and uncovered interest rate parity hold, the forward premium (discount) will
roughly equal the interest rate differential, and the forward rate will roughly equal the expected spot
exchange rate. This condition, in which the forward rate equals the expected spot rate, is known as forward
rate parity.
(F_FC/DC − S_FC/DC) / S_FC/DC = (S^e_FC/DC − S_FC/DC) / S_FC/DC ≈ i_FC − i_DC → F_FC/DC = S^e_FC/DC
In this condition, the forward rate is an unbiased forecast of the future spot exchange rate.

Purchasing Power Parity

LOS: Explain international parity conditions (covered and uncovered interest rate parity, forward
rate parity, purchasing power parity, and the international Fisher effect).

LOS: Describe relations among the international parity conditions.

LOS: Evaluate the use of the current spot rate, the forward rate, purchasing power parity, and
uncovered interest parity to forecast future spot exchange rates.

Purchasing power parity (PPP) is based on the law of one price, which states that identical goods should
have the same price across countries when valued in a common currency. For example, if a pen in the US
costs USD 2, and an identical pen in Europe is EUR 3, assuming no transaction costs or trade restrictions,
the USD/EUR exchange rate must be 0.667, as shown in the equation below.

Price of pen in USD = Price of pen in EUR × USD/EUR exchange rate: USD 2 = EUR 3 × 0.667 USD/EUR
Therefore, the law of one price can be expressed as follows:

Law of one price: P^x_f = P^x_d × S_f/d

If the price of these pens rises in Europe, there will be a flow of pens from the US to Europe, increasing the
supply of EUR (and demand for USD) to purchase pens. Eventually, the USD/EUR exchange rate will fall
until the price differential is eliminated.

Thus, according to the law of one price, a relative increase (decrease) in prices in one country will result in
depreciation (appreciation) of that country’s currency, ensuring that exchange rate-adjusted prices remain
constant across countries.

Absolute purchasing power parity (PPP) extends the law of one price to a broad range of goods and
services consumed in different countries. Instead of focusing on just one individual good, absolute PPP
states that one country’s general price level (Pd) should equal the currency-adjusted general price level (Pf)
in the other country. Absolute PPP can be expressed as:

P_f = P_d × S_f/d


Absolute PPP assumes that all goods are tradeable, and that price indexes (used to determine the general
price level) in both countries include the same goods and services with identical weights. The equations
above can be rearranged to solve for the nominal exchange rate (S f/d):

S_f/d = P_f / P_d

Absolute PPP asserts that the equilibrium exchange rate between two countries is determined by the ratio
of their respective national price levels. However, absolute PPP generally does not hold, due to differences
in product mixes and consumption baskets across countries, as well as the transaction costs and trade
restrictions involved in international trade.

Instead of assuming that there are no transaction costs and other trade impediments (as is the case with
absolute PPP), relative PPP assumes that these factors are constant over time. Relative PPP claims
that changes in exchange rates are linked to relative changes in national price levels, even if the relation
between exchange rate levels and price levels does not hold.

Given an f/d exchange rate quote, the foreign inflation rate will be in the numerator and the domestic
inflation rate will be in the denominator. Relative PPP can be expressed as follows:

S^T_f/d = S^0_f/d × [(1 + π_f) / (1 + π_d)]^T

According to relative PPP, changes in the spot exchange rate can be approximated as:

%ΔS_f/d ≈ π_f − π_d

Relative PPP suggests that the percentage change in the spot exchange rate (%ΔS f/d) is entirely
determined by the difference between foreign and domestic inflation. For example, if US inflation is 5%
and Eurozone inflation is 8%, then the USD/EUR exchange rate should fall by 3% and the USD (a low-
inflation currency) should appreciate, while the EUR (a high-inflation currency) should depreciate. If relative
PPP is maintained across countries, currencies of countries with higher (lower) inflation rates depreciate
(appreciate). This is in keeping with the basic economic principle that inflation devalues currency as a
medium of exchange.
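
The relative PPP relation lends itself to a simple projection. The sketch below reuses the 5% and 8%
inflation figures from the example above with a hypothetical starting USD/EUR rate; the function name and
inputs are illustrative assumptions.

```python
# Relative PPP with an f/d quote: foreign inflation in the numerator.
def ppp_spot(spot_0: float, pi_f: float, pi_d: float, years: int) -> float:
    """Spot rate after `years` periods implied by relative PPP."""
    return spot_0 * ((1 + pi_f) / (1 + pi_d)) ** years

spot_usd_eur = 1.1000            # hypothetical USD/EUR (f = USD, d = EUR)
pi_us, pi_ez = 0.05, 0.08        # inflation figures from the text example

one_year = ppp_spot(spot_usd_eur, pi_us, pi_ez, 1)
print(round(one_year, 4))                        # ~1.0694: EUR depreciates
print(round(one_year / spot_usd_eur - 1, 4))     # exact change: -0.0278
print(round(pi_us - pi_ez, 2))                   # approximation: -0.03
```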

The ex ante version of PPP is based on relative PPP. While relative PPP asserts that actual changes in
the exchange rate are driven by actual relative changes in inflation, ex ante PPP suggests that expected
changes in spot exchange rates are entirely driven by expected differences in national inflation rates.
According to ex ante PPP, countries expecting persistently high (low) inflation rates should also expect
currency depreciation (appreciation) over time.
%ΔS^e_f/d ≈ π^e_f − π^e_d

Historically, it has been observed that:

y In the short run, nominal exchange rates often deviate from the path predicted by PPP.
y Over the long run, nominal exchange rates tend to move toward their long-run PPP equilibrium values,
although sometimes this process is very slow. Thus, PPP does provide a valid framework for assessing
the long-run fair value of a currency.


The Fisher Effect, Real Interest Rate Parity, and International Parity
Conditions

LOS: Explain international parity conditions (covered and uncovered interest rate parity, forward
rate parity, purchasing power parity, and the international Fisher effect).

LOS: Describe relations among the international parity conditions.

LOS: Evaluate the use of the current spot rate, the forward rate, purchasing power parity, and
uncovered interest parity to forecast future spot exchange rates.

The Fisher effect asserts that the nominal interest rate (i) in a country is the sum of its real interest rate (r)
and the expected inflation rate (π^e) and can be expressed as:

i = r + π^e

Therefore, the expressions for the domestic and foreign nominal interest rates are given as:

i_d = r_d + π^e_d
i_f = r_f + π^e_f

Exhibit 2 International Fisher effect: real interest rate parity

Considerations                                   Mathematical expression

Fisher effect in one currency                    i = r + π^e
Fisher effect applied to domestic (d)            i_d = r_d + π^e_d
and foreign (f) currency                         i_f = r_f + π^e_f
Nominal interest rate spread                     i_f − i_d = (r_f − r_d) + (π^e_f − π^e_d)
Rearranged equation                              r_f − r_d = (i_f − i_d) − (π^e_f − π^e_d)
International Fisher effect                      i_f − i_d = π^e_f − π^e_d
Implication                                      r_f − r_d = 0, or r_f = r_d

r = real interest rate   i = nominal interest rate   π^e = expected inflation   d = domestic   f = foreign

Real interest rate parity can be seen as the international application of the law of one price for securities (as
real interest rates represent the real prices of securities).

y If uncovered interest rate parity holds, the nominal interest rate spread (i_f − i_d) roughly equals the
expected change in the exchange rate (%ΔS^e_f/d).
y If ex ante PPP holds, the difference in expected inflation rates (π^e_f − π^e_d) approximately equals the
expected change in the spot rate (%ΔS^e_f/d).


If both uncovered interest rate parity and ex ante PPP are assumed to hold, the real yield spread between
foreign and domestic countries (rf − rd) will equal zero. This proposition, holding that real interest rates
converge to the same level across different countries as real yield spreads equal zero, is known as the real
interest rate parity condition.

Further, if the real yield spread (rf − rd) equals zero in all markets, it follows that the foreign-domestic
nominal yield spread will be determined by the foreign-domestic expected inflation rate differential. This is
known as the international Fisher effect.
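
A quick numerical check of the decomposition in Exhibit 2 can make the logic concrete. The figures below
are hypothetical and chosen only so that real interest rate parity holds exactly.

```python
# Hypothetical domestic (d) and foreign (f) real rates and expected inflation.
r_d, pi_e_d = 0.02, 0.03
r_f, pi_e_f = 0.02, 0.06          # real interest rate parity: r_f = r_d

i_d = r_d + pi_e_d                # Fisher effect: domestic nominal rate
i_f = r_f + pi_e_f                # Fisher effect: foreign nominal rate

nominal_spread = i_f - i_d
inflation_diff = pi_e_f - pi_e_d
real_spread = nominal_spread - inflation_diff

# With r_f = r_d, the nominal spread equals the expected inflation
# differential: the international Fisher effect.
print(round(nominal_spread, 4), round(inflation_diff, 4), round(real_spread, 4))
# 0.03 0.03 0.0
```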

The international Fisher effect and real interest rate parity assume that currency risk is uniform worldwide.
In reality, however, countries with relatively high debt levels may carry greater currency risk, increasing the
likelihood of currency depreciation. In such cases, subtracting the expected inflation rate from the nominal
interest rate will yield a calculated real interest rate higher than in other countries, as the nominal interest
rate also includes a currency risk premium. Consequently, elevated risk may lead to a country’s nominal
and real risk-free rates being higher than expected under the international Fisher effect and real interest
parity conditions.

International Parity Conditions: Tying All the Pieces Together


Covered interest rate parity: Arbitrage ensures that differences in nominal interest rates equal the forward
premium (discount).

y Currencies with higher nominal interest rates trade at a forward discount, while those with lower rates
trade at a premium.

Uncovered interest rate parity: The expected change in the spot rate equals the nominal interest
rate spread.

y Currencies with higher nominal interest rates are expected to depreciate, and those with lower rates
are expected to appreciate.

If both covered and uncovered interest rate parity hold, the nominal interest rate spread equals the forward
premium (discount) and the expected appreciation (depreciation) in the exchange rate. Therefore, the
forward rate serves as an unbiased predictor of the future spot exchange rate.

Ex ante PPP: Differences in expected inflation rates lead to future changes in spot rates.

y Currencies with higher expected inflation rates are expected to depreciate, and those with lower
expected inflation rates are expected to appreciate.

International Fisher effect: Under the assumption that the Fisher effect holds in each market and real
interest rate parity holds, the difference between domestic and foreign nominal interest rates equals the
difference between domestic and foreign expected inflation rates.

If ex ante PPP and the Fisher effect hold, the expected inflation differential equals both the expected
change in the spot exchange rate and the nominal interest rate differential. This implies that uncovered
interest rate parity holds.

If all international parity relations held, global investors would be unable to consistently profit from currency
movements, and the expected percentage change in the spot rate would be equal to:

y the forward premium or discount (expressed as a percentage),


y the nominal yield spread between countries, and
y the difference in expected inflation rates across countries.


Exhibit 3 Spot exchange rates, forward exchange rates, and interest rates

[Diagram: the expected change in the spot exchange rate (%ΔS^e_FC/DC), the forward discount
((F_FC/DC − S_FC/DC) / S_FC/DC), the foreign-domestic expected inflation differential (π^e_FC − π^e_DC),
and the foreign-domestic interest rate differential (i_FC − i_DC) are linked through ex ante PPP, the
international Fisher effect, uncovered interest rate parity, covered interest rate parity, and the forward rate
as an unbiased predictor of the future spot rate.]

The Carry Trade

LOS: Describe the carry trade and its relation to uncovered interest rate parity and calculate the
profit from a carry trade.

Uncovered interest rate parity states that countries with higher (lower) interest rates should expect their
currencies to depreciate (appreciate). If uncovered interest rate parity held consistently, investors would
be unable to profit from a strategy involving long positions in high-yield currencies and short positions in
low-yield currencies, as the spot rate change over the investment horizon would offset the interest rate
differential.

However, studies have found that, on average, high-yield currencies do not depreciate to the levels
predicted by interest rate differentials, and low-yield currencies do not appreciate to the predicted levels.
These findings imply that foreign exchange (FX) carry trades can potentially be profitable. FX carry trades
involve taking long positions in high-yield currencies and short positions in low-yield currencies (also known
as funding currencies).


Example 4 Carry trade

Consider the following information:

y iJPY = 1%
y iAUD = 3%
y S JPY/AUD today = 85

Compute the profit on a carry trade between the JPY and the AUD if S JPY/AUD in 1 year remains at 85.

Solution

A carry trade involves borrowing the low-yield currency (JPY in this scenario) and investing in the high-
yield currency (AUD). The investor earns a return equal to the interest rate differential adjusted for the
depreciation or appreciation of the high-yield currency versus the low-yield currency.

Given that the JPY/AUD exchange rate remains unchanged after 1 year (S JPY/AUD in 1 year = 85),
the investor’s profit from the carry trade equals the interest earned on the investment currency, minus the
interest paid on the funding currency, minus any depreciation (or plus any appreciation) of the investment
currency. Therefore, the profit on this carry trade is 2%, calculated as 3% − 1% − 0%.

Now, compute the profit on a carry trade between the JPY and the AUD if S JPY/AUD in 1 year equals 84.

Solution

Given that the JPY/AUD exchange rate falls to 84 in 1 year, the depreciation of the AUD (investment
currency) in percentage terms is calculated as:

%ΔAUD = (84 − 85) / 85 = −1.18%

If uncovered interest parity held, the AUD would have depreciated by 2% (the interest rate differential)
over the year, resulting in a 1% total return on either investment (1% nominal interest rate in JPY versus
3% in AUD, offset by a 2% AUD depreciation).

However, the AUD depreciated by only 1.18%, allowing the investor to earn an excess return on the
carry trade of 0.82% (3% − 1% − 1.18%). The return on the AUD investment (1.82%) was higher than
that on the JPY (1%).
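
The two scenarios above can be reproduced with a short helper (a sketch; the function name and sign
conventions are illustrative):

```python
# Carry trade return ≈ investment-currency yield − funding-currency yield
# + percentage change in the investment currency (spot quoted as funding
# currency per unit of investment currency, here JPY/AUD).

def carry_return(i_invest: float, i_fund: float,
                 spot_0: float, spot_1: float) -> float:
    fx_change = spot_1 / spot_0 - 1
    return i_invest - i_fund + fx_change

print(round(carry_return(0.03, 0.01, 85, 85), 4))   # unchanged spot: 0.02
print(round(carry_return(0.03, 0.01, 85, 84), 4))   # AUD falls to 84: 0.0082
```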

Studies have shown that carry trades tend to earn positive excess returns in normal market conditions.
During periods of low turbulence, investors anticipate earning excess returns through the strategy due
to the limited potential for sudden, substantial adverse exchange rate movements. However, in relatively
turbulent times with significant asset price and FX volatility, returns on long high-yield currency positions
have dropped, while funding costs have increased.

In Example 4, under stable market conditions, the JPY/AUD exchange rate would remain close to 85 over
the short term, allowing the investor to earn excess returns of approximately 2% over the year. Even if the
JPY/AUD gradually moved toward a level consistent with uncovered interest rate parity (2% depreciation),
the carry trade investor would have a cushion of up to 2% (the interest rate differential between the
AUD and JPY).

However, in turbulent market conditions, the AUD investment’s return could decline significantly and rapidly
due to the rapid depreciation of the AUD or a decrease in AUD asset prices. Meanwhile, the investor’s
funding costs might rise considerably due to JPY appreciation.


The returns of carry trades do not follow a normal distribution. Instead, the distribution of returns is more
peaked than a normal distribution, with fatter tails, and it exhibits negative skew.

y The peaked distribution around the mean indicates that carry trades typically yield small gains more
frequently than expected in a normal distribution, which is positive.
y The negative skew and fatter tails suggest that carry trades result in larger losses more frequently
than a normal distribution would imply. This increased probability of significant losses is known
as crash risk.

Exhibit 4 Carry trade distribution of returns

[Figure: the distribution of carry trade returns compared with a normal distribution; the carry trade
distribution is more peaked around the mean, negatively skewed, and fatter-tailed.]

The primary reason for crash risk is that the carry trade is essentially a leveraged trade.

y Investors borrow in one currency and invest the proceeds in another. In carry trades, like all leveraged
trades, gains and losses are magnified relative to their equity bases. When an adverse shock hits
the market, investors swiftly close their positions in order to avoid potential substantial losses due to
adverse exchange rate movements. They often place stop-loss orders with their brokers to limit losses
from carry trades, and these orders are triggered when adverse exchange rate movements occur.
This creates a cascade effect as more investors close their carry trade positions, resulting in extended
adverse currency movements.
y During turbulent periods, there is typically a “flight to safety.” Investors tend to favor low-yield
currencies (with low-risk premiums) for investment, while avoiding high-yield currencies (with high-risk
exposures). Consequently, exchange rates tend to quickly move against investors holding open carry
trade positions.
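
As a rough numerical illustration of the negative skew and fat tails described above, the simulation below
mixes frequent small gains with occasional large losses and measures the resulting skewness and excess
kurtosis. The data are purely hypothetical and the parameters arbitrary; this is not actual carry trade history.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

small_gains = rng.normal(0.002, 0.005, n)                      # typical periods
crash = rng.binomial(1, 0.02, n) * rng.normal(-0.08, 0.03, n)  # rare crashes
returns = small_gains + crash

def skewness(x):
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

print(round(skewness(returns), 2))         # negative: the left tail dominates
print(round(excess_kurtosis(returns), 2))  # positive: fatter tails than normal
```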


The Impact of Balance of Payment Flows

LOS: Explain how flows in the balance of payment accounts affect currency exchange rates.

A country’s balance of payments consists of its current account and capital account, with the capital
account encompassing all investment and financing flows. The current account reflects real economy
transactions related to the production of goods and services, while the capital account deals with financial
flows. Different entities with distinct perspectives and motivations make decisions regarding trade (current
account) and investment/financing (capital account), and these decisions are aligned with changes in
market prices, particularly exchange rates.

Countries that import more than they export have negative current account balances (ie, deficits), while
those that export more than they import have current account surpluses. These balances are offset by
corresponding balances in the capital account. In the long term, countries running persistent current
account deficits often experience currency depreciation as they rely on debt to finance imports. Conversely,
countries with persistent current account surpluses tend to see their currencies appreciate over time.

Investment and financing decisions play a dominant role in determining short- to intermediate-term
exchange rate movements due to several factors:

y The prices of tangible goods and services take longer to adapt compared with exchange rates and
other asset prices.
y The production of real goods and services is time-consuming, and demand decisions may be subject
to inertia. In contrast, financial markets offer swift redirection of financial flows.
y Current spending and production decisions pertain to purchases and sales of current production,
whereas investment and financing decisions involve both current expenditure financing and portfolio
reallocation.
y Expected exchange rate movements can trigger significant short-term capital flows, so exchange rates
are highly sensitive to currency views held by owners and managers of liquid assets.

Current Account Imbalances and the Determination of Exchange Rates


Current account trends impact exchange rates over time through three main channels:

y The flow supply/demand channel


y The portfolio balance channel
y The debt sustainability channel

The Flow Supply/Demand Channel


The flow supply/demand channel operates on the premise that international trade in goods and services
necessitates the exchange of domestic and foreign currencies for payment. When a country runs a current
account surplus, selling more goods and services than it purchases, it should experience increased demand
for its currency, which then appreciates in value. Conversely, countries with ongoing current account deficits
experience currency depreciation. In the long run, these trends help to balance trade competitiveness.

The extent of exchange rate adjustment necessary to restore balanced current accounts depends on
various factors, including the initial trade gap, the responsiveness of import and export prices to exchange
rate changes, and the reaction of import and export demand to price fluctuations. For countries with
substantial trade deficits, a significant currency depreciation might be needed to correct the imbalance.


However, the pass-through effect of exchange rate changes on the prices of traded goods is often limited,
with import prices rising by less than the depreciation rate due to foreign producers’ efforts to maintain
market share. Consequently, the exchange rate adjustment required to correct a trade imbalance may be
larger than expected.

In addition, studies suggest that the response of import and export demand to price changes is often slow.
Thus, several years may pass between the initial exchange rate fluctuations, the price adjustments in traded
goods, and the eventual impact on import and export demand and the underlying current account balance.

The Portfolio Balance Channel


Current account disparities result in a redistribution of financial wealth from deficit nations to surplus
nations. Deficit countries often resort to increased borrowing to fund their trade gaps. This behavior
can trigger changes in global asset preferences, subsequently affecting exchange rates. For instance,
countries with substantial current account surpluses relative to the United States may discover that their
holdings of US dollar-denominated assets surpass their desired portfolio allocations. In order to align their
dollar holdings with their preferences, these countries may undertake measures that could significantly
devalue the dollar.

The Debt Sustainability Channel


The persistence of substantial current account deficits should have limitations. When a country consistently
maintains a large current account deficit, it will eventually accumulate unsustainable levels of external debt
owed to foreign investors. As a result, investors may anticipate that a significant currency depreciation will
be necessary to reduce the deficit and stabilize external debt at a manageable level.

Such persistent imbalances can alter market perceptions of the true, long-term equilibrium exchange rate.
Deficit nations are likely to experience a gradual reduction in expectations of their currency’s long-term
equilibrium value, while surplus nations may see upward revisions in their currency’s long-term value.
Consequently, currency values should generally align with trends in debt and asset accumulation.

Capital Flows

LOS: Explain how flows in the balance of payment accounts affect currency exchange rates.

The integration of global financial markets and the movement of capital across borders have a significant
impact on exchange rates, interest rates, and asset prices. In some cases, excessive capital inflows have
led to economic crises in emerging markets. Governments often use measures like capital controls to
mitigate the effects of currency appreciation. However, such interventions may not be entirely effective, and
black markets can hinder control efforts.

Interest rate spreads can influence capital flows, but their impact on exchange rates may be limited.
For instance, high-yield currencies may attract global investors, but government interventions can affect
expected gains. Nevertheless, a one-sided flow of capital can persist over time, driven by changes in
monetary policy and economic policies. These factors tend to impact exchange rates gradually, and capital
flows can continue over the long term.

Equity Market Trends and Exchange Rates


The relationship between equity prices and exchange rates is complex and not consistently stable. For
example, while the US equity market and the dollar sometimes exhibit positive correlation, the long-term
correlation between them is close to zero. Short- to medium-term correlations can vary significantly based

on market conditions. The instability in correlations makes it challenging to predict currency movements
based solely on expected equity market performance.

Since the global financial crisis, there has been a negative correlation between the US dollar and the US
equity market. This is attributed to the dollar’s role as a safe haven asset. When investors are more risk
averse (ie, in risk-off mode), they seek safe haven assets like the dollar, causing its value to rise while
equity prices fall. Conversely, when investors are in a risk-on mode and thus more willing to take risks, they
favor equities, leading to a decline in the dollar’s value.

Monetary and Fiscal Policies

LOS: Explain the potential effects of monetary and fiscal policy on exchange rates.

The Mundell-Fleming Model


The Mundell-Fleming model outlines how changes in monetary and fiscal policies influence interest rates
and a country’s output level. These changes subsequently affect trade and capital flows, eventually leading
to shifts in the exchange rate. The underlying assumption is that the economy has a substantial output gap,
enabling adjustments in output without significantly impacting price levels and inflation.

Expansionary monetary policy stimulates growth by reducing interest rates, but also leads to capital
outflows, which exert downward pressure on the exchange rate. With flexible exchange rates, this causes
the domestic currency to depreciate as investors seek higher-yielding markets due to the lower interest
rates. The extent of the currency depreciation depends on the sensitivity of capital flows to interest
rate differentials. Over time, this depreciation boosts net exports and reinforces the overall impact of
expansionary monetary policy on investment and consumption.

In the case of fixed exchange rates, the central bank must counter the currency’s depreciation by
purchasing its own currency in the foreign exchange (FX) market, using its FX reserves. This action reduces
the monetary base and limits domestic credit expansion, offsetting the desired stimulatory effects of the
monetary policy. It is crucial to note that a central bank’s ability to maintain the fixed exchange rate depends
on its FX reserves.

Expansionary fiscal policy spurs economic growth through increased government spending or reduced
taxes, but such policy also leads to budget deficits that require government borrowing. This typically results
in higher interest rates, encouraging capital inflows and pushing the exchange rate upward.

Under flexible exchange rates, expansionary fiscal policy causes the domestic currency to appreciate as
higher interest rates attract capital inflows. If capital flows are highly responsive to interest rate differentials,
the currency can appreciate significantly. In the less common case in which capital flows are relatively
insensitive to interest rate differentials, the currency might instead depreciate, because the increase in
aggregate demand worsens the trade balance by boosting imports without a corresponding increase
in exports.

In a fixed exchange rate system, the central bank must counter the currency’s appreciation by selling its
own currency in the FX market to maintain the fixed rate. This action increases the domestic money supply
and reinforces the impact of expansionary fiscal policy on aggregate demand.


The Mundell-Fleming model provides the following insights:

y Domestic policymakers cannot simultaneously achieve an independent monetary policy, free capital
movement, and a fixed exchange rate regime; at most two of these three objectives can be satisfied.
y The level of capital mobility significantly affects the effectiveness of monetary and fiscal policy in an
open economy. Capital controls are essential for central banks that aim to maintain monetary policy
independence while also managing exchange rates.

With high capital mobility:

y Implementing a restrictive (expansionary) monetary policy under floating exchange rates will lead to
the appreciation (depreciation) of the domestic currency, and the extent of the exchange rate change
will be more pronounced.
y Enacting a restrictive (expansionary) fiscal policy under floating exchange rates will result in the
depreciation (appreciation) of the domestic currency.
y If both monetary and fiscal policies are simultaneously restrictive or expansionary, the overall impact
on the exchange rate may not be clear.

Exhibit 5 Mundell-Fleming model: government policy effects on exchange rates

High capital mobility:
y Expansionary fiscal + expansionary monetary policy: indeterminate
y Expansionary fiscal + restrictive monetary policy: domestic currency appreciates
y Restrictive fiscal + expansionary monetary policy: domestic currency depreciates
y Restrictive fiscal + restrictive monetary policy: indeterminate

Low capital mobility:
y Expansionary fiscal + expansionary monetary policy: domestic currency depreciates
y Expansionary fiscal + restrictive monetary policy: indeterminate
y Restrictive fiscal + expansionary monetary policy: indeterminate
y Restrictive fiscal + restrictive monetary policy: domestic currency appreciates

Under low capital mobility, the effect of changes in monetary and fiscal policies on domestic interest rates
does not lead to significant shifts in capital flows. When capital mobility is limited, the influence of such
policies on the exchange rate primarily stems from trade flows rather than capital flows. Therefore, if capital
mobility is low:

y Implementing a restrictive (expansionary) monetary policy will decrease (increase) aggregate demand,
resulting in an increase (decrease) in net exports. This, in turn, will cause the domestic currency to
appreciate (depreciate).
y Enacting a restrictive (expansionary) fiscal policy will decrease (increase) aggregate demand, leading
to an increase (decrease) in net exports. This will also cause the domestic currency to appreciate
(depreciate).
y When monetary and fiscal stances differ (ie, one is restrictive while the other is expansionary), the
overall impact on the exchange rate may not be clear.


Monetary Models of Exchange Rate Determination


In the Mundell-Fleming model, monetary policy influences exchange rates through its effects on interest
rates and output, with changes in the price level and inflation playing no role. Conversely, monetary models
of exchange rate determination take a different perspective by assuming that output remains constant and
that monetary policy primarily affects exchange rates through its influence on the price level and inflation.

This perspective is grounded in the quantity theory of money, which posits that changes in the money
supply are the primary drivers of changes in the price level. If purchasing power parity (PPP) holds, then
any relative increase (decrease) in the domestic money supply that leads to an increase (decrease) in
domestic price levels, compared with foreign price levels, should result in a proportionate depreciation
(appreciation) of the domestic currency.

However, this perspective also assumes that PPP always holds. PPP tends to hold mainly in the long run.
Therefore, the monetary model does not provide an accurate explanation for the impact of monetary policy
on the exchange rate in the short and medium terms.

The Dornbusch model assumes that prices exhibit relative inflexibility in the short term but full flexibility in
the long term. Consequently, this model makes the following predictions:

y In the long term, if prices are entirely flexible, an increase in the domestic money supply will lead to
a proportional increase in domestic prices, resulting in a depreciation of the domestic currency. This
conclusion aligns with that of the monetary model and PPP.
y In the short term, if prices are relatively inflexible, an increase in nominal money supply translates into
an increase in real money supply. As real money supply rises, real interest rates fall, causing capital
outflows and a significant depreciation of the domestic currency in both nominal and real terms. In
fact, the domestic currency tends to overshoot its long-term equilibrium level, dropping to a lower
level than predicted by PPP. Ultimately, over the long term, as domestic prices and domestic interest
rates increase, the nominal exchange rate will rebound and approach the level anticipated by the
conventional monetary approach (in line with PPP). In addition, the real exchange rate will converge
toward its long-term equilibrium level.

The Portfolio Balance Approach


The portfolio balance approach assumes that global investors have diversified portfolios encompassing
foreign and domestic assets, particularly bonds, with investment preferences that depend on returns and
risk considerations. In an economy with persistent budget deficits, the government must issue bonds to
cover these shortfalls, leading to a continuous increase in the supply of that country’s bonds. This sustained
demand for funding will eventually result in (1) investors demanding a higher risk premium to invest in the
bonds and (2) currency depreciation.

The Mundell-Fleming and portfolio balance models can be integrated into a single framework. In the short
term, when a government implements expansionary fiscal policy, real interest rates increase, causing the
domestic currency to appreciate. However, in the long run, prolonged fiscal deficits can lead to a significant
accumulation of government debt. If the market perceives the debt level as unsustainable, the government
may be pressured to choose one of the following courses of action:

y Monetize the debt: The central bank would augment the money supply and purchase the government’s
debt using newly created money. This would result in a rapid depreciation (or a reversal of the initial
currency appreciation).
y Reverse the fiscal stance: To reestablish a fiscally sustainable position over the long term, the
government may need to overturn its initial expansionary fiscal policy, which had boosted the currency.
Adopting a new, contractionary fiscal stance would lead to depreciation of the domestic currency.


Exchange Rate Management: Intervention and Controls

LOS: Describe objectives of central bank or government intervention and capital controls and
describe the effectiveness of intervention and capital controls.

Capital flows can have both positive and negative impacts on emerging economies. On the one hand,
they can support economic growth by filling the gap between domestic investment and savings. On the
other hand, excessive capital inflows can lead to economic imbalances, asset bubbles, and an overvalued
domestic currency. When these inflows suddenly reverse, it can trigger economic downturns, asset value
declines, and sharp currency depreciation, as seen in the Asian financial crisis of 1997 and 1998.

Capital flow surges can be attributed to “pull” and “push” factors:

y Pull factors are favorable developments in the domestic economy that attract foreign capital, including:
○ Reduced inflation and inflation volatility
○ More flexible exchange rate regimes
○ Improved fiscal positions
○ Privatization of state-owned entities
○ Liberalization of financial markets
○ Removal of FX regulations and controls
○ Better economic management by the government

y Push factors are favorable developments in foreign economies that drive capital flows abroad. Such
factors include:
○ Low interest rates in developed countries, which become primary sources of internationally
mobile capital
○ Changes in long-term asset allocation trends, such as increasing allocations to emerging market
assets in global portfolios

To mitigate the adverse effects of abrupt capital inflow reversals, emerging market policymakers can resort
to FX market interventions and capital controls (eg, deposit requirements) aimed at preventing currency and
asset bubbles and maintaining independent monetary policy.

Capital controls were previously less favored due to concerns about distortions in global trade and finance
and potential capital flow diversion. However, emerging markets have learned from past crises that capital
controls may be necessary.

Studies show that the effectiveness of central bank FX market interventions in developed countries is
insignificant due to the high trading volume of their currencies. In emerging markets, the impact is more
significant as the central banks’ substantial FX reserves relative to global market turnover enable them to
reduce exchange rate volatility and influence exchange rate levels.


Warning Signs of Currency Crisis

LOS: Describe warning signs of a currency crisis.

If capital inflows suddenly slow down, the outcome can be a financial crisis, characterized by economic
contraction, plummeting asset values, and a sharp depreciation of the currency. For instance, between
August 2008 and February 2009, the emerging market currencies of Brazil, Russia, and South Korea fell by
more than 50% against the US dollar.

There are two primary schools of thought regarding the underlying causes of currency crises. One school
argues that currency crises typically result from deteriorating economic fundamentals. The other contends
that, although deteriorating economic fundamentals may explain many crises, there can be instances where
economies with sound fundamentals witness severe currency pressure due to (1) an adverse change in
market sentiment unrelated to economic fundamentals or (2) spillover effects from crises in other markets.

An International Monetary Fund (IMF) study on potential early warning signs of a currency crisis found that
in the period leading up to a crisis, there typically has been:

y Capital markets liberalization


y Large inflows of foreign capital (relative to GDP), with short-term foreign currency-denominated funding
posing particular risks
y A banking crisis
y Fixed or partially fixed exchange rates
y Severely declining foreign exchange reserves
y A significant appreciation in the currency relative to its historical mean
y Some deterioration in the terms of trade
y Sharply rising broad money growth in nominal and real terms
y Relatively high inflation compared with tranquil periods

These nine factors are usually interrelated and often reinforce one another. The IMF and leading investment
banks have attempted to create composite early warning systems to alert policymakers and investors to
potential currency crises. An ideal early warning system should:

y have a strong track record in predicting actual crises and avoiding false signals;
y be based on macroeconomic indicators that are available promptly and can signal an impending crisis
well in advance of its actual onset; and
y be based on a wide range of potential crisis indicators.

Learning Module 2
Economic Growth

LOS: Compare factors favoring and limiting economic growth in developed and
developing economies.

LOS: Describe the relation between the long-run rate of stock market appreciation and the
sustainable growth rate of the economy.

LOS: Explain why potential GDP and its growth rate matter for equity and fixed
income investors.

LOS: Contrast capital deepening investment and technological progress and explain how each
affects economic growth and labor productivity.

LOS: Demonstrate forecasting potential GDP based on growth accounting relations.

LOS: Explain how natural resources affect economic growth and evaluate the argument that
limited availability of natural resources constrains economic growth.

LOS: Explain how demographics, immigration, and labor force participation affect the rate and
sustainability of economic growth.

LOS: Explain how investment in physical capital, human capital, and technological development
affects economic growth.

LOS: Compare classical growth theory, neoclassical growth theory, and endogenous
growth theory.

LOS: Explain and evaluate convergence hypotheses.

LOS: Describe the economic rationale for governments to provide incentives to private
investment in technology and knowledge.

LOS: Describe the expected impact of removing trade barriers on capital investment and profits,
employment and wages, and growth in the economies involved.

Factors Favoring and Limiting Economic Growth

LOS: Compare factors favoring and limiting economic growth in developed and
developing economies.

Capital investment encourages growth. A significant challenge in many developing countries is low capital
per worker, which results from inadequate savings and investment. Increasing the investment rate can be
difficult when the country’s level of disposable income is low, creating a cycle of poverty. Low savings lead
to reduced investment, which results in slow GDP growth and persistent low income. The design of policies
to boost domestic savings and investments in developing countries is complicated. However, foreign
investment can break the cycle of poverty by providing additional sources of funds.

Financial Markets and Intermediaries


In addition to the saving rate, efficient allocation of savings plays a crucial role in economic growth. The
financial sector is responsible for directing funds from savers to investment projects. To contribute to growth,
the financial sector can:

y Screen and monitor projects, ensuring that financial capital is channeled to those with the highest
risk-adjusted returns
y Encourage saving and risk-taking by offering attractive investment instruments that enhance risk
transfer, diversification, and liquidity
y Ease credit constraints for companies, enabling them to finance larger projects and benefit from
economies of scale

Countries with well-functioning financial markets tend to experience faster growth, but excessive risk and
declining credit standards can be counterproductive.

Political Stability, Rule of Law, and Property Rights


Stable and effective government, as well as a robust legal and regulatory system that protects property
(including intellectual property), are essential for economic growth. Property rights serve as an incentive
for domestic households and companies to invest and save. Developed countries have well-established
property rights, but in developing nations, arrangements for these rights may be lacking or ineffective.

Furthermore, economic uncertainty can arise from factors like wars, military coups, corruption, and political
instability. These factors increase investment risk, deter foreign investments, and hamper growth. In many
developing countries, the primary focus for fostering growth is the establishment of a legal system that
ensures property rights are protected and enforced.

Education and Health Care Systems


Inadequate education, brain drain, and poor health are significant obstacles to growth in many developing
countries. Education, particularly at the primary and secondary levels, is vital for enhancing workforce skills
and absorbing new technologies. Health issues and diseases such as AIDS have often hampered growth
in developing nations. Meanwhile, for developed countries, investment in post-secondary education is vital
when leaders seek to foster innovation.

Tax and Regulatory Systems


Tax and regulatory policies significantly influence growth and productivity, particularly for businesses.
Fewer regulations tend to promote entrepreneurship and attract new companies; the entry of new firms is
positively correlated with higher average productivity levels. Research conducted by the Organization for
Economic Co-operation and Development (OECD) suggests that low administrative startup costs are a
crucial factor in fostering entrepreneurship.

Free Trade and Unrestricted Capital Flows


An increase in foreign and domestic investments helps to break the cycle of low income, savings, and
investment. Foreign investments, whether occurring directly or indirectly, lead to higher productivity and
employment, while also allowing access to advanced technology. Countries such as Brazil and India have
reduced tariffs and removed restrictions as a way to encourage foreign investment.


In addition, international trade in goods and services provides residents with more affordable products and
access to larger markets. Such trade exposes domestic companies to competition, ultimately promoting
economic growth.

Summary of Factors Limiting Growth in Developing Countries


Developing countries differ significantly from developed nations in terms of institutional structure, as well as
their legal and political environments. Inadequate institutions and unfavorable legal and political conditions
slow the growth of developing economies, contributing to poverty.

Factors that limit growth include low rates of saving and investment, underdeveloped financial markets,
weak or corrupt legal systems, a lack of property rights, political instability, and poor public education and
health services. Restrictive tax and regulatory policies, and limits on international trade and capital flows,
also inhibit growth.

While such factors are not entirely absent in developed countries, they are more prevalent in developing
nations. Growth in developing countries may also be hindered by a lack of physical, human, and public
capital, as well as limited innovation.

Policies that address these issues and mitigate their impact can enhance growth potential.

Why Potential Growth Matters to Investors

LOS: Describe the relation between the long-run rate of stock market appreciation and the
sustainable growth rate of the economy.

LOS: Explain why potential GDP and its growth rate matter for equity and fixed
income investors.

The valuations of both equity and fixed-income securities are closely linked to economic growth. Anticipated
growth in aggregate earnings is a fundamental driver of the equity market. A sustainable rate of economic
growth, as measured by potential GDP, sets a limit on how fast an economy can expand without
causing inflation. Investors must consider whether earnings growth is constrained by the growth rate of
potential GDP.

For earnings growth to exceed GDP growth, the ratio of corporate profits to GDP must increase over
time. However, this share of profits in GDP cannot rise indefinitely, which would lead to stagnant labor
income and weakened demand. In the long term, real earnings growth cannot surpass the growth rate of
potential GDP.

The relationship between economic growth and equity returns can be framed by the Grinold-Kroner (2002)
decomposition of equity returns:

E(Re) = Dividend yield + Expected capital gain

= Dividend yield + Expected repricing + Earnings growth per share

= Dividend yield + Expected repricing + Inflation rate + Real economic growth − Change in shares outstanding

= dy + ∆(P/E) + i + g − ∆S


Dividend yield produces income returns, while changes in price-to-earnings (P/E) ratios indicate capital
appreciation. Both of these factors have a crucial influence on equity market returns over time. Higher
GDP growth typically leads to higher P/E ratios, signaling lower perceived risk and greater willingness of
investors to pay for earnings. Earnings growth per share—the primary link between economic growth and
equity returns—is affected by inflation, real economic growth, share changes, and other factors. The change
in shares outstanding, known as the dilution effect (∆S), plays a notable role.

The gap between economic growth and equity returns stems from two main factors, share buybacks and
new share issuance by publicly traded companies, both of which affect share dilution. Net buybacks quantify
the net effect of these actions at the national stock market level. In addition, the proportion of economic
growth driven by unlisted entrepreneurial firms indicates the economy’s relative dynamism, which affects
the difference between overall economic growth and the earnings growth of publicly listed companies. The
dilution effect equation (∆S = nbb + rd) includes both net buybacks (nbb) and relative dynamism (rd).
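
The following minimal Python sketch evaluates the Grinold-Kroner decomposition, including the dilution term ∆S = nbb + rd. All input values are assumptions chosen for illustration only, not curriculum data.

```python
# Hypothetical illustration of the Grinold-Kroner decomposition.
# All input values below are assumptions, not data from the curriculum.

dy = 0.02            # dividend yield (2%)
repricing = 0.005    # expected annual change in the P/E ratio (+0.5%)
inflation = 0.02     # expected inflation (2%)
real_growth = 0.025  # expected real economic growth (2.5%)

# Dilution effect: delta_S = net buybacks (nbb) + relative dynamism (rd)
nbb = -0.005         # net buybacks reduce the share count by 0.5%
rd = 0.01            # 1.0% of growth accrues to unlisted entrepreneurial firms
delta_S = nbb + rd

expected_return = dy + repricing + inflation + real_growth - delta_S
print(f"E(Re) = {expected_return:.2%}")  # 6.50%
```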

The relationship between economic growth and equity returns is complex. While greater economic growth
generally leads to higher equity returns, those returns can also be influenced by factors like capital dilution,
market composition changes, and the limited representation of equity markets in the overall economy.

Estimates of potential GDP and its growth rate are important for market predictions. Extrapolations based
on past GDP growth can be misleading, and changes in economic factors and policies have significant
implications for long-term stock market returns.

Potential GDP estimates are essential for fixed-income investors as they assess inflationary pressures.
Differences between actual GDP and potential GDP influence inflation and thus affect nominal interest rates
and bond prices. The growth rate of potential GDP also impacts real interest rates and the expected returns
on real assets; faster potential GDP growth means higher real interest rates are needed to encourage
savings for capital accumulation.

Potential GDP and its growth rate impact fixed-income analysis in several ways:

y Faster potential GDP growth enhances the credit quality of fixed-income securities, while slower
potential GDP growth increases the perceived risk of bonds.
y The output gap and the growth rates of actual and potential GDP are crucial for monitoring monetary
policy decisions and assessing the likelihood of central bank policy changes.
y Government budget deficits tend to fluctuate with economic cycles. Evaluating fiscal policy often
involves comparing actual fiscal positions with structural or cyclically adjusted deficits, which reflect
budgetary balances in an economy operating at its potential GDP.


Production Function and Growth Accounting

LOS: Contrast capital deepening investment and technological progress and explain how each
affects economic growth and labor productivity.

LOS: Demonstrate forecasting potential GDP based on growth accounting relations.

Production Function
A production function is a model that quantitatively connects inputs (factors of production), technology, and
output. This model typically includes multiple factors (eg, labor, capital) and describes how they contribute
to the production process. The production function can be represented as:

Y = AF(K,L)

Where:
Y = The level of aggregate output in the economy
L = The quantity of labor or number of workers or hours worked in the economy
K = An estimate of the capital services provided by the stock of equipment and structures used to
produce goods and services
A = A multiplicative scale factor referred to as total factor productivity (TFP)

The function F( ) captures the relationship between inputs and output in the production process.

TFP is a measure of the overall productivity and technology in an economy. It reflects the cumulative
beneficial impact of scientific advances, research and development, management improvements, and
changes in production methods on the productivity of businesses and organizations.

In the context of an aggregate production function, changes in technology can affect both F( ) and A.
Innovations that allow the same output to be produced with the same capital but fewer workers are reflected
in F( ). However, TFP represents a change in technology that does not alter the relative productivity
of inputs. (Note that in discussions of economic growth, the term “technology” refers to TFP, unless
specified otherwise.)

It is helpful to utilize a specific functional form for F( ). One commonly used form is the Cobb-Douglas
production function:

F(K,L) = K^α L^(1−α)

The Cobb-Douglas production function is widely used due to its simplicity and historical data fitting. It
features a parameter, α, typically ranging from 0 to 1, that defines the distribution of output between capital
and labor. This parameter aligns with economic principles that link the compensation of factors to their
marginal product in competitive markets. The marginal product of capital (MPK) for the Cobb-Douglas
production function can be expressed as:

MPK = αAK^(α−1) L^(1−α) = αY/K

The value of α, which represents the share of GDP allocated to the providers of capital, can be derived by
equating MPK with the rental price of capital (r). Similarly, 1 − α represents the share of income given to
labor. This information is valuable because it allows a straightforward estimation of α for an economy, based
on an examination of capital’s income share in the national income accounts.


The Cobb-Douglas production function demonstrates constant returns to scale, so if all inputs are
increased by the same percentage, output also increases by that percentage. Under this assumption, the
production function can be modified to examine the determinants of output per worker, thus simplifying the
equation for analysis. First, multiply the production function by 1 / L:

Y/L = AF(K/L, L/L) = AF(K/L, 1)

By defining y as the output per worker (average labor productivity) and k as the capital-to-labor ratio, the
expression is simplified to:

y = AF(k, 1)

The Cobb-Douglas production function can be expressed in terms of output per worker (y) and other per
capita variables. In this form, it becomes:

y = Y/L = A(K/L)^α (L/L)^(1−α) = Ak^α

The equation relates a worker’s productivity to the capital-to-labor ratio, technology (TFP), and the share of
capital in GDP (α). It uses two measures of efficiency—labor productivity and TFP—to scale the impact of
capital and labor inputs. Changes in TFP are estimated using a growth accounting method.
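
As a brief sketch, the Python snippet below evaluates the Cobb-Douglas function for assumed values of A, K, L, and α, and confirms the two identities above: the marginal product of capital equals αY/K, and output per worker equals Ak^α. The numerical inputs are assumptions chosen only for illustration.

```python
# Minimal sketch of the Cobb-Douglas production function Y = A * K**alpha * L**(1 - alpha).
# The values of A, K, L, and alpha are assumptions chosen only for illustration.

A, K, L, alpha = 1.5, 4000.0, 1000.0, 0.3

Y = A * K**alpha * L**(1 - alpha)                    # aggregate output
MPK = alpha * A * K**(alpha - 1) * L**(1 - alpha)    # marginal product of capital

print(f"Y = {Y:,.1f}")
print(f"MPK = {MPK:.4f}  vs  alpha*Y/K = {alpha * Y / K:.4f}")  # identical

# Per-worker (intensive) form: y = A * k**alpha, where k = K / L
k = K / L
print(f"y = Y/L = {Y / L:.3f}  vs  A*k**alpha = {A * k**alpha:.3f}")  # identical
```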

The Cobb-Douglas production function demonstrates diminishing marginal productivity for each individual
input. Diminishing marginal productivity implies that, if more workers are hired in a fixed-size factory,
each additional worker contributes less to output than the previous one, causing average labor productivity
(y) to decrease.

The importance of diminishing marginal returns in the Cobb-Douglas production function depends on the
value of α. When α is close to zero, diminishing marginal returns to capital have a substantial impact, with
each additional unit of capital leading to a rapid decline in extra output. Conversely, when α is close to 1,
the next unit of capital contributes almost as much to output as the previous one, resulting in diminishing
marginal returns that are relatively minor. It is essential to note that the exponents on the K and L variables
in the Cobb-Douglas production function sum to 1, indicating constant returns to scale. When the increases
in both inputs are proportional, there are no diminishing marginal returns.

Growth Accounting
Growth accounting, introduced by Solow in 1957, is a fundamental tool for evaluating economic
performance. It transforms the Cobb-Douglas production function into a formula for analyzing growth rates.
This equation dissects the percentage change in output, breaking it down into components associated with
capital, labor, and technology:

∆Y/Y = ∆A/A + α(∆K/K) + (1 − α)(∆L/L)

The growth accounting equation relates the growth rate of output to technological change, capital growth,
and labor growth. It uses α and (1 − α) to represent the output elasticity with respect to capital (α) and labor
(1 − α). Unspecified inputs, like natural resources, are accounted for within total factor productivity (TFP).

Data on output, capital, labor, and the elasticities of capital and labor are available for most developed
countries. Technological change is not directly measured and therefore must be estimated. The elasticities
of capital and labor represent their relative income shares and are derived from GDP accounts. In the
US, labor has a 0.7 share and capital has a 0.3 share. This means that an increase in labor’s growth rate
has about twice the impact on potential GDP growth as an equivalent increase in capital’s growth rate,
assuming other factors remain constant.
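
A minimal sketch with assumed growth rates shows the growth accounting equation applied directly, and TFP growth recovered as the residual; α = 0.3 mirrors the US capital share cited above, while the other inputs are assumptions.

```python
# Growth accounting: dY/Y = dA/A + alpha*(dK/K) + (1 - alpha)*(dL/L)
# Growth rates below are assumed values; alpha = 0.3 mirrors the US capital share cited above.

alpha = 0.3
growth_K = 0.032     # growth of capital services (3.2%)
growth_L = 0.010     # growth of labor input (1.0%)
growth_TFP = 0.012   # growth of total factor productivity (1.2%)

growth_Y = growth_TFP + alpha * growth_K + (1 - alpha) * growth_L
print(f"Output growth: {growth_Y:.2%}")  # 2.86%

# In practice TFP growth is not observed directly; it is backed out as the Solow residual:
solow_residual = growth_Y - alpha * growth_K - (1 - alpha) * growth_L
print(f"Solow residual (TFP growth): {solow_residual:.2%}")  # recovers 1.20%
```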


The growth accounting equation is a versatile tool that estimates the contribution of technological progress
to economic growth and measures the drivers of growth, such as labor, capital, and TFP. This equation
also helps calculate potential output by considering trends in those drivers. For TFP, the trends are often
estimated using time-series models.

The labor productivity growth accounting equation is an alternative method to estimate potential GDP.
It simplifies the process by focusing on labor input and labor productivity without the need for estimating
capital input or TFP. However, it combines capital deepening and TFP progress in a way that may be
difficult to analyze and predict over extended periods. The equation for estimating potential GDP under this
approach is:

Growth rate in potential GDP = Long-term growth rate of labor force + Long-term growth rate
in labor productivity

Potential GDP growth results from the combined effects of the long-term growth rate of the labor force and
the long-term growth rate of labor productivity. For example, if the labor force is growing at 2% annually, and
labor productivity per worker is increasing at a rate of 3% per year, then the potential GDP is growing at a
total rate of 5% per year.
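
The same arithmetic in a one-line Python sketch (the figures mirror the example above):

```python
# Labor productivity approach: potential GDP growth = labor force growth + labor productivity growth
labor_force_growth = 0.02          # long-term growth of the labor force
labor_productivity_growth = 0.03   # long-term growth of labor productivity
potential_gdp_growth = labor_force_growth + labor_productivity_growth
print(f"Potential GDP growth: {potential_gdp_growth:.0%}")  # 5%
```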

Extending the Production Function


The expanded production function incorporates various inputs, including raw materials (N), labor quantity
(L), human capital (H), non-ICT capital (KNT), public capital (KP), technological knowledge (A), and
information, computer, and telecommunications capital (KIT). These inputs collectively contribute to the
production process and economic growth. The expanded production function can be expressed as:

Y = AF(N, L, H, KIT, KNT, KP)

Capital Deepening Versus Technological Progress


Diminishing marginal returns are significant when assessing the contributions of capital and technology
to economic growth. When the relationship between per capita output and the capital-to-labor ratio is
examined, it becomes evident that, while adding more capital to a fixed number of workers increases per
capita output, the rate of increase diminishes.

Economic growth in terms of per capita output is influenced by two sources: capital deepening and
technological progress.

Exhibit 1 Per capita production function

[Figure: output per worker on the vertical axis, capital per worker on the horizontal axis. Capital deepening moves the economy along the production function from point A to point B; technological progress shifts the function upward, from point B to point C.]


Capital deepening, an increase in the capital-to-labor ratio, reflects growing investment in the economy
and results in a move along the production function from point A to point B in Exhibit 1. However, as the
capital-to-labor ratio becomes very high (point B), further increases in capital have diminishing effects on
per capita output. This is due to the declining marginal productivity of capital as more capital is added.

The neoclassical model of growth suggests that per capita growth comes to a halt once the economy
reaches its steady state, where the marginal product of capital equals its marginal cost. Capital deepening
alone cannot sustain economic growth in this state. Per capita growth can continue only when the economy
operates below the steady state and technological progress is introduced.

In contrast, technological progress shifts the entire production function upward, as seen in the move
from point B to point C in Exhibit 1. Technological progress not only increases output per worker but also
improves the profitability of additional capital investments, mitigating the effects of diminishing marginal
returns. Even at the steady state, continuous growth in per capita output is possible if there is ongoing
technological progress—which makes such progress a critical factor for sustained economic growth.

Natural Resources

LOS: Explain how natural resources affect economic growth and evaluate the argument that
limited availability of natural resources constrains economic growth.

Natural resources, whether renewable or nonrenewable, play a significant role in economic growth.
Renewable resources (eg, forests) can be replenished, while nonrenewable resources (eg, oil, coal) are
finite and depleted once consumed.

While it might be assumed that countries rich in natural resources are wealthier, the relationship between
resource endowment and growth is complex. Access to resources (especially through trade) is essential,
but high-income levels can be achieved without owning and producing natural resources. Countries like
South Korea have experienced rapid growth despite a lack of such resources, while some resource-rich
countries (eg, Venezuela, Saudi Arabia) have seen slower growth compared with resource-poor countries
(eg, Singapore, South Korea).

The “resource curse” can restrain growth in resource-rich countries due to a lack of necessary economic
institutions and the delayed development of technology, while the “Dutch disease” occurs when resource
exports lead to currency appreciation, making other industries uncompetitive. Concerns about resource
depletion limiting growth are somewhat overstated since technological progress allows for more efficient
resource use and the development of substitutes. The declining share of national income going to land and
resources, along with the shift toward service-based economies, further mitigates concerns about resource
limitations on growth.


Labor Supply

LOS: Explain how demographics, immigration, and labor force participation affect the rate and
sustainability of economic growth.

Economic growth is driven by increased inputs, mainly labor and capital. The growth of the workforce is a
key factor in economic expansion. The labor input depends on factors like population growth, labor force
participation, net migration, and average hours worked. The United States has seen robust growth, partly
because its labor force is expanding, unlike those of Europe and Japan. Developing countries with large labor
supplies (eg, China, India, Mexico) also benefit from this source of growth.

Population Growth
Long-term projections of labor supply are primarily driven by the growth of the working-age population.
Population growth is influenced by fertility and mortality rates, with developed countries having lower
population growth than developing nations. The age distribution of a population, especially the percentage
of people over 65 and under 16, is another crucial factor. Some developed countries are facing challenges
due to an aging population, while many developing countries will benefit from a growing labor force as the
proportion of younger people increases. Despite China’s economic size, the country is similar to advanced
economies in that it, too, has an aging population. Population growth impacts the overall economy but not
the per capita GDP growth rate.

Labor Force Participation


In the short term, the growth rate of a country’s labor force can differ from its population growth if the labor
force participation rate (ie, the percentage of the working-age population in the labor force) changes. This
rate has generally increased in many countries, largely due to higher participation rates among women.
Unlike population growth, an increase in the participation rate can boost per capita GDP growth. For
example, countries like Greece and Italy have lower female labor force participation rates compared with
the US and northern European countries, and an increase in women’s participation can enhance labor force
growth and potential GDP. However, changing participation rates typically indicate transitions to new levels
rather than permanent shifts, so the extrapolation of such trends should be done cautiously.

Net Migration
Increased immigration is a notable driving factor of economic and population growth, particularly in
developed countries with declining native birthrates. Ireland, Spain, the United Kingdom, and the US
experienced accelerated labor force growth from 2000 to 2010 due to immigration, although this growth
slowed from 2010 to 2018. In the 2000s, high population growth rates were seen in Ireland (1.71%) and
Spain (1.35%), surpassing other European countries. Consequently, both nations enjoyed GDP growth
rates above the European average during this period. The elevated growth rates were attributed to
open-border policies in the two countries, which significantly expanded their immigrant populations and
contributed to substantial labor force growth.

Average Hours Worked


Changes in the average hours worked per worker have an impact on labor’s contribution to overall output.
The long-term trend in advanced countries has been a move toward shorter workweeks. This trend is
influenced by various factors such as collective bargaining agreements, the growth of part-time and
temporary work, and economic factors like the “wealth effect” and high tax rates, which make leisure time
more valuable than labor income for workers in high-income countries. Average hours worked per year
per person in the labor force have been decreasing in most countries since 1995. There are substantial
differences in average hours worked across countries; for instance, South Korea had 46.1% more annual
hours worked per person than Germany in 2018. The rise in female labor force participation rates may also
contribute to shorter average workweeks, as female workers often take part-time jobs.

Labor Quality: Human Capital


Human capital refers to a workforce’s accumulated knowledge and skills acquired through education,
training, and life experience. This factor plays a crucial role in an economy’s growth. Workers with higher
levels of education and skills tend to be more productive and adaptable to technological and market
changes. Investment in human capital occurs through education and on-the-job training, and while it is costly,
research shows that it yields significant returns in the form of higher wages. Moreover, education can have
positive externalities, benefiting not only individuals but also those around them and the overall economy.

Education fosters innovation, improving the quality of the labor force and driving technological progress.
Enhanced education and training can lead to a sustained increase in an economy’s growth rate, particularly
when the enhancements result in more innovations and faster technological advancements. Investments in
the population’s health also contribute significantly to human capital, especially in developing countries.

ICT, Non-ICT, and Technology and Public Infrastructure

LOS: Explain how investment in physical capital, human capital, and technological development
affects economic growth.

The growth of an economy’s physical capital stock depends on maintaining positive net investment (ie,
gross investment net of depreciation). Therefore, countries with higher rates of investment tend to have
growing physical capital stocks and higher GDP growth rates. However, the impact on per capita GDP
growth is somewhat smaller when the population is also growing since a portion of net investment is used to
maintain the capital-to-labor ratio.

Greater investment as a percentage of GDP is associated with faster economic growth. For instance,
China’s substantial investment in factories, equipment, and infrastructure exceeds 40% of GDP and has
contributed to its rapid economic expansion. Conversely, countries with lower investment-to-GDP ratios
tend to have slower growth rates.

The sustainability of growth relies on other factors too. The impact of investment depends on the existing
capital stock, which varies across countries. Investment in information technology (IT) has had a significant
impact on growth, driven by technological innovations, while other forms of capital spending, such as non-IT
capital, are important but have a different effect on growth. The composition of investment and the level of
initial capital per worker matter for sustainable growth.

Technology
The most crucial factor driving per capita GDP growth is technology, especially in developed countries.
Technology allows economies to overcome diminishing marginal returns and leads to an upward shift in the
production function. It enables the production of more and higher-quality goods and services with the same
resources, creates new products and services, and improves organizational efficiency.

Technological progress can be embodied in human capital and new machinery, equipment, and software.
Investment in technology, particularly information and communication technology (ICT), is vital. Innovation
through research and development (R&D) is also essential, but its relationship with economic growth is
complex, involving both long-term gains and short-term disruptions due to creative destruction.


Total factor productivity (TFP) reflects technological advancements, applied R&D, improved management
practices, and organizational methods that enhance business productivity. It is sensitive to labor and capital
input data. Labor productivity growth depends on both capital deepening and technological progress.
Capital deepening can temporarily boost growth but is not a sustainable long-term source of per capita
income growth.

Productivity, especially labor productivity, varies across countries and affects economic performance.
Developed countries like the US historically have higher productivity levels per hour worked compared with
developing countries like China.

Developing countries may experience higher productivity growth due to rapid human and physical capital
accumulation, and a long-term increase in labor productivity can boost the sustainable economic growth
rate, positively influencing the return on equities. Conversely, persistently low productivity growth can limit
long-term economic and earnings growth, potentially leading to lower equity returns.

Public Infrastructure
Investment in public infrastructure (eg, roads, bridges, water systems, dams) is considered an important
factor in productivity growth. Such investments are complementary to the production of private sector goods
and services and can have a positive impact on productivity beyond their direct benefits. Aschauer (1990)
found that including government infrastructure investment as an input in the production function is crucial for
assessing its role in economic growth and productivity.

Summary of Economic Growth Determinants


Sustainable long-term economic growth depends on the expansion of real potential GDP, driven by the
supply of factors of production (inputs) and technological advancements. These factors include human
capital, ICT and non-ICT capital, public capital, labor, and natural resources. Data on these sources of
growth are available from organizations like the OECD, and the Conference Board presents data on the
sources of output growth for various countries, derived from the growth accounting formula.

Theories of Growth

LOS: Compare classical growth theory, neoclassical growth theory, and endogenous
growth theory.

The factors influencing long-term economic growth are a subject of debate in economics. Three main
paradigms in the academic growth literature are the classical, neoclassical, and endogenous growth models.

y The classical model suggests that per capita economic growth is temporary due to population growth
and limited resources.
y In the neoclassical model, long-term per capita growth depends on exogenous technological progress.
y The endogenous growth model seeks to explain technology from within the model itself.

Classical Model
The classical growth theory, developed by Thomas Malthus in 1798, is often referred to as the Malthusian
theory. It emphasizes the impact of a growing population in a world with limited resources and is concerned
with issues of resource depletion and overpopulation.


In this model, the production function is simple, with labor as the primary input and land as a fixed factor.
The central assumption is that as per capita income rises above the subsistence level, population growth
accelerates, leading to diminishing returns for labor. Ultimately, population growth can outstrip technological
progress and result in a constant standard of living.

However, the Malthusian prediction did not hold true, due to the breakdown in the relationship between
income and population growth, as well as the rapid pace of technological progress that offsets diminishing
returns. This led economists to shift their focus toward capital and the neoclassical growth model.

Neoclassical Model
In the 1950s, Robert Solow developed the neoclassical theory of growth, which is centered around the
Cobb-Douglas production function, expressed as:

Y = AF(K,L) = AK^α L^(1−α)

In this model, the potential output of the economy is determined by factors such as the stock of capital
(K), labor input (L), and total factor productivity (A). Both capital and labor are considered variable inputs
with diminishing marginal productivity. The neoclassical growth model aims to establish the long-term
growth rate of output per capita and its relationship to variables like the saving/investment rate, the rate of
technological change, and population growth.

Balanced (or Steady-State) Rate of Growth


The neoclassical growth model, like other economic models, seeks to identify the equilibrium position that
an economy tends to reach. In the Solow model, this equilibrium is the steady state where the output-to-
capital ratio remains constant. Balanced growth is achieved when capital per worker and output per worker
increase at the same rate. The analysis starts by employing the per capita version of the Cobb-Douglas
production function (where k = K/L):

y = Y/L = Ak^α

Then, the rates of change of capital per worker and output per worker can be expressed as:

∆k/k = ∆K/K − ∆L/L

∆y/y = ∆Y/Y − ∆L/L

From the production function, the growth rate of output per worker can be expressed as:

∆y/y = ∆A/A + α(∆k/k)

The rate of change equations are precise only for very short periods of continuous time. In a closed
economy, where investment (I) is financed by domestic savings, gross investment is represented as I = sY,
with s being the proportion of income (Y) saved. Considering that the physical capital stock experiences
constant depreciation at a rate of δ, the alteration in the physical capital stock can be expressed as:

∆K = sY − δK

Deducting the growth of labor supply, setting ∆L / L ≡ n, and reorganizing the equation provides:

∆k/k = sY/K − δ − n


In the steady state, the rate of increase in capital per worker is equivalent to the rate of increase in output
per worker. Thus,

∆k/k = ∆y/y = ∆A/A + α(∆k/k)

from which follows:

∆y/y = ∆k/k = (∆A/A) / (1 − α)

The steady sustainable growth rate of output per capita (which is equal to the growth rate of capital per
worker) is determined by the growth rate of TFP (θ) and the elasticity of output concerning capital (α). When
the growth rate of labor (n) is also included, it provides the overall sustainable growth rate of output:

Growth rate of output per capita = θ / (1 − α)

Growth rate of output = θ / (1 − α) + n

The key outcome of the neoclassical model is that the steady-state growth rate of labor productivity,
denoted by θ / (1 − α), is a fundamental factor when determining the equilibrium output-to-capital ratio,
represented by the constant Ψ. This outcome aligns with the labor productivity growth accounting equation
mentioned earlier.

Y/K = (1/s)[θ/(1 − α) + δ + n] ≡ Ψ

In the steady state, the output-to-capital ratio remains constant, and both the capital-to-labor ratio (k) and
output per worker (y) grow at a rate of θ / (1 − α). On this steady-state growth path, the marginal product of
capital, which equals α(Y / K) in the Cobb-Douglas production function, remains constant and corresponds
to the real interest rate in the economy. The increasing capital-to-labor ratio (k) in the steady state due to
capital deepening has no impact on the economy’s growth rate or the marginal product of capital, which
remains unchanged once the steady state is reached.
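
A short Python sketch with assumed parameter values shows how the steady-state growth rates and the equilibrium output-to-capital ratio Ψ are computed from θ, α, s, δ, and n. The numerical inputs are assumptions for illustration only.

```python
# Neoclassical steady state with assumed parameter values (illustration only).
theta = 0.014   # TFP growth rate
alpha = 0.3     # capital's share of income
s = 0.22        # saving rate
delta = 0.05    # depreciation rate
n = 0.01        # labor force growth rate

growth_per_capita = theta / (1 - alpha)           # steady-state growth of output per capita
growth_output = growth_per_capita + n             # steady-state growth of total output
psi = (1 / s) * (growth_per_capita + delta + n)   # equilibrium output-to-capital ratio

print(f"Output per capita growth: {growth_per_capita:.2%}")  # 2.00%
print(f"Total output growth:      {growth_output:.2%}")      # 3.00%
print(f"Output-to-capital ratio:  {psi:.3f}")                # ~0.364
```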

An intuitive way to grasp the steady-state equilibrium is to reframe it as a savings and investment equation:

sy = [θ/(1 − α) + δ + n]k

The steady-state equilibrium is reached at the output-to-capital ratio where savings and actual gross
investment per worker (sy) are just enough to account for:

y providing capital for new workers entering the workforce at rate n,


y replacing worn-out plant and equipment at rate δ, and
y deepening the physical capital stock at the rate θ / (1 − α) to maintain the marginal product of capital
equal to the rental price of capital.


The effects of various parameters in the model on the steady state are:

y Saving rate (s): An increase in the saving rate leads to an increase in both the capital-to-labor ratio (k)
and output per worker (y). This occurs as the higher saving rate results in more saving/investment at
every level of output, shifting the steady-state equilibrium to higher k and y. However, it does not affect
the steady-state growth rates of output per capita or total output.
y Labor force growth (n): An increase in the labor force growth rate reduces the equilibrium capital-to-
labor ratio and output per worker. A higher population growth rate requires a lower capital-to-labor ratio
to achieve the same gross saving/investment rate. This shift lowers both k and y but does not impact
the steady-state growth rates of output.
y Depreciation rate (δ): An increase in the depreciation rate decreases the equilibrium capital-to-labor
ratio and output per worker—since a higher depreciation rate means that the same rate of gross saving
results in less net capital accumulation. This affects the steady-state equilibrium in the same way as
labor force growth.
y Growth in TFP (θ): An increase in the growth rate of TFP reduces the steady-state capital-to-labor
ratio and output per worker for given levels of labor input and TFP. Although a higher TFP growth rate
implies faster future growth in output per worker, at a given point in time, output per worker is lower
than with a slower TFP growth rate. In this scenario, the economy follows a steeper trajectory from a
lower base of output per worker.

Factors like the saving rate, labor force growth rate, and depreciation rate affect the level of output per
worker but do not alter the permanent growth rate of that output. A permanent increase in this growth rate is
achievable only through a change in the growth rate of TFP.

During the transition to the steady-state growth path, the economy can experience either faster or slower
growth relative to the steady state. This transition can be described by equations for the growth rates of
output per capita and the capital-to-labor ratio:

∆y/y = θ/(1 − α) + αs(Y/K − Ψ) = θ/(1 − α) + αs(y/k − Ψ)

and

∆k/k = θ/(1 − α) + s(Y/K − Ψ) = θ/(1 − α) + s(y/k − Ψ)

The second equality in each line follows from the definitions of y (output per worker) and k (capital per
worker), which imply that (Y / K) is equal to (y / k).

If the output-to-capital ratio is higher than its equilibrium level (Ψ), the second term in the previous equations
is positive, leading to growth rates in output per capita and the capital-to-labor ratio that are higher than the
steady-state rate, θ / (1 − α).

This situation occurs when actual saving and investment exceed required investment, resulting in above-
average growth in per capita output driven by faster capital deepening. This typically indicates a lower
capital-to-labor ratio, but it could theoretically be associated with higher TFP. As capital grows faster than
output, the output-to-capital ratio declines over time, and both output per capita and the capital-to-labor ratio
eventually converge to the steady-state rate.

Conversely, if the output-to-capital ratio is below its steady-state level, the second term in the previous
equations is negative. In this scenario, actual investment is insufficient to sustain the trend rate of capital-
to-labor ratio growth, resulting in slower growth rates of both output per capita and the capital-to-labor ratio.
This situation usually corresponds to a relatively high and unsustainable capital-to-labor ratio but could also
be due to lower TFP and, consequently, lower output. Over time, output outpaces capital growth, causing
the output-to-capital ratio to rise, and growth converges to the trend rate.
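
This convergence mechanism can be illustrated with a small simulation based on the transition equations above. The parameter values and starting point are assumptions, chosen so the economy begins with an output-to-capital ratio above Ψ; growth then starts above the trend rate and falls back toward it as Y/K declines toward Ψ.

```python
# Simulation of the transition toward the steady state (all values assumed for illustration).
# Transition equations: dy/y = theta/(1-alpha) + alpha*s*(Y/K - Psi)
#                       dk/k = theta/(1-alpha) + s*(Y/K - Psi)

theta, alpha, s, delta, n = 0.014, 0.3, 0.22, 0.05, 0.01
trend = theta / (1 - alpha)             # steady-state growth rate of y and k
psi = (1 / s) * (trend + delta + n)     # equilibrium output-to-capital ratio (~0.364)

y, k = 50.0, 100.0                      # assumed starting point: y/k = 0.5, above psi

for year in range(1, 6):
    gap = y / k - psi
    gy = trend + alpha * s * gap        # growth of output per worker
    gk = trend + s * gap                # growth of capital per worker
    y *= 1 + gy
    k *= 1 + gk
    print(f"Year {year}: y/k = {y / k:.3f}, growth of y = {gy:.2%} (trend = {trend:.2%})")
```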


Implications of Neoclassical Model


The neoclassical model yields several key conclusions:

y Capital accumulation:
○ The accumulation of capital influences the level of production, but it doesn’t impact the long-term
rate of economic growth.
○ Growing economies eventually reach a steady-state growth path with a constant output growth rate.
○ In a steady state, the rate of output growth matches the sum of the labor force growth rate and the
rate of total factor productivity (TFP) growth, adjusted for labor’s portion of income.

y Capital deepening versus technology:


○ Rapid growth above the steady-state rate occurs initially during capital accumulation, but growth
slows over time.
○ Sustainable long-term growth cannot rely solely on capital deepening, as excessive capital
accumulation results in diminishing returns.
○ Sustainable growth cannot be achieved by increasing the supply of some inputs too rapidly
relative to others.
○ Long-term growth in potential GDP per capita relies on technological change and
improvements in TFP.

y Convergence:
○ Developing countries should have higher growth rates than developed countries due to capital
scarcity and potentially higher saving rates.
○ Over time, there should be convergence of per capita incomes between developed and
developing countries.

y Effect of savings on growth:


○ An increase in the saving rate temporarily raises the growth rate during a transition period.
○ During the transition, the economy progresses toward an elevated level of per capita output
and efficiency.
○ Once in the steady state, the growth rate of output no longer depends on the saving rate.
○ Nations that exhibit greater saving rates will experience increased per capita output, capital-to-labor
ratios, and labor productivity.

Extension of Neoclassical Model


The neoclassical model, as developed by Solow and Denison, emphasized the role of total factor
productivity (TFP) in explaining economic growth. Their studies of the US economy showed that TFP
accounted for a significant portion of growth, with more than 80% (Solow) and nearly 70% (Denison) of per
capita growth attributed to it. However, the neoclassical model was criticized for its inability to provide an
explicit explanation for technological progress, as it treated technology as an exogenous factor.

Furthermore, the model suggested that the steady-state rate of economic growth was unrelated to the rate
of saving and investment, with higher rates of investment and saving having only a transitory impact on
growth. This prediction was at odds with empirical evidence showing a positive correlation between saving
rates and growth rates across countries. The neoclassical model also predicted that returns to investment
would decline over time in economies where the stock of capital grew faster than labor productivity, a
prediction that did not hold true for advanced countries.


In response to these critiques, two lines of subsequent research emerged:

y Augmented Solow approach: This theory sought to reduce the portion of growth attributed to
unexplained technological progress within the neoclassical framework. It aimed to improve the
measurement of inputs in the production function and broaden the definition of investment. The
augmented approach included considering human capital, research and development, and public
infrastructure as inputs. By adding these inputs to the production function, the approach provided
a more accurate measure of technological progress while still adhering to the notion of diminishing
marginal returns.
y Endogenous growth theory: This theory made technological progress an integral part of the model.
It aimed to explain the determinants of technological progress within the model itself, rather than
treating it as an exogenous factor.

Both these approaches represented efforts to refine and expand the neoclassical model to better account
for the complexities of economic growth, particularly regarding the role of technological progress and the
relationship between saving, investment, and growth.

Endogenous Growth Model


Endogenous growth theory offers an alternative to the neoclassical model by focusing on explaining
technological progress within the model itself, rather than treating it as an exogenous factor. In endogenous
growth models, self-sustaining growth naturally emerges, and the economy may not necessarily converge
to a steady-state rate of growth. These models do not exhibit diminishing marginal returns to capital for the
entire economy, allowing for increased saving rates to permanently boost the rate of economic growth. In
addition, such models accommodate the possibility of increasing returns to scale.

Romer (1986) contributed a model of technological progress that broadened the definition of capital to
include human capital, knowledge capital, and research and development (R&D). R&D, considered as
investment in new knowledge that enhances the production process, is viewed as a factor of production
alongside traditional capital and labor. R&D spending aims to create new products or production methods
for profit. However, a key distinction is that the output of R&D spending is ideas, which can potentially be
adopted by other companies in the economy. This leads to substantial positive externalities or spillover
effects, in which the spending of one company benefits others and contributes to the overall pool of
knowledge available to all firms.

These positive externalities highlight a difference between the private and social returns of capital. While
individual companies may not fully capture all the benefits of their R&D investments, there are broader
societal returns. This distinction resolves the assumption of diminishing marginal returns to capital and
instead suggests constant returns to private capital with increasing returns to all factors when considered
collectively. The elimination of the diminishing returns assumption prevents the emergence of monopolies
and supports the idea of increasing returns to scale for the economy due to spillover benefits from R&D.

Endogenous growth models do not conform to the neoclassical model’s predictions of a transitory increase
in growth above the steady state. Instead, they suggest that saving and investment decisions can generate
self-sustaining growth at a permanently higher rate. The absence of diminishing marginal returns to capital
in these models is a key reason behind this difference. In the endogenous growth model, the production
function is represented as a straight line given by:

ye = f(ke) = c × ke

where output per worker (ye) is directly proportional to the stock of capital per worker (ke).

The constant c signifies the marginal product of capital in the aggregate economy within this model, and
the subscript e designates the endogenous growth model. This differs from the neoclassical model, which
features a curved production function that ultimately flattens out.


The introduction of constant returns to aggregate capital accumulation has a significant implication. In this
model, the output-to-capital ratio remains fixed at c, meaning that output per worker (ye) always grows
at the same rate as capital per worker (ke). Consequently, variations in the pace of capital accumulation
correspond directly to changes in the rate of growth in output per capita. Substituting the previous equation
into ∆k/k = sY/K − δ − n provides an equation for the growth rate of output per capita in the endogenous
growth model:

∆ye/ye = ∆ke/ke = sc − δ − n

The key takeaway from the endogenous growth model is that, due to constant returns to aggregate capital
accumulation, a higher saving rate (s) leads to a permanently higher growth rate in both the short run and
the long run. This result is a fundamental feature of the model.

The presence of positive externalities related to spending on R&D and knowledge capital suggests that
private companies may underinvest in these goods from a societal perspective, creating a market failure.
Government intervention, such as direct spending on R&D or tax incentives and subsidies for private
knowledge capital production, may be necessary to address this market failure and promote faster long-
term economic growth.

In contrast to the neoclassical model, the endogenous growth theory suggests that there is no automatic
convergence of incomes between developed and developing countries. The potential for constant or
even increasing returns related to investments in knowledge capital allows developed countries to
maintain or potentially accelerate their growth rates compared with developing countries, making income
convergence uncertain.

Convergence Hypothesis

LOS: Explain and evaluate convergence hypotheses.

Convergence in economics refers to whether developing countries will catch up with developed countries in
terms of per capita income levels over time. There are three types of convergence:

y Absolute convergence suggests that all countries, regardless of their characteristics, will eventually
reach the same per capita income level due to access to the same technology. While theory implies
convergence in per capita growth rates, it does not guarantee equal absolute income levels.
y Conditional convergence depends on countries sharing the same saving rate, population growth
rate, and production function. If these conditions are met, countries will converge in both per capita
income levels and steady-state growth rates.
y Club convergence suggests that only certain groups of countries, often with similar characteristics
and institutions, will converge, while others will continue to lag.

Data show that not all countries are converging, and some developing countries are diverging from
developed ones. Investments in developing countries that are part of a convergence club may offer higher
returns due to their faster growth, but they also come with higher risks.

Convergence can occur through capital accumulation and capital deepening. Developed countries reach
a point where additional capital has minimal impact on productivity, while developing countries can
significantly boost labor productivity with more capital.


The second source of convergence is technology adoption. Developing countries can import advanced
technology and narrow the income gap if they invest in mastering and applying this technology. This
investment is like R&D spending and is essential for countries seeking to join a convergence club.

The endogenous growth model, unlike the neoclassical model, does not assume convergence. High-income
countries can maintain their lead through investments in knowledge and human capital. The convergence
hypothesis is not always supported by data as not all poorer countries experience faster growth.

Growth in an Open Economy

LOS: Describe the economic rationale for governments to provide incentives to private
investment in technology and knowledge.

LOS: Describe the expected impact of removing trade barriers on capital investment and profits,
employment and wages, and growth in the economies involved.

Opening an economy to trade and financial flows can have significant impacts on economic growth. The
key reasons to do so are:

y Access to global funds: With an open economy, a country can borrow or lend funds in global financial
markets, meaning that domestic investment is not solely reliant on domestic savings.
y Comparative advantage: Open economies can reallocate resources toward industries in which they
have a comparative advantage, leading to increased overall productivity.
y Global market access: Companies in open economies can tap into a larger global market, which allows
them to exploit economies of scale and potentially benefit more from successful innovations.
y Imported technology: Open economies can import technology from other countries, accelerating the
rate of technological progress.
y Increased competition: Global trade fosters competition in the domestic market, prompting companies
to enhance product quality, productivity, and cost-efficiency.

According to the neoclassical model, opening an economy to free trade and international borrowing and
lending can speed up convergence between countries. The dynamic adjustment process in an open
economy is as follows:

y Developing countries typically have lower capital per worker, leading to a higher marginal product
of capital and, consequently, a higher rate of return on investments relative to countries with higher
capital per worker.
y Global savers looking for greater investment returns will direct their investments to capital-poor
countries. In an open economy, capital will flow from countries with high capital-to-labor ratios to those
with lower ratios.
y Capital inflows into developing countries will lead to faster growth in their physical capital stock, even
if their domestic saving rates are relatively low. This increased capital growth will result in higher
productivity and contribute to the convergence of per capita incomes.
y Capital flows must be balanced by trade flows, which results in capital-poor countries running trade
deficits as they borrow globally to finance domestic investment. In contrast, developed countries tend
to run trade surpluses as they export capital.
y During the transition phase to the new steady state, capital inflows will temporarily elevate the growth
rate in countries with limited capital, surpassing the steady-state rate, while capital-exporting countries
will experience growth rates below the steady state.


y Over time, capital-poor countries will witness an increase in their physical capital stock, leading to
a decrease in the returns on investments and a subsequent decline in investment rates and trade
deficits. Economic growth will gradually slow down and approach the steady-state growth rate. If
investment falls below the level of domestic savings, a country may shift from a trade deficit to a trade
surplus, becoming a net capital exporter.
y In the Solow model, following the redistribution of global savings, there is no enduring enhancement
in the economic growth rate for any economy. Both developed and developing nations eventually
converge to the steady-state growth rate.

Endogenous growth models, in contrast to the neoclassical model, suggest that a more open trade policy
can lead to a permanent increase in the rate of economic growth. Open trade impacts global output through
various mechanisms:

y Selection effect: Increased foreign competition prompts less efficient domestic companies to exit the
market while encouraging more efficient ones to innovate. This process enhances the overall efficiency
of the national economy.
y Scale effect: Open trade allows producers to fully leverage economies of scale by accessing larger
markets, leading to increased production efficiency.
y Backwardness effect: Less advanced countries or sectors of an economy can catch up with more
advanced counterparts through knowledge spillovers and technology transfer.

Open trade also influences the innovation process by driving higher investments in research and
development (R&D) and human capital. Access to larger markets and increased knowledge flow among
countries incentivizes companies to invest in innovation, resulting in higher returns on new investments and
faster economic growth.

In general, most countries benefit from open trade, with smaller countries benefiting from the scale effect
and poorer, less developed countries benefiting from the backwardness effect. However, in certain cases,
particularly in small countries that lag technology leaders, open trade may discourage domestic innovation
as companies fear losing out to more efficient foreign competitors.

Developing countries have historically pursued two main economic development strategies:

y Inward-oriented policies emphasize developing domestic industries by limiting imports and producing
local alternatives, even if they are less cost-effective. This approach can lead to slow economic growth.
y Outward-oriented policies focus on integrating domestic industries into the global economy through
trade and prioritizing exports for economic growth. Such open-market policies often result in
faster GDP growth.

Many East Asian countries, like Singapore and South Korea, have succeeded with outward-oriented
policies, experiencing significant growth and convergence with developed nations. Conversely, countries
that rely on inward-oriented strategies have faced slower growth and inefficient industries. The empirical
evidence strongly supports the idea that open and trade-oriented economies tend to grow more quickly.
As a result, many countries have shifted toward more outward-oriented policies and experienced improved
economic growth.

Learning Module 3
Economics of Regulation

LOS: Describe the economic rationale for regulatory intervention.

LOS: Explain the purposes of regulating commerce and financial markets.

LOS: Describe anticompetitive behaviors targeted by antitrust laws globally and evaluate the
antitrust risk associated with a given business strategy.

LOS: Describe classifications of regulations and regulators.

LOS: Describe uses of self-regulation in financial markets.

LOS: Describe regulatory interdependencies and their effects.

LOS: Describe tools of regulatory intervention in markets.

LOS: Describe benefits and costs of regulation.

LOS: Describe the considerations when evaluating the effects of regulation on an industry.

Regulation in financial markets is a form of government intervention that sets and enforces the markets’
rules. These rules can impact the economy on both the macro and micro levels, dictating how companies
and individuals operate within the financial system. While rules are often proactive, they can also be
reactive. Many regulations have been passed in response to new market technologies and products as well
as recent crises or failings of the financial system.

Regulation can significantly affect businesses and markets. Therefore, an analyst must be able
to dissect new and current rules and evaluate how they alter the process of making financial and
investment decisions.

Economic Rationale for Regulation

LOS: Describe the economic rationale for regulatory intervention.

Regulatory intervention by governing bodies is essential to protect the integrity of markets due to the risks
of informational frictions, externalities, and weak competition, and to achieve social objectives. The first of
these factors, informational frictions, are market inefficiencies that result in suboptimal outcomes within
the market. They come in the form of information asymmetries, such as uneven access to information or
inadequate data, which can result in adverse selection and moral hazard.

Adverse selection occurs when a market participant exploits its informational advantage. Moral hazard
occurs when risks and their costs are misaligned, such that a single party can take risks without incurring
the potential costs. In both situations, one party has an undue advantage over other parties. Regulation often mutes the impact of such imbalances, defining the rights and responsibilities of each party in order to establish a fair and transparent marketplace.

Externalities are the effects of production and consumption activities on parties not directly involved in a
particular transaction, activity, or decision. Some externalities are positive and provide spillover benefits,
while many are negative and generate spillover costs to adjacent parties. A common concern for regulators
is the systemic risk posed by failures of financial institutions. Environmental pollution is another example of a negative externality.

Regulation also targets weak competition. The absence of competition in a market, or a scarcity of
producers of a certain good or service, can be detrimental to consumers as such conditions result in higher
prices, less consumer choice, and an absence of innovation. Weak competition is often associated with
dominant firms having significant market power or colluding to keep prices high.

Regulation may also be imposed to achieve social objectives such as establishing public goods—
commodities or services that private markets would not otherwise provide. The consumption of public goods
(eg, national defense, street lighting systems) is nonexclusive (that is, available to all), which creates both
a positive externality (ie, everyone benefits) and a potential free-rider problem. Thus, in order to ensure a
society produces the optimal quantity and distribution of public goods, regulations may be needed, such as
placing regulatory obligations on the firms contracted to provide those goods.

Regulations may also address a broad range of additional issues, including safety, privacy, property rights,
environmental goals, labor/employment, commerce/trade, and financial security.

Rationale for the Regulation of Financial Markets


Failures in the financial system can have significant consequences for society. These consequences include not only financial losses to individuals and institutions but also a loss of confidence in the market and disruption of commerce. Regulation that improves the transparency of information and equitable access
to that information, with the intention of protecting smaller investors, helps maintain confidence in the
financial system.

Transparency and confidence are also enhanced by required disclosures imposed on market participants.
Such disclosures make information available for assessing risks, potential returns, and relative asset value,
thus enabling investment. The disclosures come in various forms, including financial statements held to
predetermined accounting standards, prospectus disclosures for new and current investments, and proxy
voting announcements.

The prudential supervision of financial institutions is critical for protecting economies, capital markets,
and society from the costs of a single institution’s failure. Such supervision promotes financial stability,
reduces system-wide risks, and protects the customers of financial institutions. The financial crisis of 2008
highlighted the need for prudential supervision, and increased globalization has led to deeper concerns
about financial contagion and regulatory competition (ie, competing efforts by different jurisdictions to
attract business).

Besides the obvious costs of implementation, regulation can have other monetary consequences. If, for
example, regulators require an institution to be insured against specific risk events, this can create a moral
hazard in which the institution is incentivized to take on greater risk since the insurance has effectively set
a floor on its losses. Also, capital holding requirements can limit the amount of lendable capital institutions
have available to offer, thus hurting economic growth.


Regulation of Commerce

LOS: Explain the purposes of regulating commerce and financial markets.

Governments play an important role in regulating commerce at the local, national, regional, and global
levels. They are responsible for establishing a legal framework for contracting and setting standards, which
is an important part of creating an environment where the terms of trade can be enforced and business
can prosper. Government regulation is also essential in various aspects of business such as protecting
workers’ and employers’ rights and responsibilities, the health and safety of consumers, and local and
international commerce.

Some examples of government regulation of commerce include:

y Governments setting legal standards for the recognition and protection of different types of
intellectual property
y Government regulations covering laws related to contracts, companies, taxes, bankruptcy, and banking
y Steps taken toward protecting domestic business interests against unfair competition (eg, by imposing
anti-dumping duties)
y The ongoing development of regulation to address online privacy and "big data" gathered by
technology companies

Antitrust Regulation and Framework

LOS: Describe anticompetitive behaviors targeted by antitrust laws globally and evaluate the
antitrust risk associated with a given business strategy.

At the domestic level, governments aim to promote competition (or alternatively, prevent anticompetitive
practices). For example, regulatory approval is required for mergers and acquisitions of large companies.
When evaluating mergers and acquisitions, regulators assess whether they will lead to monopolization of
the market. Antitrust and competition laws also prohibit other types of anticompetitive behavior, such as
exclusive dealings or refusals to deal, price collusion, and predatory pricing.

When faced with antitrust issues, large companies operating in multiple markets need to satisfy regulators
from several countries simultaneously. Competing firms may also rely on regulation as part of a business
strategy for catching up to their rivals.

For analysts, it is important to evaluate whether an announced merger or acquisition would be blocked by
regulators on antitrust grounds. In cases of international mergers, the companies involved may need to
satisfy a range of regulators across multiple jurisdictions.


Classification of Regulations and Regulators

LOS: Describe classifications of regulations and regulators.

LOS: Describe uses of self-regulation in financial markets.

Regulations may be imposed by governmental bodies or, on a voluntary basis, by nongovernmental groups.
When instituted by governments, regulations may be monitored by government departments, government
agencies, or independent regulators that have legal authority to enforce them. One such regulator is the US
Securities and Exchange Commission (SEC).

Regulators develop a set of rules and standards of conduct for a target industry and can affect individual
businesses and the overall economy. Regulations can be classified as:

y Statutes: laws enacted by legislative bodies
y Administrative regulations: rules issued by government agencies and other regulators
y Judicial law: interpretations and rulings issued by courts

Exhibit 1 Classification of regulators

Government (sanctioned)                        Departments and agencies    Independent regulators
Empowered by statutes                          Yes                         Yes
Government agency status                       Yes                         No
Funded by government                           Yes                         No
Subject to political influence                 Yes                         Limited
Examples in the US                             IRS, SEC                    PCAOB

Industry (self-regulated)                      Self-regulatory bodies      Self-regulating organizations (SROs)
Private organization that receives
authority from its members to impose
membership requirements and expel members      Yes                         Yes
Authority and enforcement power from a
government body or agency                      No                          Yes
Funded by government                           No                          No
Subject to pressure from members               Yes                         Yes
Examples in the US                             Industry associations       FINRA, NYSE

While most regulations are developed and enforced by state-backed government departments and
agencies, independent regulators can also make regulations within their powers and objectives. In contrast
to those bodies, industry self-regulatory bodies are private organizations that both represent and regulate
their members. Although such organizations are independent of the government and, to an extent, isolated
from political pressure, they may be subject to pressure from their members.

Industry self-regulatory bodies are used particularly in the securities industry and derive authority from their
members, who agree to comply with enforcement of the organization’s rules and standards. This authority
does not have the force of law, but industry self-regulatory bodies do have the power to exclude or expel
parties from being members. To ensure minimum standards are maintained, certain entry requirements,
such as training or ethical standards, may also be imposed.

Another type of self-regulating body is a self-regulating organization (SRO). SROs differ from industry
self-regulatory bodies as they are given recognition and authority, including enforcement power, by a
government body or agency. However, SROs are funded independently, not by the government. The
Financial Industry Regulatory Authority (FINRA) is an example of an SRO. FINRA carries out some regulatory responsibilities on behalf of the US SEC, a government agency, and thus has the authority to enforce rules and securities laws.

Regulatory Interdependencies

LOS: Describe regulatory interdependencies and their effects.

The relationship between regulators and the entities they regulate can be multifaceted. Entities operating under a regulatory body may view new regulations negatively or oppose the body's current rules, yet evidence has shown that regulation often ends up serving the interests of the regulated entities themselves. This phenomenon, called regulatory capture, may take the form of limiting the number of competitors in the industry, imposing quality standards, or promoting price controls.

Regulatory competition refers to regulators competing to provide an environment that will attract certain
entities as members. Those entities, in turn, can exploit this competition through regulatory arbitrage, that is, capitalizing on loopholes in regulatory systems to circumvent unfavorable regulations. This
is seen when companies, rather than change certain behaviors, relocate to jurisdictions where regulators
allow those behaviors.


Exhibit 2 Regulatory competition and regulatory arbitrage

Regulatory competition: countries competing for firms (eg, Country A vs Country B) provide more favorable regulations regarding:

y Taxation
y Corporate laws (eg, bankruptcy)
y Disclosure requirements
y Labor laws
y Environmental laws

Regulatory arbitrage: firms seeking favorable regimes move from more regulated to less regulated jurisdictions.

While different countries may have common regulatory issues, the perspectives of regulators in those
countries—or the trade-offs faced when addressing specific issues—may vary. This can lead to differences
in regulatory treatments that encourage companies to pursue regulatory arbitrage. For example, in the
aftermath of the global financial crisis of 2008, many European and Asian countries were slower than the
United States to respond and implement derivatives reforms. As a result, there were concerns that US markets would suffer if regulation of US-based financial institutions proved unduly restrictive relative to regulation elsewhere.

There have been similar problems in coordinating efforts to deal with global warming and pollution on
a global scale. Since regulations limiting greenhouse gas emissions are not applied consistently by all
countries, polluters have the option to simply relocate to developing countries where the regulations are not
as restrictive.

Further, there can sometimes be conflicts between different regulatory bodies within the same country,
leading to a regulatory framework that seems inconsistent.

Regulatory Tools

LOS: Describe tools of regulatory intervention in markets.

When a jurisdiction's regulations and policies are vague or volatile, companies may lack the confidence needed to operate there. The ideal regulatory environment is one in which market participants can reasonably rely on a stable set of rules. Governments use several regulatory tools to help maintain such an environment as they regulate industries within their jurisdictions. The tools available to authorities include:

y Taxes and subsidies used as price mechanisms: Taxes are imposed to discourage certain behaviors
(eg, taxes on cigarettes to deter smoking), while subsidies are given to encourage other behaviors (eg,
subsidies for farmers to increase domestic production of certain crops).
y Regulatory mandates and restrictions on behaviors: Governments can mandate certain activities (eg,
minimum capital requirements for banks) while restricting others (eg, insider trading).
y Provision of public goods and financing for private projects: Governments can provide public goods
(eg, national defense, transportation infrastructure) or provide loans to individuals and businesses for specific activities that the government wants to promote. The extent of government provision of public
goods and financing for private projects depends on the structure of the government, its philosophy,
and the country’s GDP.

Generally, it is feasible to consider more than one regulatory approach in a specific situation. For example,
both regulator-level and corporate-level restrictions could be used to restrict corporate insider trading. A
regulatory action might require insiders to disclose trades, while a corporate action might impose a blackout
period when insiders are banned from trading their company’s stock. The appropriate action to take would
depend on the facts, circumstances, and specific context of the case.

Effective regulation requires that regulators have the ability to impose sanctions on violators, either at the
corporate or individual level. Company sanctions are appropriate when a company has caused harm to
others, but in a case of accounting fraud, sanctions on the company would entail imposing sanctions on its
shareholders, who themselves are victims of the wrongdoing. In that case, a more compelling argument
could be made for penalizing the individual perpetrators.

However, it is usually difficult to prosecute or achieve settlements with individual violators, for the
following reasons:

y It is difficult to identify the individuals who were directly at fault.
y Individuals have strong incentives to fight in order to protect their reputation and livelihood.
y Individuals are usually able to fight using corporate resources due to indemnification provisions in their
employment contracts.

Cost–Benefit Analysis

LOS: Describe benefits and costs of regulation.

The benefits of regulation are generally clear. The cost of regulation, however, can be more difficult to
assess, which makes the total impact of regulation less clear.

Regulatory burden refers to the costs that regulation imposes on a regulated entity, sometimes described as the private cost of regulation or government burden. The net regulatory burden is the private costs associated with the regulation less the private benefits of that regulation.
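As a simple illustration, using hypothetical figures, suppose a new disclosure rule costs a firm $5 million per year in compliance staff and reporting systems but also delivers $2 million per year of private benefits, such as a lower cost of capital from improved investor confidence. The firm's net regulatory burden is then:

Net regulatory burden = Private costs − Private benefits = $5 million − $2 million = $3 million per year

Note that this figure reflects only the position of the regulated entity; benefits to other parties, such as investors, belong in the broader cost-benefit analysis discussed below.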

When conducting cost-benefit analysis, regulators should consider the direct and indirect costs of
regulation. An example of direct costs is the cost of hiring compliance attorneys, while indirect costs refer
to the costs of economic decisions and behavioral changes resulting from regulation. Further, there may
be "unintended" costs associated with regulation, whether those costs are direct—such as the need for
more compliance lawyers than originally anticipated—or indirect. It is important for regulators to assess the
unintended consequences of potential regulations as they can result in high unanticipated costs.

It is easier to assess the costs and benefits of regulations on a retrospective basis than on a prospective
basis (ie, looking forward). An after-the-fact analysis allows a better-informed comparison of items of
interest before and after regulation occurred. However, in some instances, trial or pilot periods are used to
test proposed regulations and analyze the potential impacts.

Ideally, regulatory judgments should reflect economic principles and full consideration of the economic costs
and benefits, rather than the preferences of current decision makers.


Analysis of Regulation

LOS: Describe the considerations when evaluating the effects of regulation on an industry.

There are general and specific considerations that an analyst or investor should take into account when
evaluating the effects of a specific regulation on a particular industry or company. Analysts must understand
how regulation affects companies and industries currently, but they should also understand and anticipate
the impact of proposed new or changing regulations on the future prospects for companies and industries.

Exhibit 3 General framework for analysis of proposed regulation

Step                    What to assess

Likelihood of change    y Regulator's intentions and analytical framework
                        y Engagement of firms and political pressure

Impact on revenues      y Price limits, bans on products, labels (eg, tobacco)
                        y Transparency requirements (eg, breakdown of bills)
                        y Limits on ROA (natural monopolies)

Impact on costs         y Higher operating expenditures (eg, new technologies)
                        y Information disclosure, data protection, inspections
                        y Flexibility to pass on costs to consumers

Business risk           y Risk of fines, bans on activities, indemnifications

Areas of focus can be divided into:

y The likelihood of regulatory change: This requires understanding the regulator’s intentions, the
cost–benefit analysis framework used by the regulator, and the extent of engagement with regulated
companies. This assessment must also consider both the effect of public or political pressure on the
implementation of a regulation and how those pressures may change the probability of implementation.
y The impact of regulatory change: This incorporates the analyst's or investor's opinion regarding how the regulation will impact revenues, costs, and overall business risk (a simple numerical illustration follows this list). Regulators may introduce price
limits, tariffs, rents, or fees to protect consumers. They may also impose bans or require more detailed
product descriptions that can warn consumers before purchases are made. These warnings are
common on food or tobacco product labeling. Analysts should evaluate whether companies may find
alternative ways of generating revenues, helping to offset the negative impact of the regulation.
y The costs associated with complying with the regulation: These can come in the form of operational or capital costs for companies. Operational costs may include adding features to products to meet safety requirements, while capital costs may include investment in new equipment needed for compliance. If it is not possible to quantify specific additional costs, analysts will need to consider the
potential reduced flexibility of the company’s operations. Analysts must also understand the company’s
competitive position in the industry and evaluate the degree to which higher costs can be passed on
to customers.
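To illustrate the revenue and cost considerations above, using hypothetical figures, suppose a proposed rule caps certain fees, which an analyst expects will reduce a company's $400 million of annual revenue by 5%, and adds $10 million per year of compliance costs, of which only $4 million can be passed on to customers. The estimated annual impact on operating income is:

Impact on operating income = −(0.05 × $400 million) − ($10 million − $4 million) = −$26 million

If the regulation has not yet been adopted, the analyst can weight this impact by the estimated likelihood of the regulatory change before revising forecasts.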


Other business risks are also relevant if failure to comply with regulations results in fines, customer
compensation, or bans on certain activities. These risks can be substantial yet difficult to forecast. Analysts
should account for such risks by either attempting to assign probabilities to them or incorporating them into
the discount rate used to value the company.
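For example, using hypothetical figures, suppose an analyst estimates a 20% probability that a company will incur a $100 million fine in the coming year for noncompliance. The probability-weighted expected cost is:

Expected regulatory cost = 0.20 × $100 million = $20 million

which can be deducted from the forecast cash flows for that year. Alternatively, if the timing and size of potential penalties are too uncertain to model directly, the analyst might add a premium of, say, 50 basis points to the discount rate, lowering the estimated value of the company to reflect the elevated business risk.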

