Compiled Notes: MScFE 610 Econometrics
Module 1
Unit 1: Introduction
Econometrics is the study of statistical methods that are able to extract useful information from
economic data which, unlike most statistical data, is not experimental in origin. This makes
econometrics different from classical statistics. We will delve into these differences in Module 2.
The first step of any econometric analysis is obtaining a strong understanding of the probabilistic
properties of one’s data. For this you need a strong statistical analysis package.
We will use the powerful and open source, community-based statistical programming language, R.
In this course, we will not be exploring R in depth, but will rather introduce further features of R as
we go along. You will find that this module is self-contained and includes everything you need to
replicate each step performed in R.
For help with R installation and basic tutorials, there is a large set of online resources you can try, starting with the main sources and repositories: the Comprehensive R Archive Network (https://cran.r-project.org/) and Quick-R (https://www.statmethods.net/). If you need help with a package or function in R, the R documentation page (https://www.rdocumentation.org/) is very helpful: it has a standard help-file style document for every package and method.
Unit 2: Basics of R
You can download the basic R program from CRAN (the Comprehensive R Archive Network), available here: https://cran.r-project.org. CRAN also hosts all the packages contributed by the network of contributors.
To get started, download and run through the basic installation on your system and then launch the graphical user interface (GUI).
This basic GUI is not very informative, especially if one is new to R or is used to programs like EViews or MATLAB.
Note: One can also use R from the command prompt of the operating system, which is useful for batch processing. In this module, we will stick to a GUI presentation as it is simpler and more intuitive, and thus faster to learn.
1. Obtaining packages
We will use the basic GUI for installing the necessary packages for this course.
2. Choosing a CRAN mirror
chooseCRANmirror()
To make the download as fast as possible, choose the mirror nearest to you.
3. Installing packages
Next, we will install all the packages we will be using throughout this course, by running the
following commands:
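The exact package list depends on the rest of the course materials. A minimal sketch of the install step is below; the package names are illustrative assumptions, not the official course list:
# Install packages from the chosen CRAN mirror.
# Package names here are illustrative assumptions, not the official course list.
install.packages(c("tidyverse", "moments"))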
If you want to use a function or method in a package, you must place it in the current “global
environment” so that R is able to call the functions in the package. To do this, use the command:
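For example, to load a package into your session (a sketch using ggplot2; substitute whichever package you need):
library(ggplot2)   # attaches the package so its functions can be called in this session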
RStudio is also highly customizable. Here is a screen capture of RStudio set to a dark grey theme (useful to lower eye strain if you code for many hours at a time).
Figure 1
First is the “terminal” where you can type in R commands directly (Figure 2):
Figure 2
Figure 3 shows the "script writer", where you can write a sequence of R commands or, eventually, R programs, to be executed as a whole. In this example, the script 1st_example.R, when executed, draws 100 observations from a standard normal distribution and plots the draws against the index of the observation:
Figure 3
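The attached script itself is not reproduced here; a minimal sketch that matches the description (100 standard normal draws plotted against their index) would be:
# 1st_example.R (sketch): draw 100 standard normal observations and plot them
x <- rnorm(100, mean = 0, sd = 1)   # 100 draws from N(0, 1)
plot(x, xlab = "Observation index", ylab = "x")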
In Figure 4, the plot output of these commands is displayed on the bottom right:
Figure 4
Whatever variable or dataset you generate will be shown at the top right (Figure 5):
Figure 5
In this example, we created 100 observations of a variable x that has a standard normal distribution. This means that each element of the 100-value vector is a random draw from a normal distribution with expected value 0 and standard deviation 1. In the rest of the course, we will consider various ways of visualizing and interpreting the meaning of each graph/figure.
Getting used to the structure underlying the R environment will help you be efficient and
productive with your analysis.
For instance, our data import step and mean calculation could be written up in a script and executed all at once by clicking on "Source", or executed line by line by using the key combination Ctrl+Enter.
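As a hedged illustration, such a script could look like the following (the file name prices.csv and the column name MSFT are assumptions for this sketch, not files provided with these notes):
# import_and_mean.R (sketch): import a CSV of prices and compute a mean
prices <- read.csv("prices.csv")    # assumed file name
mean_msft <- mean(prices$MSFT)      # assumed column name
print(mean_msft)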
You will be writing a lot of code for different applications in this course. Therefore, it is best
practice to keep to a meticulous commenting regime. Every step needs an explanation of why it’s
done, otherwise you will waste time later in trying to reconstruct your thoughts. There are often
many different ways of coding the same function, so keeping track of how you have coded
functions is important.
In R you can use the # character to make comments. R ignores everything after # in a line of code
when it runs blocks of code.
As an example, you could begin your work with something like this, allowing anyone to read your
code as easily as you do.
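A minimal sketch of such a commented opening (the details are illustrative):
# Purpose : Module 1 exploratory analysis of daily equity returns
# Author  : <your name>
# Date    : <date>
# Inputs  : prices.csv (daily closing prices) -- assumed file name
# Output  : time plots and summary statistics
# Notes   : log returns are computed as diff(log(price))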
Next, we will turn to some systematic exploratory data analysis to learn the characteristics of our data. You can get misleading results if you apply a method to data it was not designed for. As we will be developing a large array of methods in this course, we need to understand the characteristics of our data to help us select the correct models.
To know whether or not to invest in a portfolio of two assets, you would need to understand not only how they are likely to move individually, but also whether they are likely to share the same stochastic trend. If two stocks are anything less than perfectly correlated, there will always exist a way to reduce portfolio risk to below that of either of the component assets.
The financial engineer works with uncertain variables that evolve over time. For example, the returns on different banks may behave differently as a group during "normal times" from how they would behave during a housing boom or a banking sector crisis.
The properties we will impose on this time-varying behavior are captured statistically by the joint
cumulative probability distribution, and it is the features of this mathematical object that we will
study in this module. We will apply the joint cumulative probability distribution to some real-
world data to extract some useful information. This information will tell us which of the models we
develop in this course will be applicable to a certain set of data.
Time plots
The first step in exploratory data analysis is a visualization of the variable of interest over time.
This is called a “time plot”.
We will use the ggplot2 package, which forms part of the tidyverse. The commands (taken from the R script attached to this module) are:
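The attached script is not reproduced here; a sketch consistent with the description follows. The data frame name msft_df and its columns date and MSFT are assumptions for this sketch:
library(ggplot2)
# msft_df is assumed to hold a Date column 'date' and a closing-price column 'MSFT'
p <- ggplot(msft_df, aes(x = date, y = MSFT)) +
  geom_line(color = "darkblue")        # line plot of price against date
p                                      # display the plot
ggsave("msft_price.png", plot = p)     # save the graph as a PNG file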
The main call establishes the dataset involved and can recognize dated data like ours without any
further human input. This is one of the reasons why ggplot packages are great options.
For every additional element you wish to add to the graph, you add a + geom_XXX call. In our
example we will use:
+ geom_line(…)
This adds an "aesthetic", i.e., a visual mapping, that maps the x variable date to the y variable MSFT. Since the calling function is geom_line, the result will be a line plot. It would have been a scatter plot if the calling function had been geom_point.
color = “darkblue”
This sets the color of the line to a color in ggplot called “darkblue”.
The ggsave command saves a png file of the graph you generated.
Compare the time plot of the price of the series to that of its implied log returns by viewing the
two graphs you just generated:
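If you still need to generate the log-return series yourself, a sketch (assuming the same msft_df data frame as above) is:
# Log returns: r_t = ln(P_t / P_{t-1})
library(ggplot2)
msft_ret <- data.frame(
  date   = msft_df$date[-1],
  logret = diff(log(msft_df$MSFT))
)
ggplot(msft_ret, aes(x = date, y = logret)) + geom_line(color = "darkblue")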
We immediately notice a stark difference: the closing price seems to meander upward while the
log returns fluctuate around a visually constant mean. The closing price exhibits properties of non-
stationarity, while we would call the log returns stationary. Module 3: Univariate Time Series
Models will go into more detail about what the difference is. For now, we will focus on the
stationary log return series to introduce the concepts of exploratory data analysis in the static
setting. All of these methods have extensions to the non-stationary world that you should explore
as necessary in your career.
Of key interest is how two or more series move together over time. A first step we can take is to
plot line graphs of each and compare obvious trends.
Let’s consider the time plots of the four assets in the data set for this module:
There are some similarities and some distinct aspects to these time plots. We observe that:
a) They all have seemingly constant means over the sample – there are clearly no long-term
deviations from a mean just above zero.
b) The variance of each series shows periods of high volatility followed by periods of low
volatility.
c) There are sometimes clusters of volatility across series, most notably the universal
increase in volatility somewhere between 2008 and 2010, which corresponds to the
global financial crisis.
These features are examples of the well-documented stylized facts of asset returns (Cont, 2001):
a) Almost no autocorrelation
Asset return levels tend to be uncorrelated with their own past, except in very low frequency data.
b) Heavier tails than the normal distribution (conditional heavy tails as well)
Extreme returns occur more often than a normal distribution would predict, and we observe large negative returns more often than large positive returns.
c) Aggregation to normality
The lower the frequency of observation (e.g. monthly vs. daily), the more closely the returns resemble a normally distributed variable.
d) Volatility clustering
The absolute value of a return series tends to be autocorrelated. This would not happen with an independent, normally distributed variable.
e) Leverage effect
There tends to be a negative correlation between the volatility and returns of an asset.
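Two of these stylized facts are easy to check in R. A sketch, assuming ret is a numeric vector of log returns:
# (a) the returns themselves show almost no autocorrelation
acf(ret, main = "ACF of returns")
# (d) absolute returns are autocorrelated (volatility clustering)
acf(abs(ret), main = "ACF of absolute returns")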
Unit 4: Distributions
The normal distribution is a bell-shaped continuous probability density function (PDF) defined by two parameters: the mean and the standard deviation.
The log-normal distribution, which is often assumed for the gross returns on many assets, is a continuous distribution in which the logarithm of the variable – e.g. the log return – has a normal distribution.
The uniform distribution is a PDF in which all possible outcomes – e.g. returns – are equally likely to happen.
The following three distributions are especially important because they are used in hypothesis
testing for several econometric procedures.
The t distribution is symmetrical, bell-shaped, and similar to the standard normal curve. The higher the degrees of freedom, the more closely the t distribution resembles the standard normal distribution with mean zero and standard deviation of one.
The chi-squared distribution is the distribution of the sum of squared independent standard normal deviates, where the degrees of freedom of the distribution equal the number of standard normal deviates being summed.
F distribution
If U and V are independent chi-squared random variables with r1 and r2 degrees of freedom respectively, then
F = (U / r1) / (V / r2)
follows an F distribution with r1 numerator degrees of freedom and r2 denominator degrees of freedom. A partial table (upper 10% critical values) is below:
df1:    1        2        3        4        5        6        7        8        9
df2=1:  39.86346 49.50000 53.59324 55.83296 57.24008 58.20442 58.90595 59.43898 59.85759
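R provides quantile functions for all three distributions, so you do not need printed tables. A short sketch (the first entry of the table above is the upper 10% point of F(1, 1)):
qt(0.975, df = 30)         # two-sided 5% critical value of the t distribution, 30 df
qchisq(0.95, df = 10)      # upper 5% critical value of the chi-squared distribution, 10 df
qf(0.90, df1 = 1, df2 = 1) # 39.86346, matching the first entry of the F table above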
We want to explore skewness and symmetry. The example below is for the monthly simple returns of a mid-cap company. Are the returns normally distributed? The small sample size is for instructional purposes; skewness is hard to judge in smaller samples.
Skewness is an indicator used in distribution analysis as a sign of asymmetry and deviation from a
normal distribution. The equation for skewness (called the third moment around the mean) is
defined as:
Skewness = [n / ((n − 1)(n − 2))] × Σ((x_i − x̄) / s)³
Interpretation:
• Skewness > 0 – Right-skewed distribution: most values are concentrated to the left of the mean, with extreme values to the right.
• Skewness < 0 – Left-skewed distribution: most values are concentrated to the right of the mean, with extreme values to the left.
• Skewness = 0 – Mean = median; the distribution is symmetrical around the mean.
According to David P. Doane and Lori E. Seward, Applied Statistics in Business and Economics, 3e (McGraw-Hill, 2011: 155): "If the sample size is 30 observations, then 90% of the time the skewness stat will lie between -0.673 and 0.673 for normality to hold. If the sample size is 60 observations, then 90% of the time the skewness stat will lie between -0.496 and 0.496 for normality. If the sample size is 150 observations, then 90% of the time the skewness stat will lie between -0.322 and 0.322 to conclude normality." In the last example above, we reject normality.
Normality (kurtosis)
This is an indicator used as a sign of flatness or "peakedness" of a distribution. The formula for
kurtosis is:
k = [ Σᵢ (Xᵢ − X̄)⁴ / n ] / [ Σᵢ (Xᵢ − X̄)² / n ]²
Interpretation:
• Kurtosis > 3 – Leptokurtic distribution, sharper than a normal distribution, with values concentrated around the mean and thicker tails. This means a high probability of extreme values.
• Kurtosis < 3 – Platykurtic distribution, flatter than a normal distribution with a wider peak. The probability of extreme values is lower than for a normal distribution, and the values are spread more widely around the mean.
• Kurtosis = 3 – Mesokurtic distribution, e.g. the normal distribution.
You should also examine a graph (histogram) of the data and consider performing other tests for
normality, such as the Shapiro-Wilk test.
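A sketch of these checks in R, using the formulas above and base R's Shapiro-Wilk test (ret is assumed to be a numeric vector of returns):
# Sample skewness, matching the (n-1)(n-2) formula above
skew <- function(x) {
  n <- length(x)
  (n / ((n - 1) * (n - 2))) * sum(((x - mean(x)) / sd(x))^3)
}
# Kurtosis, matching the moment-ratio formula above (normal distribution ~ 3)
kurt <- function(x) {
  n <- length(x)
  (sum((x - mean(x))^4) / n) / (sum((x - mean(x))^2) / n)^2
}
skew(ret)
kurt(ret)
hist(ret, breaks = 30)   # visual check of the shape of the distribution
shapiro.test(ret)        # formal normality test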
Finance tends to use the “time-consistent” rate of return, also called the continuously
compounded rate of return. This will be used in the econometric analysis for the duration of the
course.
r_t = ln(P_t / P_{t−1})
where P_t is the asset price at time t and P_{t−1} is the price in the previous period.
These rates are often expressed as percentages, so the returns might be read as 10.54% for 30 Sep
then 5.72% for 29 Sep and 12.52% for 28 Sep.
These returns are called time consistent because, when summed, they equal the return from the start of the analysis to the end. Start at 27 Sep, when the price was $15, and end at 30 Sep, when the price was $20. The rate of return was
ln(20/15) = 28.77%
Alternatively, add all the returns from the table above: 10.54% + 5.72% + 12.52% = 28.77% (up to rounding).
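A quick check of this additivity in R, using the prices and returns from the text:
log(20 / 15)                      # 0.2877, i.e. 28.77%
sum(c(0.1054, 0.0572, 0.1252))    # 0.2878, equal up to rounding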
Exercise:
For the following prices go into Excel and find the returns as percentages.
Step 1
P – Google Price
r – Periodic return
Step 2
Step 3
= 0.0126
Note: This is just the average of the squared returns for the period under analysis.
In contrast, the following graph (again with a trend line) suggests a non-linear relationship between x and y (an exponential relationship between x and y):
ln(y) = a + b·x
Exercises:
1 Can the following equation be expressed linearly? If so, how?
2 For the data below, what kind of functional relationship exists between x and y, and how might this be transformed using the transformations discussed in this section?
Dummy variables
A dummy variable (binary independent variable) assumes a value of 1 if some criterion is true,
otherwise it will be zero. For example:
If D=1 (female) the CEO can expect to make $25 000 less each year.
Categorical variables
Related to dummies are categorical variables. These noncontinuous variables can take on several
values, some of which may be meaningful while others may not.
1 Yellow
2 Blue
3 Green
4 Red
Preferring yellow to blue does not demonstrate any intensity in the variable being measured. That is, green is not three times as intense as yellow. Simply running this variable in a regression is not meaningful. This type of variable can be made meaningful by creating dummy variables.
For example:
D3 = 1 if yellow is favorite color
D3 = 0 otherwise.
In this case a CEO whose favorite color is yellow earns $40 000 more per year than one whose
favorite color is something else.
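A hedged sketch of how such a dummy enters a regression in R; the data are simulated for illustration only, with the $40,000 premium imposed by construction:
set.seed(1)
n <- 200
yellow <- rbinom(n, 1, 0.3)                           # D3 = 1 if favorite color is yellow
salary <- 150000 + 40000 * yellow + rnorm(n, sd = 20000)
fit <- lm(salary ~ yellow)
summary(fit)   # the coefficient on 'yellow' should be close to 40,000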
A categorical variable may express intensity, and might be included in a regression as a categorical variable without resorting to the use of a dummy. For example:
C is average household income
1 $0 to $20 000
2 $20 001 to $40 000
3 $40 001 to $60 000
4 $60 001 to $80 000
5 More than $80 000
In this case, intensity is represented by the data. The data serve as a proxy for an unobservable variable: continuous income.
For example, in a log-log regression such as ln(y) = a + b·ln(x), b is the percentage change in y resulting from a one percent change in x. For example, if b = 5, then a one percent increase in x results in a 5% increase in y.
Exercises:
1 The following equation evaluating business majors’ starting salary contains two dummy
variables D1 and D2.
Finance models
We are largely considering linear relationships in this course. As a consequence, we wish to express each model as a linear function of its independent variable(s), x.
Linearizing models
1 It may be that you are trying to model a nonlinear relationship between y and an independent variable x, such as the log-log (power) model ln(Y) = a + b·ln(X). This type of model can be estimated using linear regression, which we shall talk about later. It is important to note that if we change X, the change in Y is as follows:
dY = b(Y/X) dX [it depends on the values chosen for X and Y – e.g. mean values]
2 The relationship between X and Y may instead be exponential:
Y = e^(a + bX)
which can be linearized by taking logs: ln(Y) = a + bX. Bear in mind the relationship is not linear in X and Y itself but is instead illustrated by the graph below. Another form, Y = a + b·ln(X), is already linear in the parameters once X is replaced by ln(X).
3 Suppose that X is the independent variable and Y the dependent variable, but a strict linear relationship does not exist. Instead:
Y = a + bX + cX²  [polynomial]
This can still be estimated using the linear regression techniques of this course, but we treat X² as a new variable Z = X².
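A sketch of how these transformed models are estimated with ordinary linear regression in R (x and y are assumed to be numeric vectors):
lm(log(y) ~ log(x))   # power / log-log model: ln(Y) = a + b ln(X)
lm(log(y) ~ x)        # exponential model:     ln(Y) = a + b X
lm(y ~ log(x))        # semi-log model:        Y = a + b ln(X)
lm(y ~ x + I(x^2))    # polynomial model:      Y = a + b X + c X^2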
Exercise:
Y = (a + bX)⁻¹
The alpha coefficient (αᵢ) is the constant (intercept) in the security market line of the CAPM. The alpha coefficient indicates how an investment has performed after accounting for the risk it involved.
If markets are efficient, then α = 0. If α < 0, the security has earned too little for its risk. If α > 0, the investment has provided higher returns than expected for its level of risk.
According to Fama and French, two types of stocks outperform the market:
i. small caps
ii. value stocks (have a high book-value-to-price ratio)
They then added two factors to CAPM to reflect a portfolio's exposure to these two classes:
SMB (Small Minus Big) is the average return on the three small portfolios minus the average return on the three big portfolios:
SMB = 1/3 (Small Value + Small Neutral + Small Growth) − 1/3 (Big Value + Big Neutral + Big Growth)
HML (High Minus Low) is the average return on the two value portfolios minus the average return on the two growth portfolios:
HML = 1/2 (Small Value + Big Value) − 1/2 (Small Growth + Big Growth)
The daily and weekly Fama-French factors can be found at:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
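A hedged sketch of estimating the three-factor model in R, assuming you have downloaded the factor file and built a data frame ff with columns ExRet (the asset's excess return), Mkt.RF, SMB and HML (these column names are assumptions that follow Ken French's file layout after import):
fit_ff <- lm(ExRet ~ Mkt.RF + SMB + HML, data = ff)
summary(fit_ff)   # intercept = alpha; slope coefficients = factor loadings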
The APT produces a rate of return that can be checked against actual return to see if the asset is
correctly priced.
Assumptions:
• Security returns are a function of a number of factors.
• There are sufficient securities for firm-specific (idiosyncratic) risk to be diversified away.
• Well-functioning security markets do not allow for persistent arbitrage opportunities.
• Risky asset returns are said to follow a factor intensity structure if they can be expressed as:
r_i = a_i + B_i·F + e_i
where:
B_i is called the factor loading, or the change in return resulting from a unit change in the factor, and e_i is the asset's idiosyncratic shock.
Idiosyncratic shocks are assumed to be uncorrelated across assets and uncorrelated with the
factors.
The APT states that if asset returns follow a factor structure, then the following relation exists between expected returns and the factor sensitivities:
E(r_i) = r_f + B_i·λ
where r_f is the risk-free rate and λ is the risk premium associated with the factor.
The logic behind the econometrics associated with the yield curve can be understood by looking at
investor expectations about what future yields will be.
Investors expect (in the expected-value sense) the return on a one-year bond maturing in two years to be E_t(r_{1,t+1}), the short-term rate one year from today; a one-year bond maturing in three years has expected yield E_t(r_{1,t+2}), etc. The yield on an n-period bond is equal to the average of the expected short-term rates plus a risk/liquidity premium, P, or:
r_{n,t} = (1/n) [ r_{1,t} + E_t(r_{1,t+1}) + … + E_t(r_{1,t+n−1}) ] + P_n
The expected one-year returns are all formed at time t. Assume the investor bases his/her expectations on the current one-year rate, r_{1,t}. The estimating equation for the n-year (long-term) bond is then r_{n,t} = a + b·r_{1,t} [the estimating equation for the yield curve].
Note that the theory (if it holds) suggests that b = 1. This can be tested using the t-test from OLS.
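A sketch of this test in R (r_n and r_1 are assumed to be vectors of long-term and short-term yields):
fit_yc <- lm(r_n ~ r_1)
b_hat  <- coef(fit_yc)["r_1"]
se_b   <- coef(summary(fit_yc))["r_1", "Std. Error"]
t_stat <- (b_hat - 1) / se_b   # t-statistic for H0: b = 1
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit_yc), lower.tail = FALSE)
c(b = unname(b_hat), t = unname(t_stat), p = unname(p_val))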
It is therefore of crucial importance to detect and to predict potential financial stress. There are a
host of early warning systems regarding financial or credit crises.
Y = bX + v
and
W = cX + u
where Y is a crisis indicator, X is an observable potential crisis cause, and W is a latent variable thought to represent the severity of the crisis. The disturbance terms u and v are explained later.
Macroeconomic variables that can be used as crisis indicators include:
• Credit growth
• Property price growth
• Credit to GDP gap
• Equity price growth
Sharpe ratio
The Sharpe ratio is frequently used to indicate the return provided relative to the investment risk taken. When comparing two assets, the one with the higher Sharpe ratio provides a superior return for the same risk, σ.
Suppose you have invested in a mutual fund and want to find the Sharpe ratio for the past 11
years. You calculate the year-over-year returns from your investment as follows:
Subtract the risk-free return from the actual return by using the formula command in Excel. This is
your Excess Return from d1:d11 in the chart below:
Then use the AVERAGE command in statistical formulas of Excel to get the average excess return
for the period. In this case 0.0538.
Next use the standard deviation command for the excess returns:
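The same calculation can be done in R; a sketch, where excess is assumed to be the vector of yearly excess returns from the table:
sharpe <- mean(excess) / sd(excess)   # average excess return divided by its standard deviation
sharpe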
P_t = E_t { D_{t+1}/(1 + δ) + D_{t+2}/(1 + δ)² + … }
where P_t is the price of the asset at time t, D_{t+s} is the dividend paid in period t + s, δ is the discount rate, and E_t denotes the expectation formed at time t.
If D is assumed to be constant, the equation reduces to a perpetuity and takes the form:
P_t = D_t / δ
Assumptions:
• The returns from the asset/portfolio are normally distributed. This allows the use of the two parameters of the normal distribution: the mean and the standard deviation.
• We choose a level of confidence. If we want to be 95% sure the worst case is not going to happen, choose a z-value of -1.645. If we want to be 99% sure the worst case is not going to happen, choose a z-value of -2.33. The z-value represents the number of standard deviations away from the mean.
• The returns are assumed to be serially independent so no prior return should influence
the current return.
Example:
Choose a 95% confidence level, meaning we wish to have 5% of the observations in the left-hand tail of the normal distribution. That means the cut-off lies 1.645 standard deviations below the mean (assumed to be zero for short-horizon return data). The following are data for a security investment:
The VaR at the 95% confidence level is 1.645 × 0.0199, or 0.032736. The portfolio has a market value of £10 million, so the VaR is approximately £327,355 (0.032736 × £10 million).
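The same calculation in R, using the figures from the example (σ = 0.0199, portfolio value £10 million) and a 95% confidence level:
z     <- qnorm(0.05)          # -1.645 (more precisely, -1.6449)
sigma <- 0.0199               # standard deviation of returns from the example
value <- 10e6                 # portfolio value in pounds
VaR   <- -z * sigma * value   # about 327,326; the rounded z = 1.645 gives 327,355
VaR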
Bibliography
Berlinger, E. et al. Mastering R for Quantitative Finance. Packt Publishing.
Cont, R. (2001). 'Empirical properties of asset returns: Stylized facts and statistical issues', Quantitative Finance, 1(2), pp. 223–236. doi: 10.1080/713665670.
de Prado, M. L. (2018). Advances in Financial Machine Learning. Wiley.
Doane, D. P. and Seward, L. E. (2011). Applied Statistics in Business and Economics, 3rd edition. McGraw-Hill, p. 155.
Halls-Moore, M. L. (2017). Advanced Algorithmic Trading. Part III: Time Series Analysis.
Jansen, S. (2018). Hands-On Machine Learning for Algorithmic Trading. Packt Publishing.
Ruppert, D. and Matteson, D. S. (2015). Statistics and Data Analysis for Financial Engineering, with R Examples. 2nd edition. New York: Springer Texts in Statistics.
Scott, M. et al. (2013). Financial Risk Modelling and Portfolio Optimization with R. Wiley.