
Compiled Notes: Mscfe 610 Econometrics

Uploaded by

Rahul Devtare


Compiled Notes
Module 1
MScFE 610
Econometrics

© 2019 - WorldQuant University – All rights reserved.


Revised: 12/30/2020
MScFE 610 Econometrics – Notes Module 1

Table of Contents

Module 1: Basic Statistics

Unit 1: Introduction

Unit 2: Basics of R

Unit 3: Using Scripts

Unit 4:

Distributions

Types of Econometric Models

Finance Topics and Econometrics

Bibliography


Module 1: Basic Statistics


Module 1 introduces Econometrics and the use of R to perform statistical analysis and estimate
econometric models. The module begins by explaining the procedure to download R from a free
online source and install packages relevant to the implementation of econometric models. The
module continues by explaining exploratory data analysis techniques relevant to the financial
engineer and by describing univariate and multivariate statistical distributions applicable to
econometrics. Several regression models showing linear or non-linear relationships between
financial variables are also described at the end of the module.


Unit 1: Introduction
Econometrics is the study of statistical methods that are able to extract useful information from
economic data which, unlike most statistical data, is not experimental in origin. This makes
econometrics different from classical statistics. We will delve into these differences in Module 2.

The first step of any econometric analysis is obtaining a strong understanding of the probabilistic
properties of one’s data. For this you need a strong statistical analysis package.

We will use the powerful and open source, community-based statistical programming language, R.

In this module, we will explore:

1. the basics of coding and statistical analysis with R;


2. the characteristics of univariate and multivariate distributions; and
3. the important exploratory data techniques relevant to the financial engineer.

In this course, we will not be exploring R in depth, but will rather introduce further features of R as
we go along. You will find that this module is self-contained and includes everything you need to
replicate each step performed in R.

For help with R installation and basic tutorials, there is a large set of online resources you can try,
starting with the main sources and repositories: the Comprehensive R Archive Network
(https://cran.r-project.org/) and Quick-R (https://www.statmethods.net/). If you need help with a
package or function in R, the R documentation site is very helpful. It has a standard help-file-type
document for every package and function (https://www.rdocumentation.org/).


Unit 2: Basics of R
You can download the basic R program from CRAN (Comprehensive R Archive Network) available
here: https://cran.r-project.org. CRAN also hosts all the packages from the network of
contributors.

To get started, download and run through the basic installation on your system and then launch
the graphical user interface (GUI). 1

The GUI should look something like this:

This basic GUI is not very informative, especially if one is new to R or used to programs like EViews
or MATLAB.

1 One can also use R from the command prompt of the operating system, which is useful for batch processing.
In this module, we will stick to a GUI presentation as it is simpler and more intuitive, and thus faster to learn.


1. Obtaining packages
We will use the basic GUI for installing the necessary packages for this course.

The following convention will be followed in this module:

• Commands to be typed in R will be presented as courier font in blue with beige background.
• R output will be presented as courier font in black with beige background.
• Everything that follows a # is an R comment, i.e. text that R ignores and that explains something
to the human user.

2. Choose a CRAN mirror


Choose a CRAN mirror site for the download. Type the following command in R:

chooseCRANmirror()

To make the download as fast as possible, choose the mirror nearest to you.

3. Installing packages
Next, we will install all the packages we will be using throughout this course, by running the
following commands:

install.packages("tidyverse") # for all modules


install.packages("tseries") # for module 3
install.packages("forecast") # for most modules
install.packages("fGarch") # for module 4
install.packages("vars") # for module 5
install.packages("evir") # for module 6
install.packages("copula") # for module 7

If you want to use a function from a package, you must first load (attach) the package in your
current R session so that R is able to find the functions in the package. To do this, use the command:

library(tidyverse) # loads the tidyverse package


Other GUI options:


There are some alternatives to the basic GUI available for you to use, such as RStudio. RStudio is an
integrated development environment (IDE) for R, with its own GUI that is very intuitive to use and
makes interactivity easy. You can download the basic version for free at https://www.rstudio.com.

RStudio is also highly customizable. Here is a screen capture of RStudio set to dark grey (useful to
lower eye strain if you code for many hours at a time).

Figure 1 illustrates the main “windows” of RStudio:

Figure 1


First is the “terminal” where you can type in R commands directly (Figure 2):

Figure 2


Figure 3 shows the “script writer”, where you can write a sequence of R commands, or, eventually, R
programs, to be executed as a whole. In this example, the script 1st_example.R, when executed,
draws 100 observations from a standard normal distribution and plots the draws against the index
of the observation:

Figure 3


In Figure 4, the plot output of these commands is displayed on the bottom right:

Figure 4


Whatever variable or dataset you generate will be shown at the top right (Figure 5):

Figure 5

In this example, we created 100 observations of a variable x that has a standard normal
distribution. This means that each element of the 100-value vector is a random draw from a
normal distribution with expected value 0 and standard deviation 1. In the rest of the course, we
will consider various ways of visualizing and interpreting the meaning of each graph/figure.
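The script itself appears only in the screenshot. A minimal sketch of what a script like 1st_example.R might contain, assuming base R only (the seed and the variable name x are illustrative choices, not taken from the original script):

```r
# Sketch of a first script: draw 100 standard normal observations and plot them.
set.seed(610)                       # assumed seed, only so the draws are reproducible
x <- rnorm(100, mean = 0, sd = 1)   # 100 independent draws from N(0, 1)
plot(x, type = "p", xlab = "index of observation", ylab = "x")
summary(x)                          # quick numerical check of the draws
```

Running the whole script at once should produce the plot in the bottom-right pane and list x in the top-right pane.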

Getting used to the structure underlying the R environment will help you be efficient and
productive with your analysis.


Unit 3: Using Scripts


So far, we have been exercising the basics of R in the command window, but this is not a good way
of doing complicated analyses that need to be replicable. In practice, you should always do your
code-writing in a script, or a set of scripts and functions as you become more proficient with
coding.

For instance, our data import step and mean calculation could be written up in a script, and all
executed at once by clicking on “source” or executed line-by-line by using the key combination
ctrl-enter.

You will be writing a lot of code for different applications in this course. Therefore, it is best
practice to keep to a meticulous commenting regime. Every step needs an explanation of why it’s
done, otherwise you will waste time later in trying to reconstruct your thoughts. There are often
many different ways of coding the same function, so keeping track of how you have coded
functions is important.

In R you can use the # character to make comments. R ignores everything after # in a line of code
when it runs blocks of code.

As an example, you could begin your work with something like this, allowing anyone to read your
code as easily as you do.
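A minimal sketch of such a commented script, with a hypothetical file name and made-up prices standing in for a real data import:

```r
# ------------------------------------------------------------------
# Purpose : import a price series and compute its mean
# Author  : (your name)
# Inputs  : prices.csv (hypothetical file with columns Date and Price)
# ------------------------------------------------------------------

# Step 1: build a small example data set in place of the file import,
#         so that this sketch runs without any external file.
prices <- data.frame(
  Date  = as.Date("2020-01-01") + 0:4,
  Price = c(100, 101, 99, 102, 103)
)

# Step 2: compute the mean price; mean() is used rather than a manual
#         sum/length so that missing values can be handled via na.rm.
mean_price <- mean(prices$Price, na.rm = TRUE)
print(mean_price)
```

Each comment records why a step is done, not just what it does, which is the habit the paragraph above recommends.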


Next, we will turn to some systematic exploratory data analysis to learn the characteristics of our
data. You can get misleading results if you apply a method to data it was not designed for. As we will
be developing a large array of methods in this course, we need to understand the characteristics of
our data to help us select the correct models.

Exploratory data analysis


The financial engineer will inevitably be interested in investing in a portfolio of assets to achieve his
or her goals. You will learn a lot about the optimal construction of such a portfolio during this
degree, so to begin with we will just keep to the basics.

For you to know whether or not to invest in a portfolio of two assets, you would need to
understand how they are likely to move individually, but also how they are likely to move
together, for example whether they share the same stochastic trend. If two stocks are anything less
than perfectly correlated, it is generally possible to combine them into a portfolio with lower risk
than either of the component assets.
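The diversification argument can be checked numerically. The sketch below uses the standard two-asset portfolio variance formula; the two volatilities and the correlation are hypothetical numbers chosen only for illustration.

```r
# Two-asset portfolio risk: sd of w * asset1 + (1 - w) * asset2
port_sd <- function(w, s1, s2, rho) {
  sqrt(w^2 * s1^2 + (1 - w)^2 * s2^2 + 2 * w * (1 - w) * rho * s1 * s2)
}

s1  <- 0.20   # hypothetical volatility of asset 1
s2  <- 0.30   # hypothetical volatility of asset 2
rho <- 0.30   # hypothetical correlation, less than 1

# weight on asset 1 that minimises portfolio variance
w_min  <- (s2^2 - rho * s1 * s2) / (s1^2 + s2^2 - 2 * rho * s1 * s2)
sd_min <- port_sd(w_min, s1, s2, rho)

c(weight_asset1 = w_min, portfolio_sd = sd_min)
```

With these inputs the minimum-risk portfolio has a lower standard deviation than either asset. For high correlations the minimising weight can lie outside [0, 1], which means the risk reduction may require short selling.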

The financial engineer works with uncertain variables that evolve over time. For example, the
returns on different banks may behave differently as a group during “normal times” to how they
would behave during a housing boom or a banking sector crisis.

The properties we will impose on this time-varying behavior are captured statistically by the joint
cumulative probability distribution, and it is the features of this mathematical object that we will
study in this module. We will apply the joint cumulative probability distribution to some real-world
data to extract some useful information. This information will tell us which of the models we
develop in this course will be applicable to a certain set of data.

Time plots
The first step in exploratory data analysis is a visualization of the variable of interest over time.
This is called a “time plot”.

We will use the ggplot2 package, which is part of the tidyverse. The commands
(taken from the R script attached to this module) are:

# Basic Time Plots:

# Load necessary packages
library(tidyverse)
library(ggplot2)   # already attached by tidyverse; kept for clarity

# FinData is assumed to be loaded already, with a Date column and the
# price and log-return columns used below.

# view time plot of the closing price of Microsoft:
ggplot(data = FinData) +
  geom_point(mapping = aes(x = Date, y = MSFT),
             color = "darkblue") +
  labs(x = "year", y = "Microsoft - Daily Closing Price")
ggsave("C:/R_illustration/Microsoft_closing_price.png")

# generate time plots of the log returns of the other assets
# (each + must end a line; a line starting with + is a syntax error in R)
ggplot(data = FinData) +
  geom_line(mapping = aes(x = Date, y = APPL_lr),
            color = "darkred") +
  labs(x = "year", y = "Apple - Log Daily Return")
ggsave("C:/R_illustration/Apple_log_returns.png")

ggplot(data = FinData) +
  geom_line(mapping = aes(x = Date, y = INTC_lr),
            color = "darkgreen") +
  labs(x = "year", y = "Intel - Log Daily Return")
ggsave("C:/R_illustration/Intel_log_returns.png")

ggplot(data = FinData) +
  geom_line(mapping = aes(x = Date, y = IBM_lr),
            color = "darkcyan") +
  labs(x = "year", y = "IBM - Log Daily Return")
ggsave("C:/R_illustration/IBM_log_returns.png")


The ggplot method works as follows:

The main call establishes the dataset involved and can recognize dated data like ours without any
further human input. This is one of the reasons why ggplot packages are great options.

For every additional element you wish to add to the graph, you add a + geom_XXX call. In our
example we will use:

+ geom_line(…)

This calls for a line graph to be created.

Its arguments mean the following:

mapping = aes(x = Date, y = MSFT)

This adds an “aesthetic” mapping, i.e. a rule for how variables are displayed, that puts Date on the
x axis and MSFT on the y axis. Since the calling function is “geom_line”, the result will be a line
plot. It would have been a scatter plot if the calling function had been “geom_point”.

color = "darkblue"

This sets the color of the line to the built-in color named “darkblue”.

Lastly, we will add another component to the figure, the labels:

+ labs(x = "year", y = "Microsoft - Daily Closing Price")

The ggsave command saves a png file of the graph you generated.


Compare the time plot of the price of the series to that of its implied log returns by viewing the
two graphs you just generated:

We immediately notice a stark difference: the closing price seems to meander upward while the
log returns fluctuate around a visually constant mean. The closing price exhibits properties of
non-stationarity, while we would call the log returns stationary. Module 3: Univariate Time Series
Models will go into more detail about what the difference is. For now, we will focus on the
stationary log return series to introduce the concepts of exploratory data analysis in the static
setting. All of these methods have extensions to the non-stationary world that you should explore
as necessary in your career.

Of key interest is how two or more series move together over time. A first step we can take is to
plot line graphs of each and compare obvious trends.

Let’s consider the time plots of the four assets in the data set for this module:


There are some similarities and some distinct aspects to these time plots. We observe that:

a) They all have seemingly constant means over the sample – there are clearly no long-term
deviations from a mean just above zero.
b) The variance of each series shows periods of high volatility followed by periods of low
volatility.
c) There are sometimes clusters of volatility across series, most notably the universal
increase in volatility somewhere between 2008 and 2010, which corresponds to the
global financial crisis.


Stylized facts of financial asset returns


With formal testing, which we study in this course, we can find a set of stylized facts that describes
most returns series of financial assets:

a) Almost no auto-correlation

Asset returns tend to be uncorrelated with their own past, except over very short (intraday)
horizons.

b) Heavier tails than the normal distribution (unconditionally and conditionally)

There is a higher probability of extreme events in empirical distributions of asset returns
than in the best-fitting normal distribution.

c) Asymmetric gain/loss behavior

We observe large negative returns more often than large positive returns.

d) Aggregation to normality

The lower the frequency of observation (e.g. monthly vs. daily) the closer the return
resembles a normally distributed variable.

e) Volatility clustering

We observe periods of high volatility followed by periods of relative tranquillity.

f) Auto-correlated absolute values

The absolute value of a return series tends to be auto-correlated. This would not happen
with independent draws from a normal distribution.

g) Leverage effect

There tends to be a negative correlation between the volatility and returns of an asset.

h) Correlation between volume and volatility

Trading volume tends to be correlated with any measure of volatility.

Before we consider co-movements (or correlations), we study individual marginal distributions.
That is, we consider each returns series as a set of independent draws from a constant
distribution.
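Two of the stylized facts above (a and f) can be illustrated on simulated data. The sketch below generates returns from a simple ARCH(1) process, a model used here only for illustration with assumed parameter values, and compares the lag-1 sample autocorrelation of the returns with that of their absolute values.

```r
set.seed(1)                          # assumed seed for reproducibility
n <- 2000
r <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- 1e-4
r[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  # ARCH(1): today's variance depends on yesterday's squared return
  sigma2[t] <- 5e-5 + 0.5 * r[t - 1]^2
  r[t] <- sqrt(sigma2[t]) * rnorm(1)
}

# lag-1 sample autocorrelation of returns and of absolute returns
acf_r   <- acf(r,      lag.max = 1, plot = FALSE)$acf[2]
acf_abs <- acf(abs(r), lag.max = 1, plot = FALSE)$acf[2]
c(returns = acf_r, absolute_returns = acf_abs)
```

With real data, the same two acf() calls could be applied to a log-return column of the course data set instead of the simulated series.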


Unit 4: Distributions
The normal distribution is a continuous distribution with a bell-shaped probability density
function (PDF), defined by two parameters: the mean and the standard deviation.

The log-normal distribution, which is usually assumed for the prices of many assets, is
a continuous distribution in which the log of the variable has a normal distribution. If prices are
log-normal, log returns are normally distributed.

The uniform distribution is a distribution in which all possible outcomes – e.g. returns – are
equally likely to happen.

The following three distributions are especially important because they are used in hypothesis
testing for several econometric procedures.

The t distribution is symmetrical, bell-shaped, and similar to the standard normal curve. The higher
the degrees of freedom, the more closely the t distribution resembles the standard normal
distribution with mean zero and standard deviation of one.
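The convergence of the t distribution to the standard normal can be seen by comparing critical values. A small sketch using base R's qt and qnorm (the 2.5% tail level and the degrees of freedom shown are illustrative choices):

```r
# Upper 2.5% critical values: t distribution vs. standard normal
df <- c(2, 5, 30, 100, 1000)
t_crit <- qt(0.975, df = df)
z_crit <- qnorm(0.975)
data.frame(df = df,
           t_crit = round(t_crit, 4),
           gap_to_normal = round(t_crit - z_crit, 4))
```

The gap between the t and normal critical values shrinks as the degrees of freedom increase.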


The chi-squared distribution is the distribution of the sum of squared independent standard normal
deviates, where the degrees of freedom of the distribution are equal to the number of standard normal
deviates being summed. The table is represented below.
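Critical values like those found in a chi-squared table can be generated in R with qchisq. The sketch below uses the 5% level and 1 to 5 degrees of freedom as illustrative choices, and checks the link to the standard normal for one degree of freedom.

```r
# Upper 5% critical values of the chi-squared distribution, 1 to 5 degrees of freedom
df <- 1:5
chi_crit <- qchisq(0.95, df = df)
round(chi_crit, 3)

# With one degree of freedom, a chi-squared variable is a squared standard normal,
# so its 95% quantile equals the squared 97.5% normal quantile.
all.equal(qchisq(0.95, df = 1), qnorm(0.975)^2)
```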


F distribution
If $U$ and $V$ are independent chi-square random variables with $r_1$ and $r_2$ degrees of freedom
respectively, then the ratio $F = \frac{U/r_1}{V/r_2}$ follows an F-distribution with $r_1$ numerator
degrees of freedom and $r_2$ denominator degrees of freedom. A partial table is below; its entries are
the upper 10% critical values (90th percentiles):

df2 \ df1         1         2         3         4         5         6         7         8         9
1          39.86346  49.50000  53.59324  55.83296  57.24008  58.20442  58.90595  59.43898  59.85759
2           8.52632   9.00000   9.16179   9.24342   9.29263   9.32553   9.34908   9.36677   9.38054
3           5.53832   5.46238   5.39077   5.34264   5.30916   5.28473   5.26619   5.25167   5.24000
4           4.54477   4.32456   4.19086   4.10725   4.05058   4.00975   3.97897   3.95494   3.93567
5           4.06042   3.77972   3.61948   3.52020   3.45298   3.40451   3.36790   3.33928   3.31628
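The entries of this partial table can be reproduced with base R's qf. In the sketch below, the 90% quantile level is inferred from the tabulated values (for example 39.86346 for 1 and 1 degrees of freedom).

```r
# Reproduce the partial F table: upper 10% critical values, i.e. 90% quantiles
df1 <- 1:9   # numerator degrees of freedom (columns)
df2 <- 1:5   # denominator degrees of freedom (rows)
f_table <- outer(df2, df1, function(d2, d1) qf(0.90, df1 = d1, df2 = d2))
dimnames(f_table) <- list(paste0("df2=", df2), paste0("df1=", df1))
round(f_table, 5)
```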

Normality (graph, skewness)


Normality is an important assumption in many econometric procedures. In finance it is often
assumed that returns are normally distributed; this assumption is used in portfolio allocation
models, option pricing, and Value-at-Risk (VaR) measurements. In practice, however, analysts
frequently find that many real-world distributions have fat (heavy) tails and sometimes sharp peaks.


We want to explore skewness and symmetry. The example below is for the monthly simple returns
of a mid-cap company. Are the returns normally distributed? The small sample size is for
instructional purposes; skewness is hard to judge in smaller samples.

Skewness is an indicator used in distribution analysis as a sign of asymmetry and deviation from a
normal distribution. The equation for skewness (called the third moment around the mean) is
defined as:

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Interpretation:
• Skewness > 0 – Right skewed distribution, most values are concentrated on left of the
mean, with extreme values to the right.
• Skewness < 0 – Left skewed distribution, most values are concentrated on the right of the
mean, with extreme values to the left.
• Skewness = 0 – mean = median, the distribution is symmetrical around the mean.
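The skewness formula above translates directly into a few lines of R. This is a minimal sketch: the function name sample_skewness and the two small data vectors are made up for illustration, and s is taken to be the sample standard deviation as computed by sd().

```r
# Sample skewness with the small-sample adjustment n / ((n - 1)(n - 2))
sample_skewness <- function(x) {
  n <- length(x)
  s <- sd(x)                       # sample standard deviation (divisor n - 1)
  n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / s)^3)
}

sym   <- c(1, 2, 3, 4, 5)          # symmetric around its mean
right <- c(1, 1, 1, 2, 10)         # one large value in the right tail
c(symmetric = sample_skewness(sym), right_tail = sample_skewness(right))
```

The symmetric vector gives a skewness of zero, while the vector with one large value gives a positive skewness, matching the interpretation above.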


According to David P. Doane and Lori E. Seward (McGraw-Hill, 2011:155), Applied Statistics in
Business and Economics, 3e: “If the sample size is 30 observations then 90% of the time the
skewness stat will lie between -0.673 and 0.673 for normality to hold. If the sample
size is 60 observations, then 90% of the time the skewness stat will lie between
-0.496 and 0.496 for normality. If the sample size is 150 observations, then 90% of the time the
skewness stat will lie between -0.322 and 0.322 to conclude normality”. In the last example above,
we reject normality.

Normality (kurtosis)
This is an indicator used as a sign of flatness or "peakedness" of a distribution. The formula for
kurtosis is:

$$k = \frac{\dfrac{1}{n}\sum_{i=1}^{n} \left(X_i - X_{avg}\right)^4}{\dfrac{1}{n^2}\left(\sum_{i=1}^{n} \left(X_i - X_{avg}\right)^2\right)^2}$$

Interpretation:
• Kurtosis > 3 – Leptokurtic distribution, sharper than a normal distribution, with
values concentrated around the mean and thicker tails. This means a higher probability of
extreme values than for a normal distribution.
• Kurtosis < 3 – Platykurtic distribution, flatter than a normal distribution with a wider
peak. The probability of extreme values is lower than for a normal distribution, and the
values are spread more widely around the mean.
• Kurtosis = 3 – Mesokurtic distribution, for example the normal distribution.

You should also examine a graph (histogram) of the data and consider performing other tests for
normality, such as the Shapiro-Wilk test.
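A minimal sketch of these checks in R, assuming simulated normal data in place of real returns. The function name sample_kurtosis is made up; it implements the kurtosis formula above, and shapiro.test and hist are base R functions.

```r
# Kurtosis as the fourth central moment divided by the squared second central moment
sample_kurtosis <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  (sum(d^4) / n) / (sum(d^2) / n)^2
}

set.seed(42)                         # assumed seed for reproducibility
z <- rnorm(500)                      # simulated normal data
sample_kurtosis(z)                   # should be in the neighbourhood of 3
shapiro.test(z)$p.value              # Shapiro-Wilk p-value
hist(z, breaks = 30, main = "Simulated normal data")
```

A small Shapiro-Wilk p-value would be evidence against normality; for return data the same three calls can be applied to the return vector.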

Periodic (or time consistent or continuously compounded) rate of return

Finance tends to use the “time-consistent” rate of return, also called the continuously
compounded rate of return. This will be used in the econometric analysis for the duration of the
course.


The formula for the rate of return is

$$r_t = \ln(P_t / P_{t-1})$$

where:

$P_t$ = Asset price at time $t$.

$P_{t-1}$ = Asset price at time $t - 1$.

$\ln$ = Natural logarithm.

$r_t$ = Rate of return at time $t$.

These rates are often expressed as percentages, so the returns might be read as 10.54% for 30 Sep
then 5.72% for 29 Sep and 12.52% for 28 Sep.

These returns are called time consistent because, if the returns are summed, they equal the return
from the start of the analysis to the end. Start at 27 Sep, when the price was $15, and end at 30 Sep,
when the price was $20. The rate of return over the whole period was

$$\ln(20/15) = 28.77\%$$

Alternatively, add all the returns from the table above: 10.54% + 5.72% + 12.52% ≈ 28.77% (the
rounded figures sum to 28.78%; the difference is due to rounding).
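The additivity of log returns can be verified in R. The price table is not reproduced in these notes, so the intermediate prices below (17 and 18) are assumptions, chosen because they are consistent with the returns quoted in the text.

```r
# Prices chosen to be consistent with the returns quoted in the text
# (an assumption: the original price table is not reproduced here).
prices <- c("27 Sep" = 15, "28 Sep" = 17, "29 Sep" = 18, "30 Sep" = 20)

r <- diff(log(prices))               # r_t = ln(P_t / P_{t-1})
round(100 * r, 2)                    # daily log returns in percent

total <- log(prices[["30 Sep"]] / prices[["27 Sep"]])
all.equal(sum(r), total)             # the sum of log returns equals the whole-period return
```

Simple (arithmetic) returns do not have this property, which is why log returns are preferred for multi-period analysis.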


Exercise:

For the following prices go into Excel and find the returns as percentages.

Calculating simple volatility of returns for a series


Given a series of prices, such as Google prices, it is possible to calculate the periodic return as
above.

Step 1
P – Google Price

r – Periodic return

Where periodic return is $r_t = \ln(P_t / P_{t-1})$.


Step 2

Square the log returns

Step 3

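Use the following formula to get simple volatility

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} = 0.0126$$

Note: This is the sample standard deviation of the returns, i.e. the square root of the average
squared deviation from the mean return (with divisor $N - 1$) for the period under analysis. When
the mean return is close to zero, it is approximately the square root of the average squared return.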
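The steps above can be sketched in R as follows. The prices are hypothetical (they are not the Google data used in the notes), so the result will differ from the 0.0126 reported above.

```r
# Hypothetical daily prices (not the Google data used in the notes)
P <- c(100, 101.5, 100.8, 102.3, 101.9, 103.4)

r <- diff(log(P))                                  # Step 1: periodic (log) returns
N <- length(r)
s_manual  <- sqrt(sum((r - mean(r))^2) / (N - 1))  # Steps 2-3: the volatility formula
s_builtin <- sd(r)                                 # R's sd() uses the same N - 1 divisor

c(manual = s_manual, builtin = s_builtin)
```

The manual calculation and sd() agree, so in practice sd() applied to the log returns is sufficient.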


Unit 4: Types of Econometric Models

Plotting data to look for patterns to suggest functional form of regression

Linear relationship: The following plot of data (with a linear trend line/equation from Excel)
suggests a linear model between x and y is appropriate.

In contrast, the following graph (again with a trend line) suggests a non-linear relationship
between x and y (exponential relationship between x and y).


This can be transformed into a linear relationship as follows:

$$\ln(y) = a + bx$$

Some other possible transformations are given in the table below.
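As an illustration of the log transformation, the sketch below simulates an exponential relationship and then estimates it by ordinary linear regression on ln(y). The parameter values a = 1 and b = 0.5, the noise level, and the seed are assumptions made for the example.

```r
# Simulated exponential relationship y = exp(a + b * x) with small noise.
# a = 1, b = 0.5 and the noise level are assumptions for illustration.
set.seed(7)
x <- seq(0, 10, length.out = 50)
y <- exp(1 + 0.5 * x + rnorm(50, sd = 0.05))

fit <- lm(log(y) ~ x)     # linear regression on the transformed variable
coef(fit)                 # estimates of a (intercept) and b (slope)
```

The estimated intercept and slope should be close to the values used in the simulation, even though y itself is a non-linear function of x.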

Exercises:
1 Can the following equation be expressed linearly? If so, how?

$$y = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon}$$

2 For the data below, what kind of functional relationship exists between x and y, and how
might this be transformed using the transformations discussed in this section?


Interpretation of variables in equations

Dummy variables

A dummy variable (binary independent variable) assumes a value of 1 if some criterion is true,
otherwise it will be zero. For example:

D=0 if respondent is male.

D=1 if respondent is female.

CEO earnings (in thousands of USD):

Y = 90 + 50x – 25 D [x is the number of years of experience]

If D=1 (female), the CEO can expect to make $25 000 less each year than a male CEO with the same
number of years of experience.

Categorical variables

Related to dummies are categorical variables. These noncontinuous variables can take on several
values, some of which may be meaningful while others may not.

Variable C takes on several values. CEO’s favorite color:

1 Yellow
2 Blue
3 Green
4 Red

Preferring yellow to blue does not demonstrate any intensity in the variable being measured. That
is, green is not 3 times as intense as yellow. Simply running this variable in a regression doesn’t
mean anything. This type of variable can be made meaningful by creating a dummy
variable.
For example:
D3 = 1 if yellow is favorite color

D3 = 0 otherwise.

The regression coefficient for D3 might be


CEO salary = …+40 D3+…….

In this case a CEO whose favorite color is yellow earns $40 000 more per year than one whose
favorite color is something else.

A categorical variable may express intensity and might be included in a regression as a categorical
variable without resort to the use of a dummy. For example:
C is average household income
1 $0 to $20 000
2 $20 001 to $40 000
3 $40 001 to $60 000
4 $60 001 to $80 000
5 More than $80 000

In this case, intensity is represented by the data. The data represent a proxy for an unobservable
variable: continuous income. For example, in a regression of:

Double log equation

$$\ln(y) = a + b\ln(x)$$

$b$ is the percentage change in $y$ resulting from a one percent change in $x$. For example, if $b = 5$
then a one percent increase in $x$ results in approximately a 5% increase in $y$.
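The elasticity interpretation can be checked numerically. In the sketch below, b = 5 is taken from the example in the text, while a = 2 and the starting value of x are arbitrary assumptions.

```r
# y = a * x^b, so ln(y) = ln(a) + b * ln(x); b = 5 as in the text, a = 2 is assumed
f <- function(x, a = 2, b = 5) a * x^b

x0 <- 10
pct_change_y <- 100 * (f(1.01 * x0) / f(x0) - 1)   # effect of a 1% increase in x
pct_change_y
```

The exact change is 1.01^5 − 1, about 5.1%, which is close to the 5% given by the coefficient; the approximation is better for smaller percentage changes in x.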

Exercises:
1 The following equation evaluating business majors’ starting salary contains two dummy
variables D1 and D2.

D1=1 if college major was accounting, zero otherwise.

D2=1 if college major was finance, otherwise zero.

Salary = 34 + 3.76D1 + 8.93 D2 [starting salary of business majors]


a. What is the starting salary of a business grad majoring in accounting?
b. What is the starting salary of a business grad majoring in finance?
c. What is the starting salary of other business majors?


Finance models
We are largely considering linear relationships in this course. As a consequence, we wish to
express the dependent variable as a linear function of the independent variable x, or of a
transformation of it.

Linearizing models
1 It may be you are trying to model a nonlinear relationship between y and independent variable
x:

$$Y = aX^b$$ (where $a$ and $b$ are constant parameters)

This can be linearized by taking the natural log of both sides:

$$\ln(Y) = \ln(a) + b\ln(X)$$ [log-log relationship]

This type of model can be estimated using linear regression, which we shall talk about later. It is
important to note that if we change x, the change in y is as follows:

$$dY = b\,(Y/X)\,dX$$ [it depends on the values chosen for $X$ and $Y$ – e.g. mean values]

Suppose instead the relationship between x and y is

$$Y = e^{a+bX}$$

This nonlinear relationship can be expressed as:

$$\ln Y = \ln\left[e^{a+bX}\right] = a + bX$$

Bear in mind the relationship is not linear in $X$ and $Y$ but instead is illustrated by the graph below:


2 Another semi-log relationship is:

$$Y = a + b\ln(X)$$

3 Suppose that X is the independent variable and Y the dependent variable, but a strict linear
relationship does not exist. Instead:

$$Y = a + bX + cX^2$$ [polynomial]

This can still be estimated using the linear regression techniques of this course, but we treat $X^2$ as
a new variable $Z = X^2$.
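A minimal sketch of this idea in R, using simulated data. The parameter values a = 2, b = 1.5, c = −0.8, the noise level, and the seed are assumptions for illustration.

```r
set.seed(3)                                          # assumed seed
X <- seq(-3, 3, length.out = 60)
Y <- 2 + 1.5 * X - 0.8 * X^2 + rnorm(60, sd = 0.1)   # assumed a = 2, b = 1.5, c = -0.8

Z <- X^2                                             # the new variable Z = X^2
fit <- lm(Y ~ X + Z)                                 # linear in the parameters a, b and c
coef(fit)
```

The model is non-linear in X but linear in its parameters, which is why ordinary linear regression applies. Writing lm(Y ~ X + I(X^2)) would give the same fit without creating Z explicitly.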

Exercise:

Transform the following equation into a linear relation:

$$Y = [a + bX]^{-1}$$

Hint: What is the definition for the new dependent variable?


Unit 4: Finance Topics and Econometrics


Financial econometrics looks at functional relationships between variables in the financial
environment, such as returns on financial assets or trends in variables such as bond yields.
Consequently, we estimate equations that are linear or nonlinear in nature, using the techniques
developed in this course.

Capital asset pricing model (CAPM)


The CAPM describes a linear relationship between risk and expected return and is used in the pricing
of risky securities. The security market line of the CAPM is defined as

SML: $$R_{it} - R_{ft} = \alpha_i + \beta_i \left(R_{Mt} - R_{ft}\right) + \varepsilon_{it}$$

The alpha coefficient ($\alpha_i$) is the constant (intercept) in the security market line of the CAPM.

The alpha coefficient indicates how an investment has performed after accounting for the risk it
involved.

If markets are efficient then $\alpha = 0$. If $\alpha < 0$ the security has earned too little for its risk. If $\alpha > 0$
the investment has provided higher returns than its level of risk requires.
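The regression above can be estimated with lm. The sketch below uses simulated excess returns, since no data set is attached to this section; the "true" alpha of 0.0002 and beta of 1.2, the volatilities, and the seed are all assumptions.

```r
set.seed(11)                                         # assumed seed
n <- 500
mkt_excess   <- rnorm(n, mean = 0.0005, sd = 0.01)   # simulated R_Mt - R_ft
asset_excess <- 0.0002 + 1.2 * mkt_excess +          # assumed alpha and beta
  rnorm(n, sd = 0.005)                               # idiosyncratic noise

capm_fit  <- lm(asset_excess ~ mkt_excess)
alpha_hat <- coef(capm_fit)[["(Intercept)"]]
beta_hat  <- coef(capm_fit)[["mkt_excess"]]
c(alpha = alpha_hat, beta = beta_hat)
summary(capm_fit)$coefficients                       # includes the t-test of alpha = 0
```

The t-statistic on the intercept is the usual way to judge whether the estimated alpha differs from zero.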

Fama-French model (FFM)


The FFM is a factor model that expands on the capital asset pricing model (CAPM) by adding size
and value factors to the market risk factor in CAPM. Fama and French maintain that value and
small-cap stocks outperform markets on a consistent basis.

According to Fama and French, two types of stocks outperform the market:

i. small caps
ii. value stocks (have a high book-value-to-price ratio)

book value = assets - liabilities and preferred shares


They then added two factors to CAPM to reflect a portfolio's exposure to these two classes:

$r - R_f = \beta_3 \times (K_m - R_f) + b_s \times SMB + b_v \times HML + \alpha$

where:

$r$ is the portfolio's return rate,

$R_f$ is the risk-free return rate (US T-bills),

$SMB$ (Small Minus Big) is the average return on the three small portfolios minus the average
return on the three big portfolios:

$SMB = \frac{1}{3}(\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - \frac{1}{3}(\text{Big Value} + \text{Big Neutral} + \text{Big Growth})$

$HML$ (High Minus Low) is the average return on the two value portfolios minus the average return
on the two growth portfolios:

$HML = \frac{1}{2}(\text{Small Value} + \text{Big Value}) - \frac{1}{2}(\text{Small Growth} + \text{Big Growth})$

$K_m$ is the return of the whole stock market.
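The SMB and HML definitions above can be sketched in a few lines of Python. The six portfolio returns and the dictionary keys are hypothetical; in practice the factor series would come from a published data library rather than being computed by hand.

```python
# Hypothetical one-period returns (decimals) for the six size/value portfolios.
returns = {
    "small_value": 0.030, "small_neutral": 0.024, "small_growth": 0.018,
    "big_value": 0.020, "big_neutral": 0.015, "big_growth": 0.010,
}


def smb(r):
    """Small Minus Big: mean of the three small portfolios minus mean of the three big ones."""
    small = (r["small_value"] + r["small_neutral"] + r["small_growth"]) / 3
    big = (r["big_value"] + r["big_neutral"] + r["big_growth"]) / 3
    return small - big


def hml(r):
    """High Minus Low: mean of the two value portfolios minus mean of the two growth ones."""
    value = (r["small_value"] + r["big_value"]) / 2
    growth = (r["small_growth"] + r["big_growth"]) / 2
    return value - growth


smb_value = smb(returns)
hml_value = hml(returns)
print(f"SMB = {smb_value:.4f}, HML = {hml_value:.4f}")
```

With these made-up numbers both factors are positive, meaning small stocks beat big stocks and value stocks beat growth stocks in that period.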

Momentum is sometimes used as another factor.

The daily and weekly Fama-French factors can be found at:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

Arbitrage Pricing Theory (APT)


APT states that the return of a financial asset is a linear function of a number of factors, such as
macroeconomic factors (e.g., the market rate of interest, economic growth, inflation) and market
indices, such as:

• short-term interest rates;
• the difference between long-term and short-term interest rates;
• a diversified stock index, such as the S&P 500 or NYSE Composite;
• oil prices;
• gold and other precious metal prices; and
• currency exchange rates.

The APT produces a rate of return that can be checked against the actual return to see if the asset is
correctly priced.

Assumptions:
• Security returns are a function of a number of factors.
• There are sufficient securities for firm-specific (idiosyncratic) risk to be diversified away.
• Well-functioning security markets do not allow for persistent arbitrage opportunities.
• Risky asset returns are said to follow a factor intensity structure if they can be expressed
as:

$r_j = a_j + b_{j1} F_1 + b_{j2} F_2 + \dots + b_{jn} F_n + \varepsilon_j$

where:

$a_j$ is a constant for asset $j$,

$F_k$ is a systematic factor, where $n$ such factors exist in this model,

$b_{jk}$ is called the factor loading, or the change in the return of asset $j$ resulting from a unit change
in factor $k$,

$\varepsilon_j$ is called the idiosyncratic random shock, with zero mean.

Idiosyncratic shocks are assumed to be uncorrelated across assets and uncorrelated with the
factors.

The APT states that if asset returns follow a factor structure, then the following relation exists
between expected returns and the factor sensitivities:

$E(r_j) = r_f + b_{j1} RP_1 + b_{j2} RP_2 + \dots + b_{jn} RP_n$

where:

$RP_k$ is the risk premium of factor $k$,

$r_f$ is the risk-free rate.
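As a worked illustration of the expected-return relation, the sketch below computes the APT-implied return for one asset and compares it with an observed return. The function name, factor loadings, risk premia and observed return are all hypothetical numbers chosen for the example.

```python
def apt_expected_return(r_f, loadings, risk_premia):
    """E(r_j) = r_f + sum over k of b_jk * RP_k."""
    if len(loadings) != len(risk_premia):
        raise ValueError("need one risk premium per factor loading")
    return r_f + sum(b * rp for b, rp in zip(loadings, risk_premia))


r_f = 0.02                         # risk-free rate (hypothetical)
b_j = [1.1, 0.4, -0.3]             # loadings of asset j on three factors (hypothetical)
rp = [0.05, 0.02, 0.01]            # risk premia of the three factors (hypothetical)

expected = apt_expected_return(r_f, b_j, rp)
actual = 0.095                     # observed return on the asset (hypothetical)
gap = actual - expected            # positive gap: asset returned more than APT implies

print(f"APT expected return = {expected:.4f}, actual = {actual:.4f}, gap = {gap:.4f}")
```

A gap that persists after accounting for estimation error would suggest the asset is not priced in line with its factor exposures.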

Yield curve (term structure of interest rates)


Liquidity Preference Theory suggests market participants demand a premium for long-term
investments in bonds (such as Treasuries, gilts, or French OATs) compared to short-term loans to
the government. Even in a situation where no interest rate change is expected, the curve will
slope slightly upwards because of the risk/liquidity premium investors demand for buying long
term. This is the usual case (normal yield curve), but inverted curves are possible.

The logic behind the econometrics associated with the yield curve can be understood by looking at
investor expectations about what future yields will be.

Investors expect the return on a one-year bond maturing in two years to be
$E_t(r_{1,t+1})$, the short-term rate one year from today; a one-year bond maturing in three years has an
expected yield of $E_t(r_{1,t+2})$, and so on. The yield on an $n$-period bond is equal to the average of the
current and expected short-term yields plus a risk/liquidity premium, $P$:

$r_{n,t} = P + \frac{r_{1,t} + E_t(r_{1,t+1}) + \dots + E_t(r_{1,t+n-1})}{n}$

All expectations of one-year returns are formed at time $t$. Assume the investor bases his/her expectations on
the current one-year bond yield, $r_{1,t}$. The estimating equation for the $n$-year (long-term) bond is then

$r_{n,t} = a + b \, r_{1,t}$   [estimating equation for yield curve]

Note that the theory, if it holds, suggests that $b = 1$. This can be tested using the $t$-test from
OLS.
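The test of $b = 1$ can be sketched as follows, using only Python's standard library. The yield data are hypothetical (a long yield built as the short yield plus a premium and small made-up disturbances), and the helper `ols_with_se` is an illustrative name, not a function from the course material.

```python
import math


def ols_with_se(x, y):
    """Return intercept a, slope b and the standard error of b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)   # residual variance, n - 2 degrees of freedom
    return a, b, math.sqrt(s2 / sxx)


# Hypothetical yields (decimals): long yield = premium + short yield + disturbance.
short = [0.010, 0.015, 0.020, 0.025, 0.030, 0.035, 0.040, 0.045]
noise = [0.0004, -0.0003, 0.0002, -0.0005, 0.0003, -0.0001, 0.0004, -0.0004]
long_yield = [0.012 + s + e for s, e in zip(short, noise)]

a, b, se_b = ols_with_se(short, long_yield)
t_stat = (b - 1.0) / se_b          # t-statistic for H0: b = 1
critical = 2.447                   # two-sided 5% critical value, t distribution with 6 d.f.

print(f"a = {a:.4f}, b = {b:.4f}, t = {t_stat:.3f}")
print("reject b = 1" if abs(t_stat) > critical else "cannot reject b = 1")
```

Note that the null hypothesis here is $b = 1$, not the $b = 0$ that regression software reports by default, so the statistic has to be computed from the slope and its standard error.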

Early warning systems


It is known that financial imbalances commonly lead to widespread financial stress which may
cause the collapse of banks or other companies, financial crises, and recessions.

It is therefore of crucial importance to detect and to predict potential financial stress. There are a
host of early warning systems regarding financial or credit crises.

One is the MIMIC (multiple indicators, multiple causes) model, which consists of two sets of equations:

$Y = bW + v$
and
$W = cX + u$

where $Y$ is a crisis indicator, $X$ is an observable potential crisis cause, and $W$ is a latent variable
thought to represent the severity of the crisis. Disturbance terms $u$ and $v$ are explained later.
Macroeconomic variables that can be used as crisis indicators include:

• Credit growth
• Property price growth
• Credit to GDP gap
• Equity price growth

Sharpe ratio
The Sharpe ratio is frequently used to measure the excess return an investment provides per unit of
risk, $\sigma$. When comparing two assets, the one with the higher Sharpe ratio provides a superior
return for the same risk.

Suppose you have invested in a mutual fund and want to find the Sharpe ratio for the past 11
years. You calculate the year-over-year returns from your investment as follows:

Subtract the risk-free return from the actual return by using the formula command in Excel. This is
your Excess Return from d1:d11 in the chart below:

Then use the AVERAGE command in the statistical formulas of Excel to get the average excess return
for the period. In this case, it is 0.0538.

Next use the standard deviation command for the excess returns:

The Sharpe ratio = 0.0538 / 0.1384 = 0.3887.
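The same three steps (excess return, average, standard deviation) can be sketched in Python with the standard library. The return series below is hypothetical, since the original data table is not reproduced here, and the use of the sample standard deviation is an assumption: the notes do not say which Excel function was applied.

```python
import statistics

# Hypothetical year-over-year returns for 11 years (decimals); not the data from the notes.
fund_returns = [0.12, -0.05, 0.20, 0.08, -0.10, 0.15, 0.03, 0.25, -0.02, 0.10, 0.06]
risk_free = [0.02] * len(fund_returns)   # hypothetical constant risk-free return

# Step 1: excess return = actual return minus risk-free return.
excess = [r - rf for r, rf in zip(fund_returns, risk_free)]

# Step 2: average excess return.
avg_excess = statistics.mean(excess)

# Step 3: sample standard deviation of the excess returns (assumed convention).
sd_excess = statistics.stdev(excess)

sharpe = avg_excess / sd_excess
print(f"average = {avg_excess:.4f}, st. dev. = {sd_excess:.4f}, Sharpe = {sharpe:.4f}")

# The figures quoted in the text give the ratio directly:
sharpe_from_text = 0.0538 / 0.1384
print(f"Sharpe ratio from the quoted figures = {sharpe_from_text:.4f}")
```

Using the population standard deviation (`statistics.pstdev`) instead would give a slightly higher ratio, so the convention should be stated when Sharpe ratios are compared.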

Stock valuation (present value model)


The price, $P$, of a share of company stock at time $t$ is equal to the expected discounted sum of all
future dividends (cash payments):

$P_t = E_t\left\{ \frac{D_{t+1}}{(1+\delta)} + \frac{D_{t+2}}{(1+\delta)^2} + \dots \right\}$

where:

$D$ = dividend expected at some time in the future

$\delta$ = the constant discount rate.

If $D$ is assumed to be constant, the equation reduces to a perpetuity and takes the
form:

$P_t = \frac{D_t}{\delta}$

In log form this looks as follows:

$\ln(P_t) = -\ln(\delta) + \ln(D_t)$

which can be estimated by OLS with the following specification:

$\ln(P_t) = \beta_0 + \beta_1 \ln(D_t)$

OLS will be presented in detail in Module 2.
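A short numerical check of the perpetuity result and its log form, in Python. The dividend and discount rate are hypothetical values, and `present_value` is an illustrative helper: summing the discounted constant dividend over a long horizon should approach $D/\delta$, and the log relation then implies $\beta_0 = -\ln(\delta)$ and $\beta_1 = 1$ in the OLS specification.

```python
import math


def present_value(dividend, delta, horizon):
    """Sum of a constant dividend discounted at rate delta over `horizon` periods."""
    return sum(dividend / (1 + delta) ** k for k in range(1, horizon + 1))


dividend = 2.0      # constant dividend per period (hypothetical)
delta = 0.05        # constant discount rate (hypothetical)

pv_long_horizon = present_value(dividend, delta, 2000)
perpetuity_price = dividend / delta

# Log form: ln(P) = -ln(delta) + ln(D).
log_price = math.log(perpetuity_price)
log_form = -math.log(delta) + math.log(dividend)

print(f"long-horizon PV = {pv_long_horizon:.6f}, D/delta = {perpetuity_price:.6f}")
print(f"ln(P) = {log_price:.6f}, -ln(delta) + ln(D) = {log_form:.6f}")
```

If the estimated $\beta_1$ from real data differs markedly from 1, that is evidence against the constant-dividend, constant-discount-rate version of the model.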

Parametric (analytical) Value-at-Risk


Value-at-Risk (VaR), as the name suggests, is a measure of the risk of loss for a particular asset or
portfolio. The confidence level is one minus the probability of a VaR break (an extremely risky
scenario). For example, if the probability of a VaR break is the extreme left-most 1% of the return
distribution, the confidence level is 100% - 1% = 99%.

Assumptions:
• The returns from the asset/portfolio are normally distributed. This allows the use of the
two parameters of the normal distribution: the mean and the standard deviation.
• We choose a level of confidence. To be 95% sure that the worst case is not going
to happen, choose $z$-value = -1.645. To be 99% sure that the worst case is not
going to happen, choose $z$-value = -2.33. The $z$-value represents the number of standard
deviations away from the mean.
• The returns are assumed to be serially independent, so no prior return should influence
the current return.

Steps required to get VaR


1 Find (log) returns from financial data (keep things as percentages)
2 Calculate the mean of the returns (percentage)
3 Calculate the standard deviation of the returns (percentage)
4 Choose a confidence level (we assume 95%, so $z$ = 1.645)
5 Calculate the dollar loss associated with your investment as a money manager. Assume you
invested $10m in a security.

Example:

Choose a 95% confidence level, meaning we wish to have 5% of the observations in the left-hand
tail of the normal distribution. That means that the cut-off for that area lies 1.645 standard
deviations below the mean (assumed to be zero for return data over a short period of time). The
following are data for a security investment:

Amount of dollars invested: $10 million

Standard deviation: 1.99% or 0.0199

The VaR at the 95% confidence level is 1.645 x 0.0199 or 0.032736. The portfolio has a
market value of $10 million, so the

VaR of the portfolio is 0.032736 x 10,000,000 = $327,360

$z$ = 2.33 for a 99% confidence level considering the normal distribution.

$z$ = 1.645 for a 95% confidence level considering the normal distribution.
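The example can be reproduced with a minimal Python sketch. The function name `parametric_var` is hypothetical; the $z$-value is taken from the standard normal quantile function in the standard library rather than from a table, so the result differs slightly from the hand calculation that uses the rounded value 1.645.

```python
from statistics import NormalDist


def parametric_var(value, sigma, confidence, mu=0.0):
    """Parametric (normal) VaR as a positive money amount.

    value: amount invested; sigma: standard deviation of returns;
    confidence: e.g. 0.95; mu: mean return (assumed zero by default).
    """
    z = NormalDist().inv_cdf(1 - confidence)   # negative left-tail quantile
    return -(mu + z * sigma) * value


invested = 10_000_000       # $10 million, as in the example
sigma = 0.0199              # 1.99% standard deviation, as in the example

var_95 = parametric_var(invested, sigma, 0.95)
var_99 = parametric_var(invested, sigma, 0.99)

print(f"95% VaR = ${var_95:,.0f}")
print(f"99% VaR = ${var_99:,.0f}")
```

The 99% figure is larger than the 95% figure because a higher confidence level moves the cut-off further into the left tail.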

Relationship of correlation and VaR


Covariance VaR changes linearly as the underlying model or factor variable instantaneous
correlations change. Likewise, non-linear VaR will change non-linearly as the instantaneous
correlations change. Dynamic hedging costs should properly reflect the VaR cost of cap.

Graph from “Python for Finance” by Yuxing Yan.

Bibliography
Berlinger, E. et al. Mastering R for Quantitative Finance. Packt Publishing.

Chan, E.P. Quantitative Trading. Wiley Trading.

de Prado, M.L. (2018). Advances in Financial Machine Learning. Wiley.

Doan, D.P. and Seward, L. S. (2011). Applied Statistics in Business and Economics, 3rd edition. McGraw-Hill, p. 155.

Daroczi G. et al. Introduction to R for Quantitative Finance. Packt Publishing.

Greene, W. (2000). Econometric Analysis, Prentice-Hall, NY.

Gujarati, D. (2004). Basic Econometrics, McGraw-Hill.

Halls-Moore, M. L. (2017). Successful Algorithmic Trading.

Halls-Moore, M.L. (2017). Advanced Algorithmic Trading. Part III Time Series Analysis.

Jansen, S. (2018). Hands-On Machine Learning for Algorithmic Trading. Packt Publishing

Jeet P. and Vats P. Learning Quantitative Finance with R. Packt Publishing.

McNeil, A. J. et al. Quantitative Risk Management. Princeton University Press.

Ojeda et al. i, Packt Publishing.

Scott, M. et al. (2013). Financial Risk Modelling and Portfolio Optimization with R. Wiley.

Yan, Y. (2017). Python for Finance. Packt Publishing.

Cont, R. (2001) ‘Empirical properties of asset returns: Stylized facts and statistical issues’,
Quantitative Finance, 1(2), pp. 223–236. doi: 10.1080/713665670.

Ruppert, D. and Matteson, D. S. (2015) Statistics and Data Analysis for Financial Engineering, with
R examples. Second. Ithaca, NY, USA: Springer Texts in Statistics.

Tsay, R. S. (2010) ‘Analysis of Financial Time Series’. Wiley.
