FIE450
Lecture Notes
Nils Friewald
Contents

1 Introduction
3 Volatility
 3.1 EWMA model
  3.1.1 Maximum likelihood approach
  3.1.2 Functions in R
  3.1.3 Optimization in R
  3.1.4 Searching for the optimal EWMA parameter
 3.2 GARCH model
 3.3 Volatility forecasts
 3.4 Option-Implied Volatility
 3.5 Exercises
4 Monte-Carlo simulation
 4.1 Fundamental Theorem of Asset Pricing
 4.2 Principles of Monte-Carlo simulation
 4.3 A model for the index price
 4.4 Monte-Carlo Error
 4.5 Variance reduction
 4.6 Interpreting the results
 4.7 Exercises
5 Data processing
 5.1 Obtaining the data
 5.2 Data cleansing
 5.3 Rolling observations forward in time
 5.4 One stock price observation per month
 5.5 Return computation
 5.6 Market capitalization and weights
 5.7 Market returns
 5.8 From long to wide-format
 5.9 Risk-free interest rate
 5.10 Exercises
6 Mean-Variance Portfolios
 6.1 Optimization problem
 6.2 Expected returns and covariances
 6.3 Solving for the optimal portfolio
Index
Course Information
Who should take this course?
You should take this course if you
An alternative to this course is BUS455 (“Applied Programming and Data Analysis for Business”).
Note that if you decide to choose the alternative course, please do not forget to sign off from
this course.
Teaching style
• 12 lectures (January 10 to February 16):
Assessment
• Group-based assignments
• Each assignment must be answered in English and counts 47.5% towards the final grade.
The reflection note counts for the remaining 5%. Each group assignment has R code as its
deliverable and needs to be submitted electronically.
• Don’t send us any R code; neither I nor my teaching assistant (TA) will search for errors
in your code.
• We do not respond to questions on topics that have already been covered in depth during
the lectures. That is, no reiteration of course content for those who were absent.
• “I will be working in Oslo. Can I still be enrolled in the course and complete the assign-
ments?” Yes, but it is your responsibility to form groups for the assignments.
• For more questions and answers, see the FAQ announcement on Canvas
1 Introduction
This course introduces you to the programming language R. R is an extremely powerful lan-
guage. It is very similar to the commercial S language and, to some degree, also comparable
to Matlab. However, contrary to these languages, which are rather expensive, R is freeware.
R is developed under the GNU license. This means that basically everyone can contribute to
developing R. Compared to more established languages such as C++ and FORTRAN, R is much
more user-friendly and allows for a steep learning curve. R is widely used in academia and has
also become very popular in the financial industry.
To get started, you first need to install R on your computer. Download R by choosing the
appropriate package for your operating system.2 The installation process is pretty straightfor-
ward. Run R by selecting the icon in the menu. When you launch R it will open the R console.
This is the area where you can type in your commands.
Consider R as a calculator. You can type in a command after the prompt “>” and execute
the command by pressing enter/return. What happens is that the command will be evaluated
and the result will be printed out to the console.3
10
## [1] 10
The [1] indicates that the number on the right refers to the first number in the output. While
useless in this case it may be more informative when printing more numbers. Of course, you
can also compute more “complicated” expressions such as:
10 * 2 + 4
## [1] 24
Spaces between operators and/or numbers are not mandatory; they just make the
code more readable. We usually do not use the console like this because whenever you need
to redo a calculation you need to type in all the expressions in exactly the same order again.
This is cumbersome. Instead, we write R scripts, that is, files containing a set of R commands.
Basically, any text editor is appropriate for this purpose. For example, you could use Notepad
or Wordpad. A typical workflow then looks like this:
2. Select all.
Figure 1: RStudio
• Again, the top left window that you have just opened is the editor window (or script
window). You will write your collection of commands into this window.
• The bottom left window is the console window (or command window). You can either
write R commands directly into this window or let RStudio execute your script in the
4
https://www.rstudio.com/products/rstudio/download/
5
As you will later see I am not using RStudio. Instead, I am using emacs, which is my favorite editor. However,
emacs has a ton of shortcuts and without remembering at least some of them emacs is almost useless.
editor window. You do so by pressing Code → Run Region → Run All or by pressing the
corresponding button.
Equipped with an IDE we are now ready to start writing programs. Let’s begin with the
famous “Hello world!” example, although it does not provide much insight. But basically every
programming tutorial or book starts with this trivial example. Thus, we follow this unwritten
rule and do the same. Write the following code fragment into the script window and then
execute it as described above.
cat("Hello World!")
The command cat is a function that prints its argument (the one enclosed by the parentheses)
to the console. That’s it. I don’t explain RStudio any further. There are many more things you
can do but I will leave it up to you to explore other functionalities. You are now familiar with
the main use case, that is, to write an R script and run the script. Let’s continue with more
interesting stuff.
An apparent route to follow is to download the index and measure its past return and
volatility. The index is available on the webpage of the Oslo Stock Exchange.6 Download all
index prices available since first launched and save the file as “OBX.xlsx” at a location where
you will find it again.7 Open the file with Excel. What we see is the daily last, high and
low prices as well as the turnover. You may have noticed that the data is given in inverse
chronological order, that is the most recent observation is on the top. Of course, we could now
easily do all the calculations in Excel but since this course is about programming we will do it
with R instead. Thus, we first convert the file into a text format. We do so by exporting the file
into a CSV file first. Click on File → Save as and then choose “CSV UTF-8 (Comma delimited)
(*.csv)”. This format can be read into R easily. Save the file as “OBX.csv”. Note, however, that
the exact format of your CSV file depends on your operating system, system settings as well as
the general settings in Excel. This is important to keep in mind when loading the file into R
because we need to tell R what the format looks like. Thus, the following description will most
likely not apply to everyone.
read.csv is a pre-defined function in R. As its name suggests it reads CSV files which are
just data files in text format. The function needs at least one argument which is the name of
the file (including its path if necessary). Arguments are always enclosed in parentheses after
the function name. The data from the file is read in and immediately assigned to a variable.
There are two assignment operators in R, both of them are equivalent: <- and = . There
are some rules to follow in choosing variable names. First, variable names are case sensitive,
that is, obx is different to OBX. Second, do not use special characters with the exception being
“.” and “_”. Third, you are allowed to use numbers but not at the beginning of the variable
name. Once you have created a variable it remains accessible in the R environment. To be more
precise, all entities that R creates and manipulates are known as objects. The specific type of
this object in our case is a data frame. A data frame is comparable to a single spreadsheet in
6
https://www.oslobors.no/ob_eng/markedsaktivitet/#/details/OBX.OSE/overview. Please note, if you
click this link you are going to be redirected to EURONEXT which now maintains the OBX index.
7
Since EURONEXT has been in charge of publishing the index the description that follows is not valid
anymore. The reason is that the data is structured differently, e.g., EURONEXT only publishes daily price
information. Still, the description that follows describes the old data that was available through the Oslo Stock
Exchange.
Excel. There is one important exception. All values of a column must be of the same data type,
be it characters, a number or something else. You can continue working with the variable and
even overwrite it. By the way, if we had not assigned the result to a variable in the
above case, the result would have been printed to the console instead. If the data is big it may
take a while until all the data is printed. So be careful.
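As a sketch, the import and assignment discussed above might look like this (assuming the exported file is named "OBX.csv" and lies in the working directory):

## Read the CSV file and assign the resulting data frame to a variable
obx <- read.csv("OBX.csv")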
We now want to verify whether the import of the file was done correctly, for example,
whether each column was indeed interpreted as a separate column. We can use the function
head which displays the first six rows of the data:
head(obx)
If you only want to print the first three rows you would instead need to write:
head(obx, 3)
Anyway, in my case the data appears to be imported correctly. How about yours? Does
your data only show up with one single column? This may happen if the columns in your CSV
file are not separated by a comma (“,”) but by another character. Columns could, for example,
have been separated by semicolons or pipes instead. Recall that the export properties of
Excel depend on your specific settings. We need to tell the function how to interpret your CSV
file. Let’s consult the help page for further instructions. We write:
?read.csv
The help page provides a lengthy description about what the function is doing, its usage, the
arguments needed, the return value, references to other related functions, and some examples.
These help pages are extremely useful. From the Usage section we see, for example, that the
first argument is always the filename. The second argument tells the function whether there is
a header (that is column names) in the CSV file. This argument has a default value by saying
header=TRUE . The third argument describes how columns of the CSV files are separated. By
default the function assumes that they are separated by a comma (“,”). The fourth argument
describes how vector of characters (i.e. strings) are quoted in the text file. The fifth argument
tells the function what character to use for decimal points and so on. Suppose you want to call
the function with the separator argument set to a comma, then you would need to write:
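For example (a sketch, reusing the file name from above):

obx <- read.csv("OBX.csv", sep = ",")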
But why do we need to add sep= to define the separator but no file= to provide the
filename? This is indeed a good question. This is a so-called named argument. To better
understand this concept let’s go back to the help page. As you notice we can provide several
arguments to the functions. Without providing the argument name R simply matches the list
of arguments one by one according to the function definition in the help page. So if we passed
the filename and then the separator as two unnamed arguments, R would interpret the first
argument as the filename and the second as the header. This call fails because the argument
for the header needs to be of a different type. R throws an error in this case. To make a correct
function call with unnamed arguments we would need to follow the order of the function definition:
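A sketch of both calls (the failing one is commented out):

## Fails: "," is matched positionally to the header argument
# obx <- read.csv("OBX.csv", ",")

## Works: arguments given in the order of the function definition
obx <- read.csv("OBX.csv", TRUE, ",")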
But this is cumbersome because we do not really need to provide the second argument.
Again, it is set by default to TRUE , a data type that we discuss later. So we prefer to name all
arguments instead which allows us to omit those arguments that we do not really need because
they are set to an appropriate default value. You could also name the first argument with
file= but this is not needed here because the first argument refers to the filename anyways.
So there cannot be any confusion when not providing the argument name. Let’s move on. In
order to get an idea of how large the data set is we can use the following command to get its
dimension:
dim(obx)
## [1] 5532 5
This function returns, first, the number of rows and, second, the number of columns. It is
always given exactly in this order! Alright, the numbers seem quite plausible. Next we will
have a closer look at the data.
obx[1, 2]
## [1] 765.94
Apparently, the first argument in the [ operator refers to the row index and the second
to the column index. Note, that the white space in this command is not a requirement. It
just makes your script more readable. Anyway, the command displays a number. You may
think, sure, it’s a price what else should we expect. However, if the numbers in your text file,
for example, use commas (“,”) instead of a period (“.”) as a decimal point it would not be
interpreted as a number. To see this, just have a look at the column with the turnover, which
is column five.
obx[1, 5]
## [1] "3,834,616,180.00"
This column does not contain numbers (although it looks like it) but instead is of type factor,
a data type that we also will discuss later. Why does this happen? Well, the turnover uses
a comma to separate thousands. So why are they not interpreted as separate columns then?
Well, Excel further encloses the turnover with double quotes.8 Note, there is no direct way in
R to interpret the turnover column correctly. You would need to go back to Excel, change the
format of that column and then export the file again.
How do we show the complete first row instead of just one entry? Well, just leave out the
second argument in the above command.
obx[1, ]
This is the first row of our data frame as indicated by “1” on the very left. In fact “1” is a row
name and does not necessarily have to be a number but can also be a string. In our case, read.csv
automatically assigned row names to our data set. We also observe that the first row of the
text file was (correctly) interpreted as containing the column headers.
So far we have only used indices to access columns. Usually, it makes the code more readable
when we use the names of the columns instead. For example, to print the first element of the
column Last we may use the following lines of code, all of which are equivalent:
obx[1, "Last"]
## [1] 765.94
obx$Last[1]
## [1] 765.94
obx[["Last"]][1]
## [1] 765.94
Whew, this looks complicated. Let’s discuss each line of code separately. The first is very similar
to what we have done so far. The only thing that has changed was that we use the column
8
You see this by opening the CSV file with a text editor, e.g. RStudio.
name instead of the column index. The second line of code makes use of the list operator $
which directly addresses one particular column. (In fact, a data frame is similar to a list which
we will cover later.) After we have selected the entire column we select the first element of
that vector. The third line of code is exactly the same as the one before. We just use the [[
operator instead of the $ operator to access the desired column.
Suppose we now want to access the first three elements of column Last . Instead of providing
a single row index, we provide a vector of indices. To create a vector we use the function c .
This is how we create a simple vector:
c(1, 2, 3)
## [1] 1 2 3
To access the first three elements of column Last we write, for example:
obx$Last[c(1, 2, 3)]
However, if you would like to display the first 50 rows the previous approach will be too
cumbersome. An easy way out of this problem is to use the operator : instead. This operator
creates a vector (or sequence) of numbers.
1:50
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50
The above command creates a vector between 1 and 50. Further, you see that every line
starts with a number given in square brackets. This number refers to the index of the number
given to the right of it. (In our example, this information is not very useful because the vector that we
have created itself consists of the indices.) To combine everything (with the output hidden):
obx[1:50, ]
Finally, what about displaying the last three rows of the data set? We use the function
nrow to determine the number of rows in our data frame.
obx[(nrow(obx) - 2):nrow(obx), ]
An easier way to obtain the same result is to use the function tail with the appropriate
arguments.
tail(obx, n = 3)
As you can see from the output, whenever there is data missing we get NA for “not available”.
This is a special data type in R that we will repeatedly encounter when working with data.
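The column names can be changed by assigning a vector of new names to names(obx) ; a sketch (the names chosen here are an assumption):

names(obx) <- c("Date", "Last", "High", "Low", "Turnover")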
You may have noticed that the function call differs from the previous example. It is now on
the left hand side of the assignment operator. This needs some explanation. Recall that obx is
an object. An object may have attributes which can be altered. In our case we alter the column
names of obx by assigning a vector of new names. To create a vector we use the function c .
You can also use names directly to check what the column names are. The function returns
you a vector of strings.
names(obx)
obx[1, 1]
## [1] "12.01.18"
As this output shows R did not convert the column to type Date. (In fact, it was converted to
type factor but, again, more on this later.) Let’s do the conversion using the function as.Date .
We see from one of the previous outputs that the first part refers to the day, the second to the
month and the third to the two-digit year. Note, that this does not necessarily apply in your
case! You may have a totally different date format in your text file, depending on your system
settings. To convert the column into a Date object we use:
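A sketch (assuming the date column is named Date and uses the day.month.two-digit-year format shown above):

obx$Date <- as.Date(obx$Date, format = "%d.%m.%y")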
The first argument to the function is the date column (or vector) that we would like to
convert into a Date object. The second tells the function how it needs to interpret the content.
%m refers to the month, %d to the day and %y to the (two-digit) year. We also tell the function
that the components are separated by a period (“.”). If your date is formatted differently you
need to apply another conversion specification. You can get a complete list of all available
specifications by calling the help page of ?strftime .9 So far so good. Again we need to add
format= so that R understands what the second argument stands for. Look at the help page.
From the Usage section we see that the first argument is always the object that we intend to
convert. The second argument can be either format or origin but both have totally different
meanings. So R cannot know which of the two arguments we provide unless we name them.
Finally, we write the result back to the same column, that is, we overwrite its previous content.
Now let’s check whether it was correctly converted:
obx[1, 1]
## [1] "2018-01-12"
This seems to be correct. To be really sure that the column is now indeed of type Date use
the function class .
class(obx[, 1])
## [1] "Date"
By the way, Date objects are internally stored as integer numbers which refer to the number
of days since 1970-01-01. So if you add 1 to a Date object it returns the next following
date. For example:
as.Date("2018-01-02") + 1
## [1] "2018-01-03"
Next we need to sort the data in chronological order for which we use the function order .
This function returns the indices of an unsorted vector that helps to sort the vector. A little
example will demonstrate its usage.
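For instance (the values below are chosen to reproduce the output that follows):

v <- c(50, 30, 40, 10)
i <- order(v)
i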
## [1] 4 2 3 1
v[i]
## [1] 10 30 40 50
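Applied to our data frame, the sorting step might look like this (a sketch, assuming the date column is named Date):

obx <- obx[order(obx$Date), ]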
9
Alternatively, you can access the information online: https://stat.ethz.ch/R-manual/R-devel/library/
base/html/strptime.html
Since we do not use the columns Low , High , and Turnover we remove these columns
from our data frame. We do this by selecting the columns that shall remain:
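For example (a sketch with the column names assumed above):

obx <- obx[, c("Date", "Last")]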
So far we have only printed (part of) the data, which gives us a first impression of what it
looks like. However, to look at the complete time series of our index it is probably better to
plot it.
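A sketch of such a call, using the arguments described below:

plot(obx$Date, obx$Last, type = "l", xlab = "Date", ylab = "Index")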
[Figure: time series of the OBX index level]
The first argument of the plot function gives the x coordinates, and the second the y
coordinates. The argument type="l" tells the function to draw lines (instead of just points).
The arguments xlab and ylab define the labels of both axes.
In order to estimate the index characteristics we first need to compute returns. We use log
returns for this purpose, that is
$$r_t = \log(S_t) - \log(S_{t-1}).$$
obx$r = c(NA, diff(log(obx$Last)))
The arithmetic function log computes the natural logarithm of its argument. Since we
provide the complete column vector the function computes the log of each number in that vector.
The function diff takes successive differences of a vector of numbers. For example:
diff(c(1, 2, 4, 8))
## [1] 1 2 4
Since we take differences we lose one observation. To end up with a vector of the same length
as the initial price vector we add an NA at the beginning of the vector. This makes sense
because there is no return available at t = 0. The next step is to add a new column r to
our data frame. We now have done all the necessary steps and are ready to compute some
descriptive statistics.
Before we do so we save our data frame to a file because for very large CSV files it may take
a while to import the data which is not what we want. Given that we have imported the CSV
file correctly we can save it as an R data file, which uses compression and will thus not be a
text file anymore. The R data files typically have “RData” as an extension (suffix).
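A sketch of such a call:

save(obx, file = "OBX.RData")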
Let’s remove obx from the environment and then load the data in again but this time from
the R data file. Of course, you won’t notice any speed advantage here because the file is so tiny.
rm(obx)
load("OBX.RData")
head(obx)
## Date Last r
## 5532 1995-12-29 NA NA
## 5531 1996-01-02 77.9852 NA
## 5530 1996-01-03 79.7573 0.022469208
## 5529 1996-01-04 79.7102 -0.000590716
## 5528 1996-01-05 79.4104 -0.003768215
## 5527 1996-01-08 80.3380 0.011613392
Kahoot! 1
mean(obx$r)
## [1] NA
sd(obx$r)
## [1] NA
Oops, what has happened? Recall that we have missing values ( NA ) in the returns. If there
is just one single NA in the vector, both functions (and many others) will return NA . That
shall make you aware that something might be wrong. Well, in our case we know what is going
on and enforce the calculation by providing the
appropriate argument:
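For example (a sketch; na.rm = TRUE tells both functions to drop missing values before the computation):

mean(obx$r, na.rm = TRUE)
sd(obx$r, na.rm = TRUE)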
## [1] 0.0004131256
## [1] 0.01492927
Alright, we now have the result. But the return and volatility look very small. Don’t forget
we calculated both on daily data and, thus, we have a daily expected return and volatility. We
need to scale them to get the results on an annual basis. The daily mean return is scaled by multiplying
it by the number of trading days in a year, while the volatility is scaled by the square root of
the number of days. Do you remember why?
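A sketch of the annualization, assuming 250 trading days per year:

mu <- mean(obx$r, na.rm = TRUE) * 250
sigma <- sd(obx$r, na.rm = TRUE) * sqrt(250)
mu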
## [1] 0.1032814
sigma
## [1] 0.2360525
mu/sigma
## [1] 0.4375357
A Sharpe ratio like this is not too bad. But beware, this is an ex-post figure and not an ex-ante
measure.
While the volatility can be estimated reasonably well it is much harder to get a precise
estimate for the expected return if only a limited number of observations is available (e.g., less
than 15 years). Let’s verify this. How good is our estimate?
For this we make use of an important theorem in statistics. It is the elementary central limit
theorem that states that if the n observations of random variables r1, r2, . . . , rn are independent
and identically distributed with expectation µ and variance σ², then the sample mean
$$\bar{r}_n = \frac{1}{n}\sum_{i=1}^{n} r_i = \hat{\mu} \qquad (2)$$
is an estimate of the true mean µ and satisfies
$$\hat{\mu} \;\xrightarrow{\,n\to\infty\,}\; N\!\left(\mu,\; \frac{\sigma}{\sqrt{n}}\right) \qquad (3)$$
Because the term σ/√n is so important it is called the standard error (SE). Knowing the
distribution of the sample mean (in the limit) is an extremely powerful result. This allows
us to make the following inference. What is the probability p that the true value µ lies in the
confidence interval [µ̂−z ·SE; µ̂+z ·SE] with z being the corresponding quantile of the standard
normal distribution?
To compute the interval we need to determine SE and z for a given confidence level p. SE
can be calculated once we know σ. However, the parameter σ would typically be unknown in a
setting in which µ is also unknown. There is a way out of this dilemma. We can easily estimate
the standard deviation:
$$\hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(r_i - \bar{r}_n\right)^2} \qquad (4)$$
For brevity we now define the variables r and n and compute SE:
r <- na.omit(obx$r)
n <- length(r)
SE <- sd(r)/sqrt(n) * 250
SE
## [1] 0.05018987
The function na.omit takes a vector and returns again a vector with all NA being removed
first. The function length determines the length of a vector, that is, the number of elements it
comprises. We annualize the standard error because we want to make inferences on the
annualized return. Be careful, the standard error is annualized by multiplying the daily standard
error by 250.
Now we need to determine p. From statistics we know that for a standard normal random
variable Z we have
$$\mathrm{Prob}(Z \le -z) \equiv \Phi(-z) = \frac{1-p}{2} \qquad (5)$$
with Φ being the standard normal distribution function. See Figure 4.3. Taking the inverse we
get
$$z = -\Phi^{-1}\!\left(\frac{1-p}{2}\right) \qquad (6)$$
The (standard) normal distribution function Φ in R is called pnorm and its inverse function
Φ−1 is qnorm . Try to reproduce my hand-made plot using the normal density function
dnorm ! Let’s assume a confidence probability of p = 0.99, then
p <- 0.99
z <- -qnorm((1 - p)/2)
z
## [1] 2.575829
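The confidence bounds then follow from the estimate and the standard error (a sketch of the two calls whose output is shown below):

mu - z * SE
mu + z * SE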
## [1] -0.02599912
## [1] 0.2325619
The bottom line is that the mean estimate is still not very precise and can vary tremendously
despite the long sample period of 22 years!
2.5 Exercises
1. Repeat the exercise in this section with return observations starting in 2010. What is
the 99.9% confidence interval?
2. Download the complete history of monthly stock prices in USD for IBM (Ticker: IBM) from
finance.yahoo.com.10 Plot the complete time-series of stock prices. Compute simple and
log returns. Based on both series compute an estimate for the expected return. Which
one is larger? Why? Further compute the 1% quantiles.
3. Download the complete history of daily and monthly stock prices in USD for Microsoft
(Ticker: MSFT) from finance.yahoo.com. For both sampling frequencies compute the
annualized means, standard deviations, variances, and 99% confidence intervals. Compare
the results.
Kahoot! 2

10 Historical data can be downloaded by clicking on the tab Historical Data.
3 Volatility
First, plot the returns and analyze their pattern. What do you see?
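A sketch of such a plot:

plot(obx$Date, obx$r, type = "l", xlab = "Date", ylab = "Return")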
[Figure: daily OBX log returns over time]
• Remember, you are not so much interested in past volatility but in future volatility. You
need to predict volatility!
Let’s define the volatility σt of a market variable on day t as estimated at the end of day t − 1.
The square of the volatility is referred to as the variance, σt². Recall that the definition of the
unbiased sample variance of a series of n return observations rt−n, . . ., rt−2, rt−1 is
$$\sigma_t^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(r_{t-i} - \bar{r}\right)^2 \qquad (7)$$
For risk-management purposes this formula is usually simplified in two ways:

• The mean return r̄ is assumed to be zero.
• n − 1 is replaced by n.

Thus we get
$$\sigma_t^2 = \frac{1}{n}\sum_{i=1}^{n} r_{t-i}^2. \qquad (8)$$
Every return observation contributes equally to the variance. That is, the one back in 1996
has the same weight as the one from yesterday. To estimate the current level of volatility it is much
more sensible to put more weight on more recent observations. A more general model for the
variance is
$$\sigma_t^2 = \sum_{i=1}^{n} \alpha_i\, r_{t-i}^2 \qquad (9)$$
with αi being the weight for the return observation on day t − i. If we choose αi−1 > αi then less weight
is given to older observations. The weights must sum to unity. Let’s discuss some specific
models.
3.1 EWMA model

In an exponentially weighted moving average (EWMA) model the variance estimate is updated as
$$\sigma_t^2 = \lambda\,\sigma_{t-1}^2 + (1-\lambda)\,r_{t-1}^2. \qquad (10)$$
The estimate of the volatility σt made at the end of day t − 1 is a weighted combination of
the estimate σt−1 and the return realization rt−1. If we substitute for σt−1 and continue that
for all n observations we get
$$\sigma_t^2 = (1-\lambda)\sum_{i=1}^{n} \lambda^{i-1} r_{t-i}^2 + \lambda^{n}\sigma_{t-n}^2. \qquad (11)$$
For a sufficiently large n the last term becomes negligible, and the EWMA variance simplifies to
$$\sigma_t^2 = (1-\lambda)\sum_{i=1}^{n} \lambda^{i-1} r_{t-i}^2. \qquad (12)$$
• σt is the volatility estimate for day t given all information up to (and including) day t − 1.
It’s a forecast!
• The λ governs how responsive the estimate of the daily volatility is to the most recent
return realization.
• A low λ gives a great deal of weight to recent return observations whereas a high λ
produces estimates that respond slowly to new information.
• In practice λ is often set to 0.94 as suggested by RiskMetrics but it can also be estimated by
maximum likelihood (more on this later).
Now we are ready to implement Eq. (12). Let’s assume that λ = 0.94. We first create a
vector lambda.vec and then multiply that one with the corresponding squared return series
r.vec . Let’s do this step by step:
n <- length(r)
lambda <- 0.94
lambda.vec <- lambda^(0:(n - 1))
r.vec <- r[length(r):1]
sigma2 <- (1 - lambda) * sum(lambda.vec * r.vec^2)
sigma <- sqrt(sigma2)
sigma * sqrt(250)
## [1] 0.09773794
Given a vector of n return observations rt−n , . . . , rt−1 this is the predicted (annualized)
volatility for day t based on the EWMA model. Note that this volatility is way lower than the
sample standard deviation that we computed earlier which was:
sd(r.vec) * sqrt(250)
## [1] 0.2360525
## [1] 0.09773794
Instead of selecting the most recent return observations first, the second most recent next and
so on, we simply reverse the complete return vector using the function rev .
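For example (a sketch, equivalent to the indexing used above):

r.vec <- rev(r)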
As an exercise we now compute and plot the “historical” EWMA volatilities using an ex-
panding window. That is, each day, we use all past observations available for our calculation.
To put it differently, we “loop” over the entire return series. We thus first need to introduce
the for statement. How does that work? A little example will demonstrate.
v <- 1:5
for (i in v) {
cat(i, "\n")
}
## 1
## 2
## 3
## 4
## 5
The variable i will “run” through the vector v taking each value one by one. In each
cycle we print the value of the counter variable i to the console. Recall that cat prints the
value of a variable. The “\n” is a special character that makes a new line. Putting everything
together we now compute the historical EWMA volatilities like this:
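A sketch of this computation (assuming r holds the returns in chronological order):

sigma2 <- c()
for (i in 1:length(r)) {
  w <- lambda^((i - 1):0)                              # most recent return gets weight lambda^0
  sigma2 <- c(sigma2, (1 - lambda) * sum(w * r[1:i]^2))
}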
Note that we first initialize the variance vector sigma2 , that is we create the variable but
do not write anything into the vector. It is of length zero. We then sequentially add variance
estimates to sigma2 by going forward in time. How does the resulting time series of volatilities
look?
[Figure: historical EWMA volatility estimates (annualized), computed with an expanding window]
Are the EWMA volatility estimates at the beginning of the sample period good forecasts?
Why? Why not? The EWMA volatility estimate of the total sample period is a reasonable risk
measure that we could report to our client. It is a better short-term predictor than the sample
standard deviation.
3.1.1 Maximum likelihood approach
What is the likelihood of observing rt if we knew its volatility σ (or equivalently its variance
σ 2 )? Well, since we assume returns to be normally distributed it is simply defined by its
probability density function. From standard statistics textbooks we know that the density of r = rt
is given by
$$\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{r_t}{\sigma}\right)^2}, \qquad (13)$$
assuming µ is zero. What is the likelihood of observing all n observations together? Again,
we know from statistics that if returns are independent then the likelihood of observing the n
observations is just the product of the probability densities of all individual observations. The
so-called likelihood function is:
$$\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{r_{t-i}}{\sigma}\right)^2}. \qquad (14)$$
It is equivalent, and more convenient, to maximize the logarithm of this function. Ignoring constant terms and factors, the log likelihood function is
$$\sum_{i=1}^{n}\left(-\log(\sigma^2) - \frac{r_{t-i}^2}{\sigma^2}\right). \qquad (15)$$
We maximize this function by differentiating with respect to σ² (not w.r.t. σ) and setting the
final expression to zero.
$$-\frac{n}{\sigma^2} + \frac{1}{\sigma^4}\sum_{i=1}^{n} r_{t-i}^2 = 0 \qquad (16)$$
The variance that maximizes the log likelihood function is given by:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} r_{t-i}^2 \qquad (17)$$
Of course, this expression looks familiar to us. The only difference is the 1/n versus the
1/(n − 1) that we find in the equation of the sample standard deviation. In this example we
applied the log likelihood method and found an analytical solution for the variance σ 2 . Note
that for more complex cases we may not get a closed-form expression for the parameter
in question and, thus, we need to resort to a numerical approach. This is what we do next with
regard to the EWMA model.
In what follows we use the same method to derive a solution numerically for the EWMA
volatility. The only difference to the example above is that the variance is assumed to be time-
varying, that is, we have σt2 . The log likelihood function (after some simplifications) is given
by
$$\sum_{i=1}^{n}\left(-\log(\sigma_{t-i}^2) - \frac{r_{t-i}^2}{\sigma_{t-i}^2}\right) \qquad (18)$$
with the expression for the EWMA variance given in Eq. (12).
Again, we want to maximize the log likelihood function numerically. How can we do this in
R? For this we first need to discuss the following topics:
• How to write functions in R?
• How to optimize in R?
3.1.2 Functions in R
Let’s first define a function div in R that divides a number a by b. This is how you do it:
div <- function(a, b) {
result <- a/b
return(result)
}
We define the function div using the command function . For the function name the same
rules apply as for any other variable name. In parentheses we tell R what arguments the function
requires. In our example we have the arguments a and b which are both being processed
inside the function body. Note that we do not necessarily need to write return(result) , in-
stead we could just write result or a/b . But it is good programming style since it makes
immediately clear what the function returns. After having defined the function we can enter its
name.
div
## function(a, b) {
## result <- a/b
## return(result)
## }
If we just enter the function name then R returns the definition of the function that is,
it does not call the function. We know, in order to call the function we need to provide its
arguments:
div(4, 2)
## [1] 2
div(2, 4)
## [1] 0.5
As you see, it is highly important (in this case) to provide the arguments in the right order.
Thus, you need to know how your function is defined. If you use internal functions or functions
provided by someone else make sure that you know how to run it. Check the help page!
Nevertheless we have some flexibility in calling the function. This brings us back to the “named
argument” concept. Both calls return the same result:
div(4, 2)
## [1] 2
div(b = 2, a = 4)
## [1] 2
In the first example R interprets the arguments as given in the function definition. In the second
example we tell R how to interpret the arguments and thus we do not strictly need to follow the
function definition in terms of the sequence of the arguments. This gives us some flexibility.
Sometimes you want to have default values for some of the arguments so that you do not
necessarily need to provide them for each function call. Suppose for the moment, that you may
want to divide a number a by 2 in most of the cases. You could define a function:
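A definition along these lines reproduces the calls shown below; b gets a default value of 2:

div2 <- function(a, b = 2) {
  return(a/b)
}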
Now you have the option whether to provide the denominator or not:
div2(4)
## [1] 2
div2(4, 3)
## [1] 1.333333
There is one more important thing to mention about functions. A frequent requirement is
to allow one function to pass on argument settings to another nested function. This can be done
by including an extra argument, literally ... , to the function, which may then be passed on.
Suppose we want to write a function plot.normal that plots the normal density with mean
mu and volatility sigma from x1 to x2 . This is one way we could do it:
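A sketch of such a function (the step size of the grid is an assumption):

plot.normal <- function(mu, sigma, x1, x2) {
  x <- seq(x1, x2, by = 0.01)
  y <- dnorm(x, mean = mu, sd = sigma)
  plot(x, y, type = "l")
}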
Note that the previous function makes use of the function seq which creates a vector of values
starting with x1 and stopping at x2 with increments given by by . We can also define
the function above using the ... argument, which does just the same but is shorter and more
versatile.
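A sketch of the shorter version; the extra arguments are passed straight on to dnorm :

plot.normal <- function(x1, x2, ...) {
  x <- seq(x1, x2, by = 0.01)
  y <- dnorm(x, ...)
  plot(x, y, type = "l")
}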
But what is the difference? We did not specify the properties of the normal density in plot.normal .
Using the ... argument in our function specification allows us to pass on further arguments
in the function call. These arguments are then used in the function body. In our case they are
provided to the function dnorm . There they are interpreted as the mean and the volatility.
Check the dnorm help page that this is indeed true.
3.1.3 Optimization in R
Now let’s do some optimization. I will demonstrate the optimization procedure with a little
example. Assume we want to find the maximum of a function y = f (x) = −(x − 2)2 which
looks like this:
[Figure: the function y = −(x − 2)², which attains its maximum at x = 2]
It is quite obvious from the figure as well as from the equation that the maximum must be at
x = 2. However, we would like to find it numerically using optimization. The optimization
problem formally is given by:
$$\max_{x}\; f(x) = \max_{x}\; -(x-2)^2$$
We maximize the objective function f by varying x. Let’s define the function f in R first:
f <- function(x) {
y <- -(x - 2)^2
return(y)
}
One way to search for x is by trial and error so that f (x) is maximized:
f(1.5)
## [1] -0.25
f(2)
## [1] 0
f(2.5)
## [1] -0.25
Of course, this may be too cumbersome if we have a very complex function and no idea where
the optimal solution is. Thus we use an optimizer, similar to the Excel Solver. There are many
optimizers available in R, each to be used for a specific need. For example, some optimizers
are designed for linear, some for non-linear problems. Some optimizers allow you to specify bound-
aries, some do not. We are going to use nlm here. Note, however, that this function carries
out a minimization instead of a maximization. This is no problem because we can redefine
our objective function. Maximizing f (x) is the same as minimizing −f (x). Let’s redefine f
accordingly.
f <- function(x) {
y <- (x - 2)^2
return(y)
}
Then apply the optimization and assign the result to a variable res :
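A sketch of the call (the starting value of 1 is an assumption):

res <- nlm(f, 1)
res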
## $minimum
## [1] 0
##
## $estimate
## [1] 2
##
## $gradient
## [1] 0
##
## $code
## [1] 1
##
## $iterations
## [1] 2
The first argument of nlm is the name of the objective function which in our case is f . The
second argument is used to tell the solver where to start to search for the minimum. The
function returns a list which is a collection of variables of different types similar to a data frame.
The list contains the following components:
minimum
Value of the estimated minimum of f .
estimate
Point at which the minimum value of f is obtained.
gradient
Gradient at the estimated minimum of f .
code
An integer indicating why the optimization process has been terminated. For example, “1”
indicates a probable solution.
iterations
Number of iterations performed.
Kahoot! 3
3.1.4 Searching for the optimal EWMA parameter
Now it is time to come back to our original problem. Remember we have the EWMA volatility
model that has one parameter which is λ. We want to estimate λ so that the model best reflects
our historical data. We need to:
1. Define a function ewma.var that computes the EWMA variance following Eq. (12).

2. Use ewma.var to compute the historical EWMA variances.

3. Specify an objective function which is the maximum likelihood function given in Eq. (18).
Let’s call this function ewma.ll.fun .
Let’s do this step-by-step. We start by defining a function that computes the EWMA variance
given the parameter λ.
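A sketch of such a function (the exact implementation in the original notes may differ):

ewma.var <- function(r, lambda) {
  ## EWMA variance forecasts following Eq. (12)
  ## r: returns in chronological order; lambda: decay parameter
  sigma2 <- c()
  for (i in 1:length(r)) {
    w <- lambda^((i - 1):0)
    sigma2 <- c(sigma2, (1 - lambda) * sum(w * r[1:i]^2))
  }
  return(sigma2)
}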
It is always a good idea to comment what a function does, what arguments are needed and
what exactly the function returns. Now we use the previous function to compute the historical
EWMA variances.
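For example (the variable name is an assumption):

ewma.sigma2 <- ewma.var(r, 0.94)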
We then define the log likelihood function which serves as the objective function in our
optimization problem.
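A sketch, matching the description below (forecasts and realizations are aligned by dropping the last forecast and the first squared return):

ewma.ll.fun <- function(lambda, r) {
  sigma2 <- ewma.var(r, lambda)        # historical EWMA variance forecasts
  sigma2 <- sigma2[-length(sigma2)]    # drop the last forecast (no realization)
  r2 <- r[-1]^2                        # drop the first realized squared return
  ll <- sum(-log(sigma2) - r2/sigma2)  # log likelihood, Eq. (18)
  return(-ll)                          # negative because the optimizer minimizes
}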
The first argument of the objective function is the parameter to be searched for, i.e. lambda .
This is important because every optimization routine in R assumes the first argument of the
objective function to be the unknown parameter. We then calculate a vector of historical
EWMA variances. Next we need to match the vector of forecasted and realized variances. That
is why we drop the last forecast (because we do not have a corresponding realization) and the
first realized variance (because we have no forecast right at the beginning of our sample period).
Note that we use negative indices to de-select values in a vector. Finally, we return the negative
of the likelihood function because almost all optimization routines in R minimize instead of
maximize a function.
We choose to use nlminb for this purpose because, contrary to nlm , it allows us to constrain
the parameters. This makes sense here because λ is only defined between 0 and 1.
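A sketch of the call (starting value and bounds are assumptions; the bounds stay strictly inside the unit interval):

res <- nlminb(0.5, ewma.ll.fun, lower = 0.001, upper = 0.999, r = r)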
The first parameter shall be a sensible starting value for λ, i.e. the objective function needs
to be defined there. Further note that contrary to nlminb the first parameter for nlm was
the objective function. The second argument is the objective function (the negative of the
likelihood function). The variable lower is the lower bound of λ. Here we do not choose zero
because zero is not a valid value for the objective function. Try it. The variable upper is the
upper bound of λ. The same applies as for the lower bound restriction. Finally, r is the vector
of all return observations. This is the ... argument in the nlminb function definition. So
nlminb just passes on the variable to our objective function ewma.ll.fun . Now let’s look at
the results. The function returns an object of type list:
res
## $par
## [1] 0.9231018
##
## $objective
## [1] -43123.89
##
## $convergence
## [1] 0
##
## $iterations
## [1] 7
##
## $evaluations
## function gradient
## 11 9
##
## $message
## [1] "relative convergence (4)"
par:
The optimal λ that maximizes the log-likelihood or (equivalently) minimizes the objective
function. It is almost the same value as suggested by RiskMetrics!
objective:
Value of the objective function.
convergence:
Indicates the optimization routine has converged (see help page).
iterations:
Number of iterations needed until it is considered to have converged.
Again, the parameter λ that we have estimated is very close to the one recommended by
RiskMetrics. You can even test different lengths of the sample period. The parameter does not
change dramatically. What is the EWMA volatility forecast given λ = 0.92?
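A sketch of the comparison (annualized, first with λ = 0.94, then with the estimated λ):

sigma2 <- ewma.var(r, 0.94)
sqrt(sigma2[length(sigma2)] * 250)
sigma2 <- ewma.var(r, res$par)
sqrt(sigma2[length(sigma2)] * 250)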
## [1] 0.09773794
## [1] 0.09517522
The difference is negligible. Again, the EWMA estimate would be a reasonable estimate for
today’s volatility of the OBX index. However, to impress your client even further you look for
a more advanced variance model which we discuss in the next section.
3.2 GARCH model

The GARCH(1,1) model specifies the variance as
$$\sigma_t^2 = \gamma V_L + \alpha r_{t-1}^2 + \beta\sigma_{t-1}^2 \qquad (20)$$
with weights that sum to unity,
$$\gamma + \alpha + \beta = 1 \qquad (21)$$
We see that the variance is calculated from a long-run variance rate VL as well as from σt−1² and
rt−1². In this respect it is similar to the EWMA model, with a small but important extension.
The “(1,1)” indicates that the forecast σt² depends on the most recent squared return observation
rt−1² and the most recent estimate of the variance rate σt−1². Setting ω ≡ γVL we get
$$\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta\sigma_{t-1}^2 \qquad (22)$$
Note that for a stable GARCH(1,1) model we need that α + β < 1 because otherwise we would
have a negative weight on the long-term variance.
The GARCH(1,1) model recognizes that over time the variance tends to get pulled back
to its long-run mean VL . Thus, the GARCH process follows a mean-reverting process whereas
the EWMA model does not incorporate mean-reversion. This is the important difference which
makes the GARCH model more appealing than the EWMA model.
Moreover, in contrast to the EWMA model, there are no standard parameters that can be
used. We need to estimate them using maximum likelihood. The log likelihood function is the
same as for the EWMA model. Recall that it is given by
$$\sum_{i=1}^{n}\left(-\log(\sigma_{t-i}^2) - \frac{r_{t-i}^2}{\sigma_{t-i}^2}\right). \qquad (23)$$
The only difference, of course, is how σt² is defined. Analogously to the EWMA model we
first define a function garch.var that computes the GARCH variance following Eq. (22):
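A sketch of such a function (the initialization of the recursion is an assumption, see the discussion below):

garch.var <- function(r, omega, alpha, beta) {
  ## Recursive GARCH(1,1) variance forecasts following Eq. (22)
  sigma2 <- r[1]^2                     # initial forecast: first squared return (one common choice)
  for (i in 2:length(r)) {
    sigma2 <- c(sigma2, omega + alpha * r[i]^2 + beta * sigma2[i - 1])
  }
  return(sigma2)
}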
This needs some explanation. The formula for the GARCH variance is a recursive formula, that
is, it depends on previous variance estimates.
t − n: This is the first return observation in our sample. We cannot make any forecast at t − n because we do
not have any information before that date.

t − n + 1: We can still not apply our recursive formula given by Eq. (22) because we do not know the
forecast σt−n². However, at some point we need to start, so we initialize the recursion with a
simple guess, for example the squared return observed at t − n.

t − n + 2: From now on we can recursively apply Eq. (22) using the for statement.
We now define the log likelihood function to estimate the GARCH model:
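A sketch of the objective function, with all three parameters packed into one vector as described below:

garch.ll.fun <- function(par, r) {
  omega <- par[1]
  alpha <- par[2]
  beta <- par[3]
  sigma2 <- garch.var(r, omega, alpha, beta)
  sigma2 <- sigma2[-length(sigma2)]    # drop the last forecast
  r2 <- r[-1]^2                        # drop the first realized squared return
  ll <- sum(-log(sigma2) - r2/sigma2)  # log likelihood, Eq. (23)
  return(-ll)
}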
Again, the first argument of the objective function must always be the parameter that we are
looking for. In this case, however, we have three parameters (i.e. ω, α, and β). This is no
problem because we can put them into one single vector called par that shall be the first
argument of the function. We are now ready to do the optimization.
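A sketch of the call (starting values and bounds are assumptions):

res <- nlminb(c(1e-05, 0.1, 0.8), garch.ll.fun, lower = 1e-08, upper = 0.9999, r = r)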
What do we find?
res
## $par
## [1] 3.253296e-06 1.101613e-01 8.744163e-01
##
## $objective
## [1] -43204.7
##
## $convergence
## [1] 0
##
## $iterations
## [1] 26
##
## $evaluations
## function gradient
## 63 94
##
## $message
## [1] "relative convergence (4)"
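The long-run variance rate follows from ω ≡ γVL with γ = 1 − α − β (a sketch):

omega <- res$par[1]
alpha <- res$par[2]
beta <- res$par[3]
VL <- omega/(1 - alpha - beta)
VL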
## [1] 0.0002109465
sqrt(VL * 250)
## [1] 0.2296446
This is nearly the same as the unconditional standard deviation that we have estimated earlier.
3.3 Volatility forecasts

What does the EWMA model imply for future volatility? The expected variance for any day ahead is simply today’s estimate:
$$E_{t-1}[\sigma_{t+m}^2] = \sigma_t^2 \qquad (24)$$
To put it differently, although we weight recent return observations more heavily than past
observations, the daily volatility forecast remains constant.
What about the GARCH model? Here we need to be more careful because the GARCH
model was specified as a mean-reverting process. This implies that the volatility is reverting
to its long-run mean VL . For example, if the volatility today is above its long-run mean we
“expect” the future volatility to be pulled back to its long-run mean. The opposite is true if it is
below its long-run mean. Formally, the expected variance m + 1 days ahead is
$$E_{t-1}[\sigma_{t+m}^2] = V_L + (\alpha + \beta)^{m}\left(\sigma_t^2 - V_L\right). \qquad (25)$$
This equation tells us that our prediction for future volatility is not constant, in contrast to
the EWMA volatility model or the simple sample standard deviation. We get
a term-structure of volatilities.
It’s time now to put everything together and make a forecast, for example, for the return
from t to t + m. I am going to use the following notation: rt→t+m. This return is not known as of
now; it is random, of course. It consists of all m + 1 daily (random) returns up to t + m, that is:
$$r_{t\to t+m} = \sum_{i=0}^{m} r_{t+i} \qquad (26)$$
Since daily returns are assumed to be independent, the variance of this total return is the sum of the daily variances,
$$\mathrm{Var}[r_{t\to t+m}] = \sum_{i=0}^{m} \mathrm{Var}[r_{t+i}] \qquad (27)$$
which for the EWMA model, where all variance forecasts equal σt², reduces to
$$\mathrm{Var}[r_{t\to t+m}] = (m+1)\,\sigma_t^2 \qquad (28)$$
This is the standard rule you are familiar with for scaling variances (and volatilities)!
Again, for the GARCH model the forecasted variances are not constant but mean-reverting.
Thus we have
$$\mathrm{Var}[r_{t\to t+m}] = \sum_{i=0}^{m} \sigma_{t+i}^2 \qquad (29)$$
where σt+i² is defined as in Eq. (25).
Let’s make a real forecast using both models. Suppose we want to make a one-year forecast.
For the EWMA model we get an annualized volatility of
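A sketch, reusing the daily EWMA volatility sigma computed earlier:

sigma * sqrt(250)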
## [1] 0.09773794
Using the GARCH model we need to compute predictions for all 250 trading days.
m <- 249
garch.sigma2 <- garch.var(r, omega, alpha, beta)
garch.sigma2.t <- garch.sigma2[length(garch.sigma2)]
garch.sigma2.vec <- VL + (alpha + beta)^(0:m) * (garch.sigma2.t -
VL)
[Figure: forecasted daily GARCH volatilities (annualized) over the next 250 trading days]
sqrt(sum(garch.sigma2.vec))
## [1] 0.2065874
How do you interpret both results? Why is the GARCH volatility higher than the EWMA
volatility?

Kahoot! 4

3.4 Option-Implied Volatility
So far we have used historical data to estimate future volatility. An alternative is to use options
traded on the index. We know that the price of an option depends on the volatility of the
underlying asset (in our case it’s the OBX index). Given that we observe the price of the
option we can back out its implied volatility. This approach does not rely on historical data.
In fact, the volatility that we estimate is forward-looking because option prices incorporate the
expectation about the future risk of the underlying.
First, we need to get information about options traded on the OBX index. We get this
information from the Oslo Stock Exchange.11 There are several options traded. Which one to
choose? Well, that depends. In principle it shouldn’t matter because according to the Black-
Scholes model there is just one volatility that prices all options traded on a given underlying.
Recall that according to the model volatility is constant! We thus would like to have an option
that is traded frequently so that its price indeed reflects investors’ opinions about the future
prospects of the underlying index. The most liquid options are usually traded at-the-money,
that is, the strike is close to the current value of the index which trades at 766 as of January
24, 2018. Options with shorter expirations typically also have higher liquidity. We decide to
take a call but put options would also be fine. The following quotes are obtained from the Oslo
Stock Exchange on January 24, 2018:
Note how large the bid-ask spread of this option is. What does this mean? We take the
mid-quote of the prices.
We then use the Black-Scholes formula. If you can’t remember, here is the formula again:
$$C = S_0\,\Phi(d_1) - K e^{-r_f T}\,\Phi(d_2)$$
$$d_1 = \frac{\log(S_0/K) + \left(r_f + \sigma^2/2\right)T}{\sigma\sqrt{T}}$$
$$d_2 = d_1 - \sigma\sqrt{T}$$
You should already be familiar with this famous equation. It will be more of a challenge to
transform these three equations into an executable R script. We will first define the parameters
11
Options on OBX: https://www.oslobors.no/ob_eng/markedsaktivitet/#/derivativeUnd/OBX.OSE
as given above, that is, we assign the numbers to the variables. For the risk-free interest rate rf
we use the Nibor, which should reflect the rate at which banks can borrow from each other. You
will get the Nibor rate for the corresponding maturity also from the Oslo Stock Exchange.12
As an approximation we use the 1-month Nibor rate.
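A sketch of the parameter definitions; all numerical values here are placeholders, not the actual quotes from the notes:

## Parameters as of the valuation date (all values are assumptions)
S <- 766                             # current OBX level
K <- 760                             # strike price of the call
C.market <- 15                       # mid-quote of the call price
rf <- 0.01                           # 1-month Nibor (annualized)
days <- as.numeric(as.Date("2018-03-16") - as.Date("2018-01-24"))
T <- days/365                        # time-to-maturity as a year fraction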
Again, it is good programming style to comment your script. You do this using the character
“#”. We compute the number of days between the valuation date and maturity by taking the
differences between those dates. Remember this returns you the number of days in between.
We then need to transform this value into a numeric using the command as.numeric . Don’t
forget to compute the year fraction of the time-to-maturity. Next, we define a function that
computes the price of a European call option.
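A sketch of such a function, following the Black-Scholes formula above:

call.price <- function(S, K, rf, sigma, T) {
  d1 <- (log(S/K) + (rf + sigma^2/2) * T)/(sigma * sqrt(T))
  d2 <- d1 - sigma * sqrt(T)
  S * pnorm(d1) - K * exp(-rf * T) * pnorm(d2)
}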
Now we must define the objective function. But what do we really want to maximize or minimize
here? We do not have a maximum likelihood function as for the EWMA or GARCH models.
Actually, we don’t need one here. Note that we are looking for a target σ which, when plugged into
the Black-Scholes equation, returns us the observed market price of the call option Cmarket .
The subscript reminds us that this is the market price. Thus we can write for the objective
function
$$f(\sigma) = \left(C_{\mathrm{BS}}(\sigma) - C_{\mathrm{market}}\right)^2$$
which refers to the squared error between model and market price. We thus wish to minimize
this function. The optimization problem is:
$$\min_{\sigma}\; f(\sigma)$$
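A sketch of the objective function and the optimization (the starting value is an assumption):

obj.fun <- function(sigma, S, K, rf, T, C.market) {
  (call.price(S, K, rf, sigma, T) - C.market)^2
}
res <- nlm(obj.fun, 0.2, S = S, K = K, rf = rf, T = T, C.market = C.market)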
res
## $minimum
## [1] 3.642174e-15
##
## $estimate
## [1] 0.1270815
##
## $gradient
## [1] -3.920127e-08
##
## $code
## [1] 1
##
## $iterations
## [1] 4
We find that the option-implied volatility is above the EWMA volatility but still much lower
than the GARCH volatility estimate and the simple sample standard deviation.
Proud of having estimated four different measures of volatility you now go back to your
client. But what estimate do you advise him to use? It very much depends on the investment
horizon of your client. For short-term investments of less than three months, let’s say, your client
should use the option-implied volatility or the EWMA volatility. For “long-term” investments
your client should instead use the GARCH volatility estimate.
3.5 Exercises
1. Download the complete history of monthly stock prices in USD for Google (Ticker: GOOG)
from finance.yahoo.com. Estimate an EWMA model to forecast the 1-month volatility.
2. Download the complete history of monthly stock prices in USD for Google (Ticker: GOOG)
from finance.yahoo.com. Estimate a GARCH(1,1) model to forecast the 1-month volatil-
ity.
3. Go to finance.yahoo.com and search for call options on Google. Choose the shortest
maturity and compute the implied volatilities of a range of options with varying strike
prices. Assume a risk-free interest rate of 50 bp. Can you replicate the implied volatilities
that are given on the same page?
4 Monte-Carlo simulation
WKN PR1LWN
Name Knock-out warrant call
Type Down-and-out call option
Underlying EUROSTOXX 50
Strike price 3220
Barrier level 3220
Selling date January 27, 2017
Selling price 0.84
Maturity April 27, 2017
Ratio 1:100
A down-and-out call option is a barrier option. The payoff depends on whether the price of
the underlying falls below a certain barrier level during its life time. A knock-out option ceases
to exist when the underlying falls below a certain barrier level.
Let’s make this a little bit more formal now. A down-and-out call option with barrier b,
strike K, and expiration T has the payoff
$$\left(S(T)-K\right)^{+}\cdot \mathbf{1}\!\left\{\min_{0\le t\le T} S(t) > b\right\}$$
that is, the call payoff is received only if the underlying never falls below the barrier during the
option’s life.
Normally, we would try to solve the above equation in closed-form. That is, obtain an
equation where we plug in all the numbers to calculate the price of the option. However,
discretely monitored barrier options cannot be solved analytically. Therefore, one either needs to
make some simplifications or resort to a numerical approach. A widely used numerical approach
in finance is Monte-Carlo simulation. Although continuously monitored barrier options can be
priced in closed-form we still frequently rely on Monte-Carlo techniques. The reason is that
Monte-Carlo methods are straightforward to implement and are the most general technique for
the pricing of derivatives.
• Monte-Carlo simulation essentially uses the law of large numbers to evaluate the expecta-
tion E[X] of a random variable X.
• The law of large numbers states that the sample mean of a sequence of independent and
identically distributed random variables X1 , . . . , Xn , converges to the expected value E[X],
i.e.
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\xrightarrow{\,n\to\infty\,}\; E[X] \qquad (38)$$
• The mean X̄n is an estimator of the expected value E[X] and is usually denoted by X̂n.
Let's do a little example. If we roll a die, what is the expected roll? In this case it's trivial
to determine: it's simply the sum of each possible outcome times the probability of its occurrence. More
formally, the expected value of a roll X is
$$E[X] = \sum_{i=1}^{n} \Pr[X = x_i] \cdot x_i = \frac{1}{6}\cdot 1 + \ldots + \frac{1}{6}\cdot 6 = 3.5. \qquad (39)$$
So the answer is 3.5. But what if we did not know this? Alternatively, and quite intuitively,
you would start rolling the die many times and write down the rolls. After many rounds you
take the mean of the rolls. You should end up close to the true value of 3.5. Thus we can
compute, or rather estimate, an expectation by simply simulating and then taking the mean of the
outcomes. This is a very important result. Let's try it numerically using the computer. Since
the rolls are all equally likely to occur, they are uniformly distributed. How can we sample
from such a distribution in R? With the function sample :
n <- 1000
roll <- sample(1:6, n, replace = TRUE)
We have used the command sample to sample from a given set of elements. The elements are
a sequence of numbers between 1 and 6 corresponding to the outcomes of rolling a die. We
repeat this n times and tell the function to sample with replacement. That means any roll can
occur again in the next trial.
table(roll)
## roll
## 1 2 3 4 5 6
## 164 182 155 182 155 162
mean(roll)
## [1] 3.468
We use the function table to get a summary of all our drawings. It returns the number of
observations in our simulation. We see that they all occur with almost the same frequency. We
then compute the mean which intuitively is an estimate of the expected roll. We see it is fairly
close to the expected value.
We are now ready to apply the Monte-Carlo principle in a more general setting. In fact it
allows us to compute any expectation of a random variable. Note that, via the law of large numbers,
Monte-Carlo simulation is essentially a numerical method for evaluating integrals, because
$$E[X] \equiv \int x \cdot f(x)\,dx \qquad (40)$$
with f being the density function of X. The law of large numbers can also be applied to a
function of a random variable, i.e. Y = g(X):
$$E[Y] = \int g(x)\, f(x)\,dx \qquad (41)$$
Look at the similarity between the general definition of the expected value and the one we used
to compute the expected dice roll. The only difference is that in the dice example we have a
discrete distribution whereas the above equation refers to a continuous distribution.
Well, we first need a model for the dynamics of the index price which defines its distribution.
In the dice example above this was straightforward: the roll was uniformly distributed. Here we
follow the Black-Scholes model by assuming the dynamics to follow the stochastic differential
equation (SDE), whose solution over a time step $\Delta t$ can be written as
$$S(t + \Delta t) = S(t)\exp\!\left(\left(r_f - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\,\epsilon\right), \qquad (45)$$
where $\epsilon$ again refers to a standard normal random variable. It's worth remembering this equation
because it is so widely used in finance!
You may have noticed that our process is growing with the risk-free interest rate. Why is
that? Shouldn’t the expected return be much higher since we are exposed to risk? In fact, we
have estimated the expected return in the first case study but now we are simulating the price
process with the risk-free rate. Let’s look into this in more detail:
• In a risk-neutral world investors do not increase the rate of return beyond the risk-free
rate to be compensated for risk.
• We know that risky cash flows (such as those from our option) need to be discounted at
the right rate. The higher the risk, the higher the discount rate. But what rate exactly shall we
use? In the real world, everyone is differently risk averse.
• It turns out that once we know that we can replicate the option with the underlying and
a risk-free investment, risk preferences become unimportant. In fact, we are then in a
risk-neutral world, which has two very important implications:
Now it's time to simulate a few paths of our stock index. The only ingredients we need
are the risk-free interest rate rf and the volatility σ of the EURO STOXX 50. We could download
historical prices of the index and then use one of our previous models to estimate the volatility.
However, there is a much better and easier approach: we use the volatility index based on the
EURO STOXX 50 (VSTOXX). The VSTOXX indices are based on EURO STOXX 50 real-time
options prices and are designed to reflect the market expectations of near-term up to long-term
volatility by measuring the square root of the implied variance across all options of a given time
to expiration. Find the end-of-day volatility of the index on January 27, 2017.13 We also look
for the end-of-day price of the EURO STOXX 50 on the same webpage. Furthermore, we know
from the product specification the time-to-maturity, the strike price and the barrier level.
We define all these variables next. We assume the risk-free interest rate to be rf = 20 bp. We
are going to simulate n = 5000 scenarios (that is, paths).
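A sketch of these definitions (the index level and the VSTOXX value are assumptions inserted for illustration, not the figures looked up in the lecture):
S0 <- 3303       ## EURO STOXX 50 close on January 27, 2017 (assumed value)
sigma <- 0.17    ## VSTOXX end-of-day volatility (assumed value)
K <- 3220        ## strike price
b <- 3220        ## barrier level
T <- 0.25        ## time to maturity in years (January 27 to April 27, 2017)
rf <- 0.002      ## risk-free interest rate, 20 bp
f <- 1/100       ## ratio 1:100
n <- 5000        ## number of scenarios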
Next we have to decide on the time grid. We are going to simulate (as an approximation) the
index twice a day; this determines the time step Δt.
Note that in doing so we monitor the barrier option discretely. That is, we only check twice a
day whether the stock index has fallen below the barrier. However, we can still simulate finer
intervals later on to see by how much this affects the price of the option. Now we are ready to
simulate an index path. Let's define a function for this task.
13 VSTOXX: https://www.stoxx.com/index-details?symbol=V2TX
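The original function is not reproduced here; a minimal sketch consistent with Eq. (45) and the description below might be:
simulate.path <- function(S0, rf, sigma, dt, T) {
  t <- seq(dt, T, by = dt)                 ## time grid
  S <- S0
  for (i in 1:length(t)) {
    eps <- rnorm(1)                        ## standard normal shock
    ## always step ahead from the most recent simulated price
    S <- c(S, S[length(S)] * exp((rf - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * eps))
  }
  return(S)
}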
The function above iterates over the time grid and simulates the price one time-step ahead
given today’s price. It then continues using tomorrow’s price to simulate the one for the day
after tomorrow. It stops when it reaches the time horizon T . Be careful here. You must not
simulate an entire path based solely on today’s price. Always take the most recent simulated
price. To make this more explicit look at the following figure.
Think carefully about why only the right figure can be correct. What might one specific simulated
path look like? Here is an example:
[Figure: one simulated index path over the 120 time steps, with the index level (roughly 3200 to 3350) on the y-axis.]
End of lecture 5
What is the value of the option in this case? We need many more scenarios to determine the
expected value of the option. We thus define another function that simulates a set of n paths.
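A sketch of such a function, reusing simulate.path from above, might begin as follows (its closing lines are shown right after):
simulate.paths <- function(S0, rf, sigma, dt, T, n) {
  S <- c()
  for (j in 1:n) {
    ## simulate one path and append it as a new column
    S <- cbind(S, simulate.path(S0, rf, sigma, dt, T))
  }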
return(S)
}
This function uses the function simulate.path which returns a vector of stock index
prices. We then create a matrix object. A matrix is very similar to a data frame. The major
difference is that all its elements must be of the same type (for example, numerics). We use
the command cbind to append columns to another column or, more generally, a matrix. The
following example illustrates how this works:
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
cbind(v1, v2)
## v1 v2
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
If we need to start off with an empty object and then append columns one by one, we can write:
m <- c()
v1 <- c(1, 2, 3)
cbind(m, v1)
## v1
## [1,] 1
## [2,] 2
## [3,] 3
Of course, the arguments of cbind must either have the same number of elements or be
NULL. Otherwise R will complain. By the way, a similar function exists to append rows to each
other: rbind . Now let's simulate n = 5000 paths:
...you may notice that this takes quite a while. And we would need many more paths to get a
reliable estimate for our option price as you will see later. Why is this so slow and what can
we do to increase speed? The reason why the code is so slow is that we have used for loops.
The for loop in the code above goes through each time step and each scenario one by one.
That makes R so slow. R is much more efficient if you allow it to process commands in bulk.
How do we achieve this? Well, first we try to simulate a single path without using a for loop.
The way to do this is to draw all the random variables at once and then process them more
efficiently. For example:
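A sketch of the vectorized single-path code (m, the number of time steps, is an assumed helper variable):
m <- length(seq(dt, T, by = dt))   ## number of time steps
z <- rnorm(m)                      ## draw all shocks at once
e <- exp((rf - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * z)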
This piece of code makes use of the fact that part of Eq. (45) can be computed all at once and
with a single line of code. The result is a vector and assigned to the variable e . This variable
refers to the exponential function in Eq. (45). We then just need to compute the cumulative
product because
S1 = S0 · e1
S2 = S1 · e2 = S0 · e1 · e2
S3 = S2 · e3 = S0 · e1 · e2 · e3
··· = ··· (46)
The function to compute a cumulative product in R is called cumprod . (If you ever need to
compute the cumulative sum, use cumsum .)
S <- c(S0, S0 * cumprod(e))
We could use the previous code to make the function simulate.path more efficient. However,
we would then still need to do this n times, that is, for each scenario separately. This is, of
course, an improvement, but there is no reason to stop here. Why not just continue and do it all
at once?
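A sketch of what this code might look like (only the part discussed below; m and n denote the number of time steps and scenarios):
z <- rnorm(m * n)                  ## draw all m*n shocks at once
e <- matrix(exp((rf - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * z), m, n)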
Yuk yuk, that looks complicated. Let’s discuss the last line of code. We basically did the
same as before. The difference is that now we do not just draw m random variables but m · n.
We then compute the exponential expression in Eq. (45) as we did before. This would return
a vector of length m · n. However, we instead need it in a “spreadsheet” format, that is where
the rows refer to the time steps and the columns to the scenarios. We thus create a matrix that
shall have m rows and n columns. The function to define a matrix is matrix . Some examples
shall demonstrate how to use this function:
matrix(1:6, 2, 3)
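This returns
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6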
The first argument refers to the elements the matrix shall consist of. The second argument to
the number of rows. The third argument to the number of columns. The elements are always
filled up columnwise by default, that is, first column then second column, etc. If you want to
have it filled up rowwise write:
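Presumably the byrow argument is used for this, for example:
matrix(1:6, 2, 3, byrow = TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6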
If you only provide two numbers as the first argument but the dimension of the matrix is larger
R will try to recycle the first two elements to fill up the matrix. This is an important principle
in R.
matrix(c(1, 2), 2, 3)
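which gives
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2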
Coming back to our problem: given our variable e we now need to compute the cumulative
product over each column. We do this using the function apply :
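A sketch of this step, mirroring the antithetic-sampling version shown later:
S <- apply(e, 2, cumprod)   ## cumulative product down each column
S <- S * S0                 ## scale by the initial price
S <- rbind(S0, S)           ## prepend the initial price as the first row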
The first argument of the apply function is typically a matrix. The second argument, which
is set to 2, means that the function given as the third argument is applied columnwise. If we want to
apply something rowwise we would need to set the second argument to 1. See the help page.
m <- matrix(1:6, 2, 3)
m
apply(m, 2, sum)
## [1] 3 7 11
apply(m, 1, sum)
## [1] 9 12
The following code demonstrates how much faster the more efficient code is compared with the
initial one. We are going to simulate n = 5000 paths:
set.seed(1)
system.time(S1 <- simulate.paths(S0, rf, sigma,
dt, T, n=5000))
set.seed(1)
system.time(S2 <- simulate.paths.fast(S0, rf, sigma,
dt, T, n=5000))
The function system.time measures the time needed to execute the code provided as an
argument. set.seed initializes the random number generator. In doing so we ensure that
both functions use exactly the same sequence of random numbers and thus should yield the
same price paths. This allows us later to verify whether the two pieces of code indeed result in
the same option values. If we did not initialize the random number generator before calling
each function, we would not know why the final option values differ: they could differ
either because we used a different set of random numbers or because we computed them differently.
Anyway, the result above indicates that simulate.paths.fast is much, much faster than
simulate.paths : the CPU time elapsed for calling simulate.paths is more than
100 times that of simulate.paths.fast . This is a tremendous increase in speed. Thus
it's worth spending some time making code more efficient in R.
Next we want to verify whether the sets of paths from both functions indeed yield the same
option price. We therefore define a function that determines the discounted payoffs for each
scenario.
## Computes the option payoff for each scenario.
##
## S: Matrix of stock index prices with (m + 1) rows and n columns.
## K: Strike price
## b: Boundary level
## rf: risk-free interest rate
## f: Scaling factor for the payoff
## Returns a vector of discounted option payoffs.
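The body of the function is not shown here; a minimal sketch consistent with the line-by-line description below (assuming the maturity T is taken from the workspace) is:
payoffs <- function(S, K, b, rf, f) {
  I <- apply(S > b, 2, all)               ## 1st line: indicator that a path never falls below b
  X <- pmax(S[nrow(S), ] - K, 0) * I * f  ## 2nd line: call payoff times indicator, scaled by the ratio
  X0 <- exp(-rf * T) * X                  ## 3rd line: discount with the risk-free rate (T assumed global)
  return(X0)
}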
The previous code probably deserves some explanation. The first line in the body determines
whether the index price of a given path is always above the boundary level. We again
make use of the function apply because we need to evaluate this condition for every column,
that is, for each path. We first test whether S > b , which returns a matrix of many TRUE s and
FALSE s. Recall that this object is of type logical. The function all then tests whether the condition is
TRUE for the entire path. For example:
x <- c(100, 110, 90, 120)
y <- 100
x > y
## [1] FALSE  TRUE FALSE  TRUE
all(x > y)
## [1] FALSE
The first line, thus, corresponds to the indicator function given by the payoff equation. You
may object that I have said the indicator function is either 0 or 1, but the apply function
returns either TRUE or FALSE . This is the same because FALSE is always interpreted as 0
and TRUE always as 1.
1 != 1
## [1] FALSE
1 == 1
## [1] TRUE
TRUE == 0
## [1] FALSE
TRUE == 1
## [1] TRUE
FALSE == 0
## [1] TRUE
FALSE == 1
## [1] FALSE
We check whether two expressions are equal (not equal) using the operator == ( != ). Thus,
multiplying a number by FALSE returns 0. The second line in our function body does exactly
this: it multiplies the indicator function with the payoff function. The third line discounts the
payoff with the risk-free interest rate. That's it. Using our two sets of scenarios we compute
the value of the option following Eq. (37), approximating the expectation by the mean
of the discounted payoffs.
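A sketch of this computation (one call per set of simulated paths):
mean(payoffs(S1, K, b, rf, f))
mean(payoffs(S2, K, b, rf, f))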
## [1] 0.9633144
## [1] 0.9633144
As you can see we get exactly the same result no matter whether we use the inefficient or the efficient
function. This confirms that both functions do exactly the same thing. But more importantly, is this
a fair price? Recall that the investor sold the security for 0.84. Actually, we cannot say, because
we do not know how large an error we have made using our numerical technique, which
gives us just an approximation of the true value. Next we look at the Monte-Carlo error and
try to quantify it.
for (i in 1:10) {
S <- simulate.paths.fast(S0, rf, sigma, dt, T, n = 5000)
X0 <- payoffs(S, K, b, rf, f)
C <- mean(X0)
cat(i, ":", C, "\n")
}
## 1 : 0.9633144
## 2 : 0.9407483
## 3 : 0.9467339
## 4 : 0.9200611
## 5 : 0.9370876
## 6 : 0.939594
## 7 : 0.9357843
## 8 : 0.9643571
## 9 : 0.9135791
## 10 : 0.9532851
We see that drawing a new set of paths leads to a different option price. We also see that
it varies quite a bit. The lowest value is around 0.9 which is already fairly close to the selling
value. However, we need to compute the range in which we expect the true value to be given
a certain probability. This is called the confidence interval. (We had this already earlier when
we computed the expected return of the OBX index.)
We next need to define the Monte-Carlo error (MCE) for a given number of simulation trials
n. It is defined as the difference between the estimate $\hat{X}_n$ and the expectation $E[X]$, i.e.
$\text{MCE}_n = \hat{X}_n - E[X]$. Its standard error (SE) can be estimated from the sample:
SE <- sd(X0)/sqrt(length(X0))
SE
## [1] 0.02571523
Let's compute the confidence interval for a confidence probability of α = 99%.
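A sketch of this computation (assuming C holds the option value estimate and SE the standard error from above):
C + qnorm(c(0.005, 0.995)) * SE   ## 99% confidence interval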
This is a rather wide confidence interval. Though we can get a more precise estimate by
simulating more scenarios, this is not the most efficient way: we would need to simulate,
e.g., four times as many scenarios to cut the MCE in half. There are better ways to gain
precision. We will discuss one approach next.
End of lecture 6
• the pairs (Y1 , Ỹ1 ), (Y2 , Ỹ2 ), . . . , (Yn , Ỹn ) are identically and independently distributed (i.i.d.)
• for each i, Yi and Ỹi have the same distribution but are not necessarily independent
Clearly, the antithetic variates estimator is simply the average of all 2n observations:
$$\hat{Y}_{AV} = \frac{1}{2n}\left(\sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} \tilde{Y}_i\right) = \frac{1}{n}\sum_{i=1}^{n} \frac{Y_i + \tilde{Y}_i}{2} \qquad (49)$$
From the last equation it becomes evident that ŶAV is the sample mean of n independent
observations. Thus, when calculating the variance to determine the SE we need to do that
based on the independent sample, i.e.
$$\sigma^2_{AV} = \mathrm{Var}\!\left[\frac{Y + \tilde{Y}}{2}\right]. \qquad (50)$$
Like before, we estimate the second moment (i.e. the variance) using the sample standard
deviation from which we calculate the SE. Now let’s implement antithetic sampling.
simulate.paths.fast.as <- function(S0, rf, sigma, dt, T, n) {
m <- length(seq(dt, T, by = dt))  ## number of time steps (this line and the function header are reconstructed)
z <- rnorm(n*m)
z.as <- -z
Z <- matrix(c(z, z.as), m, n*2)
e <- exp((rf - 0.5 * sigma^2) * dt + sigma * sqrt(dt)*Z)
S <- apply(e, 2, cumprod)
S <- S * S0
S <- rbind(S0, S)
return(S)
}
We first draw n*m standard normal random variables. We then compute their antithetic
counterparts. Based on the total set of random variables we compute the price paths. The
following code simulates paths based on antithetic sampling and plots one specific antithetic
pair.
n <- 2500
S.as <- simulate.paths.fast.as(S0, rf, sigma, dt, T, n = n)
plot(S.as[, 1], type = "l", col = "blue", xlab = "", ylab = "Antithetic pair",
ylim = range(S.as[, c(1, n + 1)]))
lines(S.as[, n + 1], type = "l", col = "red")
[Figure: one antithetic pair of simulated index paths over the 120 time steps (y-axis: Antithetic pair, roughly 3000 to 3600).]
The function range determines the minimum and maximum value of its argument. The
argument ylim tells the function plot on what range values shall be plotted on the y-axis.
The function lines is used in a similar way as plot but does not open a new plotting
window, which would otherwise overwrite the old plot.
We now compute the discounted option payoffs and the estimated option value. Finally we
determine the SE and the confidence interval.
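A sketch of these steps (the pairwise averaging is the point stressed below):
X0.as <- payoffs(S.as, K, b, rf, f)
Y <- (X0.as[1:n] + X0.as[(n + 1):(2 * n)])/2   ## average each antithetic pair first
mean(Y)                                        ## option value estimate
sd(Y)/sqrt(n)                                  ## standard error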
## [1] 0.9753244
## [1] 0.02240072
Don't forget to calculate the means of all the antithetic pairs first, which is important when we
want to determine the standard deviation: the sample needs to be independent! Clearly, the
SE is lower with antithetic sampling than without it. But the improvement is modest, which
is also evident from comparing the confidence intervals. The question is: under what condition
is an antithetic variates estimator better than an ordinary Monte Carlo estimator? Using
antithetics reduces variance if
$$\mathrm{Var}\!\left[\hat{Y}_{AV}\right] < \mathrm{Var}\!\left[\frac{1}{2n}\sum_{i=1}^{2n} Y_i\right], \qquad (51)$$
where $\hat{Y}_{AV}$ is defined in Eq. (49). It directly follows that the necessary condition for antithetic
sampling to reduce variance is
$$\mathrm{Cov}\!\left[Y, \tilde{Y}\right] < 0. \qquad (52)$$
To put it differently, this condition requires that the negative dependence in the inputs Z
(i.e. standard normal random variable) translates into a negative correlation in the outputs
Y (i.e. discounted payoffs). A simple sufficient condition is the monotonicity of the payoff
function. Is this condition here satisfied?
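A sketch of this check (correlation between the payoffs of each antithetic pair):
cor(X0.as[1:n], X0.as[(n + 1):(2 * n)])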
## [1] -0.274992
We see that the condition indeed holds, but the absolute correlation is much lower than the
correlation of the inputs, which is −1.
Suppose the fair value were at the lower bound of the confidence interval, at about 0.92. The relative (two-way) spread between the selling price of 0.84 and this value would then be:
abs(0.84/0.92 - 1) * 2
## [1] 0.173913
The function abs takes the absolute value of its argument. This is a reasonable spread for a
security like this. Now assume (pessimistically) that the fair value was at the upper bound of
the confidence interval, that is at 1.03, then the spread would be
abs(0.84/1.03 - 1) * 2
## [1] 0.368932
This would probably be too much. However, spreads beyond 10% are not unlikely. Look at
the quotes for options on the OBX; they can be huge too. But be careful here. We made the
assumption of monitoring the underlying discretely to verify whether it falls below the barrier
level; in fact we do it just twice a day. It may still be the case that between two monitoring
dates the stock index falls below the barrier and knocks out the option. To get
a more realistic estimate we need to rely on a finer time grid. Let's monitor the underlying ten
times a day, for example.
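A sketch of re-running the valuation on the finer grid (the exact code is an assumption):
dt <- dt/5                      ## ten monitoring points per day instead of two
S <- simulate.paths.fast.as(S0, rf, sigma, dt, T, n = n)
X0 <- payoffs(S, K, b, rf, f)
Y <- (X0[1:n] + X0[(n + 1):(2 * n)])/2
mean(Y) + qnorm(c(0.005, 0.995)) * sd(Y)/sqrt(n)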
As you see the selling price is now very close to the lower bound of the confidence interval
and thus the fair market price. The reason is that a finer time grid makes it more likely that
the underlying falls below the boundary which in turn makes the call option less valuable. We
close this case by concluding that the price was fair.
4.7 Exercises
1. Use Monte Carlo simulation with 10000 scenarios to price an at-the-money European call
option with expiration in 2.5 years. The underlying trades at 85 and has a volatility of
35% p.a. The risk-free interest rate is 80 bp. Verify the obtained estimate using the
Black-Scholes option pricing formula.
2. Use Monte Carlo simulation with 5000 pairs of antithetic variates to price an at-the-money
European put option with maturity in 0.5 years. The underlying trades at 50 and has a
volatility of 15% p.a. The risk-free interest rate is 50 bp. Verify the obtained estimate
using the Black-Scholes option pricing formula for put options.
Kahoot!
5 Data processing
5. In Restrictions add Trade Date GreaterOrEqualThan 1980-01-01 and Trade Date LessThan
2000-01-01. The reason is that only 50000 stock prices can be downloaded at once.
6. Deselect Preview.
8. Repeat the previous steps for the periods "2000–2010", "2010–2014", and "2014–2018".
## [1] 131822
14 Amadeus 2.0: http://mora.rente.nhh.no/borsprosjektet/amadeus/client/publish.htm
summary(stocks)
## V1 V2 V3
## Length:131822 Min. : 6000 Length:131822
## Class :character 1st Qu.: 6245 Class :character
## Mode :character Median : 48327 Mode :character
## Mean : 500791
## 3rd Qu.:1249743
## Max. :2101943
##
## V4 V5 V6
## Length:131822 Min. : 0.01 Min. : 0.0
## Class :character 1st Qu.: 5.04 1st Qu.: 6.5
## Mode :character Median : 33.78 Median : 30.0
## Mean : 81.82 Mean : 475.8
## 3rd Qu.: 102.50 3rd Qu.: 83.5
## Max. :25000.00 Max. :339915.6
## NA's :57459 NA's :57482
## V7 V8
## Min. :0.000e+00 Mode:logical
## 1st Qu.:0.000e+00 NA's:131822
## Median :2.488e+06
## Mean :5.845e+07
## 3rd Qu.:3.015e+07
## Max. :2.027e+10
##
The function summary is particularly useful because it computes the minimum, 25% quan-
tile, median, mean, 75% quantile, and maximum of all columns that are of type numeric.
For non-numeric datatypes no such statistic can be computed, of course. For example, for
SecurityName only the number of occurrences of each name is reported. It’s now time to
discuss this data type a little bit further. Before doing so, however, we assign sensible column
names first and drop the last column which seems to be empty.
We could store them as characters instead, but factors typically have many advantages compared with characters. A good example
is firms' credit ratings. Assume we know the ratings of several firms and want to store them
in a factor. We would use the function factor :
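The original example is not shown here; a sketch consistent with the output used further below is:
ratings <- factor(c("BBB", "A", "BB", "BB", "BBB", "AAA", "AA", "AA"))
ratings
## [1] BBB A   BB  BB  BBB AAA AA  AA
## Levels: A AA AAA BB BBB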
The output shows that R converts the character vector into a categorical variable with all
the categories shown below. In older versions of R factors had a lower memory consumption
than character vectors because not all occurrences of a string were stored in memory separately.
Instead, a "pointer" referred to the string. In newer versions of R there is mostly no memory advantage
anymore. Factors, however, have other advantages. First, they can be reordered. Second,
factors are internally represented as numerical values, which is often very useful. Third, we can
easily change the labels of a factor. Let's do some examples to demonstrate what I mean by
that.
As you know credit ratings are measured on an ordinal scale meaning that a AAA is better
than AA but it does not say anything about how much “better” it is. Looking at the levels of
the factor in our previous example we see that these are not ordered to be consistent with a
standard credit rating scale. We can do this as follows:
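A sketch of this step, consistent with the levels printed further below:
ratings <- factor(ratings, levels = c("AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C"))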
The first argument is simply the previous factor that we are going to redefine. The second
argument reorders the levels so that the ordering is consistent with a rating scale. AAA refers
to the highest rating, AA to the second highest and so on. You may have also noticed that
I have added additional levels (“CCC”, “CC”, “C”) to complete our scale despite having no
observation of firms having these very low ratings.
Again, we may find it useful to work with numbers when using credit ratings. For example,
we could use a credit rating variable in a regression model. The following function converts a
factor into a numerical value. Thus, it is important that the ordering of the levels is correct.
as.numeric(ratings)
## [1] 4 3 5 5 4 1 2 2
levels(ratings)
## [1] "AAA" "AA" "A" "BBB" "BB" "B" "CCC" "CC" "C"
Now assume that we want to map the credit ratings onto our own rating scale, which shall be
represented by different letters (for example, Moody's and Standard and Poor's use different
notations).
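One way to produce the mapping shown below (a sketch, not necessarily the original code):
ratings2 <- factor(letters[as.numeric(ratings)])
ratings2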
## [1] d c e e d a b b
## Levels: a b c d e
Other advantages of using factors are that they are automatically converted to dummy
variables in regressions. They are also extremely useful when we want to "summarize" them,
in particular if the number of levels is relatively low, as in our example:
summary(ratings)
We see that factors can be quite useful. We now continue with the data cleansing. We first select
"Ordinary Shares" only and then remove the corresponding column. Further, we make sure that a
price observation is available and that the price ( Last ) is larger than NOK 10. We also need to
make sure that there are shares outstanding.
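A sketch of these filters (the column name Type for the share class is an assumption):
stocks <- stocks[stocks$Type == "Ordinary Shares", ]
stocks$Type <- NULL
stocks <- stocks[!is.na(stocks$AdjLast), ]
stocks <- stocks[!is.na(stocks$Last) & stocks$Last > 10, ]
stocks <- stocks[!is.na(stocks$SharesIssued), ]
nrow(stocks)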
## [1] 37063
Note that the third line in the previous code removes all rows in our data where AdjLast is
NA . To do so we must use the function is.na ! The ! inverts the selection, i.e. any TRUE
becomes FALSE and vice versa. We do the same for Last . Note that SharesIssued may
contain very large numbers.15 Next we convert the date column into a Date object.
15 About 20 firms in the sample have had, at least once, more than 2 billion shares issued. Albeit large, this
number is not unreasonable. For example, Norsk Hydro has more than 2 billion shares outstanding. What is
suspicious, however, is that for some observations the number of shares is exactly 2^31 − 1. To computer
scientists this number is the maximum value of a 32-bit signed integer. This points towards
a bug in Amadeus.
As you see, we cannot go directly from the end of one month to the next because months
have different numbers of days. That is why we first create a vector of the starting dates of each
month and then subtract one single day to get the last date of the previous month. This does the
trick. We then "roll forward" the trading days to these auxiliary dates. For example, if the
trading day is "2005-04-06", we roll it forward to the end-of-month date, which is "2005-04-30".
Here we make use of the function cut and of factors.
The first argument of cut is a vector that can be ordered. The second argument defines the
boundaries of the intervals into which the first argument shall be “cut”. The third argument
indicates if the intervals should be closed on the right. Let’s do some basic examples to see how
cut works:
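A sketch of such an example (the exact numbers are assumptions chosen to be consistent with the output below):
x <- c(3, 8, 12, 0, 5, 7, 11)
c <- cut(x, breaks = c(0, 5, 10, 15))   ## note: c shadows the function c() here, as in the original
c
## [1] (0,5]   (5,10]  (10,15] <NA>    (0,5]   (5,10]  (10,15]
## Levels: (0,5] (5,10] (10,15]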
as.numeric(c)
## [1] 1 2 3 NA 1 2 3
You see that cut returns a factor with the labels being the intervals. Note that "(" stands for
left-open and "]" for right-closed, i.e. 0 does not belong to the interval (0, 5] whereas the
number 5 does. The factor can be converted into a numeric or, as in our case, into a Date object.
However, going back to our real example we see that we did not exactly get the desired result
because it seems we have rolled the trading date backwards to the previous end-of-month date.
To see this
stocks$Date[1]
## [1] "1980-01-01"
stocks$Date.aux[1]
## [1] 1979-12-31
## 457 Levels: 1979-12-31 1980-01-31 1980-02-29 ... 2017-12-31
i <- as.numeric(stocks$Date.aux)
stocks$Date.aux <- months[i + 1]
stocks$Date[1:5]
stocks$Date.aux[1:5]
num <- aggregate(stocks$SecurityId, list(stocks$Date.aux, stocks$SecurityId),
length)
The first argument of aggregate is a vector on which we apply the function given by the third
argument. The second argument defines how to group the first vector. We want to compute the
length (that is the number of observations) per month ( Date.aux ) and firm ( SecurityId ).
This information must be provided as a list for which we use the function list . A list object
is a very versatile data type because it can comprise other objects of different data types and
different lengths. A few examples
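The calls themselves are not shown here; reconstructions consistent with the printed output are:
list(c(1, 2, 3), c(100, 200), 7)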
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] 100 200
##
## [[3]]
## [1] 7
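Similarly (a reconstruction consistent with the output below):
list(c("A", "B"), c(4, 5, 6))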
## [[1]]
## [1] "A" "B"
##
## [[2]]
## [1] 4 5 6
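And with named constituents (again a reconstruction):
list(letters = c("A", "B"), numbers = c(3, 5, 6))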
## $letters
## [1] "A" "B"
##
## $numbers
## [1] 3 5 6
The last line demonstrates that we can also give names to the list’s constituents. How does
aggregate work? The following example demonstrates how to use this function:
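A sketch of the example data (reconstructed from the output below):
df <- data.frame(id = c("a", "a", "a", "b", "b"), val = 1:5)
df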
## id val
## 1 a 1
## 2 a 2
## 3 a 3
## 4 b 4
## 5 b 5
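First we sum val within each group (a reconstructed call consistent with the output):
aggregate(df$val, list(df$id), sum)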
## Group.1 x
## 1 a 6
## 2 b 9
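And the number of observations per group (again reconstructed):
aggregate(df$val, list(df$id), length)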
## Group.1 x
## 1 a 3
## 2 b 2
head(num[num$x > 1, ])
## Group.1 Group.2 x
## 65 1992-06-30 6002 2
## 384 1985-09-30 6015 2
## 987 1989-05-31 6035 2
## 1684 1994-01-31 6041 2
## 1766 1993-12-31 6042 2
## 1931 1993-12-31 6048 2
The result above shows that there are some cases where we have more than one observation.
To give you a specific example:
What shall we do? We will take the most recent observation in a given month. This does the
trick:
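A sketch of these steps (the helper column name Row is an assumption):
stocks <- stocks[order(stocks$SecurityId, stocks$Date), ]
stocks$Row <- 1:nrow(stocks)
rows <- aggregate(stocks$Row, list(stocks$Date.aux, stocks$SecurityId), max)$x
stocks <- stocks[rows, ]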
We first ordered the data frame based on SecurityId and Date . We then added a new column
to the data frame which contains the row number. Next we use the function aggregate to
determine the largest row number within each group as defined by SecurityId and Date .
Since we have ordered the data frame beforehand we made sure that the highest row number
in each group refers to the most recent observation of the stock price in a given month. The
variable rows allows selecting the corresponding rows in stocks . This ensures that we end
up with only one observation per month and firm.
Next we need to make sure that the actual trade is not too old to be considered as an
end-of-month trade. We define a trade as not being too old if it occurs at most five days before
the end-of-month.
From now on we do not need the actual trading date and only refer to the auxiliary end-of-month
date. Thus, we remove the trading date and rename the auxiliary date.
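A sketch of these two steps (the five-day cutoff follows the text; the exact code is an assumption):
stocks <- stocks[as.numeric(stocks$Date.aux - stocks$Date) <= 5, ]  ## drop stale end-of-month trades
stocks$Date <- NULL                                                 ## drop the actual trading date
names(stocks)[names(stocks) == "Date.aux"] <- "Date"                ## rename the auxiliary date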
stocks <- stocks[order(stocks$SecurityId, stocks$Date), ]
stocks$R <- unlist(tapply(stocks$AdjLast, stocks$SecurityId,
function(v) c(v[-1]/v[-length(v)] - 1, NA)))
The function tapply is very similar to aggregate but can more easily cope with functions that
do not just return one value per group but several values. Since each group produces data of
varying length, tapply returns a list. We use unlist to convert the list into a simple vector.
Note also that here we made use of an anonymous function definition. This is just a shortcut;
we could have defined our function explicitly as well. The following simple examples demonstrate
the use of tapply .
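A sketch of the example data (reconstructed from the output below):
df <- data.frame(id = c("a", "a", "a", "b", "b"), val = c(100, 200, 120, 140, 160))
df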
## id val
## 1 a 100
## 2 a 200
## 3 a 120
## 4 b 140
## 5 b 160
fn <- function(v) {
c(NA, diff(v))
}
tapply(df$val, list(df$id), fn)
## $a
## [1] NA 100 -80
##
## $b
## [1] NA 20
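The same result using an anonymous function (a reconstruction consistent with the output below):
tapply(df$val, list(df$id), function(v) c(NA, diff(v)))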
## $a
## [1] NA 100 -80
##
## $b
## [1] NA 20
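And flattened into a vector with unlist (again a reconstruction):
unlist(tapply(df$val, list(df$id), function(v) c(NA, diff(v))))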
## a1 a2 a3 b1 b2
## NA 100 -80 NA 20
Clearly, the last two function calls produce the same result. The first explicitly defines a function
while the second is an anonymous function definition. Note that we defined the returns in a
“forward looking” manner, i.e. the return in a given row for time t defines the relative price
change from t to t + 1. This is just a convention and eases the further process.
Given the returns we need to check whether they are all on a monthly frequency. Although
we have ensured that stock prices are all end-of-month values there could be cases when a stock
stops trading for some period and then trades again. This would distort our return computation
because the underlying price observations may be a long way from each other. We first make
sure that the data frame is sorted and then use tapply again to compute the time differences.16 .
summary(stocks$R)
## [1] 30650
We now move on to construct a market portfolio. We first add a column MarketCap that shows the market
capitalization in millions of Norwegian kroner:
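A sketch of this step:
stocks$MarketCap <- stocks$Last * stocks$SharesIssued/1e+06   ## market capitalization in millions of NOK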
Note that we use the unadjusted stock price because any change due to stock splits must be
reflected in SharesIssued already. Next we compute the total market capitalization each
month and plot the time series.
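A sketch of this aggregation (the object name res is used because the text below refers to it):
res <- aggregate(stocks$MarketCap, list(Date = stocks$Date), sum)
plot(res$Date, res$x, type = "l", xlab = "", ylab = "Market capitalization")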
[Figure: total market capitalization over time (y-axis in millions of NOK, roughly 0 to 1,000,000).]
In order to compute the weights we need to bring the total market capitalization back into
our data frame stocks , i.e. we need to merge stocks and res . To do so we use the
function merge .
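A sketch of the merge (the total market capitalization ends up in a column called x):
stocks <- merge(stocks, res, by = "Date")
nrow(stocks)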
## [1] 30650
The function merge requires at least three arguments. The first two refer to the data frames to
merge and the third tells the function on what columns to merge. We are now ready to
compute the weights.
Of course, the weight vector for each month must always sum up to one.
Now we plot both market indices (equally-weighted and value-weighted) by computing the cumulative returns first.
[Figure: cumulative market index levels for the equally-weighted and value-weighted portfolios (y-axis: Market index, roughly 0 to 300).]
We use the function legend to add a legend to the plot. The first argument tells the
function where to put the legend in the plot window, the second provides the legend labels,
and the third and fourth refer to the line width and color. We now save the current data to
an R data file. We can write several data objects into the same file.
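The call producing the output below is not shown; a sketch consistent with the description that follows (and with the fact that the wide return data frame is later referred to as R) is:
R <- reshape(stocks[, c("Date", "SecurityId", "R")], v.names = "R",
    idvar = "Date", timevar = "SecurityId", direction = "wide")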
## 2 1980-02-29 -0.09890118 NA NA
## 3 1980-05-31 NA 0.022221315 -0.05600039
## 5 1980-06-30 NA 0.008696503 0.02542392
The first argument of reshape is the data frame that we wish to transform into wide-format.
Note that we need to select the right columns first. The argument v.names tells the function
which column shall be transformed into wide-format; since we want to have the returns of all
stocks as columns, we set this argument to R . The argument idvar specifies the key in the wide
format, which is, of course, the Date column. timevar tells reshape that we want
the returns of each SecurityId to vary over time. The final argument specifies the data format,
which is wide .
The following little example demonstrates how to transform a standard data frame
from long to wide-format and then back again.
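A sketch of the example data (reconstructed from the output below):
df <- data.frame(id = c("a", "a", "b"), t = c(1, 2, 2), val = c(100, 20, 40), f = c(10, 32, 32))
df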
## id t val f
## 1 a 1 100 10
## 2 a 2 20 32
## 3 b 2 40 32
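Transforming it into wide-format (a reconstruction consistent with the output below):
w <- reshape(df, v.names = "val", idvar = "t", timevar = "id", direction = "wide")
w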
## t f val.a val.b
## 1 1 10 100 NA
## 2 2 32 20 40
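And back to long-format; reshape remembers how w was created (again a reconstruction):
reshape(w)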
## t f id val
## 1.a 1 10 a 100
## 2.a 2 32 a 20
## 1.b 1 10 b NA
## 2.b 2 32 b 40
The original data frame contains firms ( id ) for which we have some firm specific information
( val ) over time ( t ) and some only time-dependent information ( f ). We then follow exactly
the same procedure to transform the market capitalization into a wide-format.
Given that stocks did not have any missing values in any of its columns, it is assured that
if there is a return for some stock from month t to t + 1, there will also be a market capitalization
for month t. To finish, we save both data frames in wide-format to the same R data file.
1980–1985: 1-month Eurokrone money market interest rates17
1986–2013: 1-month Nibor18
2014–2018: 1-month Nibor19
Note that we need to skip the first twelve rows using the skip argument in read.csv because
the data series starts only thereafter. Further, we only select the first and second columns, which
refer to the date and the 1-month money market rate. We then add a column with
the end-of-month date of the previous interest rate period to make it consistent with how we set
up the stock returns. Recall that we have used a forward-looking notation: a return at month
t refers to the period t to t + 1. We use the same setup here. After having done so we sort
the data frame and convert the rates into decimals. Next we use the 1-month Nibor rates from
Norges Bank.
Finally, we use the 1-month Nibor rates from Oslo Stock Exchange. Note that since the data is
on a daily frequency we use the first rate available in each month.
Let’s plot all three time series to see whether they indeed fit together.
[Figure: the three 1-month interest rate series over time (rates in decimals, roughly 0 to 0.25).]
The spike in the interest rate is due to a currency crisis. We then need to combine all three
time series and make sure that they do not overlap. Further, we scale the interest rates so that
they reflect a monthly investment period.
range(df1$Date)
range(df2$Date)
range(df3$Date)
5.10 Exercises
1. Compute for each month the number of return observations in the data frame stocks
that we used during the lecture. Create a plot showing the time-series of observations
each month.
2. Download the complete daily stock price information for the following stocks as given by
their tickers from finance.yahoo.com: IBM, AAPL, XOM, KO and GS. Merge all samples,
compute daily simple returns, and transform the data frame into wide-format. Compute
the annualized mean and volatility. Determine also the covariance and the correlation
using the functions cov and cor .
3. Obtain the 3-month Nibor rates from Norges Bank and Oslo Stock Exchange and extend
this series with the 3-month Norwegian Treasury Bills in the primary market to construct
a monthly time-series of risk-free interest rates.
6 Mean-Variance Portfolios
$$E[R_p] = \mu_p = \underbrace{\omega^{\top}}_{1\times 2}\,\underbrace{\mu}_{2\times 1} = \begin{pmatrix}\omega_1 & \omega_2\end{pmatrix}\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} = \omega_1\mu_1 + \omega_2\mu_2, \qquad (59)$$
where $\mu_i$ is the expected return of stock i and $\omega_i$ its weight. The portfolio variance (squared
volatility) can also be written in matrix notation. For the expansion we use the fact that
$\sigma_{11} = \sigma_1^2$, $\sigma_{22} = \sigma_2^2$ and the symmetry of the covariance matrix, i.e. $\sigma_{12} = \sigma_{21}$:
$$\mathrm{Var}[R_p] = \sigma_p^2 = \underbrace{\omega^{\top}}_{1\times 2}\,\underbrace{\Sigma}_{2\times 2}\,\underbrace{\omega}_{2\times 1}
= \begin{pmatrix}\omega_1 & \omega_2\end{pmatrix}\begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{pmatrix}\begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix}
= \begin{pmatrix}\omega_1\sigma_{11}+\omega_2\sigma_{21} & \omega_1\sigma_{12}+\omega_2\sigma_{22}\end{pmatrix}\begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix}$$
$$= \omega_1^2\sigma_{11} + \omega_1\omega_2\sigma_{21} + \omega_1\omega_2\sigma_{12} + \omega_2^2\sigma_{22}
= \omega_1^2\sigma_1^2 + 2\omega_1\omega_2\sigma_{12} + \omega_2^2\sigma_2^2 \qquad (60)$$
Here, σij is the covariance between stock i and j and Σ refers to the covariance matrix. We
wish to find a portfolio, that is the weights, that yields the highest expected return for a given
target variance of the portfolio. Or to put it differently we look for the lowest variance for a
given target expected return. This is a classic optimization problem. So let’s first define the
problem formally:
$$\min_{\omega}\ \mathrm{Var}[R_p] \qquad (61)$$
subject to
$$E[R_p] = \mu^* \quad\text{and}\quad \sum_{i}\omega_i = 1, \qquad (62)$$
where the expected portfolio return and variance are computed as given in (59) and (60). This
is a so-called Quadratic Programming Problem.
We then merge the stock returns with the risk-free interest rate and the market return using
the function merge . This helps us to align the dates of the different data samples.
The previous line of code adds two columns to the stock returns, i.e. the risk-free interest rate
( rf ) and the market return ( RM.vw ). We add the market return for later usage. We then
deduct the risk-free interest rate from all stock returns to get excess returns.
Note that the last two lines of the previous code snippet again demonstrate how replication in
R works. To reiterate the principle of replication:
m <- matrix(1:4, 2, 2)
m
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
m - c(1, 2)
## [,1] [,2]
## [1,] 0 2
## [2,] 0 2
Since we might use the excess returns again we save the data to a file.
We pretend to be on December 31, 2017 because this is the last price observation in our data.
We will only consider stocks that have been traded on that day and the end of the previous
month to make sure that we could potentially invest in the stocks going forward. We further
restrict the sample in that we select only stocks that have been traded at least 75% of the time
during the last 60 months.
The argument use tells the functions cor and cov , respectively, that pairwise complete
observations shall be used for computing the correlation and the covariance. Note that we scale
the returns and covariances accordingly so that we have these metrics in annual terms. This
scaling does not change our results but eases the interpretation of returns and variances. Let's
look at the estimates of the expected returns and correlations to check whether they make any
sense:
summary(mu)
summary(Rho[lower.tri(Rho)])
We use the function lower.tri to get the lower triangular part of the correlation matrix so
that we do not double count the correlation coefficients when computing the summary statistics.
Clearly, some expected excess return estimates are negative. Who would be willing to invest in
a risky stock with a negative expected excess return? Nobody! These are just estimated badly.
We set them to zero.
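A sketch of this step:
mu[mu < 0] <- 0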
This kind of problem can be solved with the function solve.QP from the package quadprog , which solves problems of the form
$$\min_b\ \left(-d^{\top} b + \tfrac{1}{2}\, b^{\top} D\, b\right) \qquad (63)$$
with the constraints
$$A^{\top} b \ge b_0. \qquad (64)$$
Let’s assume we wish to determine the optimal risky portfolio that promises a return of 5%
p.a. with the lowest risk. We first need to specify the variables in Equations (63) and (64).
Since our problem only consists of a quadratic but no linear term we set d = 0. In the notation
above b corresponds to the weight vector ω which solve.QP will search for. The variable D
corresponds to the covariance matrix Σ. Thus we have
$$d = \begin{pmatrix}0\\ 0\\ \vdots\\ 0\end{pmatrix}. \qquad (65)$$
Next we need to define our two linear constraints. First, all weights shall sum up to one and,
second, the return shall be 0.05.
$$\underbrace{\begin{pmatrix}1 & 1 & \cdots & 1\\ \mu_1 & \mu_2 & \cdots & \mu_n\end{pmatrix}}_{A^{\top}}\,\underbrace{\begin{pmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}}_{b} = \underbrace{\begin{pmatrix}1\\ 0.05\end{pmatrix}}_{b_0} \qquad (66)$$
The following lines of code do the optimization. The function t transposes a vector or matrix,
which means that columns and rows are swapped. The function rep replicates the first argument
the number of times given by the second argument. Note that the last argument to solve.QP
says that the first two constraints (out of two) shall be interpreted as equalities instead of
inequalities.
require(quadprog)
A <- t(rbind(1, mu))
mu.star <- 0.05
d <- rep(0, length(mu))
b0 <- c(1, mu.star)
solve.QP(Dmat = Sigma, dvec = d, Amat = A, bvec = b0, meq = 2)
## Error in solve.QP(Dmat = Sigma, dvec = d, Amat = A, bvec = b0, meq = 2): matrix
D in quadratic function is not positive definite!
Oh dear, what's wrong here? What does it mean that the matrix is not positive definite?
Well, if a matrix is not positive definite we cannot invert it; in our case Sigma is not invertible.
But apparently solve.QP needs to invert this matrix to solve the problem. So why is Sigma
not positive definite? This is a common problem in empirical work when you estimate a
covariance (or correlation) matrix of high dimension. Remember we have 53
firms, which is quite a lot. There is a remedy to this issue because there are ways to convert
such matrices into nearby positive definite ones. The function is called nearPD
and is available in the package Matrix .
require(Matrix)
Sigma2 <- nearPD(Sigma)$mat
Sigma2 <- as.matrix(Sigma2)
After having transformed Sigma into a positive definite matrix, the result is not a plain matrix
object. Thus, we need to convert it into a matrix object using as.matrix . Now let's try again:
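A sketch of the second attempt (the solution vector is stored in omega, which is used below):
res <- solve.QP(Dmat = Sigma2, dvec = d, Amat = A, bvec = b0, meq = 2)
omega <- res$solution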
hist(omega)
[Figure: histogram of the portfolio weights omega (x-axis: omega, y-axis: Frequency, counts up to about 20).]
The R function hist is used to plot a histogram. Using the weights we then check whether we
end up with the desired return. We also compute the portfolio standard deviation.
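A sketch of these checks (matrix products of the weights with the estimates):
t(omega) %*% mu                         ## portfolio expected (excess) return
sqrt(t(omega) %*% Sigma2 %*% omega)     ## portfolio standard deviation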
## [,1]
## [1,] 0.05
## [,1]
## [1,] 0.000101595
We use the operator %*% to do matrix multiplication. Be careful here: the matrices have to
have the right dimensions so that they can be multiplied. And what are the results? Well, we
get the desired portfolio return using the weights omega . But look at the portfolio standard
deviation! It's tiny, and already in annual terms! It basically means that you earn 5% without
risk. Too good to be true. What's wrong here? A big disadvantage of the Markowitz portfolio
optimization is that it is so sensitive to the input parameters, in particular to the expected
return estimates. Recall that these are very hard to estimate. It is always sensible to add
constraints to the optimization. For example, we require all weights to be positive because our
investor cannot short stocks. How do we do this? We need to define A and b0 as follows. Note
that the equality applies to the first two equations only while the inequality to the rest.
$$\underbrace{\begin{pmatrix}1 & 1 & \cdots & 1\\ \mu_1 & \mu_2 & \cdots & \mu_n\\ 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix}}_{A^{\top}}\,\underbrace{\begin{pmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}}_{b} = (\ge)\ \underbrace{\begin{pmatrix}1\\ 0.05\\ 0\\ 0\\ \vdots\\ 0\end{pmatrix}}_{b_0} \qquad (67)$$
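A sketch of the constrained optimization corresponding to Eq. (67):
A <- t(rbind(1, mu, diag(1, length(mu))))
b0 <- c(1, mu.star, rep(0, length(mu)))
res <- solve.QP(Dmat = Sigma2, dvec = d, Amat = A, bvec = b0, meq = 2)
omega <- res$solution
t(omega) %*% mu
sqrt(t(omega) %*% Sigma2 %*% omega)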
## [,1]
## [1,] 0.05
## [,1]
## [1,] 0.09029294
summary(omega)
The function diag creates a diagonal matrix with the diagonal values given by the first ar-
gument and the number of rows and columns by its second argument. The results are much
more convincing now than before. However, another disadvantage of the Markowitz portfolio
optimization is the large number of parameters to be estimated. In our case we have:
This is quite a lot. And the number of estimates grows roughly quadratically with the number of
stocks we consider. This implies that the more estimates you have, the more errors you are
going to make. We already know how difficult it is to estimate, for example, expected returns.
A similar issue arises for the covariance (or correlation) matrix. Since it is huge, it can happen
that correlation coefficients are mutually inconsistent and thus lead to nonsensical results.
The higher the number of estimates, the larger the total estimation error. We know: garbage in,
garbage out. This seems to be the case in the previous computation. What can we do? We
can either put more constraints on the optimization problem or simplify our model by using an
index model instead. This is what we do next.
$$R_i = \alpha_i + \beta_i R_m + \epsilon_i \qquad (68)$$
The intercept $\alpha_i$ in this equation is the security's expected return when the market return is
zero. In an arbitrage-free world we should not expect to find a non-zero α. Therefore, it is often
sensible to assume it to be zero. The slope coefficient $\beta_i$ is the security's beta and measures the
sensitivity of the security to the market index. The residual $\epsilon_i$ is zero-mean and corresponds to
the firm-specific surprise in returns.
reg <- apply(R[, -1], 2, function(v) {
  res <- lm(v ~ RM$RM.vw)
  c(coefficients(res), var(residuals(res)))
})
rownames(reg) <- c("alpha", "beta", "var.eps")
This probably needs some further explanation. I will demonstrate how to run a regression using
a very simple example. Let’s regress y on x as follows:
x <- c(4, 2, 6, 7, 3, 4)
y <- c(100, 200, 140, 160, 170, 190)
lmres <- lm(y ~ x)
lmres
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 192.5 -7.5
plot(x, y)
abline(lmres)
[Figure: scatter plot of y against x with the fitted regression line added by abline.]
summary(lmres)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## 1 2 3 4 5
## -6.250e+01 2.250e+01 -7.500e+00 2.000e+01 -2.842e-14
## 6
## 2.750e+01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 192.500 41.926 4.591 0.0101 *
## x -7.500 9.007 -0.833 0.4519
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.5 on 4 degrees of freedom
## Multiple R-squared: 0.1477,Adjusted R-squared: -0.06534
## F-statistic: 0.6933 on 1 and 4 DF, p-value: 0.4519
coefficients(lmres)
## (Intercept) x
## 192.5 -7.5
residuals(lmres)
## 1 2 3 4
## -6.250000e+01 2.250000e+01 -7.500000e+00 2.000000e+01
## 5 6
## -2.842171e-14 2.750000e+01
You get the coefficients, standard errors, t-values and p-values all together with:
summary(lmres)$coefficients
summary(lmres)$r.squared
## [1] 0.1477273
summary(lmres)$adj.r.squared
## [1] -0.06534091
End of lecture 11
$$\mathrm{Cov}[R_i, R_m] = \beta_i \sigma_m^2 \qquad (71)$$
while the covariance between stock i and j with $i \neq j$ is
$$\sigma_{ij}^2 = \mathrm{Cov}[R_i, R_j] = \beta_i \beta_j \sigma_m^2. \qquad (72)$$
The next step is to write down the expressions for the expected portfolio return and variance
in the familiar matrix notation
Note that I have scaled returns, variances and covariances so that they are in annual terms.
When markets are efficient a stock should not have a non-zero α. We thus set the intercept to zero when
determining the expected (excess) stock returns. The R function diag accesses the diagonal
of a matrix.
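A rough sketch of how the index-model inputs might be assembled (everything except reg and RM is an assumption; the scaling to annual terms follows the text):
beta <- reg["beta", ]
var.eps <- reg["var.eps", ] * 12               ## annualized idiosyncratic variances
var.m <- var(RM$RM.vw) * 12                    ## annualized market variance
mu <- beta * mean(RM$RM.vw) * 12               ## expected excess returns with alpha set to zero
Sigma <- (beta %o% beta) * var.m               ## covariances implied by Eq. (72)
diag(Sigma) <- beta^2 * var.m + var.eps        ## total variances on the diagonal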
## [,1]
## [1,] 0.05
## [,1]
## [1,] 0.07206284
summary(omega)
The previous result makes more sense compared with the result that we obtained based on the
full estimation of the covariance matrix. Note that the weights of the optimal solution are all
positive without explicitly imposing a constraint. This, however, does not necessarily apply to
all optimal portfolios, that is, portfolios that yield other target returns. To guarantee that
weights are positive for other portfolios, we impose the no-shortselling constraint.
## [,1]
## [1,] 0.05
## [,1]
## [1,] 0.07206284
summary(omega)
Would you advise your client to invest in these stocks with the corresponding weights? To
answer this question we need to plot the frontier first and check whether the obtained portfolio
is efficient.
[Figure: the frontier of risky portfolios (x-axis: Volatility, y-axis: Expected return, roughly 0 to 0.20).]
We see that the optimal portfolio that delivers a target return of 5% is indeed efficient.
That is, there is no other portfolio that has lower risk and still yields 5%. Further, we could
also improve the performance by taking into consideration a risk-free bank account, which we are
going to do next.
$$\max_{\omega} \frac{E[R_p]}{\sqrt{\mathrm{Var}[R_p]}} = \max_{\omega} \frac{\omega^{\top}\mu}{\sqrt{\omega^{\top}\Sigma\,\omega}} \qquad (76)$$
subject to
$$E[R_p] = \mu^* \quad\text{and}\quad \sum_{i}\omega_i = 1. \qquad (77)$$
Unfortunately, this does not immediately look like a quadratic programming problem. Thus, it
seems we cannot use solve.QP anymore. However, a little trick shows that the problem is
still quadratic. Let's divide the numerator and denominator of the objective function by some
scalar $k \equiv 1/(\omega^{\top}\mu)$, that is,
$$\max_{w} \frac{1}{\sqrt{\left(\frac{\omega}{\omega^{\top}\mu}\right)^{\top}\Sigma\left(\frac{\omega}{\omega^{\top}\mu}\right)}} = \max_{w} \frac{1}{\sqrt{w^{\top}\Sigma\, w}}, \qquad (78)$$
where $w \equiv k\,\omega = \omega/(\omega^{\top}\mu)$, subject to
$$E[R_p] = \mu^*, \qquad (80)$$
which is a standard quadratic programming problem. The result of this optimization is a
portfolio on the CAL. To get the tangency portfolio (that is, the one fully invested in the
risky assets) we need to rescale the weight vector w so that it sums to 1. Let's do this first
using the unconstrained version.
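A sketch of the unconstrained tangency-portfolio computation (object names are assumptions; y and x are used again below):
A <- cbind(mu)                              ## single equality constraint on the scaled weights
res <- solve.QP(Dmat = Sigma, dvec = rep(0, length(mu)), Amat = A, bvec = 1, meq = 1)
omega <- res$solution/sum(res$solution)     ## rescale so that the weights sum to one
y <- t(omega) %*% mu                        ## expected excess return of the tangency portfolio
x <- sqrt(t(omega) %*% Sigma %*% omega)     ## volatility of the tangency portfolio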
[Figure: expected return versus volatility with the individual stocks plotted as points (x-axis: Volatility, y-axis roughly 0 to 0.10).]
The points function plots the stock returns and their standard deviations as symbols, as
defined by pch . Since the investor desires to earn 5% p.a. in excess of the risk-free interest
rate, we advise him to invest the following fraction of his NOK 50 million in the risky tangency
portfolio:
0.05/y
## [,1]
## [1,] 0.5906588
The volatility of this overall position is then:
0.05/y * x
## [,1]
## [1,] 0.06753101
The corresponding weights and NOK amounts to be invested in the stocks are given by:
round(omega, 2)
## [1] 0.02 0.02 0.02 0.02 0.03 0.03 0.00 0.02 0.01 0.02 0.08
## [12] 0.04 0.00 0.02 0.01 0.02 0.04 0.01 0.00 0.05 0.01 0.01
## [23] 0.01 0.05 0.02 0.09 0.00 0.01 0.01 0.06 0.01 0.00 0.00
## [34] 0.00 0.01 0.00 0.01 0.02 0.01 0.02 0.01 0.00 0.02 0.03
## [45] 0.01 0.05 0.02 0.00 0.00 0.01 0.01 0.02 0.02
The function round rounds the number given as the first argument to the number of
digits given as the second argument. Finally, the ex-ante Sharpe ratio is:
y/x
## [,1]
## [1,] 0.7404006
6.6 Exercises
1. Use the estimates of the single index model and find the optimal portfolio that has a
volatility of 8% p.a. assuming you cannot short stocks.
2. Write a function that computes the unconstrained frontier, i.e. the expected returns and
the corresponding volatilities. The function shall be defined as follows:
3. Use the estimates based on a single-index model and do an unconstrained portfolio optimization
to find the tangency portfolio. Backtest this strategy using a rolling window
of exactly 60 return observations. That is, start at the earliest date possible where you
have at least 60 return observations. Find the tangency portfolio. Invest in this tangency
portfolio for one month. Then find a new tangency portfolio and invest in that one. Plot
the return series this strategy generates. What is the Sharpe ratio?
End of lecture 12
Index
..., 30
:, 13
<-, 9
<, 55
==, 55
=, 9
>, 55
FALSE, 55
TRUE, 55
[[, 12
[, 11
$, 12
abline, 49
abs, 61
aggregate, 68
all, 54
apply, 53
as.Date, 14
as.matrix, 86
as.numeric, 42
cat, 8
cbind, 50
class, 15
coefficients, 89
cor, 81
cov, 81
cumprod, 52
cumsum, 52
cut, 67
c, 13
diag, 88
diff, 17
dim, 11
dnorm, 20
exp, 42
factor, 65
for, 25
function, 28
head, 10
hist, 86
install.packages, 85
is.na, 66
legend, 75
length, 19
levels, 65
lines, 59
list, 68
lm, 89
load, 17
log, 17
lower.tri, 84
matrix, 52
mean, 18
merge, 74
na.omit, 19
names, 14
nearPD, 86
nlminb, 34
nlm, 32
nrow, 13
order, 15
plot, 16
pmax, 54
points, 96
qnorm, 20
range, 59
rbind, 51
read.csv, 9
rep, 86
require, 85
reshape, 76
residuals, 89
return, 28
rev, 24
rm, 17
rnorm, 48
round, 98
sample, 45
save, 17
sd, 18
seq, 29
set.seed, 54
solve.QP, 85
sqrt, 18
summary, 64
sum, 24
system.time, 54
table, 46
tail, 13
tapply, 72
t, 86
unlist, 72