Market Volatility Calculation Method
Peter Gross
Advisor: Dr. Jialing Dai
Final Report
URA Spring 2006
Abstract
The Black-Scholes equation is a hallmark of mathematical finance,
and any study of this growing field would be incomplete without having
seen and understood the logic behind this equation. The initial focus of
this paper will be to explore the arguments leading to the equation and
the financial background necessary to understand the arguments. The
problem of estimating the only parameter which is not observable directly
in the market, the volatility, is then tackled through two different methods:
historical volatility and implied volatility. The goal is then to determine
the “best” way to estimate volatility, by comparing the theoretical price
the equation predicts with the actual price in the market, based on data
from the Chicago Board Options Exchange on six selected stock options.
1 Introduction
The Black-Scholes equation was first presented in [6]. It was for this work that
Scholes received the Nobel Prize in Economics in 1997 (Black had passed away
two years earlier). The problem addressed in that paper was one of finding the
“fair” value of a stock option. What a stock option actually is, and what exactly
fair means in this context, will be discussed later on. The equation is widely
seen to have paved the way for an influx of mathematical sophistication into
certain aspects of finance. Furthermore, it has spawned the field of financial
engineering, which is concerned with the design of financial contracts and the
pricing of derivatives. The pricing formula for a put option is shown below
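In its standard textbook form, the put price can be written as follows. The notation is assumed to match the one used in Section 4: S is the current stock price, K the strike price, r the risk-free rate, T the time to expiration, σ the volatility, and N the standard normal distribution function.

```latex
P = K e^{-rT} N(-d_2) - S\, N(-d_1), \qquad
d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}, \qquad
d_2 = d_1 - \sigma\sqrt{T}
```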
Section 2 will discuss the necessary finance background for understanding
this project. It will also introduce the reader to the type of economic reasoning
which - apart from a good mathematics background - is valuable to know for
study of the field of mathematical finance. Section 3 will contain an introduction
to the relevant mathematical concepts, which are valuable in understanding the
equation and some of which will be used in the derivation of the pricing equation
in Section 4. Section 5 will discuss how the data was obtained, and give a
general overview of how to work with the database. Since volatility is the only
parameter in the equation which is not directly observable in the market, this
paper will discuss different ways of estimating it. Volatility can be estimated
using historical data, by calculating it from historical prices. One goal of this
paper will be to vary the time scale one takes into account, i.e. to change the
time windows from which prices are used to calculate the volatility. In Section
6, we will then look at which time scale seems to be optimal in the sense of
approximating the option price as closely as possible empirically, based on six
stocks. Section 7 will then investigate a second way of estimating volatility, by
looking at so-called implied volatility. To calculate this, we need to take the
option price on a specific day, and then find out which volatility would produce
this option price, given the other known parameters in the model.
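As a rough illustration of the historical estimate described above, the following sketch computes an annualized volatility from the log-returns of closing prices and restricts the calculation to a chosen time window. The function names, the sample prices, and the convention of 252 trading days per year are assumptions made for this example, not taken from the paper.

```python
import math

def historical_volatility(prices, periods_per_year=252):
    """Annualized volatility from consecutive closing prices.

    periods_per_year=252 (trading days) is an assumed convention.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(log_returns)
    if n < 2:
        raise ValueError("need at least three prices")
    mean = sum(log_returns) / n
    # Sample variance of the log-returns (divisor n - 1).
    variance = sum((x - mean) ** 2 for x in log_returns) / (n - 1)
    return math.sqrt(variance * periods_per_year)

def windowed_volatility(prices, window):
    """Use only the most recent `window` returns (window + 1 prices)."""
    return historical_volatility(prices[-(window + 1):])

# Hypothetical closing prices, for illustration only.
closes = [47.10, 46.80, 47.55, 48.00, 47.25, 47.90, 48.40]
for window in (3, 6):
    print(window, round(windowed_volatility(closes, window), 4))
```

Changing `window` is the code counterpart of varying the time scale from which prices are taken.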
Implied volatility is often interpreted as representing what investors think
volatility is at any given point of time. Since we cannot explicitly solve the
equation for σ, we need to rely on numerical techniques, which we borrowed
from [3], to approximate implied volatility. We will then determine this measure
of volatility at different points of time, and look at how it compares directly to
historical volatility up to that point. The measure of goodness of fit, as in the case
of historical volatility, will be the extent to which the calculated option prices
deviate from the actual prices in the market.
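The idea of backing out the volatility from an observed price can be sketched as below. The numerical technique of [3] is not reproduced here; plain bisection is substituted for it, which works because the Black-Scholes call price increases with σ. The function names, the search interval, and the sample inputs are assumptions for this illustration.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call on a non-dividend stock."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_volatility(price, S, K, r, T, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection on sigma; [lo, hi] is an assumed search interval."""
    if not bs_call(S, K, r, T, lo) <= price <= bs_call(S, K, r, T, hi):
        raise ValueError("price outside the range attainable by the model")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, T, mid) < price:
            lo = mid  # model price too low: volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip with hypothetical inputs: price a call at sigma = 0.30,
# then recover that sigma from the price.
price = bs_call(47.10, 45.0, 0.05, 0.2, 0.30)
print(implied_volatility(price, 47.10, 45.0, 0.05, 0.2))
```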
In the conclusion of the paper we will discuss some pitfalls of the Black-Scholes
model which have already been identified in the literature and have led to
the development of new models. We will also check whether our analysis corroborates
these flaws, and which assumptions specifically have to be revised, based on our
results.
2 Finance Background
2.1 Definitions and Basic Facts
2.1.1 Basic Terminology
In what follows, we will introduce the reader to the concepts and terminology
used in finance, as far as it is important for this paper. A derivative is a financial
instrument whose value depends on the value of some underlying variable. In
most cases this is the price of a stock on a certain date, but it could also depend
on some interest rate, or even on something more unusual like the amount
of rainfall in a certain week. The important feature of a derivative is that
its value is well-defined, given the value of the underlying variable. Among
these derivatives, the regular stock option is the one we will discuss in detail.
An option gives the holder the right, but not the obligation, to sell or buy a
stock at a certain price, the strike price, on a pre-specified date, the expiration
date. If the right to buy the stock is conferred upon the holder, then this is
called a call option, or simply call, while if the right to sell is conferred, it is
called a put option, or put. Another distinction is made between European and
American options, where the first one only allows the option to be exercised
on the expiration date, while the latter allows the holder to exercise the option
any time up to the expiration date. Exercising the put here refers to the act of
buying the stock and using the option to sell the stock at a higher price, while
exercising the call consists of using the option to buy the stock at a lower price
than the current market price, and then selling it at the market price.
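The value realized by exercising can be written as a pair of small payoff functions. This is a minimal sketch; the function names and the sample numbers are chosen only for illustration.

```python
def call_payoff(stock_price, strike):
    """Value of exercising a call: buy at the strike, sell at the market."""
    return max(stock_price - strike, 0.0)

def put_payoff(stock_price, strike):
    """Value of exercising a put: buy at the market, sell at the strike."""
    return max(strike - stock_price, 0.0)

# With a strike of 60, the call pays off only above 60, the put only below.
for price in (50.0, 60.0, 80.0):
    print(price, call_payoff(price, 60.0), put_payoff(price, 60.0))
```

The `max(..., 0.0)` reflects that the holder has the right, but not the obligation, to exercise.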
he might also buy a call option with a strike price of $60 for 100 shares,
which is going to cost less than buying 100 shares, and then exercise these calls
on the expiration date, realizing a profit of $20 on each call, each of which cost less than
buying the actual stock. We see that the yield, the profit realized divided by the
money invested, is higher when using the option than when buying the stock.
The option is therefore said to have leverage. On the other hand, the speculator
is also not exposing himself to large losses on the stock should his prediction
turn out to be false: the worst he can do by buying the options
is lose the money he paid for them, since if the stock goes below $60,
he will simply not exercise the option. If he holds the stock itself, he can lose
significantly, since the stock might go down, and with it his investment. These
two examples serve to illustrate some of the reasons why options are being used
by investors. A more comprehensive account of strategies involving options is
given in [2].
cally should watch out for when dealing with data is the issue of stock splits.
Sometimes a company decides to split m of its current shares into n new shares,
each of which will then have a value of $Pm/n$, where P is the current stock price.
Since this is just a change in the units and not in value, the value of the options
one holds should also stay the same. For this reason, the option contract is then
changed by changing the strike price to $m/n$ of its former value, and changing
the volume of shares the contract is good for to $n/m$ of the former volume. This
should then be accounted for in the data analysis, although it is probably easiest
to select periods without stock splits if possible.
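The contract adjustment for a split can be sketched as follows. The function name and the sample contract (strike 60, 100 shares, a 2-for-1 split) are hypothetical; exact fractions are used so that the check on the total exercise value holds without rounding error.

```python
from fractions import Fraction

def adjust_for_split(strike, shares_per_contract, m, n):
    """Adjust an option contract when m old shares become n new shares.

    The strike is scaled by m/n and the number of shares by n/m, so the
    total amount paid on exercise is unchanged.
    """
    ratio = Fraction(m, n)
    return strike * ratio, shares_per_contract / ratio

# Hypothetical 2-for-1 split (m = 1, n = 2).
new_strike, new_shares = adjust_for_split(Fraction(60), Fraction(100), 1, 2)
print(new_strike, new_shares)                # 30 200
print(new_strike * new_shares == 60 * 100)   # True
```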
The first such rational pricing consideration is the upper bound on
the value of a call, namely
C ≤ S,
where C is the price of the call and S is the price of the stock. To see this,
suppose C were greater than S. Then a holder of the call could realize a riskless
profit by selling the option for C and buying the stock for S. Another
way of seeing this is that it should not be more expensive to buy the right to
buy a stock than it is to buy the stock itself. On the other hand, we have
$$P \le K,$$
where P is the price of the put and K is the strike price. If it were the case that
K is less than P, then an arbitrageur could easily realize a profit by writing a
put and investing the proceeds P at the risk-free rate r. He would then hold at
least $Pe^{rT} - K > 0$ at expiration, where T is the time until expiration, since the only thing he committed
to is possibly buying the stock for K at some future time, and even that only
if the option holder actually exercises the put. By requiring that there
are no such arbitrage opportunities, we actually arrive at the slightly stronger
condition,
$$P \le Ke^{-rT},$$
$$C \ge \max(S - Ke^{-rT}, 0)$$
The final bound we want to derive is the lower bound for the European
put. For this, consider two portfolios again, the first one containing a put and
a share, the second one cash worth $Ke^{-rT}$. On the day of expiration, the first
portfolio will then be worth
$$\max(K - S, 0) + S = \max(K, S),$$
while the second one will be worth K. Therefore, the first one is always worth
at least as much as the second one in the absence of arbitrage, so that
$$P \ge \max(Ke^{-rT} - S, 0).$$
The put-call parity relationship for European options reads
$$C + Ke^{-rT} = P + S.$$
Note that in this formula we are stretching the notation a bit by letting S be
the stock price at the time at which we want to price the option, and not at
the expiration date as it was used above.
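The bounds and the parity relationship above lend themselves to a simple consistency check on quoted prices. The sketch below is only an illustration: the function names and the sample values (C = 4.00, S = 47.10, K = 45, r = 5%, T = 0.2 years) are assumptions.

```python
import math

def put_from_parity(C, S, K, r, T):
    """European put price implied by put-call parity: P = C + K e^{-rT} - S."""
    return C + K * math.exp(-r * T) - S

def satisfies_bounds(C, P, S, K, r, T):
    """Check the no-arbitrage bounds on European call and put prices."""
    pv_strike = K * math.exp(-r * T)
    call_ok = max(S - pv_strike, 0.0) <= C <= S
    put_ok = max(pv_strike - S, 0.0) <= P <= pv_strike
    return call_ok and put_ok

C, S, K, r, T = 4.00, 47.10, 45.0, 0.05, 0.2
P = put_from_parity(C, S, K, r, T)
print(round(P, 4), satisfies_bounds(C, P, S, K, r, T))
```

A price pair that fails `satisfies_bounds` would, under the assumptions of this section, offer an arbitrage opportunity.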
3 Mathematics Background
3.1 Random Variables
The background necessary to read this paper is basic probability and second-semester
calculus. The reader unfamiliar with probability is referred to [1] for
an excellent introduction to the field. Some more advanced concepts will be
discussed, but the discussion is for the most part self-contained. Since the price of an option
depends on the path the underlying stock takes, we will discuss ways of making
sense of the random nature of the stock price, and how to take each individual
stock’s characteristics into consideration when pricing the option on it. First of
all, it will be useful to make a couple of definitions.
A random variable X is a function from a sample space Ω to the set of real
numbers, $\mathbb{R}$ (or, in more general cases, to $\mathbb{R}^n$). A particular value of X is called
a realization of the random variable. A natural way to subdivide the set of all
random variables is into discrete and continuous ones (there are actually also
mixed random variables, which are discrete for certain values, and continuous
for others, so our subdivision is not a partition). A discrete random variable is
one which can take on only a denumerable number (either finite or countably
infinite) of values, meaning that the cardinality of X(Ω) is at most countably
infinite. Continuous random variables are those whose range is a non-empty
interval of real numbers, which is of course uncountably infinite. We will mostly
work with discrete random variables in the beginning.
Let P be a probability measure on Ω. A probability measure is a function
on a sample space,¹ such that the following conditions are satisfied:
(1) ∀ω ∈ Ω, P(ω) ≥ 0.
(2) P(Ω) = 1.
(3) For any countable, pairwise disjoint collection of sets $\{A_i\}_{i \in \Lambda}$ with each $A_i \subseteq \Omega$,
$$P\Big(\bigcup_{i \in \Lambda} A_i\Big) = \sum_{i \in \Lambda} P(A_i).$$
With this in mind, we can now define the probability distribution of a discrete
random variable as
$$p_x = P(X = x) := P(X^{-1}(x)).$$
The probability that the random variable takes on value x is given by the prob-
ability of the event that is mapped to x by X. We will call P (X = x) the
probability mass function for X. To see why this is a quite natural definition,
consider Ω to consist of all possible outcomes when a fair coin is flipped twice,
i.e. Ω = {HH, HT, TH, TT}. Then, we can define X to associate with each
event in this space the number of heads in the event. Then, keeping in mind
that each outcome is equally likely, we have that
$$P(X = 2) = P(X^{-1}(2)) = P(HH) = \frac{1}{4}.$$
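The coin example can be reproduced by enumerating the sample space and pushing the probability of each outcome through X. This is a small sketch; the variable names are chosen for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: HH, HT, TH, TT.
omega = ["".join(flips) for flips in product("HT", repeat=2)]
P = {w: Fraction(1, len(omega)) for w in omega}  # equally likely outcomes

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

# Probability mass function: p_x = P(X^{-1}(x)).
pmf = Counter()
for w in omega:
    pmf[X(w)] += P[w]

for x in sorted(pmf):
    print(x, pmf[x])
```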
1 Strictly speaking, the function is defined over a collection of subsets of the sample space,
which satisfies certain properties, a so-called σ-algebra. If one follows this definition, the first
condition about P would have to change to P ({ω}) ≥ 0. We will relax this requirement, since
we are mostly dealing with discrete random variables, in which case the σ-algebra actually is
equal to the powerset of Ω, so that we can use events instead of the sets containing the events.
We will therefore abuse notation slightly by writing P(ω) instead of P({ω}).
3.2 Stochastic Processes
3.2.1 Definitions
A stochastic process {Xt }t∈T is a family of random variables, indexed by a
parameter t, which runs over an index set T. One simple example of a stochastic
process would be throwing a die five times and recording in each
individual trial the number of dots. For any particular sequence of realizations
of the random variable, we use the term sample path. So, for instance, {1,4,2,3,6}
and {2,6,3,4,2} are both sample paths of the stochastic process just described.
Two common classifications of stochastic processes are based upon distinctions
about the state space, which is the range of the random variables, and the
index set T . We say that a process is a discrete state process or a continuous
state process, if its state space is discrete or continuous, respectively. Similar
distinctions are made with respect to the index set. A special case, which is
often used in practice, is that of setting T = [0, ∞), since then the stochastic
process can represent a continuously occurring random phenomenon, such
as the diffusion of particles in an aqueous solution, or the price of a stock. We
see that our example of throwing a die five times is a discrete-state,
discrete-time process, since there are finitely many outcomes possible in each trial, and
the number of trials is finite as well (both could actually be countably infinite;
so long as they are not uncountably infinite, they are discrete).
There are some further properties of stochastic processes that appear in nu-
merous applications, so that it is useful to look at them.
1) If for any $t_1 < t_2 < t_3$ the increments $X_{t_2} - X_{t_1}$ and $X_{t_3} - X_{t_2}$ are independent, then $X_t$ is said
to be a process with independent increments.
2) If for any $t_i$ the distribution of $X_{t_i + h} - X_{t_i}$ depends only on h, then $X_t$ is
said to have stationary increments.
3) $X_t$ is said to have the Markov property if, given $X_t$ and s > t, $X_s$ is independent
of $X_u$ for all u < t.
The first property basically says that the amount the random variable changed
(or the sample path taken) over some time interval does not affect the prob-
ability of the change in any other ensuing, disjoint time interval. The second
property is best explained with an example. If we assume for the moment
that the stochastic process describing the path of a stock satisfies this property,
then we can expect the change in stock price over a fixed time
interval, say a day, to follow the same distribution no matter whether we look at the market today, in a
week, in a month or in a year. The only thing that influences the distribution
of the change is the length of the time interval, regardless of the time at which we start
looking at the price. The third property is the most important property in
this context. It basically states that all that matters for the future path of the
stochastic process is the current state, and we can discard all states before this
moment in making statements about the future. In a sense, the Markov prop-
erty expresses the notion of the process having no memory. One such example
is drawing balls of different colors from an urn with replacement. Every time
we draw a ball, the probability of drawing a certain color stays the same, since
we put the ball we drew before back into the urn. The urn hence does not
“remember” what we did before, since the situation is always the same in every
trial. In essence, this is one of the assumptions we will be making about the
path of the stock price. At any given point of time, past developments should
not really matter since only current information should have influence on the
price. The standard academic finance argument for this is that if there were
some past patterns in the market that people would use to predict the future,
then if there were some real gain from this, many investors would start using
this procedure, which would influence the price and ruin the entire prediction,
making it useless. There is however controversy as to how much past patterns
can really say about the future, and the entire field of technical analysis makes
use of techniques that exploit patterns. The usefulness of this area is disputed,
and not the topic of this paper. For the purpose of deriving the Black-Scholes
model, this assumption will be used, since it is a reasonable starting point.
3.2.3 Binomial Price Process
We are now in a position to describe the model which will lay the groundwork
for the derivation of the Black-Scholes equation. It was first proposed to simplify
the derivation of the equation in 1979, by Cox, Ross, and Rubinstein (see [5]).
This section is largely adapted from [4]. We assume that we are operating in
an N-period economy, where in each period either a “good” or “bad” state
materializes. Hence, we can model this by letting $\Omega_0 = \{0, 1\}$, and further
require that P(1) = q, and P(0) = 1 − q, where 0 ≤ q ≤ 1. Bad in this
case is associated with 0, and good with 1. We let $\Omega = (\Omega_0)^N$ be the sample
space describing the outcome in each period. Then we can define the (random)
projection mapping
$$\pi_t : \Omega \to \{0, 1\}$$
by $\pi_t(\omega) = \omega_t$, where $\omega_t$ is the state of the economy in period t. The function
then just picks the t-th coordinate from the random vector $\omega = (\omega_1, \omega_2, \ldots, \omega_N)$.
It is also of interest to us to count the number of good and bad states that
the economy was in up to any given time t. To do this, we define
$$n_t(\omega) = \sum_{i=1}^{t} \pi_i(\omega).$$
The interpretation of this is that for any particular sample path ω, $n_t(\omega)$ will
give us the number of periods up to time t in which the economy was good,
and similarly $t - n_t(\omega)$ will give us the number of periods in which the economy
turned out bad. Since for each i, $\pi_i$ is a Bernoulli random variable (only two
outcomes possible), we can see that $n_t$, as the sum of independent Bernoulli
random variables, is binomially distributed. Therefore, similar to the genetic
example before, the distribution function is given by
$$P(n_t = k) = \binom{t}{k} q^k (1 - q)^{t-k}, \qquad k \in \{0, 1, \ldots, t\}.$$
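The binomial distribution of the number of good states is easy to evaluate directly. In this sketch, q denotes the probability of a good state in a single period, and the function name is an assumption made for the example.

```python
from math import comb

def binomial_pmf(k, t, q):
    """P(n_t = k): probability of exactly k good states in t periods."""
    return comb(t, k) * q ** k * (1 - q) ** (t - k)

# Distribution of the number of good states after t = 4 periods with q = 0.5.
t, q = 4, 0.5
for k in range(t + 1):
    print(k, binomial_pmf(k, t, q))
```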
The next step is to relate the stock price to these states of the economy. Suppose
that initially the stock price is S0 . Then, in period 1, the stock price is either
uS0 , or dS0 , where u > d > 0 are the returns on the stock. The first case
occurs when the economy is in a good state, and the second if the economy is
in a bad state. We need to make d greater than zero to ensure that the stock is
always strictly positive, since a zero or negative price would make no sense. We
further assume that in any period thereafter, we have the same u and d, so
that $S_{t+1} = uS_t$ or $S_{t+1} = dS_t$ for each t, depending again on the state of the
economy. Therefore, the price of the stock at time t given any particular ω ∈ Ω
is
$$S_t(\omega) = S_0\, u^{n_t(\omega)}\, d^{\,t - n_t(\omega)},$$
and taking logarithms gives
$$\ln\frac{S_t}{S_0} = n_t(\omega)\ln(u) + (t - n_t(\omega))\ln(d) = n_t(\omega)\ln\frac{u}{d} + t\ln(d). \tag{3}$$
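Equation (3) can be checked numerically on a simulated sample path. In the sketch below, q is taken to be the probability of the good state, and the values of u, d, q, N, and the random seed are arbitrary choices for the illustration.

```python
import math
import random

def simulate_path(S0, u, d, q, N, rng):
    """Simulate one sample path of the binomial price process.

    Returns the state vector omega (1 = good, 0 = bad) and the prices
    S_0, S_1, ..., S_N.
    """
    omega = [1 if rng.random() < q else 0 for _ in range(N)]
    prices = [S0]
    for state in omega:
        prices.append(prices[-1] * (u if state == 1 else d))
    return omega, prices

u, d, q, N = 1.1, 0.9, 0.5, 10
omega, prices = simulate_path(100.0, u, d, q, N, random.Random(0))

t = N
n_t = sum(omega[:t])                       # number of good states up to t
lhs = math.log(prices[t] / prices[0])      # log-return over t periods
rhs = n_t * math.log(u / d) + t * math.log(d)
print(abs(lhs - rhs) < 1e-9)
```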
We see that the logarithm of the ratio of the price at time t and time 0
is a linear function of a binomial random variable, and hence its distribution is
again determined by the binomial distribution. This will lead us to introduce in
the next section the model of geometric Brownian Motion for the stochastic
process governing the price, which is obtained
by taking the limit as the time interval between price changes approaches 0.
Although the exact derivation of this will not be shown, anyone who has taken
some statistics should be familiar with the normal approximation to the binomial
distribution. The idea here is similar, in that the process described above is
binomial, and hence approximately normal, so that in the limit the logarithm of
the ratio of successive prices is normally distributed. The term geometric comes
from the fact that we are considering the logarithm of the ratio of successive
prices, the so-called log-return, or log of percentage change over some period,
instead of the prices themselves.
Now, one might ask why assuming that the price of a stock goes up or down
with the same probability and by the same percentage amount in any given
period is realistic. This is definitely not true if we let the period be a day, or
even a week. However, we never said how long the period would really be. If
we think of the period as consisting of a second, then this might be a pretty
realistic approximation. If we want certain continuity assumptions, which lead
to the Brownian Motion, then we want the stock to only undergo very small
changes in very small time intervals, and not have a jump discontinuity. There
are models for the case that the stock jumps from time to time, but we will
not discuss them here, because the Black-Scholes model does not build on this
assumption.
4 The Black-Scholes Equation
4.1 Example of Hedging Portfolio
We will now present one way of deriving the Black-Scholes equation, via a
binomial pricing process, which is based on the binomial price model described
in the last section. This approach was first taken in [5], and represents the
clearest way of deriving the equation. The presentation will closely follow the
original paper. We assume that there is one risk-free interest rate at which we
can borrow or lend money, and that there is one period left until the expiration
date of a call option on the stock. Now, suppose we write 3 calls with strike
price $100 at C each, buy 2 shares of the stock at $100, and borrow $60 at an
interest rate of 25%. Suppose further that the stock either doubles or halves in
value at the end of the period. Then, if the stock goes up to $200, the calls we
wrote are going to be exercised, so that we lose 3($200 - $100) = $300. We also
have to pay back the $60 we borrowed, plus the accumulated interest of $15.
Finally, since the stocks we had went up to $200, we have $400 in stocks. Adding
all of these together yields $25. Similarly, if the stock went down to $50, we
have $100 in stocks, the calls will not be exercised, and we owe $75 in principal
and interest on the money we borrowed. This again amounts to $25. So, no
matter what happens, we will end up with $25 at the next period. Since we
require there to be no riskless arbitrage opportunities, the net amount invested
to set up this portfolio, 200 − 60 − 3C, must grow at the risk-free rate to exactly
this payoff, so that we require that 1.25(140 − 3C) = 25,
and then C = 40.
We hence have determined the fair value of the option, by requiring that a
riskless portfolio earn exactly the risk-free rate. The interesting thing to note is that
we have determined this merely by knowing the interest rate, the underlying
stock price, the strike price, and the range of values that the stock can take
on after one period. There was no mention anywhere of the probability of the
stock going up or down, which is somewhat surprising. We only had to know
the possible values the stock could take on after this period, which is what the
volatility in a way will tell us later on in the Black-Scholes equation.
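either $\Delta uS + r_1B$ or $\Delta dS + r_1B$, with probability q and 1 − q, respectively. Let
us now choose ∆ and B in such a way as to equate the end-of-period payoffs from
the call with the end-of-period values of the portfolio, i.e.
$$\Delta uS + r_1 B = C_u, \qquad \Delta dS + r_1 B = C_d.$$
Solving these two equations gives $\Delta = \frac{C_u - C_d}{(u - d)S}$ and $B = \frac{uC_d - dC_u}{(u - d)r_1}$, so that in the absence of arbitrage
$$C = \Delta S + B = \frac{pC_u + (1 - p)C_d}{r_1}, \tag{4}$$
where
$$p = \frac{r_1 - d}{u - d}, \qquad 1 - p = \frac{u - r_1}{u - d}.$$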
Notice again how the probability q of the stock going up or down does not appear
in the equation for the value of the call. Another nice feature of introducing p
is that it has the properties of a probability, since it is always between 0 and 1.
To see this, notice that $r_1 > d$, so that $r_1 - d > 0$. Similarly, u − d > 0, so p, as
the ratio of these two values, is certainly positive. To see that it is less than
1, we note that since $u > r_1$, it follows that $u - d > r_1 - d$, and hence, since
$r_1 - d$ is strictly greater than zero, $p = \frac{r_1 - d}{u - d} < 1$.
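The one-period valuation can be sketched in a few lines. The expressions for ∆ and B used below are the solution of the two payoff-matching conditions (the portfolio is worth the call payoff in both the up and the down state); the function name is an assumption, and the inputs are those of the hedging example in Section 4.1 (stock at 100, strike 100, the stock doubling or halving, 25% interest).

```python
def one_period_call(S, K, u, d, r1):
    """Price a call one period before expiration by replication.

    r1 is one plus the risk-free rate over the period.
    """
    Cu = max(u * S - K, 0.0)                 # call payoff in the up state
    Cd = max(d * S - K, 0.0)                 # call payoff in the down state
    delta = (Cu - Cd) / ((u - d) * S)        # shares held per call
    B = (u * Cd - d * Cu) / ((u - d) * r1)   # cash position (negative = borrowing)
    p = (r1 - d) / (u - d)
    by_replication = delta * S + B
    by_formula = (p * Cu + (1 - p) * Cd) / r1   # equation (4)
    return delta, B, p, by_replication, by_formula

delta, B, p, c1, c2 = one_period_call(100.0, 100.0, 2.0, 0.5, 1.25)
print(delta, B, p, c1, c2)
```

Both routes give the same value, and the hedge ratio of 2/3 corresponds to holding 2 shares for every 3 calls written. The probability q never enters the calculation.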
We now go over to a three-period model, where we are interested again in
pricing the call in period 0, and it expires in period 2. The stock can take on
3 possible values, $u^2S$, $duS$, or $d^2S$. The value of the call at the expiration date is
then again the maximum of zero and each of these values less the strike price K, depending on the
state of the economy that materialized during period 1 and period 2. We can
now derive recursively the value at which the option should be priced in
period 0, by starting at the two possible outcomes in period 1, and
asking ourselves what it should be there. Applying (4) to each of these cases,
we arrive at
$$C_u = [pC_{uu} + (1 - p)C_{ud}] / r_1, \tag{5}$$
$$C_d = [pC_{du} + (1 - p)C_{dd}] / r_1.$$
Taking these values as given, we can apply (4) once more to figure out the
value of C,
$$C = \left[p^2 C_{uu} + 2p(1 - p)C_{ud} + (1 - p)^2 C_{dd}\right] / r_1^2. \tag{6}$$
We recognize the inside of the parentheses to be a binomial expansion. Exploit-
ing the pattern that emerges when we increase the number of periods, we are
able to write down the general formula for the value of a call at period 0 that
expires n periods from now as
$$C = \left[\sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k} \max(0,\, u^k d^{n-k} S - K)\right] / r_1^n \tag{7}$$
This can be further simplified and brought into a form that resembles the Black-Scholes
formula very much. Let $b = \min\{a \in \mathbb{N} \mid u^a d^{n-a} S > K\}$. Then b is the smallest
number of good states of the economy necessary for the option to be in the
money at the expiration date. Then for every k < b, $\max(u^k d^{n-k} S - K, 0) = 0$,
and for every k ≥ b, $\max(u^k d^{n-k} S - K, 0) = u^k d^{n-k} S - K$, by the way b was
defined. Since all the terms with k < b are then zero in the sum in (7), it
simplifies to
$$C = \left[\sum_{k=b}^{n} \binom{n}{k} p^k (1 - p)^{n-k} (u^k d^{n-k} S - K)\right] / r_1^n \tag{8}$$
If b > n, then the option is clearly worthless, since then the minimum number
of good states in the economy necessary to bring the stock up to a value that
will make the option worth exercising is larger than the number of periods, n.
Separating (8) some more we get
$$C = S\left[\sum_{k=b}^{n} \binom{n}{k} p^k (1 - p)^{n-k}\, \frac{u^k d^{n-k}}{r_1^n}\right] - K r_1^{-n} \left[\sum_{k=b}^{n} \binom{n}{k} p^k (1 - p)^{n-k}\right] \tag{9}$$
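As a sanity check on the algebra, formulas (7) and (9) can both be evaluated and compared. This is a sketch with assumed function names; the sample inputs reuse the doubling or halving stock with a 25% per-period interest rate.

```python
from math import comb

def call_eq7(S, K, u, d, r1, n):
    """n-period binomial call value, equation (7)."""
    p = (r1 - d) / (u - d)
    total = sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        * max(0.0, u ** k * d ** (n - k) * S - K)
        for k in range(n + 1)
    )
    return total / r1 ** n

def call_eq9(S, K, u, d, r1, n):
    """The same value in the separated form of equation (9)."""
    p = (r1 - d) / (u - d)
    # b: smallest number of up moves that leaves the call in the money.
    b = next((a for a in range(n + 1) if u ** a * d ** (n - a) * S > K), None)
    if b is None:  # b > n: the call can never finish in the money
        return 0.0
    first = sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) * u ** k * d ** (n - k)
        for k in range(b, n + 1)
    ) / r1 ** n
    second = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(b, n + 1))
    return S * first - K * r1 ** (-n) * second

for n in (1, 2, 5):
    print(n, call_eq7(100.0, 100.0, 2.0, 0.5, 1.25, n),
          call_eq9(100.0, 100.0, 2.0, 0.5, 1.25, n))
```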
probability that in n periods there will be at least b successes, given n total
periods, and “probability” p′ of success in any given period (it is not actually
a probability, but it behaves like one). This is the celebrated Binomial Option
Pricing Formula, and it looks a lot like the Black-Scholes equation already.
This in turn will now be obtained by a limiting procedure, where the number
of periods goes to infinity.²
which will yield the Black-Scholes equation for a call option. The put option
formula, which we stated in the beginning of the paper, can then be obtained
via the put-call parity relationship in section 2.2.2.
We proceed by showing the convergence of the second term only, since the
work necessary to show convergence of the first term is very similar. The com-
plementary binomial probability distribution, Φ(b, n, p), gives the probability
that given n total periods, the number of upward moves of the stock is greater
than or equal to b, where p is the probability of the stock moving upward. Re-
calling that nt is the number of upward moves of the stocks in n periods, we
can write the following expression:
If we let $S_0$ and $S_t$ denote the asset price now and n periods later, then we
can take the expectation and variance of equation (3) to get
$$E\left[\ln\frac{S_t}{S_0}\right] = n\left[p \ln\frac{u}{d} + \ln(d)\right] \tag{14}$$
$$\mathrm{Var}\left[\ln\frac{S_t}{S_0}\right] = np(1 - p)\left[\ln\frac{u}{d}\right]^2 \tag{15}$$
2 One should of course think of these periods as subdividing a finite time interval, namely
the one between the time one wants to price the option at, and the expiration date of the
option. The limiting process is hence one of infinite subdivision of a finite interval, not one
of going indefinitely into the future, keeping the subintervals constant, which would make no
sense in this context.
We now require that this mean and variance should in the limit, as n approaches
infinity, agree with the mean and variance of the log-return in the continuous-time
model, since both describe the same underlying process of the
log-returns of stock prices. Hence,
$$\lim_{n\to\infty} n\left[p \ln\frac{u}{d} + \ln(d)\right] = \left(r - \frac{\sigma^2}{2}\right)(n\,dt) \tag{16}$$
$$\lim_{n\to\infty} np(1 - p)\left[\ln\frac{u}{d}\right]^2 = \sigma^2 (n\,dt) \tag{17}$$
where dt stands for the length of the period, and n dt gives the time until the
expiration date of the option. In the original equation, (1), T = n dt. To see where
these values came from, see the following derivation.
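$$\begin{aligned}
d_2 &= d_1 - \sigma\sqrt{n\,dt} \\
&= \frac{\ln(S/K) + (r + \sigma^2/2)\,n\,dt}{\sigma\sqrt{n\,dt}} - \sigma\sqrt{n\,dt} \\
&= \frac{\ln(S/K) + (r - \sigma^2/2)\,n\,dt}{\sigma\sqrt{n\,dt}}
\end{aligned}$$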
We hence see that since this is the standardized value of the log-return, the
mean is (r − σ 2 /2)ndt, and the variance is σ 2 (ndt), since it is the square of the
standard deviation, which is found in the denominator of the above expression.
By definition, b is the smallest non-negative integer such that $b \ge \frac{\ln(K/(Sd^n))}{\ln(u/d)}$. As
a consequence, there exists an ε in (0, 1] such that
$$b - 1 = \frac{\ln(K/(Sd^n))}{\ln(u/d)} - \varepsilon.$$
Now, taking the limit as n approaches infinity, and utilizing the fact that the
binomial distribution converges to the Normal distribution in the limit, we get
$$\lim_{n\to\infty} \Pr(n_t < b - 1) = N\left(\frac{\ln\frac{K}{S} - \left(r - \frac{\sigma^2}{2}\right)\tau}{\sigma\sqrt{\tau}}\right), \tag{19}$$
which is exactly the term in the Black-Scholes equation when taking the com-
plement and using the symmetry property of Normal Distribution. The second
convergence can be shown similarly, so this completes our derivation of Black-
Scholes option pricing formula via the Binomial Pricing Formula.
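The convergence can also be observed numerically. The sketch below prices a call with formula (7) for a growing number of periods and compares it with the Black-Scholes value. The parametrization u = e^{σ√dt}, d = 1/u, r1 = e^{r dt} is the usual Cox-Ross-Rubinstein choice and is an assumption here, as are the function names and the sample inputs.

```python
import math
from math import comb

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call on a non-dividend stock."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def binomial_call(S, K, r, T, sigma, n):
    """Formula (7) with n periods of length dt = T / n."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))  # assumed CRR parametrization
    d = 1.0 / u
    r1 = math.exp(r * dt)
    p = (r1 - d) / (u - d)
    total = 0.0
    for k in range(n + 1):
        payoff = max(0.0, S * u ** k * d ** (n - k) - K)
        total += comb(n, k) * p ** k * (1 - p) ** (n - k) * payoff
    return total / r1 ** n

S, K, r, T, sigma = 47.10, 45.0, 0.05, 0.2, 0.30  # hypothetical inputs
for n in (10, 100, 500):
    print(n, binomial_call(S, K, r, T, sigma, n))
print("Black-Scholes", bs_call(S, K, r, T, sigma))
```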
5 Data Collection and Methodology
5.1 General Comments on Data Selection
I used the Berkeley Options Database for the period from January 1989
to December 1993, which was made available to me through the Finance De-
partment at the University of Arizona. It contains all option trades and quotes
recorded at the Chicago Board Options Exchange between those dates, with
data about volume of trade, price of option, price of underlying stock, strike
price, expiration date, and a time stamp that reveals at which time, up to the
second, the trade was made. For quotes, the bid and ask spread is revealed.
The exact expiration date can be determined by looking at the third Friday
of a given expiration month. The program Date.java I have written, which
can be found in the appendix, will determine this automatically via date func-
tions. Since all of the options traded on American exchanges are American
options, meaning that they can be exercised before expiration date, and the
Black-Scholes model is actually for valuation of European Options, it was nec-
essary to select stocks which did not pay any dividends between 1989 and 1993
(or at least over the life of a particular option during that period). If a stock
pays no dividend, then the incentive to exercise early will vanish, due to the
time value of the option, i.e. the possibility of favorable future developments.
The arguments laid out before for the pricing of a European option should
technically hold for non-dividend paying stocks. The first thing, however, that
I had to determine to make the data useful, was which options were actually
continuously traded on the exchange. Prof. Lamoureux provided me with a list
of options that were traded on January 3rd, 1989, and then with another list
of options that were traded on December 30, 1993. With the help of a small
program which determined which ticker symbols were present at both dates (the
number of symbols was around 200 and 600, respectively, making it too tedious
to do this manually), I arrived at a list of about 150 options.
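The third-Friday rule for the expiration date mentioned above can be sketched with standard date functions. This is not the Date.java program from the appendix, only a short Python illustration with an assumed function name.

```python
import datetime

def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday == 0, ..., Friday == 4
    days_until_friday = (4 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_until_friday + 14)

# Expiration Friday for an option expiring in March 1989.
print(third_friday(1989, 3))
```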
The next task was to find out which of these 150 options on different stocks
paid no dividends. I started out typing in ticker symbols at Yahoo! Finance,
and looking at historical prices, which has a feature to just display stock splits
and dividends. After a fair number of symbols had not even shown up on Yahoo,
I began to wonder, researched a bit online, and found out that the symbols on
the options do not necessarily match the symbols on the stock. So, I used the
Options Clearing Corporation’s web page to translate the option ticker symbol
into the underlying stock symbol, which still did not work for some symbols.
This was partly because some companies had gone bankrupt since then, and
hence were listed neither with the Options Clearing Corporation nor on Yahoo.
Sometimes, mergers between companies made the historical data irretrievable, at
least through Yahoo, and sometimes new companies had taken over older
companies’ ticker symbols, which led to some confusion. After going through all
of the symbols, I found six stocks which actually did not pay dividends. The
companies I have selected are FedEx Corporation, Computer Sciences Corpo-
ration, Forest Laboratories Inc., Oracle Corp., National Semiconductor Corp.,
and Molex Corporation. Some of these companies did not have any dividend
information listed on Yahoo, so I looked up their dividend history
in the Daily Investor’s Guide in the library.
1CSC890106143110 3 04500004000000504710
Here, the 1 indicates a trade, as opposed to a quote, the CSC denotes the
underlying stock (Computer Sciences Corporation), 890106 is the date (January 6,
1989), 143110 means that it was traded at 2:31:10 pm (times are given on a
24-hour clock), the 3 denotes the expiration month (March), the blank that
follows marks a call (a put would have a - in that position), the 04500 denotes
the strike price ($45), the 00400 denotes the price at which the option was
traded ($4), 00005 is the quantity of options traded (5), and 04710 is the price
of the underlying asset ($47.10). Notice that the price of the option is already
standardized to be the per-unit price, so that we don’t have to divide it by the
number of options traded. We notice that the price of the option, $4, reflects the
value of immediate exercise, $47.10 - $45 = $2.10, i.e. the intrinsic value, plus
the time value of $1.90. Detailed information about working with the Berkeley
Option database can be found in [7].
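The field layout just described can be turned into a small parser. The sketch below assumes fixed column positions inferred from this single example record (a three-character symbol, a two-character expiration month, prices stored in cents); the authoritative layout is the one documented in [7] and may differ. The class and field names are hypothetical.

```java
// Sketch: parse one fixed-width record such as
// "1CSC890106143110 3 04500004000000504710".
// Column positions are an assumption read off the single example in the text.
class TradeRecord {
    final boolean isTrade;     // '1' in the first column marks a trade
    final String symbol;       // assumed to be exactly three characters
    final int date;            // yymmdd
    final int time;            // hhmmss on a 24-hour clock
    final int expMonth;        // expiration month, 1 to 12
    final boolean isPut;       // '-' marks a put, a blank marks a call
    final double strike;       // dollars
    final double optionPrice;  // dollars, per unit
    final int quantity;
    final double assetPrice;   // dollars

    TradeRecord(String line) {
        isTrade = line.charAt(0) == '1';
        symbol = line.substring(1, 4);
        date = Integer.parseInt(line.substring(4, 10));
        time = Integer.parseInt(line.substring(10, 16));
        expMonth = Integer.parseInt(line.substring(16, 18).trim());
        isPut = line.charAt(18) == '-';
        strike = Integer.parseInt(line.substring(19, 24)) / 100.0;
        optionPrice = Integer.parseInt(line.substring(24, 29)) / 100.0;
        quantity = Integer.parseInt(line.substring(29, 34));
        assetPrice = Integer.parseInt(line.substring(34, 39)) / 100.0;
    }

    // Value of immediate exercise, never negative.
    double intrinsicValue() {
        double payoff = isPut ? strike - assetPrice : assetPrice - strike;
        return Math.max(payoff, 0.0);
    }

    // Whatever the market pays on top of the intrinsic value.
    double timeValue() {
        return optionPrice - intrinsicValue();
    }

    public static void main(String[] args) {
        TradeRecord r = new TradeRecord("1CSC890106143110 3 04500004000000504710");
        System.out.println(r.symbol + " strike " + r.strike + " option " + r.optionPrice
                + " intrinsic " + r.intrinsicValue() + " time value " + r.timeValue());
    }
}
```

For the example record this reproduces the decomposition given in the text: an intrinsic value of about $2.10 and a time value of about $1.90, up to floating-point rounding.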
5.3 Methodology
There are now several ways we could proceed based on the data we have. Since
in many cases there is more than one trade of a certain option on a given day,
we decided to take the last trade on any given day as the option price on that
day. In accordance with this, we use the asset price recorded with that trade
to calculate the historical volatility. It is important
to be consistent here: we initially thought of taking the closing prices from
Yahoo! Finance, but realized that the prices there sometimes differed from
the prices at which the option trades were made. In extreme cases this could
suggest arbitrage opportunities that did not really exist, since there
might be a time difference between the closing price of the stock traded on
the NYSE or NASDAQ and the last trade of the option on that stock made
on the Chicago Board Options Exchange. I have therefore written a program,
FormatOptionData.java, which can be found in the appendix, to filter out the
last trades on each day the option was traded, for an option of pre-specified
strike price and expiration date, and convert them into a more readable format. To
determine the risk-free rate, we look up the (annualized) interest rate for
a 3-month Treasury bill. This is a common proxy for the risk-free rate, since
there is virtually no risk of the government defaulting at any given time, so that
the return is guaranteed.
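The last-trade-per-day filter described above can be sketched as follows. This is an illustration of the idea rather than the code of FormatOptionData.java; the Trade class, its fields, and the sample values in the usage example are hypothetical, and dates are assumed to be yymmdd integers as in the database records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: keep only the last trade of each day.
class LastTradeFilter {

    static class Trade {
        final int date;           // yymmdd
        final int time;           // hhmmss on a 24-hour clock
        final double optionPrice;
        final double assetPrice;

        Trade(int date, int time, double optionPrice, double assetPrice) {
            this.date = date;
            this.time = time;
            this.optionPrice = optionPrice;
            this.assetPrice = assetPrice;
        }
    }

    // Returns one trade per day (the latest one), ordered by date.
    static List<Trade> lastTradePerDay(List<Trade> trades) {
        Map<Integer, Trade> latest = new TreeMap<>(); // keys sort chronologically within 1989-1993
        for (Trade t : trades) {
            Trade current = latest.get(t.date);
            if (current == null || t.time >= current.time) {
                latest.put(t.date, t);
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Trade> trades = new ArrayList<>();
        trades.add(new Trade(890106, 93000, 3.50, 46.90));  // hypothetical earlier trade
        trades.add(new Trade(890106, 143110, 4.00, 47.10));
        trades.add(new Trade(890109, 100000, 4.25, 47.50)); // hypothetical next day
        for (Trade t : lastTradePerDay(trades)) {
            System.out.println(t.date + " " + t.optionPrice + " " + t.assetPrice);
        }
    }
}
```

Because the option price and the asset price come from the same record, the two stay synchronized, which is the consistency requirement stated in the text.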
goals is therefore to determine how many past periods (days) of data we want to
take into account to arrive at as precise an estimate of the volatility as possible.
Since we do not know the real volatility, how can we measure the precision of
our estimate? Our metric for this will be the squared relative deviation,
\[ M(\hat{P}) = \frac{(\hat{P} - P)^2}{P}, \]
where P̂ is the price the Black-Scholes equation yields when we plug in our
estimated volatility, and P is the actual option price. We do this by fixing a
day, probably in the middle of the option’s life, and looking at the last trade of
the day, thereby determining the option price, P . We then determine historical
volatility for different time windows, by looking at prices from the past up to
the date that we fixed, and determine which one gives us the best estimate.
We would furthermore like to look at the consistency of the estimates, in the
sense of whether they move in a certain direction as we increase the time interval,
and whether this trend also holds up across the different
stocks. We will see this in a graph in the results. To get some more information
from the results, we also define the residual to be the difference between the
predicted price and the actual price, the squared residual to be the square of this
number, and the relative residual to be the residual divided by the actual price.
The relative residual will serve as a measure of how much the predicted value of
the option differed from the empirical value, in percentage terms. The relative
squared residual, however, will be the variable that is regressed on the time
window. The time window will be changed in increments of 7 days, so the first
historical volatility calculation will be based on closing prices of the stock from 7 days
before the pricing date of the option up to the pricing date, the second one from 14
days before until the pricing date, and so on. To hold other variables constant,
and allow some between-stocks comparisons, we will estimate volatility for the
same pricing date for each stock involved (see footnote 3). I found the price of put options on
the stocks in the table below, for which the true option price, the expiration
date, the stock price, the interest rate (which is of course the same across all
stocks), and the strike price are shown.
Footnote 3: In theory, because of the stationary increments property of the Brownian motion that
is assumed to describe the stock’s path, this should actually not matter. However, in reality
this would possibly make a difference, since the stocks probably do not follow a Brownian
motion process exactly. We try to minimize nuisance variation in our study by not analyzing volatility
of different stocks over different periods, since there might actually be some periods in the
market where overall volatility is higher, or sudden changes occur. If this is the case, then
we want this underlying noise to be present for all stocks, so that we can still make some
comparisons among them, and attribute differences to the stocks, and not to the underlying
conditions in the market.
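The growing time windows described above can be sketched as follows. This is not the code from UserInterface.java. It assumes the usual estimator (the sample standard deviation of log returns, scaled to a year), it measures windows in numbers of price observations rather than calendar days, and it leaves the annualization factor (for example 252 trading days) to the caller; all names are hypothetical.

```java
import java.util.Arrays;

// Sketch: historical volatility from log returns over growing windows.
class HistoricalVolatility {

    // Annualized sample standard deviation of the log returns of a price series.
    static double estimate(double[] prices, double periodsPerYear) {
        int n = prices.length - 1; // number of log returns
        if (n < 2) {
            throw new IllegalArgumentException("need at least three prices");
        }
        double[] logReturns = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            logReturns[i] = Math.log(prices[i + 1] / prices[i]);
            sum += logReturns[i];
        }
        double mean = sum / n; // average of log returns
        double squares = 0.0;
        for (double r : logReturns) {
            squares += (r - mean) * (r - mean);
        }
        return Math.sqrt(squares / (n - 1)) * Math.sqrt(periodsPerYear);
    }

    // Estimates based on the last step, 2*step, 3*step, ... returns,
    // always ending at the most recent price (the pricing date).
    static double[] growingWindows(double[] prices, int step, double periodsPerYear) {
        if (step < 2) {
            throw new IllegalArgumentException("step must be at least 2");
        }
        int count = (prices.length - 1) / step;
        double[] estimates = new double[count];
        for (int k = 1; k <= count; k++) {
            int from = prices.length - 1 - k * step;
            double[] window = Arrays.copyOfRange(prices, from, prices.length);
            estimates[k - 1] = estimate(window, periodsPerYear);
        }
        return estimates;
    }
}
```

Each successive window contains all the data of the previous one, so consecutive estimates are not independent of each other.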
6.3 Results
I have included the capability to estimate historical volatility, according to the
above formula, in the program UserInterface.java that I am using. What I hoped
to find is some kind of point of maximum efficiency for estimates of volatility,
marking the point at which the cost of volatility changing over time outweighs the
increase in precision gained by taking more time into account. This would require
volatility to be at least “locally” constant, i.e. within some small time interval
around a fixed date. In all this, one must not forget that by using
the Black-Scholes equation as a measure of the precision of my estimates, I am actually
testing a joint hypothesis: that the other assumptions of the Black-Scholes model, such
as no-arbitrage and complete markets, are true, and that varying the time
scale makes a difference in estimating volatility. Unfortunately, there seems to
be no better way of doing this, since most more advanced models do not have
closed-form solutions, and are therefore not computationally easy to implement
as proxies for the accuracy of volatility estimates. A summary of the results for
historical volatility is shown in the table below.
The volatility is not very meaningful in this case, since the summary here is
aggregated across all stocks, and the stocks have different volatilities. Looking
at the residual, we see that using historical volatility, the largest overvaluation
of an option is $6.80, and the largest undervaluation is $4.79. On
average, using historical volatility (disregarding the time window chosen), the
stock option is overvalued by $2.18. This persistent upward bias will show up
later on in a plot of volatility versus time window chosen. The relative residual
reveals an average overpricing of 111%, ranging from an underpricing of 91%
to a case where an option was overvalued by 316%. The squared versions
are mainly measures which ignore the direction of the pricing bias, and the
relative squared deviation will be used in a regression analysis to determine
how long a time window one should use to minimize the pricing bias.
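The residual measures used in these summaries are simple enough to state in code. The sketch below follows the definitions given in the methodology section, with the metric M taken exactly as its formula is printed there (squared residual divided by the actual price); the class and method names are hypothetical.

```java
// Sketch: residual measures for one predicted price and one actual price.
class PricingResiduals {

    // Predicted minus actual; positive means the model overprices.
    static double residual(double predicted, double actual) {
        return predicted - actual;
    }

    // Ignores the direction of the pricing error.
    static double squaredResidual(double predicted, double actual) {
        double r = residual(predicted, actual);
        return r * r;
    }

    // Pricing error as a fraction of the actual price (0.5 means 50% overpricing).
    static double relativeResidual(double predicted, double actual) {
        return residual(predicted, actual) / actual;
    }

    // M = (predicted - actual)^2 / actual, as the metric is written in the text.
    static double metric(double predicted, double actual) {
        return squaredResidual(predicted, actual) / actual;
    }

    public static void main(String[] args) {
        // Hypothetical prices: the model predicts $6 for an option that traded at $4.
        System.out.println(relativeResidual(6.0, 4.0)); // 50% overpricing
    }
}
```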
Instead of displaying a graph of how volatility depends on the length of
the time window chosen, we opted to present the relative residual as
a function of the time window. All the options we selected
are virtually at the money, so the price the Black-Scholes equation predicts is
an increasing, approximately linear function of the volatility. The relative residual
is in turn a linear function of the predicted price, so it reflects the calculated
volatilities: if these increase, so does the relative residual. It is also more
meaningful to display the relative residuals, since they represent the percentage by
which the option was under- or overpriced relative to the true price.
In each case, there is a persistent overpricing bias, which seems to stabilize
once the time window exceeds a hundred days (remember,
this means that historical volatility is calculated from the hundred days before 4/10/1992
up until that date, and then plugged into the Black-Scholes equation with the
other values observed from the market). It also seems to be a common pattern
that taking into account two or three weeks (one interval denotes seven days)
is the best one can do with historical volatility. In some cases, such as
MOQ and ORQ, big spikes in volatility are observable, which are most likely due
to sudden changes in the stock price in the time window chosen. This shows the
fundamental tradeoff in historical volatility: in short time intervals
the sampling error is larger than in longer ones, so local outliers in
log-returns distort the volatility we arrive at. On the other hand, taking into
account more data seems to produce an upward pricing bias.
To make our casual observations more precise, we will now regress the
squared relative deviation on the time window chosen. One problem associated
with this, which we have not taken into account, is that there is serial
autocorrelation present in our data, since we are successively enlarging our time
window, and thereby taking into account the same data, plus a new batch of
seven days, in every sample point. In other words, the time windows
overlap, since we always calculate historical volatility up to the 10th of April,
1992. We have calculated the Durbin-Watson statistic to check this, and
concluded that there is positive autocorrelation present, since it is less than 1.5,
meaning that successive values of calculated volatilities are correlated with each
other. The linear regression we use is hence no longer efficient, but it still
gives us unbiased estimates of the coefficients in the equation.
We hence proceed with the regular regression shown below.
The effect that the length of the time window chosen has on the squared
relative residual is significant at the 5% level. Since the coefficient is positive,
albeit only very slightly so, we conclude that increasing the time window will
increase the pricing bias, so that a very short time window should be chosen
to minimize the pricing bias (ideally, we want the squared relative residual
to be zero, in which case there is no bias). Putting these facts together with
our earlier observations, we conclude that taking into account between 2 and
3 weeks, in the absence of unusual spikes, will lead to the best estimates of
historical volatility, as we measure it.
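The regression and the Durbin-Watson check used in this section can be sketched in a few lines. This is a minimal illustration of ordinary least squares with one regressor and of the Durbin-Watson statistic computed from its residuals, not the statistical software used to produce the reported results; the class name and the sample data in the usage note are hypothetical.

```java
// Sketch: simple linear regression y = intercept + slope * x,
// plus the Durbin-Watson statistic of its residuals.
class SimpleRegression {
    final double intercept;
    final double slope;
    final double[] residuals;

    SimpleRegression(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
        residuals = new double[n];
        for (int i = 0; i < n; i++) {
            residuals[i] = y[i] - (intercept + slope * x[i]);
        }
    }

    // Sum of squared successive differences of the residuals divided by the
    // sum of squared residuals. Values near 2 suggest no first-order
    // autocorrelation; values well below 2 suggest positive autocorrelation.
    double durbinWatson() {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int t = 0; t < residuals.length; t++) {
            if (t > 0) {
                double diff = residuals[t] - residuals[t - 1];
                numerator += diff * diff;
            }
            denominator += residuals[t] * residuals[t];
        }
        return numerator / denominator;
    }
}
```

With x holding the time windows (7, 14, 21, ...) and y the corresponding squared relative residuals, `new SimpleRegression(x, y).slope` would be the coefficient whose sign is interpreted above. The sketch does not compute standard errors or significance levels.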
7.2 Results
All the residuals are defined the same way as in the historical volatility case.
What we are changing now to find the optimal estimate is the day on which we
extract implied volatility from the closing trade of the option we want to price, with
the same expiration date and strike price. We then plug this value
into the equation for pricing the option on 4/10/1992. We define DaysBeforePricing -
as the name suggests - to be the number of days before 4/10/1992 on which we take the
implied volatility. Let us consider an example to make the procedure clearer.
Let us say we take CSC to be the stock, with strike price $65 and expiration
month September. We now look at implied volatility on April 2nd, 1992. Let
us suppose that the stock price is $60, the short term interest rate is 3%, and the
option price is $5.00 on that day. Then we use these values, and see - with
the help of the Newton-Raphson algorithm - which value of volatility would
lead to the observed option price. This is our implied volatility, which we then
plug back into the Black-Scholes equation, but this time with the observed
market values on April 10th, which are shown in the table of stocks earlier in the
historical volatility section. This is then our predicted price, which we compare
to the real price via our residuals. Let us now look at the summary statistics
for implied volatility, and the associated residuals of our predicted prices and
real prices.
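The Newton-Raphson step in this procedure can be sketched as below. This is a self-contained illustration, not the appendix code: it assumes the standard Black-Scholes put price and vega for a non-dividend paying stock, uses the polynomial approximation of the normal distribution function whose coefficients appear in the appendix, and takes the starting value and stopping tolerance as arbitrary choices. All method names are hypothetical.

```java
// Sketch: implied volatility of a European put via Newton-Raphson.
class ImpliedVolatility {

    static double normPdf(double d) {
        return Math.exp(-0.5 * d * d) / Math.sqrt(2.0 * Math.PI);
    }

    // Polynomial approximation of the standard normal distribution function.
    static double normCdf(double d) {
        double k = 1.0 / (1.0 + 0.2316419 * Math.abs(d));
        double poly = k * (0.319381530 + k * (-0.356563782 + k * (1.781477937
                + k * (-1.821255978 + k * 1.330274429))));
        double upperTail = normPdf(d) * poly;
        return d >= 0 ? 1.0 - upperTail : upperTail;
    }

    static double dPlus(double s, double k, double r, double t, double sigma) {
        return (Math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * Math.sqrt(t));
    }

    // Black-Scholes price of a European put on a non-dividend paying stock.
    static double putPrice(double s, double k, double r, double t, double sigma) {
        double d1 = dPlus(s, k, r, t, sigma);
        double d2 = d1 - sigma * Math.sqrt(t);
        return k * Math.exp(-r * t) * normCdf(-d2) - s * normCdf(-d1);
    }

    // Derivative of the option price with respect to volatility.
    static double vega(double s, double k, double r, double t, double sigma) {
        return s * normPdf(dPlus(s, k, r, t, sigma)) * Math.sqrt(t);
    }

    // Newton-Raphson iteration on sigma, starting from the guess 'start'.
    static double solve(double s, double k, double r, double t,
                        double observedPrice, double start, int steps) {
        double sigma = start;
        for (int i = 0; i < steps; i++) {
            double diff = putPrice(s, k, r, t, sigma) - observedPrice;
            double v = vega(s, k, r, t, sigma);
            if (Math.abs(diff) < 1e-10 || v < 1e-12) {
                break; // converged, or the slope is too flat to continue
            }
            sigma -= diff / v;
        }
        return sigma;
    }

    public static void main(String[] args) {
        // Values from the example in the text; 169 days lie between April 2nd
        // and the third Friday of September 1992.
        double t = 169.0 / 365.0;
        System.out.println(solve(60.0, 65.0, 0.03, t, 5.00, 0.5, 50));
    }
}
```

The iteration only has a solution when the observed price lies above the no-arbitrage lower bound for the put; a production version would check this first and fall back to bisection if Newton-Raphson fails to converge.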
Although these results are aggregated over all stocks and all days, the volatility
is moving in a well-defined band between about 20 and 60 per cent, with a
mean of 40. This is in contrast with the more wildly varying values we calculated
for historical volatility, ranging from 0 to 170, with a mean of 70. The standard
deviation is also significantly lower than in the case of historical volatility,
suggesting a more precise estimate. Looking at the residuals, we notice that
on average the option is only overpriced by $0.13, compared to $2.17 in the
historical volatility case. The outliers are closer to the mean as well. A
similar picture emerges when looking at the relative deviations, where on average
the option was only overpriced by about 6.3%, whereas in the historical
volatility case the overpricing was 111% on average. The largest overestimate
of the option price is 114%, barely above the average by which historical
volatility overestimated. Looking at these results, we see that implied volatility seems to be
clearly superior to historical volatility in terms of approximating true volatility,
at least insofar as it replicates the actual price in the market much more closely when
using the Black-Scholes model.
Before making further comments on this, let us look at a graph of how the
relative residual changes as we go back from the day we price the option on
to elicit implied volatility. Notice that for MOQ and CSC, there are not many
days available, since there were not any trades of the options with the necessary
expiration date and strike price more than 20 days before April 10th. For the
most part, it is quite striking how stable the estimates are over time, and how
near to 0 they are. The only anomalies are MOQ, which seems persistently
underpriced using implied volatility, and CSC, which is constantly overpriced,
with no trend downwards noticeable. For MOQ, the reverse trend is observable,
where persistent overpricing is the case. Unlike in the historical volatility case,
there is no clear trend distinguishable here, so that we have to turn to regression
to determine if going back further leads to better estimates or not.
Serial autocorrelation, shown to be present by the Durbin-Watson statistic
below, again limits the efficiency of our tests and the strength of our conclusions,
but we nonetheless proceed to the regression results. Notice, however, that this
correlation over time is actually a reason to doubt the independence assumption
made on the process governing the stock price, since if the past is strongly
correlated with the present, then independence is clearly violated. There is much
debate about whether this assumption can hold or not, and we merely mention
the issue here, since it seems to show up in the data. On the other hand, the
autocorrelation is not surprising, given the stability of our residuals over time,
since if the values stayed the same all the time, then we would have perfect
serial autocorrelation.
According to these results, which are significant at the 1% level, the pricing
bias goes down if we go back further in time. In fact, if this model were the true
model, we would have to calculate implied volatility 69 days before the day we
price, in order to make the predicted price match the true price. However, the
effect of the two outlier stocks, MOQ and CSC, might be weighing in a lot,
which is why we run the regression again below, excluding these two stocks, to
see if clearer results are available.
This change actually makes our prior inference highly insignificant, since we
now fail to reject the hypothesis that the number of days
before the date of pricing from which we take implied volatility has no effect on the relative residual.
Although this is a rather unsatisfactory answer, if we base our conclusions on
the graph earlier, we might as well conclude that it is best to choose values from
the last few days, since they yield precise and consistent estimates.
8 Conclusions
We finish this paper with a discussion of our results. When using historical volatility
to estimate true volatility, there is a significant bias towards overpricing the
option when using the Black-Scholes equation, with an average of over
100% deviation from the real price. When changing the time window we take
into account, in the absence of large spikes, the first two or three weeks before
the date we are pricing the option on seem to be the optimal choice. In
contrast to this, implied volatility yields astonishingly precise and consistent
estimates for volatility, with the exception of two significant outliers. The average
overpricing is a mere 6% of the original option price. However, there
is no clear trend as to when one should look at implied volatility, since it is
nearly constant across weeks, sometimes even months. For this reason, it seems
best to stay within a week of the date we are interested in pricing the option
on, since from visual inspection this seems to be a stable period. However,
our conclusions about the precision of implied volatility are seriously weakened
by the two outliers, MOQ and CSC, since they were persistently over- and
underpriced, respectively. Surveying the empirical literature on implied volatility
and looking at larger samples of stocks might clear up these issues more and
yield stronger conclusions. Comparing both estimators directly also turns out
to be problematic, since if we wanted to compare the means of relative residuals,
for instance, we would violate the independence condition of the two samples.
There might furthermore be problems involved with comparing implied and historical
volatility directly, since they are calculated very differently. Lastly, the
issue of serial autocorrelation in both samples would have to be dealt with more
formally, since more advanced regression models are available to arrive at more
valid conclusions.
Acknowledgements
I thank Jialing Dai for advising me on this project, and for spending countless hours
discussing the mathematical and statistical concepts involved in this project
with me. I also thank Robert Indik for his advice on how to structure the
report. I am very grateful to Duncan Buell, the donor of the Lusk Scholarship,
which I received to work on this project, and without which this would not have
been possible. I am also indebted to Chris Lamoureux, the Department Head of
the Finance Department at the University of Arizona, for making the Berkeley
Option Database accessible to me, and to Andrew Zhang, the “data czar” of the
Finance Department, for helping me locate and gather the necessary data.
References
[1] Chung, Kai Lai. Elementary Probability Theory with Stochastic Processes.
3rd edition. Springer (1974).
[2] Hull, John. Options, Futures, and Other Derivatives. 5th edition. Prentice
Hall (2002): 151-252.
[3] Kwok, Yue-Kuen. Mathematical Models of Financial Derivatives. Springer
Finance (1998): 62-64.
[4] Koch Medina, Pablo; Merino, Sandro. Mathematical Finance and Probability.
Birkhäuser Verlag (2003): 201-216.
[5] Rubinstein, Mark; Ross, Stephen A.; Cox, John C. Option Pricing:
A Simplified Approach. http://www.in-the-money.com/artandpap/Option
Pricing - A Simplified Approach.doc.
[6] Black, Fischer; Scholes, Myron. The Pricing of Options and Corporate
Liabilities. The Journal of Political Economy Vol. 81 (May - June 1973): 637-654.
[7] The Berkeley Options Database User’s Guide. The Berkeley Options
Database. Institute for Business and Economic Research
#1922. University of California, Berkeley.
http://www.in-the-money.com/pages/bodbguide.htm.
Appendix
UserInterface.java
/*
* Author : Peter Gross
* This program is a user interface for calculating implied
* and historical volatility
*/
startInterface ();
}
}
}
ArrayList < Date > dates = new ArrayList < Date >();
ArrayList < Double > rates = new ArrayList < Double >();
double rate ;
// Enter closing stock price , short term interest rate and date on day
// to be priced on here
double assetPriceNow = 16.2;
double rateNow = .0365;
Date dateToBePriced = new Date (93 , 4 , 12);
}
lines . add ("Stock\tImplied Volatility\tDate\tPredicted Price");
Date datePriced ;
Date expDate ;
int expMonth ;
int interval ;
int times ;
double timeLeft ;
double rate = .0365; // Need to specify interest rate on date to be
// priced
double strike ;
double assetPrice = 46.1; // Also need to specify asset price on date
// to be priced
boolean put = true ;
String fileName ;
String stockSymbol ;
String outputFileName ;
String optionType ;
System . out . print (" Stock symbol : ");
stockSymbol = in . nextLine ();
System . out
. print (" Select interval in days to calculate historical volatility : ");
interval = in . nextInt ();
// Convert to years
timeLeft /= 365;
if ( put )
optionType = " PUT ";
else
optionType = " CALL ";
prices [ i ] = BlackScholesPrice ( assetPrice , strike , rate , timeLeft ,
volatilities [ i ] , put );
}
// Temporary variables
entry temp1 , temp2 ;
String s1 , s2 ;
// Makes sure only closing prices are used and prices are in the
// range of interest
if (! s1 . equals ( s2 )
&& temp1 . getDate () >= ( startYear * 10000 + startMonth * 100 + startDay )
&& temp1 . getDate () <= ( endYear * 10000 + endMonth * 100 + endDay )) {
entries . add ( temp1 );
count ++;
}
// average of log returns
average = sum / ( count );
return volatility ;
double price = 0;
if ( put )
return price ;
double approximation = 0;
double k = 1 / (1 + 0.2316419 * d );
double a1 = 0.319381530;
double a2 = -0.356563782;
double a3 = 1.781477937;
double a4 = -1.821255978;
double a5 = 1.330274429;
if ( d >= 0)
approximation = 1
- ((1.0 / Math . sqrt (2 * Math . PI )) * Math
. exp (( -( d * d )) / 2.0))
* ( a1 * k + a2 * k * k + a3 * k * k * k + a4 * k * k * k
* k + a5 * k * k * k * k * k );
else {
k = 1 / (1 + 0.2316419 * -d );
approximation = 1 - (1 - ((1.0 / Math . sqrt (2 * Math . PI )) * Math
. exp (( -( d * d )) / 2.0))
* ( a1 * k + a2 * k * k + a3 * k * k * k + a4 * k * k * k
* k + a5 * k * k * k * k * k ));
}
return approximation ;
// This function writes an ArrayList < String > into a file named
// outputFileName , for the first count
// elements of the ArrayList
public static void output ( String outputFileName , int count ,
ArrayList < String > data ) {
{
FileWriter writer = null ;
try {
writer = new FileWriter ( outputFileName );
} catch ( IOException ioe ) {
System . out . println (" Could not create the new file : " + ioe );
}
return entries ;
// This method estimates implied volatility using the Newton-Raphson method
// with steps iterations
public static double findImpliedVolatility ( double assetPrice ,
double optionPrice , double interestRate , double strikePrice ,
double timeToExpiration , int steps ) {
for ( int i = 0; i < steps ; i ++) {
sigmaN = sigmaN
- ( BlackScholesPrice ( assetPrice , strikePrice , interestRate ,
timeToExpiration , sigmaN , true ) - optionPrice )
/ Vega ( assetPrice , optionPrice , interestRate , strikePrice ,
sigmaN , timeToExpiration );
}
return sigmaN ;
return vega ;
}
Date.java
/*
* Author : Peter Gross
* This class handles date arithmetic , which is necessary for the
* historical volatility calculations in UserInterface . Note that it is only
* meant to handle dates between January 1 , 1989 and December 31 , 1993 , since that
* is the range of dates that my part of the database covers .
*/
int year ;
int month ;
int day ;
boolean isLeap ;
year = initYear ;
month = initMonth ;
day = initDay ;
isLeap = leap [ year - 89];
// Increment one day
public void incrementDays () {
if ( month == 2) {
if ( isLeap ) {
if ( day < 29)
day ++;
else {
day = 1;
month ++;
}
}
else {
if ( day < 28)
day ++;
else {
day = 1;
month ++;
}
}
else {
if ( day < daysInMonth [11])
day ++;
else {
month = 1;
year ++;
isLeap = leap [ year - 89];
day = 1;
}
}
}
// Increment n days
public void incrementDays ( int n ) {
return firstDate ;
// Find the days between start and end
public static int daysBetween ( Date start , Date end ) {
int daysBetween = 0;
Date temp = new Date ( start . year , start . month , start . day );
return daysBetween ;
// This function returns the third Friday of a given month and given year
// This is important since options expire the third Friday of a given month
public static Date findExpDate ( int expYear , int expMonth ) {
int daysBetween = 0;
Date currentDate = firstFriday ;
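For comparison with the hand-written date arithmetic above, the same two operations (finding the third Friday of a month and counting the days between two dates) can be sketched with the standard java.time library. This is an alternative illustration, not part of the original Date class; the class and method names are hypothetical, and dividing by 365 mirrors the conversion to years used in UserInterface.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

// Sketch: option expiration dates and time to expiration with java.time.
class ExpirationDates {

    // Third Friday of the given month, the expiration date assumed in this paper.
    static LocalDate thirdFriday(int year, int month) {
        return LocalDate.of(year, month, 1)
                .with(TemporalAdjusters.dayOfWeekInMonth(3, DayOfWeek.FRIDAY));
    }

    // Calendar days between two dates, expressed as a fraction of a 365-day year.
    static double yearsBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to) / 365.0;
    }

    public static void main(String[] args) {
        LocalDate expiration = thirdFriday(1992, 9);
        System.out.println(expiration + " "
                + yearsBetween(LocalDate.of(1992, 4, 2), expiration));
    }
}
```

Unlike the Date class, this version uses four-digit years and is not restricted to the range 1989 to 1993.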
FormatOptionData.java
/*
* Author : Peter Gross
* This program converts data from the Berkeley Option Database into a
* more readable format for further inspection .
* For a given year , expiration month , strike price , option type , and
* ticker symbol it will write a file with all trades of the particular option
* from the beginning of the year through the expiration month . Files from
* Berkeley Option Database need to be in same folder , and named after their
* ticker symbol
*/
ArrayList < entry > lines = new ArrayList < entry >();
// Set option type
if ( put )
optionType = " PUT ";
else
optionType = " CALL ";
// Read File
long count = 0;
entry temp ;
// Write file
// This function writes an ArrayList < entry > into a file named
// outputFileName , for the first count
// elements of the ArrayList
public static void output ( String outputFileName , int count ,
ArrayList < entry > data ) {
entry.java
/*
* Author : Peter Gross
* This class stores information from the Berkeley Option Database and
* allows manipulation of this information .
*/
import java . util . Scanner ;
public int year , month , day , time ; // Time and date of transaction
if ( market ) {
year = in . nextInt ();
month = in . nextInt ();
day = in . nextInt ();
time = in . nextInt ();
expMonth = in . nextInt ();
// Compare string contents with equals (); == only compares object references
if ( in . next (). equals ("Put"))
isPut = true ;
else
isCall = true ;
String type ;
if ( isPut )
type = " Put ";
else
type = " Call ";
if (( optionPrice / 0.1) % 1 == 0)
optionStr = optionStr + "0";
return symbol + " " + year + " " + monthStr + " " + dayStr + " "
+ timeStr + " " + exp + " " + type + " " + strikePrice + " "
+ optionStr + " " + assetPrice ;
}