Lesson 2
2.1 Introduction
The purpose of this lesson is to provide a broad overview of risk and its measurement.
First we will examine some futures contract data, and then some foreign exchange data. These
will illustrate the nature of risks facing a broad range of companies and individuals. Businesses
care about these risks because they determine the value of the company. For firms that trade in
offshore markets, foreign exchange exposure is significant. In this lesson you will be provided
with the basic toolkit and skills to measure risk empirically. The definitions of mean, variance,
and probability go to the heart of risk management and must be understood to fully appreciate
the spectrum of choices in strategic risk management. This extends some of the analysis of
Lesson 1.
The definition of risk is closely tied to exposure to negative outcomes. What is a risk for
one person is not a risk for another. The conventional measure of risk is variance, which
will be discussed further below, but to tie oneself strictly to this definition is not only misleading
but also quite incorrect. Variance does have some properties that are convenient and a close
enough approximation to be useful. For now let's simply define risk in terms of
unfavorable and unexpected outcomes; that is, downside risk.
The study of risk also requires an understanding of, and an appreciation for, the nature of
probabilities (as discussed in Lesson 1). This is not to say that all risks can be defined in a
probabilistic sense; indeed, there are so many uncertainties that attaching probabilities
to bad outcomes is sometimes unachievable. Nonetheless, an understanding of risk and the nature of
probabilities is key to understanding risk management techniques. The remainder of this lesson is
dedicated to introducing some basic concepts.
The term probability is considered synonymous with the term frequency, and relates to
the number of times in repeated sampling that a specific outcome occurs. Probability is the
lifeblood of statistical measurement and forms the backbone of strategic risk management. Not
surprisingly, the earliest applications of probability theory were to games of chance such as
rolling two dice or playing cards. In these games the payoff to a gamble is related to the
likelihood of an outcome. In rolling two dice the probability of rolling a seven is greater than
that of rolling a two or a twelve. In playing poker the probability of getting a pair is greater than the
probability of getting four of a kind.
Table 2.1 shows the possible outcomes from the roll of two dice. There are 36 possible
outcomes, each represented in the shaded area. The number 2 appears only once, so
it would be said that 2 has a 1 in 36 chance of occurring, or that the probability of rolling a 2
is 1/36 or 2.78%. In contrast, the number 7 appears six times, so it would be said that there is a 6
in 36, or 1 in 6, chance of rolling a 7, a 16.67% probability (6/36 = 1/6) of occurrence.
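To make the counting concrete, here is a brief sketch (in Python, purely for illustration; the lesson itself works in Excel and @RISK) that enumerates the 36 outcomes and reproduces the frequencies behind Table 2.1:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of rolling two dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
freq = Counter(d1 + d2 for d1, d2 in outcomes)

for total in range(2, 13):
    prob = Fraction(freq[total], 36)
    print(f"sum={total:2d}  frequency={freq[total]}  probability={float(prob):.4f}")

# e.g. sum=2 has frequency 1 (1/36 = 2.78%), sum=7 has frequency 6 (6/36 = 16.67%)
```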
The probabilities can be graphed onto a histogram or frequency diagram such as that shown
below. On the horizontal axis are the possible outcomes from 2 through 12, and on the vertical
axis are the frequencies of each occurring. Dividing the frequency by the total number of
possible outcomes gives the probability. Hence, a frequency diagram or histogram can easily be
transformed into a probability distribution graph. Technically this type of graph is called a
probability density function (or PDF).
[Frequency/probability histogram: the horizontal axis shows the sum of two dice (2 through 12); the vertical axis shows the frequency/probability of each outcome.]
Let’s look at the probability distribution of the roll of two dice in @RISK. In @RISK you would
use the RiskDiscrete function to generate the probability distribution for the roll of two dice.
There are two elements to the problem: the value of the roll (the x value) and the
chance that it occurs (the p value). You can enter either the frequencies of the outcomes (the
actual numbers of occurrences) or the probabilities themselves. In this example there are
36 equally likely outcomes, and Table 2.1 shows that a 7 can occur 6 ways,
{1,6},{2,5},{3,4},{4,3},{5,2},{6,1}, so the probability is 6/36 = 0.167 or 16.7%. What this
means in practical terms is that if you were to roll the dice 100 times, you would expect to roll a
7 about 17 times, or 16.7% of the time.
In @RISK you can enter the RiskDiscrete function directly or you can enter it using the define
distributions icon.
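RiskDiscrete is an @RISK worksheet function, but the same sampling idea can be sketched outside Excel. The following Python snippet (a hypothetical stand-in, not @RISK code) draws from the two-dice distribution using the x values 2 through 12 and their probabilities:

```python
import numpy as np

# x values and their probabilities, as would be passed to RiskDiscrete({x},{p})
x = np.arange(2, 13)
p = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36.0

rng = np.random.default_rng(42)
rolls = rng.choice(x, size=100, p=p)   # 100 draws from the discrete distribution

# Roughly 16.7% of draws should be 7 (sampling noise aside).
print("share of 7s:", np.mean(rolls == 7))
```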
Figure 1: RiskDiscrete Probability Distribution for the Roll of Two Dice
The blue bar chart in Figure 1 is the theoretical distribution of the outcomes of the roll of two
dice. Before proceeding let’s examine the summary on the right side of the graph.
Figure 2 gives the minimum, maximum, mean and standard deviation for the roll-
the-dice distribution. The minimum and maximum are obvious numbers, since one can do no
worse than rolling two 1s and no better than rolling two 6s. The mean is the probability-weighted
average of all possible outcomes, and the standard deviation is a measure of dispersion around
this mean.
$$\text{mean} = \sum_{i=1}^{11} p_i x_i = \frac{(1\times 2)+(2\times 3)+(3\times 4)+(4\times 5)+(5\times 6)+(6\times 7)+(5\times 8)+(4\times 9)+(3\times 10)+(2\times 11)+(1\times 12)}{36} = 7$$

$$= (0.028\times 2)+(0.056\times 3)+(0.083\times 4)+(0.111\times 5)+(0.139\times 6)+(0.167\times 7)+(0.139\times 8)+(0.111\times 9)+(0.083\times 10)+(0.056\times 11)+(0.028\times 12) = 7$$
To calculate the standard deviation we first have to calculate the variance. The variance is the
probability-weighted sum of the squared deviations from the mean, and the standard deviation is
the square root of the variance.
$$\text{Var} = \sigma^2 = \sum_{i=1}^{11} p_i \left( x_i - \bar{x} \right)^2$$

$$\text{Std Dev} = \sigma = \sqrt{\sum_{i=1}^{11} p_i \left( x_i - \bar{x} \right)^2}$$
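As a check on these formulas, here is a minimal sketch (Python, illustrative only) of the calculation that Table 2.2 lays out:

```python
import numpy as np

x = np.arange(2, 13)                                      # possible sums
p = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36.0    # their probabilities

mean = np.sum(p * x)                    # probability-weighted average = 7.0
var = np.sum(p * (x - mean) ** 2)       # probability-weighted squared deviations
std = np.sqrt(var)                      # about 2.42

print(mean, var, std)                   # 7.0  5.833...  2.415...
```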
Table 2.2 Calculating Expected Value, Variance, and Standard Deviation for
the Roll of 2 Dice
Using @RISK we can simulate the discrete probability distribution. Figure 2.3 shows @RISK
output for 1,000 iterations.
[Figure 2.3: @RISK simulation of the roll of two dice ("Rolling a Dice", 1,000 iterations) overlaid on the theoretical Discrete distribution; both show a minimum of 2.0, a maximum of 12.0, a mean of 7.0, and a standard deviation of approximately 2.4.]
You can see immediately that the simulated distribution mimics the theoretical distribution, with
a mean of 7.00 and a standard deviation of 2.42. The theoretical probability of rolling less than
or equal to 3 is 8.3% (the probability of rolling a 2 plus the probability of rolling a
3 = 0.028 + 0.056 = 0.084, with some rounding). The theoretical probability of rolling between 3 and
11 is 88.9%, and the theoretical probability of rolling higher than 11 is 2.8%.
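The @RISK run can be mimicked with a plain Monte Carlo loop. The sketch below (Python, illustrative only, not @RISK output) draws 1,000 rolls of two dice and reports the same summary statistics and tail probabilities discussed above:

```python
import numpy as np

rng = np.random.default_rng(7)
rolls = rng.integers(1, 7, size=(1000, 2)).sum(axis=1)   # 1,000 rolls of two dice

print("mean:", rolls.mean())                  # close to the theoretical 7.00
print("std dev:", rolls.std(ddof=1))          # close to the theoretical 2.42
print("P(sum <= 3):", np.mean(rolls <= 3))    # theoretical 8.3%
print("P(3 < sum <= 11):", np.mean((rolls > 3) & (rolls <= 11)))  # theoretical 88.9%
print("P(sum > 11):", np.mean(rolls > 11))    # theoretical 2.8%
```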
The terms ‘less than’, ‘less than or equal to’, ‘no more than’, ‘greater than’,
‘greater than or equal to’, ‘no less than’ and ‘between’ are essential in risk analysis. They
are in fact statements of probability, or statements of the likelihood of occurrence. In Figure 3 we
manipulate the vertical probability lines (by grabbing them with the cursor) and move them
to make various probability statements. Here we have the probability of rolling 4 or less (16.6%),
the probability of rolling 8 or above (41.6%), and the probability of rolling between 4 and 8
(41.8%). Note that all of the probabilities must sum to 1, so the probability of rolling less than 8
AND more than 4 is equivalent to 100% minus the probability of rolling 4 or less, minus the
probability of rolling 8 or above.
These measures of probability are based upon what we refer to as cumulative probabilities.
Discrete probabilities refer to the probability of a single event happening, and are read
off the Y axis of the probability distribution function. The discrete probability of rolling a 2
is 2.8%, of rolling a 3 is 5.6%, of rolling a 4 is 8.3%, and so on. We can make the following
probability statements: the probability of rolling less than a 3 must be the probability of rolling a
2, which is 2.8%; the probability of rolling 3 or less is the probability of rolling a 2 plus the
probability of rolling a 3 (0.028 + 0.056 = 0.084), which is also the probability of rolling
less than a 4. When we add the probabilities together in this way we
are actually working with cumulative probabilities (we accumulate the probabilities).
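Accumulating the discrete probabilities is simply a running sum. A minimal sketch (Python, illustrative only):

```python
import numpy as np

x = np.arange(2, 13)
p = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36.0

cdf = np.cumsum(p)   # cumulative probability of rolling x or less

for value, cum in zip(x, cdf):
    print(f"P(sum <= {value:2d}) = {cum:.3f}")

# P(sum <= 3) = 0.083, which is also P(sum < 4), as noted in the text.
```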
Figure 4 shows what we refer to as the Cumulative Probability Distribution. This graph can be
generated by clicking the icon on the simulated @RISK graph.
[Figure 4: Cumulative distribution for the roll of two dice, simulated versus the theoretical Discrete distribution; both show a minimum of 2.0, a maximum of 12.0, a mean of 7.0, and a standard deviation of approximately 2.4.]
The cumulative distribution function is typically S shaped and is often referred to as the CDF.
For any value read off the X axis, the corresponding value on the Y axis gives the probability of
an outcome less than or equal to that X value. In Figure 4 the CDF is a step function because the
roll of two dice produces discrete outcomes.
The relationship between the probability distribution function (PDF) in Figure 3
and the CDF in Figure 4 is that each step represents an addition to the cumulative probability
measure. So the cumulative probability of rolling 5 or less (27.8%) is equal to the cumulative
probability of rolling 4 or less (16.7%) plus the probability of rolling a 5 (11.1%). In this sense
the probabilities measured by the PDF are often referred to as marginal probabilities along the
CDF curve. The term marginal probability is a statement about the calculus of probabilities: the
derivative of the CDF at any point is equal to the probability measure along the PDF at that same
point. Thus, since the vast majority of probability measures in finance and economics come from
the CDF, when you hear your professor referring to the marginal probability
along the CDF, he or she is simply referring to the same-point probability from the PDF.
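This last point can be checked numerically: differencing the cumulative probabilities step by step recovers the discrete (PDF) probabilities. A minimal sketch, assuming the two-dice probabilities from above:

```python
import numpy as np

p = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36.0
cdf = np.cumsum(p)

# The step (discrete 'derivative') of the CDF at each outcome equals the PDF there.
steps = np.diff(np.concatenate(([0.0], cdf)))
print(np.allclose(steps, p))   # True
```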
The rolling of two dice discussed above provides an easy example of a discrete probability
distribution. In most real world examples, however, data do not come in discrete form. For
example prices, sales, crop yields, milk production, exports, exchange rates and so on are
continually changing and do not always take on integer values. There are two types of risk
simulation problems of interest. The first is when the manager has historical data which can be
used to approximate the probability distribution, and the second is when no data are available. In
the first case the decision maker may want to assume a probability distribution. For example, a
manager measuring the variability in sales for a given month, quarter or year may
simply assume a distribution based upon the mean and variance of the data. Two useful
distributions that require only these two parameters are the normal distribution (RiskNormal) and the
lognormal distribution (RiskLognorm). While it is generally safe to assume that per unit prices
follow a lognormal distribution, the distribution of output may not be known. In this case simply
assuming a normal distribution might be naïve and lead to grave errors of probability. To
accommodate problems where the actual distribution is not known we can fit the data to
one of many parametric distributions. BestFit is a routine within @RISK developed to
accomplish this task.
If, however, there is no historical information to provide guidance on the underlying
probability distribution, then the modeler has to resort to subjective measures. Subjective
measures are important for new business opportunities or the introduction of a new product
where the price point is not known with certainty and, because the product is new, there are few
if any reference prices. Managers might have an idea of the possible range of prices and
an even better idea of the most likely price, but these are uncertain. To solve these problems we
rely on a convenient family of distributions including the beta distribution (RiskBetaSubj), the
PERT distribution (RiskPert) and the triangular distribution (RiskTriang). What is unique about
these distributions is that all that is needed is a measure of the worst case, best case and most likely
case (plus an average or mean for RiskBetaSubj).
These distributions are discussed presently.
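As a rough stand-in for these @RISK functions, the sketch below draws from a triangular distribution given three subjective points, in the spirit of RiskTriang; the price figures are hypothetical and purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical subjective assessment for a new product's price point:
worst, most_likely, best = 8.00, 12.00, 20.00

prices = rng.triangular(worst, most_likely, best, size=10_000)

print("mean price:", prices.mean())             # about (8 + 12 + 20) / 3
print("P(price < 10):", np.mean(prices < 10))   # downside probability
```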
When data is available the simplest measures are mean and variance. Other measures that define
the probability distribution are skewness and kurtosis which will be discussed later.
Figure 5: Simulating Corn Prices and Yields
Agriculture is an interesting business in which to illustrate price and output risks because in most cases
farmers cannot manipulate prices (they are price takers) and yield is subject to the randomness of
nature’s elements. Figure 5 shows data for corn yields and prices in Ontario, Canada. It is a small sample
from which we have to make inferences about probabilities. In Column D we multiply price
by yield in order to capture the per hectare revenue for each period. In Row 20 we use
=AVERAGE(B3:B18) to calculate mean yields and do the same for prices and revenues. In Row
21 we use the Excel function =STDEV(B3:B18) to measure the standard deviation. This measure
of standard deviation assumes that the data are obtained from a sample and is therefore
adjusted by 1 degree of freedom relative to the population measure. As a rule you should use
STDEV rather than STDEVP when samples are small. A good rule of thumb is to use the sample
rather than the population measure when the sample size is under 30 observations, though even with up
to 100 observations the sample measure may be preferable. Under a principle of conservatism it is much better to have a
slight overestimate of the variance than an underestimate. In the output cells in Row 23 we
enter the formula =RiskNormal(B20,B21) for corn yields and =RiskLogNorm(C20,C21) for
corn prices. The formula in Cell D23 is =B23*C23, which multiplies price by yield.
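The same revenue model can be sketched outside Excel. One caution: RiskLognorm(mean, sd) is parameterized by the mean and standard deviation of the lognormal variable itself, whereas NumPy's lognormal generator expects the mean and standard deviation of the underlying normal, so a conversion is needed. The yield and price statistics below are hypothetical stand-ins for the Row 20 and Row 21 values in Figure 5:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample statistics standing in for Rows 20-21 of Figure 5.
yield_mean, yield_sd = 7000.0, 900.0   # kg/ha, analogous to RiskNormal(B20, B21)
price_mean, price_sd = 0.15, 0.03      # $/kg, analogous to RiskLogNorm(C20, C21)

# Convert the lognormal's own mean/sd to the underlying normal's mu/sigma.
sigma2 = np.log(1.0 + (price_sd / price_mean) ** 2)
mu = np.log(price_mean) - 0.5 * sigma2

n = 10_000
yields = rng.normal(yield_mean, yield_sd, n)
prices = rng.lognormal(mu, np.sqrt(sigma2), n)
revenue = prices * yields               # per-hectare revenue, as in Column D

print("mean revenue:", revenue.mean())
print("std dev of revenue:", revenue.std(ddof=1))
```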
Figure 6: Corn prices (top), corn yields (middle) and Corn revenues (bottom) using the sample means and standard
deviations from observed data.
The probability distributions in Figure 6 were generated in @RISK using the lognormal and normal
distributions and the data in Figure 5. As discussed above, we have assumed lognormal and normal
distributions because it is generally understood that prices follow a lognormal distribution in
practice, and because the normal distribution is one of the few probability distributions that can
be defined using the mean and variance alone.
Using Distribution Fitting to Define Probability Distributions
The assumptions in the last paragraph ought not to be taken for granted. There are many studies
that have found that the normality assumption for crop yields does not hold. On the other hand, there are
few studies that have shown prices to be anything other than lognormal.
To relax the need for such assumptions, @RISK provides a routine called Distribution Fitting that can
be used to identify the ‘best fit’ probability distribution. Figure 7 illustrates the use of
distribution fitting for the corn yield data. The first step is to highlight with the cursor the data
you want to fit (corn yields) and click the Distribution Fitting icon. The actual data, grouped
into bins, is graphed, and the ‘best fit’ distribution for this example is shown to be the logistic
distribution.
Figure 7: Using Distribution Fitting in @RISK to identify Best Fit Corn Yield Distributions
Note in Figure 7 that the actual distribution looks quite different from the normal distribution in
Figure 6. This is not an infrequent result, especially when data are as limited as the sample in Figure 5.
As a point of comparison, the best fit distribution for corn prices is, as was assumed, the
lognormal distribution. This is illustrated in Figure 8.
Figure 8: Best Fit lognormal distribution for corn prices
The fit displayed is based on the Chi-Square statistic, but there are two other tests from which
the user can choose: the Anderson-Darling test and the Kolmogorov-Smirnov test. We
needn’t bother with the specifics of each test, but as a general rule, if two or more of these tests
identify the same best fit distribution then that is good evidence that the selected
distribution is the best representation of the data. The essential measurement of best fit is as
follows. Candidate distributions such as the logistic, Weibull, normal, extreme value (ExtValue) and
others are of a parametric form, which means that the parameters that define them
can be obtained from the mean, median, mode, variance, skewness and so on. Based on the relevant
parameters a candidate distribution is defined; call this $\hat{f}(x)$. The actual data are then divided into a
number of bins, $N$, and the mid-point of each bin is defined as $x_i$. The observed frequency in
each bin, $f_i$, can be read off the frequency distribution graph as
illustrated by the bars in Figure 7. A goodness-of-fit statistic such as

$$\chi^2 = \sum_{i=1}^{N} \frac{\left( f_i - \hat{f}(x_i) \right)^2}{\hat{f}(x_i)}$$

can then be generated for each of the candidate probability distributions, and the one with the
lowest value will be the best fit distribution. In essence the methodology of the best fit
distribution is to minimize the distance between points on the theoretical distribution and those
from the actual data. The distribution with the minimum distance wins.
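A rough analogue of the BestFit routine can be sketched with SciPy: fit several candidate distributions and compare a goodness-of-fit statistic (here the Kolmogorov-Smirnov statistic, one of the three tests mentioned above). The yield series is hypothetical, standing in for the Figure 5 sample:

```python
import numpy as np
from scipy import stats

# Hypothetical corn-yield observations standing in for the Figure 5 sample.
yields = np.array([6200, 6800, 7100, 6500, 7400, 6900, 7600, 6300,
                   7000, 7200, 6600, 7500, 6400, 7300, 6700, 7100])

candidates = {
    "normal": stats.norm,
    "logistic": stats.logistic,
    "lognormal": stats.lognorm,
    "weibull": stats.weibull_min,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(yields)                        # maximum-likelihood parameters
    ks_stat, _ = stats.kstest(yields, dist.cdf, args=params)
    results[name] = ks_stat

# The candidate with the smallest distance statistic is the 'best fit'.
for name, stat_val in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} KS statistic = {stat_val:.4f}")
```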
Figure 9 shows the @RISK simulation of prices, yields and revenues using the best fit
distributions. The outcomes of these simulations can be compared to those generated from the
empirical data and the assumed distributions in Figure 6.
Figure 9: Corn prices (top), corn yields (middle) and Corn revenues (bottom) using distributions generated by the BestFit
procedure in @RISK.
Objective and Subjective Probability Assessment
The above examples were based on generating distributions when data were observable
and quantifiable. The generated distributions are objective in nature. However, in most real-
world applications outside of gambling casinos, probability assessment is not as objective.
Objective probability distributions can be calculated from known data and experimentation. The
distribution described by the roll of the dice is an example of an objective distribution because
there is no ambiguity about the number or frequency of outcomes. Similarly, one might examine
30 years of price or yield data and summarize from these data frequencies and probabilities. You
can then use these calculated values to make decisions.
In most management decisions the outcomes and the responses to those outcomes are
based on probabilities. If there is disagreement between actuarial probabilities (an objective
measure based on available data) and subjective probabilities (a modification of the objective
probabilities based on a ‘gut feel’), you must go with the subjective probabilities. In this context
subjective probabilities reflect management's beliefs about specific outcomes, and these in turn
lead to decisions. In team situations managers may hold different subjective probabilities about
outcomes, and when one thinks about it, most quarrels are probably due to differing opinions
about the likelihood of specific outcomes occurring. Consensus assessment of risk, especially
when there is no known information, is sometimes referred to as the Delphi Method of
probability assessment.
Subjective probabilities are really quite hard to defend, as are many of life’s beliefs. In
practice managers (and academics) default to what the data say in order to make decisions. It is
not that the most favourable outcome will occur with certainty; it often will not. Rather,
the objective of strategic risk management is to minimize the likelihood that a poor
outcome will result. In addition there is a bias, an optimistic bias, put forth by many managers in
order to get their projects accepted. Downplaying the downside is dangerous. Consider polls
taken for an election. The polls will indicate a winner and a loser. The loser always rejects the polls
and will cite the number of times that polls have been wrong, while the winner will always accept
that polls are good and will cite the number of polls that correctly predicted outcomes in the past.
This is the nature of the strategy. On the other hand, suppose that a manager was conducting
market research using sound research methods. The research indicates that consumer acceptance
of a new product would be low, and the manager dismisses the results. Chances are that the
carefully constructed survey, designed to elicit objective probabilities, was correct, and the
manager’s subjective probability assessment based on her own beliefs would have led to a
mistake, or shall we say an error in judgement.
Subjective Probabilities and the PERT Distribution
In the absence of information, a useful probability distribution with wide application is the PERT
distribution. Because this distribution is so useful it warrants some exploration in detail.
The PERT distribution (‘Program Evaluation Research Task’) was developed originally by
Malcolm et al (1959) to study the critical paths in the development and manufacture of the
Polaris Fleet Ballistic Missile program. The main task of PERT was to examine randomness
along the nodes of a critical path in order to provide estimates of the range over which the project would be
completed. Malcolm et al (1959) sought a simple means by which engineers could state the
shortest time, longest time and most likely time for any task to be completed at any node. In our
context we define the optimistic yield as b, the pessimistic yield as a, and the modal
yield as m. The standard deviation of these yields is assumed to be $\sigma = \frac{1}{6}(b - a)$. Assuming
further that the underlying generating process is a beta distribution (as introduced above) with

$$\bar{y} = \frac{a + 4m + b}{6}, \qquad \alpha_1 = \frac{6(\bar{y} - a)}{b - a}, \qquad \alpha_2 = \frac{6(b - \bar{y})}{b - a},$$

the Beta-PERT distribution is given by¹

$$f_{PERT}(y) = \frac{(y - a)^{\alpha_1 - 1}(b - y)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)(b - a)^{\alpha_1 + \alpha_2 - 1}}.$$
The PERT distribution in @RISK requires only three parameters, which are subjective in nature
although they can be based on historical data. The function is RiskPert(minimum, most likely,
maximum). First, let’s examine the data in Figure 5. Table 1 provides the minimum, maximum
and most likely values of the price and yield data. The most likely value is often called the mode;
it can be computed directly using the MODE function in Excel or can be eye-balled from the
data. Essentially, the outcome that occurs with the greatest frequency is the most likely
outcome, or the modal value.
Table 1: Minimum, Most Likely, and Maximum values for corn prices and yields for PERT distribution
¹ This formulation of the PERT distribution differs from that originally presented in Malcolm et al
(1959), who defined the beta distribution as $f(y) = K(y - a)^{\alpha_1}(b - y)^{\alpha_2}$, with
$\alpha_1 = \frac{(\bar{y} - a)(2m - a - b)}{(m - \bar{y})(b - a)}$ and $\alpha_2 = \frac{\alpha_1(b - \bar{y})}{\bar{y} - a}$.
Using the values in Table 1, the @RISK entries for the corn example are:
yields =RiskPert(5300, 7300, 8100)
prices =RiskPert(0.10745, 0.12, 0.2239)
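NumPy has no built-in PERT generator, but a Beta-PERT draw can be constructed directly from the formulas above: compute $\bar{y}$, $\alpha_1$ and $\alpha_2$ from (a, m, b) and rescale a Beta($\alpha_1$, $\alpha_2$) draw onto [a, b]. The sketch below (illustrative only) uses the corn-yield parameters from the RiskPert entry above:

```python
import numpy as np

def rand_pert(a, m, b, size, rng):
    """Draw from a Beta-PERT distribution with minimum a, mode m, maximum b."""
    y_bar = (a + 4.0 * m + b) / 6.0            # PERT mean
    alpha1 = 6.0 * (y_bar - a) / (b - a)
    alpha2 = 6.0 * (b - y_bar) / (b - a)
    return a + (b - a) * rng.beta(alpha1, alpha2, size)

rng = np.random.default_rng(3)
yields = rand_pert(5300, 7300, 8100, size=10_000, rng=rng)  # RiskPert(5300, 7300, 8100)

print("mean yield:", yields.mean())     # close to (5300 + 4*7300 + 8100)/6 = 7100
print("std dev:", yields.std(ddof=1))   # compare with the rule of thumb (8100 - 5300)/6 = 467
```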
The PERT distributions generated in @RISK for the corn examples are provided in Figures 10
and 11. Using the overlay capabilities in @RISK we can overlay these PERT distributions
on the original distributions to observe how differing assumptions about the probability
distributions compare.
Figure 12: Comparison of PERT and Normal Distributions for Corn Yields