Liability Modelling
Paul King
29 January 2023
Table of contents

1 Overview
5.2 Economic Scenario Generators
5.3 Actuarial uses of ESGs
5.4 Stylized facts
5.5 ESG summary
5.6 Single period mean-variance models
5.7 Homework
5.8 Additional reading
6 Run-off triangles
6.1 Learning Objectives
6.2 Need for claims estimation
6.2.1 Accurate estimation of reserves is important
6.2.2 Example: Incremental paid claims
6.2.3 Cumulative paid claims
6.2.4 Mathematical (regression) model
6.3 Basic chain ladder approach
6.3.1 Development factors for all years
6.3.2 Assumptions underlying the basic chain ladder method
6.4 Adjusting for past inflation
6.4.1 Assumptions underlying the inflation-adjusted chain ladder method
6.5 Average cost per claim method
6.5.1 Assumptions underlying the average cost per claim method
6.6 The Bornhuetter-Ferguson method
6.6.1 Loss Ratio
6.6.2 Assumptions underlying the Bornhuetter-Ferguson method
6.7 Statistical models and simulation
6.7.1 Goodness of fit
6.7.2 Simulation
6.8 Homework
7 Ruin theory
7.1 Overview
7.1.1 Definitions
7.1.2 Premiums
7.1.3 The surplus process
7.1.4 Simulating the surplus process
7.2 The probability of ruin in continuous time
7.2.1 Some relationships
7.3 Probability of ruin in discrete time
7.4 A counting process for claims
7.4.1 The Poisson process
7.4.2 Distribution of time to the first claim
7.4.3 The compound Poisson process
7.5 Bounds
7.5.1 Lundberg’s inequality
7.5.2 How R depends on other parameters
7.5.3 The adjustment factor - compound Poisson processes
7.5.4 General aggregate claims processes - adjustment
7.6 Homework
9 Behavioural economics
9.1 Learning Objectives
9.2 Criticisms of expected utility theory (EUT)
9.3 Summary of Prospect Theory
9.3.1 Decision making under Prospect Theory
9.4 Heuristics and behavioural biases
9.4.1 Anchoring and adjustment
9.4.2 Familiarity
9.4.3 Overconfidence
9.4.4 Hindsight bias
9.4.5 Confirmation bias
9.4.6 Self-serving bias
9.4.7 Status quo bias
9.4.8 Herd behaviour
9.5 Homework
9.6 Further reading
10.4.3 Moral hazard
10.4.4 Exercise
10.5 Homework
1 Overview
The module covers the objectives in the IFoA syllabus for CM2 that aren’t covered by
MA3471 / MA7471. The broad structure will be:
We’ll be using both R and Excel, so you’ll need to make sure you have access to a working
R system.
You can access R by logging on to https://rserver.mcs.le.ac.uk. If you don’t have an account
from Fundamentals of Data Science last semester you will be sent one in the next few days.
(For some mysterious reason, it’s not possible to log onto the server on-campus if you are
connected to the University wireless network. You must log onto the Guest (_cloud) network
instead.)
Alternatively you can install R and RStudio on your own computer, or you can access
RStudio on the university computers via the Student Desktop.
2 Measures of investment risk
Financial markets operate under uncertainty, and hence every participant in those markets
is exposed to the possibility of losing money or achieving a return that is less than expected
(i.e. they are exposed to investment risk). Several different measures can be used to quantify
this risk, all of which aim to measure the scale of the possible loss and/or take account of
the probability of incurring that loss. These are the two main components of any risk.
In this chapter, we define several widely used measures of financial risk and use them to
compare different investment opportunities. They could be used to analyse single assets or,
more usefully, portfolios of assets. In particular, we focus on variance of return, downside
semi-variance of return, shortfall probability and the value at risk measure.
Some risk-free investments exist; can you give examples? Why not just invest in
those, i.e. why bother taking investment risk?
Government bonds (Gilts in the UK) are practically risk free. Also bank deposits
in the UK are guaranteed by the government up to a limit (£85,000 at the time of
writing).
There’s a saying “No risk, no return”. Risk-free investments give relatively low returns,
and so many investors choose riskier assets with higher expected returns.
Can you think of some examples of why you would want to analyse the risks of a
portfolio of assets?
First, to make sure you are getting enough extra return for the risk taken compared
with risk-free alternatives.
Second, to make sure you are getting the highest return for the chosen level of risk you
are willing to take (your “risk appetite”).
Other reasons:
• If you measure risk it gives you a chance to control it e.g. by hedging or diversification
• You can start to see whether your assets will be a good match to your liabilities
• Allowing for the above, you can work out how much capital to hold so that, even
if the risks materialise, you are able to remain solvent.
2.1 Learning objectives
This chapter covers the following learning objectives from the CM2 syllabus.
The risk measures listed above can be classified into two main types. The first type are
dispersion measures, such as variance and semi-variance of returns; they estimate the spread
of returns around the expected value. Measures of the second type, such as value at risk
and expected shortfall, are probabilistic and are more directly related to the probability of
loss. As a generalisation, the dispersion measures often tell you more about the central part
of the distribution, whilst the probabilistic measures can provide more information about
the tails (to the extent that they can be reliably estimated).
What are the two components of risk that might be captured by different risk
measures?
Why do you think variance is the simplest and most widely used risk measure?
Variance is the most basic and widely understood statistical measure of dispersion
around the mean.
2.3 Variance of returns
Consider the price 𝑃 of an asset as a random variable on the probability space that is
formed by all possible events affecting the market. Then the single period return $r = (P_1 - P_0)/P_0$, where $P_0$ is the initial price and $P_1$ the price at the end of the period, is also a random variable. Investment risk stems from possible
deviations of the price from its expected value and the impact that this will have on the
return.
The simplest and most widely used risk measure in mathematical finance is variance of
return, which is defined to be:
$E[(R - E(R))^2] = E(R^2) - (E(R))^2$
The square root of the variance, i.e. the standard deviation, can also be used as a measure
of risk, but variance is more convenient due to its better analytical properties (by avoiding
square roots).
The variance of return measures the total spread of the return distribution giving equal
weights to positive and negative deviations from the mean (i.e. all variability is treated
as “bad”). Usually investors benefit from positive deviations and only suffer from negative
deviations, so it would appear natural to focus on negative deviations only when considering
risk. One measure that does so is called the downside semi-variance of returns and will
be described in more detail below. However, typical distributions of financial variables
are approximately symmetric around the mean, so the variance and semi-variance will be
proportional. When the distribution is close to being symmetric, the semi-variance does not
contain any additional information that the variance does not.
Since the variance is usually easier to compute and is readily available as a standard output
in most statistical software, this advantage in ease of calculation is important. Alternative
measures that are more complicated are only justified if they provide genuine additional
insight. The use of the variance of returns also allows for the development of useful theoretical results, as we will see under mean-variance portfolio theory, where it can be used to
find optimal portfolio allocations. Lastly, if an investor has a quadratic utility function, or
returns are known to be normally distributed, then variance of returns is by definition the
appropriate risk measure.
If 𝑓(𝑥) is the probability density function (pdf) of the investment return, what is
the integral expression for the variance of return?
$\int_{-\infty}^{\infty} [x - E(x)]^2 f(x)\,dx$
How would you decide if it is worth using a risk measure that is more complicated than variance of return?
You would only choose a different measure if the extra insight justified the extra
complication in the circumstances you are using the measure.
Semi-variance measures downside variability and is computed as the average of the squared deviations below the mean return, i.e. it provides a measure of $E[((r - E(r))_-)^2]$, where $f_- \equiv \min(f, 0)$.
Semi-variance is similar to the variance, however, it only considers observed returns below
the mean/expected return. A useful tool in portfolio or asset analysis, especially in the
presence of skewed distributions, semi-variance provides a measure for downside risk. While
standard deviation and variance provide measures of overall volatility, semi-variance only
looks at the negative fluctuations of the asset. By ignoring all values above the mean (or an
investor’s target return) semi-variance estimates the average degree of loss that a portfolio
or asset could incur, given the definition of “loss” being used (i.e. in an absolute sense or
compared to a benchmark).
For risk averse investors, solving for optimal portfolio allocations by minimizing semi-
variance would limit the likelihood of a large loss. A similar result could be achieved with
the variance of returns, unless the distribution was highly skewed.
If 𝑓(𝑥) is the probability density function (pdf) of the investment return, what is
the integral expression for the downside semi-variance of return?
$\int_{-\infty}^{E(x)} [x - E(x)]^2 f(x)\,dx$
Assume investment returns are distributed as 𝑈(0%, 6%). Calculate the variance and semi-variance of returns using the integral expressions given above.
$\int_0^6 \frac{(x-3)^2}{6}\,dx = 3$
(Or we could just use the standard formula for the variance of a uniform distribution.)
$\int_0^3 \frac{(x-3)^2}{6}\,dx = \frac{3}{2}$
Since the “bottom half” of the distribution has exactly the same form as the full
distribution we would expect the downside semi-variance to be equal to half of the
variance.
How would the answer differ if the distribution was 𝑈 (−3%, 3%)?
Since the measures are independent of the location of the distribution, the results
would be the same.
The value at risk (VaR) estimates the future loss (within a given interval of time) that will
only be exceeded with a given (low) probability. It is therefore a “threshold” measure of
loss that is not the highest possible loss, but rather is only likely to be exceeded in extreme
conditions. Crucially, VaR tells you nothing about the potential scale of losses beyond the
threshold level. Having evaluated VaR as 𝑙 for the confidence level 𝛼, 0 < 𝛼 ≤ 1, one can be 100𝛼% confident that the loss will not exceed 𝑙, while the loss exceeds 𝑙 with probability at most 100(1 − 𝛼)%. Typically 𝛼 is chosen close to 1, but VaR may be difficult to estimate accurately for very high confidence levels (e.g. 𝛼 = 99.5% or higher).
Given the cumulative distribution function $F_L(l)$ of losses $L$, where $F_L(l) = \Pr(L \le l)$, the assignment $\alpha \mapsto VaR_\alpha$ is, roughly speaking, the inverse function of $F_L$. That is exactly so if 𝐹𝐿 (𝑙) is
continuous and strictly increasing. In general, the function is weakly increasing and might
also have (at most a countable number of) jump points. The value at risk for the confidence
level 𝛼 is then defined to be:
𝑉 𝑎𝑅𝛼 (𝐿) = 𝑖𝑛𝑓{𝑙|𝐹𝐿 (𝑙) ≥ 𝛼}
If 𝛼 belongs to the range of values of 𝐹𝐿 (𝑙) (that is always the case if 𝐹𝐿 (𝑙) is continuous) then
the weak inequality can be replaced by equality. Also, if 𝐹𝐿 (𝑙) is strictly increasing, then
we can drop the symbol inf, and 𝑉 𝑎𝑅𝛼 (𝐿) would be a unique solution to the equation:
𝐹𝐿 (𝑉 𝑎𝑅𝛼 (𝐿)) = 𝛼
A vast amount of literature is devoted to how to evaluate VaR. It is used not only by traders
and portfolio managers but also by regulatory authorities. For example, banks in the United
States are required to hold reserves equal to 3 times the 10-day 99% VaR for market risks.
If $f_L(x)$ is the probability density function (pdf) of the loss $L$, what integral expression defines the $\alpha$% VaR?
$\int_{-\infty}^{VaR_\alpha} f_L(x)\,dx = \alpha$
Assume investment returns are distributed as 𝑈 (−3%, 3%). Calculate the 98% VaR
on an investment of 100.
1. VaR does not take into account the possibility of big losses with probabilities less than
(1 − 𝛼).
2. VaR makes no distinction between different types of tail behaviour in loss distributions and hence underestimates risk when the loss distribution is heavy-tailed.
3. VaR is not a coherent risk measure. In particular it is not sub-additive. This means
that there are examples when VaR of a portfolio is greater than the sum of the VaRs
of its components. This contradicts common sense: if we view a risk measure as the
amount of money needed for reserves to cover potential losses due to market risk:
it’s counter-intuitive that the portfolio requires more reserves than the sum of its
components (i.e. holding the assets together is more risky than a group of people
holding them separately).
Why do you think it’s difficult to estimate VaR accurately for very high confidence
levels?
Because tails of distributions are hard to estimate accurately given that, by definition,
extreme events happen rarely.
The expected shortfall (ES) is also called conditional value at risk or the tail VaR. ES
evaluates the risk of an investment in a conservative way, focusing on the least profitable outcomes: the losses beyond a given threshold. This is a similar approach to the semi-variance of returns described above. For high values of (1 − 𝛼) it ignores the most potentially profitable but unlikely possibilities; for small values of (1 − 𝛼) it focuses on the worst losses. On the other
hand, unlike the maximum loss, even for higher values of 𝛼 ES does not consider only the
single most catastrophic outcome. A value of 𝛼 often used in practice is 95%.
𝐸𝑆𝛼 (𝐿) = 𝐸(𝐿|𝐿 ≥ 𝑙)
where 𝑙 = 𝑉𝑎𝑅𝛼(𝐿) for the threshold 𝛼. Note that allowing equality in the conditioning event is essential: otherwise, for sufficiently high 𝛼 and bounded 𝐿, we would be conditioning on an event of probability zero. Consider for example the distribution 𝐿 = 0 with probability 0.9 and 𝐿 = 100 with probability 0.1. The value at risk for 𝛼 = 0.95 coincides with the expected shortfall and is equal to 100, whereas 𝐸(𝐿|𝐿 > 100) is undefined because 𝑃(𝐿 > 100) = 0.
Some properties of expected shortfall are as follows:
The expected shortfall has advantages compared with VaR. It is a more conservative risk
measure than VaR and for the same confidence level it will suggest higher reserves to hold
against potential losses, if that is what the risk measure is used to determine (this could
potentially be true for a bank or insurance company, but not necessarily an investment
fund).
Consider a simple example illustrating the relationship between VaR and ES. Suppose that we have a debt security with nominal value 100 which is due to be redeemed tomorrow. It will be
redeemed completely with probability 0.99. With probability 0.01 the borrower will refuse
to pay 100 and we get only half of the nominal value. In this scenario our loss 𝐿 will be
0 with probability 0.99 and 50 with probability 0.01. For 𝛼 = 0.95 we find 𝑉 𝑎𝑅𝛼 (𝐿) = 0,
that is VaR recommends that we do not hold any reserves against potential losses at all.
This seems strange, because our loss might be significant and its probability 0.01 is not so
small as to be something that we can ignore. At the same time:
𝐸𝑆𝛼 (𝐿) = 𝐸(𝐿|𝐿 > 0) = 50
Thus, the expected shortfall takes into account bigger losses that might occur with low (less
than 1−𝛼) probability. It is also more informative in the often encountered real life situation
when the loss distribution has a heavy tail. It can also be useful in fund management, where
performance is often measured relative to a specified benchmark (perhaps a stock market
index).
2.7 Shortfall probabilities
A related measure to the expected shortfall is the shortfall probability, which is simply the probability that the loss exceeds a specified level 𝑙, i.e. 𝑃𝑟(𝐿 > 𝑙). This is a unitless way of comparing return distributions and can be used as a way of understanding behaviour in the tails of those distributions.
For the 𝑈 (−3%, 3%) example above, the 98% VaR was 2.88, and the corresponding 98% expected shortfall is
$ES = \dfrac{\int_{2.88}^{3} \frac{x}{6}\,dx}{\int_{2.88}^{3} \frac{1}{6}\,dx} = \dfrac{[x^2/12]_{2.88}^{3}}{[x/6]_{2.88}^{3}} = \dfrac{9 - 2.88^2}{2(3 - 2.88)} = 2.94$
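Figures like these are easy to check by simulation. A minimal R sketch for the 𝑈 (−3%, 3%) example above (the sample size of one million is arbitrary):

# Simulation check of the VaR and expected shortfall figures above
set.seed(1)
returns <- runif(1e6, min = -0.03, max = 0.03)   # simulated annual returns
loss <- -100 * returns                            # loss on an investment of 100
VaR98 <- quantile(loss, 0.98)                     # should be close to 2.88
ES98  <- mean(loss[loss > VaR98])                 # should be close to 2.94
c(VaR98, ES98)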
An investor using a particular risk measure will base their investment decisions on the
available combinations of risk and expected return. Using information about how a specific investor makes the trade-off between these two competing features of potential investments, it is possible, in principle, to determine the investor’s underlying utility
function. On the other hand, given a specific utility function, the related risk measure can
be determined. For example, if an investor has a quadratic utility function, the variance of
return is an appropriate measure of risk.
This is because the function used to maximise expected utility is a function of the expected
return and the variance of returns only. If expected return and semi-variance below the
expected return are used as the basis of investment decisions (i.e. the semi-variance is
the risk measure), it can be shown that this implies a utility function which is quadratic
below the expected return level and linear above (since the investor is assumed to be risk
neutral in respect of positive returns, but risk averse in terms of negative returns with
utility depending on the mean and variance for these negative returns). Use of a shortfall
risk measure corresponds to a utility function that has a discontinuity at the minimum
required return (or shortfall threshold).
As you will have seen from the discussion above, different risk measures have different
advantages and disadvantages and so there is no one measure that is always superior to the
others. It is therefore worth investigating the properties of different investments using a
variety of different measures in order to better understand the risk profile.
An important risk assessment often used in practice that is not covered here is stress testing,
where extreme financial scenarios (e.g. combinations of large falls in asset prices, high rates
of inflation, or very low or negative interest rates) are applied to an asset or portfolio to
test how the values will perform under extreme conditions. Knowing how your investments
are likely to behave under extreme conditions can provide valuable insights to the risks that
you are exposed to. When performing stress tests, liabilities need to be consistently valued
according to the stress scenario. Often in the actuarial context, it is the surplus or shortfall
of assets compared to liabilities that is more important than the absolute value of either.
Having calculated the risk measures, the most important step is then considering what to do
in light of these results. Monitoring changes over time can be used for portfolio management:
investigating the reasons for observed changes can inform future investment strategy or risk
mitigation actions.
Define the value at risk $VaR_\alpha$ as $VaR_\alpha(L) = \sup\{l \mid F_L(l) < (1-\alpha)\}$. Show that this definition is equivalent to the definition $VaR_\alpha(L) = \inf\{l \mid F_L(l) \ge (1-\alpha)\}$.
This result relies on the properties of the cumulative distribution function (cdf) of losses, $F(\ell)$. All cdfs are weakly increasing functions with $\lim_{\ell\to-\infty} F(\ell) = 0$ and $\lim_{\ell\to\infty} F(\ell) = 1$. Being weakly increasing, $F$ has at most a countable number of jump points, and as a cdf it is right-continuous, i.e. $\lim_{\ell\to a^+} F(\ell) = F(a)$ at every point $a$.
Suppose now that $\alpha \in (0, 1]$. There are two possible alternatives.
First, the level $(1-\alpha)$ is not attained by $F$, i.e. there is a jump in $F$ at some point $a$ such that $\lim_{\ell\to a^-} F(\ell) < (1-\alpha) < F(a)$. In other words, the step function $F$ “skips” $(1-\alpha)$. In this case it is clear that $\inf\{\ell \mid F_L(\ell) \ge (1-\alpha)\} = a$, since $F$ is an increasing function. This also means that $\ell < a \iff F(\ell) < (1-\alpha)$. Therefore, as required, $\sup\{\ell \mid F_L(\ell) < (1-\alpha)\} = a$, and the two definitions agree.
Second, if $(1-\alpha)$ is attained by $F$, a similar argument using the right-continuity of $F$ shows that both expressions equal the smallest $\ell$ at which $F_L(\ell)$ reaches $(1-\alpha)$.
(Here sup denotes the supremum (least upper bound) of a set of real numbers; similarly inf denotes the infimum (greatest lower bound).)
Past IFoA exam questions from CM2 and CT8 papers are a very valuable resource to help
you develop your understanding and exam technique. Question papers and Examiners’
Reports can be downloaded from the IFoA website free of charge.
By the way, IFoA exams have 100 marks in three hours, whereas University of Leicester
exams are two hours long. So you can multiply the IFoA marks for any question by 1.5 to
get the Leicester equivalent.
2.11 Homework
CM2A September 2019 Question 1 This combines some bookwork on definitions with
the sort of risk measure calculations you should be able to do.
CM2B September 2020 Question 3 Parts (i) and (ii) The mini-project for this module
will require some computer computations (in Excel, or R if you prefer), so here’s something
to get you started.
The calculation in Part (iii) is the sort of thing you’ll be doing in the second half of the
module, so feel free to have a go now!
There are also some additional questions on the homework sheet in the Week 1 folder.
3 Stochastic rates of return
3.2 Reading
This week’s work is covered by Chapter 12 of Stephen Garrett’s book, An introduction to the
mathematics of finance: a deterministic approach. (You don’t need to read Section 12.5.)
There’s a link to an electronic version of the book in the Blackboard Reading list and I’ve
put a PDF of Chapter 12 in the stochastic rates of return folder.
In particular, you need to know the material in 12.2 and 12.3, plus exercise 12.3.1.
(“Know” in the sense of being able to derive the equations and apply them to exam questions.)
Before it was in CM2 this material was in CT1, so that’s where you will find most past
exam papers, though we’ll also look at the modelling questions in the CM2B exam.
3.3 Mathematical models
Models can be used for different purposes (e.g. entertainment, trying to understand the
system modelled - including the effects of changes in the system) but practising actuaries
most often use them for making decisions.
Here’s an example of a big decision: how much should a global insurance company offering
many types of policy hold as capital reserves?
You’d expect such a big decision to need a big model to answer it: and you’d be right. The
model should include elements that model the behaviour of the various assets the company
holds, the behaviour of its liabilities and the behaviour of its customers.
We’ll look at all three aspects in this module but we’ll start simply with the rate of return
models described in Garrett’s book.
In the first lecture we’ll go over the construction in R of a simple random walk model and in
the second lecture for this week I’ll talk you through the reproduction of Garrett’s Example
12.5.3 using R.
3.4 Key notation and equations from Garrett
In all that follows, we are assuming that the rates of return in each period are independent.
Let 𝑆𝑛 be the accumulation of a single payment of 1 at time 0, and let 𝑖𝑡 be the return obtainable over the 𝑡th year. Then
𝑆𝑛 = (1 + 𝑖1 )(1 + 𝑖2 ) … (1 + 𝑖𝑛 )
Similarly we let the accumulation of a series of annual investments of 1 at the start of each
year be 𝐴𝑛 .
Suppose that the return in each period has mean 𝑗 and variance 𝑠2 .
Then (Garrett, Eq. 12.2.5)
$E[S_n] = (1+j)^n,$
and (also from Garrett, Section 12.2)
$\mathrm{var}[S_n] = (1 + 2j + j^2 + s^2)^n - (1+j)^{2n}.$
The equations for the moments of 𝐴𝑛 are a bit more complicated; see Garrett, pp. 283-285.
Check your understanding
Show that $E[A_n] = \ddot{s}_{\overline{n}|}$ evaluated at rate $j$.
Suppose that the random variable 𝑙𝑛(1+𝑖𝑡 ) is normally distributed with mean 𝜇 and variance
𝜎2 . In this case, the variable (1+𝑖𝑡 ) is said to have a log-normal distribution with parameters
𝜇 and 𝜎2 .
In this case, 𝑆𝑛 has a log-normal distribution with parameters 𝑛𝜇 and 𝑛𝜎2 .
$\mathrm{var}[S_n] = e^{2n\mu + n\sigma^2}\left(e^{n\sigma^2} - 1\right)$
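As a quick check, these moments can be verified by simulation. The parameter values below are purely illustrative:

# Simulation check of E[S_n] and var[S_n] for log-normal growth factors
set.seed(1)
mu  <- 0.04     # mean of ln(1 + i_t)   (illustrative)
sig <- 0.10     # sd of ln(1 + i_t)     (illustrative)
n   <- 10       # number of years
n_sims <- 100000
# Each simulated S_n is the product of n independent log-normal growth factors
Sn <- replicate(n_sims, prod(rlnorm(n, meanlog = mu, sdlog = sig)))
c(mean(Sn), exp(n * mu + n * sig^2 / 2))                          # simulated vs theoretical mean
c(var(Sn),  exp(2 * n * mu + n * sig^2) * (exp(n * sig^2) - 1))   # simulated vs theoretical variance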
3.5 Homework
The first part of your homework (which we’ll go over in the feedback session) will be to
reproduce Example 12.6.1 from Garrett in R.
The second part of the homework is the behavioural economics reading below.
Most weeks I’ll ask you to read a few chapters from Thinking Fast and Slow.
As well as giving you an understanding of the behavioural economics material in this module,
this is a very interesting and useful book that will give you insights into human behaviour
that will be useful in many areas of your future career.
This week you should skim-read the introduction and Chapters 1 - 4 (most chapters are quite short).
You should also read Chapter 5 more carefully, bearing in mind the following definitions
from Wikipedia and the CM2 Core reading.
3.5.2 Heuristic
3.5.3 Availability
3.5.4 Familiarity
This heuristic is closely-related to availability, and describes the process by which people
favour situations or options that are familiar over others that are new. This may lead to
an undiversified portfolio of investments if people simply put their money in industries or
companies that they are familiar with rather than others in alternative markets or sectors.
Home-country bias refers to people’s tendency to disproportionately invest in stocks from
their home country, rather than forming an internationally-diversified portfolio.
The following is a selection of past exam questions that cover the material on stochastic
rates of return in this chapter:
• CT1 S2017 Q5
• CT1 A2016 Q8
• CM2A A2019, Q3(i, ii, iii)
• CT1 S2018 Q6
• CT1 A2018 Q4
• CT1 S2016 Q9
4 Simulation and stochastic modelling
This week we’ll cover some basic techniques for building simulation models in R. The aim
is to lay the foundations for future work, so there are no specific syllabus objectives.
One of the key concepts this week is the idea of a generative model.
This is a description of the process generating the variables you want to simulate. For
example:
The return on the fund each year is an iid random variable drawn from a lognor-
mal distribution with fixed mean and variance. The accumulated value of the
fund at the end of N years is the initial value multiplied by the product of the
annual returns for each year.
Or
Claims arise as a Poisson process with a fixed rate. The value of each claim is
an iid random variable drawn from a Pareto distribution with fixed parameters.
The total claim amount at the end of the period is the sum of all the individual
claims arising in the period.
The generative model allows us to simulate the variables of interest, but it doesn’t tell us
how to implement the simulation in code.
Running the simulation allows us to predict the simulated variables and comparing the
predictions with the observed values in the real world will tell us how good our generative
model is (assuming we haven’t made errors in the simulation).
We can then use the model to decide what actions to take (which will likely include actions
to improve the performance of the model).
Interestingly, there are many theories of how the brain works that would be specified in a
very similar way - it is always trying to predict the results of proposed actions and comparing
what happens with its predictions. Actions will be adjusted in the light of the difference
between the expected and observed outcome.
4.2 Turning a generative model into code
There will usually be a number of different structures that can be adopted when coding a
particular generative model. Key decisions that will influence the structure are:
• Is it necessary to store the whole simulated path of each run, or just the final value?
• Does each simulated path need to run for the same number of steps, or can it be
terminated at a defined point?
• Do we simulate by using discrete time steps, or some other type of event?
• What is the trade-off between number of steps and precision?
• Is runtime more important than development time, or vice versa?
• Do we generate simulated paths one at a time - or can we do them in parallel?
Let’s look at some examples using the first generative model above. (You should be familiar
with this from last week.)
Suppose we want to build a model where we track the growth of the fund each year, and
store these values.
Write outlines of the code you could use for this (a) generating one simulated path at a time
and (b) generating the first year value for all of the simulated funds, then the second year’s
value for all of them, and so on.
When you’ve done that translate your outlines into pseudocode, and then into code using
for loops.
Here’s some code for the first method. First the set up:
# Initialise constants
mu <- 0.02
sig <- 0.01
n_years <- 5
n_sims <- 5000
fund_one <- 1000
And then the actual simulation. (Notice it’s a good idea to do a test run with very small numbers of years and simulations.)
# Matrix holding the fund value at the start of each year for every simulation;
# the first row is the initial fund value
FundVal <- matrix(NA_real_, nrow = n_years + 1, ncol = n_sims)
FundVal[1, ] <- fund_one

set.seed(1)
for (sim in seq(n_sims)){
  for (year in seq(2, n_years + 1)){
    FundVal[year, sim] <-
      FundVal[year - 1, sim] * rlnorm(1, meanlog = mu, sdlog = sig)
  }
}
FundVal[1:5, 1:5]
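For comparison, a minimal sketch of the second approach, filling in every simulation one year at a time with a single vectorised draw per year (it reuses the constants defined above):

# A sketch of the second approach: simulate all funds together, one year at a time
set.seed(1)
FundVal2 <- matrix(NA_real_, nrow = n_years + 1, ncol = n_sims)
FundVal2[1, ] <- fund_one
for (year in seq(2, n_years + 1)) {
  # One vectorised draw gives this year's growth factor for every simulation
  FundVal2[year, ] <- FundVal2[year - 1, ] * rlnorm(n_sims, meanlog = mu, sdlog = sig)
}
FundVal2[1:5, 1:5]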
Once you have simulated a set of results, estimating probabilities is usually just a matter
of counting and then calculating ratios.
For example, if we extract the final fund values from the calculations above we can calculate
the probability that the average annual return is less than two per cent, and the lower
quartile return.
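One way of doing this, reusing the FundVal matrix and constants defined above:

final_vals <- FundVal[n_years + 1, ]                      # final fund values
avg_return <- (final_vals / fund_one)^(1 / n_years) - 1   # average annual return per simulation
mean(avg_return < 0.02)     # estimated probability the average annual return is below 2%
quantile(avg_return, 0.25)  # lower quartile of the average annual return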
Of course, the numbers calculated above will only be meaningful if n_sims is large enough.
The package microbenchmark (Mersmann 2021) is a useful way to time chunks of code. It’s
easiest if you put them in a function.
Here are two functions that produce the same output.
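For example, a minimal sketch (the exact function bodies here are illustrative, consistent with the description below):

slow_loop <- function(n) {
  # Start with a length-one vector and grow it on every iteration (assumes n >= 2)
  r <- rnorm(1)
  for (i in 2:n) {
    r[i] <- rnorm(1)
  }
  r
}

fast_loop <- function(n) {
  # Create a vector of the required length up front, then fill it in
  r <- numeric(n)
  for (i in seq_len(n)) {
    r[i] <- rnorm(1)
  }
  r
}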
The difference between them is that the first one starts off with a vector of length one and
increases its length in each loop. The second function creates a vector of the needed length
right at the beginning.
Now let’s compare their speed, alongside the built-in vectorised function that does the same
thing.
library(microbenchmark)
n <- 1000
res <- microbenchmark(slow_loop(n),
fast_loop(n),
rnorm(n),
check = 'identical',
setup = set.seed(12345))
res
Unit: microseconds
expr min lq mean median uq max neval
slow_loop(n) 2933.486 3724.4080 5122.06028 3879.218 3978.3105 26809.652 100
fast_loop(n) 1160.875 1455.8165 1864.62738 1532.869 1572.3180 12488.060 100
rnorm(n) 48.553 53.4705 61.49053 63.441 64.5255 88.084 100
Further information about timing chunks of code can be found on the Jumping Rivers
website and many other places online.
As well as timing individual chunks of code, a profiler can be used to understand which parts
of a program are taking most time (and hence, which parts you should try to optimise). R
comes with its own profiler, which is described here. You don’t need to know the details of
how to use the profiler, but read the first paragraph of the preceding link for some useful
background.
4.5 Discrete Event Simulation (DES)
5 Valuing benefit guarantees
This week we’ll look at ways of valuing benefit guarantees by simulation and, more generally,
at economic scenario generators (ESGs - not to be confused with Environmental, Social, and
Governance criteria in investing).
The IFoA syllabus objective is:
Practical application of this objective will be assessed in the mini-project. Here we will
look at some of the underlying concepts and issues to be taken into consideration - which
will be tested in the exam.
The key reading is contained in Economic Scenario Generators: a practical guide (Pedersen
et al. 2016), a copy of which is available in Blackboard.
The Introduction, Executive Summary, and Chapter 1 of this paper are examinable.
5.3 Actuarial uses of ESGs
Life insurance. Applications of ESGs for life insurance liabilities are primarily focused
on the interaction of interest rate changes and policyholder behaviour regarding lapses and
other optionality such as surrenders.
Pensions. In pensions work ESGs have a role in consistent valuation of assets and liabilities
and can also be valuable in understanding member behaviour, as well as areas like liability-
driven investment and pension-risk transfer.
General insurance. General insurance business is strongly cyclical and driven by many
factors that can be modelled with an ESG, both in terms of liabilities (particularly driven
by inflation), and in terms of market behaviour.
Stylized facts are generalized statements about economic and market behaviour, based on
historical experience. An ESG should produce outputs consistent with the stylized facts
relevant to its intended use.
Examples of stylized facts include:
5. Key financial market and economic variables are modelled in simulated series, and importantly the interaction among the variables is also modelled. Modelling in an economic scenario generator can be performed on a market-consistent basis or a real-world basis. Each has application to the insurance and pension worlds and to understanding financial markets.
6. Some key financial market variables particularly important to insurance and pensions
are bonds (and related interest rates), including corporate and asset-backed bonds,
and equities.
If we are able to estimate the distribution of the variable of interest at the end of the period
of interest we may be able to apply standard probability theory in an analytical way.
For example, if we assume the return on a fund is normal at the end of a period we will
be able to calculate the probability of the return falling below a specified amount and the
mean and the variance of the shortfall if it does so. This leads directly to ruin theory, which
we’ll look at in a couple of weeks.
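For example, assuming a normally distributed annual return with illustrative parameters and an assumed benchmark level, the shortfall probability and the moments of the shortfall can be sketched as follows:

# Shortfall calculations for a normally distributed annual return (values are assumptions)
mu_r  <- 0.05   # expected return
sig_r <- 0.10   # standard deviation of return
b     <- 0      # benchmark / threshold return
pnorm(b, mean = mu_r, sd = sig_r)   # probability the return falls below the benchmark
# Mean and variance of the return conditional on falling short, estimated by simulation
set.seed(1)
r <- rnorm(1e6, mean = mu_r, sd = sig_r)
short <- r[r < b]
c(mean(short), var(short))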
5.7 Homework
Research and answer the following questions in your own words (do not copy and paste!)
1. Explain the difference between the “real world” approach and the “risk neutral” approach in ESGs. Give examples when each might be more appropriate.
2. List three variables that might be linked in a “cascade approach”.
3. What are:
4. State two “stylized facts” commonly accepted in actuarial science that are not listed
in the paper Economic Scenario Generators: A practical guide.
6 Run-off triangles
This week we look at methods for estimating the reserves that an insurance company should hold in respect of policies written in the past.
• Define a development factor and show how a set of assumed development factors can
be used to project the future development of a delay triangle.
• Describe how a statistical model can be used to underpin a run-off triangles approach.
• Describe and apply a basic chain ladder method for completing the delay triangle
using development factors.
• Show how the basic chain ladder method can be adjusted to make explicit allowance
for inflation.
• Describe and apply the average cost per claim method for estimating outstanding
claim amounts.
• Describe and apply the Bornhuetter-Ferguson method for estimating outstanding
claim amounts.
• Discuss the assumptions underlying the application of the methods above.
There are many run-off triangle questions available in past IFoA papers. You should practise
these until you can do the calculations rapidly and accurately. You should also be able to
comment on the assumptions and other matters discussed below.
• There is normally a delay between the incident leading to a claim and the insurance payout
• Insurance companies need to estimate future claims for their reserves
• It makes sense to use historical data to infer future patterns of claims
6.2.1 Accurate estimation of reserves is important
If reserves are too low, you run the risk of action from the regulator, or even insolvency. But reducing reserves frees capital to be invested in business projects or returned to shareholders. (Or paid as executive bonuses!)
Look at the table of data below. It shows claims incurred, grouped by the year in which
the accident leading to the claim happened - the accident year. The number of years until
a claim is recorded is called the delay, or development period.
Origin Dev0 Dev1 Dev2 Dev3 Dev4 Dev5 Dev6
3 2003 673 242 189 255 55
4 2004 776 267 184 163
5 2005 824 301 207
6 2006 911 298
7 2007 974
Grouping by accident year means that the data should include claims which have been
incurred but not yet reported (IBNR), which will have to be estimated. Those of you
who discovered the Claims Reserving Manual while doing research for a previous week’s
homework might have seen some methods for doing this - but it’s beyond the scope of this
module.
Other ways of grouping include: the year the policy was written, the year a claim was
reported.
We want to estimate the Dev6 column to calculate claims outstanding. There are many
possible ways we could do this.
We’ll talk more about the regression model in Thursday’s lecture. For now we’ll stick to
some pragmatic calculation methods without worrying about their statistical properties.
We could estimate 𝑟6 as 1588/507, but that would be ignoring a lot of information. Instead,
consider development from Dev0 to Dev1: we could calculate a factor for each entry where
we have two values.
6.3.1 Development factors for all years
We do the same for all development years, but let’s work with a smaller triangle.
1. Payments from each accident year will develop in the same way. That is, the same
development factors are used to project outstanding claims for each accident year.
(In fact, sometimes ad hoc adjustments are made to the development factor for a
particular year, based on extra sources of information.)
2. No additional adjustments for inflation are needed. Weighted average inflation (as
included in the development factors) will remain the same in future.
3. Claims are fully run off for the first accident year included in the data. It’s impossible
to project forward using the methods below. (In practice, further projections might be
carried out using so-called tail factors, but this is beyond the scope of this module.)
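To make the mechanics concrete, here is a minimal R sketch of the basic chain ladder applied to the small cumulative triangle used in the worked examples later in this chapter (the data values are taken from those tables; the code itself is just one possible implementation):

# Basic chain ladder on the small cumulative triangle used in the examples below
tri <- matrix(c(810, 1046, 1266, 1459,
                844, 1164, 1332,   NA,
                927, 1221,   NA,   NA,
                974,   NA,   NA,   NA),
              nrow = 4, byrow = TRUE)

# Development factors: ratio of column sums over the rows with entries in both years
dev_factors <- sapply(1:3, function(j) {
  rows <- !is.na(tri[, j + 1])
  sum(tri[rows, j + 1]) / sum(tri[rows, j])
})
round(dev_factors, 3)   # 1.329 1.176 1.152

# Complete the triangle by rolling each row forward with the development factors
for (j in 1:3) {
  missing <- is.na(tri[, j + 1])
  tri[missing, j + 1] <- tri[missing, j] * dev_factors[j]
}
round(tri)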
If the inflation assumption above can’t be justified we can use an inflation-adjusted method. We treat past and future inflation separately.
Consider the following table of past inflation.
The inflation figures are those for claims inflation, rather than general price inflation. Claims inflation is often significantly higher than general price inflation. (Can you think of reasons why this might be?)
Willis Towers Watson publish an index of claims inflation.
To adjust for past inflation we need to work first of all with the incremental payments
triangle, not the cumulative claims version. The smaller incremental triangle is:
Origin 0 1 2 3
1 2014 810 236 220 193
2 2015 844 320 168
3 2016 927 294
4 2017 974
We now convert the past year amounts to 2017 values by multiplying by the appropriate
cumulative inflation factor. Note that each lower-left to upper-right diagonal holds data
from the same year.
Origin 0 1 2 3
1 2014 896 254 229 193
2 2015 907 333 168
3 2016 964 294
4 2017 974
Now we can apply the basic chain ladder method. First converting to a cumulative triangle…
Origin 0 1 2 3
1 2014 896 1150 1379 1572
2 2015 907 1240 1408
3 2016 964 1258
4 2017 974
𝑟0→1 = 1.318
𝑟1→2 = 1.166
𝑟2→3 = 1.140
Origin 0 1 2 3
1 2014 896 1150 1379 1572
2 2015 907 1240 1408 1605
3 2016 964 1258 1467 1672
4 2017 974 1284 1497 1707
We’re nearly there, but we’ve taken out the inflationary growth from the future payments.
To add it back in, we need to return to the incremental form…
Origin 0 1 2 3
1 2014 896 254 229 193
2 2015 907 333 168 197
3 2016 964 294 209 205
4 2017 974 310 213 210
Origin 0 1 2 3
1 2014 896 254 229 193
2 2015 907 333 168 205
3 2016 964 294 217 222
4 2017 974 322 231 236
Origin 0 1 2 3
1 2014 896 1150 1379 1572
2 2015 907 1240 1408 1612
3 2016 964 1258 1475 1697
4 2017 974 1296 1527 1763
6.4.1 Assumptions underlying the inflation-adjusted chain ladder method
1. Payments from each accident year will develop in the same way in real terms. That is,
the same development factors are used to project outstanding claims for each accident
year after they have been adjusted for inflation.
2. Explicit assumptions are made for past and future rates of claims inflation.
3. Claims are fully run-off for the first accident year included in the data.
The average cost per claim method projects numbers of claims and average size of claims
separately.
The method we’ll demonstrate uses grossing up factors. A grossing up factor tells us what
proportion of the ultimate amount emerges in each year. However, the same approach,
of modelling numbers of claims and severity separately, can be used with development
factors.
An outline of the method is:
• From cumulative claim amounts & number of claims calculate average cost per claim
• Project average cost per claim & number of claims, using grossing up factors or development factors
• Multiply projected average cost and number for each origin year to get projected total
claims
Origin 0 1 2 3
1 2014 810 1046 1266 1459
2 2015 844 1164 1332
3 2016 927 1221
4 2017 974
Origin 0 1 2 3
1 2014 336 506 577 646
2 2015 397 528 646
3 2016 426 589
4 2017 481
Dividing gives us the average cost per claim (ACC).
Origin 0 1 2 3
1 2014 2411 2067 2194 2259
2 2015 2126 2205 2062
3 2016 2176 2073
4 2017 2025
The following table shows the grossing up factors for the average cost per claim.
Origin 0 1 2 3 Ai3
1 2014 1.067 0.915 0.971 1 2259
2 2015 1.002 1.039 0.971 2122
3 2016 1.026 0.977 2122
4 2017 1.032 1963
• For the top row, divide each entry by the value for the final year.
• For each subsequent row, start by calculating the grossing-up factor on the diagonal
by taking the average of the factors above it. Then use that factor to estimate the
ACC in the final year.
• Then work backwards along the row calculating the factor by dividing the ACC for
each year by the final-year figure.
(The only real way to understand and learn run-off triangle methods is to work through
examples, first by hand and then in Excel.)
You do exactly the same for the numbers of claims.
Origin 0 1 2 3 Ni3
1 2014 0.520 0.783 0.893 1 646
2 2015 0.549 0.73 0.893 723
3 2016 0.547 0.757 778
4 2017 0.539 893
Finally we can use the projected number of claims and ACC to calculate the estimate of
the final claim amount for each year.
Origin Average cost per claim Number of Claims Projected Claims Estimate
1 2014 2259 646 1459
2 2015 2122 723 1535
3 2016 2122 778 1652
4 2017 1963 893 1753
Total paid = 4986
Total estimated = 6398
Total outstanding = 1412
The key assumptions underlying this method will depend on the exact form it takes. In the
form presented above, it’s assumed:
1. Numbers of claims and ACC from each accident year will develop in the same way as
a proportion of the ultimate value. That is, the same grossing-up factors are used for
each accident year.
2. No adjustments are required for inflation. (However, inflation can be accounted for in
a similar way to the previous approach.)
3. Claims are fully run-off for the first accident year included in the data.
In this section we will look at the Bornhuetter-Ferguson method, which uses the loss ratio to
create a first approximation of the ultimate total claims, then adjusts it to reflect experience
to date.
Loss ratios are used in the derivation of the premium basis and tend to show some stability
from year to year, in the absence of known changes to the risks insured, or the premium
basis.
So if we have the premium for each origin year and reasonable confidence in the loss ratio,
we can estimate the ultimate total claim payments.
In the basic chain ladder method we calculated development factors and used them to project
the claims incurred to date forwards, ending up with the ultimate total claim payments
arising from each accident year.
In the Bornhuetter-Ferguson method we calculate development factors in the same way but
we apply them to the ultimate total claim, calculated using the loss ratio, working backwards
to find the expected claims arising in each year. We use these estimated claims, for future
years, to calculate the reserves.
Let’s assume a loss ratio of 0.8 and start with the same cumulative claims table as in the
basic chain ladder example.
Origin 0 1 2 3
1 2014 810 1046 1266 1459
2 2015 844 1164 1332
3 2016 927 1221
4 2017 974
We calculate the development factors as usual, and also the cumulative factors.
1 2 3
Single year 1.329 1.176 1.152
Cumulative 1.801 1.355 1.152
The assumptions are the same as for the basic chain-ladder method, plus the assumption that the loss ratio used is appropriate.
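A minimal sketch of the Bornhuetter-Ferguson calculation for a single origin year is shown below. The loss ratio, cumulative development factor and paid-to-date figure are those quoted above for the 2017 origin year; the premium figure is hypothetical, since the premiums are not given here.

# Bornhuetter-Ferguson estimate for the 2017 origin year (premium is hypothetical)
premium      <- 2200    # assumed written premium for 2017, for illustration only
loss_ratio   <- 0.8     # loss ratio quoted above
f_cum        <- 1.801   # cumulative development factor from development year 0 to ultimate
paid_to_date <- 974     # cumulative claims paid to date for 2017

initial_ultimate <- premium * loss_ratio                 # prior estimate of the ultimate claims
future_claims    <- initial_ultimate * (1 - 1 / f_cum)   # expected claims still to emerge (the reserve)
bf_ultimate      <- paid_to_date + future_claims
c(future_claims, bf_ultimate)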
The statistical model behind the chain ladder approaches is based on the fact that they can
be expressed as linear regression models with different weights.
Consider the first triangle we saw as an example. Let’s just consider the first two development years and plot Dev1 vs Dev0.
paid_cum_df %>%
filter(!is.na(Dev1)) %>%
ggplot(aes(x = Dev0, y = Dev1)) +
geom_point() +
stat_smooth(method = 'lm', formula = 'y ~ 0 + x', se = FALSE) +
theme_bw()
(Figure: scatter plot of Dev1 against Dev0, with the fitted regression line through the origin.)
The line shown is a linear regression line forced to go through the origin, and we can see
that the development factor from Dev0 to Dev1 can be thought of as the slope of this line.
fit1 <-
paid_cum_df %>%
filter(!is.na(Dev1)) %>%
lm(formula = 'Dev1 ~ 0 + Dev0') # The zero forces the line through the origin
This isn’t quite the value we got before (1.387) because fit1 is an unweighted linear regression model and the basic chain ladder method is equivalent to a weighted model, where the
weights are inversely proportional to the value of claims recorded in Dev0.
fit2 <-
paid_cum_df %>%
filter(!is.na(Dev1)) %>%
lm(formula = 'Dev1 ~ 0 + Dev0',
weights = 1 / Dev0)
One way to examine the goodness of fit of the basic chain ladder method is to apply the
development factors to the claims recorded in the first development year.
This will give us fitted claim amounts for the years in which we do have actual data, as well
as predicted values for future years.
Comparing the fitted values with the actual data gives us some idea of the fit of our model - it
is equivalent to examining the residuals when using the standard linear regression model.
There is an example to work through in the homework.
6.7.2 Simulation
6.8 Homework
The homework for this section is contained in an Excel workbook in the Week 5 Blackboard
folder.
You should also continue your behavioural economics reading, reading Chapters 11 to 15 of
Thinking Fast and Slow.
Before you do the reading, consider the following definitions (from the CM2 Core Reading)
and notice where the concepts appear in the reading.
Anchoring and Adjustment is a term used to explain how people produce estimates.
People start with an initial idea of the answer (the anchor) and then adjust away from
this initial anchor to arrive at their final judgement. Thus, people may use experience or
‘expert’ opinion as the anchor, which they amend to allow for evident differences to the
current conditions. The effects of anchoring are pervasive and robust and are extremely
difficult to ignore, even when people are aware of the effect and aware that the anchor is
ridiculous. The anchor does not have to be related to the good. Nor does the anchor have
to be consciously chosen by the consumer. If adjustments are insufficient, final judgements
will reflect the (possibly arbitrary) anchors. Anchoring can have important implications for
investment decisions, not least if individual investors rely on seemingly irrelevant yet salient
data or statistics in order to guide their portfolio choices (e.g. expected returns from one-off
investments in other unrelated industries, etc.).
Representativeness. Decision-makers often use similarity as a proxy for probabilistic
thinking. Representativeness occurs because it is easier and quicker for our brain to compare
a situation to a similar one (System I) than assess it probabilistically on its own merits
(System II).
Representativeness is one of the most commonly used heuristics and can, at times, work
reasonably well. Nonetheless, similarity does not always adequately predict true probability,
leading to irrational outcomes. This is also related to the law of small numbers, where people
assess the probability of something occurring based on its occurrence in a small, statistically-
unrepresentative sample due to a desire to make sense of the uncertain situation (the name
is an ironic play on the law of large numbers in statistics). Representativeness can lead
individuals to base their decision on whether to invest in a particular stock, or not, on
the basis of its price over a few recent periods, rather than its long-term movement or the
underlying fundamentals of the company.
Availability. This heuristic is characterised by assessing the probability of an event oc-
curring by the ease with which instances of its occurrence can be brought to mind. Vivid
outcomes are more easily recalled than other (perhaps more sensible) options that may require System II thinking. This can lead to biased judgements when examples of one event
are inherently more difficult to imagine than examples of another. For example, individuals
living in areas that are prone to extreme weather events may only be compelled to take out
home insurance after they have been directly affected by such events rather than beforehand,
and may even cease their coverage after some time as the memory of the event subsides.
7 Ruin theory
7.1 Overview
In previous modules you’ve looked at the distribution of insurance losses over one period
of time. In reality, of course, insurance claims (and premiums) occur as processes through
time.
This week we will look at the behaviour of the amount of money held by an insurer over
time: the so-called solvency process.
We will be covering the following syllabus items:
• Explain what is meant by the aggregate claim process and the cashflow process for a
risk.
• Use the Poisson process and the distribution of inter-event times to calculate probabilities of the number of events in a given time interval and waiting times.
• Define a compound Poisson process and calculate probabilities using simulation.
• Define the probability of ruin in infinite/finite and continuous/discrete time and state,
and explain relationships between the different probabilities of ruin.
• Describe the effect on the probability of ruin, in both finite and infinite time, of
changing parameter values by reasoning or simulation.
• Calculate probabilities of ruin by simulation.
7.1.1 Definitions
7.1.2 Premiums
𝑈 (𝑡) = 𝑈 + 𝑐𝑡 − 𝑆(𝑡)
Hence {𝑈 (𝑡)}𝑡≥0 is a stochastic process called the surplus process or cash flow process.
(Figure: the surplus process, showing a simulated path of U(t) against time t.)
Define 𝜓(𝑈, 𝑡) to be the probability of ruin at some point before time 𝑡 given initial surplus 𝑈, and 𝜓(𝑈) the corresponding probability of ruin at any time (infinite time). Then, for 𝑈1 ≤ 𝑈2 and 𝑡1 ≤ 𝑡2:
$\psi(U_1, t) \ge \psi(U_2, t)$
$\psi(U_1) \ge \psi(U_2)$
$\psi(U) \ge \psi(U, t_2) \ge \psi(U, t_1)$
$\lim_{t\to\infty} \psi(U, t) = \psi(U)$
7.3 Probability of ruin in discrete time
(Figure: a simulated path of the surplus process U(t) against time t, with the surplus checked only at discrete time points.)
Let 𝑁(𝑡) denote the number of claims that have occurred by time 𝑡. We must have
1. 𝑁 (0) = 0
2. 𝑁 (𝑡) is integer valued
3. 𝑁 (𝑡2 ) ≥ 𝑁 (𝑡1 ) if 𝑡2 ≥ 𝑡1
4. 𝑁 (𝑡2 ) − 𝑁 (𝑡1 ) is the number of claims in (𝑡1 , 𝑡2 )
The claim number process {𝑁 (𝑡)}𝑡≥0 is defined to be a Poisson process with parameter 𝜆
if
1. the numbers of events occurring in separate (non-overlapping) time intervals are independent, and
2. the number of events which occur in a time period of length 𝑡 has a Poisson distribution with mean 𝜆𝑡.
In particular, the time 𝑇1 to the first claim satisfies $P(T_1 > t) = P(N(t) = 0) = e^{-\lambda t}$, so $T_1 \sim \mathrm{Exp}(\lambda)$. Also, the time between any two successive claims has the same distribution.
We now combine the Poisson process for the number of claims with a claim amount distribution to give a compound Poisson process for the aggregate claims process. We assume:
1. the random variables $\{X_i\}_{i=1}^{\infty}$ are independent and identically distributed
2. the random variables $\{X_i\}_{i=1}^{\infty}$ are independent of $N(t)$ for all $t \ge 0$
3. the stochastic process $\{N(t)\}_{t \ge 0}$ is a Poisson process with parameter $\lambda$
Then
$P[N(t) = k] = e^{-\lambda t}\,\dfrac{(\lambda t)^k}{k!}, \quad \text{for } k = 0, 1, 2, \ldots$
The aggregate claims process {𝑆(𝑡)}𝑡≥0 is said to be a compound Poisson claims process
and, for any 𝑡 ≥ 0 𝑆(𝑡) has a compound Poisson distribution with parameter 𝜆𝑡. We can
thus apply all the results you’ve previously seen for the compound Poisson distribution:
$E[S(t)] = \lambda t m_1$
$\mathrm{Var}[S(t)] = \lambda t m_2$
$M_S(r) = e^{\lambda t (M_X(r) - 1)}$
where $m_k = E[X_i^k]$ and $M_X$ denotes the moment generating function of the individual claim amounts.
So, writing 𝜃 for the premium loading factor, we also have
$c = (1 + \theta)\lambda m_1$
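One of the syllabus items above is calculating probabilities of ruin by simulation. A minimal sketch for a compound Poisson surplus process with exponential claims follows; all parameter values are illustrative assumptions:

# Estimating a finite-time ruin probability by simulation (all values are illustrative)
set.seed(1)
lambda <- 10                  # Poisson claim rate per year
m1     <- 1                   # mean claim size (exponential claims assumed)
theta  <- 0.2                 # premium loading factor
U0     <- 5                   # initial surplus
c_rate <- (1 + theta) * lambda * m1   # premium income per unit time
horizon <- 5                  # time horizon in years
n_sims  <- 10000

ruined <- logical(n_sims)
for (sim in seq_len(n_sims)) {
  n_claims <- rpois(1, lambda * horizon)
  times  <- sort(runif(n_claims, 0, horizon))   # claim times within the horizon
  claims <- rexp(n_claims, rate = 1 / m1)       # claim amounts
  # Surplus immediately after each claim; ruin can only occur at a claim time
  surplus_after <- U0 + c_rate * times - cumsum(claims)
  ruined[sim] <- any(surplus_after < 0)
}
mean(ruined)   # estimated probability of ruin before the horizon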
7.5 Bounds
In this section we look at how we can place bounds on the ultimate probability of ruin.
Lundberg’s inequality states that $\psi(U) \le e^{-RU}$, where $R$ is the adjustment coefficient.
Since 𝑅 encapsulates the factors that determine the riskiness of a portfolio (apart from 𝑈 ),
by working out how we would expect these factors to affect risk we can work out how they
affect 𝑅.
𝑅 will be defined in terms of the Poisson parameter and the MGF of the individual claim
distribution, plus the rate of premium income per unit time.
It is defined to be the unique positive root of
𝜆𝑀𝑋 (𝑟) − 𝜆 − 𝑐𝑟 = 0
If we write 𝑐 as (1 + 𝜃)𝜆𝑚1 , this equation for 𝑅 becomes
𝑀𝑋 (𝑟) − 1 − (1 + 𝜃)𝑚1 𝑟 = 0
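The root can be found numerically. As a sketch, for exponential claims with mean m1 (so that $M_X(r) = 1/(1 - m_1 r)$) and an assumed loading factor, uniroot can be used and the result compared with the known closed form $R = \theta/((1+\theta)m_1)$ for exponential claims:

# Numerical solution for R with exponential claims (parameter values are assumptions)
m1    <- 1      # mean claim size
theta <- 0.2    # premium loading factor
mgf   <- function(r) 1 / (1 - m1 * r)   # MGF of an exponential claim with mean m1 (valid for r < 1/m1)
adj_eqn <- function(r) mgf(r) - 1 - (1 + theta) * m1 * r
# Search strictly inside (0, 1/m1) to avoid the trivial root at r = 0
R <- uniroot(adj_eqn, interval = c(1e-6, 1 / m1 - 1e-6))$root
c(R, theta / ((1 + theta) * m1))   # numerical root vs the closed form for exponential claims
exp(-R * 10)                       # Lundberg upper bound on psi(U) with initial surplus U = 10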
7.5.4 General aggregate claims processes - adjustment
For the general (i.e. not just Poisson) aggregate process the adjustment coefficient is the
positive root of
$E[e^{R(S_i - c)}] = 1$
7.6 Homework
The homework questions are in the Week 6 homework folder. They are based on past IFoA
exam questions.
8 Rational expectation and utility theory
In economics, ‘utility’ is the satisfaction that an individual obtains from a particular course
of action.
In the application of utility theory to finance and investment choice, it is assumed that a
numerical value called the utility can be assigned to each possible value of the investor’s
wealth by what is known as a utility function.
The Expected Utility Theorem (EUT) states that a function, U(w), can be constructed that represents an investor’s utility of wealth, w, at some future date. Decisions are made so as to maximise the expected value of utility, given a set of objectively agreed probabilities of the different outcomes.
(Compare with Subjective Utility Theory.)
8.2.1 Expected Utility Axioms
1. Comparability
An investor can state a preference between all available certain outcomes.
2. Transitivity
If 𝐴 is preferred to 𝐵 and 𝐵 is preferred to 𝐶, then 𝐴 is preferred to 𝐶.
3. Independence
If an investor is indifferent between two certain outcomes, A and B, then they are also indifferent between the following two gambles: (i) A with probability p and C with probability (1 − p); and (ii) B with probability p and C with probability (1 − p).
4. Certainty equivalence
Suppose that 𝐴 is preferred to 𝐵 and 𝐵 is preferred to 𝐶. Then there is a unique probability,
𝑝, such that an investor is indifferent between 𝐵 and a gamble giving 𝐴 with probability 𝑝
and 𝐶 with probability (1 − 𝑝).
𝐵 is known as the ‘certainty equivalent’ of the above gamble.
[There is more than one possible way to set out the axioms of EUT. (For example the way
in Dhami.) Any exam questions will be based on the axioms in these slides.]
8.2.2 Non-satiation
We assume that people prefer more wealth to less. This is known as the principle of non-
satiation and can be expressed as:
$$U'(w) > 0$$
where U(w) is a utility function and w is wealth.
A risk averse investor values an incremental increase in wealth less highly than an incremen-
tal decrease and will reject a fair gamble. The utility function is concave:
$$U''(w) < 0$$
A risk seeking investor values an incremental increase in wealth more highly than an incre-
mental decrease and will seek a fair gamble. The utility function is convex:
$$U''(w) > 0$$
A risk neutral investor is indifferent between a fair gamble and the status quo. In this case
the utility function is linear:
$$U''(w) = 0$$
Risk preference can also be expressed in terms of the certainty equivalent.
Consider a gamble 𝐴 with certainty equivalent 𝐶𝐿 .
For a risk neutral investor, $C_L = E[A]$, where $E[A]$ is the expected value of $A$; for a risk averse investor, $C_L < E[A]$; and for a risk seeking investor, $C_L > E[A]$.
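As an illustration (log utility and the two-outcome gamble below are my own choices, not from the notes), the certainty equivalent can be computed as $C_L = U^{-1}(E[U(A)])$ and compared with $E[A]$:

```r
# Sketch: certainty equivalent C_L = U^{-1}(E[U(A)]) for a risk averse
# investor with log utility, facing a gamble A paying 50 or 150 with equal
# probability (illustrative numbers).
U <- function(w) log(w)
U_inv <- function(u) exp(u)
outcomes <- c(50, 150)
probs <- c(0.5, 0.5)
EA <- sum(probs * outcomes)            # expected value of the gamble: 100
CL <- U_inv(sum(probs * U(outcomes)))  # certainty equivalent: about 86.6
c(CL, EA)                              # CL < EA, as expected for concave U
```

Replacing log(w) with a convex function would give $C_L > E[A]$.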
If the absolute value of the certainty equivalent decreases with increasing wealth, the in-
vestor is said to exhibit declining absolute risk aversion. If the absolute value of the
certainty equivalent increases, the investor exhibits increasing absolute risk aversion. If
the absolute value of the certainty equivalent decreases (increases) as a proportion of total
wealth as wealth increases, the investor is said to exhibit declining (increasing) relative
risk aversion.
Two functions measure how risk preference changes as a function of wealth: the absolute risk aversion function
$$A(w) = \frac{-U''(w)}{U'(w)}$$
and the relative risk aversion function
$$R(w) = \frac{-w\,U''(w)}{U'(w)} = w\,A(w).$$
Consider the quadratic utility function $U(w) = w + dw^2$. For $U''(w) = 2d$ to be negative (risk aversion), $d$ must be negative, and for $U'(w) = 1 + 2dw$ to be positive (non-satiation) we must have $-\infty < w < -1/(2d)$.
Exercise

Show that
$$A(w) = \frac{-2d}{1 + 2dw}, \qquad R(w) = \frac{-2dw}{1 + 2dw},$$
and state whether the quadratic utility function shows increasing or decreasing absolute and relative risk aversion.
Utility functions exhibiting constant relative risk aversion are said to be “iso-
elastic”. The use of iso-elastic utility functions simplifies the determination of
an optimal strategy for a multi-period investment decision, because [it] allows
for a series of so-called “myopic” decisions. What this means is that the decision
at the start of each period only considers the possible outcomes at the end of
that period and ignores subsequent periods.
Read the answer to a question about iso-elasticity on Economics Stack Exchange. Can
you see how constant relative risk aversion leads to the ability to make myopic investment
decisions?
Now read this. What do you think?
8.3.3 The power utility function
The plots here are to give you an idea of the general shapes. The absolute values of U(w) are irrelevant, so the functions have been rescaled to fit onto the same graph.
[Figure: Plots of some utility functions (log, power and quadratic), U(w) plotted against w.]
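The general shapes can be reproduced with a few lines of R; the functional forms, parameter values and rescaling below are my own illustrative choices, not those used for the original figure:

```r
# Sketch: plot log, power and quadratic utility functions on one set of axes.
# gamma = 0.5 and d = -0.018 are illustrative parameter choices.
w <- seq(0.5, 25, by = 0.1)
log_u <- log(w)
power_u <- w^0.5 / 0.5                # power utility w^gamma / gamma
quad_u <- w - 0.018 * w^2             # quadratic utility w + d*w^2, d < 0
rescale <- function(u) 10 * (u - min(u)) / (max(u) - min(u))  # same vertical scale
matplot(w, cbind(rescale(log_u), rescale(power_u), rescale(quad_u)),
        type = "l", lty = 1, xlab = "w", ylab = "U(w)")
legend("bottomright", legend = c("Log", "Power", "Quadratic"), col = 1:3, lty = 1)
```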
The main problem with EUT is that its axioms (the axioms of rationality) seem reasonable,
but they can only be justified if they describe human (or organisational) behaviour in reality.
And it has been shown many times that they don’t! (See Dhami for plenty of examples.)
Also:
• we don’t know the precise form of any person’s utility function (although there have
been plenty of attempts to devise ways of measuring them);
• organisational decision making is a balance between a coalition of different interests;
• utility functions are likely to be highly state dependent, depending not just on wealth,
but on other factors (think of some factors that might affect your risk preferences).
Absolute dominance is said to exist when one investment portfolio (or results of another
type of decision) provides a higher return than another in all possible circumstances. Clearly,
this situation will rarely occur, so we usually need to consider the relative likelihood of out-
performance, i.e. stochastic dominance.
Consider two investment portfolios, 𝐴 and 𝐵, with cumulative probability distribution func-
tions of returns 𝐹𝐴 and 𝐹𝐵 respectively.
The first order stochastic dominance theorem states that, assuming an investor prefers more
to less, 𝐴 will dominate 𝐵 (i.e. the investor will prefer portfolio 𝐴 to portfolio 𝐵) if:
𝐹𝐴 (𝑥) ≤ 𝐹𝐵 (𝑥) for all 𝑥, and
𝐹𝐴 (𝑥) < 𝐹𝐵 (𝑥) for some 𝑥.
This means that the probability of portfolio 𝐵 producing a return below a certain value is
never less than the probability of portfolio 𝐴 producing a return below the same value, and
exceeds it for at least some value of 𝑥.
The second order stochastic dominance theorem applies when the investor is risk averse, as
well as preferring more to less.
In this case, the condition for 𝐴 to dominate 𝐵 is that:
$$\int_a^x F_A(y)\,dy \le \int_a^x F_B(y)\,dy \quad \text{for all } x,$$
with the strict inequality holding for some value of x, and where a is the lowest return that the portfolios can possibly provide.
The interpretation of the inequality above is that a risk averse investor will accept a lower
probability of a given extra return, at a low absolute level of return, in preference to the
same probability of extra return at a higher absolute level. In other words, a potential gain
of a certain amount is not valued as highly as a loss of the same amount.
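The two conditions can be checked numerically on a grid of returns. In the sketch below the two normal return distributions are made-up examples, and the lower end of the grid stands in for a, the lowest possible return:

```r
# Sketch: check first- and second-order stochastic dominance conditions on a
# grid, for two illustrative return distributions A ~ N(6%, 10%) and
# B ~ N(5%, 10%); A should dominate B in both senses.
x <- seq(-0.5, 0.7, by = 0.001)           # grid of returns; -0.5 stands in for a
FA <- pnorm(x, mean = 0.06, sd = 0.10)
FB <- pnorm(x, mean = 0.05, sd = 0.10)
fsd <- all(FA <= FB) && any(FA < FB)      # first-order dominance of A over B
IA <- cumsum(FA) * 0.001                  # running integral of F_A
IB <- cumsum(FB) * 0.001                  # running integral of F_B
ssd <- all(IA <= IB) && any(IA < IB)      # second-order dominance of A over B
c(fsd, ssd)                               # both TRUE for this example
```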
8.7 Homework
8.8 Reading
9 Behavioural economics
• Describe the main features of Kahneman and Tversky’s prospect theory critique of
expected utility theory.
• Explain what is meant by “framing”, “heuristics” and “bias” in the context of financial
markets and describe the following features of behaviour in such markets:
– the herd instinct
– anchoring and adjustment
– self-serving bias
– loss aversion
– confirmation bias
– availability bias
– familiarity bias
Before going any further, read (or reread) Chapters 25 and 26 of Thinking, Fast and Slow.
The fundamental problem is that EUT can’t explain many observations of real behaviour, for
example:
• Individuals have different attitudes to risk (i.e. utility functions) depending on wealth
[Friedman & Savage].
• Utility should be measured relative to a reference point [Markowitz].
Faced with (i.e. “having the prospect of”) a risky choice leading to gains, people are risk averse, preferring solutions that lead to a lower expected utility but with a higher certainty (concave value function). Faced with a risky choice leading to losses, people are risk seeking, preferring solutions that lead to a lower expected utility as long as they have the potential to avoid losses (convex value function).
People tend to overweight very low probabilities and underweight very high probabilities. Prelec’s function is often used for weighting probabilities; for the formula, see Dhami, page 121 (Dhami 2019).
Editing
• Acceptance
• Segregation
Acceptance implies that people rarely change the way that the decision is presented to them; that is, they accept the framing that is presented rather than reframing the decision for themselves.
Framing is a very powerful technique, used continually in marketing.
Segregation is the process of separating the parts of the picture that are relevant to making the decision from those that are (or should be) irrelevant. The behavioural biases described below can play a large part in this process.
Other stages are:
Coding
This refers to defining the reference point and the various outcomes and probabilities in a
quantifiable way.
The location of the reference point—and the consequent coding of outcomes as gains or
losses—can be affected by the formulation of the offered prospects and by expectations of
the decision maker.
Combining
Prospects can sometimes be simplified by combining the probabilities associated with iden-
tical outcomes.
For example, the prospect (200, 0.25; 200, 0.25) will be reduced to (200, 0.50) and evaluated
in this form.
Cancellation
Some prospects contain a riskless component that is segregated from the risky component
during editing.
For example, the prospect (100, 0.70; 150, 0.30) can be decomposed into a sure gain of 100
and the risky prospect (50, 0.30).
Simplification
Prospects can be simplified by rounding either outcomes or probabilities.
For example, the prospect (99, 0.51) can be coded as an even chance of winning 100.
Outcomes that are extremely improbable are likely to be ignored, meaning the probabilities
are rounded down to 0.
[The examples above are from: http://mark-hurlstone.github.io/Week%205.%20Decision%20Making%20Unde ]
Evaluation
Reference dependence
People derive utility from gains and losses measured relative to some reference point –
giving an S-shaped utility curve with a point of inflexion at that point.
Loss aversion
People are more sensitive to losses than to gains, so the value curve is steeper below the point of inflexion. They are risk averse in the domain of gains and risk seeking in the domain of losses; together with diminishing sensitivity, this gives rise to the S-shaped value function.
People’s preferences depend on what they already have. They value something more highly
just because they already own it, and will often refuse to sell something for well above its
market value, even if they could purchase something virtually identical in the market.
Mental accounting
People hold mental accounts of the sources or planned use of money and may make different
decisions depending on the mental account involved.
Probabilities are weighted
E.g. see Prelec curve.
Certainty effect
A change from certainty is weighted highly, so the move from a 100% probability to a 99% probability is given much more significance than the move from 50% to 49%.
Isolation effect
Common elements in the things being compared are ignored in the decision, even though
they might have different implications.
Evaluation formula
$$V = \sum_{i=1}^{n} \pi(p_i)\, v(x_i)$$
𝑉 is the overall utility of the outcomes to the individual making the decision; 𝑥𝑖 are the
outcomes; 𝑝𝑖 are the probabilities of the outcomes; 𝜋() is a probability weighting function;
𝑣() is the s-shaped value function passing through the origin.
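As a sketch of how the formula is applied: the single-parameter Prelec weighting function w(p) = exp(−(−ln p)^α) and the piecewise power value function below are standard textbook forms, and the parameter values and the prospect itself are my own illustrative choices:

```r
# Sketch: evaluate a prospect V = sum(pi(p_i) * v(x_i)) using a Prelec
# probability weighting function and an S-shaped value function.
# alpha, a and the loss-aversion parameter lam are illustrative values.
prelec <- function(p, alpha = 0.65) exp(-(-log(p))^alpha)
v <- function(x, a = 0.88, lam = 2.25) {
  ifelse(x >= 0, abs(x)^a, -lam * abs(x)^a)   # gains valued less than equal losses
}
# Prospect: gain 100 with probability 0.6, lose 50 with probability 0.4
outcomes <- c(100, -50)
probs <- c(0.6, 0.4)
V <- sum(prelec(probs) * v(outcomes))
V
```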
9.4.1 Anchoring and adjustment
Once we have established (or been persuaded to establish) an anchor we find it very difficult
to make sufficient adjustments away from it.
This is why a common tactic in negotiating is to try and be the first to suggest a price
rather than allowing the other side to create an “anchor” favourable to themselves.
A classic example of anchoring is to have a few very expensive bottles of wine on a restaurant
list to make the others look good value.
9.4.2 Familiarity
9.4.3 Overconfidence
Looking backwards, people believed they could predict the future much better than they
could at the time. They think they had information in the past that actually only emerged
later.
This might be the reason why so many people believe active investment works - they see
outperformance of some investments and believe it was predictable before it occurred.
It’s probably also one of the reasons for the persistent belief that universities and schools
aren’t producing graduates well equipped for employment - people older than 40 have for-
gotten how much they didn’t know when they were 22!
9.4.5 Confirmation bias
People look for, and remember, evidence that confirms their existing beliefs. There are
many examples of people being able to narrow the sources of news so they only get what
they want to hear.
As another example, if you feel ethics and sustainability are important factors in making investment choices you’ll easily find reputable investment professionals who will tell you that ethical investments are a good financial choice.
If you believe that such considerations should be irrelevant to investment, plenty of people
will also confirm that view.
People believe luck is evidence of their own skill (and the opposite when observing other people). For example, when someone makes an investment that pays off well, they assign it to their skill; when it doesn’t do well they think it is bad luck (or perhaps others conspiring against them).
Status quo bias is a preference given to the present state of affairs, or a natural bias towards
the current or previous decision.
You will have noticed that people don’t like to change their minds! People are also slow to
change their investment portfolio in reaction to events or a change in their circumstances.
The herd instinct provides many, many more examples (look up the Dutch Tulip Mania if you haven’t heard of it before).
Extraordinary Popular Delusions and the Madness of Crowds is currently available for 49p
on Kindle.
9.5 Homework
Find a relevant, and powerful, example for each of the behavioural heuristics/biases listed
above.
Make notes on the stages of editing listed under “other stages” above.
Continue reading Thinking, Fast and Slow.
Figure 9.2: Flock of sheep crossing a bridge
9.6 Further reading
10 Insurance and risk markets
In fact, there is no new theory to learn - we already have the tools we need to see why people
are prepared to purchase insurance even though it might lead to an expected decrease in
wealth.
Consider the following example from the CM2 Core Reading.
The maximum premium 𝑃 which an individual will be prepared to pay in order to insure
themselves against a random loss 𝑋 is given by the solution of the equation:
𝐸[𝑈 (𝑎 − 𝑋)] = 𝑈 (𝑎 − 𝑃 ),
where 𝑎 is the initial level of wealth.
For example, consider an individual with a utility function $u(x) = \sqrt{x}$ and current wealth of 15000.
Assume this individual is at risk of suffering damages that are uniformly distributed up to
15000, so their expected loss is 7500.
If they accept the risk their wealth will be uniformly distributed between 0 and 15000, so
their expected utility is:
$$\int_0^{15000} \frac{\sqrt{x}}{15000}\,dx = \frac{\sqrt{15000}}{1.5}$$
Setting this equal to the utility if they pay a premium P, i.e. solving $\sqrt{15000 - P} = \sqrt{15000}/1.5$, we find they would be willing to pay up to about 8333 for insurance that covers any loss. This is well above the 7500 expected loss.
(Admittedly it’s not a very realistic example.)
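The same answer can be obtained numerically, which is useful when the integral has no closed form; a sketch for this example:

```r
# Sketch: numerically verify the maximum premium for u(x) = sqrt(x),
# wealth a = 15000 and a loss X ~ U(0, 15000).
u <- function(x) sqrt(x)
a <- 15000
# Expected utility if the risk is retained: E[u(a - X)]
eu_uninsured <- integrate(function(x) u(a - x) / 15000, lower = 0, upper = 15000)$value
# The maximum premium P solves u(a - P) = E[u(a - X)]
P_max <- uniroot(function(P) u(a - P) - eu_uninsured, interval = c(0, a))$root
c(eu_uninsured, sqrt(15000) / 1.5, P_max)   # P_max is about 8333
```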
10.3 Finding the minimum premium
The insurance premium 𝑄 which an insurer should be prepared to charge for insurance
against a risk with potential loss 𝑌 is given by the solution of the equation:
𝐸[𝑈 (𝑎 + 𝑄 − 𝑌 )] = 𝑈 (𝑎),
where 𝑎 is the initial wealth.
If the insurer uses a premium loading 𝜃 we can write this as:
𝐸[𝑈 (𝑎 + (1 + 𝜃)𝐸[𝑌 ] − 𝑌 )] = 𝑈 (𝑎).
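As a sketch of the insurer's side of the calculation, the exponential utility U(x) = −e^(−bx), the Gamma loss distribution and all parameter values below are my own assumptions, not from the notes:

```r
# Sketch: minimum premium Q solving E[U(a + Q - Y)] = U(a) for an insurer
# with exponential utility U(x) = -exp(-b*x) and an illustrative loss
# Y ~ Gamma(shape = 2, rate = 0.01), so E[Y] = 200.
set.seed(1)
b <- 2e-5                                  # insurer's risk aversion parameter
a <- 1e6                                   # insurer's initial wealth
Y <- rgamma(1e5, shape = 2, rate = 0.01)   # simulated losses
U <- function(x) -exp(-b * x)
f <- function(Q) mean(U(a + Q - Y)) - U(a)
Q_min <- uniroot(f, interval = c(0, 2000))$root
c(Q_min, log((0.01 / (0.01 - b))^2) / b)   # simulation vs. exact (1/b)*log(M_Y(b))
```

With exponential utility the minimum premium Q = (1/b) log M_Y(b) does not depend on the insurer's initial wealth a, which is one reason this utility function is a popular modelling assumption.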
As we have seen above, a person who is risk averse will be prepared to pay more for insurance
than the long-run average value of claims which will be made. Thus, insurance can be
worthwhile for the risk averse policyholder even if the insurer has to charge a premium in
excess of the expected value of claims in order to cover expenses and to provide a profit
margin. An insurance contract is feasible if the minimum premium that the insurer is
prepared to charge (given their own risk aversion) is less than the maximum amount that a
potential policyholder is prepared to pay.
Insurance reduces the variability of losses due to adverse outcomes by pooling risks.
In pooling risks, an insurer attempts to group insured risks within homogeneous groups so
that the premiums charged to a particular group accurately reflect the risk.
Adverse selection describes the fact that people who know that they are particularly bad
risks are more inclined to take out insurance than those who know that they are good risks
if the premiums charged are based on the average risk for the whole group.
The key requirement for adverse selection is information asymmetry where one party (the
buyer of insurance) has information that the other party doesn’t.
To try to reduce the problems of adverse selection, insurance companies may try to find out lots of information about potential policyholders. Policyholders can then be put in small, reasonably homogeneous pools and charged appropriate premiums. However, there are a number of problems with this approach:
• customers don’t like providing detailed information so might choose a competitor with
a less onerous proposal form
• anti-discrimination legislation may prevent the use of certain types of information
Moral hazard describes the fact that a policyholder may, because they have insurance, act
in a way which makes the insured event more likely. Moral hazard makes insurance more
expensive. It may even push the price of insurance above the maximum premium that a
person is prepared to pay.
More generally, moral hazard occurs when someone is induced to behave in a risky way because someone else bears the cost of the risk taken on. A well known example occurs if banks believe they are “too big to fail”: they may indulge in risky behaviour, confident that the government will bail them out if things go wrong.
Moral hazard is not the same as insurance fraud. Making a false claim for a loss that hasn’t
occurred, or which shouldn’t be covered by the insurance contract, is fraud, not moral
hazard.
10.4.4 Exercise
For two types of non-life insurance and two types of life insurance think of some examples
of moral hazard, adverse selection, and fraud.
How might an insurance company seek to deal with the increased risk brought by each of
your examples?
10.5 Homework
Have a go at the homework question in the Insurance folder on Blackboard. This is a past
CT7 (Economics) question, with some extensions.
Dhami, Sanjit. 2019. The Foundations of Behavioral Economic Analysis: Volume I: Behavioral Economics of Risk, Uncertainty, and Ambiguity. Vol. 1. Oxford: Oxford University Press USA - OSO.
Gesmann, Markus, Daniel Murphy, Yanwei (Wayne) Zhang, Alessandro Carrato, Mario Wuthrich, Fabio Concina, and Eric Dal Moro. 2023. ChainLadder: Statistical Methods and Models for Claims Reserving in General Insurance. https://mages.github.io/ChainLadder/.
Jakhria, P., R. Frankland, S. Sharp, A. Smith, A. Rowe, and T. Wilkins. 2019. “Evolution of Economic Scenario Generators: A Report by the Extreme Events Working Party Members.” British Actuarial Journal 24. https://doi.org/10.1017/S1357321718000181.
Mersmann, Olaf. 2021. Microbenchmark: Accurate Timing Functions. https://github.com/joshuaulrich/microbenchmark/.
Pedersen, Hal, Mary Pat Campbell, Stephan L. Christiansen, Samuel H. Cox, Daniel Finn, Ken Griffin, Nigel Hooker, Matthew Lightwood, Stephen M. Sonlin, and Chris Suchar. 2016. Evolution of Economic Scenario Generators: A Report by the Extreme Events Working Party Members. Society of Actuaries. https://www.soa.org/globalassets/assets/Files/Research/Projects/research-2016-economic-scenario-generators.pdf.
Ross, Sheldon M. 2013. Simulation. Fifth edition. Amsterdam: Academic Press.
Ucar, Iñaki, Bart Smeets, and Arturo Azcorra. 2019. “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software 90 (2): 1–30. https://doi.org/10.18637/jss.v090.i02.