C Textbook Sample Spring 2010
Construction of
Actuarial Models
Michael Hosking, MA
Associate of the Institute of Actuaries
All rights reserved. No portion of this book may be reproduced in any form
or by any means without the prior written permission of the copyright owner.
ISBN: 978-0-9816081-0-5
Preface to the Fourth Edition
Welcome to the fourth edition of this introductory guide to the Construction of Actuarial Models.
Based on our experience as professional educators, our aim when writing this text has been to
produce a clear, practical and student-friendly guide in which theoretical derivations have been
balanced with a helpful, structured approach to the material. We have supplemented the
explanations with over 430 worked examples and practice questions to give students ample
opportunity to see how the theory is applied. The result—we hope—is a thorough but accessible
introduction to the construction and evaluation of actuarial models.
This text is of particular relevance to actuarial students who are preparing for Exam C of the
Society of Actuaries, and Exam 4 of the Casualty Actuarial Society. Where possible, examples are
set in an insurance or risk management context. For more information about an actuarial career,
visit www.beanactuary.org or www.soa.org. Aspiring actuaries in the UK should visit
www.actuaries.org.uk.
The numerical solutions to all of the end-of-chapter practice questions can be found at the end of
the book. Detailed worked solutions to these practice questions can be downloaded free of
charge from the BPP Professional Education website at www.bpptraining.com. Other useful
study resources can also be found there.
For students preparing for the SOA’s Exam C or the CAS’s Exam 4, it is critically important to
work as many exam-style questions as possible. For such preparation, this text should be used in
conjunction with our supplemental Q&A Bank which contains hundreds of multiple choice
questions (including relevant questions from past exams). Practice questions included in this
book at the end of each chapter are designed to emphasize first principles and basic calculation
whereas exam-style questions, such as those contained in the Q&A Bank, can be quite abstruse.
This edition could not have been completed without the helpful contributions of Julie Wilson and
David Wilmot. Any errors in this text are solely our own.
We hope that you find this text helpful in your studies, wherever these may lead you.
Michael Hosking
November 2009
Table of contents
Introduction
Chapter 5 Estimation
5.1 Introduction to estimation
5.2 Common estimators of mean and variance
5.3 Other descriptive measures of a distribution
5.4 Properties of estimators
5.5 Maximum likelihood estimation
5.6 Variations on maximum likelihood estimation
5.7 Method of moments estimation
5.8 Method of percentiles estimation
5.9 Bayesian estimation
5.10 Confidence intervals
Chapter 5 practice questions
Chapter 10 Bühlmann Credibility
10.1 Introduction
10.2 Computing the Bühlmann credibility prediction
10.3 Examples of Bühlmann credibility predictions
10.4 A comparison of Bayesian and Bühlmann ESE
10.5 The Bühlmann–Straub model
10.6 The Bühlmann model with multiple risk parameters
Chapter 10 practice questions
Bibliography
Index
Introduction
Before we start the main subject matter in this text, we should take care of a little housekeeping.
Assumed knowledge
We assume that the reader has knowledge of calculus, probability theory, statistics, and the
theory of survival models. The necessary background information from probability theory and
actuarial models can be found in BPP’s textbooks: Probability by Carr & Gauger and Actuarial
Models by Gauger. You can visit our website and download a free sample chapter from each of
these texts.
We have tried hard to ensure that all new notation is explained clearly. We sometimes use
$\exp(x)$ in place of $e^x$, especially when this avoids complicated superscripts that might otherwise
be difficult to read.
Rounding poses a particular dilemma. Our standard policy in this text has been to keep full
accuracy within intermediate calculations even though an intermediate result may be shown as a
rounded value. So, you may occasionally disagree with the last significant figure or two in a
calculation if you calculate the result using the rounded values shown.
Short numerical answers to all of the end-of-chapter practice questions can be found at the end of
this book. Detailed worked solutions to these practice questions can be downloaded free of
charge from the BPP Professional Education website at www.bpptraining.com (look for “Text
Question Solutions” on the left side menu). Other useful study resources can also be found there.
If you find an error in this text, we’ll be pleased to hear from you so that we can publish an errata
list for students on our website and correct these errors in the next edition. Please email details of
any errors to [email protected]. A current errata list is maintained in the Student
Mailbag on the Exam C page of the BPP website www.bpptraining.com. Thank you.
Loss models
Overview
Over the first four chapters we will study models of aggregate loss distributions. These
distributions are used to model things such as:
• the annual amount of claims paid by an auto insurer for a single policy
• the annual number of claims paid by an insurer on a group medical plan
• the annual payments of a reinsurance company to an insurance company where only a
part of the insurer’s claim payments are reimbursed by the reinsurer.
By the end of this chapter, we’ll be able to:
• calculate the expected value and variance of such aggregate losses
• apply the concepts of mixing, splicing, truncating and censoring distributions to form
new ones.
Loss models Chapter 1
Interest theory plays a significant role in models of long-term lines of business that include life
insurance and annuities. This is because the premiums received by the insurer must be invested
for a number of years before all of the annuity payments or death benefits are paid out. Actuarial
present values are computed at issue to determine the annual benefit premium. Actuarial
present values are also computed for each in-force policy to measure future liability with respect
to the policy. Reserves must be set up so that the insurer has a clear picture of future liabilities
for each line of business. The rate of interest used has a profound effect on pricing and reserving
via the computation of both random present values and actuarial present values. Lower rates of
interest will generally result in higher premiums and reserves.
In Chapters 1–4 of this text, we will develop models for short-term lines of business such as auto
insurance, homeowner’s insurance, or health insurance. These policies (contracts) typically cover
a one-year time period. When the policy comes up for renewal after the year is over, the insurer
determines a new premium rate that is based on recent claim experience and perhaps some
regulatory requirements. Since premium income is often held by the insurer for a relatively short
period of time before claims occur, the interest it could earn can usually be ignored.
Levels of loss
The term loss is used at several distinct levels:
• Insured
It could be used to refer to the actual out of pocket expense as a result of some insured
random event such as an automobile accident. It could also be the cost of medical
treatment and compensation for “pain” resulting from an attack by some insured’s pit
bull.
In general it refers to a monetary measure of what is “lost” by the insured as a result of
an insured event. This type of loss is often referred to as the ground up loss.
• Insurer
From an insurer’s point of view the money it pays out in claims is viewed as a loss
against the premium income received from policyholders.
A ground up loss of a policyholder might be completely covered by the insurance.
However, it is more common that the ground up loss is subject to a deductible amount
or a limit (to be covered in Chapter 2). In this case, part of the ground up loss (up to a
maximum of the deductible) is retained by the policyholder, and any remainder of the
ground up loss is paid by the insurer as a claim. In other words, there is a sharing of
each ground up loss between the policyholder and the insurer.
The loss amount is split into the sum of the amount retained by the insured and the
amount reimbursed to the insured by the insurer.
• Reinsurer
Now move up to the next level and consider the total annual claims paid by the insurer
for this line of business (the insurer’s share of all ground up losses). To limit the risk that
aggregate claims might far exceed premium income from the line of business, the insurer
might seek a reinsurance treaty (policy) whereby its claims are shared with a reinsurer in
return for part of the insurer’s premium income. Effectively, the insurer takes out its own
insurance with the reinsurer.
Part of the claims are retained by the insurer and part of the claims are ceded to the
reinsurer (ie become the legal responsibility of the reinsurer).
So the insurer might enter into a treaty where the insurer pays a reinsurance premium to
the reinsurer, who is then obligated to pay part of the insurer’s claims. In this way, these
payments represent losses from the reinsurer’s point of view.
In order to create models for such aggregate “losses” of insurers and reinsurers, and the loss
sharing arrangements, we will need to use some relatively sophisticated probability results.
Some of these results are not typically part of a first course in probability theory. Others are
covered, but not in sufficient depth.
To remedy this situation we will introduce the necessary probability theory in this chapter at the
same time as we discuss the individual risk model (IRM) and the collective risk model (CRM).
The theory in this chapter plays a pivotal role relative to Chapters 2-4. Understanding the loss
model terminology and the probability results introduced here is absolutely essential for
understanding Chapters 2-4. Take the time to read this current chapter several times. The
reward for a thorough understanding of this material will be that Chapters 2-4 will then be a lot
easier to work through.
The moment generating function and probability generating function associated with a random
variable X are real valued functions of a real variable. They are useful for several purposes:
• calculating moments of the distribution
• identifying the distribution of a sum of independent random variables from the same
parametric family.
Let’s begin with their definitions.
\[
M_X(t) = E\left[e^{tX}\right] =
\begin{cases}
\sum_{x} e^{tx}\,\Pr(X = x) & \text{(discrete case)} \\[4pt]
\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{(continuous case)}
\end{cases}
\]
The domain of this function is the set of all values of t such that the sum or integral exists. There
is the possibility that for some t the sum might be a divergent infinite series or the integral might
be a divergent improper integral. The moment generating function is useful only when it is
defined on an open interval containing zero.
For example, if X is a discrete random variable with Pr ( X = i ) = 1/3 for i = 1, 2, 3 , then the
moment generating function is:
\[
M_X(t) = \sum_{x} e^{tx}\,\Pr(X = x) = \tfrac{1}{3}e^{t} + \tfrac{1}{3}e^{2t} + \tfrac{1}{3}e^{3t} \quad \text{for } -\infty < t < \infty
\]
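The use of a moment generating function for calculating moments can be illustrated numerically with the three-point distribution above. The following is a minimal sketch only: the helper name `mgf` and the step size `h` are illustrative choices, not part of the text, and the derivatives at zero are approximated by finite differences rather than computed exactly.

```python
import math

def mgf(pmf, t):
    """M_X(t) = sum over x of e^{tx} Pr(X = x) for a finite discrete distribution."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Pr(X = i) = 1/3 for i = 1, 2, 3
pmf = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}

# Central finite differences at t = 0 approximate M'(0) = E[X] and M''(0) = E[X^2]
h = 1e-4  # illustrative step size
first_moment = (mgf(pmf, h) - mgf(pmf, -h)) / (2 * h)
second_moment = (mgf(pmf, h) - 2 * mgf(pmf, 0.0) + mgf(pmf, -h)) / h**2

print(first_moment, second_moment)
```

The two printed values should be close to the exact moments E[X] = 2 and E[X^2] = 14/3, which can be computed directly from the probabilities.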
Example 1.1
Suppose that the discrete random variable N has a probability function given by:
\[
\Pr(N = k) = e^{-\lambda}\,\frac{\lambda^{k}}{k!} \quad \text{where } k = 0, 1, 2, \ldots \text{ and } \lambda > 0
\]
(This is the Poisson distribution with parameter λ which will be covered extensively in
Chapter 3.)
Determine the moment generating function.
Solution
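By definition:
\[
M_N(t) = E\left[e^{tN}\right] = \sum_{k=0}^{\infty} e^{tk}\, e^{-\lambda}\,\frac{\lambda^{k}}{k!}
\]
However, in this form the moment generating function is virtually useless. To make a generating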
function truly useful you must be able to write the summation in a more compact form. The
result we need here is the Taylor series for the exponential function:
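\[
e^{x} = \sum_{k=0}^{\infty} \frac{x^{k}}{k!} = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots \quad (\text{converges for } -\infty < x < \infty)
\]
Applying this result with $x = \lambda e^{t}$:
\[
M_N(t) = \sum_{k=0}^{\infty} e^{tk}\, e^{-\lambda}\,\frac{\lambda^{k}}{k!}
= e^{-\lambda} \sum_{k=0}^{\infty} \frac{\left(\lambda e^{t}\right)^{k}}{k!}
= e^{-\lambda}\, e^{\lambda e^{t}}
= e^{\lambda\left(e^{t} - 1\right)} \quad \text{for } -\infty < t < \infty
\]
♦♦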
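As a numerical sanity check on Example 1.1, the defining series can be summed term by term and compared with the closed form. This is a sketch under stated assumptions: the function names, the number of terms, and the test values of λ and t are all illustrative and do not come from the text.

```python
import math

def poisson_mgf_series(lam, t, terms=200):
    """Partial sum of sum_k e^{tk} e^{-lam} lam^k / k!, built term by term."""
    total = 0.0
    term = math.exp(-lam)  # the k = 0 term
    for k in range(terms):
        total += term
        term *= lam * math.exp(t) / (k + 1)  # ratio of term k+1 to term k
    return total

def poisson_mgf_closed(lam, t):
    """Closed form exp(lam * (e^t - 1)) from Example 1.1."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam, t = 2.5, 0.3  # illustrative values
series_value = poisson_mgf_series(lam, t)
closed_value = poisson_mgf_closed(lam, t)
print(series_value, closed_value)
```

Building each term from the previous one avoids computing large factorials and powers separately.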
Example 1.2
Suppose that the continuous random variable X has a probability density function given by:
\[
f_X(x) = \frac{e^{-x/\theta}}{\theta} \quad \text{for } x > 0 \text{ and } \theta > 0
\]
(This is the exponential distribution with parameter θ that will be covered extensively in
Chapter 2.)
Determine the moment generating function.
Solution
\[
M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \int_{0}^{\infty} e^{tx}\,\frac{e^{-x/\theta}}{\theta}\,dx
\]
Once again, to make the function useful we need to evaluate the integral so as to obtain a closed
form expression in the real variable t:
\[
M_X(t) = \int_{0}^{\infty} e^{tx}\,\frac{e^{-x/\theta}}{\theta}\,dx = \frac{1}{\theta} \int_{0}^{\infty} e^{-x\left(\theta^{-1} - t\right)}\,dx
\]
In general, we have:
\[
\int_{0}^{\infty} e^{-ax}\,dx = \lim_{b \to \infty} \int_{0}^{b} e^{-ax}\,dx = \lim_{b \to \infty} \frac{1 - e^{-ab}}{a} = \frac{1 - 0}{a} = \frac{1}{a} \quad \text{if } a > 0
\]
The integral diverges if a ≤ 0 . Comparing this result to the last form of the generating function,
we see that:
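\[
M_X(t) = \frac{1}{\theta} \int_{0}^{\infty} e^{-x\left(\theta^{-1} - t\right)}\,dx = \frac{1}{\theta} \times \frac{1}{\theta^{-1} - t} = (1 - \theta t)^{-1} \quad \text{if } t < \theta^{-1}
\]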
So the generating function has a simple closed form. In contrast with the result in Example 1.1,
the generating function in this case is not defined for all real numbers. ♦♦
The fact that we saw a smaller domain for the moment generating function in Example 1.2 than
we found for the distribution in Example 1.1 is no real drawback. The key thing is that the
function is defined in an interval that includes zero. The following is a summary of key
properties of the moment generating function. We won’t include the proofs here but they are
readily available from probability texts.
\[
M_{\sum a_i X_i}(t) = M_{X_1}(a_1 t) \cdots M_{X_n}(a_n t) \quad \text{(for independent } X_1, \ldots, X_n\text{)}
\]
Example 1.3
Compute the mean and variance for the exponential distribution in Example 1.2.
Solution
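\[
E[X] = M_X'(0) = \theta, \qquad E\left[X^{2}\right] = M_X''(0) = 2\theta^{2}
\]
\[
\Rightarrow \operatorname{var}(X) = E\left[X^{2}\right] - \left(E[X]\right)^{2} = 2\theta^{2} - (\theta)^{2} = \theta^{2}
\]
♦♦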
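The derivative values quoted in Example 1.3 can be recovered by differentiating the closed form $M_X(t) = (1 - \theta t)^{-1}$ from Example 1.2. The intermediate steps below are a sketch added for illustration and are valid on the interval $t < \theta^{-1}$ where the function is defined:

```latex
\begin{align*}
M_X(t)   &= (1 - \theta t)^{-1} \\
M_X'(t)  &= \theta\,(1 - \theta t)^{-2}      &&\Rightarrow\; M_X'(0) = \theta \\
M_X''(t) &= 2\theta^{2}\,(1 - \theta t)^{-3} &&\Rightarrow\; M_X''(0) = 2\theta^{2}
\end{align*}
```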
Example 1.4
Show that the sum of two independent Poisson distributed random variables follows a Poisson
distribution whose parameter is the sum of the two component parameters.
Solution
\[
M_{N_1 + N_2}(t) = M_{N_1}(t)\, M_{N_2}(t) = e^{\lambda_1\left(e^{t} - 1\right)}\, e^{\lambda_2\left(e^{t} - 1\right)} = e^{(\lambda_1 + \lambda_2)\left(e^{t} - 1\right)}
\]
This function is exactly the moment generating function for a Poisson distribution with
parameter $\lambda_1 + \lambda_2$. So by Property 2, it follows that $N_1 + N_2$ follows a Poisson distribution with
parameter $\lambda_1 + \lambda_2$. ♦♦
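The conclusion of Example 1.4 can also be checked by brute force: sum over every way of splitting a total s between the two variables and compare with the Poisson probability for parameter λ1 + λ2. This is only an illustrative sketch; the function names and the parameter values 1.5 and 2.0 are assumptions, not taken from the text.

```python
import math

def poisson_pmf(lam, k):
    """Pr(N = k) for a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def sum_pmf(f, g, s):
    """Pr(N1 + N2 = s) for independent non-negative integer variables with pmfs f and g."""
    return sum(f(x) * g(s - x) for x in range(s + 1))

lam1, lam2 = 1.5, 2.0  # illustrative parameters
f1 = lambda k: poisson_pmf(lam1, k)
f2 = lambda k: poisson_pmf(lam2, k)

# Largest discrepancy between the direct sum and the Poisson(lam1 + lam2) pmf
max_gap = max(abs(sum_pmf(f1, f2, s) - poisson_pmf(lam1 + lam2, s)) for s in range(10))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding error, consistent with the moment generating function argument.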
\[
P_X(t) = E\left[t^{X}\right] = E\left[e^{\ln(t)\,X}\right] = M_X(\ln(t))
\]
Equivalently, we have:
\[
P_X\left(e^{t}\right) = M_X(t)
\]
The probability generating function has properties similar to those of the moment generating
function. Once again, proofs are beyond the scope of this course.
\[
P_{\sum X_i}(t) = P_{X_1}(t) \cdots P_{X_n}(t) \quad \text{(for independent } X_1, \ldots, X_n\text{)}
\]
4. $P_X(t) = M_X(\ln(t))$
Our main use of the probability generating function will occur in Chapter 3.
Example 1.5
Compute the probability generating function for the Poisson distribution in Example 1.1.
Solution
\[
M_X(t) = e^{\lambda\left(e^{t} - 1\right)} \;\Rightarrow\; P_X(t) = M_X(\ln(t)) = e^{\lambda\left(e^{\ln(t)} - 1\right)} = e^{\lambda(t - 1)}
\]
♦♦
Sums of independent random variables will occur in several settings over the course of
Chapters 1-4. For example:
• If a portfolio (line of business) consists of n independent risks (policies), then the annual
claim frequency for the portfolio is the sum of the annual claim frequencies for the
individual policies.
• The aggregate annual claim amount is usually modeled as a sum of independent
individual claim amounts.
Suppose that a portfolio consists of n risks (policies) whose total annual losses $X_1, X_2, \ldots, X_n$ are
independent and identically distributed like X. The aggregate annual loss for the entire portfolio
is represented by the sum:
\[
S = X_1 + X_2 + \cdots + X_n
\]
This method of modeling the aggregate annual loss is known as the Individual Risk Model since
it focuses on the losses from the individual policies (risks).
A very important basic problem is therefore to compute the distribution of a sum of independent
random variables from knowledge of the distributions of the component independent random
variables. There are three main options as to how this might be done:
1. If the random variables are from the same parametric family of distributions, then the
distribution of their sum will often be from this family as well. Moment generating
functions are very useful in this regard as we saw in Example 1.4.
2. Recursively using the method of convolutions (see below).
3. Approximately using the Central Limit Theorem (see below).
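If X and Y are independent, the distribution of the sum X + Y is given by the convolution of $f_X$ and $f_Y$, written $f_X * f_Y(s)$:
\begin{align*}
f_X * f_Y(s) = \Pr(X + Y = s) &= \sum_{x} \Pr(X = x,\; Y = s - x) && \text{(discrete case)} \\
&= \sum_{x} \Pr(X = x)\,\Pr(Y = s - x) && \text{(independence)} \\
&= \sum_{x} f_X(x)\, f_Y(s - x)
\end{align*}
\[
f_X * f_Y(s) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(s - x)\,dx \quad \text{(continuous case)}
\]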
In both loss theory and survival models our random variables can only take on non-negative values.
In this case there is a small adjustment to the formulas above since both x and s − x must be non-
negative:
Convolution formulas
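\[
f_X * f_Y(s) = \sum_{0 \le x \le s} f_X(x)\, f_Y(s - x) \quad \text{(discrete non-negative random variables)}
\]
\[
f_X * f_Y(s) = \int_{0}^{s} f_X(x)\, f_Y(s - x)\,dx \quad \text{(continuous non-negative random variables)}
\]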
The summation formula above should be pretty intuitive. The possible ways of obtaining
X + Y = s consist of all combinations where x is between 0 and s, and where y is equal to s − x .
Theoretically this technique could be used recursively to compute the distribution of a sum of
more than 2 independent variables. For example, if $X_1, X_2, \ldots, X_n$ are independent and
identically distributed like X, then the PDF of the sum $S = X_1 + \cdots + X_n$, denoted by $f_X^{*n}(s)$, is
computed recursively by the rule:
\begin{align*}
f_X^{*(k+1)}(s) &= f_X^{*k} * f_X(s) \qquad k \ge 1 \\
&= \sum_{0 \le x \le s} f_X^{*k}(x)\, f_X(s - x)
\end{align*}
where $f_X^{*1}(x) = f_X(x)$.
By hand these calculations quickly become cumbersome. They are not as difficult using a
computer but, if n is large, even computer calculations can become impractical.
Example 1.6
Assume that the discrete random variables $X_1, X_2, X_3$ are independent and identically
distributed like X where $f_X(1) = 0.7$, $f_X(2) = 0.3$.
Compute $f_X^{*2}$ and $f_X^{*3}$.
Solution
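The recursive convolution rule can be applied mechanically to the distribution in Example 1.6. The following Python sketch is one possible way to do this; the function name `convolve` and the dictionary representation of a probability function are illustrative assumptions rather than notation from the text.

```python
def convolve(f, g):
    """Convolution of two discrete probability functions given as {value: probability}."""
    out = {}
    for x, px in f.items():
        for y, py in g.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

f_X = {1: 0.7, 2: 0.3}
f_2 = convolve(f_X, f_X)  # probability function of X1 + X2
f_3 = convolve(f_2, f_X)  # probability function of X1 + X2 + X3

print({s: round(p, 4) for s, p in sorted(f_2.items())})
print({s: round(p, 4) for s, p in sorted(f_3.items())})
```

For instance, the only way to obtain X1 + X2 = 2 is for both variables to equal 1, so the first dictionary should assign probability 0.7 × 0.7 = 0.49 to the value 2.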
Example 1.7
Assume $X_1, X_2$ are independent and identically distributed like X and that $f_X(x) = 1$ for
$0 < x < 1$.
Compute $f_X^{*2}$.
Solution
The possible values of the sum lie between 0 and 2. For s between 0 and 1 we have:
\[
f_X^{*2}(s) = \int_{0}^{s} f_X(x)\, f_X(s - x)\,dx = s, \quad 0 < s \le 1
\]
(both factors in the integrand are 1 since $0 < x,\; s - x < 1$).
For s between 1 and 2, the integrand $f_X(x)\, f_X(s - x)$ is non-zero only when $s - 1 < x < 1$. So we
have:
\[
f_X^{*2}(s) = \int_{s-1}^{1} f_X(x)\, f_X(s - x)\,dx = 2 - s, \quad 1 \le s < 2
\]
(both factors in the integrand are 1 since $0 < x,\; s - x < 1$). ♦♦
Example 1.8
Suppose that the annual loss for an individual policy has the following distribution:
Pr ( X = 0 ) = 0.75 , Pr ( X = 1 ) = 0.15 , Pr ( X = 2 ) = 0.10
Suppose that there are n = 100 independent policies in a portfolio whose annual losses are each
distributed like X.
Determine the expected annual loss for the portfolio, the variance in annual loss for the portfolio,
and the approximate 90th percentile of aggregate annual loss.
Solution
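For the Individual Risk Model with n independent policies whose losses are each distributed like X, we have:
1. $E[S] = n\,E[X]$, $\operatorname{var}(S) = n \operatorname{var}(X)$
2. If $n \ge 50$, then:
\[
\Pr(S \le F) = \Pr\left( \frac{S - E[S]}{\sqrt{\operatorname{var}(S)}} \le \frac{F - E[S]}{\sqrt{\operatorname{var}(S)}} \right) \approx \Phi\left( \frac{F - E[S]}{\sqrt{\operatorname{var}(S)}} \right) = \Phi\left( \frac{F - n\,E[X]}{\sqrt{n \operatorname{var}(X)}} \right)
\]
The 90th percentile of the standard normal distribution is 1.282. Since S is approximately normal
in distribution, the approximate 90th percentile of S is:
\[
E[S] + z_{\alpha} \sqrt{\operatorname{var}(S)} = n\,E[X] + z_{\alpha} \sqrt{n \operatorname{var}(X)}
\]
where $\alpha = \Pr\left( N(0, 1) > z_{\alpha} \right) = 1 - \Phi(z_{\alpha})$.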
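The quantities requested in Example 1.8 follow from the mean and variance of a single policy's loss and the normal approximation. The sketch below carries out that arithmetic; the variable names are illustrative, and the value 1.282 is the standard normal percentile quoted in the text.

```python
import math

# Distribution of the annual loss X for one policy (Example 1.8)
pmf = {0: 0.75, 1: 0.15, 2: 0.10}
n = 100        # number of independent policies
z_90 = 1.282   # 90th percentile of the standard normal distribution

mean_X = sum(x * p for x, p in pmf.items())
var_X = sum(x**2 * p for x, p in pmf.items()) - mean_X**2

mean_S = n * mean_X                            # E[S] = n E[X]
var_S = n * var_X                              # var(S) = n var(X)
pct_90 = mean_S + z_90 * math.sqrt(var_S)      # normal approximation to the percentile

print(mean_S, var_S, round(pct_90, 3))
```

Because the loss distribution of a single policy is skewed to the right, the percentile obtained this way is an approximation only.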
• In practice, the graph of the PDF of most aggregate loss distributions is skewed to the
right (there is a significant probability attached to large losses), meaning that it has a long
thin tail of area at the right end. If the distribution of annual losses is skewed to the right
and the central limit theorem is used to fit a normal PDF (which isn’t skewed) then right
tail probabilities such as Pr ( S > F ) can be badly underestimated by the approximation:
\[
\Pr(S > F) \approx 1 - \Phi\left( \frac{F - n\,E[X]}{\sqrt{n \operatorname{var}(X)}} \right)
\]
The double expectation theorem is a device that is employed when conditional moments of a
random variable are easier to compute than the unconditional moments. Once we’ve stated it we
will show how it can be applied. Its proof appears in the appendix to Chapter 1.
\[
E[X] = E\big[\, E[X \mid Y] \,\big]
\]
\[
\operatorname{var}(X) = E\big[ \operatorname{var}(X \mid Y) \big] + \operatorname{var}\big( E[X \mid Y] \big)
\]
The abstract nature of these relations obscures their meaning. The following example from the
theory of contingent payment models should help you understand how to use them.
Example 1.9
Suppose that females in a certain population have a constant force of mortality equal to
µF = 0.015 , and that males in this population have a constant force of mortality equal to
µ M = 0.020 .
Let $Y = \bar{a}_{\overline{T(x)}|}$ be the random present value variable for a continuous life annuity of 1 per year for
a randomly selected member of this population. Assuming that the force of interest is δ = 0.05 ,
and that 55% of the population are female, determine the expected value and variance of Y.
Solution
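For a male life:
\[
E[Y] = \frac{1}{\mu_M + \delta} = \frac{1}{0.07} = 14.28571
\]
\[
\operatorname{var}(Y) = \frac{1}{\delta^{2}} \left( {}^{2}\bar{A}_x - \left( \bar{A}_x \right)^{2} \right) = \frac{1}{\delta^{2}} \left( \frac{\mu_M}{\mu_M + 2\delta} - \left( \frac{\mu_M}{\mu_M + \delta} \right)^{2} \right) = 34.01361
\]
For a female life:
\[
E[Y] = \frac{1}{\mu_F + \delta} = \frac{1}{0.065} = 15.38462
\]
\[
\operatorname{var}(Y) = \frac{1}{\delta^{2}} \left( {}^{2}\bar{A}_x - \left( \bar{A}_x \right)^{2} \right) = \frac{1}{\delta^{2}} \left( \frac{\mu_F}{\mu_F + 2\delta} - \left( \frac{\mu_F}{\mu_F + \delta} \right)^{2} \right) = 30.87214
\]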
In other words, if we are given the gender of the randomly selected life, then we can compute the
expected value and variance of the random present value variable. So what we need to do first is
create an indicator of gender.
The expected value and variance calculations can now be viewed as conditional moments:
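\[
E[Y] = E\big[\, E[Y \mid I] \,\big] = 0.45 \times 14.28571 + 0.55 \times 15.38462 = 14.89011
\]
\begin{align*}
\operatorname{var}(Y) &= \underbrace{\operatorname{var}\big( E[Y \mid I] \big)}_{\substack{\text{variability between} \\ \text{subgroup means}}} + \underbrace{E\big[ \operatorname{var}(Y \mid I) \big]}_{\substack{\text{average variability} \\ \text{within the subgroups}}} \\
&= \left( 0.45 \times 14.28571^{2} + 0.55 \times 15.38462^{2} - 14.89011^{2} \right) \\
&\quad + 0.45 \times 34.01361 + 0.55 \times 30.87214 \\
&\approx 32.585
\end{align*}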
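The double expectation calculation in Example 1.9 can be reproduced with a few lines of arithmetic. This is a sketch only: the function names and the dictionary of subgroups are illustrative, and the formulas used for the conditional mean and variance are the constant-force expressions shown in the example.

```python
delta = 0.05  # force of interest

# (population weight, constant force of mortality) for each subgroup
groups = {"male": (0.45, 0.020), "female": (0.55, 0.015)}

def annuity_mean(mu):
    """E[Y | subgroup] = 1 / (mu + delta)."""
    return 1.0 / (mu + delta)

def annuity_var(mu):
    """var(Y | subgroup) = (mu/(mu + 2 delta) - (mu/(mu + delta))^2) / delta^2."""
    return (mu / (mu + 2 * delta) - (mu / (mu + delta)) ** 2) / delta**2

# E[Y] = E[E[Y|I]]
mean_Y = sum(w * annuity_mean(mu) for w, mu in groups.values())

# var(Y) = var(E[Y|I]) + E[var(Y|I)]
between = sum(w * annuity_mean(mu) ** 2 for w, mu in groups.values()) - mean_Y**2
within = sum(w * annuity_var(mu) for w, mu in groups.values())
var_Y = between + within

print(round(mean_Y, 5), round(between, 5), round(within, 5), round(var_Y, 5))
```

Printing the two components separately shows how small the variability between the subgroup means is compared with the average variability within the subgroups.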
We will return to this example later and look at how to tackle it in a different way. For now, just
notice that intuitively, the PDF for the future lifetime of a randomly selected life should be a
weighted average of the PDF for the exponential future lifetime of a male and the exponential
future lifetime of a female:
\[
f_T(t) = 0.45\, f_M(t) + 0.55\, f_F(t) = 0.45 \times 0.02\, e^{-0.02 t} + 0.55 \times 0.015\, e^{-0.015 t}, \quad 0 < t < \infty
\]
In the Individual Risk Model for aggregate annual losses, the focus was on the individual policies
that make up the portfolio. We let X be the model for the total annual loss from a single policy.
If there are n independent policies in the portfolio, each with total annual loss distributed like X,
then the aggregate annual loss for the portfolio was modeled by:
\[
S = X_1 + \cdots + X_n
\]
In the Collective Risk Model we ignore the individual policies that generate the losses. We
assume that over the course of a year there are a random number of losses, N , that are generated
by a portfolio of risks. The individual loss amounts $Y_1, \ldots, Y_N$ are assumed to be independent
and identically distributed like Y. We assume that $Y > 0$. Furthermore, the loss amounts are
assumed to be independent of N. Under the Collective Risk Model, the aggregate annual loss for
the portfolio is then modeled by the compound sum or random sum:
\[
S = Y_1 + \cdots + Y_N
\]
The random variable N is known as the frequency component of the compound sum. It is a
counting distribution that has 0, 1, 2, and so on, as its possible values. The individual loss
amount model Y is referred to as the severity component. The number of risks in the portfolio is
hidden from view, but it will clearly affect the distribution of N. A greater number of policies
will be reflected in a greater annual frequency of losses.
Here is the connection between these two modeling methods. The ith policy experiences a
random number $N_i$ of individual loss amounts where $N_i \ge 0$. The frequency of annual losses for
the entire portfolio, N, is the sum of the annual loss frequencies over the various policies:
\[
N = N_1 + \cdots + N_n
\]
Each individual policy’s total annual loss is therefore the sum of a random number $N_i$ of
individual loss amounts each distributed like Y. So we have:
\[
X_i \sim Y_1 + \cdots + Y_{N_i}
\]
Thus the difference between the IRM and the CRM is how the individual loss amounts are
grouped:
• In the CRM the aggregate annual loss S is computed as the sum of the individual loss
amounts Yi as they occur.
• For the IRM, these individual losses $Y_i$ are first grouped into total annual losses from the
n individual policies to determine $X_1, \ldots, X_n$. Then the various $X_j$ are summed to
determine S.
A key advantage of the CRM is that we can write an exact formula for $f_S$ in terms of $f_N$ for the
frequency component and $f_Y$ for the severity component. We can adjust these two individual
models independently. For example:
• Increased exposure for the insurer (more policies in the portfolio or more coverages
being added to the individual policies) can be handled by adjusting parameters of the
frequency model N . (This is considered in Chapter 3.)
• One can begin by assuming that Y is an individual ground up loss experienced by a
policyholder. The insurer might wish to apply a deductible amount, a limit, or a
coinsurance (explained later) factor to the ground up loss in order to limit the severity of
claim payments (ie losses). Or the insurer might want to model the effect of loss-inflation
on S from one year to the next. These tasks can be accomplished by adjusting the
distribution of the severity component Y. (This is considered in Chapter 2.)
In this section we want to derive the basic relations between the distributions of S, N, and Y for
the CRM (or random sum model) for aggregate annual losses of an insurer or reinsurer. More
advanced problems will be deferred until we have studied frequency and severity models in
greater detail.
The difficulty with analyzing the random sum $S = Y_1 + \cdots + Y_N$ is the random number of terms
being summed. But if we knew that N = k (ie if we were given N = k), then the sum is easily dealt
with:
\begin{align*}
E[S \mid N = k] &= E[Y_1 + \cdots + Y_k] = k\,E[Y] \\
\operatorname{var}(S \mid N = k) &= \operatorname{var}(Y_1 + \cdots + Y_k) = k \operatorname{var}(Y) \\
f_S(s \mid N = k) &= f_Y^{*k}(s)
\end{align*}
These relations are true for all k. The first two are typically rewritten in the following form that
emphasizes that the conditional mean and variance of S are linear functions of N:
\begin{align*}
E[S \mid N] &= E[Y]\, N \\
\operatorname{var}(S \mid N) &= \operatorname{var}(Y)\, N
\end{align*}
As you look at these formulas keep in mind that E [Y ] and var (Y ) are fixed real numbers and
that N is a random variable.
Now you can begin to appreciate the beneficial nature of the double expectation theorem. In
view of our conditional mean and variance formulas above, it is the ideal tool for relating
moments of S to moments of the frequency and severity distributions.
Theorem 1.2
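Suppose that $S = Y_1 + \cdots + Y_N$ where the various $Y_i$ are independent and identically distributed
like Y where Y is non-negative. Suppose also that the individual loss amounts $Y_i$ are
independent of the annual loss frequency N. Then:
(i) $E[S] = E[N]\,E[Y]$
(ii) $\operatorname{var}(S) = E[N] \operatorname{var}(Y) + \left( E[Y] \right)^{2} \operatorname{var}(N)$
(iii) $f_S(s) = \sum_{k=0}^{\infty} \Pr(N = k)\, f_Y^{*k}(s)$
(iv) $M_S(t) = M_N\big( \ln( M_Y(t) ) \big)$ and $P_S(t) = P_N\big( P_Y(t) \big)$
(v) $\Pr(S = 0) = P_N\big( \Pr(Y = 0) \big)$; in particular, $\Pr(S = 0) = \Pr(N = 0)$ when Y is continuous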
Proof
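(i) We have already seen that:
\begin{align*}
E[S \mid N] &= E[Y]\, N \\
\operatorname{var}(S \mid N) &= \operatorname{var}(Y)\, N
\end{align*}
So by the double expectation theorem:
\[
E[S] = E\big[\, E[S \mid N] \,\big] = E\big[\, E[Y]\, N \,\big] = E[N]\,E[Y]
\]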
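(ii) Similarly, for the variance:
\begin{align*}
\operatorname{var}(S) &= E\big[ \operatorname{var}(S \mid N) \big] + \operatorname{var}\big( E[S \mid N] \big) \\
&= E\big[ \operatorname{var}(Y)\, N \big] + \operatorname{var}\big( E[Y]\, N \big) \\
&= E[N] \operatorname{var}(Y) + \left( E[Y] \right)^{2} \operatorname{var}(N)
\end{align*}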
(iii) Let’s just consider the case when Y is a discrete random variable. So we have fY ( y ) = Pr (Y = y ) .
Since S is a sum of Y’s, it is also a discrete random variable. Therefore, we also have
fS ( s ) = Pr (S = s ) .
\[
\{ S = s \} = \bigcup_{k=0}^{\infty} \underbrace{\{ N = k \text{ and } Y_1 + Y_2 + \cdots + Y_k = s \}}_{\text{there are } k \text{ terms adding up to } s}
\]
The events on the right side are mutually exclusive. Since N and the various Yi are independent,
we have:
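\begin{align*}
f_S(s) = \Pr(S = s) &= \Pr\left( \bigcup_{k=0}^{\infty} \{ N = k \text{ and } Y_1 + Y_2 + \cdots + Y_k = s \} \right) \\
&= \sum_{k=0}^{\infty} \Pr( N = k \text{ and } Y_1 + Y_2 + \cdots + Y_k = s ) \\
&= \sum_{k=0}^{\infty} \Pr(N = k)\, \underbrace{\Pr( Y_1 + Y_2 + \cdots + Y_k = s )}_{f_Y^{*k}(s)}
\end{align*}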
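(iv) The formula $P_S(t) = P_N\big( P_Y(t) \big)$ can also be established with the help of the Double Expectation
Theorem. Given N = k, the sum S has k independent terms, so $E\left[ t^{S} \mid N = k \right] = \left( P_Y(t) \right)^{k}$ and:
\[
P_S(t) = E\left[ t^{S} \right] = E\big[\, E\left[ t^{S} \mid N \right] \big] = E\left[ \left( P_Y(t) \right)^{N} \right] = P_N\big( P_Y(t) \big)
\]
The formula $M_S(t) = M_N\big( \ln( M_Y(t) ) \big)$ is now easily derived since $M_N(t) = P_N\left( e^{t} \right)$:
\[
M_S(t) = P_S\left( e^{t} \right) = P_N\big( P_Y\left( e^{t} \right) \big) = P_N\big( M_Y(t) \big) = P_N\left( e^{\ln( M_Y(t) )} \right) = M_N\big( \ln( M_Y(t) ) \big)
\]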
(v) Consider the formula derived in (iii). We can see that if Y is a continuous random variable then
the terms on the right side with k = 1, 2, ... form a weighted sum of continuous random variables,
since a sum of continuous random variables is continuous. So for this part of the distribution of S
there is no probability at zero. However, in the term with k = 0, the symbol $f_Y^{*0}(s)$ is the PDF of
an empty sum of terms distributed like Y. In other words, we have:
\[
f_Y^{*0}(0) = 1 \quad \text{(the PDF of a random variable that is certain to be zero)}
\]
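So $\Pr(S = 0) = \Pr(N = 0)$.
In the case where Y is discrete it is also true that $Y_1 + Y_2 + \cdots + Y_k$ is discrete. As a result, it follows
that $f_Y^{*k}(0) = \Pr( Y_1 + Y_2 + \cdots + Y_k = 0 )$. Since all of the $Y_i$ are non-negative, the sum is zero only when every term is zero, so $f_Y^{*k}(0) = \left( \Pr(Y = 0) \right)^{k}$. Writing $t = \Pr(Y = 0)$, it follows that:
\[
\Pr(S = 0) = \sum_{k=0}^{\infty} \Pr(N = k)\, f_Y^{*k}(0) = \sum_{k=0}^{\infty} \Pr(N = k)\, t^{k} = E\left[ t^{N} \right] = P_N(t) = P_N\big( \Pr(Y = 0) \big)
\]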
There are a number of important observations to make regarding this theorem listing the basic
properties of random sums:
• Property (i) is a fairly intuitive idea. It says that the expected value of a random sum is
equal to the product of the expected number of terms and the expected value of a term.
Notice that if Pr ( N = n ) = 1 (in other words there are always N = n terms where n is a
fixed number), then property (i) is exactly like the well-known formula
$E[Y_1 + \cdots + Y_n] = n\,E[Y]$.
• Property (ii) should also seem reasonable. In a random sum there are two sources of
variability – variance in the number of terms and variance in the amounts of the
individual terms.
The term $E[N] \operatorname{var}(Y)$ is proportional to the variance in term amounts and reflects the
second source. The term $\left( E[Y] \right)^{2} \operatorname{var}(N)$ is proportional to variance in the number of
terms.
Once again, if $\Pr(N = n) = 1$, then the compound sum variance formula reduces to the
familiar result: $\operatorname{var}(Y_1 + \cdots + Y_n) = n \operatorname{var}(Y)$.
• Property (iii) shows how to combine the functions $f_N$ and $f_Y$ to obtain an exact formula
for $f_S$. However, it is difficult to use in practice since it is usually not easy to compute
the convolution $f_Y^{*k}$.
Nevertheless, there is a simple combinatorial idea underlying property (iii) that can be
used to compute Pr (S = 0 ) , Pr (S = 1 ) , … , and so on when the possible values of Y are
the counting numbers 1, 2 , 3,... . For example, we have:
\begin{align*}
\Pr(S = 0) &= \Pr(N = 0) \\
\Pr(S = 1) &= \Pr(N = 1 \text{ and } Y = 1) = f_N(1)\, f_Y(1) \\
\Pr(S = 2) &= \Pr(N = 2 \text{ and } Y_1 = Y_2 = 1, \text{ or, } N = 1 \text{ and } Y = 2) \\
&= f_N(2)\, f_Y(1)\, f_Y(1) + f_N(1)\, f_Y(2)
\end{align*}
In Chapters 3 and 4 we will see recursive ways to calculate these same probabilities when
the frequency model is an ( a , b , 0 ) distribution. (These models are introduced in
Chapter 3.) The starting value for this recursion is Pr (S = 0 ) . Calculation of this starting
value is given by property (v) in the above theorem.
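The combinatorial idea behind property (iii) can be sketched directly in code when the severity takes values in the counting numbers. This is not the recursive method referred to above; it is a plain evaluation of the sum over k, and the function name `compound_pmf`, the Poisson frequency with mean 2 and the two-point severity are all hypothetical choices made for illustration.

```python
import math

def compound_pmf(f_N, f_Y, s_max):
    """Return [Pr(S = 0), ..., Pr(S = s_max)] using
    f_S(s) = sum over k of Pr(N = k) * f_Y^{*k}(s).

    f_N: function k -> Pr(N = k)
    f_Y: list with f_Y[y] = Pr(Y = y); f_Y[0] must be 0, so k terms sum to at least k
    """
    conv = [1.0] + [0.0] * s_max   # f_Y^{*0}: an empty sum is certain to be zero
    f_S = [0.0] * (s_max + 1)
    for k in range(s_max + 1):     # more than s_max terms cannot sum to s_max or less
        for s in range(s_max + 1):
            f_S[s] += f_N(k) * conv[s]
        # f_Y^{*(k+1)}(s) = sum over x of f_Y^{*k}(x) f_Y(s - x), truncated at s_max
        conv = [
            sum(conv[x] * f_Y[s - x] for x in range(s + 1) if s - x < len(f_Y))
            for s in range(s_max + 1)
        ]
    return f_S

# Hypothetical inputs: Poisson frequency with mean 2, severity on {1, 2}
lam = 2.0
f_N = lambda k: math.exp(-lam) * lam**k / math.factorial(k)
f_Y = [0.0, 0.6, 0.4]

f_S = compound_pmf(f_N, f_Y, s_max=4)
print([round(p, 6) for p in f_S])
```

The first three entries can be compared by hand with the expressions for Pr(S = 0), Pr(S = 1) and Pr(S = 2) given above.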
Example 1.10
Suppose that the annual frequency of losses from a portfolio follows a Poisson distribution with
parameter λ = 10 :
\[
\Pr(N = k) = e^{-10}\,\frac{10^{k}}{k!} \quad \text{for } k = 0, 1, 2, \ldots
\]
\[
E[N] = \operatorname{var}(N) = \lambda = 10
\]
Suppose that the individual loss amounts Y are uniformly distributed on (0, 1000). Compute
the mean and variance of aggregate annual losses against the portfolio.
Solution
To employ Properties (i) and (ii) of Theorem 1.2, we first need to calculate the mean and variance
of the frequency and severity components:
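\[
E[N] = \operatorname{var}(N) = \lambda = 10 \quad \text{(given)}
\]
\[
f_Y(y) = 0.001 \text{ for } 0 < y < 1{,}000 \;\Rightarrow\; E[Y] = 500, \quad \operatorname{var}(Y) = \frac{1{,}000^{2}}{12} = 83{,}333.33
\]
Therefore:
\[
E[S] = E[N]\,E[Y] = 10 \times 500 = 5{,}000
\]
By Property (ii) of Theorem 1.2, the variance of aggregate annual losses is:
\[
\operatorname{var}(S) = E[N] \operatorname{var}(Y) + \left( E[Y] \right)^{2} \operatorname{var}(N) = 10 \times 83{,}333.33 + 500^{2} \times 10 = 3{,}333{,}333.33
\]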
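The arithmetic in Example 1.10 is a direct application of Properties (i) and (ii). A minimal Python sketch, with illustrative variable names, is:

```python
lam = 10.0            # Poisson parameter for the frequency N
a, b = 0.0, 1000.0    # endpoints of the uniform severity distribution

mean_N = var_N = lam             # Poisson: mean and variance both equal lam
mean_Y = (a + b) / 2             # uniform mean
var_Y = (b - a) ** 2 / 12        # uniform variance

mean_S = mean_N * mean_Y                       # Property (i)
var_S = mean_N * var_Y + mean_Y**2 * var_N     # Property (ii)

print(mean_S, round(var_S, 2))
```

Changing `lam`, `a` or `b` shows how the frequency and severity components can be adjusted independently of each other.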
The technique introduced here of combining several distributions to form a new distribution is
called mixing. The distribution resulting from mixing is a hybrid of several other distributions.
Mixed distributions
You may be familiar with the most basic type of mixed distribution from a first course in
probability. It has both a discrete part and a continuous part. These two parts can be determined
from the CDF.
Here is an example from loss model theory of how a mixed distribution arises in a natural way.
Example 1.11
Suppose that a ground up loss X is uniformly distributed on (0 , 1000] . For each such loss
suffered by a policyholder, suppose that an insurer will make a payment Y that is equal to the
excess of the loss over 100 (provided that the loss exceeds 100) up to a maximum reimbursement
of 500.
Determine the cumulative distribution function for Y, the insurance payment per loss and draw
the graph of the CDF.
Solution
\[ Y = \begin{cases} 0 & \text{if } 0 < X \le 100 \\ X - 100 & \text{if } 100 < X < 600 \\ 500 & \text{if } 600 \le X \le 1{,}000 \end{cases} \]
Since the loss X is uniformly distributed, we have:
f X ( x ) = 0.001 for 0 < x ≤ 1, 000
• The event Y = 0 is equivalent to the event 0 < X ≤ 100 . So there is a point mass of
probability at zero:
\[ F_Y(0) = \Pr(Y = 0) = \Pr(0 < X \le 100) = \int_0^{100} 0.001\,dx = 0.10 \]
• The event 100 < X < 600 is equivalent to the event 0 < Y < 500 . For 0 < y < 500 , we
have:
\[ \begin{aligned} F_Y(y) &= \Pr(Y \le y) = \Pr(Y = 0) + \Pr(0 < Y \le y) \\ &= 0.10 + \Pr(0 < X - 100 \le y) = 0.10 + \Pr(100 < X \le y + 100) \\ &= 0.10 + \int_{100}^{y+100} 0.001\,dx = 0.10 + 0.001y \quad \text{for } 0 < y < 500 \end{aligned} \]
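[Figure: graph of the CDF FY ( y ) against y. The graph jumps to 0.1 at y = 0 , rises linearly towards 0.6 as y approaches 500, and jumps to 1 at y = 500 .]
♦♦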
\[ E[Y^k] = \sum_{\substack{\text{all } y \text{ where} \\ \Pr(Y = y) \ne 0}} y^k \Pr(Y = y) + \int_{-\infty}^{\infty} y^k F_Y'(y)\,dy \]
\[ \Pr(a < Y \le b) = F_Y(b) - F_Y(a) \]
\[ \begin{aligned} \Pr(a \le Y \le b) &= \Pr(Y = a) + \Pr(a < Y \le b) \\ &= \Big( F_Y(a) - \lim_{x \to a^-} F_Y(x) \Big) + \big( F_Y(b) - F_Y(a) \big) \\ &= F_Y(b) - \lim_{x \to a^-} F_Y(x) \end{aligned} \]
Example 1.12
Solution
We will use Property 3 since Y has a mixed distribution. The discrete part has point masses of
probability at 0 and 500. The respective heights of the jump discontinuities at these points are
0.10 and 0.40. The continuous part has a PDF that is non-zero on the interval ( 0 , 500 ) . For
0 < y < 500 , we saw that FY ( y ) = 0.10 + 0.001y in Example 1.11. Looking back at that formula
and the accompanying graph, you can see that the derivative fails to exist at 0 and 500, the
derivative is zero for y < 0 and for y > 500 , and the derivative is equal to 0.001 for 0 < y < 500 .
Therefore:
\[ E[Y] = 200 + 125 = 325 \]
\[ E[Y^2] = 100{,}000 + 41{,}666.67 = 141{,}666.67 \]
Hence the variance is:
\[ \operatorname{var}(Y) = E[Y^2] - (E[Y])^2 = 36{,}041.67 \] ♦♦
There are several other ways to make these same calculations. You actually encountered some
mixed distributions in Exam M.
For example, if Z is the random present value variable for a 10-year term insurance of 1 on ( x )
with the benefit payable on death (a continuous model), then Z is a function of T ( x ) (future
lifetime). It has a point mass of probability at Z = 0 corresponding to the event T ( x ) > 10 , since
in this case no payment is made. The continuous part of the distribution of Z corresponds to the
event 0 < T ( x ) ≤ 10 . You can avoid the problem of dealing with Z as a mixed distribution by
computing moments and probabilities in terms of the distribution of T ( x ) .
Example 1.13
Compute the first and second moment of Y in Example 1.11 by viewing Y as a function of the
ground up loss X.
Solution
\[ E[Y^k] = \int_0^{100} 0^k f_X(x)\,dx + \int_{100}^{600} (x - 100)^k f_X(x)\,dx + \int_{600}^{1000} 500^k f_X(x)\,dx \]
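The integral above can be checked numerically. The sketch below treats Y as a function of the ground up loss X and integrates with a simple midpoint rule; the function names, the step count and the choice of integration rule are assumptions made for illustration, not part of the text's method.

```python
def payment(x, deductible=100.0, max_payment=500.0):
    """Payment per loss: excess of x over the deductible, capped at max_payment."""
    return min(max(x - deductible, 0.0), max_payment)

def moment_of_payment(k, n_steps=100_000, upper=1000.0):
    """Approximate E[Y^k] by a midpoint rule, with X uniform on (0, upper]."""
    h = upper / n_steps
    density = 1.0 / upper                      # f_X(x) = 0.001
    return sum(payment((i + 0.5) * h) ** k * density * h for i in range(n_steps))

first_moment = moment_of_payment(1)
second_moment = moment_of_payment(2)
variance = second_moment - first_moment ** 2
```

The results should reproduce, to rounding, the first moment 325, second moment 141,666.67 and variance 36,041.67 obtained from the mixed-distribution calculation.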
One other thing can be done with the PDF of a mixed distribution. It can be written as a
weighted average of a discrete probability function and a continuous PDF. For the payment per
loss variable Y in Examples 1.11 - 1.13, we saw that:
Consider a discrete random variable D with point masses of probability at 0 and 500 proportional
to 0.10 and 0.40 respectively:
\[ f_D(0) = \frac{0.10}{0.50} = 0.20 , \qquad f_D(500) = \frac{0.40}{0.50} = 0.80 , \qquad \text{and zero otherwise} \]
Consider a continuous random variable C whose PDF is proportional to the continuous part of
fY :
\[ f_C(y) = \frac{0.001}{0.500} = 0.002 \quad \text{for } 0 < y < 500 \text{ and zero otherwise} \]
fY ( y ) = 0.50 f D ( y ) + 0.50 fC ( y )
The two coefficients equal to 0.50 are the weights. Moments of Y can now be computed in a third
way from this weighted average:
\[ E[Y^k] = 0.50\,E[D^k] + 0.50\,E[C^k] \]
The reason that we went through this discussion of a weighted average approach is that this is
the model for the more general types of mixing that we will consider next.
In general the variance of a mixed distribution cannot be expressed as the weighted average of
the individual component variances.
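A short numerical sketch makes the last point concrete for the payment variable Y, using the components D and C defined above. The variable names are illustrative. The gap between the true variance and the weighted average of the component variances turns out to equal the variance of the component means.

```python
w_D, w_C = 0.50, 0.50

# Discrete component D: point masses 0.20 at 0 and 0.80 at 500.
mean_D = 0.20 * 0 + 0.80 * 500
second_D = 0.20 * 0 ** 2 + 0.80 * 500 ** 2
var_D = second_D - mean_D ** 2

# Continuous component C: uniform on (0, 500).
mean_C = 500 / 2
second_C = 500 ** 2 / 3
var_C = second_C - mean_C ** 2

# Moments of the mixture are weighted averages of the component moments.
mean_Y = w_D * mean_D + w_C * mean_C
second_Y = w_D * second_D + w_C * second_C
var_Y = second_Y - mean_Y ** 2

# The weighted average of the variances falls short of var(Y).
weighted_var = w_D * var_D + w_C * var_C
gap = var_Y - weighted_var
between = w_D * (mean_D - mean_Y) ** 2 + w_C * (mean_C - mean_Y) ** 2
```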
\[ \Pr(N = k) = e^{-\lambda}\,\frac{\lambda^k}{k!} \quad \text{for } k = 0, 1, \ldots \text{ where } \lambda > 0 \]
\[ E[N] = \operatorname{var}(N) = \lambda \]
But, individuals drive different distances over a year, they live in different areas with different
hazards due to traffic patterns, and so on. So not all drivers and policies have the same
characteristics. Therefore the insurer might use different values of λ for different policies.
Suppose the insurer used past data to classify current policyholders as follows:
Category         Parameter    Percentage of policies
low risk         λ = 0.20     35%
moderate risk    λ = 0.40     60%
high risk        λ = 1.00     5%
Since the value of λ varies over the portfolio, we should consider it as a discrete random variable
Λ with probability function:
f Λ ( 0.20 ) = 0.35 , f Λ ( 0.40 ) = 0.60 , f Λ ( 1.00 ) = 0.05
In this light, the probability function for annual claim frequency for a policy should be viewed as
a conditional Poisson distribution:
\[ \Pr(N = k \mid \Lambda = \lambda) = e^{-\lambda}\,\frac{\lambda^k}{k!} \quad \text{for } k = 0, 1, \ldots \]
So we have changed our point of view. Instead of viewing the parameter as a number, we now
view it as a random variable.
Now suppose that we have a new policyholder at the start of year 2008 with no past data to help
us classify the risk level. The policyholder will have to pass underwriting, but this person is a
randomly selected policy from the portfolio. So the appropriate annual claims frequency N
should be modeled as the marginal distribution where the effect of λ has been summed out.
Here’s how we obtain the marginal distribution of N :
• The first step is to multiply the marginal probability function of Λ and the conditional
probability function of N given Λ to obtain the joint probability function of N and Λ :
f N ,Λ ( k , λ ) = f Λ ( λ ) f N ( k|Λ = λ )
• The second step is to sum this joint probability function over all values of λ to obtain the
marginal probability function of N:
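\[ \begin{aligned} f_N(k) &= \sum_{\lambda} f_{N,\Lambda}(k, \lambda) = \sum_{\lambda} f_\Lambda(\lambda)\, f_N(k \mid \Lambda = \lambda) \\ &= 0.35\, e^{-0.2}\, \frac{0.2^k}{k!} + 0.60\, e^{-0.4}\, \frac{0.4^k}{k!} + 0.05\, e^{-1.0}\, \frac{1.0^k}{k!} \quad \text{for } k = 0, 1, \ldots \end{aligned} \]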
This expression could also be referred to as a weighted average of Poisson distributions, where
0.35, 0.60, and 0.05 are the weights that sum to 1. Once this expression is written down it is easy
to calculate probabilities or moments for N.
Example 1.14
(a) The probability that there are 2 or more claims in the next year for a new policyholder.
(b) The expected number of claims for a randomly selected policyholder in the next year.
Solution
(a) The probability that is requested is computed as follows. From the marginal probability function
above, we can calculate:
f N ( 0 ) = 0.70714 , f N ( 1 ) = 0.23658
Therefore:
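Pr ( N ≥ 2 ) = 1 − 0.70714 − 0.23658
= 0.05628
(b) The expected number of claims is:
\[ \begin{aligned} E[N] &= \sum_{k} k \Pr(N = k) \\ &= \sum_{k=0}^{\infty} k \left( 0.35\, e^{-0.2}\, \frac{0.2^k}{k!} + 0.60\, e^{-0.4}\, \frac{0.4^k}{k!} + 0.05\, e^{-1.0}\, \frac{1.0^k}{k!} \right) \\ &= 0.35 \underbrace{\sum_{k=0}^{\infty} k\, e^{-0.2}\, \frac{0.2^k}{k!}}_{E[N \mid \Lambda = 0.2] \,=\, 0.2} + 0.60 \underbrace{\sum_{k=0}^{\infty} k\, e^{-0.4}\, \frac{0.4^k}{k!}}_{E[N \mid \Lambda = 0.4] \,=\, 0.4} + 0.05 \underbrace{\sum_{k=0}^{\infty} k\, e^{-1.0}\, \frac{1.0^k}{k!}}_{E[N \mid \Lambda = 1.0] \,=\, 1.0} \\ &= 0.36 \end{aligned} \]
Note: You can see from the last line that this calculation is equivalent to using the Double
Expectation Theorem: $E[N] = E\big[ E[N \mid \Lambda] \big]$. ♦♦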
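The calculations in Example 1.14 can be reproduced with a few lines of Python. This is a minimal sketch; the dictionary mixing and the function names are illustrative choices, while the parameter values and weights are those of the three risk categories above.

```python
from math import exp, factorial

# Mixing distribution of the Poisson parameter: value -> probability.
mixing = {0.20: 0.35, 0.40: 0.60, 1.00: 0.05}

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def marginal_pmf(k):
    """Weighted average of the conditional Poisson probabilities."""
    return sum(w * poisson_pmf(k, lam) for lam, w in mixing.items())

# (a) Probability of 2 or more claims for a randomly selected policy.
prob_two_or_more = 1.0 - marginal_pmf(0) - marginal_pmf(1)

# (b) Expected number of claims, via the Double Expectation Theorem.
mean_N = sum(w * lam for lam, w in mixing.items())
```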
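\[ f_{X,\Theta}(x, \theta) = f_\Theta(\theta)\, f_X(x \mid \Theta = \theta) \]
\[ f_X(x) = \sum_{\theta} f_{X,\Theta}(x, \theta) = \sum_{\theta} f_\Theta(\theta)\, f_X(x \mid \Theta = \theta) \]
\[ E[X^k] = E\big[ E[X^k \mid \Theta] \big] = \sum_{\theta} \Pr(\Theta = \theta)\, E[X^k \mid \Theta = \theta] \]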
Two-point mixture
When the mixing parameter Θ has only 2 possible values, then the distribution of X is called a
two-point mixture.
Take another look at Example 1.9 (Section 1.4) where we had a population of lives that was a
mixture of males and females with slightly different mortality models. The mortality model for a
randomly selected life from this population is a two-point mixture. Try to obtain the results of
this example using the theory outlined above.
The theory here is virtually identical to the theory of mixing with a discrete parameter. Where
you summed in the case of a discrete distribution for the parameter, here you will integrate.
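\[ f_{X,\Theta}(x, \theta) = f_\Theta(\theta)\, f_X(x \mid \Theta = \theta) \]
\[ f_X(x) = \int_{\theta} f_{X,\Theta}(x, \theta)\, d\theta = \int_{\theta} f_\Theta(\theta)\, f_X(x \mid \Theta = \theta)\, d\theta \]
\[ E[X^k] = E\big[ E[X^k \mid \Theta] \big] = \int_{\theta} f_\Theta(\theta)\, E[X^k \mid \Theta = \theta]\, d\theta \]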
Example 1.15
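\[ f_N(k \mid \Theta = \theta) = e^{-\theta}\,\frac{\theta^k}{k!} \quad \text{for } k = 0, 1, \ldots \]
\[ \text{where: } f_\Theta(\theta) = \frac{1}{0.8} \quad \text{for } 0.2 \le \theta \le 1.0 \]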
Determine Pr ( N ≥ 1 ) and E [ N ] .
Solution
We again have a conditional Poisson distribution for frequency. But the parameter is distributed
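uniformly on the interval [0.2 , 1.0] .
\[ f_N(0) = \int f_{N,\Theta}(0, \theta)\, d\theta = \int f_\Theta(\theta)\, f_N(0 \mid \Theta = \theta)\, d\theta = \int_{0.2}^{1.0} \frac{e^{-\theta}}{0.8}\, d\theta = \frac{e^{-0.2} - e^{-1.0}}{0.8} = 0.56356 \]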
So we have:
Pr ( N ≥ 1 ) = 1 − f N ( 0 ) = 0.43644
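Both quantities requested in Example 1.15 can be checked by numerical integration over the mixing density. The helper integrate below is a plain midpoint rule written for this illustration (an assumption, not a method from the text); the expected value uses the fact that the conditional Poisson mean equals the parameter.

```python
from math import exp

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def mixing_density(theta):
    return 1.0 / 0.8            # uniform density on [0.2, 1.0]

# Marginal probability of zero claims: average exp(-theta) over the mixing density.
f_N0 = integrate(lambda t: mixing_density(t) * exp(-t), 0.2, 1.0)
prob_at_least_one = 1.0 - f_N0

# E[N] = E[E[N | Theta]] = E[Theta].
mean_N = integrate(lambda t: mixing_density(t) * t, 0.2, 1.0)
```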
Example 1.16
Each life in a certain population has a lifetime that follows a constant force model (Exam M):
For this population, the force varies uniformly from 1.0 to 2.0.
Determine the probability that a randomly selected life survives 1 year.
Solution
We are asked to compute the unconditional probability sX ( 1 ) = Pr ( X > 1) , since for a “randomly
selected life” we do not know (ie are not given) the appropriate force.
We are given that:
f M ( µ ) = 1 for 1 < µ < 2
To compute the marginal PDF for X we first need the joint PDF for X and M:
Let’s pause briefly to think about what is needed to finish the problem. Evaluating this integral is
the tricky part because you now look at x as a constant and you have to anti-differentiate with
respect to µ . So looking at the integral above, you see that integration by parts would be
required. Furthermore, even after this integration is completed, you will still need to perform
another integral calculation:
\[ s_X(1) = \int_1^{\infty} f_X(x)\, dx \]
There is a possible way to avoid these complications with a bit of theory we have not yet
developed. In the same manner that the marginal PDF was obtained as a weighted average of the
conditional PDF, you can also obtain the marginal survival function as a weighted average of the
conditional survival function:
\[ s_X(x) = \int_{\mu = 1}^{2} s_X(x \mid M = \mu)\, f_M(\mu)\, d\mu \]
This integral is considerably simpler than the one for the marginal PDF because
$s_X(x \mid M = \mu) = e^{-\mu x}$. It also avoids the problem of an additional integral of the PDF to obtain
the survival function.
We can now get there in one step:
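\[ \begin{aligned} s_X(x) &= \int_{\mu = 1}^{2} s_X(x \mid M = \mu)\, f_M(\mu)\, d\mu = \int_{\mu = 1}^{2} e^{-\mu x} \cdot 1\, d\mu \\ &= \left. -\frac{e^{-\mu x}}{x} \right|_{\mu = 1}^{2} = \frac{e^{-x} - e^{-2x}}{x} \end{aligned} \]
\[ \Rightarrow\ s_X(1) = 0.23254 \] ♦♦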
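The closed form for the marginal survival function can be compared with a direct numerical average of the conditional survival functions. This is only a sketch: the midpoint-rule helper and the function names are assumptions introduced for the check.

```python
from math import exp

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def survival_numeric(x):
    # Average exp(-mu * x) over the force mu, which is uniform on (1, 2).
    return integrate(lambda mu: exp(-mu * x) * 1.0, 1.0, 2.0)

def survival_closed_form(x):
    return (exp(-x) - exp(-2.0 * x)) / x

s1_numeric = survival_numeric(1.0)
s1_closed = survival_closed_form(1.0)
```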
Theorem 1.3
(i) Suppose that X|Θ ∼ exponential with mean Θ and Θ ∼ inverse gamma distribution with parameters
α , θ . Then X follows a 2-parameter Pareto distribution with parameters α , θ matching the
parameters of the inverse gamma mixing distribution.
(ii) Suppose that $X \mid \Theta \sim \text{Normal}(\Theta, \sigma_1^2)$ and Θ follows the normal distribution $\text{Normal}(\mu, \sigma_2^2)$.
Then X follows the normal distribution $\text{Normal}(\mu, \sigma_1^2 + \sigma_2^2)$.
Proof
(i) Here are the details for part (i). We will use λ to denote the mean of the exponential distribution
instead of the usual θ since θ is a parameter of the inverse gamma distribution.
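1. Conditional exponential:
\[ f_X(x \mid \Lambda = \lambda) = \frac{e^{-x/\lambda}}{\lambda} \quad \text{for } x > 0 \text{ and } \lambda > 0 \]
2. Marginal inverse gamma (see exam tables):
\[ f_\Lambda(\lambda) = \frac{\theta^\alpha e^{-\theta/\lambda}}{\lambda^{\alpha+1}\, \Gamma(\alpha)} \quad \text{for } \lambda > 0 \]
3. Joint PDF, using $\Gamma(\alpha + 1) = \alpha\, \Gamma(\alpha)$:
\[ \begin{aligned} f_{X,\Lambda}(x, \lambda) &= f_\Lambda(\lambda)\, f_X(x \mid \Lambda = \lambda) = \frac{\theta^\alpha e^{-(\theta + x)/\lambda}}{\lambda^{\alpha+2}\, \Gamma(\alpha)} \\ &= \frac{\alpha\, \theta^\alpha}{(\theta + x)^{\alpha+1}} \times \frac{(\theta + x)^{\alpha+1} e^{-(\theta + x)/\lambda}}{\lambda^{\alpha+2}\, \Gamma(\alpha + 1)} \end{aligned} \]
4. Marginal PDF of X:
\[ \begin{aligned} f_X(x) &= \int_0^{\infty} f_{X,\Lambda}(x, \lambda)\, d\lambda \\ &= \frac{\alpha\, \theta^\alpha}{(\theta + x)^{\alpha+1}} \int_0^{\infty} \frac{(\theta + x)^{\alpha+1} e^{-(\theta + x)/\lambda}}{\lambda^{\alpha+2}\, \Gamma(\alpha + 1)}\, d\lambda \\ &= \frac{\alpha\, \theta^\alpha}{(\theta + x)^{\alpha+1}} \quad \text{(the 2-parameter Pareto PDF)} \end{aligned} \]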
(ii) The same four steps as in the proof of the first part should be repeated. But this time the algebra
in Step 4 is much more complex. So we will not include a full proof. Instead we will simply use
the Double Expectation Theorem to show that the marginal mean and variance agree with the
assertion in the Theorem:
1. $X \mid \Theta \sim \text{Normal}(\Theta, \sigma_1^2) \ \Rightarrow\ E[X \mid \Theta] = \Theta , \ \operatorname{var}(X \mid \Theta) = \sigma_1^2$
Suppose that an insurer has a portfolio of 100 independent policies. Assume that over the next
year each policy will generate either 0 or 1 claim with respective probabilities 0.90 and 0.10.
Assume that each claim amount is uniformly distributed on (0, 1000] . Then we have:
Pr ( N = 0 ) = 0.90
Pr ( N = 1 ) = 0.10
fY ( y ) = 0.001 for 0 < y < 1000
Suppose we let S denote the insurer’s aggregate claims in the next year. Let’s compute the
expected value and variance of S using both the IRM and CRM models. You will see that mixed
distributions play a role in this analysis. This example will serve as a reminder of the properties
of these two methods of modeling and it should be read several times until it is completely
understood.
IRM
Let’s proceed first according to the IRM. Aggregate annual claims are modeled by:
S = X1 + ... + X100
where Xi is the total annual claims arising from the ith policy, and the various Xi are
independent and identically distributed like X.
What is the distribution of X? Since each policy will experience 0 or 1 claim, the total annual
claim amount is either 0 with probability 0.90, or it is uniformly distributed on (0, 1000] with
probability 0.10. This is a two-point mixture:
\[ E[X^k] = 0^k \times 0.90 + \int_0^{1000} x^k \times 0.0001\, dx \]
\[ \Rightarrow\ E[X] = 50 , \quad E[X^2] = 33{,}333.33 \]
\[ \Rightarrow\ \operatorname{var}(X) = 30{,}833.33 \]
E [S ] = 100 E [ X ] = 5, 000
var ( S ) = 100 var ( X ) = 3, 083, 333.33
CRM
Now let's apply the CRM. In this model we view aggregate annual claims as:
S = Y1 + ... + YN
where Y is the individual claim amount model (uniform on (0, 1000] ), and N is the annual
frequency of claims arising from the whole portfolio.
The claim frequency for a single policy follows a Bernoulli distribution with p = 0.10 . So the
annual claim frequency for the whole portfolio is a sum of 100 independent Bernoulli variables
with p = 0.10 . In other words, the distribution of N is binomial with n = 100 , p = 0.10 .
Recall from Theorem 1.2 (Section 1.5) that the mean and variance for a CRM model are computed
as:
\[ E[S] = E[N]\, E[Y] \qquad \operatorname{var}(S) = E[N] \operatorname{var}(Y) + (E[Y])^2 \operatorname{var}(N) \]
\[ E[Y] = \frac{1{,}000}{2} = 500 \qquad \operatorname{var}(Y) = \frac{1{,}000^2}{12} = 83{,}333.33 \]
The frequency model is binomial, so we have:
E [ N ] = np = 100 × 0.10 = 10
var ( N ) = np ( 1 − p ) = 9
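The two approaches can be put side by side numerically. The sketch below recomputes the IRM figures from the two-point mixture and the CRM figures from the Theorem 1.2 formulas with the binomial frequency; the variable names are illustrative.

```python
n_policies, q = 100, 0.10              # number of policies, claim probability per policy
mean_Y = 1000.0 / 2                    # claim amount uniform on (0, 1000]
var_Y = 1000.0 ** 2 / 12
second_Y = var_Y + mean_Y ** 2

# IRM: each policy's annual total X is 0 with probability 0.90, else a uniform claim.
mean_X = q * mean_Y
var_X = q * second_Y - mean_X ** 2
irm_mean = n_policies * mean_X
irm_var = n_policies * var_X

# CRM: N is binomial(100, 0.10) and S = Y1 + ... + YN.
mean_N = n_policies * q
var_N = n_policies * q * (1 - q)
crm_mean = mean_N * mean_Y
crm_var = mean_N * var_Y + mean_Y ** 2 * var_N
```

Under these assumptions both models give a mean of 5,000 and a variance of 3,083,333.33.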
\[ f_Y(y) = f_X\big( g^{-1}(y) \big) \left| \frac{d\, g^{-1}(y)}{dy} \right| \]
where $g^{-1}(y)$ is the inverse function
There are basically two types of 1-1, differentiable transformation: increasing and decreasing.
The inverse will exist in each case.
If g ( x ) is an increasing function then so is the inverse function. Similarly, if g ( x ) is a decreasing
function, then so is the inverse function. So the absolute value in the formula above is essential
when the transformation is decreasing and has a negative derivative.
The particular transformations that are frequently used in loss models are:
Linear:            Y = aX      models an inflationary effect
Exponential:       Y = e^X     X normally distributed ⇒ Y is lognormal
Raise to a power:  Y = X^γ     X exponentially distributed, eg X^(−1) is inverse exponential
In each case we have 1-1, differentiable functions, so the method of transformations can be
applied to develop a formula for fY .
Example 1.17
Suppose that X follows a gamma distribution (see Chapter 2 for more details) with parameters
θ and α then:
\[ f(x) = \frac{1}{\theta^\alpha\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\theta} \quad \text{where } 0 < x < \infty \text{ and } \theta > 0 ,\ \alpha > 0 \]
Solution
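\[ \begin{aligned} f_Y(y) &= f_X\big( g^{-1}(y) \big) \left| \frac{d\, g^{-1}(y)}{dy} \right| \\ &= f_X(y/a) / a \\ &= \frac{1}{\theta^\alpha\, \Gamma(\alpha)}\, (y/a)^{\alpha - 1} e^{-(y/a)/\theta} / a \quad \text{(given } a > 0\text{)} \\ &= \frac{1}{(a\theta)^\alpha\, \Gamma(\alpha)}\, y^{\alpha - 1} e^{-y/(a\theta)} \end{aligned} \]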
If you look carefully at this relation you should see that Y follows a gamma distribution with the
same α parameter as X, but with the parameter θ being replaced with aθ . ♦♦
Note that you can also prove this result by considering the moment generating function of the
gamma distribution. You may like to check that you can obtain the same result using this
alternative method.
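The scaling result of Example 1.17 can also be spot-checked numerically: the density of aX obtained from the transformation formula should coincide with a gamma density whose θ parameter is replaced by aθ. The parameter values below (α = 2.5, θ = 100, a = 1.10) are hypothetical and chosen only for illustration.

```python
from math import exp, gamma, isclose

def gamma_pdf(x, alpha, theta):
    """Gamma PDF in the parameterization used in Example 1.17."""
    return x ** (alpha - 1) * exp(-x / theta) / (theta ** alpha * gamma(alpha))

def scaled_pdf(y, a, alpha, theta):
    """PDF of Y = aX from the method of transformations (a > 0)."""
    return gamma_pdf(y / a, alpha, theta) / a

alpha, theta, a = 2.5, 100.0, 1.10     # hypothetical values

checks = [
    isclose(scaled_pdf(y, a, alpha, theta), gamma_pdf(y, alpha, a * theta), rel_tol=1e-9)
    for y in (10.0, 150.0, 900.0)
]
```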
Certain parametric families of distributions that will be used for severity models exhibit the
property: if X is in the family, then so is aX . Such a family is called a scale family. In the
preceding example we saw that the gamma family is a scale family. Another example is the
normal family:
\[ X \sim N(\mu, \sigma^2) \ \Rightarrow\ aX \sim N(a\mu, a^2 \sigma^2) \]
Sometimes a scale family has a scale parameter. A parameter of X is called a scale parameter if
the corresponding parameter of aX is multiplied by a, and all other parameters are unchanged.
For example, we showed in Example 1.17 that the parameter θ in the gamma distribution is a
scale parameter. In contrast, the normal family does not have a scale parameter. Both the mean
and variance parameters are changed when X is multiplied by a.
Another technique for combining several continuous models to form a hybrid of the components
is called splicing.
If X is a continuous loss model you might want to use different distributions for different ranges
of losses. For example suppose we want to create a model so that losses in the range (0 , 1000] are
uniformly distributed and losses in the range (1000 , 3000] are also uniformly distributed.
Suppose that 85% of losses are less than or equal to 1,000.
The most general type of splicing of n different distributions together is described as follows:
• For i = 1 , 2 ,... , n , you are given a continuous model PDF f i ( x ) with all probability on
( ai , ai + 1 ] where 0 ≤ a1 < a2 < ... < an +1
Example 1.18
Suppose that a new loss distribution X is created by splicing the uniform distribution on
(0 , 1000] with the uniform distribution on (1000 , 3000] where p1 = 0.85 and p2 = 0.15 .
Solution
This is the model discussed at the beginning of this section. We have seen that:
As a result we have:
\[ \begin{aligned} \Pr(X \le 2{,}000) &= \int_0^{1{,}000} 0.85 \times 0.001\, dx + \int_{1{,}000}^{2{,}000} 0.15 \times 0.0005\, dx \\ &= 0.85 + 0.075 = 0.925 \end{aligned} \]
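\[ \begin{aligned} E[X] &= \int_{a_1}^{a_3} x\, f(x)\, dx = p_1 \int_{a_1}^{a_2} x\, f_1(x)\, dx + p_2 \int_{a_2}^{a_3} x\, f_2(x)\, dx \quad \text{(general)} \\ &= 0.85 \int_0^{1{,}000} 0.001x\, dx + 0.15 \int_{1{,}000}^{3{,}000} 0.0005x\, dx = 725 \end{aligned} \]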
♦♦
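The spliced model of Example 1.18 is easy to code directly. The sketch below writes the spliced PDF as a piecewise function and recovers the probability and mean by a midpoint rule; the helper integrate and the step count (chosen so that the splice point 1,000 falls on a grid boundary) are assumptions made for this illustration.

```python
def spliced_pdf(x, p1=0.85, p2=0.15):
    """Uniform on (0, 1000] with weight p1, uniform on (1000, 3000] with weight p2."""
    if 0.0 < x <= 1000.0:
        return p1 * 0.001
    if 1000.0 < x <= 3000.0:
        return p2 * 0.0005
    return 0.0

def integrate(f, a, b, n=30_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

total_probability = integrate(spliced_pdf, 0.0, 3000.0)
prob_below_2000 = integrate(spliced_pdf, 0.0, 2000.0)
mean_X = integrate(lambda x: x * spliced_pdf(x), 0.0, 3000.0)
```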
Truncation
As an example of truncation, consider the case where there is a deductible operating on an
insurance policy (covered in detail in Chapter 2). For example, suppose that for each ground up
loss X greater than 100, an insurer pays a claim Y, the excess of X over 100. So we have:
Y = ( X − 100 )| X > 100 = ( X | X > 100 ) − 100
The distributions of Y and X are closely related, but the only values of X that the insurer is aware
of are ones greater than 100. If a loss X is less than or equal to 100, there is no claim to be filed
and paid by the insurer. So for some X ’s there is no Y . There is a loss of information as you
move from X to Y. This is an example of truncation below at 100. In general, any variable Y that
is a conditional form of X, given that X falls in some interval, is obtained by truncating the
distribution of X. For some values of X, there will be no value of Y to observe.
We have already met this phenomenon in survival model theory. The future lifetime after age x
is obtained by truncating and shifting the distribution of the lifetime of a newborn:
T ( x ) = ( X − x ) | X > x = ( X| X > x ) − x
The main result that is needed when dealing with a random variable Y that is obtained by
truncating X is a method for finding the PDF of Y from the PDF of X. This involves a simple
argument with conditional probabilities.
Suppose that Y = X | X ∈ I for some interval I = ( a , b ) . We will first derive the relationship
between the two distribution functions.
For x ∈( a , b ) , we have:
In the discrete case we also have:
\[ f_Y(x) = \frac{f_X(x)}{\Pr(a < X < b)} \quad \text{for } a < x < b \]
Example 1.19
Suppose that $l_x = (80 - x)^2$ for 0 ≤ x ≤ 80 and let X be the corresponding lifetime of a newborn.
Determine the PDF of Y, the age at death of a newborn who is known to die after age 50.
Solution
\[ f_Y(x) = \frac{f_X(x)}{F_X(80) - F_X(50)} \quad \text{for } 50 < x \le 80 \]
\[ f_Y(x) = \frac{f_X(x)}{F_X(80) - F_X(50)} = \frac{2(80 - x)/80^2}{1 - \left( 1 - \dfrac{30^2}{80^2} \right)} = \frac{2(80 - x)}{30^2} \quad \text{for } 50 < x \le 80 \] ♦♦
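A brief numerical sketch of Example 1.19: divide the original PDF by the probability of the truncation interval and confirm that the result integrates to 1. The survival function is taken as l_x / l_0, and the function names and the midpoint-rule check are illustrative assumptions.

```python
def F_X(x):
    """CDF of the newborn lifetime: 1 - l_x / l_0 with l_x = (80 - x)^2."""
    return 1.0 - (80.0 - x) ** 2 / 80.0 ** 2

def f_X(x):
    return 2.0 * (80.0 - x) / 80.0 ** 2 if 0.0 <= x <= 80.0 else 0.0

def f_Y(x, a=50.0, b=80.0):
    """PDF of X truncated to the interval (a, b]."""
    if a < x <= b:
        return f_X(x) / (F_X(b) - F_X(a))
    return 0.0

# The truncated PDF should integrate to 1 over (50, 80] (midpoint rule).
n = 30_000
h = 30.0 / n
total_probability = sum(f_Y(50.0 + (i + 0.5) * h) * h for i in range(n))
```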
Censoring
Another important concept is censoring. If X is a random variable we defined Y = X ∧ n as:
\[ X \wedge n = \min\{X, n\} = \begin{cases} X & \text{if } X \le n \\ n & \text{if } X > n \end{cases} \]
Information is lost here in the sense that when you observe Y some X values are known
imprecisely.
The temporary life expectancy $e_{x:n}$ may be defined as $E[T(x) \wedge n]$, the expected value of a
censored form of the future lifetime at age x.
In loss models, a similar type of censoring occurs as follows. Suppose that X is the ground up
loss of a policyholder, and suppose the insurance payment per loss, Y, is X ∧ L . In other words,
losses less than L are fully reimbursed, but losses X that are bigger than L result in a
reimbursement of L. The number L is known as a policy limit (discussed further in Chapter 2).
Example 1.20
Suppose that a ground up loss X is uniformly distributed on the interval (0 , 1000] . Suppose
there is a policy limit of 500 per loss.
Determine the insurer’s expected payment per loss.
Solution
We are given:
f X ( x ) = 0.001 for 0 < x ≤ 1, 000 and Y = X ∧ 500 = min { X , 500}
\[ = \frac{500^2 \times 0.001}{2} + 500^2 \times 0.001 = 125 + 250 = 375 \] ♦♦
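The expected payment in Example 1.20 can be confirmed by integrating min(x, 500) against the uniform density. This is a sketch under the example's assumptions; the function name expected_payment and the midpoint rule are illustrative choices.

```python
def expected_payment(limit, upper=1000.0, n_steps=100_000):
    """Approximate E[X ^ limit] by a midpoint rule, with X uniform on (0, upper]."""
    h = upper / n_steps
    density = 1.0 / upper                 # f_X(x) = 0.001
    return sum(min((i + 0.5) * h, limit) * density * h for i in range(n_steps))

expected_with_limit = expected_payment(500.0)
expected_no_limit = expected_payment(1000.0)   # limit never binds: plain E[X]
```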
It is interesting to note that when a continuous random variable X is censored above at L, that is,
we observe Y = X ∧ L , the resulting variable has a mixed distribution. The PDF of Y is given by:
This is simply due to the fact that when X = x < L we have Y = x as well. And there is
obviously a point mass probability at Y = L , since this event is equivalent to the event X ≥ L that
has probability equal to sX ( L ) .
Question 1.1
A continuous random variable X has a moment generating function given by
$M_X(t) = (1 - 10t)^{-2}$ for t < 0.10 . Determine the mean and variance of X.
Question 1.2
The probability function for a discrete random variable N is given by:
\[ \Pr(N = k) = \binom{4}{k}\, 0.7^k\, 0.3^{4-k} \quad \text{for } k = 0, 1, 2, 3, 4 \]
Use the binomial theorem to obtain a closed form formula for the moment generating function of
N. Then use this formula to write down a formula for the probability generating function.
Question 1.3
If N 1 and N 2 are independent and identically distributed like N in Question 1.2, find the
probability generating function for N 1 + N 2 .
Question 1.4
In Example 1.6 we found $f_X^{*3}(x)$ where f X ( 1 ) = 0.7 and f X ( 2 ) = 0.3 . Use the results of this
Example along with the recursive method to calculate $f_X^{*4}(x)$.
Question 1.5
An insurer has a portfolio of 100 policies. For each policy there is an 80% chance that the total
annual loss is zero. There is a 20% chance that the total annual loss is uniformly distributed on
the interval [100 , 2,000] . Determine the mean, variance, and approximate 95-th percentile for
the insurer’s aggregate annual losses. Use the Central Limit Theorem to approximate the
distribution of aggregate annual losses.
Question 1.6
In a certain group of 23-year-olds the males have a mean height of 70 inches and a standard
deviation of 4 inches. The females have a mean height of 67 inches with a standard deviation of 3
inches. Males make up 45% of the group. Find the mean and standard deviation in the height of
a randomly selected member of this group.
Question 1.7
A portfolio consists of 4 policies. The individual loss amounts for a 1-year period and the policy
numbers are as follows:
57 (Policy #2) , 100 (Policy #4) , 90 (Policy #2) , 140 (Policy #1) , 30 (Policy #1)
Using the notation of Section 1.5, determine X1 , X 2 , X3 , X 4 and S . Then determine the values
of N , Y1 , Y2 ,... , YN and S .
Question 1.8
The annual frequency of losses from a portfolio, N, follows a Poisson distribution with mean
λ = 5:
\[ \Pr(N = k) = e^{-5}\,\frac{5^k}{k!} \quad k = 0, 1, 2, \ldots \]
The individual loss amounts, Y, follow an exponential distribution with PDF:
Question 1.9
A ground up loss follows the PDF:
\[ f_X(x) = \frac{2(1{,}000 - x)}{1{,}000^2} \quad \text{for } 0 \le x \le 1{,}000 \]
For each such loss by a policyholder an insurer pays Y. Y is 80% of the excess of the loss over 50,
if the loss exceeds 50, and with a maximal reimbursement of 500. Write down a formula for Y in
terms of X and draw the graph of this relation.
Question 1.10
For the loss model in Question 1.9, determine a formula for the CDF FY ( y ) and draw its graph.
Question 1.11
For the loss model in Question 1.9, the random variable Y has a mixed distribution. Give a
formula for fY ( y ) , identifying the discrete and continuous parts.
Question 1.12
Use the results of Question 1.11 to calculate the expected value of the payment per loss Y.
Question 1.13
Duplicate the results of Question 1.12 by viewing Y as a function of X.
Question 1.14
The individual loss amounts against a portfolio are uniformly distributed on [0 , m] . For 75% of
the losses we have m = 1, 000 . For the other 25% of losses we have m = 2, 000 . Write down a
formula for the PDF of a randomly selected loss from this portfolio.
Question 1.15
Using the Double Expectation Theorem, determine the mean and variance for the loss variable in
Question 1.14.
Question 1.16
The annual frequency of losses against a policy in a portfolio follows the Poisson probability
function:
\[ \Pr(N = k \mid \lambda) = e^{-\lambda}\,\frac{\lambda^k}{k!} \quad \text{for } k = 0, 1, 2, \ldots \]
Over the portfolio, λ varies uniformly from 0.2 to 1.0. Determine Pr ( N ≥ 2 ) for a randomly
selected policy from this portfolio.
Question 1.17
A loss X follows the exponential distribution with mean 50:
\[ f_X(x) = \frac{e^{-x/50}}{50} \quad \text{for } x > 0 \]
The random variable Y is equal to 1 /X . Calculate the PDF of Y (an inverse exponential
distribution).
Question 1.18
You are given:
f 1 ( x ) = 0.001 for 0 < x ≤ 1, 000 , p1 = 0.80
\[ f_2(x) = \frac{2 \times 1{,}000^2}{x^3} \quad \text{for } x > 1{,}000 , \quad p_2 = 0.20 \]
The distributions are spliced together to form a PDF f ( x ) . Write down a formula for f ( x ) and
compute the mean of the spliced distribution.
Question 1.19
Suppose that X follows the exponential distribution with mean 500:
\[ f_X(x) = \frac{e^{-x/500}}{500} \quad \text{for } x > 0 \]
If Y is obtained by truncating X below at 100, determine the PDF of Y and its expected value
$E[X \mid X > 100]$.
Question 1.20
Suppose that X follows the exponential distribution with mean 500:
\[ f_X(x) = \frac{e^{-x/500}}{500} \quad \text{for } x > 0 \]
If Y is obtained by censoring X above at 1,000, determine the PDF of Y (labelling the discrete and
continuous parts), and its expected value.
Appendix
Proof
The proof of the DET will assume continuous random variables but the results apply equally well
to discrete ones. Note first that we have:
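\[ E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_X(x \mid Y = y)\, dx \]
\[ \begin{aligned} E\big[ E[X \mid Y] \big] &= \int_{-\infty}^{\infty} E[X \mid Y = y]\, f_Y(y)\, dy \\ &= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} x\, f_X(x \mid Y = y)\, dx \right) f_Y(y)\, dy \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f_X(x \mid Y = y)\, f_Y(y)\, dx\, dy \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f_{XY}(x, y)\, dx\, dy \\ &= E[X] \end{aligned} \]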
For the variance formula we proceed as follows. First, note that a simple modification of the
argument above (ie replacing X by X 2 ) will result in:
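\[ E\big[ E[X^2 \mid Y] \big] = E[X^2] \]
Starting from the usual formula for the variance, we then have:
\[ \begin{aligned} \operatorname{var}(X) &= E[X^2] - (E[X])^2 \\ &= E\big[ E[X^2 \mid Y] \big] - \big( E\big[ E[X \mid Y] \big] \big)^2 \\ &= E\big[ \operatorname{var}(X \mid Y) + (E[X \mid Y])^2 \big] - \big( E\big[ E[X \mid Y] \big] \big)^2 \\ &= E\big[ \operatorname{var}(X \mid Y) \big] + \underbrace{E\big[ (E[X \mid Y])^2 \big] - \big( E\big[ E[X \mid Y] \big] \big)^2}_{\operatorname{var}( E[X \mid Y] )} \end{aligned} \]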
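Both identities can be illustrated numerically with the three-category Poisson portfolio used earlier in the chapter (parameters 0.20, 0.40, 1.00 with weights 0.35, 0.60, 0.05). The sketch below computes the mean and variance of N directly from the marginal probability function, truncating the sum at k = 59 (an assumption that is harmless here because the tail is negligible), and again from the two formulas just proved.

```python
from math import exp, factorial

mixing = {0.20: 0.35, 0.40: 0.60, 1.00: 0.05}   # Poisson parameter -> probability

def marginal_pmf(k):
    return sum(w * exp(-lam) * lam ** k / factorial(k) for lam, w in mixing.items())

# Direct calculation from the marginal distribution (sum truncated at k = 59).
ks = range(60)
mean_direct = sum(k * marginal_pmf(k) for k in ks)
var_direct = sum(k * k * marginal_pmf(k) for k in ks) - mean_direct ** 2

# Double expectation: for a Poisson, E[N | Lambda] = var(N | Lambda) = Lambda.
mean_det = sum(w * lam for lam, w in mixing.items())
expected_cond_var = sum(w * lam for lam, w in mixing.items())
var_cond_mean = sum(w * lam ** 2 for lam, w in mixing.items()) - mean_det ** 2
var_det = expected_cond_var + var_cond_mean
```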