An Introductory Guide in the Construction of

Actuarial Models:
A Preparation for the Actuarial Exam C/4

Marcel B. Finan
Arkansas Tech University
© All Rights Reserved
Preliminary Draft
Last Updated November 4, 2017

To my son Amin

Preface

This is the fifth in a series of lecture notes intended to help individuals
pass actuarial exams. The topics in this manuscript parallel the topics
tested on Exam C of the Society of Actuaries exam sequence. As with the
previous manuscripts, the main objective of the present manuscript is to
increase users’ understanding of the topics covered on the exam.

The flow of topics follows very closely that of Klugman et al. Loss Models:
From Data to Decisions. The lectures cover designated sections from this
book as suggested by the 2012 SOA Syllabus.

The recommended approach for using this manuscript is to read each section,
work on the embedded examples, and then try ALL the problems given
in the text. An answer key is available on request. Email:[email protected].

Problems taken from previous SOA/CAS exams will be indicated by the
symbol ‡.

This manuscript can be used for personal use or class use, but not for
commercial purposes. If you find any errors, I would appreciate hearing from
you: [email protected]

Marcel B. Finan
Russellville, Arkansas
February 15, 2013.

Contents

Preface

Actuarial Modeling
1 Understanding Actuarial Models

A Review of Probability Related Results
2 A Brief Review of Probability
3 A Review of Random Variables
4 Raw and Central Moments
5 Empirical Models, Excess and Limited Loss Variables
6 Median, Mode, Percentiles, and Quantiles
7 Sum of Random Variables and the Central Limit Theorem
8 Moment Generating Functions and Probability Generating Functions

Tail Weight of a Distribution
9 Tail Weight Measures: Moments and the Speed of Decay of S(x)
10 Tail Weight Measures: Hazard Rate Function and Mean Excess Loss Function
11 Equilibrium Distributions and Tail Weight

Risk Measures
12 Coherent Risk Measurement
13 Value-at-Risk
14 Tail-Value-at-Risk

Characteristics of Actuarial Models
15 Parametric and Scale Distributions
16 Discrete Mixture Distributions
17 Data-dependent Distributions

Generating New Distributions
18 Scalar Multiplication of Random Variables
19 Powers and Exponentiation of Random Variables
20 Continuous Mixing of Distributions
21 Frailty (Mixing) Models
22 Spliced Distributions
23 Limiting Distributions
24 The Linear Exponential Family of Distributions

Discrete Distributions
25 The Poisson Distribution
26 The Negative Binomial Distribution
27 The Bernoulli and Binomial Distributions
28 The (a, b, 0) Class of Discrete Distributions
29 The Class C(a, b, 1) of Discrete Distributions
30 The Extended Truncated Negative Binomial Model

Modifications of the Loss Random Variable
31 Ordinary Policy Deductibles
32 Franchise Policy Deductibles
33 The Loss Elimination Ratio and Inflation Effects for Ordinary Deductibles
34 Policy Limits
35 Combinations of Coinsurance, Deductibles, Limits, and Inflations
36 The Impact of Deductibles on the Number of Payments

Aggregate Loss Models
37 Individual Risk and Collective Risk Models
38 Aggregate Loss Distributions via Convolutions
39 Stop Loss Insurance
40 Closed Form of Aggregate Distributions
41 Distribution of S via the Recursive Method
42 Discretization of Continuous Severities
43 Individual Policy Modifications Impact on Aggregate Losses
44 Aggregate Losses for the Individual Risk Model
45 Approximating Probabilities in the Individual Risk Model

Review of Mathematical Statistics
46 Properties of Point Estimators
47 Interval Estimation
48 Hypothesis Testing

The Empirical Distribution for Complete Data
49 The Empirical Distribution for Individual Data
50 Empirical Distribution of Grouped Data

Estimation of Incomplete Data
51 The Risk Set of Incomplete Data
52 The Kaplan-Meier and Nelson-Aalen Estimators
53 Mean and Variance of Empirical Estimators with Complete Data
54 Greenwood Estimate for the Variance of the Kaplan-Meier Estimator
55 Variance Estimate of the Nelson-Aalen Estimator and Confidence Intervals
56 Kernel Density Estimation
57 The Kaplan-Meier Approximation for Large Data Sets

Methods of Parameter Estimation
58 Method of Moments and Matching Percentile
59 Maximum Likelihood Estimation for Complete Data
60 Maximum Likelihood Estimation for Incomplete Data
61 Asymptotic Variance of MLE
62 Information Matrix and the Delta Method
63 Non-Normal Confidence Intervals for Parameter Estimation
64 Basics of Bayesian Inference
65 Bayesian Parameter Estimation
66 Conjugate Prior Distributions
67 Estimation of Class (a, b, 0)
68 MLE with (a, b, 1) Class

Model Selection and Evaluation
69 Assessing Fitted Models Graphically
70 Kolmogorov-Smirnov Hypothesis Test of Fitted Models
71 Anderson-Darling Hypothesis Test of Fitted Models
72 The Chi-Square Goodness of Fit Test
73 The Likelihood Ratio Test
74 Schwarz Bayesian Criterion

Credibility Theory
75 Limited Fluctuation Credibility Approach: Full Credibility
76 Limited Fluctuation Credibility Approach: Partial Credibility
77 Greatest Accuracy Credibility Approach
78 Conditional Distributions and Expectation
79 Bayesian Credibility with Discrete Prior
80 Bayesian Credibility with Continuous Prior
81 Bühlmann Credibility Premium
82 The Bühlmann Model with Discrete Prior
83 The Bühlmann Model with Continuous Prior
84 The Bühlmann-Straub Credibility Model
85 Exact Credibility
86 Non-parametric Empirical Bayes Estimation for the Bühlmann Model
87 Non-parametric Empirical Bayes Estimation for the Bühlmann-Straub Model
88 Semiparametric Empirical Bayes Credibility Estimation

Basics of Stochastic Simulation
89 The Inversion Method for Simulating Random Variables
90 Applications of Simulation in Actuarial Modeling
91 Estimating Risk Measures Using Simulation
92 The Bootstrap Method for Estimating Mean Square Error

Answer Key

Exam C Tables

Bibliography

Index

Actuarial Modeling

This book is concerned with the construction and evaluation of actuarial
models. The purpose of this chapter is to define models in the actuarial
setting and suggest a process of building them.

1 Understanding Actuarial Models

Modeling is very common in actuarial applications. For example, life
insurance actuaries use models to arrive at the likely mortality rates of their
customers; car insurance actuaries use models to work out claim
probabilities by rating factors; pension fund actuaries use models to estimate
the contributions and investments they will need to meet their future
liabilities.

A “model” in actuarial applications is a simplified mathematical
description of a certain actuarial task. Actuarial models are used by actuaries to
form an opinion and recommend a course of action on contingencies relating
to uncertain future events.

Commonly used actuarial models are classified into two categories:

(I) Deterministic Models. These are models that produce a unique set
of outputs for a given set of inputs such as the future value of a deposit
in a savings account. In these models, the inputs and outputs don’t have
associated probability weightings.

(II) Stochastic or Probabilistic Models. In contrast to deterministic
models, these are models where the outputs and/or some of the inputs are
random variables. Examples of stochastic models that we will discuss in
this book are the asset model, the claims model, and the frequency-severity
model.

The book in [4] explains in great detail the advantages and disadvantages
of stochastic (versus deterministic) modeling.

Example 1.1
Determine whether each of the models below is deterministic or stochastic.
(a) The monthly payment P on a home or a car loan.
(b) A modification of the model in (a) is P + ξ, where ξ is a random variable
introduced to account for the possibility of a missed payment.

Solution.
(a) In this model, the element of randomness is absent. This model is a
deterministic one.
(b) Because of the presence of the random variable ξ, the given model is
stochastic.
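
The distinction in Example 1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level-payment amortization formula is standard, but the loan figures, the function names, and the choice of ξ (equal to −P with a small probability `miss_prob`, and 0 otherwise) are hypothetical and do not come from the text.

```python
import random


def monthly_payment(principal, annual_rate, n_months):
    """Deterministic model: level monthly payment P on an amortizing loan."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / n_months
    return principal * r / (1 - (1 + r) ** -n_months)


def received_payment(principal, annual_rate, n_months, miss_prob, rng):
    """Stochastic model P + xi (assumed: xi = -P with probability miss_prob, else 0)."""
    p = monthly_payment(principal, annual_rate, n_months)
    xi = -p if rng.random() < miss_prob else 0.0
    return p + xi


rng = random.Random(0)  # seeded only so the sketch is reproducible
p = monthly_payment(200_000, 0.06, 360)  # same inputs always give the same P
draws = [received_payment(200_000, 0.06, 360, 0.05, rng) for _ in range(5)]
print(round(p, 2))
print(draws)  # each draw is either P or 0, depending on the random xi
```

The first function returns the same output for the same inputs, while the second returns a random variable's realization, which is the difference between models (a) and (b).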

In [1], the following process for building an actuarial model is presented.

Phases of a Good Modeling Process

Good modeling requires a thorough understanding of the problem being
modeled. The following is a helpful checklist for a modeling process, which
is by no means complete:

Choice of Models. Appropriate models are selected based on the actuary’s
prior knowledge and experience and the nature of the available data.

Model Calibration. Available data and existing techniques are used to
calibrate a model.

Model Validation. Diagnostic tests are used to ensure the model meets its
objectives and adequately conforms to the data.

Adequacy of Models. There is a possibility that the models in the previous
stage are inadequate, in which case one considers other possible models.
Also, there is a possibility of having more than one adequate model.

Selection of Models. Based on some preset criteria, the best model will
be selected among all valid models.

Modification of Models. The model selected in the previous stage needs
to be constantly updated in light of new data and other changes.

Practice Problems
Problem 1.1
After an actuary is hired, his or her annual salary progression is modeled
according to the formula $S(t) = \$45{,}000\,e^{0.06t}$, where t is the number
of years of employment.

Determine whether this model is deterministic or stochastic.

Problem 1.2
In the previous model, a random variable ξ is introduced:
$S(t) = \$45{,}000\,e^{0.06t} + \xi$.

Determine whether this model is deterministic or stochastic.

Problem 1.3
Consider a model that depends on the movement of a stock market such as
the pricing of an option with an underlying stock.

Is this model considered deterministic or stochastic?

Problem 1.4
Consider a model that involves the life expectancy of a policyholder.

Is this model categorized as stochastic or deterministic?

Problem 1.5
Insurance companies use models to estimate their assets and liabilities.

Are these models considered deterministic or stochastic?

A Review of Probability Related Results

One aspect of insurance is that money is paid to policyholders upon the
occurrence of a random event. For example, a claim in an automobile
insurance policy will be filed whenever the insured auto is involved in a car
wreck. In this chapter a brief outline of the essential material from the
theory of probability is given. Almost all of the material presented here should
be familiar to the reader. A more thorough discussion can be found in [2]
and a listing of important results can be found in [3]. Probability concepts
that are not usually covered in an introductory probability course will be
introduced and discussed in further detail whenever needed.

2 A Brief Review of Probability

In probability, we consider experiments whose results cannot be predicted
with certainty. Examples of such experiments include rolling a die, flipping
a coin, and choosing a card from a deck of playing cards.

By an outcome or simple event we mean any result of the experiment.
For example, the experiment of rolling a die yields six outcomes, namely,
the outcomes 1, 2, 3, 4, 5, and 6.

The sample space Ω of an experiment is the set of all possible outcomes
for the experiment. For example, if you roll a die one time then the
experiment is the roll of the die. A sample space for this experiment could be
Ω = {1, 2, 3, 4, 5, 6} where each digit represents a face of the die.

An event is a subset of the sample space. For example, the event of rolling
an odd number with a die consists of three simple events {1, 3, 5}.

Example 2.1
Consider the random experiment of tossing a coin three times.
(a) Find the sample space of this experiment.
(b) Find the outcomes of the event of obtaining more than one head.

Solution.
We will use T for tail and H for head.
(a) The sample space is composed of eight simple events:

Ω = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}.

(b) The event of obtaining more than one head is the set

{THH, HTH, HHT, HHH}.
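
The enumeration in Example 2.1 can be reproduced mechanically. The sketch below is one possible way to do it in Python; the variable names are illustrative and not part of the text.

```python
from itertools import product

# Sample space of three coin tosses, each toss being T (tail) or H (head).
sample_space = ["".join(toss) for toss in product("TH", repeat=3)]

# Event: more than one head appears in the three tosses.
more_than_one_head = [w for w in sample_space if w.count("H") > 1]

print(sample_space)
print(more_than_one_head)
```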

The complement of an event E, denoted by $E^c$, is the set of all possible
outcomes not in E. The union of two events A and B is the event A ∪ B
whose outcomes are in A or in B (or in both). The intersection of two events
A and B is the event A ∩ B whose outcomes are outcomes of both events A
and B.

Two events A and B are said to be mutually exclusive if they have no
outcomes in common. Clearly, for any event E, the events E and $E^c$ are
mutually exclusive.
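
These set operations map directly onto Python's built-in `set` type. The sketch below uses the die sample space and the odd-number event from the text; the event `B` and the helper `mutually_exclusive` are hypothetical, chosen only for illustration.

```python
omega = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {1, 3, 5}                # event: rolling an odd number
B = {1, 2}                   # hypothetical event, for illustration only

A_c = omega - A              # complement of A
union = A | B                # outcomes in A or in B (or both)
inter = A & B                # outcomes in both A and B


def mutually_exclusive(E, F):
    """Two events are mutually exclusive when they share no outcome."""
    return E.isdisjoint(F)


print(A_c, union, inter)
print(mutually_exclusive(A, A_c), mutually_exclusive(A, B))
```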

Remark 2.1
The above definitions of intersection, union, and mutually exclusive can be
extended to any number of events.

Probability Axioms
Probability is a measure of how likely an event is to occur. It is a function
Pr(·) defined on the collection of all events (subsets) of a sample space Ω
that satisfies the Kolmogorov axioms:

Axiom 1: For any event E ⊆ Ω, 0 ≤ Pr(E) ≤ 1.

Axiom 2: Pr(Ω) = 1.
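Axiom 3: For any sequence of mutually exclusive events $\{E_n\}_{n \ge 1}$, that is,
$E_i \cap E_j = \emptyset$ for $i \ne j$, we have
$$\Pr\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} \Pr(E_n). \quad \text{(Countable Additivity)}$$

If we let $E_1 = \Omega$ and $E_n = \emptyset$ for $n > 1$, then by Axioms 2 and 3 we have
$$1 = \Pr(\Omega) = \Pr\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} \Pr(E_n) = \Pr(\Omega) + \sum_{n=2}^{\infty} \Pr(\emptyset).$$
This implies that $\Pr(\emptyset) = 0$. Also, if $\{E_1, E_2, \cdots, E_n\}$ is a finite set of mutually
exclusive events, then by defining $E_k = \emptyset$ for $k > n$ and applying Axiom 3 we find
$$\Pr\left(\bigcup_{k=1}^{n} E_k\right) = \sum_{k=1}^{n} \Pr(E_k).$$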

Any function Pr that satisfies Axioms 1-3 will be called a probability
measure.

Example 2.2
Consider the sample space Ω = {1, 2, 3}. Suppose that Pr({1, 3}) = 0.3
and Pr({2, 3}) = 0.8. Find Pr(1), Pr(2), and Pr(3). Is Pr a valid probability
measure?

Solution.
For Pr to be a probability measure we must have Pr(1) + Pr(2) + Pr(3) = 1.
But Pr({1, 3}) = Pr(1) + Pr(3) = 0.3. This implies that 0.3 + Pr(2) = 1
or Pr(2) = 0.7. Similarly, 1 = Pr({2, 3}) + Pr(1) = 0.8 + Pr(1) and so
Pr(1) = 0.2. It follows that Pr(3) = 1 − Pr(1) − Pr(2) = 1 − 0.2 − 0.7 = 0.1.
It can be easily seen that Pr satisfies Axioms 1-3 and so Pr is a probability
measure.
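
The arithmetic of Example 2.2 can be checked with exact fractions. The following is a minimal sketch; the variable names are illustrative, and the final check covers only the finite case of the axioms (non-negativity, total mass one, and consistency with the two given probabilities).

```python
from fractions import Fraction as F

pr_13 = F(3, 10)   # given: Pr({1, 3}) = 0.3
pr_23 = F(8, 10)   # given: Pr({2, 3}) = 0.8

p2 = 1 - pr_13     # Pr(2) = 1 - Pr({1, 3})
p1 = 1 - pr_23     # Pr(1) = 1 - Pr({2, 3})
p3 = 1 - p1 - p2   # Pr(3) = 1 - Pr(1) - Pr(2)
pr = {1: p1, 2: p2, 3: p3}

is_valid = (
    all(0 <= v <= 1 for v in pr.values())
    and sum(pr.values()) == 1
    and pr[1] + pr[3] == pr_13
    and pr[2] + pr[3] == pr_23
)
print(pr, is_valid)
```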

Probability Trees
For all multistage experiments, the probability of the outcome along any
path of a tree diagram is equal to the product of all the probabilities along
the path.

Example 2.3
In a city council, 35% of the members are female, and the other 65% are
male. 70% of the males favor raising city sales tax, while only 40% of the
females favor the increase. If a member of the council is selected at random,
what is the probability that he or she favors raising sales tax?

Solution.
Figure 2.1 shows a tree diagram for this problem.

Figure 2.1

The first and third branches correspond to favoring the tax. We add their
probabilities:
Pr(tax) = (0.65)(0.70) + (0.35)(0.40) = 0.455 + 0.14 = 0.595.
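
The path-product rule for tree diagrams can be sketched in Python using the numbers of Example 2.3. The nested-dictionary layout and the names below are assumptions made for illustration; the "oppose" branch probabilities are simply the complements of the stated "favor" probabilities.

```python
# Each first-stage branch carries its probability and its second-stage branches.
tree = {
    "male": (0.65, {"favor": 0.70, "oppose": 0.30}),
    "female": (0.35, {"favor": 0.40, "oppose": 0.60}),
}

# Probability of a path = product of the probabilities along the path.
path_probs = {
    (gender, view): p_gender * p_view
    for gender, (p_gender, branches) in tree.items()
    for view, p_view in branches.items()
}

# Add the probabilities of all paths that end in "favor".
pr_favor = sum(p for (gender, view), p in path_probs.items() if view == "favor")
print(path_probs)
print(round(pr_favor, 3))
```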
Conditional Probability and Bayes Formula
Consider the question of finding the probability of an event A given that
another event B has occurred. Knowing that the event B has occurred causes
us to update the probabilities of other events in the sample space.

To illustrate, suppose you roll two dice of different colors; one red, and
one green. You roll each die one at a time. Our sample space has 36
outcomes. The probability of getting two ones is $\frac{1}{36}$. Now, suppose you were
told that the green die shows a one but know nothing about the red die.
What would be the probability of getting two ones? In this case, the answer
is $\frac{1}{6}$. This shows that the probability of getting two ones changes if you have
partial information, and we refer to this (altered) probability as a
conditional probability.

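If the occurrence of the event A depends on the occurrence of B then the
conditional probability will be denoted by Pr(A|B), read as the probability
of A given B. When all outcomes are equally likely, it is given by
$$\Pr(A|B) = \frac{\text{number of outcomes corresponding to event } A \text{ and } B}{\text{number of outcomes of } B}.$$
Thus, writing n(E) for the number of outcomes in an event E and S for the
sample space,
$$\Pr(A|B) = \frac{n(A \cap B)}{n(B)} = \frac{n(A \cap B)/n(S)}{n(B)/n(S)} = \frac{\Pr(A \cap B)}{\Pr(B)}$$
provided that Pr(B) > 0.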
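
The two-dice illustration can be verified by counting outcomes. The sketch below assumes equally likely outcomes and represents each outcome as a pair (red, green); the helper names `pr` and `pr_given` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes, written as (red, green).
omega = list(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in omega if event(s)), len(omega))


def pr_given(a, b):
    """Conditional probability Pr(A|B) = Pr(A and B) / Pr(B), for Pr(B) > 0."""
    return pr(lambda s: a(s) and b(s)) / pr(b)


def two_ones(s):
    return s == (1, 1)


def green_one(s):
    return s[1] == 1


print(pr(two_ones))                   # unconditional probability of two ones
print(pr_given(two_ones, green_one))  # probability given the green die shows a one
```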

Example 2.4
Let A denote the event “an immigrant is male” and let B denote the event
“an immigrant is Brazilian”. In a group of 100 immigrants, suppose 60
are Brazilians, and suppose that 10 of the Brazilians are males. Find the
probability that a randomly picked Brazilian immigrant is male, that is,
find Pr(A|B).

Solution.
Since 10 out of 100 in the group are both Brazilian and male,
$\Pr(A \cap B) = \frac{10}{100} = 0.1$. Also, 60 out of the 100 are Brazilians, so
$\Pr(B) = \frac{60}{100} = 0.6$. Hence, $\Pr(A|B) = \frac{0.1}{0.6} = \frac{1}{6}$.

It is often the case that we know the probabilities of certain events
conditional on other events, but what we would like to know is the “reverse”.
That is, given Pr(A|B) we would like to find Pr(B|A).

Bayes’ formula is a simple mathematical formula used for calculating Pr(B|A)
given Pr(A|B). We derive this formula as follows. Let A and B be two events.
Then
$$A = A \cap (B \cup B^c) = (A \cap B) \cup (A \cap B^c).$$
Since the events $A \cap B$ and $A \cap B^c$ are mutually exclusive, we can write
$$\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c) = \Pr(A|B)\Pr(B) + \Pr(A|B^c)\Pr(B^c). \tag{2.1}$$

Example 2.5
A soccer match may be delayed because of bad weather. The probabilities
are 0.60 that there will be bad weather, 0.85 that the game will take place
if there is no bad weather, and 0.35 that the game will be played if there is
bad weather. What is the probability that the match will occur?

Solution.
Let A be the event that the game will be played and B the event that
there will be bad weather. We are given $\Pr(B) = 0.60$, $\Pr(A|B^c) = 0.85$,
and $\Pr(A|B) = 0.35$. From Equation (2.1) we find
$$\Pr(A) = \Pr(B)\Pr(A|B) + \Pr(B^c)\Pr(A|B^c) = (0.60)(0.35) + (0.40)(0.85) = 0.55.$$
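
Equation (2.1) translates into a one-line function. The sketch below applies it to the numbers of Example 2.5; the function and argument names are illustrative only.

```python
def total_probability(pr_b, pr_a_given_b, pr_a_given_not_b):
    """Equation (2.1): Pr(A) = Pr(A|B)Pr(B) + Pr(A|B^c)Pr(B^c)."""
    return pr_a_given_b * pr_b + pr_a_given_not_b * (1 - pr_b)


# Example 2.5: B = bad weather, A = the match is played.
pr_match = total_probability(pr_b=0.60, pr_a_given_b=0.35, pr_a_given_not_b=0.85)
print(round(pr_match, 2))
```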

From Equation (2.1) we can get Bayes’ formula:

$$\Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)} = \frac{\Pr(A|B)\Pr(B)}{\Pr(A|B)\Pr(B) + \Pr(A|B^c)\Pr(B^c)}. \tag{2.2}$$
Example 2.6
A factory uses two machines A and B for making socks. Machine A produces
10% of the total production of socks while machine B produces the remaining
90%. Now, 1% of all the socks produced by A are defective while 5% of all
the socks produced by B are defective. Find the probability that a sock
taken at random from a day’s production was made by machine A,
given that it is defective.
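
One way to work Example 2.6 numerically is to apply Equation (2.2) with "made by machine A" in the role of the conditioned event and "defective" in the role of the observed event. The sketch below does exactly that; the function name and argument names are hypothetical, and the complement of "made by A" is taken to be "made by B" because the two machines account for all production.

```python
def bayes_two_events(prior, like_given, like_given_not):
    """Equation (2.2) for a hypothesis H and evidence E:
    Pr(H|E) = Pr(E|H)Pr(H) / (Pr(E|H)Pr(H) + Pr(E|H^c)Pr(H^c))."""
    joint = like_given * prior
    return joint / (joint + like_given_not * (1 - prior))


# Example 2.6: Pr(made by A) = 0.10, Pr(defective | A) = 0.01,
# Pr(defective | B) = 0.05, with B the only alternative to A.
pr_A_given_defective = bayes_two_events(0.10, 0.01, 0.05)
print(round(pr_A_given_defective, 4))
```

Under these inputs the ratio is 0.001/0.046, so only a small share of defective socks comes from machine A even though its defect rate is not zero.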

Theorem 2.1 (Bayes’ Theorem)

Suppose that the sample space Ω is the union of mutually exclusive events
$H_1, H_2, \cdots, H_n$ with $\Pr(H_i) > 0$ for each i. Then for any event A and
$1 \le i \le n$ we have
$$\Pr(H_i|A) = \frac{\Pr(A|H_i)\Pr(H_i)}{\Pr(A)}$$
where
$$\Pr(A) = \Pr(H_1)\Pr(A|H_1) + \Pr(H_2)\Pr(A|H_2) + \cdots + \Pr(H_n)\Pr(A|H_n).$$
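
Theorem 2.1 can be sketched as a short function that takes the priors $\Pr(H_i)$ and the likelihoods $\Pr(A|H_i)$ and returns all posteriors at once. The function name and the three-hypothesis numbers below are hypothetical and chosen only to illustrate the computation; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F


def posterior(priors, likelihoods):
    """Return [Pr(H_i|A)] given priors Pr(H_i) and likelihoods Pr(A|H_i)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)  # Pr(A), by the law of total probability
    if total == 0:
        raise ValueError("Pr(A) is zero; the posterior is undefined")
    return [j / total for j in joints]


# Hypothetical inputs, for illustration only.
priors = [F(1, 2), F(3, 10), F(1, 5)]
likelihoods = [F(1, 100), F(2, 100), F(3, 100)]
post = posterior(priors, likelihoods)
print(post)
```

The posteriors always sum to one, since each is a joint probability divided by their common total.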

Example 2.7
A survey is taken in Oklahoma, Kansas, and Arkansas. In Oklahoma, 50%
of those surveyed support raising taxes; in Kansas, 60% support a tax
increase; and in Arkansas only 35% favor the increase. Of the total population
of the three states, 40% live in Oklahoma, 25% live in Kansas, and 35% live in
Arkansas. Given that a surveyed person is in favor of raising taxes, what is
the probability that he/she lives in Kansas?

Solution.
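Let $L_I$ denote the event that a surveyed person lives in state I, where I
= OK, KS, AR. Let S denote the event that a surveyed person favors a tax
increase. We want to find $\Pr(L_{KS}|S)$. By Bayes’ formula we have
$$\Pr(L_{KS}|S) = \frac{\Pr(S|L_{KS})\Pr(L_{KS})}{\Pr(S|L_{OK})\Pr(L_{OK}) + \Pr(S|L_{KS})\Pr(L_{KS}) + \Pr(S|L_{AR})\Pr(L_{AR})} = \frac{(0.6)(0.25)}{(0.5)(0.4) + (0.6)(0.25) + (0.35)(0.35)} \approx 0.3175.$$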

Practice Problems
Problem 2.1
Consider the sample space of rolling a die. Let A be the event of rolling
an even number, B the event of rolling an odd number, and C the event of
rolling a 2.

Find
(a) $A^c$, $B^c$, and $C^c$.
(b) A ∪ B, A ∪ C, and B ∪ C.
(c) A ∩ B, A ∩ C, and B ∩ C.
(d) Which events are mutually exclusive?

Problem 2.2
If, for a given experiment, $O_1, O_2, O_3, \cdots$ is an infinite sequence of outcomes,
verify that
$$\Pr(O_i) = \left(\frac{1}{2}\right)^i, \quad i = 1, 2, 3, \cdots$$
is a probability measure.

Problem 2.3 ‡
An insurer offers a health plan to the employees of a large company. As
part of this plan, the individual employees may choose exactly two of the
supplementary coverages A, B, and C, or they may choose no supplementary
coverage. The proportions of the company’s employees that choose
coverages A, B, and C are $\frac{1}{4}$, $\frac{1}{3}$, and $\frac{5}{12}$, respectively.

Determine the probability that a randomly chosen employee will choose
no supplementary coverage.

Problem 2.4
A toll has two crossing lanes. Let A be the event that the first lane
is busy, and let B be the event that the second lane is busy. Assume that
Pr(A) = 0.2, Pr(B) = 0.3 and Pr(A ∩ B) = 0.06.

Find the probability that neither lane is busy.

Hint: Recall the identity

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).


Problem 2.5
If a person visits a car service center, suppose that the probability that he
will have his oil changed is 0.44, the probability that he will have a tire
replacement is 0.24, the probability that he will have an air filter replacement
is 0.21, the probability that he will have oil changed and a tire replaced is
0.08, the probability that he will have oil changed and air filter changed is
0.11, the probability that he will have a tire and air filter replaced is 0.07,
and the probability that he will have oil changed, a tire replacement, and
an air filter changed is 0.03.

What is the probability that at least one of these things is done to the car?
Recall that

Pr(A∪B∪C) = Pr(A)+Pr(B)+Pr(C)−Pr(A∩B)−Pr(A∩C)−Pr(B∩C)+Pr(A∩B∩C)

Problem 2.6 ‡
A survey of a group’s viewing habits over the last year revealed the following
information:
(i) 28% watched gymnastics
(ii) 29% watched baseball
(iii) 19% watched soccer
(iv) 14% watched gymnastics and baseball
(v) 12% watched baseball and soccer
(vi) 10% watched gymnastics and soccer
(vii) 8% watched all three sports.
Find the probability that a viewer watched none of the three sports during
the last year.

Problem 2.7 ‡
The probability that a visit to a primary care physician’s (PCP) office
results in neither lab work nor referral to a specialist is 35%. Of those coming
to a PCP’s office, 30% are referred to specialists and 40% require lab work.

Determine the probability that a visit to a PCP’s office results in both
lab work and referral to a specialist.

Problem 2.8 ‡
You are given Pr(A ∪ B) = 0.7 and Pr(A ∪ B c ) = 0.9.

Determine Pr(A).

Problem 2.9 ‡
Among a large group of patients recovering from shoulder injuries, it is found
that 22% visit both a physical therapist and a chiropractor, whereas 12%
visit neither of these. The probability that a patient visits a chiropractor
exceeds by 14% the probability that a patient visits a physical therapist.

Determine the probability that a randomly chosen member of this group
visits a physical therapist.

Problem 2.10 ‡
In modeling the number of claims filed by an individual under an
automobile policy during a three-year period, an actuary makes the simplifying
assumption that for all integers $n \ge 0$, $p_{n+1} = \frac{1}{5} p_n$, where $p_n$ represents the
probability that the policyholder files n claims during the period.

Under this assumption, what is the probability that a policyholder files
more than one claim during the period?

Problem 2.11
An urn contains three red balls and two blue balls. You draw two balls
without replacement. Construct a probability tree diagram that represents
the various outcomes that can occur.

What is the probability that the first ball is red and the second ball is
blue?

Problem 2.12
Repeat the previous exercise but this time replace the first ball before
drawing the second.

Problem 2.13 ‡
A public health researcher examines the medical records of a group of 937
men who died in 1999 and discovers that 210 of the men died from causes
related to heart disease. Moreover, 312 of the 937 men had at least one
parent who suffered from heart disease, and, of these 312 men, 102 died from
causes related to heart disease.

Determine the probability that a man randomly selected from this group
died of causes related to heart disease, given that neither of his parents
suffered from heart disease.

Problem 2.14 ‡
An actuary is studying the prevalence of three health risk factors, denoted
by A, B, and C, within a population of women. For each of the three
factors, the probability is 0.1 that a woman in the population has only this risk
factor (and no others). For any two of the three factors, the probability is
0.12 that she has exactly these two risk factors (but not the other). The
probability that a woman has all three risk factors, given that she has A
and B, is $\frac{1}{3}$.

What is the probability that a woman has none of the three risk factors,
given that she does not have risk factor A?

Problem 2.15 ‡
An auto insurance company insures drivers of all ages. An actuary compiled
the following statistics on the company’s insured drivers:
Age of     Probability    Portion of Company’s
Driver     of Accident    Insured Drivers
16 - 20    0.06           0.08
21 - 30    0.03           0.15
31 - 65    0.02           0.49
66 - 99    0.04           0.28
A randomly selected driver that the company insures has an accident.

Calculate the probability that the driver was age 16-20.

Problem 2.16 ‡
An insurance company issues life insurance policies in three separate
categories: standard, preferred, and ultra-preferred. Of the company’s
policyholders, 50% are standard, 40% are preferred, and 10% are ultra-preferred.
Each standard policyholder has probability 0.010 of dying in the next year,
each preferred policyholder has probability 0.005 of dying in the next year,
and each ultra-preferred policyholder has probability 0.001 of dying in the
next year.
A policyholder dies in the next year.

What is the probability that the deceased policyholder was ultra-preferred?

Problem 2.17 ‡
Upon arrival at a hospital’s emergency room, patients are categorized
according to their condition as critical, serious, or stable. In the past year:
(i) 10% of the emergency room patients were critical;
(ii) 30% of the emergency room patients were serious;
(iii) the rest of the emergency room patients were stable;
(iv) 40% of the critical patients died;
(vi) 10% of the serious patients died; and
(vii) 1% of the stable patients died.
Given that a patient survived, what is the probability that the patient was
categorized as serious upon arrival?

Problem 2.18 ‡
A health study tracked a group of persons for five years. At the beginning
of the study, 20% were classified as heavy smokers, 30% as light smokers,
and 50% as nonsmokers.
Results of the study showed that light smokers were twice as likely as
nonsmokers to die during the five-year study, but only half as likely as heavy
smokers.
A randomly selected participant from the study died over the five-year
period.

Calculate the probability that the participant was a heavy smoker.

Problem 2.19 ‡
An actuary studied the likelihood that different types of drivers would be
involved in at least one collision during any one-year period. The results of
the study are presented below.
Type of        Percentage of    Probability of at
driver         all drivers      least one collision
Teen           8%               0.15
Young adult    16%              0.08
Midlife        45%              0.04
Senior         31%              0.05
Total          100%
Given that a driver has been involved in at least one collision in the past
year, what is the probability that the driver is a young adult driver?

Problem 2.20 ‡
A blood test indicates the presence of a particular disease 95% of the time
when the disease is actually present. The same test indicates the presence
of the disease 0.5% of the time when the disease is not present. One percent
of the population actually has the disease.

Calculate the probability that a person has the disease given that the test
indicates the presence of the disease.

3 A Review of Random Variables

Let Ω be the sample space of an experiment. Any function X : Ω −→ R
is called a random variable; its support is the range of X. The function
notation X(s) = x means that the random variable X assigns the value x
to the outcome s.

We consider three types of random variables: discrete, continuous, and
mixed random variables.

A random variable is called discrete if its support is either finite or
countably infinite. For example, in the experiment of rolling two dice, let X be
the random variable that adds the two faces. The support of X is the finite
set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. An example of an infinite discrete random
variable is the random variable that counts the number of times you play
a lottery until you win. For such a random variable, the support is the set N.

A random variable is called continuous if its support is uncountable. An
example of a continuous random variable is the random variable X that
gives a randomly chosen number between 2 and 3 inclusive. For such a
random variable the support is the interval [2, 3].

A mixed random variable is partly discrete and partly continuous. An
example of a mixed random variable is the random variable X : (0, 1) −→ R
defined by
$$X(s) = \begin{cases} 1 - s, & 0 < s < \frac{1}{2} \\ \frac{1}{2}, & \frac{1}{2} \le s < 1. \end{cases}$$
We use upper-case letters X, Y, Z, etc. to represent random variables. We
use lower-case letters x, y, z, etc. to represent possible values that the
corresponding random variables X, Y, Z, etc. can take. The statement X = x
defines an event consisting of all outcomes whose X-measurement equals x,
that is, the set {s ∈ Ω : X(s) = x}.

Example 3.1
State whether the random variables are discrete, continuous, or mixed.
(a) A coin is tossed ten times. The random variable X is the number of
heads that are noted.
(b) A coin is tossed repeatedly. The random variable X is the number of
times needed to get the first head.
(c) X : (0, 1) −→ R defined by X(s) = 2s − 1.
(d) X : (0, 1) −→ R defined by X(s) = 2s − 1 for 0 < s < 1/2 and X(s) = 1
for 1/2 ≤ s < 1.

Solution.
(a) The support of X is {0, 1, 2, · · · , 10}. X is an example of a finite discrete
random variable.
(b) The support of X is N. X is an example of a countably infinite discrete
random variable.
(c) The support of X is the open interval (−1, 1). X is an example of a
continuous random variable.
(d) X is mixed: it is continuous on (0, 1/2) and discrete on [1/2, 1)

Because the value of a random variable is determined by the outcome of
the experiment, we can find the probability that the random variable takes
on each value. The notation Pr(X = x) stands for the probability of the
event {s ∈ Ω : X(s) = x}.

There are five key functions used in describing a random variable:

Probability Mass Function (PMF)

For a discrete random variable X, the distribution of X is described by the
probability function (pf) or the probability mass function (pmf) given
by the equation
p(x) = Pr(X = x).

That is, a probability mass function (pmf) gives the probability that a dis-
crete random variable is exactly equal to some value. Note that the domain
of the pf is the support of the corresponding random variable. The pmf can
be an equation, a table, or a graph that shows how probability is assigned
to possible values of the random variable.

Example 3.2
Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities
associated with each outcome are described by the following table:

x 1 2 3 4
p(x) 0.1 0.3 0.4 0.2

Draw the probability histogram.

Solution.
The probability histogram is shown in Figure 3.1

Figure 3.1

Example 3.3
A committee of m is to be selected from a group consisting of x men and y
women. Let X be the random variable that represents the number of men
in the committee. Find p(n) for 0 ≤ n ≤ m.

Solution.
For 0 ≤ n ≤ m, we have
$$p(n) = \frac{\binom{x}{n}\binom{y}{m-n}}{\binom{x+y}{m}}$$

Note that if the support of a discrete random variable is Support = {x1 , x2 , · · · }
then
$$p(x) \ge 0, \quad x \in \text{Support}$$
$$p(x) = 0, \quad x \notin \text{Support}$$
Moreover,
$$\sum_{x \in \text{Support}} p(x) = 1.$$
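
The two pmf properties above can be checked numerically for the committee pmf of Example 3.3. The sketch below is only an illustration: the group sizes (6 men, 4 women, a committee of 3) and the function name `committee_pmf` are hypothetical choices, not values from the text.

```python
from math import comb

def committee_pmf(n, x, y, m):
    """Probability of exactly n men when m people are drawn from x men and y women."""
    return comb(x, n) * comb(y, m - n) / comb(x + y, m)

# Hypothetical sizes, chosen only for illustration.
x, y, m = 6, 4, 3
pmf = {n: committee_pmf(n, x, y, m) for n in range(m + 1)}
total = sum(pmf.values())

print(pmf)
print(total)
```

Every value is nonnegative and the values add up to 1 (up to floating-point rounding), as the properties require.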

Probability Density Function

Associated with a continuous random variable X is a nonnegative function
f (not necessarily continuous) defined for all real numbers and having the
property that for any set B of real numbers we have
$$\Pr(X \in B) = \int_B f(x)\,dx.$$
We call the function f the probability density function (abbreviated
pdf) of the random variable X.

If we let B = (−∞, ∞) = R then
$$\int_{-\infty}^{\infty} f(x)\,dx = \Pr[X \in (-\infty, \infty)] = 1.$$
Now, if we let B = [a, b] then
$$\Pr(a \le X \le b) = \int_a^b f(x)\,dx.$$

That is, areas under the probability density function represent probabilities
as illustrated in Figure 3.2.

Figure 3.2

Now, if we let a = b in the previous formula we find
$$\Pr(X = a) = \int_a^a f(x)\,dx = 0.$$
It follows from this result that
$$\Pr(a \le X < b) = \Pr(a < X \le b) = \Pr(a < X < b) = \Pr(a \le X \le b)$$
and
$$\Pr(X \le a) = \Pr(X < a) \quad \text{and} \quad \Pr(X \ge a) = \Pr(X > a).$$

Example 3.4
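Suppose that the function f (t) defined below is the density function of some
random variable X.
$$f(t) = \begin{cases} e^{-t}, & t \ge 0, \\ 0, & t < 0. \end{cases}$$
Compute P (−10 ≤ X ≤ 10).

Solution.
$$\begin{aligned} P(-10 \le X \le 10) &= \int_{-10}^{10} f(t)\,dt = \int_{-10}^{0} f(t)\,dt + \int_{0}^{10} f(t)\,dt \\ &= \int_{0}^{10} e^{-t}\,dt = \left. -e^{-t} \right|_{0}^{10} = 1 - e^{-10} \end{aligned}$$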
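
As a rough numerical check of Example 3.4, the area under the density on [−10, 10] can be approximated with a midpoint rule and compared with the closed form 1 − e^{−10}. The helper name `midpoint_integral` and the step count are assumptions made for this sketch.

```python
import math

def density(t):
    """Density from Example 3.4: e^(-t) for t >= 0 and 0 otherwise."""
    return math.exp(-t) if t >= 0 else 0.0

def midpoint_integral(f, a, b, steps=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

approx = midpoint_integral(density, -10.0, 10.0)
exact = 1.0 - math.exp(-10.0)
print(approx, exact)
```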

Cumulative Distribution Function

The cumulative distribution function (abbreviated cdf) F (t) of a
random variable X is defined as follows
$$F(t) = \Pr(X \le t)$$
i.e., F (t) is equal to the probability that the variable X assumes values
which are less than or equal to t.

For a discrete random variable, the cumulative distribution function is found
by summing up the probabilities. That is,
$$F(t) = \Pr(X \le t) = \sum_{x \le t} p(x).$$

Example 3.5
Given the following pmf
$$p(x) = \begin{cases} 1, & \text{if } x = a \\ 0, & \text{otherwise.} \end{cases}$$

Find a formula for F (x) and sketch its graph.

Solution.
A formula for F (x) is given by
$$F(x) = \begin{cases} 0, & \text{if } x < a \\ 1, & \text{otherwise} \end{cases}$$
Its graph is given in Figure 3.3

Figure 3.3
For discrete random variables the cumulative distribution function will al-
ways be a step function with jumps at each value of x that has probability
greater than 0. Note that the value of F (x) is assigned to the top of the jump.

For a continuous random variable, the cumulative distribution function is
given by
$$F(t) = \int_{-\infty}^{t} f(y)\,dy.$$
Geometrically, F (t) is the area under the graph of f to the left of t.

Example 3.6
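Find the distribution functions corresponding to the following density
functions:
$$\text{(a) } f(x) = \frac{1}{\pi(1 + x^2)}, \quad -\infty < x < \infty.$$
$$\text{(b) } f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad -\infty < x < \infty.$$

Solution.
(a)
$$\begin{aligned} F(x) &= \int_{-\infty}^{x} \frac{1}{\pi(1 + y^2)}\,dy = \left. \frac{1}{\pi} \arctan y \right|_{-\infty}^{x} \\ &= \frac{1}{\pi} \arctan x - \frac{1}{\pi} \cdot \frac{-\pi}{2} = \frac{1}{\pi} \arctan x + \frac{1}{2}. \end{aligned}$$
(b)
$$\begin{aligned} F(x) &= \int_{-\infty}^{x} \frac{e^{-y}}{(1 + e^{-y})^2}\,dy \\ &= \left. \frac{1}{1 + e^{-y}} \right|_{-\infty}^{x} = \frac{1}{1 + e^{-x}} \end{aligned}$$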

Next, we list the properties of the cumulative distribution function F (x) for
any random variable X.

Theorem 3.1
The cumulative distribution function of a random variable X satisfies the
following properties:
(a) 0 ≤ F (x) ≤ 1.
(b) F (x) is a non-decreasing function, i.e. if a < b then F (a) ≤ F (b).
(c) F (x) → 0 as x → −∞ and F (x) → 1 as x → ∞.
(d) F is right-continuous.

In addition to the above properties, a continuous random variable satisfies
these properties:
(e) F′(x) = f (x).
(f) F (x) is continuous.

For a discrete random variable with support {x1 < x2 < · · · } we have
$$p(x_i) = F(x_i) - F(x_{i-1}), \qquad (3.1)$$
where F(x_0) is taken to be 0.

Example 3.7
If the distribution function of X is given by
$$F(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{16}, & 0 \le x < 1 \\ \frac{5}{16}, & 1 \le x < 2 \\ \frac{11}{16}, & 2 \le x < 3 \\ \frac{15}{16}, & 3 \le x < 4 \\ 1, & x \ge 4 \end{cases}$$

find the pmf of X.

Solution.
Using (3.1), we get p(0) = 1/16, p(1) = 1/4, p(2) = 3/8, p(3) = 1/4, and
p(4) = 1/16, and 0 otherwise
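
The differencing in (3.1) can be sketched in a few lines. The list `cdf_steps` below simply transcribes the jump points and cdf values of Example 3.7; the variable names are illustrative.

```python
from fractions import Fraction

# Jump points of F and the value F takes from each point onward (Example 3.7).
cdf_steps = [
    (0, Fraction(1, 16)),
    (1, Fraction(5, 16)),
    (2, Fraction(11, 16)),
    (3, Fraction(15, 16)),
    (4, Fraction(1)),
]

pmf = {}
previous = Fraction(0)  # F is 0 to the left of the first jump
for x, value in cdf_steps:
    pmf[x] = value - previous  # p(x_i) = F(x_i) - F(x_{i-1})
    previous = value

print(pmf)
```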

The Survival Distribution Function of X

The survival function, also called the survival distribution function
(abbreviated SDF) or the reliability function, is defined for any random
variable. It is most often used when X is the time until death or until the
failure of some system, in which case it gives the probability that the
system survives beyond a specified time.
Thus, we define the survival distribution function by

S(x) = Pr(X > x) = 1 − F (x).

It follows from Theorem 3.1 that the survival function of any random
variable satisfies the properties: S(−∞) = 1, S(∞) = 0, S(x) is
right-continuous, and S(x) is nonincreasing.

For a discrete random variable, the survival function is given by
$$S(x) = \sum_{t > x} p(t)$$
and for a continuous random variable, we have
$$S(x) = \int_{x}^{\infty} f(t)\,dt.$$
Note that S′(t) = −f (t).

Remark 3.1
For a discrete random variable, the survival function need not be left-
continuous, that is, it is possible for its graph to jump down. When it
jumps, the value is assigned to the bottom of the jump.

Example 3.8 ‡
For watches produced by a certain manufacturer:
(i) Lifetimes follow a single-parameter Pareto distribution with α > 1 and
θ = 4.
(ii) The expected lifetime of a watch is 8 years.
Calculate the probability that the lifetime of a watch is at least 6 years.

Solution.
From Table C, we have
$$E(X) = \frac{\alpha\theta}{\alpha - 1} = \frac{4\alpha}{\alpha - 1} = 8 \implies \alpha = 2.$$
Also, from Table C, we have
$$F(x) = 1 - \left(\frac{\theta}{x}\right)^{\alpha} \implies F(6) = 1 - \left(\frac{4}{6}\right)^{2} = 0.5556.$$
Hence, S(6) = 1 − F (6) = 1 − 0.5556 = 0.4444
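
The arithmetic of Example 3.8 can be reproduced as follows. This is a sketch only: the function name is hypothetical, and the formula for α is obtained by solving αθ/(α − 1) = E(X) for α.

```python
def single_parameter_pareto_survival(x, alpha, theta):
    """S(x) = (theta / x)^alpha for x > theta, and 1 for x <= theta."""
    if x <= theta:
        return 1.0
    return (theta / x) ** alpha

theta = 4.0
expected_lifetime = 8.0

# Solving alpha * theta / (alpha - 1) = E(X) for alpha gives E(X) / (E(X) - theta).
alpha = expected_lifetime / (expected_lifetime - theta)
s6 = single_parameter_pareto_survival(6.0, alpha, theta)
print(alpha, s6)
```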

The Hazard Rate Function

The hazard rate function, also known as the force of mortality or the
failure rate function, is defined to be the ratio of the density and the
survival functions:
$$h(x) = h_X(x) = \frac{f(x)}{S(x)} = \frac{f(x)}{1 - F(x)}.$$

Example 3.9
Show that
$$h(x) = -\frac{S'(x)}{S(x)} = -\frac{d}{dx}[\ln S(x)]. \qquad (3.2)$$

Solution.
The equation follows from f (x) = −S′(x) and $\frac{d}{dx}[\ln S(x)] = \frac{S'(x)}{S(x)}$

Example 3.10
Find the hazard rate function of a random variable with pdf given by
$f(x) = ae^{-ax}$ for x > 0, where a > 0.

Solution.
Since $S(x) = \int_x^\infty ae^{-at}\,dt = e^{-ax}$, we have
$$h(x) = \frac{f(x)}{S(x)} = \frac{ae^{-ax}}{e^{-ax}} = a$$

Example 3.11
Let X be a random variable with support [0, ∞). Show that

$$S(x) = e^{-\Lambda(x)}$$
where
$$\Lambda(x) = \int_{0}^{x} h(s)\,ds.$$
Λ(x) is called the cumulative hazard function.

Solution.
Integrating equation (3.2) from 0 to x, we have
$$\int_{0}^{x} h(s)\,ds = -\int_{0}^{x} \frac{d}{ds}[\ln S(s)]\,ds = \ln S(0) - \ln S(x) = \ln 1 - \ln S(x) = -\ln S(x).$$
Now the result follows upon exponentiation
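
The identity S(x) = e^{−Λ(x)} can be illustrated numerically. The sketch below assumes a constant hazard rate h(s) = a with the hypothetical value a = 0.5, for which the survival function is known to be e^{−ax}; the helper names are illustrative.

```python
import math

def cumulative_hazard(hazard, x, steps=10_000):
    """Approximate Lambda(x), the integral of the hazard over [0, x], by the midpoint rule."""
    h = x / steps
    return h * sum(hazard(i * h + 0.5 * h) for i in range(steps))

def survival_from_hazard(hazard, x):
    """S(x) = exp(-Lambda(x))."""
    return math.exp(-cumulative_hazard(hazard, x))

a = 0.5  # hypothetical constant hazard rate
constant_hazard = lambda s: a

s3 = survival_from_hazard(constant_hazard, 3.0)
print(s3, math.exp(-a * 3.0))
```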

Some Additional Results

For a given event A, we define the indicator of A by
$$I_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \notin A. \end{cases}$$
Let X be a random variable and let A be an event with Pr(A) > 0. It is proven
in advanced probability theory that the conditional expectation E(X|A) is
given by the formula
$$E(X|A) = \frac{E(X I_A)}{\Pr(A)}. \qquad (3.3)$$

Example 3.12
Let X be a random variable. Find a formula for E[(X − d)^k | X > d]
in the
(a) discrete case
(b) continuous case.

Solution.
(a) We have
$$E[(X - d)^k \mid X > d] = \frac{\sum_{x_j > d} (x_j - d)^k p(x_j)}{\Pr(X > d)}.$$
(b) We have
$$E[(X - d)^k \mid X > d] = \frac{1}{\Pr(X > d)} \int_{d}^{\infty} (x - d)^k f_X(x)\,dx$$

If Ω = A ∪ B with A and B disjoint, then for any random variable X on Ω, we
have by the double expectation property¹

E(X) = E(X|A)Pr(A) + E(X|B)Pr(B).

¹ See Section 37 of [2].

Probabilities can be found by conditioning:
$$\Pr(A) = \sum_{y} \Pr(A \mid Y = y)\Pr(Y = y)$$
in the discrete case and
$$\Pr(A) = \int_{-\infty}^{\infty} \Pr(A \mid Y = y) f_Y(y)\,dy$$
in the continuous case.

Weighted mean and variance

Consider a set of data x1 , x2 , · · · , xn with weights w1 , w2 , · · · , wn . The
weighted mean is given by
$$\overline{X} = \frac{w_1 x_1 + \cdots + w_n x_n}{w_1 + \cdots + w_n}$$
and the weighted variance is
$$\text{Var}(X) = \frac{1}{\sum_{i=1}^{n} w_i - 1} \sum_{i=1}^{n} w_i (x_i - \overline{X})^2.$$

Example 3.13
You are given the following information

xi 0 1 2 3 4 5
wi 512 307 123 41 11 6

Determine the weighted variance.

Solution.
The weighted mean is
$$\overline{X} = \frac{0(512) + 1(307) + 2(123) + 3(41) + 4(11) + 5(6)}{512 + 307 + 123 + 41 + 11 + 6} = 0.75.$$
The weighted variance is
$$\begin{aligned} \text{Var}(X) = \frac{1}{1000 - 1} [&512(0 - 0.75)^2 + 307(1 - 0.75)^2 + 123(2 - 0.75)^2 + 41(3 - 0.75)^2 \\ &+ 11(4 - 0.75)^2 + 6(5 - 0.75)^2] = 0.93243 \end{aligned}$$
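
The weighted mean and weighted variance of Example 3.13 can be computed directly from the definitions above. The variable names in this sketch are illustrative.

```python
# Data and weights from Example 3.13.
values = [0, 1, 2, 3, 4, 5]
weights = [512, 307, 123, 41, 11, 6]

total_weight = sum(weights)
weighted_mean = sum(w * x for x, w in zip(values, weights)) / total_weight

# The divisor is (sum of weights - 1), as in the definition given in the text.
weighted_variance = sum(
    w * (x - weighted_mean) ** 2 for x, w in zip(values, weights)
) / (total_weight - 1)

print(weighted_mean, round(weighted_variance, 5))
```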

Practice Problems

Problem 3.1
State whether the random variables are discrete, continuous, or mixed.

(a) A coin is tossed twice. Let X be the number of heads in the two tosses.
(b) An urn contains one red ball and one green ball. Let X be the number
of picks necessary in getting the first red ball.
(c) X is a random number in the interval [4, 7].
(d) X : R −→ R such that X(s) = s if s is irrational and X(s) = 1 if s is
rational.

Problem 3.2
Toss a pair of fair dice. Let X denote the sum of the dots on the two faces.

Find the probability mass function.

Problem 3.3
Consider the random variable X : {S, F } −→ R defined by X(S) = 1 and
X(F ) = 0. Suppose that p = Pr(X = 1).

Find the probability mass function of X.

Problem 3.4 ‡
The loss due to a fire in a commercial building is modeled by a random
variable X with density function
$$f(x) = \begin{cases} 0.005(20 - x), & 0 < x < 20 \\ 0, & \text{otherwise.} \end{cases}$$

Given that a fire loss exceeds 8, what is the probability that it exceeds 16 ?

Problem 3.5 ‡
The lifetime of a machine part has a continuous distribution on the interval
(0, 40) with probability density function f, where f (x) is proportional to
(10 + x)−2 .

Calculate the probability that the lifetime of the machine part is less than
6.

Problem 3.6 ‡
A group insurance policy covers the medical claims of the employees of a
small company. The value, V, of the claims made in one year is described
by
V = 100000Y
where Y is a random variable with density function
$$f(y) = \begin{cases} k(1 - y)^4, & 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}$$
where k is a constant.

What is the conditional probability that V exceeds 40,000, given that V
exceeds 10,000?

Problem 3.7 ‡
An insurance policy pays for a random loss X subject to a deductible of
C, where 0 < C < 1. The loss amount is modeled as a continuous random
variable with density function
$$f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Given a random loss X, the probability that the insurance payment is less
than 0.5 is equal to 0.64 .

Calculate C.

Problem 3.8
Let X be a continuous random variable with pdf
$$f(x) = \begin{cases} \alpha x e^{-x}, & x > 0 \\ 0, & x \le 0. \end{cases}$$

Determine the value of α.

Problem 3.9
Consider the following probability distribution

x 1 2 3 4
p(x) 0.25 0.5 0.125 0.125

Find a formula for F (x) and sketch its graph.

Problem 3.10
Find the distribution functions corresponding to the following density
functions:

(a) $f(x) = \frac{a - 1}{(1 + x)^a}$, 0 < x < ∞, 0 otherwise.
(b) $f(x) = k\alpha x^{\alpha - 1} e^{-kx^{\alpha}}$, k > 0, α > 0, 0 < x < ∞, 0 otherwise.

Problem 3.11
Let X be a random variable with pmf
$$p(n) = \frac{1}{3} \left(\frac{2}{3}\right)^n, \quad n = 0, 1, 2, \cdots.$$

Find a formula for F (n).

Problem 3.12
Let X be a continuous random variable with pdf
$$f(x) = \begin{cases} \frac{1}{5} e^{-x/5}, & \text{if } x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

(a) Find Pr(X > 10).
(b) Find Pr(5 < X < 10).
(c) Find F (x).

Problem 3.13
A random variable X has the cumulative distribution function
$$F(x) = \frac{e^x}{e^x + 1}.$$

Find the probability density function.

Problem 3.14
Consider an age-at-death random variable X with survival distribution
defined by
$$S(x) = \frac{1}{10}(100 - x)^{1/2}, \quad 0 \le x \le 100.$$

(a) Explain why this is a suitable survival function.
(b) Find the corresponding expression for the cumulative probability
function.
(c) Compute the probability that a newborn with survival function defined
above will die between the ages of 65 and 75.

Problem 3.15
Consider an age-at-death random variable X with survival distribution
defined by
$$S(x) = e^{-0.34x}, \quad x \ge 0.$$

Compute Pr(5 < X < 10).

Problem 3.16
Consider an age-at-death random variable X with survival distribution
$S(x) = 1 - \frac{x^2}{100}$ for x ≥ 0.

Find F (x).

Problem 3.17
Consider an age-at-death random variable X. The survival distribution is
given by $S(x) = 1 - \frac{x}{100}$ for 0 ≤ x ≤ 100 and 0 for x > 100.

(a) Find the probability that a person dies before reaching the age of 30.
(b) Find the probability that a person lives more than 70 years.

Problem 3.18
An age-at-death random variable has a survival function
$$S(x) = \frac{1}{10}(100 - x)^{1/2}, \quad 0 \le x \le 100$$
and 0 otherwise.

Find the hazard rate function of this random variable.

Problem 3.19
Consider an age-at-death random variable X with force of mortality h(x) =
µ > 0.

Find S(x), f (x), and F (x).

Problem 3.20
Let
$$F(x) = 1 - \left(1 - \frac{x}{120}\right)^{1/6}, \quad 0 \le x \le 120.$$

Find h(40).

4 Raw and Central Moments

Several quantities can be computed from the pdf that describe simple charac-
teristics of the distribution. These are called moments. The most common
is the mean, the first moment about the origin, and the variance, the second
moment about the mean. The mean is a measure of the centrality of the
distribution and the variance is a measure of the spread of the distribution
about the mean.

The nth moment $\mu'_n = E(X^n)$ of a random variable X is also known
as the nth moment about the origin or the nth raw moment. For a
continuous random variable X we have
$$\mu'_n = \int_{-\infty}^{\infty} x^n f(x)\,dx$$
and for a discrete random variable we have
$$\mu'_n = \sum_{x} x^n p(x).$$
By contrast, the quantity $\mu_n = E[(X - E(X))^n]$ is called the nth central
moment of X or the nth moment about the mean. For a continuous
random variable X we have
$$\mu_n = \int_{-\infty}^{\infty} (x - E(X))^n f(x)\,dx$$
and for a discrete random variable we have
$$\mu_n = \sum_{x} (x - E(X))^n p(x).$$

Note that Var(X) is the second central moment of X.

Example 4.1
Let X be a continuous random variable with pdf given by $f(x) = \frac{3}{8}x^2$ for
0 < x < 2 and 0 otherwise. Find the second central moment of X.

Solution.
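We first find the mean of X. We have
$$E(X) = \int_{0}^{2} x f(x)\,dx = \int_{0}^{2} \frac{3}{8} x^3\,dx = \left. \frac{3}{32} x^4 \right|_{0}^{2} = 1.5.$$
The second central moment is
$$\begin{aligned} \mu_2 &= \int_{0}^{2} (x - 1.5)^2 f(x)\,dx \\ &= \int_{0}^{2} \frac{3}{8} x^2 (x - 1.5)^2\,dx \\ &= \frac{3}{8} \left[ \frac{x^5}{5} - 0.75x^4 + 0.75x^3 \right]_{0}^{2} = 0.15 \end{aligned}$$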
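
The result of Example 4.1 can be cross-checked with exact arithmetic by using the identity µ2 = E(X²) − [E(X)]². The closed form for the raw moments in the sketch below follows from integrating x^k · (3/8)x² over (0, 2); the function name is illustrative.

```python
from fractions import Fraction

def raw_moment(k):
    """E(X^k) for f(x) = (3/8) x^2 on (0, 2): (3/8) * 2^(k+3) / (k+3)."""
    return Fraction(3, 8) * Fraction(2 ** (k + 3), k + 3)

mean = raw_moment(1)
second_central_moment = raw_moment(2) - mean ** 2
print(mean, second_central_moment)
```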

The importance of moments is that they are used to define quantities that
characterize the shape of a distribution. These quantities which will be dis-
cussed below are: skewness, kurtosis and coefficient of variation.

Departure from Normality: Coefficient of Skewness

The third central moment, µ3 , is called the skewness and is a measure of
the symmetry of the pdf. A distribution, or data set, is symmetric if it looks
the same to the left and right of the mean.

A measure of skewness is given by the coefficient of skewness γ1 :
$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{E(X^3) - 3E(X)E(X^2) + 2[E(X)]^3}{[E(X^2) - E(X)^2]^{3/2}}.$$
That is, γ1 is the ratio of the third central moment to the cube of the
standard deviation. Equivalently, γ1 is the third central moment of the
standardized variable
$$X^* = \frac{X - \mu}{\sigma}.$$
A distribution that is symmetric about its mean, such as the normal
distribution, has γ1 = 0, and a value of γ1 close to zero suggests that the
distribution is nearly symmetric. A positively skewed distribution has a “tail”
which is pulled in the positive direction. A negatively skewed distribution
has a “tail” which is pulled in the negative direction (see Figure 4.1).

Figure 4.1

Example 4.2
A random variable X has the following pmf:

x      120   122    124   150    167    245
p(x)   1/4   1/12   1/6   1/12   1/12   1/3

Find the coefficient of skewness of X.

Solution.
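We first find the mean of X :
$$\mu = E(X) = 120 \times \frac{1}{4} + 122 \times \frac{1}{12} + 124 \times \frac{1}{6} + 150 \times \frac{1}{12} + 167 \times \frac{1}{12} + 245 \times \frac{1}{3} = \frac{2027}{12}.$$
The second raw moment is
$$E(X^2) = 120^2 \times \frac{1}{4} + 122^2 \times \frac{1}{12} + 124^2 \times \frac{1}{6} + 150^2 \times \frac{1}{12} + 167^2 \times \frac{1}{12} + 245^2 \times \frac{1}{3} = \frac{379325}{12}.$$
Thus, the variance of X is
$$\text{Var}(X) = \frac{379325}{12} - \frac{4108729}{144} = \frac{443171}{144}$$
and the standard deviation is
$$\sigma = \sqrt{\frac{443171}{144}} = 55.475908183.$$
The third central moment is
$$\begin{aligned} \mu_3 = &\left(120 - \frac{2027}{12}\right)^3 \times \frac{1}{4} + \left(122 - \frac{2027}{12}\right)^3 \times \frac{1}{12} + \left(124 - \frac{2027}{12}\right)^3 \times \frac{1}{6} \\ &+ \left(150 - \frac{2027}{12}\right)^3 \times \frac{1}{12} + \left(167 - \frac{2027}{12}\right)^3 \times \frac{1}{12} + \left(245 - \frac{2027}{12}\right)^3 \times \frac{1}{3} \\ = &\ 93270.81134. \end{aligned}$$
Thus,
$$\gamma_1 = \frac{93270.81134}{55.475908183^3} = 0.5463016252$$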
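
The computations of Example 4.2 can be reproduced with a small helper for central moments. This is a sketch: the function name `central_moment` is illustrative, and exact fractions are used for the mean and variance before converting to floating point for the square root.

```python
from fractions import Fraction
from math import sqrt

# pmf from Example 4.2.
pmf = {
    120: Fraction(1, 4), 122: Fraction(1, 12), 124: Fraction(1, 6),
    150: Fraction(1, 12), 167: Fraction(1, 12), 245: Fraction(1, 3),
}

mean = sum(x * p for x, p in pmf.items())

def central_moment(k):
    """k-th moment about the mean, computed exactly."""
    return sum((x - mean) ** k * p for x, p in pmf.items())

variance = central_moment(2)
sigma = sqrt(float(variance))
skewness = float(central_moment(3)) / sigma ** 3
print(mean, variance, sigma, skewness)
```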

Example 4.3
Let X be a random variable with density $f(x) = e^{-x}$ on (0, ∞) and 0
otherwise. Find the coefficient of skewness of X.

Solution.
Since
$$\begin{aligned} E(X) &= \int_{0}^{\infty} x e^{-x}\,dx = \left. -e^{-x}(1 + x) \right|_{0}^{\infty} = 1 \\ E(X^2) &= \int_{0}^{\infty} x^2 e^{-x}\,dx = \left. -e^{-x}(x^2 + 2x + 2) \right|_{0}^{\infty} = 2 \\ E(X^3) &= \int_{0}^{\infty} x^3 e^{-x}\,dx = \left. -e^{-x}(x^3 + 3x^2 + 6x + 6) \right|_{0}^{\infty} = 6 \end{aligned}$$
we find
$$\gamma_1 = \frac{6 - 3(1)(2) + 2(1)^3}{(2 - 1^2)^{3/2}} = 2$$

Coefficient of Kurtosis
The fourth central moment, µ4 , is called the kurtosis and is a measure of
peakedness/flatness of a distribution with respect to the normal distribution.

A measure of kurtosis is given by the coefficient of kurtosis:
$$\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4} = \frac{E(X^4) - 4E(X^3)E(X) + 6E(X^2)[E(X)]^2 - 3[E(X)]^4}{[E(X^2) - (E(X))^2]^2}.$$
The coefficient of kurtosis of the normal distribution is 3. The condition
γ2 < 3 indicates that the distribution is flatter compared to the normal
distribution, and the condition γ2 > 3 indicates a higher peak (relative to
the normal distribution) around the mean value (see Figure 4.2).

Figure 4.2

Example 4.4
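A random variable X has the following pmf:

x      120   122    124   150    167    245
p(x)   1/4   1/12   1/6   1/12   1/12   1/3

Find the coefficient of kurtosis of X.

Solution.
We first find the fourth central moment.
$$\begin{aligned} \mu_4 = &\left(120 - \frac{2027}{12}\right)^4 \times \frac{1}{4} + \left(122 - \frac{2027}{12}\right)^4 \times \frac{1}{12} + \left(124 - \frac{2027}{12}\right)^4 \times \frac{1}{6} \\ &+ \left(150 - \frac{2027}{12}\right)^4 \times \frac{1}{12} + \left(167 - \frac{2027}{12}\right)^4 \times \frac{1}{12} + \left(245 - \frac{2027}{12}\right)^4 \times \frac{1}{3} \\ = &\ 13693826.62. \end{aligned}$$
Thus,
$$\gamma_2 = \frac{13693826.62}{55.475908183^4} = 1.44579641$$
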
Example 4.5
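Find the coefficient of kurtosis of the random variable X with density
function f (x) = 1 on (0, 1) and 0 elsewhere.

Solution.
Since
$$E(X^k) = \int_{0}^{1} x^k\,dx = \frac{1}{k + 1},$$
we obtain
$$\gamma_2 = \frac{\frac{1}{5} - 4\left(\frac{1}{4}\right)\left(\frac{1}{2}\right) + 6\left(\frac{1}{3}\right)\left(\frac{1}{2}\right)^2 - 3\left(\frac{1}{2}\right)^4}{\left(\frac{1}{3} - \frac{1}{4}\right)^2} = \frac{9}{5}$$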
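
The value 9/5 in Example 4.5 can be confirmed with exact rational arithmetic, using the raw moments E(X^k) = 1/(k + 1) of the uniform density on (0, 1) and the raw-moment form of the coefficient of kurtosis. The names in this sketch are illustrative.

```python
from fractions import Fraction

def raw_moment(k):
    """E(X^k) = 1 / (k + 1) for the uniform density on (0, 1)."""
    return Fraction(1, k + 1)

m1, m2, m3, m4 = (raw_moment(k) for k in range(1, 5))

# Fourth central moment and variance expressed through raw moments.
fourth_central = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
variance = m2 - m1 ** 2
kurtosis = fourth_central / variance ** 2
print(kurtosis)
```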

Coefficient of Variation
Some combinations of the raw moments and central moments are also
commonly used. One such combination is the coefficient of variation,
denoted by CV (X), of a random variable X, which is defined as the ratio of
the standard deviation to the mean:
$$CV(X) = \frac{\sigma}{\mu}, \quad \mu = \mu'_1 = E(X).$$
It is an indication of the size of the standard deviation relative to the mean,
for the given random variable.

Often the coefficient of variation is expressed as a percentage. Thus, it
expresses the standard deviation as a percentage of the mean and
it is unitless. Statistically, the coefficient of variation is very useful when
comparing two or more sets of data that are measured in different units of
measurement.

Example 4.6
Let X be a random variable with mean of 4 meters and standard deviation
of 0.7 millimeters. Find the coefficient of variation of X.
Solution.
Expressing both quantities in millimeters (4 meters = 4000 millimeters), the
coefficient of variation is
$$CV(X) = \frac{0.7}{4000} = 0.000175 = 0.0175\%$$

Example 4.7
A random variable X has the following pmf:

x      120   122    124   150    167    245
p(x)   1/4   1/12   1/6   1/12   1/12   1/3

Find the coefficient of variation of X.

Solution.
We know that µ = 2027/12 = 168.9166667 and σ = 55.47590818. Thus, the
coefficient of variation of X is
$$CV(X) = \frac{55.47590818}{168.9166667} = 0.3284217754$$

Example 4.8
Find the coefficient of variation of the random variable X with density
function $f(x) = e^{-x}$ on (0, ∞) and 0 otherwise.

Solution.
We have
$$\mu = E(X) = \int_{0}^{\infty} x e^{-x}\,dx = \left. -e^{-x}(1 + x) \right|_{0}^{\infty} = 1$$
and
$$\sigma = (E(X^2) - (E(X))^2)^{1/2} = (2 - 1)^{1/2} = 1.$$
Hence,
CV (X) = 1

Practice Problems
Problem 4.1
Consider n independent trials. Let X denote the number of successes in n
trials. We call X a binomial random variable. Its pmf is given by
$$p(r) = C(n, r) p^r (1 - p)^{n - r}$$
where p is the probability of a success.

(a) Show that E(X) = np and E[X(X − 1)] = n(n − 1)p^2 .
Hint: $(a + b)^n = \sum_{k=0}^{n} C(n, k) a^k b^{n-k}$.
(b) Find the variance of X.
Problem 4.2
A random variable X is said to be a Poisson random variable with
parameter λ > 0 if its probability mass function has the form
$$p(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \cdots$$
where λ indicates the average number of successes per unit time or space.

(a) Show that E(X) = λ and E[X(X − 1)] = λ2 .

(b) Find the variance of X.
Problem 4.3
A geometric random variable with parameter p, 0 < p < 1 has a probability
mass function
$$p(n) = p(1 - p)^{n - 1}, \quad n = 1, 2, \cdots.$$
(a) By differentiating the geometric series $\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}$ twice and using
x = 1 − p in each equation, show that
$$\sum_{n=1}^{\infty} n(1 - p)^{n - 1} = p^{-2} \quad \text{and} \quad \sum_{n=1}^{\infty} n(n - 1)(1 - p)^{n - 2} = 2p^{-3}.$$
(b) Show that $E(X) = \frac{1}{p}$ and $E[X(X - 1)] = 2p^{-2}(1 - p)$.
(c) Find the variance of X.
Problem 4.4
A normal random variable with parameters µ and σ^2 has a pdf
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$$

Show that E(X) = µ and Var(X) = σ^2 . Hint: $E(Z) = E\left(\frac{X - \mu}{\sigma}\right) = 0$ where
Z is the standard normal distribution with parameters (0,1).

Problem 4.5
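An exponential random variable with parameter λ > 0 is a random variable
with pdf
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$
(a) Show that $E(X) = \frac{1}{\lambda}$ and $E(X^2) = \frac{2}{\lambda^2}$.
(b) Find Var(X).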

Problem 4.6
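A Gamma random variable with parameters α > 0 and θ > 0 has a pdf
$$f(x) = \begin{cases} \frac{1}{\theta^{\alpha}\Gamma(\alpha)} x^{\alpha - 1} e^{-x/\theta}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$
where
$$\Gamma(\alpha) = \int_{0}^{\infty} e^{-y} y^{\alpha - 1}\,dy \quad \text{and} \quad \Gamma(\alpha + 1) = \alpha\Gamma(\alpha).$$
Show:
(a) E(X) = αθ
(b) Var(X) = αθ^2 .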

Problem 4.7
Let X be a continuous random variable with pdf given by $f(x) = \frac{3}{8}x^2$ for
0 ≤ x ≤ 2 and 0 otherwise.

Find the third raw moment of X.

Problem 4.8
A random variable X has the following pmf:

x      120   122    124   150    167    245
p(x)   1/4   1/12   1/6   1/12   1/12   1/3

Find the fourth raw moment.

Problem 4.9
A random variable X has the following pmf:

x      120   122    124   150    167    245
p(x)   1/4   1/12   1/6   1/12   1/12   1/3

Find the fifth central moment of X.

Problem 4.10
Compute the coefficient of skewness of a uniform random variable, X, on
[0, 1].

Problem 4.11
Let X be a random variable with density $f(x) = e^{-x}$ for x > 0 and 0 otherwise.

Find the coefficient of kurtosis.

Problem 4.12
A random variable X has the following pmf:

x      120   122    124   150    167    245
p(x)   1/4   1/12   1/6   1/12   1/12   1/3

Find the coefficient of variation.

Problem 4.13
Let X be a continuous random variable with density function $f(x) = Ax^{B}e^{-Cx}$
for x ≥ 0 and 0 otherwise. The parameters A, B, and C satisfy
$$A = \frac{1}{\int_{0}^{\infty} x^{B} e^{-Cx}\,dx}, \quad B \ge -\frac{1}{2}, \quad C > 0.$$

Show that
$$E(X^n) = \frac{B + n}{C} E(X^{n-1}).$$
Problem 4.14
Let X be a continuous random variable with density function $f(x) = Ax^{B}e^{-Cx}$
for x ≥ 0 and 0 otherwise. The parameters A, B, and C satisfy
$$A = \frac{1}{\int_{0}^{\infty} x^{B} e^{-Cx}\,dx}, \quad B \ge -\frac{1}{2}, \quad C > 0.$$

Find the first and second raw moments.

Problem 4.15
Let X be a continuous random variable with density function $f(x) = Ax^{B}e^{-Cx}$
for x ≥ 0 and 0 otherwise. The parameters A, B, and C satisfy
$$A = \frac{1}{\int_{0}^{\infty} x^{B} e^{-Cx}\,dx}, \quad B \ge -\frac{1}{2}, \quad C > 0.$$

Find the coefficient of skewness.

Problem 4.16
Let X be a continuous random variable with density function $f(x) = Ax^{B}e^{-Cx}$
for x ≥ 0 and 0 otherwise. The parameters A, B, and C satisfy
$$A = \frac{1}{\int_{0}^{\infty} x^{B} e^{-Cx}\,dx}, \quad B \ge -\frac{1}{2}, \quad C > 0.$$

Find the coefficient of kurtosis.

Problem 4.17
You are given: E(X) = 2, CV (X) = 2, and $\mu'_3 = 136$. Calculate γ1 .

Problem 4.18
Let X be a random variable with pdf f (x) = 0.005x for 0 ≤ x ≤ 20 and 0
otherwise.

(a) Find the cdf of X.

(b) Find the mean and the variance of X.
(c) Find the coefficient of variation.

Problem 4.19
Let X be the Gamma random variable with pdf $f(x) = \frac{1}{\theta^{\alpha}\Gamma(\alpha)} x^{\alpha - 1} e^{-x/\theta}$ for
x > 0 and 0 otherwise. Suppose E(X) = 8 and γ1 = 1.

Find the variance of X.

Problem 4.20
Let X be a Pareto random variable in one parameter and with a pdf
$f(x) = \frac{a}{x^{a+1}}$, x ≥ 1 and 0 otherwise.

(a) Show that $E(X^k) = \frac{a}{a - k}$ for 0 < k < a.
(b) Find the coefficient of variation of X.

Problem 4.21
For the random variable X you are given:
(i) E(X) = 4
(ii) Var(X) = 64
(iii) E(X 3 ) = 15.

Calculate the skewness of X.

Problem 4.22
Let X be a Pareto random variable with two parameters α and θ, i.e., X
has the pdf
$$f(x) = \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha + 1}}, \quad \alpha > 1, \ \theta > 0, \ x > 0$$
and 0 otherwise.

Calculate the mean and the variance of X.

Problem 4.23
Let X be a Pareto random variable with two parameters α and θ, i.e., X
has the pdf
$$f(x) = \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha + 1}}, \quad \alpha > 1, \ \theta > 0, \ x > 0$$
and 0 otherwise.

Calculate the coefficient of variation.

Problem 4.24
Let X be the Gamma random variable with pdf $f(x) = \frac{1}{\theta^{\alpha}\Gamma(\alpha)} x^{\alpha - 1} e^{-x/\theta}$ for
x > 0 and 0 otherwise. Suppose CV (X) = 1.

Determine γ1 .

Problem 4.25
You are given the following times of first claim for five randomly selected
auto insurance policies observed from time t = 0 :

1 2 3 4 5

Calculate the kurtosis of this sample.

5 Empirical Models, Excess and Limited Loss Variables

Empirical models are those that are based entirely on data. Consider a
statistical model that results in a sample Ω of size n. Data points in the
sample are assigned equal probability of 1/n. Let X be a random variable
defined on Ω. We refer to this model as an empirical model.

Example 5.1 ‡
You are given the following for a sample of five observations from a bivariate
distribution:
(i)

x y
1 4
2 2
4 3
5 6
6 4

(ii) $\bar{x} = 3.6$ and $\bar{y} = 3.8$.

A is the covariance of the empirical distribution $F_e$ as defined by these
five observations. B is the maximum possible covariance of an empirical
distribution with identical marginal distributions to $F_e$ .
Determine B − A.

Solution.
We have
$$\begin{aligned} A &= \text{Cov}(X, Y) = E(XY) - E(X)E(Y) \\ &= \frac{4 + 4 + 12 + 30 + 24}{5} - 3.6(3.8) = 1.12. \end{aligned}$$

Now, since E(X) and E(Y ) are fixed, we want to create a new bivariate
distribution from the given one with maximum E(XY ). Clearly, this occurs
if largest values of X are paired with largest values of Y. Hence, the following
bivariate distribution has the same marginal distributions as the original
bivariate distribution:

x y
6 6
5 4
4 4
2 3
1 2

The covariance of this distribution is
$$B = \frac{36 + 20 + 16 + 6 + 2}{5} - 3.6(3.8) = 2.32.$$
The final answer is B − A = 2.32 − 1.12 = 1.2
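
The pairing argument of Example 5.1 can be sketched in code: sorting both samples pairs the largest x with the largest y, which keeps the marginals unchanged while maximizing E(XY). The helper name `empirical_covariance` is illustrative.

```python
def empirical_covariance(xs, ys):
    """Covariance under the empirical distribution (each pair has probability 1/n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

xs = [1, 2, 4, 5, 6]
ys = [4, 2, 3, 6, 4]

A = empirical_covariance(xs, ys)
# Pair the k-th smallest x with the k-th smallest y to maximize E(XY).
B = empirical_covariance(sorted(xs), sorted(ys))
print(round(A, 2), round(B, 2), round(B - A, 2))
```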

Example 5.2 ‡
You are given the following graph of cumulative distribution functions:

Determine the difference between the mean of the lognormal model and
the mean of the data.

Solution.
The empirical distribution is given by

p(10) =F (10) − F (0) = 0.20 − 0 = 0.20

p(100) =F (100) − F (10) = 0.60 − 0.2 = 0.4
p(1000) =F (1000) − F (100) = 1 − 0.6 = 0.4.

The mean of the data is

0.2(10) + 0.4(100) + 0.4(1000) = 442.

Now, from the graph we see that the 20th and 60th percentiles of the
lognormal distribution are 10 and 100 respectively. That is,
$$0.2 = \Phi\left(\frac{\ln 10 - \mu}{\sigma}\right) \quad \text{and} \quad 0.6 = \Phi\left(\frac{\ln 100 - \mu}{\sigma}\right).$$
Using the table of standard normal distribution, we find
$$-0.84 = \frac{\ln 10 - \mu}{\sigma} \quad \text{and} \quad 0.25 = \frac{\ln 100 - \mu}{\sigma}.$$
Solving this system, we find µ = 4.0771 and σ = 2.1125. Thus, the mean of
the lognormal distribution is (Table C)
$$e^{\mu + 0.5\sigma^2} = e^{4.0771 + 0.5(2.1125^2)} = 549.18.$$
The final answer is 549.18 − 442 = 107.18
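
The two percentile equations of Example 5.2 form a linear system in µ and σ, which can be solved as sketched below. The z-values −0.84 and 0.25 are the table values quoted in the solution; the variable names are illustrative. Because the text rounds µ and σ to four decimals before exponentiating, the unrounded result differs from 549.18 in the first decimal place.

```python
import math

# Standard normal table values quoted in the solution.
z20, z60 = -0.84, 0.25

# ln(10) = mu + z20 * sigma and ln(100) = mu + z60 * sigma.
sigma = (math.log(100) - math.log(10)) / (z60 - z20)
mu = math.log(100) - z60 * sigma

lognormal_mean = math.exp(mu + 0.5 * sigma ** 2)
data_mean = 0.2 * 10 + 0.4 * 100 + 0.4 * 1000
print(mu, sigma, lognormal_mean, lognormal_mean - data_mean)
```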

Example 5.3
In a fitness club monthly new memberships are recorded in the table below.

January   February   March       April     May        June
100       102        84          84        100        100

July      August     September   October   November   December
67        45         45          45        45         93

Use an empirical model to construct a discrete probability mass function for
X, the number of new memberships per month.

Solution.
The sample under consideration has 12 data points which are the months of
the year. For our empirical model, each data point is assigned a probability
of 1/12. The pmf of the random variable X is given by:

x      45    67     84    93     100   102
p(x)   1/3   1/12   1/6   1/12   1/4   1/12
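
The construction in Example 5.3 amounts to counting how often each value occurs and dividing by the sample size. A minimal sketch, with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction

# Monthly new memberships, January through December (Example 5.3).
memberships = [100, 102, 84, 84, 100, 100, 67, 45, 45, 45, 45, 93]

counts = Counter(memberships)
n = len(memberships)
# Each observation carries probability 1/n, so p(x) = (count of x) / n.
pmf = {x: Fraction(c, n) for x, c in sorted(counts.items())}
print(pmf)
```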

Excess Loss Random Variable

Consider an insurance policy with a deductible. The insurer's interest is
usually in the losses that result in a payment, and in the actual amount
paid by the insurer. The insurer pays the insured the amount of the loss
in excess of the deductible². Losses below the deductible are ignored by the
insurer since they do not result in an insurance payment being made. Hence,
the insurer considers the conditional distribution of the amount paid, given
that a payment was actually made. This is what is referred to as the excess
loss random variable.

Let X be the random variable representing the amount of a single loss. In
insurance terms, X is known as the loss random variable or the severity
random variable. For a given threshold d such that Pr(X > d) > 0, the
random variable
$$Y^P = (X - d \mid X > d) = \begin{cases} \text{undefined}, & X \le d \\ X - d, & X > d \end{cases}$$
is called the excess loss variable, the cost per payment, or the left
truncated and shifted variable. It stands for the amount paid by the
insurance which is also known as claim amount.

We can find the \(k\)th moment of the excess loss variable as follows. For a continuous distribution with probability density function \(f(x)\) and cumulative distribution function \(F(x)\), we have³

\[ e_X^k(d) = E[(X-d)^k \mid X > d] = \frac{1}{\Pr(X > d)}\int_d^\infty (x-d)^k f(x)\,dx = \frac{1}{1 - F(d)}\int_d^\infty (x-d)^k f(x)\,dx \]

provided that the improper integral is convergent.

For a discrete distribution with probability mass function \(p(x)\) and cumulative distribution function \(F(x)\), we have

\[ e_X^k(d) = \frac{1}{1 - F(d)}\sum_{x_j > d} (x_j - d)^k p(x_j) \]

provided that the sum is convergent.

² The deductible is referred to as an ordinary deductible. Another type of deductible is called a franchise deductible and will be discussed in Section 32.
³ See (3.3).

When \(k = 1\), the expected value

\[ e_X(d) = E(Y^P) = E(X - d \mid X > d) \]

is called the mean excess loss function. Other names used have been mean residual life function and complete expectation of life.

If \(X\) denotes a loss amount, then \(e_X(d)\) stands for the expected amount paid given that there has been a payment in excess of the deductible \(d\). If \(X\) denotes age at death, then \(e_X(d)\) stands for the expected future lifetime given that the person is alive at age \(d\).

Example 5.4
Show that for a continuous random variable \(X\), we have

\[ e_X(d) = \frac{1}{1 - F(d)}\int_d^\infty (1 - F(x))\,dx = \frac{1}{S(d)}\int_d^\infty S(x)\,dx. \]

Solution.
Using integration by parts with \(u = x - d\) and \(v' = f(x)\), we have

\[ \begin{aligned}
e_X(d) &= \frac{1}{1 - F(d)}\int_d^\infty (x-d) f(x)\,dx \\
&= -\frac{(x-d)(1-F(x))}{1-F(d)}\Big|_d^\infty + \frac{1}{1-F(d)}\int_d^\infty (1-F(x))\,dx \\
&= \frac{1}{1 - F(d)}\int_d^\infty (1 - F(x))\,dx = \frac{1}{S(d)}\int_d^\infty S(x)\,dx.
\end{aligned} \]

Note that

\[ 0 \le xS(x) = x\int_x^\infty f(t)\,dt \le \int_x^\infty t f(t)\,dt \implies \lim_{x\to\infty} xS(x) = 0 \]

Example 5.5
Let \(X\) be an excess loss random variable with pdf given by \(f(x) = \frac{1}{3}(1 + 2x)e^{-x}\) for \(x > 0\) and 0 otherwise. Calculate the mean excess loss function with deductible amount \(x\).

Solution.
The cdf of \(X\) is given by

\[ F(x) = \int_0^x \frac{1}{3}(1 + 2t)e^{-t}\,dt = -\frac{1}{3}e^{-t}(2t+3)\Big|_0^x = 1 - \frac{e^{-x}(2x+3)}{3} \]

where we used integration by parts. The mean excess loss function is

\[ e_X(x) = \frac{1}{\frac{e^{-x}(2x+3)}{3}}\int_x^\infty \frac{e^{-t}(2t+3)}{3}\,dt = \frac{\frac{e^{-x}(2x+5)}{3}}{\frac{e^{-x}(2x+3)}{3}} = \frac{2x+5}{2x+3} \]
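
As a sanity check, the closed form of Example 5.5 can be compared with a direct numerical evaluation of the formula from Example 5.4. The sketch below uses a plain trapezoid rule; the cutoff `upper = 60` standing in for infinity and the step count are assumptions made here, not part of the text.

```python
import math

def survival(x):
    """S(x) = e^{-x}(2x + 3)/3 for the density of Example 5.5."""
    return math.exp(-x) * (2 * x + 3) / 3

def mean_excess(d, upper=60.0, steps=200_000):
    """Approximate (1/S(d)) * integral from d to `upper` of S(t) dt (trapezoid rule).

    `upper` replaces infinity; the tail beyond it is negligible for this S.
    """
    h = (upper - d) / steps
    total = 0.5 * (survival(d) + survival(upper))
    for i in range(1, steps):
        total += survival(d + i * h)
    return total * h / survival(d)

for d in (0.0, 1.0, 2.5):
    # Numerical value next to the closed form (2d + 5)/(2d + 3).
    print(d, mean_excess(d), (2 * d + 5) / (2 * d + 3))
```

The two columns should agree to several decimal places for each deductible.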

Example 5.6 ‡
For an industry-wide study of patients admitted to hospitals for treatment
of cardiovascular illness in 1998, you are given:
(i)

Duration In Days    Number of Patients Remaining Hospitalized
0                   4,386,000
5                   1,461,554
10                  486,739
15                  161,801
20                  53,488
25                  17,384
30                  5,349
35                  1,337
40                  0

(ii) Discharges from the hospital are uniformly distributed between the du-
rations shown in the table.
Calculate the mean residual time remaining hospitalized, in days, for a pa-
tient who has been hospitalized for 21 days.

Solution.
Let \(X\) denote the number of days at the hospital measured from time 0. We are asked to find \(E(X - 21 \mid X > 21)\), which by Example 5.4 can be expressed as

\[ E(X - 21 \mid X > 21) = \int_{21}^\infty \frac{S_X(x)}{S_X(21)}\,dx. \]

In life contingency theory, the assumption that discharges are uniform on a given interval means that the graph of the survival function is linear on that interval. Hence, the graph of \(\frac{S_X(x)}{S_X(21)}\) consists of line segments on the intervals \([0, 5]\), \([5, 10]\), etc. Thus, \(E(X - 21 \mid X > 21)\) is just the area under the graph from 21 to 40, which is the sum of the areas of the trapezoids with bases \([21, 25]\), \([25, 30]\), \([30, 35]\), and \([35, 40]\). For instance, the area of the first trapezoid is

\[ \frac{1}{2}(25 - 21)\left[\frac{S_X(21)}{S_X(21)} + \frac{S_X(25)}{S_X(21)}\right]. \]

Again, from the theory of life contingencies (see [3]), we have

\[ S_X(x) = \frac{\ell_x}{\ell_0} \]

where \(\ell_0\) is the number of patients at the hospital at time 0 and \(\ell_x\) is the expected number of patients in the hospital at time \(x\). Thus,

\[ \frac{S_X(x)}{S_X(21)} = \frac{\ell_x}{\ell_{21}}. \]

By linear interpolation, we have

\[ \ell_{21} = 53488 - \frac{53488 - 17384}{25 - 20}(21 - 20) = 46267.2. \]

Going back to the area of the first trapezoid mentioned above, we find

\[ \frac{1}{2}(25 - 21)\left[\frac{S_X(21)}{S_X(21)} + \frac{S_X(25)}{S_X(21)}\right] = \frac{1}{2}(4)\left[1 + \frac{17384}{46267.2}\right] = 2.751. \]

Repeating the same calculation for the remaining three trapezoids, we find

\[ E(X - 21 \mid X > 21) = 2.751 + 1.228 + 0.361 + 0.072 = 4.412 \]
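
The trapezoid computation of Example 5.6 can be reproduced as follows. This is a sketch under the uniform-discharge assumption of the example; the helper names `remaining_at` and `mean_residual` are introduced here for illustration.

```python
# Table (i) of Example 5.6: duration in days and patients still hospitalized.
durations = [0, 5, 10, 15, 20, 25, 30, 35, 40]
remaining = [4_386_000, 1_461_554, 486_739, 161_801, 53_488, 17_384, 5_349, 1_337, 0]

def remaining_at(x):
    """Patients remaining at duration x, by linear interpolation between rows."""
    rows = list(zip(durations, remaining))
    for (x0, y0), (x1, y1) in zip(rows, rows[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 0.0

def mean_residual(d):
    """Area under remaining_at(x)/remaining_at(d) from d to 40, as trapezoids."""
    knots = [d] + [x for x in durations if x > d]
    area = sum(0.5 * (b - a) * (remaining_at(a) + remaining_at(b))
               for a, b in zip(knots, knots[1:]))
    return area / remaining_at(d)

print(remaining_at(21))
print(mean_residual(21))
```

Carrying full precision gives a value near 4.41; the small difference from 4.412 comes from rounding each trapezoid to three decimals in the worked solution.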

Example 5.7
Show that

\[ F_{Y^P}(y) = \frac{F_X(y + d) - F_X(d)}{1 - F_X(d)}. \]

Solution.
We have

\[ F_{Y^P}(y) = \Pr(Y^P \le y) = \Pr(X - d \le y \mid X > d) = \frac{\Pr(d < X \le y + d)}{\Pr(X > d)} = \frac{F_X(y + d) - F_X(d)}{1 - F_X(d)} \]

Left-Censored and Shifted Random Variable

Note that in the excess loss situation, losses at or below the value \(d\) are not recorded in any way; that is, the excess loss random variable is left-truncated, and it is shifted because \(d\) is subtracted from \(X\). However, when small losses at or below \(d\) are recorded as 0, we are led to a new random variable which we call a left-censored and shifted random variable or the cost per loss random variable. It is defined as

\[ Y^L = (X - d)_+ = \begin{cases} 0, & X \le d \\ X - d, & X > d. \end{cases} \]

The \(k\)th moment of this random variable is given by

\[ E[(X - d)_+^k] = \int_d^\infty (x - d)^k f(x)\,dx \]

in the continuous case and

\[ E[(X - d)_+^k] = \sum_{x_j > d} (x_j - d)^k p(x_j) \]

in the discrete case. Note the relationship between the moments of \(Y^P\) and \(Y^L\) given by

\[ E[(X - d)_+^k] = e_X^k(d)[1 - F(d)] = e_X^k(d)S(d). \]

Setting \(k = 1\) and using the formula for \(e_X(d)\) we see that

\[ E(Y^L) = \int_d^\infty S(x)\,dx. \]

This expected value is sometimes called the stop loss premium.⁴

We can think of the excess loss random variable as a random variable that exists only if a payment has been made. In contrast, the left-censored and shifted random variable is equal to zero every time a loss does not produce a payment.

Example 5.8
For a house insurance policy, the loss amount (expressed in thousands), in the event of a fire, is being modeled by a distribution with density

\[ f(x) = \frac{3}{56}x(5 - x), \quad 0 < x < 4. \]

For a policy with a deductible amount of $1,000, calculate the expected amount per loss.

⁴ See Section 39.

Solution.
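We first calculate the survival function:

\[ S(x) = 1 - F(x) = 1 - \int_0^x \frac{3}{56}t(5 - t)\,dt = 1 - \frac{3}{56}\left(\frac{5}{2}x^2 - \frac{1}{3}x^3\right). \]

Thus, with the deductible \(d = 1\) (in thousands),

\[ E(Y^L) = \int_1^4 \left[1 - \frac{3}{56}\left(\frac{5}{2}x^2 - \frac{1}{3}x^3\right)\right]dx = 1.325893, \]

that is, an expected amount per loss of 1325.893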
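
Because the survival function in Example 5.8 is a polynomial, the integral can be evaluated exactly. The following sketch does this with rational arithmetic; the function name `survival_antiderivative` is an illustrative choice.

```python
from fractions import Fraction

def survival_antiderivative(x):
    """Antiderivative of S(x) = 1 - (3/56)(5x^2/2 - x^3/3), exact in rationals."""
    x = Fraction(x)
    return x - Fraction(3, 56) * (Fraction(5, 6) * x**3 - Fraction(1, 12) * x**4)

d, top = 1, 4  # deductible and upper end of the support, both in thousands
expected_cost_per_loss = survival_antiderivative(top) - survival_antiderivative(d)

# Exact value in thousands, then the same amount in dollars.
print(expected_cost_per_loss, float(expected_cost_per_loss) * 1000)
```

The exact value is 297/224 thousand, which matches the decimal answer of the example.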

Note that \(Y^L\) is a mixed random variable. The discrete part is represented by \(Y^L = 0\) for \(X \le d\), where

\[ p_{Y^L}(0) = \Pr(Y^L = 0) = \Pr(X \le d) = F_X(d). \]

The continuous part of the distribution of \(Y^L\) is given by

\[ f_{Y^L}(y) = f_X(y + d) \]

for \(y > 0\) since \(Y^L = y\) for \(X = y + d\).

Limited Loss Variable or Policy Limits

Many insurance policies provide coverage only up to a certain limit, which we refer to as the policy limit. Let's say the limit is \(u\). That is, the insurer covers all losses up to \(u\) fully but pays \(u\) for losses greater than \(u\). Thus, if \(X\) is a loss random variable then the amount paid by the insurer is \(X \wedge u\). We call \(X \wedge u\) the limited loss variable; it is defined by

\[ X \wedge u = \min(X, u) = \begin{cases} X, & X \le u \\ u, & X > u. \end{cases} \]

Notice that the distribution of \(X\) is censored on the right, and that is why the limited loss variable is also known as the right-censored random variable.

The expected value of the limited loss variable is \(E(X \wedge u)\) and is called the limited expected value.

For a discrete distribution with probability mass function \(p(x_j)\) and cumulative distribution function \(F(x_j)\) for all relevant index values \(j\), the \(k\)th moment of the limited loss random variable is given by

\[ E[(X \wedge u)^k] = \sum_{x_j \le u} x_j^k p(x_j) + u^k[1 - F(u)]. \]

For a continuous distribution with probability density function \(f(x)\) and cumulative distribution function \(F(x)\), the \(k\)th moment is given by

\[ E[(X \wedge u)^k] = \int_{-\infty}^u x^k f(x)\,dx + \int_u^\infty u^k f(x)\,dx = \int_{-\infty}^u x^k f(x)\,dx + u^k[1 - F(u)]. \]
Using integration by parts, we can derive an alternative formula for the \(k\)th moment:

\[ \begin{aligned}
E[(X \wedge u)^k] &= \int_{-\infty}^u x^k f(x)\,dx + u^k[1 - F(u)] \\
&= \int_{-\infty}^0 x^k f(x)\,dx + \int_0^u x^k f(x)\,dx + u^k[1 - F(u)] \\
&= x^k F(x)\Big|_{-\infty}^0 - \int_{-\infty}^0 kx^{k-1}F(x)\,dx - x^k S(x)\Big|_0^u + \int_0^u kx^{k-1}S(x)\,dx + u^k S(u) \\
&= -\int_{-\infty}^0 kx^{k-1}F(x)\,dx + \int_0^u kx^{k-1}S(x)\,dx.
\end{aligned} \]

Note that for \(x < 0\) and \(k\) odd we have

\[ \int_{-\infty}^x t^k f(t)\,dt \le x^k\int_{-\infty}^x f(t)\,dt = x^k F(x) \le 0 \]

so that

\[ \lim_{x \to -\infty} x^k F(x) = 0. \]

A similar argument applies for \(x < 0\) and \(k\) even.
In particular, for \(k = 1\), we obtain

\[ E(X \wedge u) = -\int_{-\infty}^0 F(x)\,dx + \int_0^u S(x)\,dx. \]
One can easily see that

\[ (X - d)_+ + X \wedge d = \begin{cases} 0, & X \le d \\ X - d, & X > d \end{cases} + \begin{cases} X, & X \le d \\ d, & X > d \end{cases} = X. \]

That is, buying one policy with a deductible \(d\) and another one with a limit \(d\) is equivalent to purchasing full cover.

Remark 5.1
Usually the random variable \(X\) is non-negative (for instance, when \(X\) represents a loss or a time until death), and the lower integration limit \(-\infty\) is replaced by 0. Thus, for \(X \ge 0\), we have

\[ E(X \wedge u) = \int_0^u S(x)\,dx. \]

Example 5.9
A continuous random variable X has a pdf f (x) = 0.005x for 0 ≤ x ≤ 20
and 0 otherwise. Find the mean and the variance of X ∧ 10.

Solution.
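The cdf of \(X\) is

\[ F(x) = \int_{-\infty}^x f(t)\,dt = \begin{cases} 0, & x < 0 \\ \int_0^x 0.005t\,dt = 0.0025x^2, & 0 \le x \le 20 \\ 1, & x > 20. \end{cases} \]

Thus,

\[ E(X \wedge 10) = \int_0^{10} [1 - F(x)]\,dx = \int_0^{10} [1 - 0.0025x^2]\,dx = \frac{55}{6}. \]

Now,

\[ E[(X \wedge 10)^2] = \int_0^{10} x^2(0.005x)\,dx + 10^2[1 - F(10)] = 87.5. \]

Finally,

\[ \mathrm{Var}(X \wedge 10) = 87.5 - \left(\frac{55}{6}\right)^2 = \frac{125}{36} \]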
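
The two moments in Example 5.9 can be checked exactly with rational arithmetic. The sketch below hard-codes the density \(f(x) = x/200\) of the example; the function name `limited_moment` is introduced here for illustration.

```python
from fractions import Fraction

u = 10  # policy limit

def limited_moment(k):
    """E[(X ^ u)^k] = integral_0^u x^k f(x) dx + u^k S(u), with f(x) = x/200."""
    body = Fraction(u ** (k + 2), 200 * (k + 2))    # integral of x^(k+1)/200 on [0, u]
    tail = u ** k * (1 - Fraction(u ** 2, 400))     # u^k * S(u), with F(x) = x^2/400
    return body + tail

mean = limited_moment(1)
variance = limited_moment(2) - mean ** 2
print(mean, variance)
```

The second moment here uses the density form of the formula rather than the survival-function form; both give the same result.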

Example 5.10 ‡
The unlimited severity distribution for claim amounts under an auto liability
insurance policy is given by the cumulative distribution:

\[ F(x) = 1 - 0.8e^{-0.02x} - 0.2e^{-0.001x}, \quad x \ge 0. \]

The insurance policy pays amounts up to a limit of 1000 per claim. Calculate
the expected payment under this policy for one claim.

Solution.
We are asked to find the limited expected value \(E(X \wedge 1000)\). We have

\[ E(X \wedge 1000) = \int_0^{1000} S(x)\,dx = \int_0^{1000} [0.8e^{-0.02x} + 0.2e^{-0.001x}]\,dx = 166.4 \]
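
The integral in Example 5.10 has a simple closed form, since each exponential term integrates separately. A minimal sketch, with `limited_expected_value` as an illustrative name:

```python
import math

def limited_expected_value(u, weights=(0.8, 0.2), rates=(0.02, 0.001)):
    """E(X ^ u) = integral_0^u S(x) dx for S(x) = sum of w_i * exp(-r_i * x)."""
    return sum(w / r * (1 - math.exp(-r * u)) for w, r in zip(weights, rates))

print(round(limited_expected_value(1000), 1))
```

Letting `u` grow shows the limited expected value approaching the unlimited mean 0.8/0.02 + 0.2/0.001 = 240.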

Example 5.11 ‡
A health plan implements an incentive to physicians to control hospitaliza-
tion under which the physicians will be paid a bonus B equal to c times the
amount by which total hospital claims are under 400 (0 ≤ c ≤ 1).
The effect the incentive plan will have on underlying hospital claims is
modeled by assuming that the new total hospital claims will follow a two-
parameter Pareto distribution with α = 2 and θ = 300.
Suppose that E(B) = 100. Calculate the value of c.

Solution.
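Let \(X\) denote the total hospital claims. We are told that

\[ B = \begin{cases} c(400 - x), & x < 400 \\ 0, & x \ge 400 \end{cases} = \begin{cases} 400c - cx, & x < 400 \\ 400c - 400c, & x \ge 400 \end{cases} = 400c - c(X \wedge 400). \]

Thus,

\[ 100 = E(B) = E[400c - c(X \wedge 400)] = 400c - cE(X \wedge 400). \]

We are told that \(X\) has a Pareto distribution with parameters \(\alpha = 2\) and \(\theta = 300\). Using Table C, we have

\[ E(X \wedge u) = \frac{\theta}{\alpha - 1}\left[1 - \left(\frac{\theta}{u + \theta}\right)^{\alpha - 1}\right]. \]

With \(u = 400\), we find \(E(X \wedge 400) = \frac{1200}{7}\). Finally,

\[ 100 = 400c - \frac{1200}{7}c \implies c \approx 0.44 \]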
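
The arithmetic of Example 5.11 can be verified with exact fractions. The function `pareto_lev` below simply encodes the Table C formula quoted in the solution (valid for \(\alpha \ne 1\)); its name is chosen here for illustration.

```python
from fractions import Fraction

def pareto_lev(u, alpha, theta):
    """Limited expected value of a two-parameter Pareto, for integer alpha > 1."""
    return Fraction(theta, alpha - 1) * (1 - Fraction(theta, u + theta) ** (alpha - 1))

lev = pareto_lev(400, alpha=2, theta=300)
c = Fraction(100) / (400 - lev)   # from 100 = c * (400 - E(X ^ 400))
print(lev, c, float(c))
```

The exact solution is c = 7/16 = 0.4375, which rounds to the 0.44 reported above.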

Example 5.12
Show that, for \(X \ge 0\), we have

\[ e_X(d) = \frac{E(X) - E(X \wedge d)}{S(d)}. \]

Solution.
We have

\[ E(X) = \int_0^\infty x f(x)\,dx = -xS(x)\Big|_0^\infty + \int_0^\infty S(x)\,dx = \int_0^\infty S(x)\,dx. \]

Hence, by Example 5.4 and Remark 5.1,

\[ e_X(d) = \frac{\int_d^\infty S(x)\,dx}{S(d)} = \frac{\int_0^\infty S(x)\,dx - \int_0^d S(x)\,dx}{S(d)} = \frac{E(X) - E(X \wedge d)}{S(d)} \]

Example 5.13 ‡
The random variable for a loss, X, has the following characteristics:

x F (x) E(X ∧ x)
0 0.0 0
100 0.2 91
200 0.6 153
1000 1.0 331
Calculate the mean excess loss for a deductible of 100.

Solution.
We are asked to find

\[ e_X(100) = \frac{E(X) - E(X \wedge 100)}{1 - F_X(100)}. \]

The only unknown term in this formula is \(E(X)\). Now, \(\Pr(X > 1000) = 1 - F(1000) = 0\). This shows that \(X \le 1000\) so that \(X \wedge 1000 = X\). It follows that \(E(X) = E(X \wedge 1000) = 331\).
The mean excess loss is

\[ e_X(100) = \frac{E(X) - E(X \wedge 100)}{1 - F_X(100)} = \frac{331 - 91}{1 - 0.2} = 300 \]

Remark 5.2
Just as in the case of a deductible, the random variable \(Y = X \wedge u\) has a mixed distribution with continuous part \(f_Y(y) = f_X(y)\) for \(y < u\) and a discrete part \(p_Y(u) = 1 - F_X(u)\).

Practice Problems
Problem 5.1
Suppose that a policy has a deductible of $500. Complete the following
table.
Amount of loss 750 500 1200
Insurance payment

Problem 5.2
Referring to Example 5.3, find the cumulative distribution function of X.

Problem 5.3
Referring to Example 5.3, find the first and second raw moments of X.

Problem 5.4
Suppose you observe 8 claims with amounts

5 10 15 20 25 30 35 40

Calculate the empirical coefficient of variation.

Problem 5.5
Let \(X\) be uniform on the interval \([0, 100]\). Find \(e_X(d)\) for \(d > 0\).

Problem 5.6
Let \(X\) be uniform on \([0, 100]\) and \(Y\) be uniform on \([0, \alpha]\). Suppose that

\[ e_Y(30) = e_X(30) + 4. \]

Calculate the value of \(\alpha\).

Problem 5.7
Let \(X\) be the exponential random variable with mean \(\frac{1}{\lambda}\). Its pdf is \(f(x) = \lambda e^{-\lambda x}\) for \(x > 0\) and 0 otherwise.

Find the expected cost per payment (i.e., mean excess loss function).

Problem 5.8
For an automobile insurance policy, the loss amount (expressed in thousands), in the event of an accident, is being modeled by a distribution with density

\[ f(x) = \frac{3}{56}x(5 - x), \quad 0 < x < 4 \]

and 0 otherwise.

For a policy with a deductible amount of $2,500, calculate the expected amount per loss.

Problem 5.9
The loss random variable \(X\) has an exponential distribution with mean \(\frac{1}{\lambda}\) and an ordinary deductible is applied to all losses.

Find the expected cost per loss.

Problem 5.10
The loss random variable \(X\) has an exponential distribution with mean \(\frac{1}{\lambda}\) and an ordinary deductible is applied to all losses.

Find the variance of the cost per loss random variable.

Problem 5.11
The loss random variable \(X\) has an exponential distribution with mean \(\frac{1}{\lambda}\) and an ordinary deductible is applied to all losses. The variance of the cost per payment random variable (excess loss random variable) is 25,600.

Find \(\lambda\).

Problem 5.12
The loss random variable \(X\) has an exponential distribution with mean \(\frac{1}{\lambda}\) and an ordinary deductible is applied to all losses. The variance of the cost per payment random variable (excess loss random variable) is 25,600. The variance of the cost per loss random variable is 20,480.

Find the amount of the deductible \(d\).

Problem 5.13
The loss random variable \(X\) has an exponential distribution with mean \(\frac{1}{\lambda}\) and an ordinary deductible is applied to all losses. The variance of the cost per payment random variable (excess loss random variable) is 25,600. The variance of the cost per loss random variable is 20,480.

Find the expected cost per loss.


Problem 5.14
For the loss random variable with cdf \(F(x) = \left(\frac{x}{\theta}\right)^{\phi}\), \(0 < x < \theta\), and 0 otherwise, determine the mean residual lifetime \(e_X(x)\).

Problem 5.15
Let \(X\) be a loss random variable with pdf \(f(x) = (1 + 2x^2)e^{-2x}\) for \(x > 0\) and 0 otherwise.

(a) Find the survival function \(S(x)\).

(b) Determine \(e_X(x)\).

Problem 5.16
Show that

\[ S_{Y^P}(y) = \frac{S_X(y + d)}{S_X(d)}. \]

Problem 5.17
Let \(X\) be a loss random variable with cdf \(F(x) = 1 - e^{-0.005x} - 0.004e^{-0.005x}\) for \(x \ge 0\) and 0 otherwise.

(a) If an ordinary deductible of 100 is applied to each loss, find the pdf of the per payment random variable \(Y^P\).
(b) Calculate the mean and the variance of the per payment random variable.

Problem 5.18
A continuous random variable X has a pdf f (x) = 0.005x for 0 ≤ x ≤ 20
and 0 otherwise.

Find the mean and the variance of (X − 10)+ .

Problem 5.19 ‡
For a random loss \(X\), you are given: \(\Pr(X = 3) = \Pr(X = 12) = 0.5\) and \(E[(X - d)_+] = 3\).

Calculate the value of \(d\).

Problem 5.20 ‡
A loss, \(X\), follows a 2-parameter Pareto distribution with \(\alpha = 2\) and unspecified parameter \(\theta\). You are given:

\[ E(X - 100 \mid X > 100) = \frac{5}{3}E(X - 50 \mid X > 50). \]

Calculate \(E(X - 150 \mid X > 150)\).

Problem 5.21 ‡
For an insurance:
(i) Losses can be 100, 200 or 300 with respective probabilities 0.2, 0.2, and
0.6.
(ii) The insurance has an ordinary deductible of 150 per loss.
(iii) \(Y^P\) is the claim payment per payment random variable.

Calculate \(\mathrm{Var}(Y^P)\).

Problem 5.22 ‡
For an insurance:
(i) Losses have density function

\[ f(x) = \begin{cases} 0.02x, & 0 < x < 10 \\ 0, & \text{otherwise.} \end{cases} \]

(ii) The insurance has an ordinary deductible of 4 per loss.

(iii) \(Y^P\) is the claim payment per payment random variable.

Calculate \(E[Y^P]\).

Problem 5.23 ‡
The loss severity random variable X follows the exponential distribution
with mean 10,000.

Determine the coefficient of variation of the excess loss variable \(Y = \max\{X - 30000, 0\}\).

6 Median, Mode, Percentiles, and Quantiles

In addition to the information provided by the moments of a distribution,
some other metrics such as the median, the mode, the percentile, and the
quantile provide useful information.

Median of a Random Variable

In probability theory, median is described as the numerical value separat-
ing the higher half of a probability distribution, from the lower half. Thus,
the median of a discrete random variable X is the number M such that
Pr(X ≤ M ) ≥ 0.50 and Pr(X ≥ M ) ≥ 0.50.

Example 6.1
The pmf of a discrete random variable X is given below.

x 0 1 2 3 4 5
p(x) 0.35 0.20 0.15 0.15 0.10 0.05

Find the median of X.

Solution.
Since Pr(X ≤ 1) = 0.55 and Pr(X ≥ 1) = 0.65, 1 is the median of X
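
The two-sided condition defining the median of a discrete random variable can be checked mechanically. The sketch below tests each support point against it; the function name `discrete_medians` and the small tolerance guarding against floating-point round-off are choices made here, not part of the text.

```python
def discrete_medians(pmf, tol=1e-12):
    """Support points M with Pr(X <= M) >= 0.5 and Pr(X >= M) >= 0.5."""
    medians = []
    for m in sorted(pmf):
        at_or_below = sum(p for x, p in pmf.items() if x <= m)
        at_or_above = sum(p for x, p in pmf.items() if x >= m)
        if at_or_below >= 0.5 - tol and at_or_above >= 0.5 - tol:
            medians.append(m)
    return medians

# pmf of Example 6.1
pmf = {0: 0.35, 1: 0.20, 2: 0.15, 3: 0.15, 4: 0.10, 5: 0.05}
print(discrete_medians(pmf))
```

Only support points are examined, so a median lying strictly between two support points would not be reported by this sketch.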

In the case of a continuous random variable \(X\), the median is the number \(M\) such that \(\Pr(X \le M) = \Pr(X \ge M) = 0.5\). Generally, \(M\) is found by solving the equation \(F(M) = 0.5\) where \(F\) is the cdf of \(X\).

Example 6.2
Let \(X\) be a continuous random variable with pdf \(f(x) = \frac{1}{b - a}\) for \(a < x < b\) and 0 otherwise. Find the median of \(X\).

Solution.
We must find a number \(M\) such that \(\int_a^M \frac{dx}{b - a} = 0.5\). This leads to the equation \(\frac{M - a}{b - a} = 0.5\). Solving this equation we find \(M = \frac{a + b}{2}\)

Remark 6.1
A discrete random variable might have many medians. For example, let \(X\) be the discrete random variable with pmf given by \(p(x) = \left(\frac{1}{2}\right)^x\), \(x = 1, 2, \cdots\) and 0 otherwise. Then any number \(1 < M < 2\) satisfies \(\Pr(X \le M) = \Pr(X \ge M) = 0.5\).

Mode of a Random Variable

The mode is defined as the value that maximizes the probability mass function \(p(x)\) (discrete case) or the probability density function \(f(x)\) (continuous case). In the discrete case, the mode is the value that is most likely to be sampled. In the continuous case, the mode is where \(f(x)\) is at its peak.

Example 6.3
Let \(X\) be the discrete random variable with pmf given by \(p(x) = \left(\frac{1}{2}\right)^x\), \(x = 1, 2, \cdots\) and 0 otherwise. Find the mode of \(X\).

Solution.
The value of x that maximizes p(x) is x = 1. Thus, the mode of X is 1

Example 6.4
Let \(X\) be the continuous random variable with pdf given by \(f(x) = 0.75(1 - x^2)\) for \(-1 \le x \le 1\) and 0 otherwise. Find the mode of \(X\).

Solution.
The pdf is maximum for x = 0. Thus, the mode of X is 0

Percentiles and Quantiles

In statistics, a percentile is the value of a variable below which a certain percent of observations⁵ fall. For example, if a score is in the 85th percentile, it is higher than 85% of the other scores. For a random variable \(X\) and \(0 < p < 1\), the 100\(p\)th percentile (or the \(p\)th quantile) is the number \(x\) such that

\[ \Pr(X < x) \le p \le \Pr(X \le x). \]

For a continuous random variable, this is the solution to the equation \(F(x) = p\). The 25th percentile is also known as the first quartile, the 50th percentile as the median or second quartile, and the 75th percentile as the third quartile.

Example 6.5
A loss random variable \(X\) has the density function

\[ f(x) = \begin{cases} \frac{2.5(200)^{2.5}}{x^{3.5}}, & x > 200 \\ 0, & \text{otherwise.} \end{cases} \]

Calculate the difference between the 25th and 75th percentiles of \(X\).

⁵ Another term for an "observation" in this text is "exposure".

Solution.
First, the cdf is given by

\[ F(x) = \int_{200}^x \frac{2.5(200)^{2.5}}{t^{3.5}}\,dt. \]

If \(Q_1\) is the 25th percentile then it satisfies the equation

\[ F(Q_1) = \frac{1}{4} \]

or equivalently

\[ 1 - F(Q_1) = \frac{3}{4}. \]

This leads to

\[ \frac{3}{4} = \int_{Q_1}^\infty \frac{2.5(200)^{2.5}}{t^{3.5}}\,dt = -\left(\frac{200}{t}\right)^{2.5}\Big|_{Q_1}^\infty = \left(\frac{200}{Q_1}\right)^{2.5}. \]

Solving for \(Q_1\) we find \(Q_1 = 200(4/3)^{0.4} \approx 224.4\). Similarly, the third quartile (i.e., the 75th percentile) is given by \(Q_3 = 348.2\). The interquartile range (i.e., the difference between the 25th and 75th percentiles) is \(Q_3 - Q_1 = 348.2 - 224.4 = 123.8\)
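
For this density the equation \(F(x) = p\) can be inverted in closed form, which gives a quick way to reproduce the quartiles of Example 6.5. The function name `pareto_quantile` is an illustrative choice.

```python
def pareto_quantile(p, alpha=2.5, theta=200.0):
    """Solve F(x) = 1 - (theta/x)^alpha = p for x > theta."""
    return theta * (1 - p) ** (-1 / alpha)

q1, q3 = pareto_quantile(0.25), pareto_quantile(0.75)
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```

The same function returns the median when called with `p = 0.5`.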

Example 6.6
Let \(X\) be the random variable with pdf \(f(x) = \frac{1}{b - a}\) for \(a < x < b\) and 0 otherwise. Find the \(p\)th quantile of \(X\).

Solution.
We have

\[ p = \Pr(X \le x) = \int_a^x \frac{dt}{b - a} = \frac{x - a}{b - a}. \]

Solving this equation for \(x\), we find \(x = a + (b - a)p\)

Example 6.7
What percentile is the 0.63 quantile?

Solution.
The 0.63 quantile is the 63rd percentile

Practice Problems
Problem 6.1
Using words, explain the meaning of F (1120) = 0.2 in terms of percentiles
and quantiles.
Problem 6.2
Let \(X\) be a discrete random variable with pmf \(p(n) = (n - 1)(0.4)^2(0.6)^{n-2}\), \(n \ge 2\) and 0 otherwise.

Find the mode of X.

Problem 6.3
Let \(X\) be a continuous random variable with density function

\[ f(x) = \begin{cases} \frac{1}{9}x(4 - x), & 0 < x < 3 \\ 0, & \text{otherwise.} \end{cases} \]

Find the mode of X.
Problem 6.4
Suppose the random variable \(X\) has pmf

\[ p(n) = \frac{1}{3}\left(\frac{2}{3}\right)^n, \quad n = 0, 1, 2, \cdots \]

and 0 otherwise.

Find the median and the 70th percentile.

Problem 6.5
The time in minutes between individuals joining the line at an Ottawa Post
Office is a random variable with the density function
\[ f(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & x < 0. \end{cases} \]

Find the median time between individuals joining the line and interpret your
answer.
Problem 6.6
Suppose the random variable X has pdf
\[ f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases} \]

Find the 100pth percentile.

Problem 6.7 ‡
An insurance company sells an auto insurance policy that covers losses incurred by a policyholder, subject to a deductible of 100. Losses incurred follow an exponential distribution with mean 300.

What is the 95th percentile of actual losses that exceed the deductible?

Problem 6.8
Let \(X\) be a random variable with density function

\[ f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Find \(\lambda\) if the median of \(X\) is \(\frac{1}{3}\).

Problem 6.9
People are dispersed on a linear beach with a density function \(f(y) = 4y^3\), \(0 < y < 1\), and 0 elsewhere. An ice cream vendor wishes to locate
her cart at the median of the locations (where half of the people will be on
each side of her).

Where will she locate her cart?

Problem 6.10 ‡
An automobile insurance company issues a one-year policy with a deductible
of 500. The probability is 0.8 that the insured automobile has no accident
and 0.0 that the automobile has more than one accident. If there is an
accident, the loss before application of the deductible is exponentially dis-
tributed with mean 3000.

Calculate the 95th percentile of the insurance company payout on this policy.

Problem 6.11
Let \(Y\) be a continuous random variable with cumulative distribution function

\[ F(y) = \begin{cases} 0, & y \le a \\ 1 - e^{-\frac{1}{2}(y - a)^2}, & \text{otherwise} \end{cases} \]

where \(a\) is a constant.

Find the 75th percentile of Y.


Problem 6.12
Find the \(p\)th quantile of the exponential distribution defined by the distribution function \(F(x) = 1 - e^{-x}\) for \(x \ge 0\) and 0 otherwise.

Problem 6.13
A continuous random variable has the pdf \(f(x) = e^{-|x|}\) for \(x \in \mathbb{R}\).

Find the pth quantile of X.

Problem 6.14
Let \(X\) be a loss random variable with cdf

\[ F(x) = \begin{cases} 1 - \left(\frac{\theta}{\theta + x}\right)^{\alpha}, & x \ge 0 \\ 0, & x < 0. \end{cases} \]

The 10th percentile is θ − k. The 90th percentile is 3θ − 3k.

Determine the value of α.

Problem 6.15
A random variable \(X\) follows a normal distribution with \(\mu = 1\) and \(\sigma^2 = 4\). Define a random variable \(Y = e^X\); then \(Y\) follows a lognormal distribution.
It is known that the 95th percentile of a standard normal distribution is
1.645.

Calculate the 95th percentile of Y.

Problem 6.16
Let \(X\) be a random variable with density function \(f(x) = \frac{4x}{(1 + x^2)^3}\) for \(x > 0\) and 0 otherwise.

Calculate the mode of X.

Problem 6.17
Let \(X\) be a random variable with pdf \(f(x) = \frac{3}{5000}\left(\frac{5000}{x}\right)^4\) for \(x > 0\) and 0 otherwise.

Determine the median of X.


Problem 6.18
Let \(X\) be a random variable with cdf

\[ F(x) = \begin{cases} 0, & x < 0 \\ \frac{x^3}{27}, & 0 \le x \le 3 \\ 1, & x > 3. \end{cases} \]

Find the median of X.

Problem 6.19
Consider a sample of size 9 and observed data

45, 50, 50, 50, 60, 75, 80, 120, 230.

Using this data as an empirical distribution, calculate the empirical mode.

Problem 6.20
A distribution has a pdf \(f(x) = \frac{3}{x^4}\) for \(x > 1\) and 0 otherwise.

Calculate the 0.95th quantile of this distribution.

7 Sum of Random Variables and the Central Limit Theorem

Random variables of the form

\[ S_n = X_1 + X_2 + \cdots + X_n \]

appear repeatedly in probability theory and applications. For example, in the insurance context, \(S_n\) can represent the total claims paid on all policies where \(X_i\) is the \(i\)th claim. Thus, it is useful to be able to determine properties of \(S_n\).

For the expected value of \(S_n\), we have

\[ E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_{n-1}) + E(X_n). \]

A similar formula holds for the variance provided that the \(X_i\)'s are independent⁶ random variables. In this case,

\[ \mathrm{Var}(S_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n). \]

Example 7.1 ‡
The random variables \(X_1, X_2, \cdots, X_n\) are independent and identically distributed with probability density function

\[ f(x) = \frac{1}{\theta}e^{-\frac{x}{\theta}}. \]

Determine \(E[\bar{X}^2]\).

Solution.
The random variable \(X_i\) has an exponential distribution with mean \(\theta\). Thus, \(E(X_i) = \theta\) and \(\mathrm{Var}(X_i) = \theta^2\). Hence,

\[ \begin{aligned}
E[\bar{X}] &= \frac{E(X_1) + \cdots + E(X_n)}{n} = \theta \\
\mathrm{Var}[\bar{X}] &= \frac{\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)}{n^2} = \frac{\theta^2}{n} \\
E[\bar{X}^2] &= \mathrm{Var}[\bar{X}] + E[\bar{X}]^2 = \frac{\theta^2}{n} + \theta^2 = \frac{n + 1}{n}\theta^2
\end{aligned} \]

⁶ We say that \(X\) and \(Y\) are independent random variables if and only if for any two sets of real numbers \(A\) and \(B\) we have

\[ P(X \in A, Y \in B) = P(X \in A)P(Y \in B). \]

The central limit theorem reveals a fascinating property of the sum of independent random variables. It states that the CDF of the suitably standardized sum converges to the standard normal CDF as the number of terms grows without limit. This theorem allows us to use the properties of the standard normal distribution to obtain accurate estimates of probabilities associated with sums of random variables.

Theorem 7.1
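Let \(X_1, X_2, \cdots\) be a sequence of independent and identically distributed random variables, each with mean \(\mu\) and variance \(\sigma^2\). Then,

\[ P\left(\frac{\sqrt{n}}{\sigma}\left(\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right) \le a\right) \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-\frac{x^2}{2}}\,dx \]

as \(n \to \infty\).

The Central Limit Theorem says that regardless of the underlying distribution of the variables \(X_i\), so long as they are independent and identically distributed with mean \(\mu\) and finite variance \(\sigma^2\), the distribution of \(\frac{\sqrt{n}}{\sigma}\left(\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right)\) converges to the same distribution, the standard normal.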

Example 7.2
The weight of a typewriter has a mean of 20 pounds and a variance of 9
pounds. Consider a train that carries 200 of these typewriters. Estimate the
probability that the total weight of typewriters carried in the train exceeds
4050 pounds.

Solution.
Label the typewriters as Typewriter 1, Typewriter 2, etc. Let \(X_i\) be the weight of Typewriter \(i\). Thus,

\[ \begin{aligned}
P\left(\sum_{i=1}^{200} X_i > 4050\right) &= P\left(\frac{\sum_{i=1}^{200} X_i - 200(20)}{3\sqrt{200}} > \frac{4050 - 20(200)}{3\sqrt{200}}\right) \\
&\approx P(Z > 1.179) = 1 - P(Z \le 1.179) \\
&= 1 - \Phi(1.179) = 1 - 0.8810 = 0.119
\end{aligned} \]

where \(\Phi\) is the CDF of the standard normal distribution
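
The normal approximation of Example 7.2 can be reproduced with Python's standard library, which provides the normal CDF through `statistics.NormalDist`. This is a sketch of the approximation only, not an exact computation of the probability.

```python
from math import sqrt
from statistics import NormalDist

n, mu, var = 200, 20, 9  # number of typewriters, mean and variance of one weight

# By the central limit theorem, the total weight is approximately
# normal with mean n*mu and standard deviation sqrt(n*var).
total = NormalDist(mu=n * mu, sigma=sqrt(n * var))
prob_exceeds = 1 - total.cdf(4050)
print(round(prob_exceeds, 3))
```

Working with the total directly avoids the explicit standardization step but is equivalent to it.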

Example 7.3 ‡
In an analysis of healthcare data, ages have been rounded to the nearest
multiple of 5 years. The difference between the true age and the rounded
age is assumed to be uniformly distributed on the interval from −2.5 years to
2.5 years. The healthcare data are based on a random sample of 48 people.
What is the approximate probability that the mean of the rounded ages is
within 0.25 years of the mean of the true ages?

Solution.
Let \(X\) denote the difference between true and reported age. We are given that \(X\) is uniformly distributed on \((-2.5, 2.5)\). That is, \(X\) has pdf \(f(x) = 1/5\), \(-2.5 < x < 2.5\). It follows that \(E(X) = 0\) and

\[ \sigma_X^2 = E(X^2) = \int_{-2.5}^{2.5} \frac{x^2}{5}\,dx \approx 2.083 \]

so that \(SD(X) = \sqrt{2.083} \approx 1.443\).
Now \(\bar{X}_{48}\), the difference between the means of the true and rounded ages, has a distribution that is approximately normal with mean 0 and standard deviation \(\frac{1.443}{\sqrt{48}} \approx 0.2083\). Therefore,

\[ \begin{aligned}
P\left(-\frac{1}{4} \le \bar{X}_{48} \le \frac{1}{4}\right) &= P\left(\frac{-0.25}{0.2083} \le \frac{\bar{X}_{48}}{0.2083} \le \frac{0.25}{0.2083}\right) \\
&= P(-1.2 \le Z \le 1.2) = 2\Phi(1.2) - 1 \\
&\approx 2(0.8849) - 1 = 0.77
\end{aligned} \]
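
The same approach checks Example 7.3. The sketch assumes only the uniform rounding error of the example and uses the exact variance \(5^2/12\) of a uniform distribution on an interval of length 5.

```python
from math import sqrt
from statistics import NormalDist

n = 48
sd_single = sqrt(5 ** 2 / 12)     # SD of a uniform variable on (-2.5, 2.5)
sd_mean = sd_single / sqrt(n)     # SD of the mean of 48 such errors

approx = NormalDist(mu=0, sigma=sd_mean)
prob_within = approx.cdf(0.25) - approx.cdf(-0.25)
print(round(sd_mean, 4), round(prob_within, 2))
```

Here `sd_mean` equals 5/24, so the bound 0.25 corresponds to exactly 1.2 standard deviations.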

Example 7.4
Let \(X_1, X_2, X_3, X_4\) be a random sample of size 4 from a normal distribution with mean 2 and variance 10, and let \(\bar{X}\) be the sample mean. Determine \(a\) such that \(P(\bar{X} \le a) = 0.90\).

Solution.
The sample mean \(\bar{X}\) is normal with mean \(\mu = 2\), variance \(\frac{\sigma^2}{n} = \frac{10}{4} = 2.5\), and standard deviation \(\sqrt{2.5} \approx 1.58\), so

\[ 0.90 = P(\bar{X} \le a) = P\left(\frac{\bar{X} - 2}{1.58} < \frac{a - 2}{1.58}\right) = \Phi\left(\frac{a - 2}{1.58}\right). \]

Using Excel, we get \(\frac{a - 2}{1.58} = 1.28\), so \(a = 4.02\)
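
The inverse normal lookup in Example 7.4 does not require Excel; in Python 3.8 or later, `statistics.NormalDist.inv_cdf` gives the same quantile. A minimal sketch:

```python
from math import sqrt
from statistics import NormalDist

# The sample mean of 4 observations from N(2, 10) is N(2, 10/4).
sample_mean = NormalDist(mu=2, sigma=sqrt(10 / 4))
a = sample_mean.inv_cdf(0.90)
print(round(a, 2))
```

Without intermediate rounding the result is about 4.03; the value 4.02 in the solution comes from rounding the standard deviation to 1.58 and the normal quantile to 1.28.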

Practice Problems
Problem 7.1
A shipping agency ships boxes of booklets with each box containing 100
booklets. Suppose that the average weight of a booklet is 1 ounce and the
standard deviation is 0.05 ounces.

What is the probability that 1 box of booklets weighs more than 100.4
ounces?

Problem 7.2
In the National Hockey League, the standard deviation in the distribution
of players’ height is 2 inches. The heights of 25 players selected at random
were measured.

Estimate the probability that the average height of the players in this sample
is within 1 inch of the league average height.

Problem 7.3
A battery manufacturer claims that the lifespan of its batteries has a mean
of 54 hours and a standard deviation of 6 hours. A sample of 60 batteries
were tested.

What is the probability that the mean lifetime is less than 52 hours?

Problem 7.4
Roll a die 10 times. Estimate the probability that the sum obtained is
between 30 and 40, inclusive.

Problem 7.5
Consider 10 independent random variables, each uniformly distributed over
(0,1).

Estimate the probability that the sum of the variables exceeds 6.

Problem 7.6
The Chicago Cubs play 100 independent baseball games in a given season.
Suppose that the probability of winning a game is 0.8.

What’s the probability that they win at least 90?


Problem 7.7
An insurance company has 10,000 home policyholders. The average annual
claim per policyholder is found to be $240 with a standard deviation of $800.

Estimate the probability that the total annual claim is at least $2.7 mil-
lion.

Problem 7.8
A certain component is critical to the operation of a laptop and must be
replaced immediately upon failure. It is known that the average life of this
type of component is 100 hours and its standard deviation is 30 hours.

Estimate the number of such components that must be available in stock so that the system remains in continual operation for the next 2000 hours with probability of at least 0.95.

Problem 7.9
An instructor found that the average student score on class exams is 74 and
the standard deviation is 14. This instructor gives two exams: One to a
class of size 25 and the other to a class of 64.

Using the Central Limit Theorem, estimate the probability that the average
test score in the class of size 25 is at least 80.

Problem 7.10
The Salvation Army received 2025 contributions. Assume the contributions are independent and identically distributed with mean 3125 and standard deviation 250.

Estimate the 90th percentile for the distribution of the total contributions
received.

Problem 7.11 ‡
An insurance company issues 1250 vision care insurance policies. The num-
ber of claims filed by a policyholder under a vision care insurance policy
during one year is a Poisson random variable with mean 2. Assume the
numbers of claims filed by distinct policyholders are independent of one an-
other.

What is the approximate probability that there is a total of between 2450 and 2600 claims during a one-year period?

Problem 7.12
A battery manufacturer finds that the lifetime of a battery, expressed in months, follows a normal distribution with mean 3 and standard deviation 1. Suppose that you want to buy a number of these batteries with the intention of replacing them successively into your radio as they burn out.

Assuming that the batteries’ lifetimes are independent, what is the small-
est number of batteries to be purchased so that the succession of batteries
keeps your radio working for at least 40 months with probability exceeding
0.9772?

Problem 7.13
The total claim amount for a home insurance policy has a pdf
\[
f(x) = \begin{cases} \frac{1}{1000}e^{-\frac{x}{1000}}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}
\]

An actuary sets the premium for the policy at 100 over the expected total
claim amount.

If 100 policies are sold, estimate the probability that the insurance com-
pany will have claims exceeding the premiums collected.

Problem 7.14 ‡
A city has just added 100 new female recruits to its police force. The city
will provide a pension to each new hire who remains with the force until
retirement. In addition, if the new hire is married at the time of her re-
tirement, a second pension will be provided for her husband. A consulting
actuary makes the following assumptions:

(i) Each new recruit has a 0.4 probability of remaining with the police force
until retirement.
(ii) Given that a new recruit reaches retirement with the police force, the
probability that she is not married at the time of retirement is 0.25.
(iii) The number of pensions that the city will provide on behalf of each new
hire is independent of the number of pensions it will provide on behalf of
any other new hire.

Determine the probability that the city will provide at most 90 pensions
to the 100 new hires and their husbands.

Problem 7.15
The amount of an individual claim has a two-parameter Pareto distribution
with θ = 8000 and α = 9. Consider a sample of 500 claims.

Estimate the probability that the total sum of the claims is at least 550,000.

Problem 7.16
Suppose that the current profit from selling a share of a stock is found to
follow a uniform distribution on [−45, 72].

Using the central limit theorem, approximate the probability of making a

profit from the sale of 55 stocks.

Problem 7.17
The severities of individual claims have the Pareto distribution with param-
eters α = 38 and θ = 8000.

Use the central limit theorem to approximate the probability that the sum
of 100 independent claims will exceed 600,000.

Problem 7.18 ‡
Let X and Y be the number of hours that a randomly selected person
watches movies and sporting events, respectively, during a three-month pe-
riod. The following information is known about X and Y :

E(X) = 50
E(Y) = 20
Var(X) = 50
Var(Y) = 30
Cov (X,Y) = 10

One hundred people are randomly selected and observed for these three
months. Let T be the total number of hours that these one hundred people
watch movies or sporting events during this three-month period.

Approximate the value of P (T < 7100).

Problem 7.19 ‡
Automobile losses reported to an insurance company are independent and
uniformly distributed between 0 and 20,000. The company covers each such
loss subject to a deductible of 5,000.

Calculate the probability that the total payout on 200 reported losses is
between 1,000,000 and 1,200,000.

Problem 7.20 ‡
For Company A there is a 60% chance that no claim is made during the
coming year. If one or more claims are made, the total claim amount is
normally distributed with mean 10,000 and standard deviation 2,000.
For Company B there is a 70% chance that no claim is made during the
coming year. If one or more claims are made, the total claim amount is
normally distributed with mean 9,000 and standard deviation 2,000.
Assume that the total claim amounts of the two companies are independent.

What is the probability that, in the coming year, Company B’s total claim
amount will exceed Company A’s total claim amount?

8 Moment Generating Functions and Probability Generating Functions
A useful way to analyze the sum of independent random variables is to trans-
form the PDF or PMF of each random variable to a moment generating
function, abbreviated mgf.

The moment generating function of a continuous random variable X with a
density function $f(x)$ is denoted by $M_X(t)$ and is given by
\[
M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx.
\]

The moment generating function of a discrete random variable X with a
probability mass function $p(x)$ is denoted by $M_X(t)$ and is given by
\[
M_X(t) = E[e^{tX}] = \sum_{x \in \text{support}(X)} e^{tx} p(x).
\]

Example 8.1
Calculate the moment generating function for the exponential distribution
with parameter λ, i.e. f (x) = λe−λx for x > 0 and 0 otherwise.

Solution.
We have
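\[
M_X(t) = \int_0^{\infty} e^{tx}\lambda e^{-\lambda x}\,dx = \int_0^{\infty} \lambda e^{-x(\lambda - t)}\,dx = \left. -\frac{\lambda}{\lambda - t}e^{-x(\lambda - t)}\right|_0^{\infty} = \frac{\lambda}{\lambda - t}, \quad t < \lambda.
\]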

Example 8.2
Let X be a discrete random variable with pmf given by the following table
x 1 2 3 4 5
p(x) 0.15 0.20 0.40 0.15 0.10
and 0 otherwise. Calculate MX (t).

Solution.
We have
\[
M_X(t) = 0.15e^{t} + 0.20e^{2t} + 0.40e^{3t} + 0.15e^{4t} + 0.10e^{5t}.
\]

As the name suggests, the moment generating function can be used to gen-
erate moments E(X n ) for n = 1, 2, · · · . The next result shows how to use
the moment generating function to calculate moments.

Theorem 8.1
For any random variable X, we have
\[
E(X^n) = M_X^{(n)}(0) \quad \text{where} \quad M_X^{(n)}(0) = \left.\frac{d^n}{dt^n}M_X(t)\right|_{t=0}.
\]
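
Theorem 8.1 can be checked numerically. The following is a minimal sketch, assuming the pmf of Example 8.2 and a central finite-difference approximation of the derivatives of $M_X(t)$ at $t = 0$; the function names and the step size h are illustrative choices, not part of the text.

```python
import math

# pmf of Example 8.2
pmf = {1: 0.15, 2: 0.20, 3: 0.40, 4: 0.15, 5: 0.10}

def mgf(t):
    """M_X(t) = sum over the support of e^{tx} p(x)."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def raw_moment(n):
    """E(X^n) computed directly from the pmf."""
    return sum(p * x ** n for x, p in pmf.items())

def mgf_derivative_at_zero(n, h=1e-2):
    """Central finite-difference estimate of the n-th derivative of M_X at t = 0."""
    total = sum((-1) ** k * math.comb(n, k) * mgf((n / 2 - k) * h)
                for k in range(n + 1))
    return total / h ** n

# The two columns should agree up to the finite-difference error.
for n in (1, 2):
    print(n, raw_moment(n), mgf_derivative_at_zero(n))
```

The finite difference is only an approximation, so the two values agree to a few decimal places rather than exactly.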

Example 8.3
Let X be a binomial random variable with parameters n and p. Find the
expected value and the variance of X using moment generating functions.

Solution.
We can write
\[
M_X(t) = \sum_{k=0}^{n} e^{tk}C(n,k)p^k(1-p)^{n-k} = \sum_{k=0}^{n} C(n,k)(pe^t)^k(1-p)^{n-k} = (pe^t + 1 - p)^n.
\]

Differentiating yields
\[
\frac{d}{dt}M_X(t) = npe^t(pe^t + 1 - p)^{n-1} \implies E(X) = \left.\frac{d}{dt}M_X(t)\right|_{t=0} = np.
\]
To find $E(X^2)$, we differentiate a second time to obtain
\[
\frac{d^2}{dt^2}M_X(t) = n(n-1)p^2e^{2t}(pe^t + 1 - p)^{n-2} + npe^t(pe^t + 1 - p)^{n-1}.
\]
Evaluating at $t = 0$ we find
\[
E(X^2) = M_X''(0) = n(n-1)p^2 + np.
\]

Observe that this implies the variance of X is
\[
Var(X) = E(X^2) - (E(X))^2 = n(n-1)p^2 + np - n^2p^2 = np(1-p).
\]

Example 8.4
Let X be a Poisson random variable with parameter λ. Find the expected
value and the variance of X using moment generating functions.

Solution.
We can write
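\[
M_X(t) = \sum_{n=0}^{\infty}\frac{e^{tn}e^{-\lambda}\lambda^n}{n!} = e^{-\lambda}\sum_{n=0}^{\infty}\frac{e^{tn}\lambda^n}{n!} = e^{-\lambda}\sum_{n=0}^{\infty}\frac{(\lambda e^t)^n}{n!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t - 1)}.
\]

Differentiating for the first time we find
\[
M_X'(t) = \lambda e^t e^{\lambda(e^t - 1)} \implies E(X) = M_X'(0) = \lambda.
\]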

Differentiating a second time we find
\[
M_X''(t) = (\lambda e^t)^2e^{\lambda(e^t - 1)} + \lambda e^te^{\lambda(e^t - 1)} \implies E(X^2) = M_X''(0) = \lambda^2 + \lambda.
\]

The variance is then
\[
Var(X) = E(X^2) - (E(X))^2 = \lambda.
\]

Example 8.5
Let X be a normal random variable with parameters µ and σ 2 . Find the
expected value and the variance of X using moment generating functions.

Solution.
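First we find the moment generating function of a standard normal random
variable Z, that is, a normal random variable with parameters 0 and 1. We can write
\begin{align*}
M_Z(t) &= E(e^{tZ}) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tz}e^{-\frac{z^2}{2}}\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\frac{z^2 - 2tz}{2}\right\}dz \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\frac{(z-t)^2}{2} + \frac{t^2}{2}\right\}dz = e^{\frac{t^2}{2}}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{(z-t)^2}{2}}\,dz = e^{\frac{t^2}{2}}.
\end{align*}
Now, since $X = \mu + \sigma Z$ we have
\begin{align*}
M_X(t) &= E(e^{tX}) = E(e^{t\mu + t\sigma Z}) = E(e^{t\mu}e^{t\sigma Z}) = e^{t\mu}E(e^{t\sigma Z}) \\
&= e^{t\mu}M_Z(t\sigma) = e^{t\mu}e^{\frac{\sigma^2t^2}{2}} = \exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\}.
\end{align*}

By differentiation we obtain
\[
M_X'(t) = (\mu + t\sigma^2)\exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\}
\]
and
\[
M_X''(t) = (\mu + t\sigma^2)^2\exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\} + \sigma^2\exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\}
\]
and thus
\[
E(X) = M_X'(0) = \mu \quad \text{and} \quad E(X^2) = M_X''(0) = \mu^2 + \sigma^2.
\]

The variance of X is
\[
Var(X) = E(X^2) - (E(X))^2 = \sigma^2.
\]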


Moment generating functions are also useful in establishing the distribution
of sums of independent random variables. Suppose $X_1, X_2, \cdots, X_N$ are
independent random variables. Then, the moment generating function of
$Y = X_1 + \cdots + X_N$ is
\[
M_Y(t) = E(e^{t(X_1 + X_2 + \cdots + X_N)}) = E(e^{X_1t}\cdots e^{X_Nt}) = \prod_{k=1}^{N}E(e^{X_kt}) = \prod_{k=1}^{N}M_{X_k}(t).
\]

Another important property is that the moment generating function uniquely
determines the distribution. That is, if random variables X and Y both have
moment generating functions $M_X(t)$ and $M_Y(t)$ that exist in some
neighborhood of zero and if $M_X(t) = M_Y(t)$ for all t in this neighborhood, then
X and Y have the same distribution.

Example 8.6
If X and Y are independent binomial random variables with parameters
(n, p) and (m, p), respectively, what is the pmf of X + Y ?

Solution.
We have

\[
M_{X+Y}(t) = M_X(t)M_Y(t) = (pe^t + 1 - p)^n(pe^t + 1 - p)^m = (pe^t + 1 - p)^{n+m}.
\]

Since $(pe^t + 1 - p)^{n+m}$ is the moment generating function of a binomial
random variable having parameters $m + n$ and $p$, X + Y is a binomial
random variable with pmf $p(k) = C(n+m, k)p^k(1-p)^{n+m-k}$, $k = 0, 1, \cdots, n+m$.
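
The conclusion of Example 8.6 can also be checked directly, without mgfs. The sketch below convolves the pmfs of two independent binomial random variables and compares the result with the binomial pmf with parameters n + m and p. The values n = 4, m = 6, p = 0.3 and the helper names are illustrative assumptions.

```python
import math

def binomial_pmf(n, p):
    """List of P(X = k), k = 0..n, for a binomial(n, p) random variable."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """pmf of the sum of two independent variables with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n, m, p = 4, 6, 0.3  # illustrative parameter values
sum_pmf = convolve(binomial_pmf(n, p), binomial_pmf(m, p))
target = binomial_pmf(n + m, p)

# Largest pointwise difference between the two pmfs (should be rounding error only).
max_gap = max(abs(u - v) for u, v in zip(sum_pmf, target))
print(max_gap)
```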

Example 8.7
If X and Y are independent Poisson random variables with parameters λ1
and λ2 , respectively, what is the pmf of X + Y ?

Solution.
We have
\[
M_{X+Y}(t) = M_X(t)M_Y(t) = e^{\lambda_1(e^t - 1)}e^{\lambda_2(e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)}.
\]
Since $e^{(\lambda_1 + \lambda_2)(e^t - 1)}$ is the moment generating function of a Poisson random
variable having parameter $\lambda_1 + \lambda_2$, X + Y is a Poisson random variable with
parameter $\lambda_1 + \lambda_2$.

Probability Generating Function

Another useful tool for dealing with the distribution of a sum of discrete
random variables is the probability generating function or the z-transform
of the probability mass function, abbreviated pgf. For a discrete random
variable X, we define the probability generating function by
\[
P_X(t) = E(t^X) = \sum_{x \in \text{Support}(X)} t^xp(x), \quad \forall t \in \mathbb{R} \text{ for which the sum converges.}
\]

Note that $P_X(t) = E(e^{X\ln t}) = M_X(\ln t)$ for $t > 0$ and $M_X(t) = P_X(e^t)$. The pgf transforms a
sum into a product and enables it to be handled much more easily: Let
$X_1, X_2, \cdots, X_n$ be independent random variables and $S_n = X_1 + X_2 + \cdots + X_n$.
It can be shown that
\[
P_{S_n}(t) = P_{X_1}(t)P_{X_2}(t)\cdots P_{X_n}(t).
\]

Example 8.8
Find the pgf of the Poisson distribution of parameter λ.

Solution.
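Recall that the Poisson random variable has a pmf $p(x) = \frac{\lambda^xe^{-\lambda}}{x!}$. Hence,
\[
P_X(t) = \sum_{x=0}^{\infty}t^x\frac{\lambda^xe^{-\lambda}}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda t)^x}{x!} = e^{-\lambda}e^{\lambda t} = e^{\lambda(t-1)}.
\]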

The probability generating function gets its name because the power series
can be expanded and differentiated to reveal the individual probabilities.
Thus, given only the pgf $P_X(t) = E(t^X)$, we can recover all probabilities
$\Pr(X = x)$.
It can be shown that
\[
p(n) = \left.\frac{1}{n!}\frac{d^n}{dt^n}P_X(t)\right|_{t=0}.
\]

Example 8.9
Let X be a discrete random variable with pgf $P_X(t) = \frac{t}{5}(2 + 3t^2)$. Find the
distribution of X.

Solution.
We have $p(0) = P_X(0) = 0$; $p(1) = P_X'(0) = \frac{2}{5}$; $p(2) = \frac{P_X''(0)}{2!} = 0$;
$p(3) = \frac{P_X'''(0)}{3!} = \frac{3}{5}$; and $p(n) = 0$ for all $n \geq 4$.
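
When the pgf is a polynomial, as in Example 8.9, the probabilities are simply its coefficients, and the pgf of a sum of independent copies is the polynomial product. The sketch below illustrates both facts with exact fractions; the list representation (index = power of t) and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Coefficients of P_X(t) = (2/5) t + (3/5) t^3; index = power of t.
pgf_coeffs = [Fraction(0), Fraction(2, 5), Fraction(0), Fraction(3, 5)]

def pmf_from_pgf(coeffs):
    """p(n) is the coefficient of t^n; keep only the nonzero entries."""
    return {n: c for n, c in enumerate(coeffs) if c != 0}

def multiply(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

pmf_x = pmf_from_pgf(pgf_coeffs)
# pgf of X1 + X2 for two independent copies of X is P_X(t)^2.
pmf_sum = pmf_from_pgf(multiply(pgf_coeffs, pgf_coeffs))
print(pmf_x)
print(pmf_sum)
```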

Theorem 8.2
For any discrete random variable X, we have

\[
E[X(X-1)(X-2)\cdots(X-k+1)] = \left.\frac{d^k}{dt^k}P_X(t)\right|_{t=1}.
\]

Example 8.10
Let X be a Poisson random variable with parameter λ. Find the mean and
the variance using probability generating functions.

Solution.
We know that the pgf of X is $P_X(t) = e^{\lambda(t-1)}$. We have
\begin{align*}
E(X) &= P_X'(1) = \lambda \\
E[X(X-1)] &= P_X''(1) = \lambda^2 \\
E(X^2) &= \lambda^2 + \lambda \\
Var(X) &= \lambda.
\end{align*}

Like moment generating functions, the probability generating function uniquely
determines the distribution. That is, if we can show that two random
variables have the same pgf in some interval containing 0, then we have shown
that the two random variables have the same distribution.

Example 8.11
Let X be a Poisson random variable with parameter λ and let Y be a Poisson
random variable with parameter µ. Find the distribution of X + Y, assuming X and Y are
independent.

Solution.
We have

\[
P_{X+Y}(t) = P_X(t)P_Y(t) = e^{\lambda(t-1)}e^{\mu(t-1)} = e^{(\lambda + \mu)(t-1)}.
\]

This is the pgf of a Poisson random variable with parameter $\lambda + \mu$. So, by
the uniqueness of pgfs, X + Y is a Poisson random variable with parameter
$\lambda + \mu$.

Practice Problems
Problem 8.1
Let X be an exponential random variable with parameter λ.

Find the expected value and the variance of X using moment generating
functions.

Problem 8.2
Let X and Y be independent normal random variables with parameters
(µ1 , σ12 ) and (µ2 , σ22 ), respectively.

Find the distribution of X + Y.

Problem 8.3 ‡
Let X and Y be identically distributed independent random variables such
that the moment generating function of X + Y is
\[
M(t) = 0.09e^{-2t} + 0.24e^{-t} + 0.34 + 0.24e^{t} + 0.09e^{2t}, \quad -\infty < t < \infty.
\]

Calculate Pr(X ≤ 0).

Problem 8.4 ‡
The value of a piece of factory equipment after three years of use is $100(0.5)^X$
where X is a random variable having moment generating function
$M_X(t) = \frac{1}{1-2t}$ for $t < \frac{1}{2}$.
Calculate the expected value of this piece of equipment after three years of
use.

Problem 8.5
Let X and Y be two independent random variables with moment generating
functions
\[
M_X(t) = e^{t^2 + 2t} \quad \text{and} \quad M_Y(t) = e^{3t^2 + t}.
\]
Determine the moment generating function of X + 2Y.

Problem 8.6
The random variable X has an exponential distribution with parameter b.
It is found that $M_X(-b^2) = 0.2$.

Find b.

Problem 8.7
If the moment generating function for the random variable X is $M_X(t) = \frac{1}{t+1}$,
find $E[(X-2)^3]$.

Problem 8.8
Suppose a random variable X has moment generating function
\[
M_X(t) = \left(\frac{2 + e^t}{3}\right)^9.
\]
Find the variance of X.

Problem 8.9
A random variable X has the moment generating function
\[
M_X(t) = \frac{1}{(1 - 2500t)^4}.
\]
Determine the standard deviation of X.

Problem 8.10 ‡
A company insures homes in three cities, J, K, and L . Since sufficient dis-
tance separates the cities, it is reasonable to assume that the losses occurring
in these cities are independent.
The moment generating functions for the loss distributions of the cities are:
\begin{align*}
M_J(t) &= (1 - 2t)^{-3} \\
M_K(t) &= (1 - 2t)^{-2.5} \\
M_L(t) &= (1 - 2t)^{-4.5}
\end{align*}
Let X represent the combined losses from the three cities.

Calculate $E(X^3)$.

Problem 8.11
Let X be a binomial random variable with pmf $p(k) = C(n,k)p^k(1-p)^{n-k}$.

Find the pgf of X.

Problem 8.12
Let X be a geometric random variable with pmf $p(n) = p(1-p)^{n-1}$, $n = 1, 2, \cdots$,
where $0 < p < 1$.

Find the pgf of X.


Problem 8.13
Let X be a random variable with pgf $P_X(t) = e^{\lambda(t-1)}$. True or false: X is a
Poisson random variable with parameter λ.
Problem 8.14
Let X be a random variable and Y = a + bX. Express PY (t) in terms of
PX (t).
Problem 8.15
Let X have the distribution of a geometric random variable with parameter
p. That is, $p(x) = p(1-p)^{x-1}$, $x = 1, 2, 3, \cdots$.

Find the mean and the variance of X using probability generating func-
tions.
Problem 8.16
You are given a sample of size 5 with observed data
2 2 3 5 8

Using the empirical distribution framework, calculate the probability generating
function.
Problem 8.17
Let X be a discrete random variable with the pmf given below
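\[
\begin{array}{c|cccc}
x & -2 & 3 & \pi & \frac{7}{2} \\ \hline
p(x) & \frac{1}{3} & \frac{1}{6} & \frac{1}{8} & \frac{3}{8}
\end{array}
\]
and 0 otherwise. Find the probability generating function $P_X(t)$.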
Problem 8.18
Suppose $p(n) = \frac{1}{2^{n-2}}$, $n = 3, 4, 5, \cdots$ and 0 otherwise. Find the probability
generating function $P_X(t)$.
Problem 8.19
Suppose $P_X(t) = \frac{1}{t}\left(\frac{1}{3}t + \frac{2}{3}\right)^4$. Find the pmf of X.
Problem 8.20
Let X be a random variable with probability generating function
$P_X(t) = \frac{t(1+t)}{2(3-2t)}$.

Using PX (t), find E(X) and Var(X).

Tail Weight of a Distribution

The (right-)tail of a distribution is the portion of the distribution
corresponding to large values of the random variable. Alternatively, we can define
the tail of a random variable X as the interval $(x, \infty)$ with probability
\[
\Pr(X > x) = S_X(x) = 1 - F_X(x) = \int_x^{\infty} f_X(t)\,dt
\]

where SX (x) is the survival function of X.

A distribution is said to be a heavy-tailed distribution if it puts
significantly more probability on larger values of the random variable. We also say
that the distribution has a larger tail weight. In contrast, a distribution that
puts less and less probability on larger values of the random variable is said
to be a light-tailed distribution. According to [1], there are four ways to
look for indications that a distribution is heavy-tailed. The purpose of this
chapter is to discuss these various ways.


9 Tail Weight Measures: Moments and the Speed of Decay of S(x)
There are four ways of measuring heavy-tailed distributions as suggested by
[1]:
• Existence of non-central moments.
• The speed for which the survival function decays to zero.
• The hazard rate function.
• The mean excess loss function.
In this section we cover the first two and the last two will be covered in the
next section.

The Existence of Moments

A distribution with pdf $f_X(x)$ is said to be light-tailed if $E(X^k) < \infty$ for all $k > 0$.
The distribution is said to be heavy-tailed if either $E(X^k)$ does not
exist for any $k > 0$ or the moments exist only up to a certain value of a
positive integer k.

Example 9.1
Show that the exponential distribution with parameter λ > 0 is light-tailed
according to the above definition. Refer to Table C.

Solution.
Using Table C, for all positive integers k, we have
\[
E(X^k) = \frac{\Gamma(k+1)}{\lambda^k}.
\]
Hence, the exponential distribution is light-tailed.

Example 9.2
Show that the Pareto distribution with parameters α and θ is heavy-tailed.
Refer to Table C.

Solution.
Using Table C, we have

\[
E(X^k) = \frac{\theta^k\Gamma(k+1)\Gamma(\alpha - k)}{\Gamma(\alpha)}
\]
provided that $-1 < k < \alpha$. Since the moments are not finite for all positive
k, the Pareto distribution is heavy-tailed.
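
The moment formula of Example 9.2 is easy to evaluate. The following sketch, with the illustrative values α = 3 and θ = 1000, returns the k-th moment when $-1 < k < \alpha$ and reports infinity once $k \geq \alpha$, where the defining integral diverges. The function name is a hypothetical choice.

```python
import math

def pareto_moment(k, alpha, theta):
    """E(X^k) for the two-parameter Pareto distribution.

    Uses theta^k * Gamma(k+1) * Gamma(alpha-k) / Gamma(alpha) for -1 < k < alpha
    and returns infinity for k >= alpha, where the moment does not exist.
    """
    if k <= -1:
        raise ValueError("the formula requires k > -1")
    if k >= alpha:
        return math.inf
    return theta ** k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)

alpha, theta = 3, 1000  # illustrative parameter values
for k in (1, 2, 3, 4):
    print(k, pareto_moment(k, alpha, theta))
```

With these values only the first two moments are finite, which is the heavy-tail signature under the existence-of-moments criterion.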

Example 9.3
Let X be a continuous random variable with pdf $f_X(x)$ defined for $x > 0$ and
0 otherwise. Suppose that there is a constant $M > 0$ such that $f_X(x) = \frac{C}{x^n}$
for all $x \geq M$ and 0 otherwise, where $n > 1$ and $C = \frac{n-1}{M^{1-n}}$. Show that X
has a heavy-tailed distribution.

Solution.
We have
\begin{align*}
E(X^k) &= \int_0^M x^kf_X(x)\,dx + C\int_M^{\infty} x^{k-n}\,dx \\
&= \int_0^M x^kf_X(x)\,dx + C\left.\frac{x^{k-n+1}}{k-n+1}\right|_M^{\infty} \\
&= \infty
\end{align*}
for all $k > n - 1$.

Classification Based on the Speed of Decay of the Survival Function

The survival function SX (x) = P (X > x) captures the probability of the
tail of a distribution. Recall that SX (x) decreases to 0 as x → ∞. The
question is how fast the function decays to zero. If the survival function of a
distribution decays slowly to zero (equivalently the cdf goes slowly to one),
it is another indication that the distribution is heavy-tailed. In contrast,
when the survival function decays to 0 very rapidly then this is indication
that the distribution is light-tailed.

Next, we consider comparing the tail weights of two distributions with the
same mean. This is done by comparing the survival functions of the two
distributions. Algebraically, we compute the ratio of the tail probabilities
or the survival functions which we will refer to as the relative tail weight:
\[
\lim_{x\to\infty}\frac{S_X(x)}{S_Y(x)} = \lim_{x\to\infty}\frac{-S_X'(x)}{-S_Y'(x)} = \lim_{x\to\infty}\frac{f_X(x)}{f_Y(x)} \geq 0.
\]

Note that in the middle limit we used L’Hôpital’s rule since limx→∞ SX (x) =
limx→∞ SY (x) = 0.

Now, if the above limit is 0, then for large x the tail probability of X is
negligible compared with that of Y. In this case, we say that the
distribution of X has a lighter tail than that of Y. If the limit is a finite positive number
then we say that the distributions have similar or proportional tails. If the
limit diverges to infinity, then X assigns more probability to large values
than Y does. In this case, we say that the distribution of X has a
heavier tail than the distribution of Y.

Example 9.4
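Compare the tail weight of the inverse Pareto distribution with pdf
$f_X(x) = \frac{\tau\theta_1x^{\tau-1}}{(x+\theta_1)^{\tau+1}}$
with the inverse Gamma distribution with pdf
$f_Y(x) = \frac{\theta_2^{\alpha}e^{-\frac{\theta_2}{x}}}{x^{\alpha+1}\Gamma(\alpha)}$
where $\theta_1, \theta_2, \tau > 0$ and $\alpha > 1$.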

Solution.
We have
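\begin{align*}
\lim_{x\to\infty}\frac{f_X(x)}{f_Y(x)} &= \lim_{x\to\infty}\frac{\tau\theta_1x^{\tau-1}}{(x+\theta_1)^{\tau+1}}\cdot\frac{x^{\alpha+1}\Gamma(\alpha)}{\theta_2^{\alpha}e^{-\frac{\theta_2}{x}}} \\
&= \lim_{x\to\infty}\frac{\tau\theta_1\Gamma(\alpha)}{\theta_2^{\alpha}}e^{\frac{\theta_2}{x}}\left(\frac{x}{x+\theta_1}\right)^{\tau+1}x^{\alpha-1} \\
&= \frac{\tau\theta_1\Gamma(\alpha)}{\theta_2^{\alpha}}\cdot e^0\cdot\infty = \infty.
\end{align*}

Thus, X has a heavier tail than Y.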

Example 9.5
Let X be the exponential distribution with survival function $S_X(x) = e^{-x}$ for
$x \geq 0$ and 1 otherwise, and Y be the distribution with survival function
$S_Y(x) = \frac{1}{x}$ for $x \geq 1$ and 1 otherwise. Compare the tail weight of these
distributions.

Solution.
We have
\[
\lim_{x\to\infty}\frac{S_X(x)}{S_Y(x)} = \lim_{x\to\infty}xe^{-x} = 0.
\]
Hence, X has a lighter tail than Y.
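
The limit in Example 9.5 can be seen numerically by evaluating the ratio of survival functions at increasing values of x. This is a small sketch; the function names and the evaluation points are illustrative, and each survival function is taken to equal 1 to the left of its support.

```python
import math

def survival_x(x):
    """Survival function of the exponential distribution with mean 1."""
    return math.exp(-x) if x >= 0 else 1.0

def survival_y(x):
    """Survival function S_Y(x) = 1/x for x >= 1 (and 1 for x < 1)."""
    return 1.0 / x if x >= 1 else 1.0

def tail_ratio(x):
    """Relative tail weight S_X(x) / S_Y(x) = x e^{-x} for x >= 1."""
    return survival_x(x) / survival_y(x)

points = (1, 10, 50, 100)  # illustrative evaluation points
ratios = [tail_ratio(x) for x in points]
for x, r in zip(points, ratios):
    print(x, r)
```

The printed ratios shrink rapidly toward 0, in line with X having the lighter tail.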

Practice Problems
Problem 9.1
Let X be a random variable with pdf $f_X(x) = Cx^ne^{-bx}$ for $x > 0$ and 0
otherwise, where $b, n > 0$ and $C = \left(\int_0^{\infty}x^ne^{-bx}\,dx\right)^{-1}$.

Show that X has a light tail distribution.

Problem 9.2
Suppose X has a heavy-tailed distribution. Let $t > 0$. Show that
$\int_N^{\infty}e^{tx}f_X(x)\,dx = \infty$ for some $N = N(t) > 0$.
Problem 9.3
Suppose X has a heavy-tailed distribution. Show that MX (t) = ∞ for all
t > 0.
Problem 9.4
Determine whether the Γ distribution with parameters α > 0 and θ > 0 is
light-tailed or heavy-tailed. Refer to Table C.
Problem 9.5
Let X be the inverse Weibull random variable with parameters θ and τ.

Determine whether the distribution is light-tailed or heavy-tailed. Refer

to Table C.
Problem 9.6
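Compare the tail weight of the Pareto distribution with pdf
$f_X(x) = \frac{\alpha\theta_1^{\alpha}}{(x+\theta_1)^{\alpha+1}}$
with the Gamma distribution with pdf
$f_Y(x) = \frac{x^{\tau-1}e^{-\frac{x}{\theta_2}}}{\theta_2^{\tau}\Gamma(\tau)}$
where $\theta_1, \theta_2, \tau > 0$ and $\alpha > 1$.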
Problem 9.7
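Compare the tail weight of the Weibull distribution with pdf
$f_X(x) = \frac{\tau}{x}\left(\frac{x}{\theta}\right)^{\tau}e^{-\left(\frac{x}{\theta}\right)^{\tau}}$
and the inverse Weibull distribution with pdf
$f_Y(x) = \frac{\tau}{x}\left(\frac{\theta}{x}\right)^{\tau}e^{-\left(\frac{\theta}{x}\right)^{\tau}}$
where $\tau, \theta > 0$.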
Problem 9.8
Let X be a random variable with pdf $f_X(x) = \frac{2}{\pi(1+x^2)}$ for $x > 0$ and 0
otherwise. Let Y be a random variable with pdf $f_Y(x) = \frac{\alpha}{(1+x)^{\alpha+1}}$ for $x > 0$
and 0 otherwise, where $\alpha > 1$.

Compare the tail weight of these distributions.


Problem 9.9
Let X be a random variable with pdf $f_X(x) = \frac{2}{\pi(1+x^2)}$ for $x > 0$ and 0
otherwise. Let Y be a random variable with pdf $f_Y(x) = \frac{1}{(1+x)^2}$ for $x > 0$
and 0 otherwise.

Compare the tail weight of these distributions.

Problem 9.10
Let X be a random variable with pdf $f_X(x) = \frac{2}{\pi(1+x^2)}$ for $x > 0$ and 0
otherwise. Let Y be a random variable with pdf $f_Y(x) = \frac{\alpha}{(1+x)^{\alpha+1}}$ for $x > 0$
and 0 otherwise, where $0 < \alpha < 1$.

Compare the tail weight of these distributions.

Problem 9.11
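The distribution of X has the survival function
\[
S_X(x) = 1 - \frac{\theta x^{\gamma}}{1 + \theta x^{\gamma}}, \quad \theta, \gamma > 0
\]
and 0 otherwise. The distribution of Y has pdf
\[
f_Y(x) = \frac{x^{\gamma-1}e^{-\frac{x}{\theta}}}{\theta^{\gamma}\Gamma(\gamma)}
\]
and 0 otherwise.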

Compare the tail behavior of these distributions.

Problem 9.12
Using the criterion of existence of moments, complete the following. Refer
to Table C.
Distribution Heavy-Tail Light-Tail
Weibull
Inverse Pareto
Normal
Loglogistic

Problem 9.13
Using the criterion of existence of moments, complete the following. Refer
to Table C.

Distribution Heavy-Tail Light-Tail

Paralogistic
Lognormal
Inverse Gamma
Inverse Gaussian

Problem 9.14
Using the criterion of existence of moments, complete the following. Refer
to Table C.

Distribution Heavy-Tail Light-Tail

Inverse Paralogistic
Inverse Exponential

Problem 9.15
Show that the Loglogistic distribution has a heavier tail than the Gamma
distribution.

Problem 9.16
Show that the Paralogistic distribution has a heavier tail than the
Lognormal distribution.

Problem 9.17
Show that the inverse exponential distribution has a heavier tail than the
exponential distribution.

Problem 9.18
Let X and Y have similar (proportional) right tails and $\lim_{x\to\infty}\frac{S_X(x)}{S_Y(x)} = c$.
Which of the following is a possible value of c?

(i) c = ∞ (ii) c = 0 (iii) c > 0.

Problem 9.19
Let X be a Pareto distribution with parameters α = 4 and θ = 340. Let Y
be a Pareto distribution with parameters α = 6 and θ = 340.

Which of these has a heavier right tail relative to the other?

Problem 9.20
You are given the right-tails of the survival functions of three distributions

X, Y and Z. Order these distributions according to tail weight.


10 Tail Weight Measures: Hazard Rate Function and Mean Excess Loss Function
In this section we classify the tail weight of a distribution based on the haz-
ard rate function and the mean excess loss function.

Classification Based on the Hazard Rate Function

Another way to classify the tail weight of a distribution is by using the
hazard rate function:
\[
h(x) = \frac{f(x)}{S(x)} = \frac{F'(x)}{1 - F(x)} = -\frac{d}{dx}[\ln S(x)] = -\frac{S'(x)}{S(x)}.
\]
By the existence of moments, the Pareto distribution is considered heavy-tailed.
Its hazard rate function is
\[
h(x) = \frac{f(x)}{S(x)} = \frac{\alpha}{x + \theta}.
\]
Note that $h'(x) = -\frac{\alpha}{(x+\theta)^2} < 0$ so that $h(x)$ is nonincreasing. Thus, it
makes sense to say that a distribution is considered to be heavy-tailed if the
hazard rate function is nonincreasing. Likewise, the random variable X with
pdf $f(x) = xe^{-x}$ for $x > 0$ and 0 otherwise has a light-tailed distribution
according to the existence of moments (see Problem 9.1). Its hazard rate function
is $h(x) = \frac{x}{x+1}$, which is a nondecreasing function. Hence, a nondecreasing
hazard rate function is an indication of a light-tailed distribution.

Example 10.1
Let X be a random variable with survival function $S(x) = \frac{1}{x^2}$ if $x \geq 1$ and
1 otherwise. Based on the hazard rate function of the distribution, decide
whether the distribution is heavy-tailed or light-tailed.

Solution.
The hazard rate function is
\[
h(x) = -\frac{S'(x)}{S(x)} = -\frac{-\frac{2}{x^3}}{\frac{1}{x^2}} = \frac{2}{x}.
\]

Hence, for $x \geq 1$,
\[
h'(x) = -\frac{2}{x^2} < 0
\]
which shows that $h(x)$ is nonincreasing. We conclude that the distribution
of X is heavy-tailed.
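
The hazard rate of Example 10.1 can also be approximated straight from the survival function, which is handy when $S'(x)$ is awkward to compute by hand. The sketch below uses a central finite difference for $S'(x)$; the step size and function names are illustrative assumptions, and the evaluation points are kept away from the kink of S at x = 1.

```python
def survival(x):
    """S(x) = 1/x^2 for x >= 1 and 1 otherwise (Example 10.1)."""
    return 1.0 / x ** 2 if x >= 1 else 1.0

def hazard(x, h=1e-6):
    """Approximate h(x) = -S'(x)/S(x) with a central finite difference."""
    slope = (survival(x + h) - survival(x - h)) / (2 * h)
    return -slope / survival(x)

# The exact hazard rate is 2/x, so the values should decrease in x.
for x in (2.0, 5.0, 10.0):
    print(x, hazard(x), 2.0 / x)
```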

Remark 10.1
Under this definition, a constant hazard function can be called both non-
increasing and nondecreasing. We will refer to distributions with constant
hazard function as medium-tailed distribution. Thus, the exponential
random variable which was classified as light-tailed in Example 9.1, will be
referred to as a medium-tailed distribution.

The next result provides a criterion for testing tail weight based on the
probability density function.

Theorem 10.1
If for each fixed $y \geq 0$, the function $\frac{f(x+y)}{f(x)}$ is nonincreasing (resp.
nondecreasing) in x then the hazard rate function is nondecreasing (resp.
nonincreasing).

Proof.
We have
\[
[h(x)]^{-1} = \frac{\int_x^{\infty}f(t)\,dt}{f(x)} = \frac{\int_0^{\infty}f(x+y)\,dy}{f(x)} = \int_0^{\infty}\frac{f(x+y)}{f(x)}\,dy.
\]
Thus, if $\frac{f(x+y)}{f(x)}$ is nondecreasing in x for each fixed y, then $h(x)$ is
nonincreasing in x. Likewise, if $\frac{f(x+y)}{f(x)}$ is nonincreasing in x for each fixed y, then $h(x)$ is
nondecreasing in x.

Example 10.2
Using the above theorem, show that the Gamma distribution with parame-
ters θ > 0 and 0 < α < 1 is heavy-tailed.

Solution.
We have
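\[
\frac{f(x+y)}{f(x)} = \left(1 + \frac{y}{x}\right)^{\alpha-1}e^{-\frac{y}{\theta}}
\]
and
\[
\frac{d}{dx}\left[\frac{f(x+y)}{f(x)}\right] = \frac{y(1-\alpha)}{x^2}\left(1 + \frac{y}{x}\right)^{\alpha-2}e^{-\frac{y}{\theta}} > 0
\]
for $0 < \alpha < 1$. Thus, the hazard rate function is nonincreasing and the
distribution is heavy-tailed.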

Next, the hazard rate function can be used to compare the tail weight of two

distributions. For example, if X and Y are two distributions with increasing

(resp. decreasing) hazard rate functions, the distribution of X has a lighter
(resp. heavier) tail than the distribution of Y if hX (x) is increasing (resp.
decreasing) at a faster rate than hY (x) for a large value of the argument.

Example 10.3
Let X be the Pareto distribution with α = 2 and θ = 150 and Y be the
Pareto distribution with α = 3 and θ = 150. Compare the tail weight of
these distributions using
(a) the relative tail weight measure;
(b) the hazard rate measure.
Compare your results in (a) and (b).

Solution.
(a) Using the relative tail weight, we find
\[
\lim_{x\to\infty}\frac{f_X(x)}{f_Y(x)} = \lim_{x\to\infty}\frac{2(150)^2}{(x+150)^3}\cdot\frac{(x+150)^4}{3(150)^3} = \infty.
\]
Hence, X has a heavier tail than Y.
(b) Note that both distributions are heavy-tailed using the hazard rate
analysis. However, $h_Y'(x) = -\frac{3}{(x+150)^2} < h_X'(x) = -\frac{2}{(x+150)^2}$ so that $h_Y(x)$
decreases at a faster rate than $h_X(x)$. Thus, by this measure X has a lighter tail than Y,
which is different from the result in (a)!

Remark 10.2
Note that the Gamma distribution is light-tailed for all α > 0 and θ > 0
by the existence of moments analysis. However, the Gamma distribution is
heavy-tailed for 0 < α < 1 by the hazard rate analysis. Thus, the concept
of light/heavy right tailed is somewhat vague in this case.

Classification Based on the Mean Excess Loss Function

A fourth measure of tail weight is the mean excess loss function as introduced
in Section 5. For a loss random variable X, the expected amount by which
loss exceeds x, given that it does exceed x is
\[
e(x) = e_X(x) = E[X - x \mid X > x] = \frac{E(X) - E(X \wedge x)}{1 - F(x)}.
\]

In the context of life contingency models (see [3]), if X is the random
variable representing the age at death and if T(x) is the continuous random
variable representing time until death of someone now alive at age x then
e(x) is denoted by $\mathring{e}_x = E[T(x)] = E[X - x \mid X > x]$. In words, for an
individual alive at age x, $\mathring{e}_x$ is the average additional number of years until
death from age x, given that the individual has survived to age x. We call
$\mathring{e}_x$ the complete expectation of life or the residual mean lifetime.

Viewed as a function of x, an increasing mean excess loss function is an

indication of a heavy-tailed distribution. On the other hand, a decreasing
mean excess loss function indicates a light-tailed distribution.

Next, we establish a relationship between e(x) and the hazard rate function.
We have
\begin{align*}
e(x) &= \frac{E(X) - E(X \wedge x)}{1 - F(x)} = \frac{\int_0^{\infty}S_X(y)\,dy - \int_0^xS_X(y)\,dy}{S_X(x)} \\
&= \frac{\int_x^{\infty}S_X(y)\,dy}{S_X(x)} = \int_0^{\infty}\frac{S_X(x+y)}{S_X(x)}\,dy.
\end{align*}

But one of the characteristics of the hazard rate function is that it can
generate the survival function:
\[
S_X(x) = e^{-\int_0^xh(t)\,dt}.
\]

Hence, we can write
\[
e(x) = \int_0^{\infty}\frac{e^{-\int_0^{x+y}h(u)\,du}}{e^{-\int_0^xh(u)\,du}}\,dy = \int_0^{\infty}e^{-\int_x^{x+y}h(u)\,du}\,dy.
\]

From the above discussion, we see that for a fixed $y > 0$, if $\frac{S_X(x+y)}{S_X(x)}$ is an
increasing function of x (and therefore e(x) is increasing) then the hazard
rate function is decreasing and consequently the distribution is heavy-tailed.
Likewise, if $\frac{S_X(x+y)}{S_X(x)}$ is a decreasing function of x (and therefore e(x) is
decreasing) then the hazard rate function is increasing and consequently the
distribution is light-tailed.

Example 10.4
Let X be a random variable with pdf $f(x) = 2xe^{-x^2}$ for $x > 0$ and 0
otherwise. Show that the distribution is light-tailed by showing $\frac{S_X(x+y)}{S_X(x)}$ is
a decreasing function of x.

Solution.
We have $S_X(x) = \int_x^{\infty}2te^{-t^2}\,dt = e^{-x^2}$. Thus, for a fixed $y > 0$, we have
\[
\frac{S_X(x+y)}{S_X(x)} = e^{-2xy - y^2}
\]
whose derivative with respect to x is
\[
\frac{d}{dx}\left[\frac{S_X(x+y)}{S_X(x)}\right] = -2ye^{-2xy - y^2} < 0.
\]
That is, $\frac{S_X(x+y)}{S_X(x)}$ is a decreasing function of x.
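
Since $e(x) = \int_0^{\infty}\frac{S_X(x+y)}{S_X(x)}\,dy$, the conclusion of Example 10.4 implies that the mean excess loss function decreases. The sketch below approximates this integral with the trapezoidal rule, truncated at an assumed upper limit where the integrand is negligible; the function names, truncation point, and grid are illustrative choices.

```python
import math

def survival(x):
    """S_X(x) = e^{-x^2} for the pdf f(x) = 2x e^{-x^2} of Example 10.4."""
    return math.exp(-x * x)

def mean_excess(x, upper=12.0, steps=12000):
    """Trapezoidal estimate of e(x) = integral over y >= 0 of S(x+y)/S(x),
    truncated at y = upper."""
    h = upper / steps
    def ratio(y):
        return survival(x + y) / survival(x)
    total = 0.5 * (ratio(0.0) + ratio(upper))
    for i in range(1, steps):
        total += ratio(i * h)
    return total * h

xs = [0.0, 0.5, 1.0, 2.0, 3.0]  # illustrative evaluation points
values = [mean_excess(x) for x in xs]
for x, v in zip(xs, values):
    print(x, v)
```

The estimates fall as x grows, which is the light-tail indication described above.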

Practice Problems
Problem 10.1
Show that the Gamma distribution with parameters θ > 0 and α > 1 is
light-tailed by showing that $\frac{f(x+y)}{f(x)}$ is nonincreasing.

Problem 10.2
Show that the Gamma distribution with parameters θ > 0 and α = 1 is
medium-tailed.

Problem 10.3
Let X be the Weibull distribution with probability density function
$f(x) = \frac{\tau x^{\tau-1}e^{-\left(\frac{x}{\theta}\right)^{\tau}}}{\theta^{\tau}}$.

Using hazard rate analysis, show that the distribution is heavy-tailed for
0 < τ < 1 and light-tailed for τ > 1.

Problem 10.4
Let X be a random variable with pdf $f(x) = 2xe^{-x^2}$ for $x > 0$ and 0 otherwise.

Determine the tail weight of this distribution using Theorem 10.1.

Problem 10.5
Using Theorem 10.1, show that the Pareto distribution is heavy-tailed.

Problem 10.6
Show that the hazard rate function of the Gamma distribution approaches
$\frac{1}{\theta}$ as $x \to \infty$.

Problem 10.7
Show that $\lim_{x\to\infty}e(x) = \lim_{x\to\infty}\frac{1}{h(x)}$.

Problem 10.8
Find limx→∞ e(x) where X is the Gamma distribution.

Problem 10.9
Let X be the Gamma distribution with 0 < α < 1 and θ > 0. Show that
e(x) increases from αθ to θ.

Problem 10.10
Let X be the Gamma distribution with α > 1 and θ > 0. Show that e(x)
decreases from αθ to θ.

Problem 10.11
Find limx→∞ e(x) where X is the Pareto distribution with parameters α and
θ and conclude that the distribution is heavy-tailed.

Problem 10.12
Let X be a random variable with pdf $f(x) = \frac{1}{(1+x)^2}$ for $x > 0$ and 0 otherwise.

Find an expression of limx→∞ e(x).

Problem 10.13
Let X be a random variable with mean excess loss function e(x) = x + 1.

(a) Find S(x), f (x) and h(x).

(b) Determine the tail behavior of X by using the moment criterion for tail
weight.

Problem 10.14
Let X be a random variable with mean excess loss function e(x) = x + 1.
Determine the tail behavior of X by using the hazard rate analysis.

Problem 10.15
Let X be a random variable with mean excess loss function e(x) = x + 1.
Determine the tail behavior of X by using the mean excess loss function
analysis.

Problem 10.16
Let X be a random variable with survival function $S(x) = e^{-(x/\theta)^2}$. Determine the tail
behavior of X by using the mean excess loss function analysis.

Problem 10.17
Let X be the single-Pareto distribution with pdf
\[ f(x) = \frac{\alpha\theta^\alpha}{x^{\alpha+1}}. \]
Use Theorem 10.1 to show that X is heavy-tailed.

11 Equilibrium Distributions and Tail Weight

In this section, we shed more light on the mean residual lifetime. Let X
be a random variable such that S(0) = 1. Using Example 5.4 with d = 0,
we can write $e(0) = E(X) = \int_0^\infty S(x)\,dx$. We define the random variable $X_e$
with probability density function
\[ f_e(x) = \frac{S(x)}{E(X)}, \quad x \ge 0 \]
and 0 otherwise. We call the distribution of $X_e$ the equilibrium distribution
or integrated tail distribution.

The corresponding survival function is

\[ S_e(x) = \int_x^\infty f_e(t)\,dt = \int_x^\infty \frac{S(t)}{E(X)}\,dt, \quad x \ge 0. \]

The corresponding hazard rate function is
\[ h_e(x) = \frac{f_e(x)}{S_e(x)} = \frac{S(x)}{\int_x^\infty S(t)\,dt} = \frac{1}{e(x)}. \]

Thus, if $h_e(x)$ is increasing then the distribution $X_e$ (and thus X) is light-tailed.
If $h_e(x)$ is decreasing then the distribution $X_e$ (and thus X) is heavy-tailed.

Example 11.1
Show that the equilibrium mean is given by
\[ E(X_e) = \frac{E(X^2)}{2E(X)}. \]

Solution.
Using integration by parts, we find
\[
\begin{aligned}
E(X^2) &= \int_0^\infty x^2 f(x)\,dx \\
&= \left. -x^2 S(x) \right|_0^\infty + 2\int_0^\infty xS(x)\,dx \\
&= 2\int_0^\infty xS(x)\,dx
\end{aligned}
\]
since
\[ 0 \le x^2 S(x) = x^2 \int_x^\infty f(t)\,dt \le \int_x^\infty t^2 f(t)\,dt \]
which implies
\[ \lim_{x\to\infty} x^2 S(x) = 0. \]

Now,
\[ E(X_e) = \int_0^\infty x f_e(x)\,dx = \frac{1}{E(X)}\int_0^\infty xS(x)\,dx = \frac{E(X^2)}{2E(X)}. \]
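
The identity in Example 11.1 can be checked numerically. The sketch below assumes an exponential loss with mean θ = 2 (an illustrative choice that is not taken from the text) and uses a hand-written midpoint rule; the names `integrate`, `theta`, and `upper` are hypothetical helpers introduced only for this check.

```python
import math

def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

theta = 2.0            # assumed mean of the exponential loss (illustrative)
upper = 60.0 * theta   # truncation point; the exponential tail beyond it is negligible

def S(x):
    """Survival function of the exponential distribution with mean theta."""
    return math.exp(-x / theta)

def f(x):
    """Density of the exponential distribution with mean theta."""
    return math.exp(-x / theta) / theta

mean = integrate(S, 0.0, upper)                                      # E(X) = integral of S
second_moment = integrate(lambda x: x * x * f(x), 0.0, upper)        # E(X^2)
equilibrium_mean = integrate(lambda x: x * S(x) / mean, 0.0, upper)  # E(X_e)

# The two printed numbers should agree up to the integration error.
print(equilibrium_mean, second_moment / (2.0 * mean))
```

For the exponential case both quantities should be close to θ, which is consistent with the equilibrium distribution of an exponential being the same exponential.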

Example 11.2
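Show that
\[ S(x) = \frac{e(0)}{e(x)}\, e^{-\int_0^x \frac{1}{e(t)}\,dt}. \]

Solution.
Using $e(0) = E(X)$, $S_e(x) = e^{-\int_0^x h_e(t)\,dt}$, and $h_e(t) = \frac{1}{e(t)}$, we have

\[
\begin{aligned}
S(x) &= E(X)f_e(x) = e(0)f_e(x) = e(0)S_e(x)h_e(x) \\
&= e(0)h_e(x)\, e^{-\int_0^x \frac{1}{e(t)}\,dt} \\
&= \frac{e(0)}{e(x)}\, e^{-\int_0^x \frac{1}{e(t)}\,dt}
\end{aligned}
\]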
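
As a rough illustration of the formula in Example 11.2, the sketch below recovers a survival function from a mean excess loss function. It assumes e(x) = x + 1, for which the integral has the closed form ln(1 + x) and hence S(x) = 1/(1 + x)^2; the helper names are hypothetical and the integral is approximated by a simple midpoint rule.

```python
import math

def integrate(g, a, b, n=20_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def mean_excess(x):
    """Assumed mean excess loss function e(x) = x + 1 (illustrative choice)."""
    return x + 1.0

def survival_from_mean_excess(x):
    """Evaluate S(x) = e(0)/e(x) * exp(-integral from 0 to x of dt / e(t))."""
    integral = integrate(lambda t: 1.0 / mean_excess(t), 0.0, x)
    return mean_excess(0.0) / mean_excess(x) * math.exp(-integral)

# Compare the numerical result with the closed form 1 / (1 + x)^2.
for x in (0.5, 1.0, 4.0):
    print(x, survival_from_mean_excess(x), 1.0 / (1.0 + x) ** 2)
```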

Example 11.3
Show that
\[ \frac{e(x)}{e(0)} = \frac{S_e(x)}{S(x)}. \]

Solution.
Since $S_e(x) = \frac{1}{E(X)}\int_x^\infty S(t)\,dt$, we have $\int_x^\infty S(t)\,dt = e(0)S_e(x)$. Since
$\frac{S(x)}{\int_x^\infty S(t)\,dt} = \frac{1}{e(x)}$, we obtain $\int_x^\infty S(t)\,dt = e(x)S(x)$. Thus, $e(x)S(x) = e(0)S_e(x)$ or
equivalently
\[ \frac{e(x)}{e(0)} = \frac{S_e(x)}{S(x)}. \]
If the mean residual life function is increasing (implied if the hazard rate
function of X is decreasing by Section 10) then $e(x) \ge e(0)$ and

\[ S_e(x) \ge S(x). \]

Integrating both sides of this inequality, we find

\[ \int_0^\infty S_e(x)\,dx \ge \int_0^\infty S(x)\,dx \]

which implies
\[ \frac{E(X^2)}{2E(X)} \ge E(X) \]
and this can be rewritten as

\[ E(X^2) - [E(X)]^2 \ge [E(X)]^2 \]

which gives
\[ \mathrm{Var}(X) \ge [E(X)]^2. \]
Also,
\[ [CV(X)]^2 = \frac{\mathrm{Var}(X)}{[E(X)]^2} \ge 1. \]
Example 11.4
Let X be the random variable with pdf $f(x) = \frac{2}{(1+x)^3}$ for x ≥ 0 and 0
otherwise.
(a) Determine the survival function S(x).
(b) Determine the hazard rate function h(x).
(c) Determine E(X).
(d) Determine the pdf of the equilibrium distribution.
(e) Determine the survival function $S_e(x)$ of the equilibrium distribution.
(f) Determine the hazard function of the equilibrium distribution.
(g) Determine the mean residual lifetime of X.

Solution.
(a) The survival function is
\[ S(x) = \int_x^\infty \frac{2\,dt}{(1+t)^3} = \left. -\frac{1}{(1+t)^2} \right|_x^\infty = \frac{1}{(1+x)^2}. \]

(b) The hazard rate function is
\[ h(x) = \frac{f(x)}{S(x)} = \frac{2}{1+x}. \]

(c) We have
\[ E(X) = \int_0^\infty xf(x)\,dx = \int_0^\infty \frac{2x}{(1+x)^3}\,dx = 1. \]

(d) We have
\[ f_e(x) = \frac{S(x)}{E(X)} = \frac{1}{(1+x)^2} \]
for x > 0 and 0 otherwise.

(e) We have
\[ S_e(x) = \int_x^\infty \frac{dt}{(1+t)^2} = \left. -\frac{1}{1+t} \right|_x^\infty = \frac{1}{1+x}. \]

(f) We have
\[ h_e(x) = \frac{f_e(x)}{S_e(x)} = \frac{1}{1+x}. \]

(g) We have
\[ e(x) = \frac{1}{h_e(x)} = x + 1, \quad x \ge 0 \]

Practice Problems
Problem 11.1
Let X be the random variable with pdf $f(x) = 2xe^{-x^2}$ for x > 0 and 0
otherwise. Recall $\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$.
(a) Determine the survival function S(x).
(b) Determine the equilibrium distribution fe (x).

Problem 11.2
Let X be a random variable with pdf $f(x) = \frac{1}{3}(1 + 2x)e^{-x}$ for x > 0
and 0 otherwise. Determine the hazard rate function of the equilibrium
distribution. Hint: Example 5.3.

Problem 11.3
Let X be a random variable with mean excess loss function
\[ e(x) = \frac{1}{1+x}, \quad x > 0. \]
Determine the survival function of the distribution X.

Problem 11.4
Let X be a random variable with mean excess loss function
\[ e(x) = \frac{1}{1+x}, \quad x > 0. \]
Determine the survival function of the equilibrium distribution.

Problem 11.5
Let X be a random variable with mean excess loss function
\[ e(x) = \frac{1}{1+x}, \quad x > 0. \]
Determine the mean of the equilibrium distribution.

Problem 11.6
Let X be a random variable with pdf $f(x) = \frac{3}{8}x^2$ for 0 < x < 2 and 0
otherwise.
(a) Find E(X) and $E(X^2)$.
(b) Find the equilibrium mean.

Problem 11.7
Let X be a loss random variable with mean excess loss function

e(x) = 10 + 9x, x > 0.

Determine the survival function S(x).

Problem 11.8
A random variable X has an exponential distribution with parameter λ.
Calculate the equilibrium mean.
Risk Measures

Most insurance models are stochastic or probabilistic models, i.e., they involve
future uncertainty. As such, insurance companies are exposed to risks. The job
of actuaries and risk managers is to determine the degree to which an insurance
company is subject to particular aspects of risk. In this chapter, we provide
a definition of risk measure and discuss a couple of ways of measuring
risk.

12 Coherent Risk Measurement

In financial terms, a risk measure is the capital that must be held in
reserve to support the future risks associated with, say, a portfolio. Risk
management is about understanding and managing the potential losses in the
total portfolio. One of the key tasks of risk management is to quantify the
risk of the uncertainty in the future value of a portfolio. This quantification
is usually achieved by modeling the uncertain payoff as a random variable,
to which a certain functional is then applied. Such functionals are usually
called risk measures. A risk measure gives a single value that is intended
to indicate the magnitude of the risk exposure.

Mathematically, we assign a random variable, defined on an appropriate

probability space, to each portfolio loss over some fixed time interval. Let
L be the collection of all such random variables. We will assume that L is a
convex cone so that if L1 and L2 are members of L then L1 + L2 and cL1
belong to L as well, where c is a constant. We define the coherent risk
measure to be the functional ρ : L −→[0, ∞) that satisfies the following
properties:

(P1) Subadditivity: For any L1 , L2 ∈ L, we have ρ(L1 + L2 ) ≤ ρ(L1 ) + ρ(L2 ).

This property says that the risk of two positions cannot get any worse than
adding the two risks separately. This property reflects the idea that pooling
risks helps to diversify a portfolio.

(P2) Monotonicity: If L1 ≤ L2 then ρ(L1 ) ≤ ρ(L2 ).

From an economic viewpoint, this property is obvious: positions that always
lead to higher losses require more risk capital.

(P3) Positive homogeneity: ρ(αL) = αρ(L), α > 0.

This property reflects the fact that there are no diversification benefits when
we hold multiples of the same portfolio, L.

(P4) Translation invariance: For any real number α, ρ(L + α) = ρ(L) + α.

That property states that by adding or subtracting a deterministic quantity
α to a position leading to the loss L we alter our capital requirements by
exactly that amount.

Remark 12.1
If L is a loss random variable then ρ(L) may be interpreted as the riskiness
of a portfolio or the amount of capital that should be added to a portfolio
with a loss given by L, so that the portfolio can then be deemed acceptable
from a risk point of view. Coherent risk measures are important when the
risk comes from separate risk-taking departments within the same company.

Example 12.1
Show that the expectation function E(·) is a coherent risk measure on L.

Solution.
The expectation function E(·) satisfies the following properties:

(P1) E(L1 + L2 ) = E(L1 ) + E(L2 ).

(P2) If L1 ≤ L2 then E(L1 ) ≤ E(L2 ).
(P3) E(αL) = αE(L), α > 0.
(P4) E(L + α) = E(L) + α

Example 12.2
Show that the variance is not a coherent risk measure.

Solution.
Since Var(L + a) = Var(L) ≠ Var(L) + a for a ≠ 0, the variance fails
translation invariance (P4) and so is not a coherent risk measure

Example 12.3
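Show that ρ(L) = E(L) + βVar(L), where β > 0, satisfies the property of
translation invariance but not positive homogeneity. We refer to this risk
measure as the variance premium principle.

Solution.
We have

\[
\begin{aligned}
\rho(L + \alpha) &= E(L + \alpha) + \beta\mathrm{Var}(L + \alpha) \\
&= E(L) + \alpha + \beta\mathrm{Var}(L) \\
&= \rho(L) + \alpha \\
\rho(\alpha L) &= E(\alpha L) + \beta\mathrm{Var}(\alpha L) \\
&= \alpha E(L) + \alpha^2\beta\mathrm{Var}(L) \\
&\ne \alpha\rho(L)
\end{aligned}
\]

where α > 0, α ≠ 1, and Var(L) > 0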
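
The two conclusions of Example 12.3 can be illustrated on a small discrete loss. The sketch below assumes eight equally likely outcomes and a loading factor β = 0.5; both are illustrative choices, and `rho`, `shift`, and `scale` are hypothetical names.

```python
from statistics import fmean, pvariance

losses = [3, 5, 6, 6, 6, 7, 7, 10]   # equally likely outcomes (illustrative sample)
beta = 0.5                            # assumed loading factor in the variance principle

def rho(sample):
    """Variance premium principle E(L) + beta * Var(L) for equally likely outcomes."""
    return fmean(sample) + beta * pvariance(sample)

shift, scale = 4.0, 2.0
shifted = [x + shift for x in losses]   # the loss L + shift
scaled = [scale * x for x in losses]    # the loss scale * L

# Translation invariance: the difference equals the shift.
print(rho(shifted) - rho(losses))
# Positive homogeneity fails: the two values differ.
print(rho(scaled), scale * rho(losses))
```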

Example 12.4
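Show that $\rho(L) = \frac{1}{\alpha}\ln[E(e^{\alpha L})]$, where α > 0, satisfies the properties of
translation invariance and monotonicity. We refer to this risk measure as the
exponential premium principle.

Solution.
We have
\[
\begin{aligned}
\rho(L + \beta) &= \frac{1}{\alpha}\ln[E(e^{\alpha(L+\beta)})] \\
&= \frac{1}{\alpha}\ln[E(e^{\alpha L}e^{\alpha\beta})] \\
&= \frac{1}{\alpha}\ln[e^{\alpha\beta}E(e^{\alpha L})] \\
&= \frac{1}{\alpha}\left[\alpha\beta + \ln[E(e^{\alpha L})]\right] \\
&= \rho(L) + \beta.
\end{aligned}
\]

Next, suppose that $L_1 \le L_2$. Since α > 0, we have

\[
\begin{aligned}
e^{\alpha L_1} &\le e^{\alpha L_2} \\
E(e^{\alpha L_1}) &\le E(e^{\alpha L_2}) \\
\ln[E(e^{\alpha L_1})] &\le \ln[E(e^{\alpha L_2})] \\
\rho(L_1) &\le \rho(L_2)
\end{aligned}
\]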

Practice Problems
Problem 12.1
Show that ρ(0) = 0 and interpret this result.

Problem 12.2
Show that ρ(αL + β) = αρ(L) + β, where α > 0 and β ∈ R.

Problem 12.3
Show that ρ(L) = (1 + α)E(L) is a coherent risk measure, where α ≥ 0.
This risk measure is known as the expected value premium principle.

Problem 12.4
Which of the following is an implication of the subadditivity requirement
for a coherent risk measure?

(a) If the subadditivity requirement is met, then a merger of positions
creates extra risk.
(b) If the subadditivity requirement is met, then a merger of positions does
not create extra risk.
(c) If the subadditivity requirement is met, then a merger of positions does
not affect risk.

Problem 12.5
Which of the following is an implication of the monotonicity requirement
for a coherent risk measure?

(a) Increasing the value of a portfolio increases risk.

(b) Increasing the value of a portfolio reduces risk.
(c) Increasing the value of a portfolio does not affect risk.

Problem 12.6
Which of the following is an implication of the positive homogeneity require-
ment for a coherent risk measure? More than one answer may be correct.

(a) If one assumes twice the amount of risk formerly assumed, one will
need twice the capital.
(b) As the size of a position doubles, the risk stays unchanged.
(c) The risk of the position increases in a linear way with the size of the
position.

Problem 12.7
Which of the following is an implication of the translation invariance
requirement for a coherent risk measure? More than one answer may be correct.

(a) Adding a fixed amount to the initial financial position should increase
the risk by the same amount.
(b) Subtracting a fixed amount from a portfolio decreases the required risk
capital by the same amount.
(c) Getting additional capital, if it is from a risk-free source, cannot funda-
mentally alter the riskiness of a position.

Problem 12.8
Show that ρ(L) = E(L) + αE[L − E(L)] satisfies (P1), (P3), and (P4).

Problem 12.9
Find the numerical value of ρ(L − ρ(L)).

Problem 12.10
Show that $\rho(L) = E(L) + \sqrt{\mathrm{Var}(L)}$ satisfies the properties of translation
invariance and positive homogeneity. We refer to this risk measure as the
standard deviation principle.

13 Value-at-Risk
A standard risk measure used to evaluate exposure to risk is the value-at-risk,
abbreviated VaR. In general terms, the value-at-risk measures the
potential loss of value of an asset or a portfolio over a defined time with a
high level of certainty. For example, if the VaR is $1 million at the one-month,
99% confidence level, then there is a 1% chance that under normal market
movements the monthly loss will exceed $1 million. Bankers use VaR to
capture the potential losses in their traded portfolios from adverse market
movements over a period of time; they then compare it with their
available capital and cash reserves to ensure that the losses can be covered
without putting the firm at risk.

Mathematically, the value-at-risk is found as follows: Let L be a loss random
variable. The Value-at-Risk at the 100p% level, denoted by VaRp (L) or πp ,
is the 100pth percentile or the p quantile of the distribution of L. That is, πp
is the solution of FL (πp ) = p or equivalently, SL (πp ) = 1 − p.

Example 13.1
Let L be an exponential loss random variable with mean λ > 0. Find πp .

Solution.
The pdf of L is $f(x) = \frac{1}{\lambda}e^{-x/\lambda}$ for x > 0 and 0 otherwise. Thus, $F(x) = 1 - e^{-x/\lambda}$.
Now, solving the equation $F(\pi_p) = p$, that is, $1 - e^{-\pi_p/\lambda} = p$, we
obtain $\pi_p = -\lambda\ln(1 - p)$

Example 13.2
The loss random variable L has a Pareto distribution with parameters α
and θ. Find πp .

Solution.
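The pdf of L is $f(x) = \frac{\alpha\theta^\alpha}{(x+\theta)^{\alpha+1}}$ for x > 0 and 0 otherwise. The cdf is
$F(x) = 1 - \left(\frac{\theta}{x+\theta}\right)^\alpha$. Solving the equation $F(\pi_p) = p$, we find

\[ \pi_p = \theta\left[(1 - p)^{-\frac{1}{\alpha}} - 1\right] \]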

Example 13.3
The loss random variable L follows a normal distribution with mean µ and
standard deviation σ. Find πp .

Solution.
Let $Z = \frac{L-\mu}{\sigma}$. Then Z is the standard normal distribution. The p-quantile
of Z satisfies the equation $\Phi(z) = p$. Thus, $z = \Phi^{-1}(p)$. Hence,

\[ \pi_p = \mu + \sigma z = \mu + \sigma\Phi^{-1}(p) \]
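
The closed forms from Examples 13.1 to 13.3 translate directly into code. The following is a minimal sketch using only the Python standard library; the function names and the parameter values (p = 0.95, λ = 3, α = 2, θ = 60, µ = 100, σ = 10) are illustrative assumptions, not values from the text.

```python
import math
from statistics import NormalDist

def var_exponential(p, lam):
    """VaR of an exponential loss with mean lam: -lam * ln(1 - p)."""
    return -lam * math.log(1.0 - p)

def var_pareto(p, alpha, theta):
    """VaR of a Pareto loss: theta * ((1 - p)^(-1/alpha) - 1)."""
    return theta * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

def var_normal(p, mu, sigma):
    """VaR of a normal loss: mu + sigma * (standard normal p-quantile)."""
    return mu + sigma * NormalDist().inv_cdf(p)

p = 0.95  # illustrative confidence level
print(var_exponential(p, lam=3.0))
print(var_pareto(p, alpha=2.0, theta=60.0))
print(var_normal(p, mu=100.0, sigma=10.0))
```

A quick sanity check is to plug each result back into the corresponding cdf and confirm that it returns p.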

Example 13.4
Consider a sample of size 8 in which the observed data points were 3,5,6,6,6,7,7,
and 10. Find VaR0.90 (L) for this empirical distribution.

Solution.
The pmf of L is given below.

x      3    5    6    7    10
p(x)   1/8  1/8  3/8  2/8  1/8

We want to find $\pi_{0.90}$ such that

\[ \Pr(L < \pi_{0.90}) < 0.90 \le \Pr(L \le \pi_{0.90}). \]

Since Pr(L < 10) = 7/8 = 0.875 < 0.90 ≤ 1 = Pr(L ≤ 10), we have $\pi_{0.90} = 10$
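
The search in Example 13.4 amounts to finding the smallest observed value whose cumulative probability reaches p. A minimal sketch, assuming equally weighted observations; the function name `empirical_var` is hypothetical.

```python
from collections import Counter

def empirical_var(sample, p):
    """Return the smallest observed value x with Pr(L <= x) >= p."""
    n = len(sample)
    cumulative = 0
    # Walk through the distinct values in increasing order, accumulating counts.
    for value, count in sorted(Counter(sample).items()):
        cumulative += count
        if cumulative >= p * n:
            return value
    raise ValueError("p must not exceed 1")

sample = [3, 5, 6, 6, 6, 7, 7, 10]
print(empirical_var(sample, 0.90))  # prints 10, as in Example 13.4
```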

Remark 13.1
According to [1], VaRp (L) is monotone, positive homogeneous, and trans-
lation invariant but not subadditive. Thus, VaRp (L) is not a coherent risk
measure.

Practice Problems
Problem 13.1
The loss random variable L has a uniform distribution in [a, b]. Find VaRp (L).

Problem 13.2
The cdf of a loss random variable L is given by
\[ F_L(x) = \begin{cases} \frac{x^2}{4}, & 0 < x \le 2 \\ 1, & x > 2. \end{cases} \]

Find π0.90 .

Problem 13.3
You are given the following empirical distribution

3, 5, 6, 6, 6, 7, 7, 10.

The risk measure under the standard deviation principle is ρ(L) = E(L) +
ασ(L). Determine the value of α so that ρ(L) = π0.90 .

Problem 13.4
Losses represented by L are distributed as a Pareto distribution with pa-
rameters α = 2 and θ = 60. Find VaR0.75 (L).

Problem 13.5
Losses represented by L are distributed as a single Pareto distribution with
a pdf $f(x) = \frac{\alpha\theta^\alpha}{x^{\alpha+1}}$, x > θ and 0 otherwise. Find $\pi_p$.

Problem 13.6
A loss random variable X has a survival function
\[ S(x) = \left(\frac{\theta}{x+\theta}\right)^2, \quad x > 0. \]
Find θ given that $\pi_{0.75} = 40$.

Problem 13.7
Let L be a random variable with discrete loss distribution given by

x      0     100   1000  10000  100000
p(x)   0.65  0.20  0.07  0.05   0.03

Calculate the Value-at-Risk of L at the 90% level.

Problem 13.8
A loss random variable L has a two-parameter Pareto distribution satisfying:

VaR0.90 (L) = 216.23 and VaR0.99 (L) = 900.

Calculate VaR0.95 (L).

Problem 13.9
Let L be a loss random variable with probability generating function

PL (x) = 0.4x2 + 0.2x3 + 0.2x5 + 0.2x8 .

Determine VaR0.80 (L).

Problem 13.10
A loss random variable L has a survival function
\[ S(x) = \left(\frac{100}{x+100}\right)^2, \quad x > 0. \]

Calculate $\mathrm{VaR}_{0.96}$ and interpret this result.

14 Tail-Value-at-Risk
The quantile risk measure discussed in the previous section provides us only
with the probability that a loss random variable L will exceed $\mathrm{VaR}_p(L)$
for a certain confidence level. It does not provide any information about
how large the losses are beyond a particular percentile. The Tail-Value-at-Risk
(TVaR) measure does consider losses above a percentile. Other
names used for TVaR are Tail Conditional Expectation and Expected
Shortfall.

The Tail-Value-at-Risk of a random variable L at the 100p% security level
is defined as

\[ \mathrm{TVaR}_p(L) = E[L \mid L > \mathrm{VaR}_p(L)] = E[L \mid L > F_L^{-1}(p)] \]

where $F_L(x)$ is the distribution function of L. This is the expected value of the loss,
conditional on the loss exceeding $\pi_p$. Note that TVaR is also the expected
cost per payment with a franchise deductible of $\pi_p$ (see Section 32).

We can write TVaR as

\[ \mathrm{TVaR}_p(L) = E[L \mid L > \mathrm{VaR}_p(L)] = \frac{\int_{\pi_p}^\infty xf(x)\,dx}{1 - F(\pi_p)} = \frac{\int_{\pi_p}^\infty xf(x)\,dx}{1 - p}. \]

Now, using the substitution u = F(x), we can write

\[ \mathrm{TVaR}_p(L) = \frac{\int_{\pi_p}^\infty xf(x)\,dx}{1 - p} = \frac{\int_p^1 \mathrm{VaR}_u(L)\,du}{1 - p}. \]

TVaR can also be written as
\[ \mathrm{TVaR}_p(L) = \frac{\pi_p}{1-p}\int_{\pi_p}^\infty f(x)\,dx + \frac{1}{1-p}\int_{\pi_p}^\infty (x - \pi_p)f(x)\,dx = \pi_p + E[L - \pi_p \mid L > \pi_p]. \]

Since
\[ E[L - \pi_p \mid L > \pi_p] = \frac{E(L) - E(L \wedge \pi_p)}{1 - p}, \]
we can write
\[ \mathrm{TVaR}_p(L) = \mathrm{VaR}_p(L) + \frac{E(L) - E(L \wedge \pi_p)}{1 - p}. \]

Remark 14.1
Unlike the VaR risk measure, TVaR risk measure is shown to be coherent.

Example 14.1
Find the Tail-Value-at-Risk of an exponential distribution with mean λ > 0.

Solution.
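From Problem 5.7, we have $e(\pi_p) = \lambda$. This, together with Example 13.1 and
the identity $\mathrm{TVaR}_p(L) = \pi_p + e(\pi_p)$, gives

\[ \mathrm{TVaR}_p(L) = \lambda - \lambda\ln(1 - p) \]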
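
As a rough check of Example 14.1, the sketch below compares the closed form with the defining integral of TVaR for an exponential loss. The values λ = 3 and p = 0.90, the truncation point, and the helper `integrate` are assumptions made only for this illustration.

```python
import math

def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

lam, p = 3.0, 0.90                       # illustrative mean and security level
var_p = -lam * math.log(1.0 - p)         # VaR from Example 13.1
upper = var_p + 60.0 * lam               # truncation point; the remaining tail is negligible

def density(x):
    """Density of the exponential distribution with mean lam."""
    return math.exp(-x / lam) / lam

# Definition: integral of x f(x) over (VaR, infinity), divided by 1 - p.
tvar_numeric = integrate(lambda x: x * density(x), var_p, upper) / (1.0 - p)
# Closed form from Example 14.1.
tvar_closed = lam - lam * math.log(1.0 - p)

print(tvar_numeric, tvar_closed)
```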

Example 14.2
Find the Tail-Value-at-Risk of a Pareto distribution with parameters α > 1
and θ > 0.

Solution.
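The survival function of the Pareto distribution is
\[ S(x) = \left(\frac{\theta}{x+\theta}\right)^\alpha, \quad x > 0. \]
Thus,
\[
\begin{aligned}
e(\pi_p) &= \frac{1}{S(\pi_p)}\int_{\pi_p}^\infty S(x)\,dx \\
&= (\pi_p + \theta)^\alpha \int_{\pi_p}^\infty (x + \theta)^{-\alpha}\,dx \\
&= \left. -\frac{(\pi_p + \theta)^\alpha}{\alpha - 1}(x + \theta)^{1-\alpha} \right|_{\pi_p}^\infty \\
&= \frac{\pi_p + \theta}{\alpha - 1}.
\end{aligned}
\]
On the other hand, using Example 13.2, we have
\[ \pi_p = \theta\left[(1 - p)^{-\frac{1}{\alpha}} - 1\right]. \]

Hence,
\[ \mathrm{TVaR}_p(L) = \frac{\pi_p + \theta}{\alpha - 1} + \theta\left[(1 - p)^{-\frac{1}{\alpha}} - 1\right] \]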
Example 14.3
Let Z be the standard normal distribution with pdf $f_Z(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$. Find
$\mathrm{TVaR}_p(Z)$.

Solution.
Notice first that $f_Z(x)$ satisfies the differential equation $xf_Z(x) = -f_Z'(x)$.
Using the Fundamental Theorem of Calculus, we can write
\[
\begin{aligned}
\mathrm{TVaR}_p(Z) &= \frac{1}{1-p}\int_{\Phi^{-1}(p)}^\infty xf_Z(x)\,dx \\
&= -\frac{1}{1-p}\int_{\Phi^{-1}(p)}^\infty f_Z'(x)\,dx \\
&= \left. -\frac{1}{1-p}f_Z(x) \right|_{\Phi^{-1}(p)}^\infty \\
&= \frac{1}{1-p}f_Z[\Phi^{-1}(p)]
\end{aligned}
\]
Example 14.4
Let L be a loss random variable having a normal distribution with mean µ
and standard deviation σ. Find TVaRp (L).

Solution.
Since $\mathrm{TVaR}_p(L)$ is a coherent risk measure, it is positive homogeneous and
translation invariant. Thus, we have
\[ \mathrm{TVaR}_p(L) = \mathrm{TVaR}_p(\mu + \sigma Z) = \mu + \sigma\mathrm{TVaR}_p(Z) = \mu + \frac{\sigma}{1-p}f_Z[\Phi^{-1}(p)] \]
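
The formula of Example 14.4 can be evaluated with the standard library's `statistics.NormalDist`. The sketch below is only an illustration; the function names and the parameter values (µ = 50, σ = 5, p = 0.90) are assumptions and do not come from the text.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def var_normal(p, mu, sigma):
    """VaR of a normal loss: mu + sigma * (standard normal p-quantile)."""
    return mu + sigma * STANDARD_NORMAL.inv_cdf(p)

def tvar_normal(p, mu, sigma):
    """TVaR of a normal loss: mu + sigma * phi(z_p) / (1 - p)."""
    z_p = STANDARD_NORMAL.inv_cdf(p)
    return mu + sigma * STANDARD_NORMAL.pdf(z_p) / (1.0 - p)

mu, sigma, p = 50.0, 5.0, 0.90   # illustrative parameters
print(var_normal(p, mu, sigma), tvar_normal(p, mu, sigma))
```

Since TVaR averages the losses beyond the quantile, the second number printed should exceed the first.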

Practice Problems
Problem 14.1
Let L be a loss random variable with uniform distribution in (a, b).

(a) Find the mean residual life e(x).

(b) Find TVaRp (L).

Problem 14.2
The cdf of a loss random variable L is given by
\[ F_L(x) = \begin{cases} \frac{x^2}{4}, & 0 < x \le 2 \\ 1, & x > 2. \end{cases} \]

(a) Find π0.90 and e(π0.90 ).

(b) Find TVaR0.90 (L).

Problem 14.3
You are given the following empirical distribution

3, 5, 6, 6, 6, 7, 7, 10.

Find π0.85 and TVaR0.85 (L).

Problem 14.4
Losses follow a Pareto distribution with mean of 200 and variance
of 60000.

(a) Find the values of the parameters α and θ.

(b) Find e(100).
(c) Find $\pi_{0.95}$.
(d) Find $\mathrm{TVaR}_{0.95}(L)$.

Problem 14.5
Losses represented by the random variable L are uniformly distributed from
0 to the maximum loss. You are given that Var(L) = 62, 208.

Find TVaR0.75 (L).

Problem 14.6
Losses represented by the random variable L are uniformly distributed in
(0, 864). Determine β so that the standard deviation principle is equal to
TVaR0.75 (L).

Problem 14.7
Diabetes claims follow an exponential distribution with parameter λ = 2.
Find TVaR0.90 (L).

Problem 14.8
You are given the following empirical distribution

3, 5, 6, 6, 6, 7, 7, 10.

Let $\beta_1$ be the value of β in the standard deviation principle such that
$\mu + \beta_1\sigma = \mathrm{VaR}_{0.85}(L)$. Let $\beta_2$ be the value of β in the standard deviation
principle such that $\mu + \beta_2\sigma = \mathrm{TVaR}_{0.85}(L)$.

Calculate β2 − β1 .

Problem 14.9
Let $L_1$ be a Pareto random variable with parameters α = 2 and θ = 100.
Let $L_2$ be a random variable with uniform distribution on (0, 864). Find p
such that
\[ \frac{\mathrm{TVaR}_{0.99}(L_1)}{\mathrm{VaR}_{0.99}(L_1)} = \frac{\mathrm{TVaR}_p(L_2)}{\mathrm{VaR}_p(L_2)}. \]

Problem 14.10
Let L be a random variable with discrete loss distribution given by

x      0     100   1000  10000  100000
p(x)   0.65  0.20  0.07  0.05   0.03

Calculate the Tail-Value-at-Risk of L at the 90% level.

Problem 14.11
Find TVaR0.95 (L) when L has a normal distribution with mean of 100 and
standard deviation of 10.
Characteristics of Actuarial
Models

In the previous chapter, we characterized an actuarial model by the tail
weight of the corresponding distribution. In this chapter we look at other
factors that distinguish one model from another. One such factor is
the number of parameters needed to determine the model's distribution.
More parameters in a model means that more information
is required, and in this case the model is categorized as a complex model. We
start this chapter by discussing simple models first and then move toward
more complex models.

15 Parametric and Scale Distributions

These are considered the simplest families of actuarial models. We will
consider a model that requires fewer parameters than another model to be
less complex.

A parametric distribution is one that is completely determined by a set
of quantities called parameters. Examples of commonly used parametric
distributions are listed below.

Name          PDF or PMF                                                                Parameters
Exponential   $f(x) = \theta e^{-\theta x}$                                             θ > 0
Pareto        $f(x) = \frac{\alpha\theta^\alpha}{(x+\theta)^{\alpha+1}}$                α > 0, θ > 0
Normal        $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$    µ, σ
Poisson       $p(x) = \frac{e^{-\lambda}\lambda^x}{x!}$                                 λ > 0

Additional parametric distributions can be found in the Tables of Exam C.

If multiplying a random variable by a positive constant yields a distribution
that belongs to the same family as the original random variable, then we
call the distribution a scale distribution.

Example 15.1
Show that the Pareto distribution is a scale distribution.

Solution.
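The cdf of the Pareto distribution is
\[ F_X(x) = 1 - \left(\frac{\theta}{x+\theta}\right)^\alpha. \]
Let Y = cX with c > 0. Then
\[
\begin{aligned}
F_Y(y) &= \Pr(Y \le y) = \Pr\left(X \le \frac{y}{c}\right) \\
&= 1 - \left(\frac{c\theta}{y + c\theta}\right)^\alpha.
\end{aligned}
\]
This is a Pareto distribution with parameters α and cθ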

Example 15.2
Show that the Weibull distribution with parameters θ and τ is a scale
distribution.

Solution.
The cdf of the Weibull distribution is
\[ F_X(x) = 1 - e^{-\left(\frac{x}{\theta}\right)^\tau}. \]

Let Y = cX with c > 0. Then

\[
\begin{aligned}
F_Y(y) &= \Pr(Y \le y) = \Pr\left(X \le \frac{y}{c}\right) \\
&= 1 - e^{-\left(\frac{y}{c\theta}\right)^\tau}.
\end{aligned}
\]

This is a Weibull distribution with parameters cθ and τ

A parameter θ in a scale distribution X is called a scale parameter if
cθ is a parameter of cX and θ is the only changed parameter.

Example 15.3
Show that the parameter θ in the Pareto distribution is a scale parameter.

Solution.
This follows from Example 15.1

Example 15.4
Find the scale parameter of the Weibull distribution with parameters θ and
τ.

Solution.
According to Example 15.2, the scale parameter is θ

Example 15.5
The amount of money in dollars that Clark received in 2010 from his invest-
ment in Simplicity futures follows a Pareto distribution with parameters
α = 3 and θ. Annual inflation in the United States from 2010 to 2011 is
i%. The 80th percentile of the earning size in 2010 equals the mean earning
size in 2011. If Clark’s investment income keeps up with inflation but is
otherwise unaffected, determine i.

Solution.
Let X be the earning size in 2010 and Y that in 2011. Then Y is a Pareto
distribution with parameters α = 3 and (1 + i)θ. We are told that

\[ \pi_{0.80} = E(Y) = (1 + i)E(X) = \frac{(1 + i)\theta}{3 - 1} = \frac{(1 + i)\theta}{2}. \]

Thus,
\[ 0.8 = \Pr\left(X < \frac{(1 + i)\theta}{2}\right) = F_X\left(\frac{(1 + i)\theta}{2}\right) = 1 - \left(\frac{\theta}{\frac{(1+i)\theta}{2} + \theta}\right)^3. \]

Solving the above equation for i, we find i = 0.42, that is, an inflation rate of 42%
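
The last step of Example 15.5 can be reproduced numerically. In the sketch below θ cancels out of the equation, so the value θ = 1000 is an arbitrary assumption used only to verify the result; `pareto_cdf` is a hypothetical helper.

```python
alpha = 3.0
theta = 1000.0   # arbitrary scale; it cancels out of the equation for i

def pareto_cdf(x, alpha, theta):
    """Cdf of the Pareto distribution: 1 - (theta / (x + theta))^alpha."""
    return 1.0 - (theta / (x + theta)) ** alpha

# From 0.2 = (1 / ((1 + i)/2 + 1))^alpha we get (1 + i)/2 + 1 = 0.2^(-1/alpha).
i = 2.0 * (0.2 ** (-1.0 / alpha) - 1.0) - 1.0

mean_2011 = (1.0 + i) * theta / (alpha - 1.0)   # mean earning size in 2011
print(round(i, 2))                              # prints 0.42
print(pareto_cdf(mean_2011, alpha, theta))      # should be close to 0.8
```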

By assigning all possible numerical values to the parameters of a particular
parametric distribution we obtain a family of distributions that we call
a parametric distribution family.

Example 15.6
Show that exponential distributions belong to the Weibull distribution fam-
ily with parameters θ and τ.

Solution.
Weibull distributions with parameters θ and τ have pdf
\[ f_X(x) = \frac{\tau\left(\frac{x}{\theta}\right)^\tau e^{-\left(\frac{x}{\theta}\right)^\tau}}{x}. \]
Letting τ = 1, the pdf reduces to $f_X(x) = \frac{e^{-x/\theta}}{\theta}$, which is the pdf of an
exponential distribution

Practice Problems
Problem 15.1
Show that the exponential distribution is a scale distribution.
Problem 15.2
Let X be a random variable with pdf $f_X(x) = 2xe^{-x^2}$ for x > 0 and 0
otherwise. Let Y = cX for c > 0.

Find FY (y).
Problem 15.3
Let X be a uniform random variable on the interval (0, θ). Let Y = cX for
c > 0.

Find FY (y).
Problem 15.4
Show that the Fréchet distribution with cdf $F_X(x) = e^{-\left(\frac{x}{\theta}\right)^{-\alpha}}$ and parameters
θ and α is a scale distribution.
Problem 15.5
Show that the three-parameter Burr distribution with cdf
$F_X(x) = 1 - \frac{1}{\left[1 + \left(\frac{x}{\theta}\right)^\gamma\right]^\alpha}$ is a scale distribution.
Problem 15.6
Find the scale parameter of the following distributions:

(a) The exponential distribution with parameter θ.

(b) The uniform distribution on (0, θ).
(c) The Fréchet distribution with parameters θ and α.
(d) The Burr distribution with parameters α, θ, and γ.
Problem 15.7
Claim severities are modeled using a continuous distribution and inflation
impacts claims uniformly at an annual rate of r.

Which of the following are true statements regarding the distribution of

claim severities after the effect of inflation?

(1) An exponential distribution will have a scale parameter of (1 + r)θ.

(2) A Pareto distribution will have scale parameters (1 + r)α and (1 + r)θ
(3) A Burr distribution will have scale parameters α, (1 + r)θ, γ.

Problem 15.8
Let X be the lognormal distribution with parameters µ and σ and cdf
$F_X(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$.

Show that X has a scale distribution with no scale parameters.

Problem 15.9
Show that the Gamma distribution is a scale distribution. Is there a scale
parameter?

Problem 15.10
Earnings during 2012 follow a Gamma distribution with variance 2,500. For
2013, earnings are expected to be subject to P% inflation and the
expected variance for 2013 is 10,000.

Determine the value of P.

Problem 15.11
The Gamma distribution with parameters α and θ has the pdf
\[ f_X(x) = \frac{x^{\alpha-1}e^{-x/\theta}}{\theta^\alpha\Gamma(\alpha)}. \]

Show that the exponential distributions belong to this family.

Problem 15.12
Hardy Auto Insurance claims X are represented by a Weibull distribution
with parameters α = 2 and θ = 400. It is found that the claim sizes are
inflated by 30% uniformly.

Calculate the probability that a claim will be at least 90 counting inflation.

16 Discrete Mixture Distributions

In probability and statistics, a mixture distribution is the probability
distribution of a random variable whose values can be interpreted as being
derived from an underlying set of other random variables. For example, a
dental claim may be from a check-up, cleaning, filling cavity, a surgical pro-
cedure, etc.

A random variable X is a k-point mixture of the random variables $X_1, \cdots, X_k$
if its cumulative distribution function (cdf) is given by

\[ F_X(x) = a_1F_{X_1}(x) + a_2F_{X_2}(x) + \cdots + a_kF_{X_k}(x) \]

where each mixing weight $a_i > 0$ and $a_1 + a_2 + \cdots + a_k = 1$. The mixture
X is defined in terms of its pdf or cdf and is not the sum of the random
variables $a_1X_1, a_2X_2, \cdots, a_kX_k$.

The mixing weights are discrete probabilities. To see this, let Θ be the
discrete random variable with support {1, 2, · · · , k} and pmf $\Pr(\Theta = i) = a_i$.
We can think of the distribution of Θ as a conditioning distribution where
$X = X_i$ is conditioned on Θ = i, or equivalently, $F_{X|\Theta}(x|\Theta = i) = F_{X_i}(x)$.
In this case, X is the unconditional distribution with cdf
\[ F_X(x) = a_1F_{X_1}(x) + a_2F_{X_2}(x) + \cdots + a_kF_{X_k}(x) = \sum_{i=1}^k F_{X|\Theta}(x|\Theta = i)\Pr(\Theta = i). \]

In actuarial and insurance terms, discrete mixtures arise in situations where

the risk class of a policyholder is uncertain, and the number of possible risk
classes is discrete.

The continuous mixture of distributions will be discussed in Section 20 of

this study guide.

Example 16.1
Let Y be a 2-point mixture of two random variables X1 and X2 with mixing
weights 0.6 and 0.4 respectively. The random variable X1 is a Pareto random
variable with parameters α = 3 and θ = 900. The random variable X2 is a
Pareto random variable with parameters α = 5 and θ = 1500. Find the pdf
of Y.

Solution.
We are given
\[ f_{X_1}(x) = \frac{3(900)^3}{(x+900)^4} \quad\text{and}\quad f_{X_2}(x) = \frac{5(1500)^5}{(x+1500)^6}. \]
Thus,
\[ f_Y(x) = 0.6f_{X_1}(x) + 0.4f_{X_2}(x) = 0.6\left[\frac{3(900)^3}{(x+900)^4}\right] + 0.4\left[\frac{5(1500)^5}{(x+1500)^6}\right] \]
Example 16.2 ‡
The random variable N has a mixed distribution:
(i) With probability p, N has a binomial distribution with q = 0.5 and
m = 2.
(ii) With probability 1 − p, N has a binomial distribution with q = 0.5 and
m = 4.
Calculate Pr(N = 2).

Solution.
We have

\[
\begin{aligned}
p_{N_1}(2) &= C(2, 2)(0.5)^2 = 0.25 \\
p_{N_2}(2) &= C(4, 2)(0.5)^2(0.5)^2 = 0.375 \\
p_N(2) &= p\,p_{N_1}(2) + (1 - p)\,p_{N_2}(2) = 0.375 - 0.125p
\end{aligned}
\]
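
The mixture probability in Example 16.2 is a weighted sum of two binomial probabilities, which is easy to reproduce. A small sketch, with `binomial_pmf` and `mixture_pmf` as hypothetical helper names and a few illustrative values of p:

```python
from math import comb

def binomial_pmf(k, m, q):
    """Pr(N = k) for a binomial distribution with parameters m and q."""
    return comb(m, k) * q**k * (1 - q) ** (m - k)

def mixture_pmf(k, p):
    """Two-point mixture: weight p on binomial(2, 0.5), weight 1 - p on binomial(4, 0.5)."""
    return p * binomial_pmf(k, 2, 0.5) + (1 - p) * binomial_pmf(k, 4, 0.5)

# Compare with the closed form 0.375 - 0.125 p from the solution.
for p in (0.0, 0.3, 1.0):
    print(p, mixture_pmf(2, p), 0.375 - 0.125 * p)
```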

Example 16.3
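Determine the mean and second moment of the two-point mixture distribution
with the cdf
\[ F_X(x) = 1 - a\left(\frac{\theta_1}{x+\theta_1}\right)^\alpha - (1 - a)\left(\frac{\theta_2}{x+\theta_2}\right)^{\alpha+2} \]
where 0 < a < 1 is the mixing weight and α > 2.

Solution.
The first part is the distribution of a Pareto random variable $X_1$ with
parameters $\alpha_1 = \alpha$ and $\theta_1$. The second part is the distribution of a Pareto
random variable $X_2$ with parameters $\alpha_2 = \alpha + 2$ and $\theta_2$. Thus,
\[
\begin{aligned}
E(X_1) &= \frac{\theta_1}{\alpha_1 - 1} = \frac{\theta_1}{\alpha - 1} \\
E(X_2) &= \frac{\theta_2}{\alpha_2 - 1} = \frac{\theta_2}{\alpha + 1} \\
E(X) &= a\,\frac{\theta_1}{\alpha - 1} + (1 - a)\frac{\theta_2}{\alpha + 1} \\
E(X_1^2) &= \frac{\theta_1^2\,2!}{(\alpha_1 - 1)(\alpha_1 - 2)} = \frac{2\theta_1^2}{(\alpha - 1)(\alpha - 2)} \\
E(X_2^2) &= \frac{\theta_2^2\,2!}{(\alpha_2 - 1)(\alpha_2 - 2)} = \frac{2\theta_2^2}{\alpha(\alpha + 1)} \\
E(X^2) &= a\left[\frac{2\theta_1^2}{(\alpha - 1)(\alpha - 2)}\right] + (1 - a)\left[\frac{2\theta_2^2}{\alpha(\alpha + 1)}\right]
\end{aligned}
\]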
Next, we consider mixtures where the number of random variables in the
mixture is unknown. A variable-component mixture distribution has
a distribution function that can be written as

\[ F_X(x) = a_1F_{X_1}(x) + a_2F_{X_2}(x) + \cdots + a_NF_{X_N}(x) \]

where each $a_j > 0$, $\sum_{i=1}^N a_i = 1$, and $N \in \mathbb{N}$.

In a variable-component mixture distribution, each of the mixture weights

associated with each individual FXj (x) is a parameter. Also, there are
(N − 1) parameters corresponding to the weights a1 through aN −1 . The
weight aN is not itself a parameter, since the value of aN is determined by
the value of the constants a1 through aN −1 .

Example 16.4
Determine the distribution, density, and hazard rate functions for the vari-
able mixture of exponential distributions.

Solution.
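The distribution function of the variable mixture is
\[ F_X(x) = 1 - a_1e^{-\frac{x}{\theta_1}} - a_2e^{-\frac{x}{\theta_2}} - \cdots - a_Ne^{-\frac{x}{\theta_N}} \]

where $a_j > 0$ and $\sum_{i=1}^N a_i = 1$.
The density function is
\[ f_X(x) = \frac{a_1}{\theta_1}e^{-\frac{x}{\theta_1}} + \frac{a_2}{\theta_2}e^{-\frac{x}{\theta_2}} + \cdots + \frac{a_N}{\theta_N}e^{-\frac{x}{\theta_N}} \]
and the hazard rate function is
\[ h_X(x) = \frac{\frac{a_1}{\theta_1}e^{-\frac{x}{\theta_1}} + \frac{a_2}{\theta_2}e^{-\frac{x}{\theta_2}} + \cdots + \frac{a_N}{\theta_N}e^{-\frac{x}{\theta_N}}}{a_1e^{-\frac{x}{\theta_1}} + a_2e^{-\frac{x}{\theta_2}} + \cdots + a_Ne^{-\frac{x}{\theta_N}}}. \]
Note that the parameters corresponding to N = 3 are $(a_1, a_2, \theta_1, \theta_2, \theta_3)$ and
those corresponding to N = 5 are $(a_1, a_2, a_3, a_4, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5)$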

Example 16.5 ‡
You are given claim count data for which the sample mean is roughly equal
to the sample variance. Thus you would like to use a claim count model
that has its mean equal to its variance. An obvious choice is the Poisson
distribution.
Determine which of the following models may also be appropriate.
(A) A mixture of two binomial distributions with different means
(B) A mixture of two Poisson distributions with different means
(C) A mixture of two negative binomial distributions with different means
(D) None of (A), (B) or (C)
(E) All of (A), (B) and (C).
Solution.
Let X be a 2-point mixture of the random variables X1 and X2 with mixing
weights α and 1−α. Let Θ be the discrete random variable such that Pr(Θ =
1) = α and Pr(Θ = 2) = 1 − α. Thus, we have
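\begin{align*}
E(X) &= \alpha E(X_1) + (1-\alpha)E(X_2) \\
\mathrm{Var}(X) &= E(X^2) - E(X)^2 \\
&= \alpha E(X_1^2) + (1-\alpha)E(X_2^2) - [\alpha E(X_1) + (1-\alpha)E(X_2)]^2 \\
&= \alpha \mathrm{Var}(X_1) + (1-\alpha)\mathrm{Var}(X_2) + \alpha(1-\alpha)[E(X_1) - E(X_2)]^2.
\end{align*}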
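If $X_1$ and $X_2$ are Poisson with means $\lambda_1$ and $\lambda_2$ respectively with $\lambda_1 \neq \lambda_2$,
then
\begin{align*}
\mathrm{Var}(X) &= \alpha\lambda_1 + (1-\alpha)\lambda_2 + \alpha(1-\alpha)(\lambda_1 - \lambda_2)^2 \\
&> \alpha\lambda_1 + (1-\alpha)\lambda_2 = E(X).
\end{align*}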
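If $X_1$ and $X_2$ are negative binomial with parameters $(r_1, \beta_1)$ and $(r_2, \beta_2)$
respectively with $r_1\beta_1 \neq r_2\beta_2$, then
\begin{align*}
\mathrm{Var}(X) &= \alpha r_1\beta_1(1+\beta_1) + (1-\alpha)r_2\beta_2(1+\beta_2) + \alpha(1-\alpha)(r_1\beta_1 - r_2\beta_2)^2 \\
&> \alpha r_1\beta_1 + (1-\alpha)r_2\beta_2 = E(X).
\end{align*}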
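If $X_1$ and $X_2$ are binomial with parameters $(m_1, q_1)$ and $(m_2, q_2)$ respectively
with $m_1 q_1 \neq m_2 q_2$, then
\begin{align*}
\mathrm{Var}(X) &= \alpha m_1 q_1(1-q_1) + (1-\alpha)m_2 q_2(1-q_2) + \alpha(1-\alpha)(m_1 q_1 - m_2 q_2)^2 \\
&= E(X) + \alpha(1-\alpha)(m_1 q_1 - m_2 q_2)^2 - \alpha m_1 q_1^2 - (1-\alpha)m_2 q_2^2.
\end{align*}
The expression $\alpha(1-\alpha)(m_1 q_1 - m_2 q_2)^2 - \alpha m_1 q_1^2 - (1-\alpha)m_2 q_2^2$ can be
positive, negative, or zero. Thus, a mixture of two binomial distributions
with different means may result in the variance being equal to the mean,
whereas the Poisson and negative binomial mixtures always have variance
exceeding the mean. The answer is (A).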
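
The sign argument in Example 16.5 can be illustrated numerically. The sketch below is not from the original notes: the helper names and all parameter values are assumptions chosen only for illustration. It computes Var(X) - E(X) for a 2-point mixture using the variance formula derived in the solution.

```python
def mixture_mean_var(alpha, comp1, comp2):
    """Mean and variance of a 2-point mixture.

    comp1, comp2 are (mean, variance) pairs of the component distributions;
    alpha is the weight on the first component.
    """
    (mu1, var1), (mu2, var2) = comp1, comp2
    mean = alpha * mu1 + (1 - alpha) * mu2
    var = alpha * var1 + (1 - alpha) * var2 + alpha * (1 - alpha) * (mu1 - mu2) ** 2
    return mean, var

def poisson(lam):
    return lam, lam

def neg_binomial(r, beta):
    return r * beta, r * beta * (1 + beta)

def binomial(m, q):
    return m * q, m * q * (1 - q)

def excess(alpha, comp1, comp2):
    """Var(X) - E(X); zero means the mixture has mean equal to variance."""
    mean, var = mixture_mean_var(alpha, comp1, comp2)
    return var - mean

# Illustrative (assumed) parameter choices with different component means.
print("Poisson:          ", excess(0.5, poisson(1), poisson(3)))
print("Negative binomial:", excess(0.5, neg_binomial(2, 0.5), neg_binomial(3, 1)))
print("Binomial, close:  ", excess(0.5, binomial(1, 0.5), binomial(1, 0.6)))
print("Binomial, far:    ", excess(0.5, binomial(1, 0.1), binomial(100, 0.1)))
```

The two binomial cases give differences of opposite sign. Since the difference varies continuously with the parameters, some binomial mixture with different means has variance exactly equal to its mean.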

Example 16.6 ‡
Losses come from an equally weighted mixture of an exponential distribu-
tion with mean m1 , and an exponential distribution with mean m2 .
Determine the least upper bound for the coefficient of variation of this dis-
tribution.

Solution.
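Let $X$ be the random variable with pdf
$$f(x) = \frac{1}{2}\left(\frac{1}{m_1} e^{-x/m_1} + \frac{1}{m_2} e^{-x/m_2}\right).$$
We have
\begin{align*}
E(X) &= \frac{1}{2}(m_1 + m_2) \\
E(X^2) &= \frac{1}{2}(2m_1^2 + 2m_2^2) \\
\mathrm{Var}(X) &= \frac{1}{2}(2m_1^2 + 2m_2^2) - \left[\frac{1}{2}(m_1 + m_2)\right]^2.
\end{align*}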

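The square of the coefficient of variation of $X$ is
\begin{align*}
CV^2 &= \frac{\frac{1}{2}(2m_1^2 + 2m_2^2) - \left[\frac{1}{2}(m_1 + m_2)\right]^2}{\left[\frac{1}{2}(m_1 + m_2)\right]^2} \\
&= \frac{3m_1^2 - 2m_1 m_2 + 3m_2^2}{(m_1 + m_2)^2} \\
&= 3 - \frac{8m_1 m_2}{(m_1 + m_2)^2}.
\end{align*}
Let $r = \frac{m_1}{m_2}$. Then
$$CV^2 = 3 - \frac{8r}{(1 + r)^2}.$$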
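Taking the derivative with respect to $r$ and setting it to 0, we find
$$-8(1 + r)^{-2} + 16r(1 + r)^{-3} = 0 \implies r = 1.$$
Moreover, the second derivative of $CV^2$ at $r = 1$ equals $1 > 0$, and
$CV^2 \to 3$ as $r \to 0$ or $r \to \infty$, so $r = 1$ is a global minimum,
where $CV^2 = 1$. Thus $CV^2 < 3$ for every $r > 0$, with values arbitrarily
close to 3. Hence, the least upper bound of the coefficient of variation is
$\sqrt{3}$.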
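
The bound in Example 16.6 can be checked numerically. This is a minimal sketch, not part of the original solution; the function name `cv_squared` and the chosen ratios are illustrative assumptions. It evaluates the squared coefficient of variation directly from the moments of the equally weighted mixture.

```python
def cv_squared(m1, m2):
    """Squared coefficient of variation of an equally weighted mixture
    of exponentials with means m1 and m2."""
    mean = 0.5 * (m1 + m2)
    second_moment = 0.5 * (2 * m1 ** 2 + 2 * m2 ** 2)
    variance = second_moment - mean ** 2
    return variance / mean ** 2

# Fix m2 = 1 and let the ratio r = m1 / m2 grow.
for r in (1, 10, 100, 1e4, 1e6):
    print(r, cv_squared(r, 1.0), 3 - 8 * r / (1 + r) ** 2)
```

The values start at 1 when r = 1 and increase toward 3 without reaching it, which is consistent with a least upper bound of the square root of 3 for the coefficient of variation.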

Practice Problems
Problem 16.1
The distribution of a loss, X, is a 2-point mixture:
(i) With probability 0.6, X1 is a Pareto distribution with parameters α = 3
and θ = 900.
(ii) With probability 0.4, X2 is a Pareto distribution with parameters α = 5
and θ = 1500.

Determine Pr(X > 1000).

Problem 16.2
The distribution of a loss, X, is a 2-point mixture:
(i) With probability 0.5, $X_1$ is a Burr distribution with parameters $\alpha = 1$,
$\gamma = 2$ and $\theta = 1000^{0.5}$.
(ii) With probability 0.5, X2 is a Pareto distribution with parameters α = 1
and θ = 1000.

Determine the median of X.

Problem 16.3
You are given:
• $X$ is a 2-point mixture of two exponential random variables $X_1$ and $X_2$
with parameters $\theta_1 = 1$ and $\theta_2 = 3$ and mixing weights $\frac{1}{2}$ and $\frac{1}{6}$ respectively.
• Y = 2X and Y is a mixture of two exponential random variables Y1 and Y2 .

Find E(Y1 ) and E(Y2 ).

Problem 16.4
The severity distribution function for losses from your renters insurance is
the following:
$$F_X(x) = 1 - 0.3\left(\frac{1000}{1000 + x}\right)^5 - 0.7\left(\frac{3500}{3500 + x}\right)^3.$$
Calculate the mean and the variance of the loss size.
Problem 16.5
Seventy-five percent of claims have a normal distribution with a mean of
3,000 and a variance of 1,000,000. The remaining 25% have a normal dis-
tribution with a mean of 4,000 and a variance of 1,000,000.

Determine the probability that a randomly selected claim exceeds 5,000.


Problem 16.6
How many parameters are there in a variable component mixture consisting
of 9 Burr distributions?
Problem 16.7
Determine the distribution, density, and hazard rate functions for the
variable mixture of two-parameter Pareto distributions.
Problem 16.8
A Weibull distribution has two parameters: θ and τ. An actuary is creating
a variable-component mixture distribution consisting of K Weibull
distributions. If the actuary chooses to use 17 Weibull distributions instead
of 12, how many more parameters will the variable-component mixture
distribution have as a result?
Problem 16.9
Let X be a 2-point mixture with underlying random variables X1 and X2 .
The distribution of X1 is a Pareto distribution with parameters α1 = 3 and
θ. The distribution of X2 is a Gamma distribution with parameters α2 = 2
and θ2 = 2000.

Given that a1 = 0.7, a2 = 0.3, and E(X) = 1340, determine the value
of θ.
Problem 16.10
Let X be a 3-point mixture of three variables X1 , X2 , X3 . You are given the
following information:
R.V. Weight Mean Standard Deviation
X1 0.2 0.10 0.15
X2 0.5 0.25 0.45
X3 0.3 0.17 0.35
Determine Var(X).
Problem 16.11 ‡
The distribution of a loss, X, is a 2-point mixture:
(i) With probability 0.8, X1 is a Pareto distribution with parameters α = 2
and θ = 100.
(ii) With probability 0.2, X2 is a Pareto distribution with parameters α = 4
and θ = 3000.

Determine Pr(X ≤ 200).


17 Data-dependent Distributions
In Section 15, we discussed parametric distributions. In Section 16, we
introduced the k−point mixture distributions that are also known as semi-
parametric distributions. In this section, we look at non-parametric
distributions.

According to [1], a data-dependent distribution is at least as complex
as the data or knowledge that produced it, and the number of “parameters”
increases as the number of data points or the amount of knowledge increases.

We consider two types of data-dependent distributions:

• The empirical distribution is obtained by assigning a probability of
$\frac{1}{n}$ to each data point in a sample with $n$ data points.

Example 17.1
Below are the losses suffered by policyholders of an insurance company:
49, 50, 50, 50, 60, 75, 80, 120, 230.
Let X be the random variable representing the losses incurred by the poli-
cyholders. Find the pmf and the cdf of X.

Solution.
The pmf is given by the table below.
x      49    50    60    75    80    120   230
p(x)   1/9   1/3   1/9   1/9   1/9   1/9   1/9

The cdf is defined by
$$F_X(x) = \frac{1}{9}\,(\text{number of elements in the sample that are} \le x).$$
Thus, for example,
$$F_X(73) = \frac{5}{9}.$$
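
The empirical pmf and cdf of Example 17.1 can be reproduced with a few lines of Python. This is an illustrative sketch rather than part of the original solution; the names `pmf` and `cdf` are assumptions, and exact fractions are used so the results can be compared directly with the table.

```python
from collections import Counter
from fractions import Fraction

# Losses from Example 17.1.
losses = [49, 50, 50, 50, 60, 75, 80, 120, 230]
n = len(losses)

# Empirical pmf: each data point carries probability 1/n,
# so a value observed c times has probability c/n.
pmf = {x: Fraction(count, n) for x, count in sorted(Counter(losses).items())}

def cdf(x):
    """Empirical cdf: fraction of sample values that are <= x."""
    return Fraction(sum(1 for v in losses if v <= x), n)

for x, p in pmf.items():
    print(x, p)
print("F(73) =", cdf(73))
```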
• A kernel smoothed distribution
Given an empirical distribution, we wish to create a continuous distribution
whose pdf will be a good estimation of the (discrete) empirical distribution.
The density function is given by
$$f_X(x) = \sum_{i=1}^{n} p_n(x)\, k_i(x)$$
where $p_n(x) = \frac{1}{n}$ and $k_i(x)$ is the kernel smoothed density function.

We illustrate these concepts next.

Example 17.2
Below are the losses suffered by policyholders of an insurance company:

49, 50, 50, 50, 60, 75, 80, 120, 230.

Develop a kernel-smoothed distribution associated with this set, such that

each point x is associated with a uniform distribution that has positive prob-
ability over the interval (x − 5, x + 5). As your answer, write the probability
density function (pdf) of the kernel smoothed distribution.

Solution.
For $i = 1, 2, \cdots, 9$, we have
$$k_i(x) = \begin{cases} \frac{1}{10}, & x_i - 5 \le x \le x_i + 5 \\ 0, & \text{otherwise.} \end{cases}$$

We refer to $k_i(x)$ as the uniform kernel with bandwidth 5. Thus,
$$f_X(x) = \sum_{i=1}^{9} \frac{1}{9}\, k_i(x).$$
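
The kernel-smoothed density of Example 17.2 can be evaluated pointwise as follows. This is a minimal sketch under the assumptions of the example (uniform kernel, bandwidth 5); the names `uniform_kernel` and `kernel_pdf` are illustrative and do not come from the text.

```python
# Losses from Example 17.2.
losses = [49, 50, 50, 50, 60, 75, 80, 120, 230]
BANDWIDTH = 5

def uniform_kernel(x, center, b=BANDWIDTH):
    """Uniform density on [center - b, center + b]."""
    return 1.0 / (2 * b) if center - b <= x <= center + b else 0.0

def kernel_pdf(x):
    """Kernel-smoothed density: average of the kernels centered at each data point."""
    return sum(uniform_kernel(x, xi) for xi in losses) / len(losses)

for x in (50, 62, 100, 230):
    print(x, kernel_pdf(x))
```

At x = 50 four kernels overlap (those centered at 49 and at the three observations of 50), so the density there is 4/90, while at x = 100 no kernel contributes and the density is 0.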

Kernel density models are discussed further in Section 56 of this book.

Practice Problems
Problem 17.1
You are given the following empirical distribution of losses suffered by
policyholders of Prevent Dental Insurance Company: