
Introduction Actuarial Science:

Non-Life Insurance

Frank van Berkum


[email protected]

NB: The text in this document should be read in combination with the lecture slides and the
computer assignment for non-life insurance. For this course you also need to read the article
by Verbelen et al. (2018) as referred to in Section 4. You do not have to read Section 4.2 of
that article.

Insurance is a means of protection against financial loss. It is possible only when risks are pooled: many individuals who all bear a similar risk each pay an insurance premium that is only a fraction of the potential loss for an individual. The losses that arise when the insured event occurs are financed by the insurance premiums. Important questions for the insurance industry are: what is the distribution of the total losses, and how should the insurance premium be determined?

Non-life insurance is the branch of insurance-related activities where the insured benefit does not depend on the survival of the policyholder. Typical examples of non-life insurance are automobile insurance, income protection insurance, fire and property insurance, and legal expenses or assistance insurance. Here we discuss pricing applied to non-life insurance products. Since in many countries automobile insurance is the largest insurance branch measured in premium income, we illustrate the principles of pricing for automobile insurance.

In this document we discuss methods for pricing non-life insurance products. We first introduce some technical concepts that allow us to investigate the distribution of the total claims in a portfolio of policyholders. Then, we introduce Generalized Linear Models (GLMs), which are often used in insurance pricing, and we illustrate these concepts using car insurance pricing.

1 Compound distributions
1.1 Properties of compound distributions
We consider a portfolio that produces a random number N of claims in a certain period. The total claim amount is then given by

$$S = X_1 + X_2 + \cdots + X_N, \qquad (1)$$

where X_i is the ith claim, and if N = 0 we have S = 0. The random variable S is said to follow a compound distribution, since it is a composition of various single risks. We assume the individual claims X_i are independent and identically distributed. Further, we assume that N and all X_i are independent, which means that the number of claims and the size of the claims are not related. This assumption is not always appropriate: in a car insurance portfolio, for example, bad weather may result in many similar claims at the same time. In practice, however, the influence of such phenomena appears to be small.
In the special case that N is Poisson distributed, S follows a compound Poisson distribution, which we discuss further below. In case N has a (negative) binomial distribution, S follows a compound (negative) binomial distribution. A compound distribution thus represents the distribution of a sum in which both the number of terms and the values of the terms themselves are uncertain.
Though this is still a very general setup, we are able to derive expressions for the first two
moments (i.e. mean and variance) of a compound distribution. This is useful, for example,
when we are interested in calculating premiums for insurance products, and if we want to
investigate uncertainty in the outcomes. For this, we only need to know the first two moments
of 1) the distribution of the random number of terms, and 2) the distribution of the random
value of the individual terms.

Moments of compound distributions. Assume S is a compound random variable as in (1),


and the terms X_i are all distributed as X. Further, we use the following notation:

$$\mu_k = E[X^k], \qquad P(x) = \Pr[X \le x]. \qquad (2)$$

We can calculate the expected value of S by using the conditional distribution of S given N.
We start by applying the law of total expectation and continue as follows:
$$\begin{aligned}
E[S] = E\big[E[S|N]\big] &= \sum_{n=0}^{\infty} E[X_1 + \cdots + X_N \mid N = n]\,\Pr[N = n] \\
&= \sum_{n=0}^{\infty} E[X_1 + \cdots + X_n \mid N = n]\,\Pr[N = n] \\
&= \sum_{n=0}^{\infty} E[X_1 + \cdots + X_n]\,\Pr[N = n] \\
&= \sum_{n=0}^{\infty} n\,\mu_1\,\Pr[N = n] = \mu_1\,E[N]. \qquad (3)
\end{aligned}$$

In the above derivation, we first use the condition N = n to substitute the outcome n for the random variable N on the left of the conditional bar. Next, we use the independence of X_i and N to dispose of the condition N = n. As a result of the independence between N and all X_i, the expected claim total equals the expected claim number times the expected claim size (given that there is a claim). This result is often used in practice, since it implies that the claim frequency and claim severity can be modeled separately.
A different approach is to model the total claim amount directly. However, using separate models for claim frequency and claim severity is likely to yield more accurate results, since more flexibility is allowed. For example, different distributions can be used for the claim frequency and for the claim severity, and different risk factors can be used to explain claim frequency than those used to explain claim severity.
In a similar way we can compute the variance of the claim total. Using the law of total variance¹ we get:

$$\begin{aligned}
\text{Var}[S] &= E\big[\text{Var}[S|N]\big] + \text{Var}\big[E[S|N]\big] \\
&= E\big[\text{Var}[X_1 + \cdots + X_N \mid N]\big] + \text{Var}\big[E[X_1 + \cdots + X_N \mid N]\big] \\
&= E\big[N\,\text{Var}[X]\big] + \text{Var}[N\mu_1] \\
&= E[N]\,\text{Var}[X] + \mu_1^2\,\text{Var}[N]. \qquad (4)
\end{aligned}$$

¹ The law of total variance states that Var[W] = Var[E[W|V]] + E[Var[W|V]].
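For example, in the special case that N ∼ Poisson(λ) we have E[N] = Var[N] = λ, and (3) and (4) reduce to a particularly simple form. Using the notation µ_k = E[X^k] from (2):

$$E[S] = \lambda\mu_1, \qquad \text{Var}[S] = \lambda\,\text{Var}[X] + \mu_1^2\,\lambda = \lambda\,E[X^2] = \lambda\mu_2.$$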

Sum of compound Poisson r.v.'s is compound Poisson. If S_1, S_2, ..., S_m are independent compound Poisson random variables with Poisson parameters λ_i and claims distributions P_i, i = 1, 2, ..., m, then S = S_1 + S_2 + · · · + S_m is compound Poisson distributed with specifications

$$\lambda = \sum_{i=1}^{m} \lambda_i \qquad \text{and} \qquad P(x) = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} P_i(x). \qquad (5)$$

We can prove this result using the moment generating functions, but this is left for a later
course.
How can we interpret a sum of compound Poisson random variables? Suppose there are policyholders i = 1, ..., m, and each policyholder has their own claim frequency parameter λ_i and claim severity distribution P_i(x). The total claim amount for policyholder i thus follows a compound Poisson distribution S_i. Then, the total claim amount for the entire portfolio follows the compound Poisson distribution S as defined above, where the claim frequency of the total portfolio is given by the sum of the claim frequencies of the individual policyholders, and the claim severity distribution for individual claims in the portfolio, P(x), is a weighted average of the individual claim severity distributions P_i(x).
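This aggregation result can also be checked by simulation. The sketch below uses hypothetical parameters (λ_1 = 0.1 with Normal(100, 10²) claims and λ_2 = 0.3 with Normal(200, 20²) claims, chosen purely for illustration): simulating the two risks separately and summing them gives the same distribution as a single compound Poisson risk with λ = 0.4 and severity distribution P(x) = (0.1/0.4) P_1(x) + (0.3/0.4) P_2(x).

set.seed(1)
iB <- 1e5                       # number of simulations
vS12 <- vS <- numeric(iB)
for (i in 1:iB) {
  # Risks 1 and 2 simulated separately: lambda_1 = 0.1, lambda_2 = 0.3
  n1 <- rpois(1, lambda=0.1)
  n2 <- rpois(1, lambda=0.3)
  vS12[i] <- sum(rnorm(n1, mean=100, sd=10)) + sum(rnorm(n2, mean=200, sd=20))
  # One aggregate compound Poisson: lambda = 0.4, severities mixed with weights 0.25/0.75
  n <- rpois(1, lambda=0.4)
  vS[i] <- sum(ifelse(runif(n) < 0.1/0.4, rnorm(n, 100, 10), rnorm(n, 200, 20)))
}
mean(vS12)  # approx. 0.1*100 + 0.3*200 = 70
mean(vS)    # approx. 70 as well; the two distributions coincide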

1.2 Risk analyses on a compound distribution


In general there are two ways to analyze properties of a distribution: through analytical (approximation) formulas or through simulation. In the previous paragraphs we showed how to derive the expected value and the variance of a compound distribution analytically. However, analytical formulas are not always available, and in such cases simulation can be very useful, for example to analyze quantiles of a loss distribution.
We will consider the following example. A portfolio contains 1000 policies which all have (in expectation) a claim once every five years. The claim frequency is assumed to follow a Poisson distribution, so the mean parameter of the Poisson distribution equals λ = 0.2. The distribution of the claim size, given that a claim has occurred, is assumed to be Normal(µ = 100, σ² = 10²). We are interested in the 1-in-100 year worst-case outcome, which we will approximate using the Normal distribution (i.e. through a Normal approximation) and using a simulation approach. We will also analyze how the impact of a reinsurance contract on the expected total loss and the variance of the total loss can be quantified using simulations. For this purpose, we will use 100,000 simulations, and we show how these results can be obtained using the statistical program R. With the following commands we define the input parameters for this example:

iB      <- 1E5  # Number of simulations
iN      <- 1000 # Size of the portfolio
dLambda <- 0.2  # Mean parameter of Poisson distribution
dMu     <- 100  # Mean of claim size distribution
dSD     <- 10   # Standard deviation of claim size distribution

Normal approximation. Often we are able to derive (at least) the mean and variance of
a distribution. Consider the case of a total loss amount which is built up from the claims
experience of many individual policies. In such a case there are many observations that result
in the total loss amount, and the distribution of the total loss amount may be close to a Normal
distribution. If that is the case, we can approximate quantiles of the actual loss distribution
using the Normal distribution. The quality of these approximations greatly depends on how
close the Normal distribution approximates the actual loss distribution.
For the example introduced above, we use equations (3) and (4), which results in µ_S = 20,000 and standard deviation σ_S = 1,421.27. The 1-in-100 year worst-case outcome corresponds to a confidence level α of 99%. We can then simply use the following formula to approximate the 1-in-100 year worst-case outcome:

$$q_S(\alpha) = \mu_S + \sigma_S \cdot \Phi^{-1}(\alpha) = 20{,}000 + 1{,}421.27 \cdot 2.33 = 23{,}306.$$

Hence, on average once every 100 years the total loss will be larger than 23,306. We can obtain the same result in R in the following way:

# Use Eq.(3) and Eq.(4) to define E[S] and Var[S]:
dMu_S <- iN * dLambda * dMu
dSD_S <- sqrt(iN*dLambda*dSD^2 + dMu^2*iN*dLambda)

# Use the Normal approximation to derive the quantile of the total loss:
qnorm(p=0.99, mean=dMu_S, sd=dSD_S)
# = dMu_S + qnorm(p=0.99) * dSD_S
The function qnorm provides the requested quantile p of a Normal distribution with specified
mean and standard deviation. If no mean and standard deviation are defined, the quantile is
returned for the standard normal distribution (i.e. with mean 0 and standard deviation 1).

Simulation approach. A different way to obtain an estimate of the expected value and the variance is through simulation. We can perform a large number of simulations to approximate the compound distribution. For i = 1, ..., B we perform the following steps:

• Firstly, we draw a random number of claims; denote this by n_i.

• Secondly, we draw n_i random claim sizes; denote these by x_{1i}, ..., x_{n_i,i}.

• Thirdly, we calculate the total loss amount as $S_i = \sum_{j=1}^{n_i} x_{ji}$.

From these simulated values we can calculate the average value $\hat\mu_S = \frac{1}{B}\sum_i S_i$ and an estimate of the variance $\hat\sigma_S^2 = \frac{1}{B-1}\sum_i (S_i - \hat\mu_S)^2$.
We can also derive an estimate of the cumulative distribution function, from which we can derive quantiles of the distribution. A disadvantage of the simulation approach is that it is unclear upfront how many simulations are needed to obtain stable results. Statistics close to the center of the distribution (such as the mean and the median) can be estimated accurately with a relatively limited number of simulations. However, if we are interested in the tails of the distribution, say the 99th percentile, then we need a large number of simulations in order to obtain stable results. An important advantage of the simulation approach is that it can easily be adjusted to allow for very specific contract features. The code below illustrates how the mean, variance and quantiles of the total loss distribution can be obtained through simulation.

# Start the simulation
set.seed(1) # This ensures reproducibility of these results
vS <- numeric(iB)

# For each simulation, randomly draw the total loss amount
vN <- rpois(n=iB, lambda=iN * dLambda)
for(i in 1:iB) {
  vS[i] <- sum( rnorm(n=vN[i], mean=dMu, sd=dSD) )
}

# Calculate statistics from the simulated total loss distribution
mean(vS)                 # 19997.35
sqrt(var(vS))            # 1421.59
quantile(vS, prob=0.99)  # 23376.70
Note that the mean and standard deviation are close to the values we derived using the analytical formulas. The 99th percentile is also close to the quantile derived using the Normal approximation, but we do not know which of the two is closest to the actual quantile of the total loss distribution; both are approximations. If we increase the number of simulations, the simulation outcome will get closer and closer to the actual value.

Now suppose the insurance company aims to reduce the risk caused by risky policyholders. The insurance company arranges the following with a reinsurance company. If a policyholder claims more than 150 in a year (i.e. the sum over all claims for a single policyholder exceeds 150), the difference between the actual claim amount and 150 is paid by the reinsurance company. For this situation we do not have explicit formulas for the expected value and variance of the compound distribution. However, we can easily adjust the simulation approach to allow for the reinsurance scheme, see below.

# For each simulation, randomly draw the total loss amount
vS <- numeric(iB)
for(i in 1:iB) {
  # Draw the number of claims for each of the iN policyholders
  vN <- rpois(n=iN, lambda=dLambda)
  # Matrix of individual claim sizes: one row per policyholder
  mX <- matrix(data=0, nrow=iN, ncol=max(vN))
  for(j in 1:iN) {
    if(vN[j]>0) mX[j, 1:vN[j]] <- rnorm(n=vN[j], mean=dMu, sd=dSD)
  }
  # Total claims per policyholder, capped at 150 by the reinsurance contract
  vX <- rowSums(mX)
  vX[vX>150] <- 150
  vS[i] <- sum( vX )
}
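The nested loops above are transparent but slow in R. A more idiomatic sketch of the same computation draws each policyholder's claim total directly and applies the cap with the vectorized function pmin:

vS <- numeric(iB)
for (i in 1:iB) {
  vN <- rpois(n=iN, lambda=dLambda)
  # Claim total per policyholder; rnorm(0, ...) is an empty vector, so sum() gives 0
  vX <- sapply(vN, function(n) sum(rnorm(n, mean=dMu, sd=dSD)))
  vS[i] <- sum(pmin(vX, 150))  # the reinsurance contract caps each policyholder at 150
}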

# Calculate statistics from the simulated total loss distribution
mean(vS)                 # 19004.47
sqrt(var(vS))            # 1299.74
quantile(vS, prob=0.99)  # 22076.02
Note that the mean and standard deviation of the total loss have decreased significantly, and
the 1-in-100 year worst-case loss has also substantially decreased.

2 Regression models
2.1 The linear model
In the course Introduction to Econometrics the (standard) linear model was introduced. Suppose there are observations y_i for i = 1, ..., n that we wish to explain using risk factors x_{1i}, ..., x_{pi}. The linear model imposes the following structure between y_i and the x_{ji}:

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_p x_{pi} + \varepsilon_i = \sum_{j=1}^{p} \beta_j x_{ji} + \varepsilon_i, \qquad \text{with } \varepsilon_i \sim (0, \sigma^2). \qquad (6)$$

Here, ε_i represents random noise that surrounds the observations y_i, meaning that observations in general are not equal to their expected value. The parameters β_j are unknown, and our aim is to find estimates for these parameters.

Define the residual e_i as the difference between the observed value and the fitted value. We denote estimates of the parameters by b_j, and the residuals are then defined as $e_i = y_i - \sum_{j=1}^{p} b_j x_{ji}$. Estimates b_j for the parameters β_j can then be obtained by minimizing the sum of squared residuals (also referred to as ordinary least squares, or OLS):

$$\hat{b} = \arg\min_{b} \sum_{i=1}^{n} e_i^2 = \arg\min_{b} \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} b_j x_{ji} \right)^2.$$

This can be interpreted as follows: with OLS the goal is to find the values of b that minimize the distance between the predicted values $\hat{b}_1 x_{1i} + \ldots + \hat{b}_p x_{pi}$ and the observed values y_i, where the distance is measured by the sum of the squared prediction errors over all observations.
OLS is a very generic estimation approach, since no assumptions are made on the distribution of the error terms ε_i. Using OLS to estimate β as in (6) is equivalent to using maximum likelihood estimation (MLE) on (6) assuming the error terms are normally distributed, i.e. assuming ε_i ∼ Normal(0, σ²) (MLE is discussed in the next paragraph). However, in several cases the use of OLS may yield unreasonable results. Examples of such unreasonable results are negative fitted values where only positive values are realistic, or fractional outcomes where only integers are possible (count data). An alternative type of regression model in which these limitations can be prevented is the Generalized Linear Model; these models are discussed below.
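The equivalence of OLS and Gaussian MLE is easy to verify numerically. Below is a minimal sketch on simulated data (the variable names and parameter values are chosen here purely for illustration): lm performs OLS, while glm with family=gaussian performs MLE under normally distributed errors, and both return identical coefficient estimates.

set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- rnorm(n)
y  <- 1 + 2*x1 - 0.5*x2 + rnorm(n, sd=0.3)  # true coefficients: 1, 2, -0.5

coef(lm(y ~ x1 + x2))                    # OLS estimates
coef(glm(y ~ x1 + x2, family=gaussian))  # Gaussian MLE: identical estimates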

2.2 Generalized Linear Models


Generalized Linear Models (GLM) generalize the linear model in various directions. The random
variables do not need to be normally distributed, the variance does not need to be independent
of the mean, and the link between the random variables and the parameters and covariates
does not need to be linear.
Generalized Linear Models have three characteristics:

1. The stochastic component of the model states that the observations are independent random variables Y_i, i = 1, ..., n, with a density in the exponential dispersion family.² The most important examples for this course are:

• Normal(µ_i, φ_i) random variables with Y_i ∈ ℝ;

• Poisson(µ_i) random variables with Y_i ∈ ℕ₀;

• gamma(α = 1/φ_i, β = 1/(φ_i µ_i)) random variables with Y_i ∈ ℝ⁺.

It can be shown that for all examples above the mean is µ_i for each i. The variance depends on µ_i and φ_i as Var[Y_i] = φ_i V(µ_i) for some function V, called the variance function.

2. The systematic component of the model attributes to every observation a linear predictor $\eta_i = \sum_j \beta_j x_{ji}$, which is linear in the parameters β_1, ..., β_p. The x_{ji} are called covariates, regressors or risk factors.

3. The link function links the expected value µ_i of Y_i to the linear predictor through η_i = g(µ_i).
Rephrased in simpler terms: 1) we assume the random variable Y_i follows a certain distribution with mean µ_i and dispersion parameter φ_i; 2) the risk factors x_{ji} are used to explain differences in the mean value of Y_i, and this is summarized in the linear predictor η_i; and 3) the linear predictor and the mean value are connected through the link function (which is not necessarily the identity link, as is the case with OLS).

Maximum likelihood estimation. Let us denote the collection of risk factors x_{1i}, ..., x_{pi} by $\vec{x}_i$, and denote the collection of parameters β_1, ..., β_p by $\vec\beta$. Define $f(y_i \mid \vec{x}_i, \vec\beta)$ as the probability of having observed y_i. This probability depends on the model that is assumed, the observed risk factors $\vec{x}_i$, and the parameters $\vec\beta$, which are yet unknown. The probability or likelihood of having observed all observations y_i for i = 1, ..., n is then given by

$$\prod_{i=1}^{n} f(y_i \mid \vec{x}_i, \vec\beta).$$

² See for more information: https://en.wikipedia.org/wiki/Exponential_family.
We can estimate the parameters $\vec\beta$ by maximizing the likelihood; this procedure is called maximum likelihood estimation (MLE). MLE can be interpreted as follows: we want to find the set of parameters $\vec\beta$ under which it is most likely to have observed the observations y_i for i = 1, ..., n, given the specified model. If we consider the likelihood as a function of the unknown parameters $\vec\beta$, conditional upon the observations y_i and the risk factors $\vec{x}_i$, we refer to it as the likelihood function:

$$L(\vec\beta \mid y_1, \vec{x}_1, \ldots, y_n, \vec{x}_n) = \prod_{i=1}^{n} f(y_i \mid \vec{x}_i, \vec\beta). \qquad (7)$$

It is not practical to optimize the likelihood function itself, because a product is difficult to optimize. Since the logarithm is a monotone function, maximizing the logarithm of the likelihood function yields the same results, and it is computationally much easier since the product becomes a sum. The parameters $\vec\beta$ are estimated by optimizing the log-likelihood function, which is achieved using numerical optimization techniques such as the iteratively reweighted least squares (IRLS) algorithm. These numerical optimization techniques are out of scope for this introductory course.

Link functions. One of the advantages of GLMs compared to standard linear models is that the link between the random variables Y_i and the parameters and covariates does not have to be linear. As a result, many model specifications are possible, such as additive models and multiplicative models. The link function g(·) links the expected value µ_i of Y_i to the linear predictor through η_i = g(µ_i), and vice versa the inverse link function g⁻¹(·) links the linear predictor η_i to the expected value µ_i of Y_i through E(Y_i) = µ_i = g⁻¹(η_i).
Each of the distributions has a natural link function associated with it, called the canonical link function. Using these link functions has some technical advantages, which are out of scope for this course. The canonical link functions for the distributions listed above are as follows.

• For the normal distribution the canonical link is the identity. Hence, $\eta_i = \sum_j \beta_j x_{ji} = g(\mu_i) = \mu_i$. This results in an additive model, since $E(Y_i) = \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$;

• For the Poisson distribution the canonical link is the log. Hence, $\eta_i = \sum_j \beta_j x_{ji} = g(\mu_i) = \log \mu_i$, and similarly $E(Y_i) = \mu_i = \exp(\eta_i)$. This results in a multiplicative model, since $E(Y_i) = \exp(\beta_1 x_{1i} + \ldots + \beta_p x_{pi}) = \exp(\beta_1 x_{1i}) \cdot \ldots \cdot \exp(\beta_p x_{pi})$;

• For the gamma(α = 1/φ, β = 1/(φµ)) distribution the canonical link is minus the inverse: $g(\mu_i) = -\frac{1}{\mu_i}$. The link function that is used more often is the inverse link function, i.e. $g(\mu_i) = \frac{1}{\mu_i}$. Note that these are closely related, since if

$$g(\mu_i) = -\frac{1}{\mu_i} = \sum_j \beta_j x_{ji},$$

then

$$\frac{1}{\mu_i} = \sum_j (-\beta_j) x_{ji} = \sum_j \beta_j^* x_{ji},$$

with $\beta_j^* = -\beta_j$.
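As a small numerical illustration of the log link (a sketch with hypothetical coefficients, not estimates from any dataset):

vBeta <- c(-2.5, 0.3)     # hypothetical intercept and risk-factor effect
vX    <- c(1, 2)          # covariate values (the 1 belongs to the intercept)
dEta  <- sum(vBeta * vX)  # linear predictor eta = sum_j beta_j * x_j
exp(dEta)                 # inverse log link: E(Y) = exp(-2.5) * exp(0.3)^2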

In the next section we provide some examples of how GLMs can be used in explaining obser-
vations using risk factors.

3 GLMs in car insurance pricing
In Section 1 we introduced the concept of compound Poisson distributions. We showed that
- under the assumption of independence between the claim number and claim size - the total
expected claims equals the expected number of claims times the expected claim size. We will
use this result to separately model claim frequencies and claim severities, and we illustrate how
to estimate the claim frequency using GLMs.
Table 1 shows an example dataset that can be used to model the pure premium. The dataset contains the risk factors gender (1 = female, 2 = male) and residential area (1 = countryside, 2 = elsewhere, 3 = big city). We refer to a combination of gender and area as a risk category. For each risk category the number of claims, the total claim size (in €1,000), the number of policies, and the exposure in years are given. The exposure in years is smaller than the number of policies, since not every policy has been in force for a complete year. In the following paragraphs we illustrate how pricing can be performed using the techniques introduced earlier.

Simple model for the claim numbers. Claim numbers are non-negative integer observations, and therefore only a limited number of distributions is appropriate for modeling claim numbers; the Poisson distribution is a natural candidate. Consider a portfolio of n policyholders, and suppose we have observed a sample Y_1 = y_1, ..., Y_n = y_n of claim numbers. In this paragraph we do not consider risk factors, i.e. all policyholders experience the same claim frequency. If we assume the random variables Y_i follow a Poisson(λ) distribution, then the log-likelihood is given by

$$\ell(\lambda; \vec{y}) = \log \prod_{i=1}^{n} f_{Y_i}(y_i; \lambda) = -n\lambda + \sum_{i=1}^{n} y_i \log \lambda - \sum_{i=1}^{n} \log y_i!. \qquad (8)$$

We can maximize this log-likelihood with respect to λ to obtain the maximum likelihood estimate (MLE) λ̂. Setting the derivative of ℓ(λ; y⃗) with respect to λ equal to zero and solving for λ gives us:

$$\frac{d}{d\lambda}\ell(\lambda; \vec{y}) = -n + \frac{1}{\lambda}\sum_{i=1}^{n} y_i = 0 \;\Rightarrow\; \hat\lambda = \sum_{i=1}^{n} y_i \Big/ n. \qquad (9)$$

Using the figures from Table 1, we get $\hat\lambda = \sum_{i=1}^{n} y_i / n = 9{,}860/114{,}944 = 0.0858$. Hence, each policy is expected to produce on average 0.0858 claims, or from every 12 policies (≈ 1/0.0858) we expect 1 claim.
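We can verify this analytical MLE numerically with R's one-dimensional optimizer (a sketch; the term Σ log y_i! is dropped because it does not depend on λ):

dSumY <- 9860    # total number of claims in Table 1
iNPol <- 114944  # total number of policies in Table 1

# Log-likelihood (8) up to a constant that does not depend on lambda
fLogLik <- function(dL) -iNPol * dL + dSumY * log(dL)
optimize(fLogLik, interval=c(1e-4, 1), maximum=TRUE)$maximum  # approx. 0.0858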
In insurance, the numbers of claims often arise from policies that were not in force during a full calendar year but only a (known) fraction of it. We denote this exposure for policy i by w_i. The number of claims for policy i then follows a Poisson(λw_i) distribution, and the log-likelihood for that model is given by

$$\ell(\lambda; \vec{y}; \vec{w}) = \log \prod_{i=1}^{n} f_{Y_i}(y_i; \lambda w_i) = -\lambda\sum_{i=1}^{n} w_i + \sum_{i=1}^{n} y_i \log(\lambda w_i) - \sum_{i=1}^{n} \log y_i!. \qquad (10)$$

Table 1: Stylized dataset from a car insurance company.

Record  Gender  Area  Number     Total claim size  Number       Exposure
                      of claims  (in €1,000)       of policies  in years
1       1       1     383        1,582             5,383        5,000
2       1       2     919        2,234             11,471       10,000
3       1       3     2,259      2,258             21,825       20,000
4       2       1     731        2,895             11,890       10,000
5       2       2     1,538      3,790             21,748       20,000
6       2       3     4,030      4,057             42,627       40,000
Totals:               9,860      16,816            114,944      105,000

It can be shown that in this case the MLE for λ is given by $\hat\lambda = \sum_{i=1}^{n} y_i \big/ \sum_{i=1}^{n} w_i$. Note that in this simple setup we only need the total number of claims $\sum_{i=1}^{n} y_i$ and the total exposure $\sum_{i=1}^{n} w_i$; these are therefore sufficient statistics.

Again using the figures from Table 1, we get λ̂ = 9,860/105,000 = 0.0939. Hence, each policy is expected to produce on average 0.0939 claims per year, or from every 11 policies in force for a full year we expect 1 claim. If we did not appropriately take into account that not every policy is in force for the entire year, we would thus significantly underestimate the annual claim frequency.
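The same estimate is reproduced by an intercept-only Poisson GLM with the log-exposure as an offset, previewing the GLM formulation below (a sketch on the aggregated records of Table 1; the variable names are chosen here):

vClaims <- c(383, 919, 2259, 731, 1538, 4030)          # claims per risk category
vExpo   <- c(5000, 10000, 20000, 10000, 20000, 40000)  # exposure in years

fit <- glm(vClaims ~ 1 + offset(log(vExpo)), family=poisson(link=log))
exp(coef(fit))  # approx. 0.0939 = 9,860 / 105,000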

GLM for claim frequency. In line with the previous paragraph, we assume a Poisson distribution for the random number of claims Y_i to estimate the annual claim frequency of policyholder i:

$$Y_i \sim \text{Poisson}(\lambda_i). \qquad (11)$$

However, we assume the expected claim frequency is proportional to some exposure measure w_i (e.g. the period during which coverage is provided). Further, there are p risk factors available that may help explain the variability in the observed y_i. These risk factors are represented by the variables x_{ji} for j = 1, ..., p.

A typical model for the claim frequency is a Poisson distribution with a log link:

$$Y_i \sim \text{Poisson}(\lambda_i = \exp(\eta_i)), \qquad (12)$$
$$\text{with } \eta_i = \log w_i + \sum_{j=1}^{p} \beta_j x_{ji}. \qquad (13)$$

We refer to the two equations (12) and (13) as the statistical model, which fully describes the GLM that we want to estimate. In this model, the parameters β_j are estimated by optimizing the log-likelihood. Because multiple risk factors are included in this model, there does not exist an analytical solution for the parameter estimates β̂_j. Instead, numerical optimization techniques have to be used, as mentioned before. Note that the logarithm of the exposure measure w_i is included in the linear predictor to ensure the claim frequency is proportional to w_i, and that it is not multiplied by a coefficient (or, equivalently, it has a fixed coefficient equal to 1). Terms that are included in the linear predictor with a fixed coefficient are referred to as the offset. (Verify for yourself what the expression for λ_i is.)
In this week's computer assignment we investigate a dataset dtData in R that looks as follows:

> head(dtData)
Age Gender Region Mileage Expo Color Speed Urban Claims
1: 44 M 2 62746 0.8289 gray 0.1884 0.3374 0
2: 60 F 2 66932 1.0000 blue 0.0665 0.2167 0
3: 50 M 1 52070 0.3770 gray 0.1346 0.1604 0
4: 30 M 2 150677 0.9842 red 0.1183 0.8601 0
5: 29 F 2 73835 1.0000 white 0.0610 0.5520 0
6: 25 F 2 61944 0.8222 white 0.1100 0.5797 1
These variables are described in detail in the explanation of the computer assignment, but here we consider Age (an integer indicating the age of the policyholder) and Region (a factor with three levels which represents where the policyholder lives: 1 = rural area, 2 = elsewhere, 3 = big city).

Suppose we want to estimate a model that includes an intercept and a linear effect in Age_i, and we estimate an effect for the three different levels of Region_i. Further, we believe that the claim frequency is proportional to the period during which insurance cover was provided, Expo_i, and therefore we include the logarithm of this variable in the linear predictor. The statistical model is then represented by:

$$\text{Claims}_i \sim \text{Poisson}\left(\text{Expo}_i \cdot \exp\left[\beta_1 + \beta_2 \cdot \text{Age}_i + \beta_3 \cdot I_{[\text{Region}_i = 2]} + \beta_4 \cdot I_{[\text{Region}_i = 3]}\right]\right),$$

where $I_{[\text{Region}_i = j]}$ is 1 if Region_i = j and 0 otherwise. This way, the linear predictor includes the intercept β_1 and the linear age effect β_2 · Age_i for all policyholders, but the other two terms are only included for policyholders that belong to the corresponding region. In R, this model is estimated using

glm(Claims ~ 1 + Age + Region + offset(log(Expo)), family=poisson(link=log), data=dtData)
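Once fitted, the model can be inspected as follows (a sketch; because of the log link, exponentiated coefficients act as multiplicative effects on the claim frequency):

fit <- glm(Claims ~ 1 + Age + Region + offset(log(Expo)),
           family=poisson(link=log), data=dtData)
summary(fit)    # estimates of beta_j with standard errors
exp(coef(fit))  # multiplicative effects on the annual claim frequency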

4 Usage-based pricing
Premiums can be adjusted after inception of the contract, for example using usage-based risk factors. We consider two examples in this section: the use of telematics data in automobile insurance, and the use of wearables (health trackers) in health insurance.

Telematics data in automobile insurance. This example is based on the article by Verbelen et al. (2018).³ One of the authors, Katrien Antonio, is also professor in Actuarial Science at the UvA. The article is available on Canvas, and it can alternatively be downloaded from https://rss.onlinelibrary.wiley.com/doi/10.1111/rssc.12283.

³ Verbelen, R., Antonio, K. and Claeskens, G. (2018). Unravelling the predictive power of telematics data in car insurance pricing. Journal of the Royal Statistical Society: Series C, 67(5).
In Section 3 we illustrated how a pure premium can be calculated using risk factors that are deduced from the policy. The most relevant risk factors that are reported at policy inception are: age, license age, postal code, engine power or weight of the car, and use of the vehicle (business or private). However, these factors say little about the driving habits of the policyholder.
With telematics technology, additional information about the driving habits of the policyholder can be obtained. If a policyholder purchases an insurance policy with a telematics option, a 'black box' is installed in the car, and information on every ride is recorded and sent to the insurer. Examples of the information that is collected include the region where and the time when the ride took place, how fast the driver accelerates, and how fast the driver takes turns. This provides the insurer with information on the aggressiveness of the driver, but also on the characteristics of where and when the policyholder drives.
In traditional car insurance the exposure measure used is the period during which cover is provided, i.e. the claim frequency is considered proportional to the coverage period. At inception of a car insurance product, the policyholder often has to indicate the expected mileage, and insurance companies use this as a risk factor that influences the premium that has to be paid. With telematics data the actual distance driven becomes available, and it is worthwhile investigating whether the claim frequency is proportional to the distance driven. This can be investigated by estimating the following model:

$$\text{Claims}_i \sim \text{Poisson}\left(\text{Mileage}_i \cdot \exp\left[\sum_j \beta_j x_{ji}\right]\right).$$

Using a variable as an exposure measure has a different interpretation than using it as a risk factor. For example, consider two drivers with equal risk factors, but driver A drives twice the distance of driver B during the same coverage period. When we use mileage as an exposure measure, we expect that driver A claims on average twice as often as driver B, and this relative difference is fixed. However, if we use the coverage period as exposure and mileage as a risk factor, we do not impose upfront how the claim frequency varies with mileage. It might be that drivers who drive more are in fact more experienced drivers, which lowers their claim frequency. Indeed, in the telematics article referred to above the authors find that it is optimal to use the coverage period as the exposure measure and mileage as an explanatory variable.
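The two specifications can be compared directly in R. The sketch below (using the dtData columns shown earlier) is a simplified stand-in for the model comparison in Verbelen et al. (2018), not their actual code:

# Mileage as exposure: claim frequency forced proportional to distance driven
mExpo <- glm(Claims ~ 1 + Age + offset(log(Mileage)),
             family=poisson(link=log), data=dtData)

# Coverage period as exposure, mileage as a risk factor with a free coefficient
mRisk <- glm(Claims ~ 1 + Age + log(Mileage) + offset(log(Expo)),
             family=poisson(link=log), data=dtData)

AIC(mExpo, mRisk)  # the lower AIC indicates the better-fitting specification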

Wearables in health insurance. In 1997 the first insurance wellness program, Discovery Vitality, was introduced in South Africa. Since 2012 the number of insurance wellness programs has increased significantly, although such programs are not yet available in the Netherlands.

In an insurance wellness program the policyholder agrees to wear a health tracker ('wearable'). The wearable tracks information that affects or is related to the health status of the policyholder. For example, from the number of steps taken in a day the insurer can determine whether the policyholder was active that day or sat behind a desk all day.

Questions
Q1 A compound distribution is represented by S = X_1 + X_2 + · · · + X_N, where both the X_i and N are random variables. Explain why S = 0 if N = 0.

Q2 Consider a compound distribution S = X_1 + X_2 + · · · + X_N as in (1). Derive the expected value of S using the law of total expectation and the variance of S using the law of total variance.
Q3 Explain in your own words why the theory on compound distributions is useful in analyzing
historical claims information in order to predict total loss on a policy.
Q4 We discussed that the sum of compound Poisson r.v.’s is also compound Poisson. Explain in
your own words why this result is useful for analyzing the total loss in a portfolio.
Q5 Consider a compound distribution S = X_1 + X_2 + · · · + X_N as in (1). The claim size X_i follows a Normal distribution with mean 100 and standard deviation 10.

a. Derive the expected value and variance of S under the assumption that N ∼ Poisson(λ = 1000 · 0.1), and derive the 90th percentile of the total loss distribution S using a Normal approximation. Recall that for a Poisson(λ) distribution the mean and variance both equal λ.

b. Do the same as in the previous part, but now assuming N ∼ Binomial(n = 1000, p = 0.1). Recall that for a Binomial(n, p) distribution the mean equals n · p and the variance equals n · p · (1 − p).

Explain the difference in the results.

Explanatory note: the distribution for N in a. corresponds to a portfolio of size 1000 where the claim frequency distribution of each policyholder equals Poisson(λ = 0.1), whereas in b. it corresponds to a portfolio of size 1000 where all policyholders have probability 0.1 of filing a single claim and probability 0.9 of filing no claim at all.
Q6 Describe in your own words the most important similarity and the most important difference between using ordinary least squares and generalized linear models for regression purposes.
Q7 Consider a random variable Y_i with observations y_i for i = 1, ..., n and corresponding weights w_i. If we assume that Y_i follows a Poisson(λ · w_i) distribution, write down the likelihood function as a function of λ and find the MLE for λ.
Q8 Suppose we have observations yi and risk factors xji for i = 1, ... , n and j = 1, ... , p (i.e.
there are p risk factors or explanatory variables). We believe the observations originate from
a random variable Yi that follows a Poisson distribution, and we want to find the relationship
between the observations yi and the risk factors xji using a GLM.
a. What is the linear predictor for this problem?

b. Which link function would you use to link the linear predictor to the mean of the Poisson
distribution, and why?

c. Using the previous two answers, define the complete statistical model.

Q9 Variance functions. An advantage of GLMs compared to standard linear models is that the variance may depend on the mean, but it does not have to. Some common distributions within the GLM framework are listed below:

• Normal(µ_i, φ_i) random variables with Y_i ∈ ℝ;

• Poisson(µ_i) random variables with Y_i ∈ ℕ₀;

• gamma(α = 1/φ_i, β = 1/(φ_i µ_i)) random variables with Y_i ∈ ℝ⁺.

Here, the parameter µ_i is the mean parameter and φ_i is the dispersion parameter. Within the GLM framework, we can specify the variance as Var[Y_i] = φ_i V(µ_i) for some function V(·), which is called the variance function. For each of the above distributions, specify the variance as the product of the dispersion parameter and the variance function. What can you conclude from the different variance functions?
Q10 Suppose we are an insurance company and we want to analyze the probability p_i that a policyholder lapses their policy (i.e. that the policy is terminated). For all policyholders i = 1, ..., n we have observed whether they have lapsed their policy or not. We denote these observations by y_i; these observations are 1 if the policy is terminated and 0 if it is not. We want to use the framework of GLMs to analyze these lapse probabilities. For now, suppose there are no risk factors available.

a. Which distribution would you use to model the lapse probabilities, taking into account the structure of the observations?

b. Since there are no risk factors available at this stage, the lapse probability is the same for all policyholders, and we denote it by p. Write down the complete statistical model, write down the likelihood function as a function of p, and find the MLE for the lapse probability p.

Now suppose the lapse probability of policyholder i depends on the age of the policyholder (denoted by x_{1i}) and the policy duration (denoted by x_{2i}). We want to use these two risk factors to explain differences in lapse probabilities between policyholders. NB: the policy duration equals the number of years that a policy has been in force, and in regression analyses it is common practice to always include a constant.

c. What is the linear predictor η_i for this regression problem?

d. Suppose we use the identity link function. Can you think of a problem that may arise with this choice of link function?

e. The typical link function used in Bernoulli or Binomial regressions is the logit link: η = logit(p) = ln(p/(1 − p)). Can you think of the advantage of using this link function? Hint: try to find the inverse link function, i.e. p = f(η).

Q11 In mortality modeling, observations originate from individuals. However, analyses are almost always performed at an aggregated level (i.e. aggregated over individuals with similar characteristics). In this exercise we investigate why this can be done. We will also again investigate the importance of adequately taking exposure into account.

Suppose we have a portfolio of individuals i = 1, ..., n with n = 10,000. Each of these individuals was alive and aged exactly x at the beginning of the year, and at the end of the year 160 of them died. For each individual we define an indicator variable δ_i that equals 1 if they died and 0 otherwise.

a. We want to derive the mortality probability q_x, which is the probability that someone aged exactly x dies within the coming year. Which probability distribution is most suitable for this type of observations?

b. Write down the complete log-likelihood function as a function of q_x and dependent on δ_i, and find the one-year mortality probability q_x by optimizing the log-likelihood.

We now introduce some more elaborate definitions.

• The h-period mortality probability $_hq_x$ represents the probability that someone aged x dies somewhere in the coming period h.

• The force of mortality µ_x is an instantaneous rate of mortality, which is defined as

$$\mu_x = \lim_{h \downarrow 0} \frac{{}_hq_x}{h}.$$

If we assume the force of mortality to remain constant on [x, x + 1), the following relationship holds between the mortality probability and the force of mortality: $_hq_x = 1 - \exp[-\int_0^h \mu_{x+s}\,ds] = 1 - \exp[-h \cdot \mu_x]$.⁴ The h-period survival probability is given by $_hp_x = 1 - {}_hq_x = \exp[-h \cdot \mu_x]$.

c. Assume everyone dies at the end of the year. There are then two types of individuals within the portfolio: those that have survived the entire year (with probability p_x), and those that survived until the end of the year but died at that moment (with probability p_x · µ_x).⁵ Write down the complete log-likelihood function as a function of µ_x and dependent on δ_i, and find the estimate for µ_x.

d. Now assume that people die uniformly during the year, which means that on average each person that died lived for half a year. The period that individual i lived during the year is defined as τ_i ∈ (0, 1]. Write down the log-likelihood function as a function of µ_x (dependent on δ_i and τ_i), and find the estimate for µ_x.

e. Define $e = \sum_i \tau_i$ and $d = \sum_i \delta_i$. What does the (log-)likelihood from d. look like if you simplify it further and make use of e and d?

⁴ This result follows from survival theory, which is part of the Life Insurance courses.

⁵ This is why the force of mortality is called an instantaneous rate of mortality: it applies to exactly the instantaneous moment for which it is used. In that sense, the way it is used in this sub-question is not in line with the actual definition, since people should be allowed to die throughout the entire year. That is investigated in the next sub-question.

The following questions are related to the article by Verbelen et al. (2018).
Q12 In traditional car insurance regression analyses gender is often found to be an important risk
factor. However, due to the European gender ruling, gender may not be used in car insurance
pricing to prevent discrimination based on gender. Explain how the use of telematics data may
reduce the importance of the covariate gender in analyzing historical car claims data.
Q13 The premium of car insurance is often proportional to the policy duration. This way, the
premium depends on a form of ‘use’ by the policyholder, namely the number of time units the
policy is in force.

a. What alternatives did the authors investigate for quantifying the dependency of the premium on types of 'use' by the policyholder?
Hint: there are four alternatives (including current practice).

b. What type of model is found to be optimal in a statistical sense (AIC and cross-validation scores)? Which of the four alternatives do you believe are easiest to implement in practice? One or multiple alternatives may be considered.

