Non-Life Insurance
NB: The text in this document should be read in combination with the lecture slides and the
computer assignment for non-life insurance. For this course you also need to read the article
by Verbelen et al. (2018) as referred to in Section 4. You do not have to read Section 4.2 of
that article.
Insurance is a means of protection against financial loss. It is possible only when risks are
pooled: many individuals that all bear a similar risk pay an insurance premium which is only a
fraction of the potential loss for an individual. The losses that result when the insured event
occurs are financed by the insurance premiums. Important questions for the insurance industry
are: what is the distribution of the total losses, and how should the insurance premium be
determined?
Non-life insurance is the branch of insurance related activities where the insured benefit does
not depend on the survival of the policyholder. Typical examples of non-life insurance are
automobile insurance, income protection insurance, fire and property insurance, and legal
expenses or assistance insurance. Here we discuss pricing applied to non-life insurance products.
Since in many countries automobile insurance is the largest insurance branch measured in
premium income, we illustrate the principle of pricing for automobile insurance.
In this document we discuss methods for pricing non-life insurance products. We first introduce
some technical concepts that allow us to investigate the distribution of the total claims in a
portfolio of policyholders. Then, we introduce Generalized Linear Models (GLMs), which are
often used in insurance pricing, and we illustrate these concepts using car insurance pricing.
1 Compound distributions
1.1 Properties of compound distributions
We consider a portfolio that produces a random number N of claims in a certain period. The
total claim amount is then given by
S = X1 + X2 + · · · + XN , (1)
where X_i is the ith claim, and if N = 0 we have S = 0. The random variable S is said to follow
a compound distribution, since it is a composition of various single risks. We assume the
individual claims X_i are independent and identically distributed. Further, we assume that
N and all X_i are independent, which means that the number of claims and the sizes of
the claims are not related. The assumption that N and all X_i are independent is not always
appropriate. In a car insurance portfolio, for example, bad weather may result in many similar
claim amounts. However, in practice the influence of such phenomena appears to be small.
In the special case that N is Poisson distributed, S follows a compound Poisson distribution,
which we will further discuss below. In case N has a (negative) binomial distribution, then S
follows a compound (negative) binomial distribution. A compound distribution thus represents
the distribution of a sum in which both the number of terms is uncertain, and the values of
the terms themselves are uncertain.
Though this is still a very general setup, we are able to derive expressions for the first two
moments (i.e. mean and variance) of a compound distribution. This is useful, for example,
when we are interested in calculating premiums for insurance products, and if we want to
investigate uncertainty in the outcomes. For this, we only need to know the first two moments
of 1) the distribution of the random number of terms, and 2) the distribution of the random
value of the individual terms.
We can calculate the expected value of S by using the conditional distribution of S given N.
We start by applying the law of total expectation and continue as follows:
E[S] = E[E[S|N]] = ∑_{n=0}^{∞} E[X_1 + · · · + X_N | N = n] Pr[N = n]
     = ∑_{n=0}^{∞} E[X_1 + · · · + X_n | N = n] Pr[N = n]
     = ∑_{n=0}^{∞} E[X_1 + · · · + X_n] Pr[N = n]
     = ∑_{n=0}^{∞} n µ_1 Pr[N = n] = µ_1 E[N],   (3)

where µ_1 = E[X_1] denotes the expected claim size.
In the above derivation, we first use the condition N = n to substitute the outcome n for the
random variable N on the left of the conditional bar. Next, we use the independence of the X_i
and N to dispose of the condition N = n. As a result of the independence between N and all
X_i, the expected total claim amount equals the expected number of claims times the expected
claim size (given that there is a claim). This result is often used in practice, since it suggests
that the claim frequency and claim severity can be modeled separately.
A different approach is to model the total claim amount directly. However, using different
models for claim frequency and claim severity is likely to result in more accurate results,
since more flexibility is allowed. For example, different distributions can be used for the claim
frequency and for the claim severity, and different risk factors can be used to explain claim
frequency than those used to explain claim severity.
In a similar way we can compute the variance of the total claim amount. Using the law of total
variance¹ we get:

Var[S] = Var[E[S|N]] + E[Var[S|N]] = Var[µ_1 N] + E[N Var[X_1]]
       = µ_1² Var[N] + E[N] Var[X_1].   (4)
An important property is that the sum of independent compound Poisson random variables is
again a compound Poisson random variable. We can prove this result using moment generating
functions, but this is left for a later course.
How can we interpret a sum of compound Poisson random variables? Suppose there are
policyholders i = 1, ..., m, and each policyholder has its own claim frequency parameter λ_i
and claim severity distribution P_i(x), so that the total claim amount S_i for policyholder i
follows a compound Poisson distribution. Then the total claim amount for the entire portfolio,
S = S_1 + · · · + S_m, follows a compound Poisson distribution as defined above, where the
claim frequency of the total portfolio is given by the sum λ = λ_1 + · · · + λ_m of the claim
frequencies of the individual policyholders, and the claim severity distribution for individual
claims in the portfolio, P(x) = ∑_{i=1}^{m} (λ_i/λ) P_i(x), is a weighted average of the
individual claim severity distributions P_i(x).
¹ The law of total variance states that Var[W] = Var[E[W|V]] + E[Var[W|V]].
As a running example, suppose the number of claims N follows a Poisson distribution with
λ = 200 and the individual claim sizes follow a Normal distribution with mean 100 and standard
deviation 10. We use 100,000 simulations, and we show how these results can be obtained using
the statistical program R. With the following commands (a sketch; the variable names are
assumptions consistent with the simulation code later in this section) we define the input
parameters for this example:
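nSim      <- 100000  # number of simulations
nPolicies <- 1000    # assumed number of policies in the portfolio
dLambda   <- 0.2     # assumed claim frequency per policy (200 expected claims in total)
dMu       <- 100     # mean of the individual claim sizes
dSD       <- 10      # standard deviation of the individual claim sizes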
Normal approximation. Often we are able to derive (at least) the mean and variance of
a distribution. Consider the case of a total loss amount which is built up from the claims
experience of many individual policies. In such a case there are many observations that result
in the total loss amount, and the distribution of the total loss amount may be close to a Normal
distribution. If that is the case, we can approximate quantiles of the actual loss distribution
using the Normal distribution. The quality of these approximations greatly depends on how
close the Normal distribution approximates the actual loss distribution.
For the example introduced above, we use equations (3) and (4), which results in µ_S = 20,000
and standard deviation σ_S = 1,421.27. The 1-in-100 year worst-case outcome corresponds to
a confidence level α of 99%. We can then simply use the following formula to approximate the
1-in-100 year worst-case outcome:

q_S(α) = µ_S + σ_S · Φ⁻¹(α) = 20,000 + 1,421.27 · 2.326 = 23,306.
Hence, on average once every 100 years the total loss will be larger than 23,306. We can
obtain this result in R in the following way:
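# Normal approximation of the 1-in-100 year worst-case outcome:
qnorm(0.99, mean = 20000, sd = 1421.27)  # approximately 23,306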
Simulation approach. A different way to obtain an estimate of the expected value and the
variance is through simulation. We can perform a large number of simulations to approximate
the compound distribution. For i = 1, ..., B we perform the following steps:
• Firstly, we draw a random number of claims n_i from the claim number distribution.
• Secondly, we draw n_i random claim sizes, denoted by x_{1i}, ..., x_{n_i,i}.
• Thirdly, we calculate the total loss amount as S_i = ∑_{j=1}^{n_i} x_{ji}.
From these simulated values we can calculate the average value µ̂_S = (1/B) ∑_i S_i and an
estimate of the variance σ̂²_S = 1/(B−1) ∑_i (S_i − µ̂_S)².
We can also derive an estimate of the cumulative distribution function, from which we can
derive quantiles of the distribution. A disadvantage of the simulation approach is that it is
unclear upfront how many simulations are needed to obtain stable results. Statistics close to
the center of the distribution (such as the mean and the median) can be estimated accurately
with a relatively limited number of simulations. However, if we are interested in the tails of
the distribution, say the 99th percentile, then we need a large number of simulations in order
to obtain stable results. An important advantage of the simulation approach is that it can
easily be adapted to incorporate very specific portfolio features. The code below illustrates how
the mean, variance and quantiles of the total loss distribution can be obtained through
simulation.
vS <- numeric(nSim)
for (i in 1:nSim) {
  vN <- rpois(n = nPolicies, lambda = dLambda)  # number of claims per policy
  mX <- matrix(0, nrow = nPolicies, ncol = max(vN, 1))
  for (j in 1:nPolicies) {
    if (vN[j] > 0) mX[j, 1:vN[j]] <- rnorm(n = vN[j], mean = dMu, sd = dSD)
  }
  vX <- rowSums(mX)    # total claim amount per policy
  vX[vX > 150] <- 150  # example of a specific adjustment: cap each policy's loss at 150
  vS[i] <- sum(vX)     # total loss in simulation i
}
mean(vS); var(vS); quantile(vS, 0.99)
2 Regression models
2.1 The linear model
In the course Introduction to Econometrics the (standard) linear model was introduced. Sup-
pose there are observations y_i for i = 1, ..., n that we wish to explain using risk factors
x_{1i}, ..., x_{pi}. The linear model imposes the following structure between y_i and the x_{ji}:

y_i = β_1 x_{1i} + · · · + β_p x_{pi} + ε_i.   (5)

Here, ε_i represents random noise that surrounds the observations y_i, meaning that observations
in general are not equal to their expected value. The parameters β_j are unknown, and our aim
is to find estimates for these parameters.
Define the residual e_i as the difference between the observed value and the fitted value.
We denote estimates of the parameters by b_j, and the residuals are then defined as
e_i = y_i − ∑_{j=1}^{p} b_j x_{ji}. Estimates b_j of the parameters β_j can then be obtained by
minimizing the sum of squared residuals (also referred to as ordinary least squares or OLS):
b̂ = arg min_b ∑_{i=1}^{n} e_i² = arg min_b ∑_{i=1}^{n} ( y_i − ∑_{j=1}^{p} b_j x_{ji} )².   (6)
This can be interpreted as follows. With OLS the goal is to find the values of b that minimize
the distance between the predicted values b̂_1 x_{1i} + ... + b̂_p x_{pi} and the observed values
y_i, where the distance is measured by the sum of the squared prediction errors.
OLS is a very generic estimation approach, since no assumptions are made on the distribu-
tion of the error terms ε_i. Using OLS to estimate β as in (6) is equivalent to using maximum
likelihood estimation (MLE) on (5) assuming the error terms are normally distributed, i.e. as-
suming ε_i ∼ Normal(0, σ²) (MLE will be discussed in the next paragraph). However, in several
cases the use of OLS may yield unreasonable results. Examples of such unreasonable results are
negative fitted values where only positive values are realistic, or fractional outcomes where
only integers are possible (count data). An alternative class of regression models in which
these limitations can be prevented is that of Generalized Linear Models; these are discussed
below.
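To illustrate the equivalence of OLS and MLE under normal errors mentioned above, the
following sketch (with simulated data) fits the same model in both ways; the coefficient
estimates coincide:

# OLS and Gaussian MLE yield identical coefficient estimates.
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
coef(lm(y ~ x))                       # ordinary least squares
coef(glm(y ~ x, family = gaussian())) # MLE assuming normal errors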
2.2 Generalized Linear Models

A Generalized Linear Model (GLM) consists of three components.

1. The random component: the observations Y_i are independent, and each Y_i follows a
   distribution from the exponential family² with mean µ_i and dispersion parameter φ_i.
   Examples of such distributions are the Normal(µ_i, φ_i) distribution, the Poisson(µ_i)
   distribution, and the gamma(α = 1/φ_i, β = 1/(φ_i µ_i)) distribution. It can be shown
   that for all examples above the mean is µ_i for each i. The variance depends on µ_i and
   φ_i as Var[Y_i] = φ_i V(µ_i) for some function V called the variance function.
2. The systematic component of the model attributes to every observation a linear pre-
   dictor η_i = ∑_j β_j x_{ji}, linear in the parameters β_1, ..., β_p. The x_{ji} are called
   covariates, regressors or risk factors.

3. The link function g links the expected value µ_i of Y_i to the linear predictor via
   η_i = g(µ_i).
Rephrased in more simple terms: 1) we assume the random variable Y_i to follow a certain
distribution with mean µ_i and dispersion parameter φ_i, 2) the risk factors x_{ji} are used to
explain differences in the mean value of Y_i, and this is summarized in the linear predictor η_i,
and 3) the linear predictor and the mean value are connected through the link function (which
is not necessarily the identity link, as in the case of OLS).
Maximum likelihood estimation. Let us denote the collection of risk factors x_{1i}, ..., x_{pi}
by the vector x_i, and denote the collection of parameters β_1, ..., β_p by the vector β. Define
f(y_i | x_i, β) as the probability of having observed y_i. This probability depends on the model
that is assumed, the observed risk factors x_i, and the parameters β, which are yet unknown.
The probability or likelihood of having observed all observations y_i for i = 1, ..., n is then
given by

∏_{i=1}^{n} f(y_i | x_i, β).
² See for more information: https://en.wikipedia.org/wiki/Exponential_family.
We can estimate the parameters β by maximizing the likelihood; this procedure is called
maximum likelihood estimation (MLE). MLE can be interpreted as follows: we want to find
the set of parameters β under which it is most likely to have observed the observations y_i
for i = 1, ..., n, given the specified model. If we consider the likelihood as a function of the
unknown parameters β conditional upon the observations y_i and the risk factors x_i, we refer
to it as the likelihood function:

L(β | y_1, x_1, ..., y_n, x_n) = ∏_{i=1}^{n} f(y_i | x_i, β).   (7)
It is not practical to optimize the likelihood function itself, because a product is difficult to
optimize. Since the logarithm is a monotone function, maximizing the log of the likelihood
function yields the same results, and it is computationally much easier since the product
becomes a sum. The parameters β are estimated by optimizing the loglikelihood function,
which is achieved by using numerical optimization techniques such as the iteratively reweighted
least squares (IRLS) algorithm. These numerical optimization techniques are beyond the scope
of this introductory course.
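Nevertheless, the idea can be illustrated with a small sketch: below, a Poisson loglikelihood
with a log link is maximized numerically using R's general-purpose optimizer. The data and
starting values are made up for illustration; in practice glm() does this via IRLS.

# Hypothetical data: claim counts y and a single risk factor x.
y <- c(0, 1, 0, 2, 1, 0)
x <- c(0.5, 1.2, 0.3, 1.8, 1.0, 0.2)
# Negative loglikelihood of a Poisson model with log link: lambda_i = exp(b1 + b2 * x_i).
negloglik <- function(b) {
  lambda <- exp(b[1] + b[2] * x)
  -sum(dpois(y, lambda, log = TRUE))
}
# Minimizing the negative loglikelihood maximizes the likelihood.
optim(par = c(0, 0), fn = negloglik)$par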
Link functions. One of the advantages of GLMs compared to standard linear models is that
the link between the random variables Y_i and the parameters and covariates does not have
to be linear. As a result, many model specifications are possible, such as additive models and
multiplicative models. The link function g(·) links the expected value µ_i of Y_i to the linear
predictor through η_i = g(µ_i), and vice versa the inverse link function g⁻¹(·) links the linear
predictor η_i to the expected value µ_i of Y_i through E(Y_i) = µ_i = g⁻¹(η_i).
Each of the distributions has a natural link function associated with it, called the canonical
link function. Using these link functions has some technical advantages, which are beyond the
scope of this course. The canonical link functions for the distributions listed above are as
follows.

• For the Normal distribution the canonical link is the identity. Hence, η_i = ∑_j β_j x_{ji} =
  g(µ_i) = µ_i. This results in an additive model, since E(Y_i) = β_1 x_{1i} + ... + β_p x_{pi};

• For the Poisson distribution the canonical link is the log. Hence, η_i = ∑_j β_j x_{ji} =
  g(µ_i) = log µ_i, and similarly E(Y_i) = µ_i = exp(η_i). This results in a multiplicative
  model, since E(Y_i) = exp(β_1 x_{1i} + ... + β_p x_{pi}) = exp(β_1 x_{1i}) · ... · exp(β_p x_{pi});

• For the gamma(α = 1/φ, β = 1/(φµ)) distribution the canonical link is minus the inverse,
  g(µ_i) = −1/µ_i. The link function that is used more often is the inverse link function,
  i.e. g(µ_i) = 1/µ_i. Note that these are closely related, since if

  g(µ_i) = −1/µ_i = ∑_j β_j x_{ji}

  then

  1/µ_i = ∑_j (−β_j) x_{ji} = ∑_j β*_j x_{ji},

  with β*_j = −β_j.
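In R, these links correspond to the default link of each family object, which can be inspected
directly:

gaussian()$link  # "identity"
poisson()$link   # "log"
Gamma()$link     # "inverse" (R uses the inverse link rather than minus the inverse)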
In the next section we provide some examples of how GLMs can be used in explaining obser-
vations using risk factors.
3 GLMs in car insurance pricing
In Section 1 we introduced the concept of compound Poisson distributions. We showed that
- under the assumption of independence between the claim number and claim size - the total
expected claims equals the expected number of claims times the expected claim size. We will
use this result to separately model claim frequencies and claim severities, and we illustrate how
to estimate the claim frequency using GLMs.
Table 1 shows an example dataset that can be used to model the pure premium. The
dataset contains the risk factors gender (1 = female, 2 = male), and residential area (1 =
countryside, 2 = elsewhere, 3 = big city). We refer to a combination of gender and area
as a risk category. For each risk category the number of claims, total claim size (in 1,000),
number of policies, and the exposure in years are given. The exposure in years is smaller
than the number of policies, since not every policy has been in force for a complete year.
In the following paragraphs we illustrate how pricing can be performed using the techniques
introduced earlier.
Simple model for the claim numbers. Claim numbers are non-negative integer observations,
and therefore only a limited number of distributions is appropriate for modeling claim numbers;
the Poisson distribution is a natural candidate. Consider a portfolio of n policyholders, and
suppose we have observed a sample Y1 = y1 , ... , Yn = yn of observed claim numbers. In this
paragraph we do not consider risk factors, i.e. all policyholders experience the same claim
frequency. If we assume the random variables Y_i follow a Poisson(λ) distribution, then the
loglikelihood is given by

ℓ(λ; y) = log ∏_{i=1}^{n} f_{Y_i}(y_i; λ) = −nλ + ∑_{i=1}^{n} y_i log λ − ∑_{i=1}^{n} log y_i!.   (8)
We can maximize this loglikelihood with respect to λ to obtain the maximum likelihood
estimate (MLE) λ̂. Setting the derivative of ℓ(λ; y) with respect to λ equal to zero and solving
for λ gives us:

d/dλ ℓ(λ; y) = −n + (1/λ) ∑_{i=1}^{n} y_i = 0
⇒ λ̂ = ∑_{i=1}^{n} y_i / n.   (9)
Using the figures from Table 1, we get λ̂ = ∑_{i=1}^{n} y_i / n = 9,860/114,944 = 0.0858. Hence,
each policy is expected to claim on average 0.0858 claims, or from every 12 policies (≈ 1/0.0858)
we expect 1 claim.
In insurance situations, the numbers of claims often arise from policies that were not in
force during a full calendar year but only a (known) fraction of it. We denote this exposure
for policy i by w_i. The number of claims for policy i then follows a Poisson(λ w_i) distribution,
and the loglikelihood for that model is given by

ℓ(λ; y, w) = log ∏_{i=1}^{n} f_{Y_i}(y_i; λ w_i) = −λ ∑_{i=1}^{n} w_i + ∑_{i=1}^{n} y_i log(λ w_i) − ∑_{i=1}^{n} log y_i!.   (10)
Table 1: Stylized dataset from a car insurance company.
It can be shown that in this case, the MLE for λ is given by λ̂ = ∑_{i=1}^{n} y_i / ∑_{i=1}^{n} w_i.
Note that in this simple setup, we only need the total number of claims ∑_{i=1}^{n} y_i and the
total exposure ∑_{i=1}^{n} w_i, and these are therefore sufficient statistics.

Again, using the figures from Table 1, we get λ̂ = 9,860/105,000 = 0.0939. Hence, each
policy is expected to produce on average 0.0939 claims per year of exposure, or from every 11
policy years we expect 1 claim. If we did not appropriately take into account that not every
policy is in force for the entire year, we would thus significantly underestimate the annual
claim frequency.
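Both estimates are easily verified in R using the totals from Table 1:

9860 / 114944  # 0.0858: average number of claims per policy
9860 / 105000  # 0.0939: average number of claims per year of exposure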
GLM for claim frequency. In line with the previous paragraph, we assume a Poisson dis-
tribution for the random number of claims Y_i to estimate the annual claim frequency for
policyholder i:

Y_i ∼ Poisson(λ_i).   (11)

As before, we assume the expected number of claims is proportional to some exposure measure
w_i (e.g. the period during which coverage is provided). Further, there are p risk factors
available that may help explain the variability in the observed y_i. These risk factors are
represented by the variables x_{ji} for j = 1, ..., p.
A typical model for the claim frequency is a Poisson distribution with a log link:

Y_i ∼ Poisson(λ_i),   (12)
log(λ_i) = log(w_i) + β_1 x_{1i} + · · · + β_p x_{pi}.   (13)

We refer to the two equations (12) and (13) as the statistical model, which fully describes the
GLM that we want to estimate. In this model, the parameters β_j are estimated by optimizing
the loglikelihood. Because multiple risk factors are included in this model, there does not exist
an analytical solution for the parameter estimates β̂_j. Instead, numerical optimization
techniques have to be used, as mentioned before. Note that the logarithm of the exposure
measure w_i is included in the linear predictor to ensure that the claim frequency is proportional
to w_i, and that it is not multiplied by a coefficient (or rather, by a fixed coefficient equal to 1).
Terms that are included in the linear predictor with a fixed coefficient are referred to as the
offset. (Verify for yourself what the expression for λ_i is.)
In this week's computer assignment we investigate a dataset dtData in R that looks as
follows:
> head(dtData)
Age Gender Region Mileage Expo Color Speed Urban Claims
1: 44 M 2 62746 0.8289 gray 0.1884 0.3374 0
2: 60 F 2 66932 1.0000 blue 0.0665 0.2167 0
3: 50 M 1 52070 0.3770 gray 0.1346 0.1604 0
4: 30 M 2 150677 0.9842 red 0.1183 0.8601 0
5: 29 F 2 73835 1.0000 white 0.0610 0.5520 0
6: 25 F 2 61944 0.8222 white 0.1100 0.5797 1
These variables are described in detail in the explanation of the computer assignment, but
here we consider Age (an integer indicating the age of the policyholder), and Region (a factor
with three levels which represents where the policyholder lives: 1 = rural area, 2 = elsewhere,
3 = big city).
Suppose we want to estimate a model that includes an intercept and a linear effect in Age_i,
and we estimate an effect for the three different levels of Region_i. Further, we believe that
the claim frequency is proportional to the period that insurance cover was provided, Expo_i,
and therefore we include the logarithm of this variable in the linear predictor. The statistical
model is then represented by:

Claims_i ∼ Poisson(λ_i),
log(λ_i) = log(Expo_i) + β_1 + β_2 · Age_i + β_3 · I[Region_i = 2] + β_4 · I[Region_i = 3],

where I[Region_i = j] is 1 if Region_i = j and 0 otherwise. This way, the linear predictor
includes the intercept β_1 and the linear age effect β_2 · Age_i for all policyholders, but the
other two terms are only included for those policyholders that belong to that region. In R,
this model is estimated using
glm(Claims ~ 1 + Age + Region + offset(log(Expo)), family=poisson(link=log))
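Assuming the dataset dtData has been loaded and Region is stored as a factor, a sketch of
the full call and the inspection of the estimates is:

fit <- glm(Claims ~ 1 + Age + Region + offset(log(Expo)),
           family = poisson(link = log), data = dtData)
summary(fit)    # parameter estimates and standard errors
exp(coef(fit))  # multiplicative effects on the annual claim frequency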
4 Usage-based pricing
Premiums can be adjusted after inception of the contract, for example using usage-based risk
factors. We consider two examples in this section: the use of telematics data in automobile
insurance, and the use of wearables (health trackers) in health insurance.
Telematics data in automobile insurance. This example is based on the article by Verbe-
len et al. (2018).3 One of the authors, Katrien Antonio, is also professor in Actuarial Science
at UvA. The article is available on Canvas, and it can be downloaded alternatively from
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssc.12283.
³ Verbelen, R., Antonio, K. and Claeskens, G. (2018). Unravelling the predictive power of telematics data in
car insurance pricing. Journal of the Royal Statistical Society: Series C, 67(5).
In Section 3 we illustrated how a pure premium can be calculated using risk factors that are
deduced from the policy. The most relevant risk factors that are reported at policy inception
are: age, license age, postal code, engine power or weight of the car, and use of the vehicle
(business or private). However, these factors say little about the driving habits of the
policyholder.
With telematics technology, additional information about the driving habits of the policyholder
can be obtained. If a policyholder purchases an insurance policy with a telematics option,
then a ‘black box’ is installed in the car, and information on every ride is recorded and
sent to the insurer. Examples of the information that is collected include the region where and
the time when the ride took place, how fast the driver accelerates, and how fast the driver
takes turns. This provides the insurer with information on the aggressiveness of the driver, but
also on the characteristics of where and when the policyholder drives.
In traditional car insurance the exposure measure used is the period during which cover is
provided, i.e. claim frequency is considered proportional to the coverage period. At inception
of a car insurance product, the policyholder often has to indicate what the expected mileage
is, and insurance companies use this as a risk factor that influences the premium that has to be
paid. With telematics data the actual distance driven becomes available, and it is worthwhile
investigating whether the claim frequency is proportional to the distance driven. This can be
investigated by estimating the following model:
Claims_i ∼ Poisson( Mileage_i · exp[ ∑_j β_j x_{ji} ] ).
Using a variable as an exposure measure has a different interpretation than using a variable
as a risk factor. For example, consider two drivers with equal risk factors, but driver A drives
twice the distance of driver B during the same coverage period. When we use mileage as an
exposure measure, we expect that driver A files on average twice as many claims as driver
B, and this relative difference is fixed. However, if we use coverage period as exposure and
mileage as a risk factor, we do not impose in advance how the claim frequency varies with
mileage. It might be that drivers who drive more are in fact more experienced drivers, which
lowers their claim frequency. Indeed, in the telematics article referred to above the authors
find that it is optimal to use coverage period as the exposure measure and mileage as an
explanatory variable.
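In R the two specifications differ only in where mileage enters the linear predictor; a sketch
using the variable names of the assignment dataset:

# Mileage as exposure measure: claim frequency proportional to distance driven.
glm(Claims ~ Age + Region + offset(log(Mileage)),
    family = poisson(link = log), data = dtData)
# Mileage as risk factor, coverage period as exposure: the effect of mileage is estimated.
glm(Claims ~ Age + Region + log(Mileage) + offset(log(Expo)),
    family = poisson(link = log), data = dtData)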
Wearables in health insurance. In 1997 the first insurance wellness program - Discovery
Vitality - was introduced in South Africa. Since 2012 the number of insurance wellness programs
has increased significantly, although such programs are not yet available in the Netherlands.

With insurance wellness programs the policyholder agrees to wear a health tracker (‘wear-
able’). The wearable tracks information that affects or is related to the health status of the
policyholder. For example, from the number of steps taken in a day the insurer can determine
whether the policyholder was active that day or sat behind a desk all day.
Questions
Q1 A compound distribution is represented by S = X_1 + X_2 + · · · + X_N, where both the X_i and
N are random variables. Explain why S = 0 if N = 0.
Q2 Consider a compound distribution S = X_1 + X_2 + · · · + X_N as in (1). Derive the expected value
of S using the law of total expectation and the variance of S using the law of total variance.
Q3 Explain in your own words why the theory on compound distributions is useful in analyzing
historical claims information in order to predict total loss on a policy.
Q4 We discussed that the sum of compound Poisson r.v.’s is also compound Poisson. Explain in
your own words why this result is useful for analyzing the total loss in a portfolio.
Q5 Consider a compound distribution S = X_1 + X_2 + · · · + X_N as in (1). The claim size X_i follows
a Normal distribution with mean 100 and standard deviation 10.

a. Derive the expected value and variance of S under the assumption that N ∼ Poisson(λ =
   1000 · 0.1), and derive the 90th percentile of the total loss distribution S using a Normal
   approximation. Recall that for a Poisson(λ)-distribution the mean and variance equal λ.

b. Do the same as in the previous bullet, but now assuming N ∼ Binomial(n = 1000, p =
   0.1). Recall that for a Binomial(n, p)-distribution the mean equals n · p and the variance
   equals n · p · (1 − p).

Explain the difference in the results.
Explanatory note: the distribution for N in a. corresponds to a portfolio of size 1000 where
the claim frequency distribution of each policyholder equals Poisson(λ = 0.1), whereas in b.
it corresponds to a portfolio of size 1000 where all policyholders have probability 0.1 of filing
a single claim and probability 0.9 of having no claim at all.
Q6 Describe in your own words what the most important similarity and difference are between
using ordinary least squares and generalized linear models for regression purposes.
Q7 Consider a random variable Y_i with observations y_i for i = 1, ..., n with corresponding weights
w_i. If we assume that Y_i follows a Poisson(λ · w_i) distribution, write down the likelihood
function as a function of λ and find the MLE for λ.
Q8 Suppose we have observations yi and risk factors xji for i = 1, ... , n and j = 1, ... , p (i.e.
there are p risk factors or explanatory variables). We believe the observations originate from
a random variable Yi that follows a Poisson distribution, and we want to find the relationship
between the observations yi and the risk factors xji using a GLM.
a. What is the linear predictor for this problem?
b. Which link function would you use to link the linear predictor to the mean of the Poisson
distribution, and why?
c. Using the previous two answers, define the complete statistical model.
Q9 Variance functions. An advantage of GLMs compared to standard linear models is that the
variance may depend on the mean, but it does not have to. Some common distributions
within the GLM framework are listed below:

• Normal(µ_i, φ_i) random variables with Y_i ∈ ℝ;
• Poisson(µ_i) random variables with Y_i ∈ ℕ₀;
• gamma(α = 1/φ_i, β = 1/(φ_i µ_i)) random variables with Y_i ∈ ℝ₊.

Here, the parameter µ_i is the mean parameter and φ_i is the dispersion parameter. Within the
GLM framework, we can specify the variance as Var[Y_i] = φ_i V(µ_i) for some function V(·)
which is called the variance function. For each of the above distributions, specify the
variance as the product of the dispersion parameter and the variance function. What can you
conclude from the different variance functions?
Q10 Suppose we are an insurance company and we want to analyze the probability p_i that
a policyholder lapses his or her policy (i.e. that the policy is terminated). For all policyholders
i = 1, ..., n we have observed whether they have lapsed their policy or not. We denote these
observations by y_i; these observations are 1 if the policy is terminated and 0 if it is not.
We want to use the framework of GLMs to analyze these lapse probabilities. For now, suppose
there are no risk factors available.

a. Which distribution would you use to model the lapse probabilities, taking into account
   the structure of the observations?

b. Since there are no risk factors available at this stage, the lapse probability is the same
   for all policyholders and we denote the lapse probability by p. Write down the complete
   statistical model, write down the likelihood function as a function of p, and find the
   MLE for the lapse probability p.
Now suppose the lapse probability of policyholder i depends on the age of the policyholder
(denoted by x_{1i}) and the policy duration (denoted by x_{2i}). We want to use these two risk
factors to explain differences in lapse probabilities between policyholders. NB: the policy
duration equals the number of years that a policy has been in force, and in regression analyses
it is common practice to always include a constant.
c. What is the linear predictor η_i for this regression problem?

d. Suppose we use the identity link function. Can you think of a problem that may arise
   with this choice of link function?

e. The typical link function used in Bernoulli or Binomial regressions is the logit link:
   η = logit(p) = ln(p/(1 − p)). Can you think of the advantage of using this link
   function? Hint: try to find the inverse link function, i.e. p = f(η).
Q11 In mortality modeling, observations originate from individuals. However, analyses are almost
always performed at an aggregated level (i.e. aggregated over individuals with similar charac-
teristics). In this exercise we investigate why this can be done. We will also again investigate
the importance of adequately taking exposure into account.

Suppose we have a portfolio of individuals i = 1, ..., n with n = 10,000. Each of these in-
dividuals was alive and aged exactly x at the beginning of the year, and at the end of the
year 160 of them had died. For each individual we define an indicator variable δ_i that equals
1 if they died and 0 otherwise.
a. We want to derive the mortality probability q_x, which is the probability that someone aged
   exactly x dies within the coming year. Which probability distribution is most suitable for
   this type of observations?

b. Write down the complete loglikelihood function as a function of q_x and dependent on
   δ_i, and find the one-year mortality probability q_x by optimizing the loglikelihood.
If we assume the force of mortality to remain constant on [x, x + 1), the following relationship
holds between the mortality probability and the force of mortality: _h q_x = 1 −
exp[−∫₀ʰ µ_{x+s} ds] = 1 − exp[−h · µ_x].⁴ The h-period survival probability is given by _h p_x =
1 − _h q_x = exp[−h · µ_x].
c. Assume everyone dies at the end of the year. There are then two types of individuals within
   the portfolio: those that have survived the entire year (with probability p_x), and those
   that survived until the end but died at the end (with probability p_x · µ_x).⁵ Write down
   the complete loglikelihood function as a function of µ_x and dependent on δ_i, and find
   the estimate for µ_x.
d. Now assume that people die uniformly during the year, which means that on average
   each person that died lived for half a year. The period that individual i lived during
   the year is defined as τ_i ∈ (0, 1]. Write down the loglikelihood function as a function of
   µ_x (dependent on δ_i and τ_i), and find the estimate for µ_x.

e. Define e = ∑_i τ_i and d = ∑_i δ_i. What does the (log)likelihood from d. look like if you
   simplify it further and make use of e and d?
The following questions are related to the article by Verbelen et al. (2018).
Q12 In traditional car insurance regression analyses gender is often found to be an important risk
factor. However, due to the European gender ruling, gender may not be used in car insurance
pricing to prevent discrimination based on gender. Explain how the use of telematics data may
reduce the importance of the covariate gender in analyzing historical car claims data.
Q13 The premium of car insurance is often proportional to the policy duration. This way, the
premium depends on a form of ‘use’ by the policyholder, namely the number of time units the
policy is in force.
⁴ This result follows from survival theory, which is part of Life Insurance courses.
⁵ This is why the force of mortality is called an instantaneous rate of mortality: it is applicable to exactly
that instantaneous moment that it is used for. In that sense, the way that it is used in this sub-question
is not in line with the actual definition, since people should be allowed to die throughout the entire year.
That will be investigated in the next sub-question.
a. What alternatives did the authors investigate for quantifying the dependence of the
   premium on types of ‘use’ by the policyholder?
   Hint: there are four alternatives (including current practice).

b. What type of model is found to be optimal in a statistical sense (AIC and cross-validation
   scores)? Which of the four alternatives do you believe is easiest to implement in practice?
   One or multiple alternatives may be considered.