Market-Based Insurance Ratemaking: Application To Pet Insurance
Lyon, France
Hestialytics, Paris, France
University of California Santa Barbara, Department of Statistics and Applied Probability, Santa Barbara CA 93106-3110, USA
February 7, 2025
Abstract
This paper introduces a method for pricing insurance policies using market data. The approach is designed
for scenarios in which the insurance company seeks to enter a new market, in our case pet insurance, and lacks
historical data. The methodology involves an iterative two-step process. First, a suitable parameter is proposed
to characterize the underlying risk. Second, the resulting pure premium is linked to the observed commercial
premium using an isotonic regression model. To validate the method, comprehensive testing is conducted on
synthetic data, followed by its application to a dataset of actual pet insurance rates. To facilitate practical
implementation, we have developed an R package called IsoPriceR. By addressing the challenge of pricing
insurance policies in the absence of historical data, this method helps enhance pricing strategies in emerging
markets.
1 Introduction
Modern insurance pricing relies on predictive modeling methods to ensure that premiums reflect, as accurately
as possible, the average cost of claims. To achieve this, insurers rely on historical data to train statistical
learning models and calculate what is called the pure premium. Although the foundation of standard actuarial
practice often rests on generalized linear models (GLM), see Renshaw [18], the relentless evolution of data
science has ushered in a new era where more sophisticated machine learning algorithms are also coming
into play, see Blier-Wong et al. [5] and the references therein. However, a challenge arises when an insurance
company enters a new market and lacks historical data on the risks it aims to cover. In this context, conventional
predictive modeling tools fail, leaving insurers at a crossroads, looking for innovative solutions to navigate
uncharted territory.
Although an insurance company may lack historical data in a new market, there is an attractive alternative:
Collect and analyze market data consisting of rates offered by competitors for similar insurance policies. Our
approach leverages these market data to provide insights into the underwritten risk leading to the calculation
of insurance premiums. Our objective is to develop a methodology that determines suitable commercial
premiums based on the observed commercial rates of competitors.
The data collection process involves obtaining insurance quotes. To gather these quotes, one can either
visit insurance company websites and answer several questions about the insured risk or obtain survey data
responses from a data broker that has automated such a process. The premiums quoted depend on the
responses provided by the customer. In this paper, we focus on a pet insurance application, but we have opted
for generic notation because we believe that the method could be suitable for other insurance products. Pet
insurance covers veterinary expenses for pets, including unexpected injuries (e.g., foreign object ingestion,
broken bones), illnesses (e.g., cancer, glaucoma, hip dysplasia, parvovirus), surgeries (e.g., cruciate ligament
tears, cataracts), medications, diagnostic tests (e.g., X-rays, blood tests, MRIs), and emergency exam fees.
However, most pet insurance policies exclude coverage for pre-existing conditions, routine and preventive
care, spaying/neutering, vaccinations, and other specific exclusions. To determine the cost of coverage, pet
owners must provide details about their pet’s species, breed, age, and gender. These characteristics are referred
to as rating factors in actuarial science and are crucial for risk classification. For an overview of this topic, we
direct the reader to the work of Antonio and Valdez [1]. An insurance company looking to enter a new market
would naturally identify these risk factors when collecting data on the premiums offered by competitors.
Within a specific risk class, the risk is represented by a positive random variable, denoted as X, which quantifies
the total amount of the claims during the insurance policy period. Insurance companies mitigate this risk by
offering coverage for a portion of X, denoted as g(X) < X, in exchange for a premium. The process involves
calculating the pure premium, defined as p = E[g(X)]. Customers are then presented with a commercial
premium derived from the pure premium as $\tilde{p} = f(p) > p$, where $f$ represents the loading function. Our problem is formulated as follows: given a collection of insurance quotes $\mathcal{D} = \{\tilde{p}_1, \ldots, \tilde{p}_n\}$ corresponding to a specific risk, with variations in loading and coverage functions, represented as
$$\tilde{p}_i = f_i\{\mathbb{E}[g_i(X)]\}, \quad i = 1, \ldots, n,$$
we aim to study the distribution of the underlying risk X and approximate the loading functions in order to price our own insurance policies relative to current market premia and risk loadings.
Our approach assumes that the distribution of the risk X is parameterized by θ ∈ Θ ⊂ Rd . Ideally, if the loading
functions fi were known for i = 1, . . . , n, a procedure similar to the generalized method of moments could be
applied (see Hansen [13]). Unfortunately, these functions are unknown in this context to anyone not internal to
the companies from which the premium quotes were obtained. Therefore, in order to proceed we will perform
an estimation under a two-stage procedure. We posit a prior distribution π(θ) to delimit the parameter space
of the risk model from which we can readily sample θ. We compute the pure premiums $p_i^{\theta} = \mathbb{E}_\theta[g_i(X)]$, based
on the known coverages gi provided by the insurance policy. If X follows a compound distribution (i.e. a
random sum) then the pure premium does not have a closed form expression but can be approximated to any
desired level of precision using simulations.
In the second stage, the loading functions fi : R+ → R+ are approximated using an isotonic regression model,
chosen for its ability to maintain the monotonic relationship between pure and commercial premiums—a
desirable feature. Additionally, market data is inherently noisy, and isotonic regression provides robustness to
outliers, superior to that of simple linear regression, see further discussion on estimation properties of isotonic
regression in [15] and references therein. The procedure may be summarized as follows:
1. Sample a candidate parameter θ from the prior distribution π(θ),
2. Compute the pure premiums $p_i^{\theta}$ for each of the insurance policies i = 1, . . . , n,
3. Fit an isotonic function f to learn the relationship between the commercial premiums $\tilde{p}_i$ and the pure premiums $p_i^{\theta}$,
4. Build the 'synthetic' market data $\mathcal{D}_\theta$ by applying the estimated loading to the pure premiums, $f(p_i^{\theta})$, for i = 1, . . . , n,
5. If the observed and synthetic market data are close enough, according to a prespecified distance, then store the parameter value θ and the associated loading function f.
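To fix ideas, the following R sketch runs a single accept-reject pass of steps 1-5. It is only an illustration: the compound Poisson-lognormal risk, the uniform prior ranges, the simulated quotes, the equal weights and the tolerance eps are placeholder assumptions rather than settings used in the paper, the coverage function anticipates the min(max(r·x − d, 0), l) form introduced in Section 2, and base R's isoreg stands in for the isotonic fit.

```r
set.seed(1)

# Observed market data: commercial premiums and the known coverage parameters
# (r, d, l) of each quote; simulated placeholders stand in for collected quotes.
n      <- 50
quotes <- data.frame(r = runif(n, 0.6, 1), d = runif(n, 0, 5), l = runif(n, 5, 20),
                     p_tilde = runif(n, 2, 10))

g <- function(x, r, d, l) pmin(pmax(r * x - d, 0), l)      # coverage function

# Step 1: sample a candidate parameter theta = (lambda, sigma) from the prior.
theta <- c(lambda = runif(1, 0, 5), sigma = runif(1, 0, 2))

# Step 2: crude Monte Carlo estimate of the pure premiums under theta.
R <- 2000
X <- replicate(R, sum(rlnorm(rpois(1, theta["lambda"]), sdlog = theta["sigma"])))
p_theta <- mapply(function(r, d, l) mean(g(X, r, d, l)), quotes$r, quotes$d, quotes$l)

# Step 3: isotonic fit of the commercial premiums against the pure premiums.
ord <- order(p_theta)
iso <- isoreg(p_theta[ord], quotes$p_tilde[ord])

# Step 4: synthetic market data = estimated loading applied to the pure premiums.
p_synth      <- numeric(n)
p_synth[ord] <- iso$yf

# Step 5: accept theta if observed and synthetic premiums are close enough.
eps   <- 1                                                  # placeholder tolerance
d_val <- sqrt(mean((quotes$p_tilde - p_synth)^2))
if (d_val < eps) message("accepted theta = ", paste(round(theta, 2), collapse = ", "))
```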
After iterating the above steps, we get a sequence of parameter-loading function pairs: (θ1 , f1 ), (θ2 , f2 ), . . .. This
sequence allows us to price our own insurance policies. The problem we tackle is an inverse problem and our
solution is inspired by indirect inference methodologies pioneered by Gourieroux et al. [12]. The proposed
algorithm to search the parameter space resembles the Approximate Bayesian Computation (ABC) algorithms
described in the book of Sisson et al. [21]. The parameter values θ1, θ2, . . . sampled by the algorithm yield an
approximate posterior distribution π(θ|D). This posterior distribution accounts for the uncertainty around
the estimated parameter value due to the use of Monte Carlo simulation to calculate the pure premium. ABC
algorithms have found successful applications in a range of actuarial science and risk management problems.
We refer the readers to the works of Peters et al. [17], Dean et al. [6], Peters and Sisson [16] and Goffard and
Laub [11] for further insights. Isotonic regression, a well-established statistical methodology, see for instance
Barlow et al. [3], plays a central role in our approach. A recent application in actuarial science addresses the
autocalibration challenges that can arise when pricing insurance contracts using machine learning algorithms,
see the work of Wüthrich and Ziegel [22].
The rest of the paper is organized as follows. Section 2 describes the risk model used in this study and discusses
insurance pricing principles. Section 3 provides a detailed account of the algorithmic procedure. Our method
is presented as an Approximate Bayesian Computation (ABC)-type optimization algorithm, which incorporates
a simple isotonic regression model. Section 4 presents the results of a simulation study designed to showcase
the performance of our method in a controlled environment. Then, we apply our algorithm on a dataset made
of real-world pet insurance rates in Section 5. Finally, we conclude in Section 6 and discuss the perspectives
and limits of our methodology for potential applications on other insurance products.
The aggregate risk over the policy period is modeled as a compound sum,
$$X = \sum_{k=1}^{N} U_k, \qquad (1)$$
where N is a counting random variable and the $U_k$'s are independent and identically distributed (iid) positive random variables independent from N. The random variable N is the number of occurrences of an event over a given time period (annually); each of these events is associated with a compensation $U_k$. We assume here that X represents a risk that belongs to a specific risk class determined by risk factors.
An insurance company offers to bear part of this risk g(X) ≤ X in exchange for a premium which should
compensate the average cost of claims, given by p = E[g(X)], referred to as the pure premium. We consider in
this work a function g defined as
g(x) = min(max(r · x − d, 0), l),
where r ∈ (0, 1] is the coverage rate, d > 0 is the deductible and l > 0 is the limit. We illustrate the impact of the
parameters of the insurance coverage in Example 1.
Example 1. Let us consider a scenario where the risk has a Poisson-lognormal distribution X ∼ Poisson(λ =
3) − LogNorm(µ = 0, σ = 1) and that n = 100 insurance coverages are proposed. These are characterized by a rate, a
deductible and a limit, set randomly as
The pure premiums, displayed in Figure 1, are increasing in the coverage rate and decreasing in the deductible. Note that the pure premiums were
estimated via simulation to overcome the lack of an explicit formula for the distribution function of X.
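For instance, a minimal R sketch of this simulation-based estimation, assuming the Poisson(3) − LogNorm(0, 1) risk of Example 1 and an arbitrary coverage (r = 0.8, d = 1, l = 10) chosen only for illustration:

```r
set.seed(42)

# Coverage function g(x) = min(max(r * x - d, 0), l).
g <- function(x, r, d, l) pmin(pmax(r * x - d, 0), l)

# R Monte Carlo replications of the compound Poisson(3)-LogNorm(0, 1) risk of Example 1.
r_compound <- function(R, lambda = 3, mu = 0, sigma = 1) {
  replicate(R, sum(rlnorm(rpois(1, lambda), meanlog = mu, sdlog = sigma)))
}
X <- r_compound(1e4)

# Crude Monte Carlo estimate of the pure premium p = E[g(X)] for an
# illustrative coverage with rate 0.8, deductible 1 and limit 10.
mean(g(X, r = 0.8, d = 1, l = 10))
```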
In practice, the rate offered to policyholders includes a loading to compensate for the variability of the risks
and to cover the management costs. We describe this loading in the next section.
The commercial premium $\tilde{p}$ offered to the policyholder is then given by
$$\tilde{p} = f(p) \geq p.$$
Figure 1: Pure premiums as a function of the rate of coverage (r) and the deductible (d) for a Poisson(λ =
3) − LogNorm(µ = 0, σ = 1) risk.
The function f is referred to as the loading function. Since the commercial premium is a function of the pure
premium, we are applying the expectation premium principle. Other premium principles are also possible,
such as the standard deviation principle discussed in Appendix B. A simple loading function is linear in the pure premium, as
premium as
f (x) = (1 + η)x,
where η > 0. The loading functions used by insurance companies are unknown to us and vary from one
insurance company to the other. We follow up on Example 1 in Example 2 where we randomize the linear link
between pure and commercial premium.
Example 2. Take the pure premiums of Example 1 and apply the following linear loadings
$$\tilde{p}_i = (1 + \eta_i)\, p_i, \quad \text{for } i = 1, \ldots, n. \qquad (3)$$
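A short R sketch of this data-generating step follows; since the distribution of the $\eta_i$'s is not restated here, the uniform range below is an assumption made purely for illustration, and the pure premiums are placeholders.

```r
set.seed(7)
n <- 100

# Pure premiums p_i: placeholders standing in for the Monte Carlo estimates of Example 1.
p <- runif(n, min = 1, max = 5)

# Random linear loadings as in equation (3); the uniform range of the eta_i's is
# an assumption made for this sketch, not the one used in Example 2.
eta     <- runif(n, min = 0.5, max = 1.5)
p_tilde <- (1 + eta) * p

plot(p, p_tilde, xlab = "Pure premium", ylab = "Commercial premium")
```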
We only observe the commercial premiums $\tilde{p}_1, \ldots, \tilde{p}_n$ and we would like to learn from them about the risk X and the loading functions $f_1, \ldots, f_n$. We formulate our problem and describe our solution in the next section.
Figure 2: Commercial premium as a function of the pure premium for policies offered by various insurance companies.
where the risk X is a random variable defined in (1). The loading functions $f_1, \ldots, f_n$ are unknown. The insurance coverage functions $g_1, \ldots, g_n$ are known and of the form
$$g_i(x) = \min\{\max(r_i \cdot x - d_i,\, 0),\, l_i\}, \quad i = 1, \ldots, n.$$
We use this preliminary problem to introduce our distance function and the notion of model identifiability.
Section 3.2 considers the actual problem involving the commercial premiums. The link between pure and
commercial premiums is approximated using an isotonic regression model. Regularization terms are included
in the distance used to compare observed and model-generated rates to mitigate the identifiability issue.
Finally, Section 3.3 refines the accept-reject algorithm laid out in the introduction to search the parameter
space in a more efficient way.
Assume that we hold a collection of pure premiums $p_{1:n}$ and consider the following optimization problem:

Problem 1. Find $\theta \in \Theta \subset \mathbb{R}^d$ to minimize $d\left(p_{1:n}, p_{1:n}^{\theta}\right)$, where $p_{1:n}^{\theta} = \left(p_1^{\theta}, \ldots, p_n^{\theta}\right)$, with $p_i^{\theta} = \mathbb{E}_\theta[g_i(X)]$, are the pure premiums associated to the risk X parametrized by θ and d(·, ·) denotes a distance function over the observation space.
We measure the discrepancy between observed and model-generated pure premiums using the root mean square error (RMSE) defined as
$$\mathrm{RMSE}\left(p_{1:n}, p_{1:n}^{\theta}\right) = \sqrt{\sum_{i=1}^{n} w_i^{\mathrm{RMSE}} \left(p_i - p_i^{\theta}\right)^2}, \qquad (4)$$
for a candidate risk parameter θ. The weights $w_i^{\mathrm{RMSE}} > 0$ for i = 1, . . . , n allow us to place greater emphasis
on specific data points. The statistical framework is that of minimum distance estimation. We do not have
access to the full shape of the data distribution. We must base our inference on specific moments, just as in the
generalized method of moments, a popular method among econometricians (see Hansen [13]). The model is identifiable if there exists a unique estimator θ* such that
$$\theta^{*} = \underset{\theta \in \Theta}{\arg\min}\; \mathrm{RMSE}\left(p_{1:n}, p_{1:n}^{\theta}\right). \qquad (5)$$
Existence stems from the fact that the parameter space Θ is compact and the map $\theta \mapsto \mathrm{RMSE}(p_{1:n}, p_{1:n}^{\theta})$ is continuous. Uniqueness is more difficult to verify as it depends on the functions $g_i$. Given the model for X and the insurance coverages, the pure premium does not have an analytical expression, making it difficult to show the convexity of (4). A simple necessary condition is that the number of parameters must be smaller than n, the number of moments considered. The shape of our insurance coverage function suggests that we may successfully identify the correct parameters, as demonstrated in Example 3.
Example 3. We consider the same model as in Example 1. Recall that the risk has a Poisson-lognormal distribution
X ∼ Poisson(λ = 3) − LogNorm(µ = 0, σ = 1) and that n = 100 insurance coverages are proposed. The rates,
deductibles and limits are set randomly as in Example 1.
Let us assume that µ = 0 and compute the pure premiums over a grid of values for λ and σ. Figure 3 shows $\mathrm{RMSE}\left(p_{1:n}, p_{1:n}^{\theta}\right)$ depending on the value of λ and σ for (λ, σ) ∈ [0, 5] × [0, 2]. This contour plot shows minimal RMSE values in a neighborhood of the true parameter values (λ, σ) = (3, 1), suggesting that the model is identifiable in this setting.
Figure 3: Contour plot of $\mathrm{RMSE}\left(p_{1:n}, p_{1:n}^{\theta}\right)$ for µ = 0 and (λ, σ) ∈ [0, 5] × [0, 2].
If uniqueness of the solution cannot be verified, a workaround consists in adding regularization terms to
the discrepancy measure. We explore this direction in Section 3.2, where the actual problem is treated. The
issue is further addressed by using a particle-based optimization method to search the parameter space and
provide a set of admissible candidate parameters. Such an optimization method is presented in Section 3.3.
The identifiability issue is also examined from an empirical point of view in the online supplementary material.
Problem 2. Find $\theta \in \Theta \subset \mathbb{R}^d$ and $f : \mathbb{R}_+ \mapsto \mathbb{R}_+$ to minimize $d\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right]$, where the function f is applied elementwise on $p_{1:n}^{\theta}$ and d(·, ·) denotes a distance function over the observation space, subject to
Our first task is to find a generic function f to represent the safety loading functions $f_i$ used by the competitors. For this, we use isotonic regression, a statistical technique for fitting a non-decreasing function to a set of data points. The idea is that if two pure premiums satisfy $p_i \leq p_j$, then the commercial premiums should also verify $\tilde{p}_i \leq \tilde{p}_j$. Consider a collection of candidate pure premiums $p_{1:n}^{\theta}$, associated to a candidate estimate of the risk parameter θ. Our data points are therefore pairs of pure and commercial premiums $(p_i^{\theta}, \tilde{p}_i)_{i=1,\ldots,n}$. Suppose the pure premiums have been ordered such that $p_i^{\theta} \leq p_j^{\theta}$ for $i \leq j$; isotonic regression seeks a least squares fit $\tilde{p}_i^{\theta}$ for the $\tilde{p}_i$'s such that $\tilde{p}_i^{\theta} \leq \tilde{p}_j^{\theta}$ for $p_i^{\theta} \leq p_j^{\theta}$. It reduces to finding $\tilde{p}_1^{\theta}, \ldots, \tilde{p}_n^{\theta}$ that minimize
$$\sum_{i=1}^{n} w_i^{\mathrm{iso}} \left(\tilde{p}_i^{\theta} - \tilde{p}_i\right)^2, \quad \text{subject to } \tilde{p}_i^{\theta} \leq \tilde{p}_j^{\theta} \text{ whenever } p_i^{\theta} \leq p_j^{\theta},$$
where $w_i^{\mathrm{iso}}$ denotes the weight associated to each pair $(p_i^{\theta}, \tilde{p}_i)$. The weights allow us to place emphasis on specific data points in the same manner as the weights defined for the RMSE in (4). Since the $p_i^{\theta}$'s fall in a totally ordered space, a simple iterative procedure called the Pool Adjacent Violators Algorithm (PAVA) can be used. The pseudo-algorithm that describes the procedure is provided below:
1. Initialize the sequence of fitted values to be the same as the data points, $\tilde{p}_i^{*} = \tilde{p}_i$.
2. Iterate through the sequence and identify "violations," which occur when the current value is greater than the next value, that is,
$$\tilde{p}_i^{*} > \tilde{p}_{i+1}^{*} \quad \text{for some } i = 1, \ldots, n-1.$$
When a violation is found, adjust the values in the associated segment of the sequence to be the average of the values,
$$\tilde{p}_i^{*} \leftarrow \left(\tilde{p}_i^{*} + \tilde{p}_{i+1}^{*}\right)/2,$$
ensuring monotonicity.
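The following R function is a direct, unweighted transcription of this pooling scheme, written from the description above (it is not the IsoPriceR implementation); base R's isoreg, which implements the same algorithm, serves as a cross-check.

```r
# Pool Adjacent Violators Algorithm (unweighted version): scan for violations
# and pool offending adjacent blocks by averaging until the sequence is non-decreasing.
pava <- function(y) {
  fit    <- as.numeric(y)     # current block values
  weight <- rep(1, length(y)) # block sizes used to average pooled blocks
  repeat {
    viol <- which(diff(fit) < 0)       # positions where fit[i] > fit[i + 1]
    if (length(viol) == 0) break       # monotone: done
    i <- viol[1]
    # Pool the two adjacent blocks into one block with their weighted mean.
    pooled    <- (weight[i] * fit[i] + weight[i + 1] * fit[i + 1]) /
                 (weight[i] + weight[i + 1])
    fit[i]    <- pooled
    weight[i] <- weight[i] + weight[i + 1]
    fit       <- fit[-(i + 1)]
    weight    <- weight[-(i + 1)]
  }
  rep(fit, times = weight)    # expand blocks back to one value per observation
}

# Example: commercial premiums sorted by increasing pure premium.
p_tilde <- c(2.1, 2.5, 2.3, 3.0, 2.8, 3.6)
pava(p_tilde)
# Cross-check with base R (x only sets the ordering here):
isoreg(seq_along(p_tilde), p_tilde)$yf
```

A weighted variant is obtained by initializing weight with the $w_i^{\mathrm{iso}}$'s instead of ones.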
Example 4. The isotonic fit of the data of Example 1 and Example 2 is provided in Figure 4.

Figure 4: Isotonic fit of the commercial premium as a function of the pure premium.
Remark 3.1. When looking at Figure 4, one may object that a simple linear regression model could do the job. This
impression is partly due to the (noisy) linear link between pure and commercial premium in (3). Isotonic regression
is a non-parametric approach, meaning it doesn’t make strong assumptions about the underlying distribution or
functional form of the relationship between variables. This can be advantageous when the true relationship is not
well represented by a linear model. We briefly illustrate this fact in Appendix A by looking at the residuals of the
linear and isotonic regression and considering a non-linear link function between pure and commercial premium.
Furthermore, we believe that ensuring $\tilde{p}_i < \tilde{p}_j$ when $p_i < p_j$ is desirable, as a greater pure premium should be associated with a greater commercial premium as a rule of thumb. Isotonic regression aims at satisfying just this condition. The main drawback of isotonic regression is that it can adhere too closely to the data points, risking overfitting. Note also that isotonic regression lacks the interpretability of a simpler, lower-dimensional parametric curve.
We now turn to the definition of a discrepancy measure to compare the model-generated and observed market commercial premiums. Our starting point is the root mean square error (RMSE) defined as
$$\mathrm{RMSE}\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right] = \sqrt{\sum_{i=1}^{n} w_i^{\mathrm{RMSE}} \left(\tilde{p}_i - f\left(p_i^{\theta}\right)\right)^2}, \qquad (7)$$
The existence of such a θ* is guaranteed because $\theta \mapsto \mathrm{RMSE}\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right]$ only takes a finite number of values. Indeed, to each θ ∈ Θ is associated a unique permutation $s_\theta \in S_n$, where $S_n$ denotes the set of all the permutations of {1, . . . , n}, such that
$$p^{\theta}_{s_\theta(1)} \leq \ldots \leq p^{\theta}_{s_\theta(n)}.$$
The isotonic fit then satisfies
$$\tilde{p}^{\theta}_{s_\theta(1)} \leq \ldots \leq \tilde{p}^{\theta}_{s_\theta(n)},$$
leading to a given RMSE value $\mathrm{RMSE}\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right]$. Concretely, for $\theta_1, \theta_2 \in \Theta$, if it holds that $s_{\theta_1} = s_{\theta_2}$, then $\mathrm{RMSE}\left[\tilde{p}_{1:n}, f(p_{1:n}^{\theta_1})\right] = \mathrm{RMSE}\left[\tilde{p}_{1:n}, f(p_{1:n}^{\theta_2})\right]$. The map $\theta \mapsto s_\theta$ takes values in the finite set $S_n^{\Theta} = \{s_\theta;\, \theta \in \Theta\}$; since Θ is a continuous space, this map cannot be injective, which implies that θ* cannot be unique.
offers a straightforward interpretation of our objective, as detailed in Remark 3.2.
Remark 3.2. At this stage, the optimization problem simplifies to identifying the parameter value θ corresponding to
the most suitable permutation sθ of the pure premium, which provides the best isotonic fit. We adhere to the guiding
principle that a higher pure premium implies a higher commercial premium, as noted in Remark 3.1.
Our problem is an ill-posed inverse problem. Ill-posedness is usually dealt with by adding a regularization to
the objective function that one wants to minimize. The ratio $p/\tilde{p}$ corresponds to what practitioners would
call the expected Loss Ratio (LR). Our solution is based on targeting a given loss ratio. The loss ratio is a
standard measure to assess the profitability of insurance lines of business. An insurance company that enters a
new market is likely to have insights on the loss ratio relative to this market, for example by having informal
discussions with reinsurers, brokers or competitors. This feedback may translate into the definition of a
lower and upper bound, denoted by $\mathrm{LR}_{\mathrm{low}}$ and $\mathrm{LR}_{\mathrm{high}}$, respectively. We can then assume that the loss ratios
$\mathrm{LR}_i = p_i/\tilde{p}_i$, for i = 1, . . . , n, should fall in the range $[\mathrm{LR}_{\mathrm{low}}, \mathrm{LR}_{\mathrm{high}}]$, which we refer to as the loss ratio corridor.
Assuming that $\mathrm{LR}_{\mathrm{high}} < 1$, we ensure both constraints in (6) by adding to our distance (7) two regularization terms defined as
$$\mathrm{Reg}_{\mathrm{low}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta}\right) = \sqrt{\sum_{i=1}^{n} w_i^{\mathrm{RMSE}} \left(\tilde{p}_i - p_i^{\theta} \cdot \mathrm{LR}_{\mathrm{low}}^{-1}\right)_+^2},$$
and
$$\mathrm{Reg}_{\mathrm{high}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta}\right) = \sqrt{\sum_{i=1}^{n} w_i^{\mathrm{RMSE}} \left(p_i^{\theta} \cdot \mathrm{LR}_{\mathrm{high}}^{-1} - \tilde{p}_i\right)_+^2},$$
where $(x)_+ = \max(x, 0)$ denotes the positive part of x. The distance we consider within Problem 2 is now given by
$$d\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right] = \mathrm{RMSE}\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right] + \mathrm{Reg}_{\mathrm{low}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta}\right) + \mathrm{Reg}_{\mathrm{high}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta}\right).$$
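Written out in R, the regularized distance is only a few lines; the default loss ratio corridor below mirrors the 40%-70% choice used later in the paper, the equal weights are placeholders, and f_p is assumed to hold the isotonic fit f(p_i^θ) computed beforehand (for instance with isoreg or the pava sketch above).

```r
# Regularized distance between observed commercial premiums p_tilde and the
# model-generated premiums f_p (the isotonic fit applied to the pure premiums p_theta).
market_distance <- function(p_tilde, p_theta, f_p,
                            w = rep(1 / length(p_tilde), length(p_tilde)),
                            LR_low = 0.4, LR_high = 0.7) {
  pos      <- function(x) pmax(x, 0)                              # positive part (x)_+
  rmse     <- sqrt(sum(w * (p_tilde - f_p)^2))                    # equation (7)
  reg_low  <- sqrt(sum(w * pos(p_tilde - p_theta / LR_low)^2))    # penalizes LR_i < LR_low
  reg_high <- sqrt(sum(w * pos(p_theta / LR_high - p_tilde)^2))   # penalizes LR_i > LR_high
  rmse + reg_low + reg_high
}
```

The returned value is the quantity that the algorithm of Section 3.3 compares to the current tolerance level.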
We illustrate the impact of adding the regularization terms in Example 5.
Example 5. We consider the commercial rates of Example 2. Recall that the commercial premiums are given by
$$\tilde{p}_i = (1 + \eta_i)\, p_i, \quad \text{for } i = 1, \ldots, n. \qquad (8)$$
Figure 5 displays the contour plots of the discrepancy between observed and model-generated commercial rates. When comparing Figure 5a and Figure 5b, we note how beneficial including the regularization terms is for identifying the true parameter values.

Figure 5: Contour plots of the distance function without regularization, $\mathrm{RMSE}\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right]$ (panel a), and with regularization, $d\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta}\right)\right]$ (panel b), for µ = 0 and (λ, σ) ∈ [0, 5] × [0, 2].
Regularization brings us closer to the scenario described in Section 3.1, where the pure premium is known.
This corresponds to the case where LRlow = LRhigh = 1. Regularization enables us to exclude large portions of
the parameter space associated with nonsensical pure premiums given the commercial premiums. However,
careful consideration is required when setting the loss ratio corridor. A narrow loss ratio corridor results in a
highly precise estimation, but this estimation may be biased if the loss ratio corridor is misspecified.
The issue of identifiability persists, as multiple parameter values can still solve the optimization problem. This
highlights the necessity of using particle-based optimization techniques, as described in Section 3.3, to explore
the parameter space. Such techniques return a set of admissible candidate parameters and are less sensitive
to initialization. For a more detailed discussion of the identifiability issue, we refer the reader to the online supplementary material (https://github.com/LaGauffre/market_based_insurance_ratemaking/blob/main/latex/supp_material.pdf).
Our solution alternates between proposing parameter values for the risk to compute the pure premiums and
approximating the fi ’s using isotonic regression. We must accommodate the lack of tractable expressions for
the pure premiums
$$p_i^{\theta} = \mathbb{E}_\theta\left[g_i(X)\right], \quad \text{for } i = 1, \ldots, n.$$
The use of numerical methods makes a grid search procedure prohibitive from a computing time point of view.
It also prevents us from using gradient-based optimization procedures. In such cases, one can turn towards
particle swarm optimization algorithms or genetic algorithms to search the parameter space. Since we have
decided to take a Crude Monte Carlo estimator for the pure premiums, the accuracy depends on the number
of replications R of X being used. We adopt a Bayesian strategy in order to reflect the uncertainty around
the pure premium calculation onto the parameters’ final estimates. Our algorithm is similar to Approximate
Bayesian Computation algorithms and we simply refine the procedure laid out in the introduction to get an
approximation of the posterior distribution π(θ|D).
We start by setting a prior distribution π(θ) over the parameter space that we sequentially improve through
intermediate distributions characterized by a sequence of tolerance levels (ϵg )g≥0 that decrease gradually as
$\infty = \epsilon_0 > \epsilon_1 > \epsilon_2 > \ldots > 0$. Each intermediate distribution (called a generation and denoted by g) is represented by a cloud of weighted particles $\left(\theta_j^{g}, w_j^{g}\right)_{j=1,\ldots,J}$. We approximate each intermediate posterior distribution using
a multivariate kernel density estimator (kde) denoted by πϵg (θ|D). The parameters of the algorithm are the
number of generations G, the population size J (the number of particles in the cloud), and the number of
Monte Carlo replications R of X.
The algorithm is initialized by setting ϵ0 = ∞ and πϵ0 (θ|D) = π(θ). For generation g ≥ 1, we hold an intermedi-
ate distribution πϵg−1 (θ|D) from which we can sample particles θ ∗ ∼ πϵg−1 (θ|D). We compute the associated
pure premiums
$$p_i^{\theta^*} = \mathbb{E}_{\theta^*}\left[g_i(X)\right], \quad \text{for } i = 1, \ldots, n.$$
The pure premiums are computed via Monte Carlo simulations. The accuracy depends on the number R of
copies of X involved in the Monte Carlo estimations. We then fit the isotonic regression model
$$\tilde{p}_i = f\left(p_i^{\theta^*}\right) + e_i, \quad \text{for } i = 1, \ldots, n,$$
where ei is an error term that captures the mismatch between the true value of the pure premium and
its empirical counterpart estimated by the competitor insurance company using its historical data and the
company-specific loading function. We further compare the observed commercial premiums to the model-
generated ones via the distance defined in Section 3.2 with
$$d\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta^*}\right)\right] = \mathrm{RMSE}\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta^*}\right)\right] + \mathrm{Reg}_{\mathrm{low}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta^*}\right) + \mathrm{Reg}_{\mathrm{high}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta^*}\right).$$
If the distance satisfies $d\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta^*}\right)\right] < \epsilon_{g-1}$, then we keep the associated particle θ*. New particles are proposed until we reach J accepted particles, denoted by $\theta_1^{g}, \ldots, \theta_J^{g}$. We also store the distances $d_1^{g}, \ldots, d_J^{g}$. We need to set the next tolerance threshold $\epsilon_g$, which is used to calculate the particle weights
$$w_j^{g} \propto \frac{\pi(\theta_j^{g})}{\pi_{\epsilon_{g-1}}(\theta_j^{g} \mid \mathcal{D})}\, \mathbb{I}_{d_j^{g} < \epsilon_g}, \quad j = 1, \ldots, J.$$
The tolerance threshold is chosen so as to maintain a specified effective sample size (ess) of J/2, as in Del Moral et al. [8]. Following Kong et al. [14], the ess is estimated by $1/\sum_{j=1}^{J} (w_j^{g})^2$. This weighted sample then allows us to update the intermediate distribution as
$$\pi_{\epsilon_g}(\theta \mid \mathcal{D}) = \sum_{j=1}^{J} w_j^{g} K_H\left(\theta - \theta_j^{g}\right),$$
where KH is a multivariate kde with smoothing matrix H. A common choice for the kde is the multivariate
Gaussian kernel with a smoothing matrix set to twice the empirical covariance matrix of the cloud of particles
$\{\theta_j^{g}, w_j^{g}\}_{j=1,\ldots,J}$, as in Beaumont et al. [4]. The procedure is summarized in Algorithm 1.
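In code, drawing a proposal θ* from $\pi_{\epsilon_{g-1}}(\theta \mid \mathcal{D})$ amounts to resampling a particle according to its weight and perturbing it with the Gaussian kernel. The R sketch below follows this recipe using MASS::mvrnorm and a smoothing matrix equal to twice the weighted empirical covariance; the particle cloud shown is a placeholder.

```r
library(MASS)  # provides mvrnorm

# Draw one proposal theta* from the weighted kernel density estimate: resample a
# particle with probability proportional to its weight, then perturb it with a
# Gaussian kernel whose covariance is twice the weighted empirical covariance of
# the cloud (Beaumont et al. [4]).
sample_from_kde <- function(particles, weights) {
  H <- 2 * cov.wt(particles, wt = weights)$cov
  j <- sample.int(nrow(particles), size = 1, prob = weights)
  mvrnorm(1, mu = particles[j, ], Sigma = H)
}

# Illustrative particle cloud for a two-dimensional parameter (lambda, sigma).
set.seed(123)
particles <- cbind(lambda = runif(1000, 0, 5), sigma = runif(1000, 0, 2))
weights   <- rep(1 / 1000, 1000)
sample_from_kde(particles, weights)
```

Proposals landing outside the support of the prior receive a zero weight through the $\pi(\theta_j^{g})$ term of the weight update and therefore do not contribute to the next generation.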
The user must configure several aspects of the algorithm. The prior assumptions π(θ) determine the parameter
space that will be searched. The loss ratio corridor [LRlow , LRhigh ] sets up the two regularization terms,
ensuring that parameters associated with unreasonable pure premiums are excluded. The prior settings and
loss ratio corridor can be guided by expert opinion. The population size J drives the quality of the posterior
distribution approximation through the cloud of particles. A large J also enhances the chances of finding
global optima, as more particles improve the coverage of the parameter space. A greater number R of Monte
Carlo simulations ensures the accuracy of the pure premium evaluation. Both R and J contribute to the stability
of the algorithm’s results over several runs. The number of generations G relates to the tolerance level ϵ, which
in turn drives the narrowness of the posterior distribution output by the ABC algorithm. As one would expect,
Algorithm 1 Population Monte Carlo Approximate Bayesian Computation
1: set $\epsilon_0 = \infty$ and $\pi_{\epsilon_0}(\theta \mid \mathcal{D}) = \pi(\theta)$
2: for g = 1 → G do
3:   for j = 1 → J do
4:     repeat
5:       generate $\theta^* \sim \pi_{\epsilon_{g-1}}(\theta \mid \mathcal{D})$
6:       compute $p_i^{\theta^*} = \mathbb{E}_{\theta^*}[g_i(X)]$, for i = 1, . . . , n
7:       fit the isotonic regression model $\tilde{p}_i = f(p_i^{\theta^*}) + e_i$, for i = 1, . . . , n
8:       compute $d\left[\tilde{p}_{1:n}, f(p_{1:n}^{\theta^*})\right] = \mathrm{RMSE}\left[\tilde{p}_{1:n}, f(p_{1:n}^{\theta^*})\right] + \mathrm{Reg}_{\mathrm{low}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta^*}\right) + \mathrm{Reg}_{\mathrm{high}}\left(\tilde{p}_{1:n}, p_{1:n}^{\theta^*}\right)$
9:     until $d\left[\tilde{p}_{1:n}, f(p_{1:n}^{\theta^*})\right] < \epsilon_{g-1}$
10:    set $\theta_j^{g} = \theta^*$ and $d_j^{g} = d^*$
11:  end for
12:  find $\epsilon_g \leq \epsilon_{g-1}$ so that $\widehat{\mathrm{ess}} = \left(\sum_{j=1}^{J} (w_j^{g})^2\right)^{-1} \approx J/2$, where
       $$w_j^{g} \propto \frac{\pi(\theta_j^{g})}{\pi_{\epsilon_{g-1}}(\theta_j^{g} \mid \mathcal{D})}\, \mathbb{I}_{d_j^{g} < \epsilon_g}, \quad j = 1, \ldots, J$$
13:  compute $\pi_{\epsilon_g}(\theta \mid \mathcal{D}) = \sum_{j=1}^{J} w_j^{g} K_H(\theta - \theta_j^{g})$
14: end for
the computational time for the algorithm increases with higher values of R, J and G. Therefore, the choice of
suitable values for G, J, and R can be made in consideration of a predetermined computational time budget. A
practical solution to set G is to stop the algorithm whenever the difference between two consecutive tolerance
levels is lower than some threshold ∆ϵ or if we reach a minimum tolerance level ϵmin . Convergence results for
ABC algorithms are readily available in the literature, as discussed in Remark 3.3.
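The one step of Algorithm 1 that is not a direct transcription of earlier formulas is line 12, the adaptive choice of $\epsilon_g$. A possible R sketch is given below: it scans the accepted distances from the largest downwards and keeps the smallest candidate tolerance whose induced weights still have an effective sample size of at least the target J/2. The density functions prior_dens and kde_dens are placeholders for π(θ) and $\pi_{\epsilon_{g-1}}(\theta \mid \mathcal{D})$ evaluated at the particles.

```r
# Adaptive tolerance selection (Algorithm 1, line 12): pick eps_g <= eps_{g-1}
# so that the effective sample size of the updated weights stays close to J / 2.
next_tolerance <- function(theta, dists, eps_prev, prior_dens, kde_dens,
                           target = length(dists) / 2) {
  ratio <- prior_dens(theta) / kde_dens(theta)   # pi(theta_j) / pi_{eps_{g-1}}(theta_j | D)
  ess_at <- function(eps) {
    w <- ratio * (dists < eps)                   # indicator I(d_j < eps)
    if (sum(w) == 0) return(0)
    w <- w / sum(w)
    1 / sum(w^2)
  }
  # Candidate tolerances: accepted distances scanned from the largest downwards.
  candidates <- sort(unique(dists[dists < eps_prev]), decreasing = TRUE)
  eps_g <- eps_prev
  for (eps in candidates) {
    if (ess_at(eps) < target) break              # shrinking further would drop the ess below target
    eps_g <- eps
  }
  w <- ratio * (dists < eps_g)
  list(eps = eps_g, weights = w / sum(w))
}
```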
Remark 3.3. The final output of our ABC algorithm is the following expression:
$$\pi_{\epsilon_G}(\theta \mid \mathcal{D}) = \sum_{j=1}^{J} w_j^{G} K_H\left(\theta - \theta_j^{G}\right), \qquad (9)$$
which represents an approximate posterior distribution of θ given an iid sample x1:R of X. The data D can be
interpreted as a set of summary statistics calculated based on a sample x1:R of size R, which corresponds to the number
of Monte Carlo replications used to calculate the pure premium. Several types of convergence can be studied:
1. R → ∞
2. ϵG → 0
3. J → ∞
Convergence (1) determines the accuracy of $d\left[\tilde{p}_{1:n}, f\left(p_{1:n}^{\theta^*}\right)\right]$ when using simulation to evaluate the pure premiums $p_{1:n}^{\theta}$. These are Monte Carlo estimators of $\mathbb{E}_\theta[g_i(X)]$ for i = 1, . . . , n, and convergence occurs at a rate proportional to $1/\sqrt{R}$.
Regarding convergence (2), as ϵG decreases toward 0, our ABC estimator converges to the distribution of θ conditional
on D. This is equivalent to the true posterior distribution only if D consists of sufficient statistics for the risk model
X. Rubio and Johansen [19, Proposition 2] justify the use of non-sufficient statistics, under certain conditions, to
make inferences about θ. Their results remain valid when using a kernel density estimator such as (9) to represent the
posterior distribution. The bias of the ABC posterior has been estimated to be $O(\epsilon_G^{2})$ in Barber et al. [2].
Convergence (3) pertains to the empirical probability measure approaching the true probability measure. Central
limit theorems for this convergence are discussed in Del Moral et al. [7] and Del Moral et al. [9].
We illustrate the posterior distribution evolution along the algorithm iterations in Example 6.
Example 6. The risk is assumed to follow X ∼ Poisson(λ) − LogNorm(µ = 0, σ). The prior assumptions are as follows.
The algorithm halts at the 9th generation, reaching a tolerance level of ϵ = 0.87. Figure 6 shows the sequence of intermediate posterior distributions for λ and σ.
(a) $p_{\epsilon_g}(\lambda \mid \mathcal{D})$, g = 1, . . . , 9. (b) $p_{\epsilon_g}(\sigma \mid \mathcal{D})$, g = 1, . . . , 9.
Figure 6: Sequence of intermediate posterior distributions of λ and σ over the generations of the ABC algorithm.
After the algorithm terminates, it is customary to focus on the last generations of particles for inference.
Pointwise estimators are derived from this final set of particles. Two commonly used estimators include the
Mean A Posteriori (map) obtained by averaging the particles in the last cloud and the Mode A Posteriori (mode),
which is the mode of the empirical distribution within the final cloud of particles. The simulation study,
conducted in the following section, is designed to investigate the convergence behavior and to compare the
characteristics of the map and mode estimators.
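Given the final cloud of weighted particles, the two point estimators can be computed along the following lines in R. Locating the mode by maximizing a coordinate-wise kernel density estimate is one possible implementation choice made for this sketch, not necessarily the one used in IsoPriceR.

```r
# Point estimators from the final cloud of weighted particles.
# particles: J x d matrix (one column per parameter); weights: normalized weights.
map_estimator <- function(particles, weights) {
  colSums(particles * weights)                  # weighted posterior mean
}

mode_estimator <- function(particles, weights) {
  # Coordinate-wise mode of a weighted kernel density estimate.
  apply(particles, 2, function(x) {
    dens <- density(x, weights = weights)
    dens$x[which.max(dens$y)]
  })
}

# Illustrative cloud for (lambda, mu) with equal weights.
set.seed(1)
cloud   <- cbind(lambda = rnorm(1000, 0.3, 0.05), mu = rnorm(1000, 6, 0.2))
weights <- rep(1 / 1000, 1000)
map_estimator(cloud, weights)
mode_estimator(cloud, weights)
```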
4 Methodology Assessment via Simulation
In this section, we embark on an empirical exploration, seeking to understand how the posterior distribution
of the parameters behaves as the sample size n increases. This experimentation has been designed to resemble
as much as possible the real data situation considered in Section 5.3. We consider the risk, within a particular
risk class, to be distributed as the random variable
$$X = \sum_{k=1}^{N} U_k,$$
where
N ∼ Poisson(λ = 0.3), (10)
and
Uk ∼ LogNorm(µ = 6, σ = 1), k = 1, . . . , N . (11)
The $U_k$'s are iid and independent from N. We suppose that we know the variance parameter σ and we try
to draw inference on λ and µ. The parameter values of the claim frequency and severity in (10) and (11),
respectively, are those inferred in Section 5.3 for the Poisson − LogNorm model using the mode estimator. The
prior distributions are set to independent uniforms for λ and µ as
We generate synthetic commercial premiums for this case study by applying random loadings η_i to the pure premiums, where the premium parameters r, d and l are sampled from those of the real data considered in Section 5, so that the simulated data are as close as possible to the real data. The η_i's are iid from η_i ∼ Unif([1.43, 2.5]), which corresponds to loss ratios between 40% and 70%. We further set LR_low = 40% and LR_high = 70%. We consider samples of sizes 25, 50, 100, and 200. We configure the algorithm with a population size of J = 1,000 and use R = 2,000 Monte Carlo replications. To ensure the algorithm's efficiency, we set a stopping threshold, requiring that the difference between two consecutive tolerance levels is smaller than ∆ϵ = 1 for the algorithm to halt. These settings are kept for the analysis of real-world data, as they strike a balanced compromise between accuracy and computing time. We generate 100 synthetic samples and apply our procedure. Our goal is to compare the results obtained using our two pointwise estimators: the mean a posteriori (map) and the mode a posteriori (mode). The estimators of the parameters λ and µ are given in Figure 7.
Figure 7: map and mode estimators of the parameters of the model Poisson(λ = 0.3) − LogNorm(µ = 6, σ = 1) based on synthetic market data of sizes 25, 50, 100, and 200.
Both of the point-wise estimators seem to converge toward the parameter values that generated the data. The
map exhibits a better behavior than the mode as its variability decreases in a notable way as the sample size
increases.
In Figure 8, we present a comparison of key metrics, including the average claim amount, the average claim
frequency, the probability of no reported claims, the average total claim amount, and the average loss ratio, defined as
$$\mathrm{LR} = \frac{1}{n} \sum_{i=1}^{n} \frac{p_i}{\tilde{p}_i}.$$
Both estimation methods yield satisfactory results in recovering the characteristics of the loss distribution but
the use of the map yields more reliable estimations.
(a) Claim Amount (b) Claim Frequency
(c) Posterior distribution of P(N = 0) (d) Posterior distribution of E(X) (e) Posterior distribution of LR
Figure 8: map and mode estimator of the features of the Poisson(λ = 0.3) − LogNorm(µ = 6, σ = 1) loss model
based on synthetic market data of sizes 25, 50, 100, and 200.
Pet insurance is a product designed to cover the costs of veterinary care for pets. It operates on a similar
principle to human health insurance, providing a way for pet owners to manage the financial risks associated
with unexpected medical expenses for their animals. Usually the expenses are covered in case of an accident
or a disease. Pet owners can choose from different policy options based on their budget and coverage needs.
Policies may vary in terms of deductibles (d), coverage limits (l), and coverage rates (r). The cost of premiums
can depend on various factors, including the pet’s age, breed, health condition, and the level of coverage
selected.
The pet insurance market has been witnessing significant growth globally, driven by increasing pet ownership
(especially with so-called pandemic pets, i.e. animals adopted during 2020 lockdowns), rising veterinary costs
and the changing role that a pet plays in a family's social structure. This latter factor is also influenced by
changing societal views and the increased awareness of the importance of the health and welfare of pets, which
in turn comes with increased consideration of regular veterinary health checks. In order to offset the cost
associated with such expenditures, households have shown a broader interest in purchasing pet insurance.
To date, the adoption and acceptance of pet insurance still varies significantly across regions of the world.
Nordic countries, such as Sweden, have historically had a very high penetration rate with around 70% of
pets insured. Other countries, such as the UK and Germany, have seen significant growth in the pet
insurance market during the last decades, reaching a penetration rate of around 30%. Other developed countries, like
France, have significantly lower market sizes, with less than 10% of pets insured, which suggests high
growth potential. The marketplace for pet insurance in the USA is currently also experiencing sustained
growth. According to a MarketWatch nationwide survey (https://www.marketwatch.com/guides/pet-insurance/pet-insurance-facts-and-statistics/), about 44.6% of pet owners stated they currently have
pet insurance. The North American Pet Health Insurance Association (NAPHIA) conducted a survey for its
2022 State of the Industry Report and found that over 4.41 million pets were insured in North America in 2021,
up from 3.45 million in 2020. The report also revealed that pet insurance premiums totaled $2.84 billion in
2021, marking a 30.5% increase from the previous year.
This growth continues to spur increases in the capital investments associated with such an insurance line of
business:
• in Sweden, Lassie has raised 11m euros in 2022 and 23m euros in 2023;
• in the UK, ManyPets has raised $350m at a valuation higher than $2bn in 2021;
• JAB Holding Company has invested around 2 billion dollars in 2021 to create the Pinnacle Pet Group and Independence Pet Holdings, whose purpose is to become the pet insurance leaders in Europe and North America, respectively, through multiple acquisitions of historic players.
Hence, the pet insurance market is becoming more competitive with an increasing number of insurance
companies or brokers offering pet insurance policies. To capture new market share as a new agent, it is essential
to offer differentiated products, such as coverage combinations with no deductibles and higher limits.
During the week of the 18th of May 2024, we collected 1,080 quotes from 5 insurance companies. Each row
of our datasets corresponds to a yearly premium collected from an insurance company website, associated
with a specific insurance coverage and a specific dog. We therefore find the coverage parameters, which are the
coverage rate r, the deductible d and the limit l. Recall that the compensation for an annual expense of amount
X is calculated as min[max(r · X − d, 0), l]. We also have the rating factors, which reduce for pet insurance in
France to species, breed, age and gender. Table 1 provides a list of the variables in the datasets.
This means that we have 90 quotes to study each risk class. Figure 9 provides a visual overview of the range of
insurance coverage options available in the pet insurance market.
Risk class #   Species   Breed                 Gender   Age
1              dog       australian shepherd   female   4 months
2              dog       australian shepherd   female   2 years
3              dog       australian shepherd   female   4 years
4              dog       french bulldog        female   4 months
5              dog       french bulldog        female   2 years
6              dog       french bulldog        female   4 years
7              dog       german shepherd       female   4 months
8              dog       german shepherd       female   2 years
9              dog       german shepherd       female   4 years
10             dog       golden-retriever      female   4 months
11             dog       golden-retriever      female   2 years
12             dog       golden-retriever      female   4 years
Table 3: List of the risk classes under study.
(a) Rates of coverage of the insurance policies (b) Deductibles of the insurance policies (c) Limits of the insurance policies
Figure 9: Overview of the insurance coverages offered by the five insurance companies operating in the French
market under study.
Remark 5.1. This analysis assumes that all insurance companies cover the same type of risk. While this assumption
may not always hold in general insurance markets, it was valid for the French pet insurance market in May 2024, when
the data for this study was collected. At that time, most insurers offered very similar coverage, with nearly identical
exclusion clauses in their contracts. Only one outlier, a company that imposed sub-limits on specific procedures,
was identified and excluded from this study. This homogeneity reflects the nascent stage of the French pet insurance
market, where insurers tend to adhere to similar guidelines, and product innovation and differentiation have yet to
emerge.
We conduct two separate studies. In Section 5.3, we focus on a specific risk class associated with a 4-year-old
female Australian Shepherd. Several claim models are compared, and one is selected to look into the pricing strategies
of the market participants. In Section 5.4, we look into the quotes of various risk classes, which we investigate using a single
model.
We study 90 quotes from 5 insurers operating in the pet insurance market for a specific risk class associated
with a 4-year-old female Australian Shepherd. We need to make some parametric assumptions to model claim
frequency and severity. Classical claim frequency distributions include the Poisson, Binomial, and Negative
Binomial distributions, which allow us to accommodate equidispersion, underdispersion, and overdispersion,
respectively. For claim severity, we have chosen the Gamma and Lognormal distributions. The Gamma
distribution is a common choice for modeling claim severity when using generalized linear models, but it is
characterized by a light tail. The Lognormal distribution has a thicker tail, making larger claim sizes more
likely to occur. We limit ourselves to two-parameter models: one parameter for claim frequency and another
for claim severity. So we take the Poisson(λ) distribution to model the claim frequency (the IsoPriceR package also accommodates the binomial and negative binomial distributions), with the following prior setting:
We consider three claim severity distributions including LogNorm(µ = 0, σ ), LogNorm(µ, σ = 1), and Gamma(α, β =
1). The prior settings over the parameters of the claim size distributions are as follows:
Combining the distributions for the claim frequency and severities results in a total of 3 loss models. The
population size in the abc algorithm is set to J = 1,000. The pure premiums are computed using R = 2,000
Monte Carlo replications. The algorithm stops whenever the difference between two consecutive tolerance
levels is lower than ∆ϵ = 1. The bounds for the loss ratio corridor are set to LRlow = 40% and LRhigh = 70%.
The posterior distributions of the parameters for each model are provided in Figure 10.
Figure 10: Posterior distribution of the parameters of the loss models when fitted to the pet insurance dataset.
For all the models, the algorithm updates the prior distribution in an informative way. Table 4 provides the
tolerance levels (ranked in increasing order) during the last iteration of the abc algorithm for the loss models.
Table 4: Tolerance level during the last iteration of the abc algorithm for each loss model.
The final tolerance levels lie between 93.72 and 113.15, which is higher than the tolerances obtained in the
simulation study, which were around 33 for 50 data points and 50 for 200 data points. This discrepancy indicates
misspecification, which stems from our assumptions that insurance companies adhere to the expectation
principle for premium calculation and from the models employed for claim frequency and claim amounts. Table 5
reports the estimates of the parameters of all the models using the map and the mode.
Table 6 reports the estimates of the average total claim amounts and the average loss ratio for all the models
when fitted using the map and the mode.
We note that the risk level, characterized here by the expected total claim amount, is similar for all the models,
maybe a bit higher for the model having the LogNorm(µ = 0, σ) as claim size distribution. We further look
into the loading function approximated via the isotonic regression. We estimate the pure premium for each
model using the map as an estimator of the model parameters and we plot the isotonic regression function, shown in Figure 11.
Table 5: map and mode estimators of the parameters of the loss models.

Table 6: map and mode estimators of the average loss ratio and average total claim amounts.
Figure 11: Isotonic link between pure and commercial premium for the different loss models.
The isotonic fits of the loading function across all the models are similar, which means that the models all
agree on a common ordering of the pure premiums of the various insurance coverages. To highlight the
explanatory power of our methodology, let us focus on the Poisson(λ) − LogNorm(µ, σ = 1) loss model. Note that
the choice of the loss model is somewhat arbitrary because the information extracted from the data in Figure 11
is relatively consistent across the considered models. In Figure 12, we present a plot that illustrates the
relationship between the commercial premium and the pure premium for the Poisson(λ) − LogNorm(µ, σ = 1)
model. Different insurance companies are indicated by distinct colors, providing a visual representation of
each company’s respective rates.
Figure 12: Commercial premium as a function of the pure premium for the Poisson(λ) − LogNorm(µ, σ = 1)
depending on the insurance carrier.
The accuracy of the loss model fitting enables us to condense the three-dimensional information of the rate
of coverage, deductible, and limit into a single metric: the pure premium. Subsequently, isotonic regression
unveils the relationship between commercial and pure premiums, providing a link between the two. The
distinctions among various players in the pet insurance market come to light through the color-coded points,
offering insights into the pricing strategies adopted by industry participants.
The Poisson(λ) − LogNorm(µ, σ = 1) model is fitted to the data within each risk class (90 quotes) of Table 3.
The prior settings are given by
The algorithm's hyperparameters are similar to those of the previous subsection, with
Figure 13 shows the posterior predictive distribution of the expected total claim amounts and the averaged
loss ratio within each risk class.
(a) Posterior predictive distribution of E(X) (b) Posterior predictive distribution of the average loss ratio
Figure 13: Posterior predictive distribution of E(X) and average loss ratio within each risk class.
Figure 13a allows us to compare the different risk classes. We note that an older dog is more expensive on
average and that the breeds may be ordered as Australian Shepherd, Golden Retriever, German Shepherd and
French Bulldog in terms of riskiness. Figure 13b indicates that the loss ratios are around 62–65% for all the
risk classes.
6 Conclusion
We have developed a robust methodology for risk assessment based on market data for pet insurance. We
employ a one-parameter model for the claim frequency and claim size distribution, connecting the pure
premium to the commercial premium through an isotonic regression model. This approach optimizes the
alignment between commercial and pure premiums while providing a framework for quantifying the associated
parameter uncertainty through an Approximate Bayesian Computation algorithm.
The methodology’s effectiveness and reliability have been validated within a simulation study and a practical
application to an actual pet insurance dataset. This methodology is made accessible to the community through
our R package, IsoPriceR (see the market_based_insurance_ratemaking GitHub repository).
While this paper focuses on the specific context of pet insurance—a market that has experienced recent
growth and significant capital investments—we believe that our methodology can be extended, with minor
modifications, to other insurance products with straightforward compensation schemes. For example, this
approach could be applied to unemployment benefits insurance, where the compensated risk would be
redefined as follows: x represents the number of days of actual unemployment, r the daily compensation
amount, while d and l continue to denote the deductible and guarantee limit, respectively.
However, our methodology may face limitations when applied to more complex insurance products and pricing
structures, such as those commonly used in home or vehicle insurance. These products often rely on generalized
linear models with high-dimensional covariates for tariff generation. In contrast, our research focuses on
identifying only two parameters within a specific risk class. The latter may be based on several categorical
variables. Commercial quotes will be needed within each risk class, making the data collection impossible
without proper automation of the process. Extending the problem to a high-dimensional parameter space,
such as 30 dimensions, would present significant challenges. Further research is needed to assess the feasibility
of applying our core concept—leveraging commercial quotes to retro-engineer risk components—in such
settings.
While we observe empirical convergence in our specific case, this work does not provide theoretical guarantees
of convergence or an estimation of the data volume required to achieve it.
Finally, this paper primarily addresses risk assessment prior to the launch of a pet insurance product. Once
launched, the company will begin collecting individual-level data. To improve pricing accuracy, the integration
of this historical data should be considered as it becomes available. Consequently, a promising avenue for
future research lies in developing a credibility framework that combines historical and market data, offering a
comprehensive approach to risk assessment and pricing in emerging markets.
Acknowledgements
Pierre-O’s work is conducted within the Research Chair DIALOG under the aegis of the Risk Foundation, an
initiative by CNP Assurances. His research is also supported by the ANR project DREAMES.
References
[1] Katrien Antonio and Emiliano A. Valdez. Statistical concepts of a priori and a posteriori risk classification in
insurance. AStA Advances in Statistical Analysis, 96(2):187–224, February 2011. ISSN 1863-818X. doi: 10.1007/
s10182-011-0152-7.
[2] Stuart Barber, Jochen Voss, and Mark Webster. The rate of convergence for approximate bayesian computation.
Electronic Journal of Statistics, 9(1), January 2015. ISSN 1935-7524. doi: 10.1214/15-ejs988.
[3] Richard E. Barlow, H. D. Brunk, Daniel J. Bartholomew, and James M. Bremner. Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley, 1972.
[4] Mark A Beaumont, Jean-Marie Cornuet, Jean-Michel Marin, and Christian P Robert. Adaptive approximate Bayesian
computation. Biometrika, 96(4):983–990, 2009.
[5] Christopher Blier-Wong, Hélène Cossette, Luc Lamontagne, and Etienne Marceau. Machine learning in p&c insurance:
A review for pricing and reserving. Risks, 9(1):4, dec 2020. doi: 10.3390/risks9010004.
[6] Thomas A Dean, Sumeetpal S Singh, Ajay Jasra, and Gareth W Peters. Parameter estimation for hidden markov
models with intractable likelihoods. Scandinavian Journal of Statistics, 41(4):970–987, 2014.
[7] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential monte carlo samplers. Journal of the Royal Statistical Society
Series B: Statistical Methodology, 68(3):411–436, May 2006. ISSN 1467-9868. doi: 10.1111/j.1467-9868.2006.00553.x.
[8] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. An adaptive sequential Monte Carlo method for approximate
Bayesian computation. Statistics and Computing, 22(5):1009–1020, 2012.
[9] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. On adaptive resampling strategies for sequential monte carlo
methods. Bernoulli, 18(1), February 2012. ISSN 1350-7265. doi: 10.3150/10-bej335.
[10] D. Dickson. Principles of premium calculation. In Insurance Risk and Ruin, pages 38–51. Cambridge University Press,
jan 2005. doi: 10.1017/cbo9780511624155.004.
[11] Pierre-Olivier Goffard and Patrick J. Laub. Approximate bayesian computations to fit and compare insurance loss
models. Insurance: Mathematics and Economics, 100:350–371, sep 2021. doi: 10.1016/j.insmatheco.2021.06.002.
[12] C. Gourieroux, A. Monfort, and E. Renault. Indirect inference. Journal of Applied Econometrics, 8(S1):S85–S118, dec
1993. doi: 10.1002/jae.3950080507.
[13] Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029,
July 1982. ISSN 0012-9682. doi: 10.2307/1912775.
[14] Augustine Kong, Jun S. Liu, and Wing Hung Wong. Sequential imputations and Bayesian missing data problems.
Journal of the American Statistical Association, 89(425):278–288, mar 1994. doi: 10.1080/01621459.1994.10476469.
[15] Ronny Luss and Saharon Rosset. Generalized isotonic regression. Journal of Computational and Graphical Statistics, 23
(1):192–210, 2014.
[16] Gareth W. Peters and Scott A. Sisson. Bayesian inference, Monte Carlo sampling and operational risk. Journal of Operational Risk, 1(3), 2006.
[17] Gareth W Peters, Mario V Wüthrich, and Pavel V Shevchenko. Chain ladder method: Bayesian bootstrap versus
classical bootstrap. Insurance: Mathematics and Economics, 47(1):36–51, 2010.
[18] Arthur E. Renshaw. Modelling the claims process in the presence of covariates. ASTIN Bulletin, 24(2):265–285, 1994.
doi: 10.2143/AST.24.2.2005070.
[19] FJ Rubio and Adam M Johansen. A simple approach to maximum intractable likelihood estimation. Electronic Journal
of Statistics, 7:1632–1654, 2013.
[20] S. Sasabuchi, M. Inutsuka, and D. D. S. Kulatunga. A multivariate version of isotonic regression. Biometrika, 70(2):465–472, 1983. ISSN 1464-3510. doi: 10.1093/biomet/70.2.465.
[21] Scott A Sisson, Yanan Fan, and Mark Beaumont. Handbook of Approximate Bayesian Computation. Chapman and
Hall/CRC, 2018.
[22] Mario V. Wüthrich and Johanna Ziegel. Isotonic recalibration under a low signal-to-noise ratio, 2023.
where
ai ∼ Unif((5, 10]), bi ∼ Unif((2, 6]), and c = 2 for i = 1, . . . , n.
This is a Gompertz growth curve type of link. The commercial premiums as a function of the pure premiums are shown in Figure 14.
We further compare the residuals of the isotonic regression model fitted to the data of Figures 4 and 14 to those
of a linear regression model fitted to the same data in Figure 15. We note the proximity of the two models
Figure 14: Isotonic link between the pure and commercial premiums.
Figure 15: Boxplot of the residuals of the linear and isotonic regression models for a linear and a Gompertz
type link between pure and commercial premiums.
when the link between the pure and commercial premium is linear. When the link is not linear, the isotonic
regression model outperforms linear regression.
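The comparison can be reproduced with the sketch below. The Gompertz link is written here as f(p) = a·exp(−b·exp(−c·p)), which is an assumed parameterization of the growth curve (the exact form used in the appendix is not restated), with a_i, b_i and c drawn as described above and the pure premiums taken as placeholders.

```r
set.seed(2024)
n <- 100
p <- runif(n, 1, 5)                         # pure premiums, placeholders

# Gompertz-type link between pure and commercial premiums; the functional form
# f(p) = a * exp(-b * exp(-c * p)) is an assumption of this sketch.
a  <- runif(n, 5, 10)
b  <- runif(n, 2, 6)
c0 <- 2
p_tilde <- a * exp(-b * exp(-c0 * p))

# Residuals of a simple linear regression.
res_lin <- residuals(lm(p_tilde ~ p))

# Residuals of the isotonic regression (fitted on data sorted by pure premium).
ord     <- order(p)
iso     <- isoreg(p[ord], p_tilde[ord])
res_iso <- p_tilde[ord] - iso$yf

boxplot(list(Isotonic = res_iso, Linear = res_lin), ylab = "Residuals")
```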
B Other premium principles
This paper focuses on the expectation premium principle, as we try to inform the link f between the commercial premium $\tilde{p}$ and the pure premium p = E[g(X)]. Other premium principles such as the standard deviation principle can be considered by slightly adapting the method. Under such a principle we have
$$\tilde{p} = f\left(\mathbb{E}[g(X)], \sqrt{\mathbb{V}[g(X)]}\right), \qquad (14)$$
where $f : \mathbb{R}_+ \times \mathbb{R}_+ \mapsto \mathbb{R}_+$. The same methodology applies; we simply need a bivariate model for f. The
commercial premium should be increasing whenever the pure premium or the variance of the risk increases,
which leads us to consider generalizations of the univariate isotonic regression model, which are readily available
in the literature; see the work of Sasabuchi et al. [20]. More sophisticated premium principles such as the
Esscher principle or the utility indifference principle are also possible. Premium principles are described at
length in actuarial science textbooks such as Dickson [10]. Choosing one premium principle over another
may lead to model misspecification and will impact the final estimates of the underlying risk.