
Using machine learning to model claims experience

and reporting delays for pricing and reserving


By L Rossouw and R Richman

Presented at the Actuarial Society of South Africa’s 2019 Convention


22–23 October 2019, Sandton Convention Centre

ABSTRACT
In this paper we review existing modelling approaches for analysing claims experience in the
presence of reporting delays, reviewing the formulation of mortality incidence models such as
GLMs. We then show how these approaches have traditionally been adjusted for late reporting
of claims using either the IBNR approach or the more recent EBNER approach. We then go
on to introduce a new model formulation that combines a model for late reported claims
with a model for mortality incidence into a single model formulation. We then illustrate the
use and performance of the traditional and the combined model formulations on data from
a multinational reinsurer. We show how GLMs, lasso regression, gradient boosted trees and
deep learning can be applied to the new formulation to produce results of superior accuracy
compared to the traditional approaches.

KEYWORDS
Machine learning; IBNR; incurred but not reported; experience analysis; reinsurers; EBNER;
analytics; gradient boosted trees; deep learning; mortality models; pricing and reserving

CONTACT DETAILS
Mr Louis Rossouw, Gen Re, Cape Town; Email: [email protected]
Mr Ronald Richman, QED, Johannesburg; Email: [email protected]

ACTUARIAL SOCIETY 2019 CONVENTION, SANDTON, 22–23 OCTOBER 2019 | 1

Electronic copy available at: https://ssrn.com/abstract=3465424



1. INTRODUCTION
A fundamental part of the actuarial management of a life insurance portfolio is
setting and updating mortality rates, morbidity rates and other assumptions, to price
the products sold within the portfolio and to value the cashflows resulting from the
portfolio. In much of actuarial practice, these assumptions are set in a relatively simple
manner, by considering the ratio of the actual claims occurring in a portfolio to the
expected claims (derived from a standard table) and adjusting the rates accordingly
(although more technically advanced methods to derive these assumptions have been
proposed, see, for example Tomas and Planchet (2014)). As part of deriving these rates,
an allowance for Incurred but not Reported (IBNR) claims must be made, especially
for the most recent years of experience, or else the rates might be biased too low. This
is normally done in a relatively heuristic manner, by first deriving an estimate of the
IBNR claims, and then applying a correction to the observed deaths.
In this paper, we consider how the assumption setting process can be enhanced,
firstly by combining the models used for estimating rates together with the model
used for estimating IBNR claims, and secondly, by applying several machine and deep
learning techniques, which we show enhance the predictive performance beyond what
is achievable using traditional models. Since we include adjustments for IBNR claims
within a single model explicitly, we make an allowance for these claims in a statistically
principled manner. Thus, we illustrate how one can derive mortality and morbidity
rates on a large-scale, and apply this new technique to a large real-world dataset of
claims from two countries and four different companies.
The rest of the paper is organised as follows. In Section 2, we provide background
on the models used by actuaries to estimate mortality and morbidity rates, as well as on
the IBNR models used within this process. In Section 3, we define the combined model
and point out similarities between this model, and the traditional models previously
described. Section 4 describes the data used to illustrate the models, and Section 5
provides details on the methodology followed to train the models. In Section 6, we
illustrate the results produced and Section 7 is a discussion of these results. Finally, in
Section 8, we conclude and discuss avenues for future research.

2. BACKGROUND AND DEFINITIONS


2.1. Approaches to modelling claims experience
Actuaries usually model claims experience as follows:
mx = θx / Ex

Where mx is the central rate of mortality, θx is the number of deaths, and Ex is the
central exposed to risk. This is the Poisson model of mortality with θx having a Poisson
distribution. x is usually thought of as representing the age of the individual but, in
many cases, could represent all the various rating factors being considered, including
say, age, gender, smoker status etc. We will use x to represent a vector of variables
including age:
x = (x0, x1, …, xn)

with the convention that we set x0=1 to allow for an intercept term.
Based on the above formulation, actuaries have developed various models for
mortality, based on x. In this case, we represent the models by f:

mx ≈ m̂x = f(x)

In the case where x is age only, for example, we have the “simple” Gompertz–Makeham
formula:

mx = a·e^(b(x−M)) + c

with a, b, c and M estimated using data.


In the case of generalised linear modelling (GLM) with a log link function, the
formula becomes:
mx = e^(Σi xi βi)

Or more conveniently:
ln(mx) = Σi xi βi = xβ

This can be written another way by instead modelling θx , the number of deaths, as:

ln(θx) ≈ xβ + ln(Ex)

This broadly corresponds to the typical formulation of such a GLM formula in R:

claims ~ 1 + x1 + x2 + x3 + x4 + offset(log(exposure))

The 1 above is added automatically by R but is shown explicitly here; it corresponds to
the intercept term, β0.
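The same Poisson GLM can be sketched outside R. The following is a minimal illustration on synthetic data, with the fit implemented directly via Newton–Raphson rather than any particular GLM library; the ages, exposures and parameter values are invented for the example, not taken from the paper:

```python
import numpy as np

def fit_poisson_glm(X, y, offset, n_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by Newton-Raphson (IRLS):
    E[y] = exp(X @ beta + offset)."""
    # Start from the aggregate log-rate to stabilise the iteration.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)      # fitted expected claims
        # Newton step: (X' W X)^-1 X' (y - mu), with W = diag(mu)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic experience data: deaths by age, with Gompertz-like true rates.
rng = np.random.default_rng(0)
age = np.arange(30, 70).astype(float)
exposure = rng.uniform(5_000, 15_000, size=age.size)
deaths = rng.poisson(np.exp(-9.0 + 0.09 * age) * exposure)

# Design matrix: intercept (the "1" in the R formula) plus centred age,
# mirroring claims ~ 1 + age + offset(log(exposure)).
X = np.column_stack([np.ones_like(age), age - 50.0])
beta = fit_poisson_glm(X, deaths, offset=np.log(exposure))
fitted_m = np.exp(X @ beta)   # fitted central mortality rates m_x
```

The log(exposure) offset plays exactly the role of offset(log(exposure)) in the R formula: it fixes the coefficient of log exposure at 1, so that the linear predictor models the log of the central rate rather than the raw claim count.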
When applying machine learning techniques, we replace the f(x) above with
more flexible functions of x that do not, for example, assume additivity or linear
relationships between the variables. These models are nonetheless similar to a GLM,
in the sense that the same vector x is input into a function to produce an estimate of
the number of deaths (indeed, one might interpret a GLM built primarily for
prediction as an application of machine learning). However, the parameterisations of
most machine learning models are more complicated and difficult to interpret than
those of GLMs, and we refer the reader to Lipton (2016) for an extended discussion of
the issue of model interpretability, which we address further in Section 7.

2.2 Modelling expected claims


The above models can, in practice, also be restated relative to some prior estimate of
mortality, m′x. This could be based on a standard table or an existing basis.
Such models could be written as:

ln(θx) ≈ xβ + ln(m′x Ex)

The term e^(Σi xi βi) then becomes a model of the actual experience, θx , against the expected
experience, m′x Ex . Our new model is then:

mx = m′x e^(xβ)

In this model, the vector x is used to adjust the prior estimate of mortality to match the
experience, and is in principle similar to the Brass logit transform, see, for example,
Moultrie and Timæus (2013) who explain this technique in the context of estimating
population mortality rates.

2.3 Delayed reporting of claims


The above models need adjustment in the case of IBNR claims, which are late reported
and therefore may not be included within the actual experience, θx , at the time when
the assumption setting exercise is performed. It is well known that claims are not
reported instantaneously: there is usually a period of time between when a claim
occurs and when the event is recorded in the data used for estimating mortality
rates. Moreover, reinsurers may experience greater delays than primary insurers, since
an extra delay occurs between the time that a claim is reported to the primary insurers,
and the time this is reported to the reinsurer. These reporting delays result in fewer
claims being reported than have actually occurred, particularly for the most recent
periods within the investigation, thus leading to under-estimation of claims rates if the
late reported claims are not properly allowed for. A further requirement imposed by
regulators and accounting rules is for the provisioning of IBNR reserves at the end of
financial periods to correctly accrue for claims that may have occurred but have not
yet been reported.
Below we review several approaches used to allow for reporting delays in experience
analysis (and reserving), which generally attempt to estimate the losses that have
occurred but have not been reported using historical experience. In the following, we
refer to the period in which the loss occurred as the “loss period”, denoted by t, and
the period in which the claim is reported as the “delay period”, denoted by d. The delay
period is usually measured in the number of years elapsed since the loss occurred.
We illustrate the tabulation of claims in this format in Table 1, using a subset of the
data described in Section 4. In this case, the incremental claims occurring in each
delay period have been illustrated, since the method proposed below uses the claims in
this format. However, we note that most commonly used IBNR methodologies involve
cumulative claims, in other words, the running totals of each row.

Table 1 Illustrative tabulation of claims

                        Delay period (d)
Loss period (t)          1      2      3
2007                 1 229    293     12
2008                 1 385    277     22
2009                 1 341    373
2010                 1 441

2.3.1 USING OLDER DATA


The simplest, and possibly most common, approach for allowing for late reported
claims in experience analysis is simply to ignore the most recent experience periods.
Actuaries may have an estimate for the typical delays in reporting of claims, and would
then pick data that is suitably old such that all late claims are likely to be reported already.
While this is a convenient way to analyse the experience, it is problematic where
delays are long, since applying this method means completely ignoring, say, the last
year or two of experience information that we have.
Furthermore, this approach is not useful when the goal is to estimate the IBNR for
reserving purposes. In other words, if a reserve is required we cannot simply use old
data.

2.3.2 IBNR APPROACHES


A more sophisticated approach is to estimate the additional claims explicitly and
include them in the experience analysis and associated modelling.
We now extend the notation above to make explicit the issue of IBNR claims.
Usually, the actuary only has the claims which have been reported at the date of the
experience analysis, which we denote as θ′x. Using various methods, which we describe
next, we then estimate the IBNR claims θ″x and then model:

mx = (θ′x + θ″x) / Ex
The estimates for θ″x can be calculated in various ways. Typically, a run-off triangle is
created in aggregate for the portfolio (or major subdivisions of it) and this is used to
estimate the aggregate unreported claims θ″. Although most of the literature on run-
off triangles refers to the issue of reserving for non-life claims, the methods applied for
estimating life claims are similar, and we refer the reader to Wüthrich & Merz (2008)
for more details. Since these approaches generally lead to an estimate of the aggregate
number of IBNR claims, we need to further estimate the breakdown of that IBNR by
x, such that θ″ = Σx θ″x. This could be done in proportion to the reported claims, θ′x, or
in other ways, for example in proportion to m′x Ex where m′x is some prior estimate of
mortality available before the analysis.

This means we apply an aggregate run-off pattern that does not vary with x to
observed claims that do vary with x. However, we do know that the run-off pattern
may in fact vary with regard to x. For example, it is typically observed that claims
on larger sum assured policies are reported more quickly. Thus, if sum assured (or
sum assured band) is included in the vector x, then the method may produce inaccurate
estimates of mortality for the larger sum assured policies (even though the aggregate
IBNR claims are estimated accurately). One could implement more complicated
methods to estimate the unreported claims for each combination of x (for example,
see Wüthrich (2018), who discusses this problem in the context of reserving using the
chain ladder method), but this results in a separate model for θ″x that is estimated in
a different step from the model of mortality rates. In any event, the IBNR approach
discussed here results in a mortality model plus a run-off triangle, either with spreading
of the aggregate IBNR claims or with a more complicated estimation of IBNR by x,
but with the advantage of directly producing an estimate of IBNR for reserving. The
same approach is equally possible when modelling against a prior estimate of expected
claims.
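As a concrete sketch of the aggregate step, the following applies a basic chain ladder to the incremental triangle of Table 1 (after converting to cumulative form) to estimate the aggregate unreported claims; spreading the result by x, for example in proportion to reported claims, would then follow as a separate step. This is an illustration we constructed, not code from the paper:

```python
import numpy as np

# Incremental claims triangle from Table 1; np.nan marks cells not yet observed.
inc = np.array([
    [1229., 293., 12.],      # loss period 2007
    [1385., 277., 22.],      # 2008
    [1341., 373., np.nan],   # 2009
    [1441., np.nan, np.nan], # 2010
])

# Cumulative triangle (running totals of each row), keeping nan for future cells.
cum = np.where(np.isnan(inc), np.nan,
               np.cumsum(np.nan_to_num(inc), axis=1))

# Chain-ladder development factors from delay column j to j+1,
# using only the rows where column j+1 has been observed.
factors = []
for j in range(inc.shape[1] - 1):
    rows = ~np.isnan(cum[:, j + 1])
    factors.append(cum[rows, j + 1].sum() / cum[rows, j].sum())

# Project each loss period to ultimate; IBNR = ultimate - reported to date.
ibnr = np.zeros(inc.shape[0])
for i in range(inc.shape[0]):
    last = np.where(~np.isnan(cum[i]))[0][-1]   # latest observed delay column
    ibnr[i] = cum[i, last] * np.prod(factors[last:]) - cum[i, last]

total_ibnr = ibnr.sum()   # aggregate unreported claims
```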

2.3.3 EXPOSED BUT NOT EXPECTED TO BE REPORTED (EBNER) APPROACHES


In Lewis & Rossouw (unpublished) an alternative approach was developed. This
approach focuses on adjusting the denominator of the equation mx = θx / Ex, as opposed
to the numerator.
In this method, we estimate the completeness of reported claims and use this
estimate to adjust the exposure measure used in the analysis of mortality rates. The
measure of completeness can be estimated using run-off triangles, providing estimates
of the completeness of reporting by loss period, say Rt , with 0 ≤ Rt ≤ 1. For the most
recent periods t, Rt is closer to 0, while for periods far in the past Rt = 1, because fewer
of the claims occurring in recent periods have yet been reported, so those periods are
less complete, whereas periods far in the past tend to be complete.
Typically, we apply an aggregate run-off to all cells of our analysis using the model:

mx = θ′x / (Ex Rx)
The EBNER approach has a natural interpretation in the context of modelling actual
claims against expected claims: we are modelling actual reported claims versus
expected reported claims. A precedent for the EBNER method is the Cape Cod method
of non-life reserving, which derives loss ratios from reported claims by taking the ratio
of incurred claims to earned premiums that have been adjusted for the expected
completeness of reporting. See Gluck (1997) for an extended discussion.
The EBNER approach has some advantages when using credibility-based
approaches to weigh the observed mx with prior assumptions as it does not create
additional credibility by increasing the expected claims or the actual claims by
adding IBNR. This is especially useful when trying to review time-based trends in
mx. To illustrate by example, let’s assume the last period is 50% complete in terms
of reporting with 50 claims reported. Let’s assume that credibility is based on actual
claims. Following an IBNR methodology results in 100 claims in the last period which
may push any credibility somewhat artificially higher (based on 50 estimated claims).
An EBNER-based approach would only have 50 claims in the respective period and
would not increase the credibility associated with that period. The same holds true
when expected claims are used to set the credibility factor. The EBNER method would reduce
the expected claims in the period by 50% to compensate for the lower proportion of
claims that are expected to be reported and thus reduce the credibility contribution
for that period.
This approach requires a separate run-off triangle that, as mentioned above,
potentially over-simplifies the run-off patterns.
The EBNER approach does not explicitly produce estimates for IBNR reserves but
once we have modelled mortality, we can estimate IBNR as:

mx Ex (1 − Rx).
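As a small numerical sketch of the EBNER adjustment (the development factors, claim count and exposure below are made up for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical development factors, e.g. from an aggregate run-off triangle.
factors = np.array([1.24, 1.01])

# Expected completeness of reporting by "age" of the loss period:
# R[0] applies to the most recent period (only delay 0 observed), and a
# period with all delay periods observed is fully complete (R = 1).
R = 1.0 / np.append(np.cumprod(factors[::-1])[::-1], 1.0)

# EBNER for the most recent period: 50 reported claims, 40 000 exposure.
reported, E, R_t = 50.0, 40_000.0, R[0]
m = reported / (E * R_t)        # adjust the denominator, not the numerator
ibnr = m * E * (1.0 - R_t)      # implied IBNR reserve for that period
```

Note that, unlike the IBNR approach, the reported claim count itself is left untouched; only the exposure is scaled down by the expected completeness, and the implied IBNR falls out afterwards from the fitted rate.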

2.4 Run-off triangles as generalised linear models


One could use a GLM to model a run-off triangle as a function of the time period, t,
in which the claim occurred, as well as the delay, d. The delay period is 0 if the claim
is reported in the same period t in which it occurred, 1 if it is reported in period t + 1,
and so on. We note that within non-life reserving, a GLM formulation of the run-off
triangle is well known and has been shown to be consistent with the chain-ladder
model (Renshaw & Verrall, 1998).
To apply this method, the claims occurring in time period t and reported in delay
period d would be tabulated as θt,d , where we note that θt,d are now the incremental
claims values (not the cumulative values, as is usually the case with run-off triangles).
We could then fit a GLM by estimating θt,d as:

ln(θt,d) ≈ β0 + Σi x′i β′i + Σj x″j β″j

where i varies over all possible values of t and j varies over possible values of d. The x′i
are indicator variables which are 1 at the corresponding time period t and 0 otherwise,
and the regression coefficients β′i can be interpreted as the relative level of claims
in each occurrence period. Similarly, x″j are indicator variables which are 1 at the
corresponding delay period d and 0 otherwise, and the regression coefficients β″j can
be interpreted as the percentage of claims reported in each period d.
As noted above, this formulation has been shown to be equivalent to applying a basic
chain ladder to a run-off triangle with the equivalent cumulative data: it produces the
same estimates of claims for future values of t and d.
One could now generalise and replace the indicator variable construct with
alternative functions of t,d. While this formulation would break the equivalence to
the basic chain ladder method, it would allow other more complex formulations. For
example, the GLM model could be adapted to allow for an interaction between t and d,
which would allow for a changing run-off pattern over time (known as a calendar year
effect in non-life reserving), or additional variables could be included.
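The equivalence can be checked numerically. The sketch below (an illustration we constructed, not code from the paper) fits the Poisson GLM with occurrence-period and delay-period indicator variables to the incremental triangle of Table 1 and sums the predicted future cells; the result matches the basic chain-ladder IBNR on the same triangle:

```python
import numpy as np

# Incremental triangle from Table 1; np.nan marks future (unobserved) cells.
inc = np.array([
    [1229., 293., 12.],
    [1385., 277., 22.],
    [1341., 373., np.nan],
    [1441., np.nan, np.nan],
])
T, D = inc.shape
t_idx, d_idx = np.meshgrid(np.arange(T), np.arange(D), indexing="ij")
obs = ~np.isnan(inc)

def design(t, d):
    """Intercept plus indicator variables for t = 1..T-1 and d = 1..D-1."""
    cols = [np.ones(t.size)]
    cols += [(t == k).astype(float) for k in range(1, T)]
    cols += [(d == k).astype(float) for k in range(1, D)]
    return np.column_stack(cols)

X, y = design(t_idx[obs], d_idx[obs]), inc[obs]

# Poisson GLM with log link, via Newton-Raphson from a log-linear start.
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
for _ in range(100):
    mu = np.exp(X @ beta)
    step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    beta += step
    if np.max(np.abs(step)) < 1e-12:
        break

# Predicted future increments; their sum is the IBNR estimate.
X_future = design(t_idx[~obs], d_idx[~obs])
glm_ibnr = np.exp(X_future @ beta).sum()   # matches the chain ladder
```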

3.  COMBINED FORMULATION OF MORTALITY AND RUN-OFF MODELS


When one views the existing mortality models in 2.1 together with the formulation
of run-off triangles as a GLM shown in 2.4, an alternative representation of mortality
springs to mind, incorporating both models to produce a unified model of delayed
claims reporting and mortality.
We do so as follows:
m^{t,d}_{x′} = θ^{t,d}_{x′} / E^{t,d}_{x′} = θ^{t,d}_{x′} / E^{t}_{x′}
θ^{t,d}_{x′} denotes the reported claims from time period t reported d time periods after t. It
is assumed that x has now been extended to x′ to allow for the representation of t and
d, for example, using indicator variables for different t and d, or by entering these as
numerical variables into the model. We note that alternative extended representations
may be possible.
E^{t,d}_{x′} represents exposure, but it is important to note that exposure relates to the
period in which the loss occurred and, therefore, cannot be split by delay period. So,
we simply set the exposure associated with the various delays to the same exposure value;
in other words, E^{t,d}_{x′} = E^{t}_{x′} for all d. Therefore, we derive the “partial” mortality rates in each
delay period d, which can then be added together to derive the overall mortality rate
experienced in the period. We note that this approach is similar to the incremental loss
ratio approach of Mack (2002), which derives a “partial” loss ratio in each development
period and multiplies these with an overall estimate of earned premium to derive
IBNR reserves. We further note that some measure of t may already be included in
the existing modelling framework, most often relating to the year in which the loss
occurred and allowing the model to capture, for example, trends in mortality; here we
are making some of the other dimensions of time (the development period) explicit
in the notation.
Now we model m^{t,d}_{x′} as a function of the (now expanded) x′:

m^{t,d}_{x′} ≈ m̂^{t,d}_{x′} = f(x′)

Marginalising over the development period d produces the “full” mortality rate:

mx = Σd m̂^{t,d}_{x′}

Stating the above in other words, we are separately predicting the “partial” mortality
rate for each delay period, allowing our mortality model to capture both the overall
rate of mortality and also when the claim is likely to be reported.
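To make the combined formulation concrete, the sketch below fits a single Poisson model containing both an age effect and delay-period indicators, with the loss-period exposure repeated for every delay period, and then recovers the full rate by summing the fitted partial rates over d. The data, reporting pattern and rates are synthetic, and the fit is again a hand-rolled Newton–Raphson rather than the models used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic portfolio: Gompertz-like true rates and a fixed reporting pattern.
ages = np.arange(30, 70).astype(float)
D = 3                                       # delay periods d = 0, 1, 2
exposure = rng.uniform(5_000, 15_000, ages.size)
true_m = np.exp(-9.0 + 0.09 * ages)
report_frac = np.array([0.70, 0.25, 0.05])  # share of claims reported per delay

# Long format: each exposure cell repeated for every delay (E^{t,d} = E^t).
age_col = np.repeat(ages, D)
delay_col = np.tile(np.arange(D), ages.size)
expo_col = np.repeat(exposure, D)
claims = rng.poisson(np.repeat(true_m * exposure, D) *
                     np.tile(report_frac, ages.size))

# One Poisson GLM for the partial rates:
# log mu = b0 + b1*(age-50) + delay indicators + log(exposure)
X = np.column_stack([np.ones_like(age_col), age_col - 50.0] +
                    [(delay_col == k).astype(float) for k in (1, 2)])
offset = np.log(expo_col)

beta, *_ = np.linalg.lstsq(X, np.log(claims + 0.5) - offset, rcond=None)
for _ in range(100):                        # Newton-Raphson refinement
    mu = np.exp(X @ beta + offset)
    step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (claims - mu))
    beta += step
    if np.max(np.abs(step)) < 1e-12:
        break

# Partial rates per (age, delay); the full rate marginalises over d.
partial_m = (np.exp(X @ beta + offset) / expo_col).reshape(ages.size, D)
full_m = partial_m.sum(axis=1)
```

The single model captures the overall level of mortality (via the age terms) and the reporting pattern (via the delay coefficients) at the same time, so interactions between them, had we included any, would be estimated jointly rather than in two separate steps.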

3.1 Similarity to EBNER formulation


In the above formulation we had E^{t,d}_{x′} = E^{t}_{x′} for all d. This can be adjusted by reducing
E^{t,d}_{x′} to allow for the expected reported proportion in each delay period. In this case,
an estimate of the full mortality rate relating to a particular year of loss t would be
estimated within each development period. The formulation would then, in aggregate,
be equivalent to the EBNER formulation shown before, as we would have
E^{t,d}_{x′} = R^{t,d}_{x′} E^{t}_{x′}, corresponding to the EBNER formulation with R^{t}_{x′} = Σd R^{t,d}_{x′}.

3.2 Modelling expected claims


In a similar way to before, we could model against expected claims instead of exposure.
We could also adjust the expected claims based on expected reporting patterns.

3.3 Potential advantages of this formulation


As shown above, this formulation is, in aggregate, equivalent to the EBNER approach.
We believe it has several advantages:
—— It simplifies the modelling approach, as all parameters are captured in one
model.
—— With two separate models, it is difficult to disentangle them and to understand
any potential compromises or interactions between them, for example, whether
choices in modelling the run-off triangle could cause the mortality model to fit
poorly.
—— With two separate models, it also becomes harder to allow for the estimation of
uncertainty, whereas this is relatively easy in a unified model.
—— A formulation like this allows one to dig deeper into how the delay of claims
interacts with other variables. Does the age of the policyholder affect the claims
delay? Does the policy year (duration in force) affect the delay? One can
explicitly model these interactions.
—— Should this formulation be used together with a credibility approach, it has
similar advantages to the EBNER method in that it would not inflate the
credibility of periods that are not fully run off.

On the other hand, this formulation increases the complexity of the mortality model,
especially when one starts modelling interactions between the claims reporting delay
and various other variables. However, this formulation allows one to easily apply
advanced machine learning techniques to model mortality and the associated claims
reporting delays, and the interactions between them.


4. DATA
To test the new formulation, data from a reinsurer were collected and combined.
The data included exposure and claims for four insurance portfolios from the United
Kingdom and South Africa, covering both mortality and critical illness benefits.
Delayed reporting is a bigger issue for a reinsurer, since reinsurance claims are subject
to added delays: the overall delay depends not only on the claimant reporting the
claim to the primary insurer, but also on the further delay in the insurer reporting the
claim to the reinsurer.

4.1 Data adjustments


The data were supplied by the reinsurer’s experience analysis teams in South Africa
and the United Kingdom. A condition of using the data was that the reinsurer applied
some adjustments to ensure that commercial advantage was maintained. For example,
the data were adjusted by shifting some of the portfolios by calendar year, with not
all portfolios shifted by the same amount. Furthermore, random shocks were applied
to the exposure of various subsets of the data: for example, the exposure of portfolio A
male smokers may have been increased or decreased by 20%, whereas the exposure of
portfolio B female non-smokers may have been increased by 10%. We do not believe
that these adjustments meaningfully affect the research outcomes; while the observed
mortality rates and the patterns of relative experience no longer reflect the reality of
the claims experience exactly, the general patterns remain representative.

4.2 Data transformations


The data were transformed to enable the analysis described above. Exposure was
repeated for each of the delay periods 0 to 3 (years in this case), and claims were
allocated to the corresponding delay periods. The “ultimate” delay period of 3 was
chosen somewhat subjectively, similar to the way one chooses the ultimate run-off
period of a run-off triangle: it was based on experience of what a reasonable run-off
period would be, one that only a very small percentage of claims would exceed. We
further illustrate how the data were transformed by way of example. Traditional
experience data may look as per Table 2.

      Table 2 Example data


Gender Age Exposure Claims
M 40 10 000 10
M 41 12 000 13

The data in Table 2 would be transformed as per Table 3. Note that each exposure
record is repeated four times (once for each delay period 0, 1, 2 and 3). If we were to
sum the total exposure, it would now be four times the original exposure. Total claims
would, however, stay the same, as we are only allocating claims to the appropriate
delay period.

    Table 3 Example data transformed


Gender Age Exposure Delay Claims
M 40 10 000 0 7
M 40 10 000 1 2
M 40 10 000 2 1
M 40 10 000 3 0
M 41 12 000 0 9
M 41 12 000 1 3
M 41 12 000 2 0
M 41 12 000 3 1
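The transformation from Table 2 to Table 3 can be sketched in a few lines of plain Python; the dictionary structures below are invented for illustration and are not the authors' actual pipeline:

```python
MAX_DELAY = 3

# Exposure cells as in Table 2.
cells = [
    {"gender": "M", "age": 40, "exposure": 10_000},
    {"gender": "M", "age": 41, "exposure": 12_000},
]

# Claim counts allocated by (gender, age, delay), as in Table 3;
# delay = reporting year - occurrence year, capped at MAX_DELAY.
claim_counts = {
    ("M", 40, 0): 7, ("M", 40, 1): 2, ("M", 40, 2): 1,
    ("M", 41, 0): 9, ("M", 41, 1): 3, ("M", 41, 3): 1,
}

# Repeat every exposure cell once per delay period 0..MAX_DELAY and attach
# the claims reported at that delay (0 where no claims were reported).
long_rows = [
    {**cell, "delay": d,
     "claims": claim_counts.get((cell["gender"], cell["age"], d), 0)}
    for cell in cells
    for d in range(MAX_DELAY + 1)
]

total_claims = sum(r["claims"] for r in long_rows)       # unchanged total
total_exposure = sum(r["exposure"] for r in long_rows)   # 4x the original
```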

4.3 Description of data fields


The fields are shown in Table 4; Tables 5 and 6 show the benefit and product code
descriptions.

Table 4 Description of data fields


Field Description
company Label for portfolio (A–D)
benefit_code Benefit code describing type of benefit (see Table 5)
gender Male (“M”) or Female (“F”)
smoker_status Non-smoker (“N”) or Smoker (“S”)
country “UK” or “ZA”
product_code Product codes (see Table 6)
joint_life Joint life indicator. UK products include Joint Life First Death products (“JLFD”); otherwise
single life.
rated 0 for standard, 1 for extra mortality loading, 2 for per mille loading. Policies with per mille
loadings were excluded.
policy_year Curtate years since policy start. 0, 1, …, 5 (capped at 5). Label changes on policy anniversary.
calendar_year Calendar year for exposure and occurrence of claim, denoted by t above
underwriting_year Policy commencement year.
current_age Age last birthday during exposure. Label changes on birthdays
sum_assured_band Sum assured bands in GBP (1000s). South African data approximately converted to same bands.
delay Number of calendar years late reported, denoted by d above; e.g. a claim occurring in 2018
and reported in 2019 has delay 1. The delay is calculated as the calendar year in which the
claim was reported less the calendar year in which it occurred, capped at 3.
reporting_year After applying maximum delay, the reporting year was calculated as calendar_year + delay.
exposure Central exposed to risk.
claims Number of claims


Table 5 Description of benefit codes


benefit_code Description
DEATH Death benefit with no accelerators attached to the policy.
DEATH_ACC Death benefit with attached accelerators. The attached accelerators could be critical illness,
lump sum disability or both. Where benefits are recorded in this fashion, the claims only
include the death claims and not the claims of the attached accelerator.
ACI Accelerated critical illness benefit. Note that where the benefit was recorded in this way the
claims for these benefits include critical illness claims as well as death claims. In the data
available to researchers the claims were not differentiated by type.
SACI Stand-alone critical illness benefits not accelerating death benefit. These benefits only include
critical illness claims.

     Table 6 Description of product codes


product_code Description
WL Whole life product
LTA Level term assurance product
DTA Decreasing term assurance product

4.4 Expected claims


Central incidence rates from the UK CMI “08” series mortality tables, specifically the
T08 and AC08 tables, were added to the data. These tables vary by gender, smoker status,
age and duration. See Institute and Faculty of Actuaries (2017) for details on these tables.
AC08 was used as the expected rate for benefit codes ACI and SACI; T08 was used as
the expected rate for the DEATH and DEATH_ACC benefit codes. Only the ultimate rates
were used, and the same tables were used for both United Kingdom and South African
business.
This added two columns to the dataset: m, the relevant central incidence rate, and
expected_claims = exposure × m.
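The calculation above can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the field names follow the dataset description in this section, while the rates lookup is a hypothetical stand-in for the CMI T08/AC08 tables.

```python
# Hedged sketch of the expected-claims calculation. The "rates" dictionary
# is a hypothetical stand-in for the CMI T08/AC08 tables described above.

def expected_rate(benefit_code, gender, smoker_status, age, rates):
    """Look up the central incidence rate m for a record.
    T08 is used for DEATH and DEATH_ACC; AC08 for ACI and SACI."""
    table = "T08" if benefit_code in ("DEATH", "DEATH_ACC") else "AC08"
    return rates[(table, gender, smoker_status, age)]

def add_expected_claims(records, rates):
    """Add the m and expected_claims columns to each record."""
    for rec in records:
        rec["m"] = expected_rate(rec["benefit_code"], rec["gender"],
                                 rec["smoker_status"], rec["current_age"], rates)
        rec["expected_claims"] = rec["exposure"] * rec["m"]
    return records
```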

4.5 Summary of Data


Table 7 summarises claims and exposure by country of origin.

   Table 7 Claims and exposure by country


 UK SA Total
Exposure 2 115 567 5 377 802 7 493 369
Claims 4 893 13 282 18 175

Tables 8 and 9 provide a summary of exposure and claims by calendar year and benefit.


  Table 8 Exposure by benefit and calendar year


calendar_year DEATH DEATH_ACC ACI SACI
1995 1 349 0 0 0
1996 30 214 0 0 0
1997 55 264 0 0 0
1998 82 831 0 0 0
1999 108 337 0 0 0
2000 270 339 84 175 0 0
2001 377 795 145 410 0 0
2002 429 012 196 491 0 0
2003 433 890 206 986 0 0
2004 423 508 205 862 5 1
2005 411 652 208 311 2 622 698
2006 395 806 210 434 6 698 2 628
2007 377 132 209 506 13 303 5 706
2008 358 674 207 540 21 446 8 084
2009 344 367 206 626 22 027 7 952
2010 329 204 203 351 19 936 7 379
2011 307 066 191 628 17 758 6 853
2012 209 137 106 301 15 772 6 301
Total 4 954 576 2 382 622 119 568 45 603

  Table 9 Claims by benefit and calendar year


calendar_year DEATH DEATH_ACC ACI SACI
1995 1 0 0 0
1996 25 0 0 0
1997 64 0 0 0
1998 90 0 0 0
1999 158 0 0 0
2000 496 53 0 0
2001 805 111 0 0
2002 943 241 0 0
2003 962 274 0 0
2004 1 064 288 0 0
2005 1 154 300 4 3
2006 1 237 334 10 5
2007 1 213 279 39 17
2008 1 292 301 84 20
2009 1 294 332 105 26
2010 1 335 384 101 16
2011 1 348 325 98 29
2012 719 116 65 15
Total 14 200 3 338 506 131


Table 10 provides a summary of claims by calendar year of occurrence and delay in
reporting. As mentioned before, delay 0 implies a claim reported in the same calendar
year in which it occurred, and delay 1 implies a claim reported in the calendar year after
the one in which the claim occurred. No claims reported after 2012 were recorded in
this data.

  Table 10 Claims by calendar year and delay


calendar_year Delay 0 Delay 1 Delay 2 Delay 3
1995 0 1 0 0
1996 19 5 0 1
1997 55 8 1 0
1998 78 10 2 0
1999 135 21 0 2
2000 324 176 27 22
2001 528 309 45 34
2002 706 392 46 40
2003 881 303 31 21
2004 1 053 270 7 22
2005 1 121 284 35 21
2006 1 284 273 18 11
2007 1 229 293 12 14
2008 1 385 277 22 13
2009 1 341 373 32 11
2010 1 441 377 18 –
2011 1 376 424 – –
2012 915 – – –
Total 13 871 3 796 296 212

5. METHODOLOGY
To illustrate the new formulation, and to show how it enables the application of
machine learning techniques, the claims experience was modelled using the approaches
described in this section. The aim was both to illustrate the ease with which the new
formulation can be applied and to show how various machine learning techniques
might be applied to mortality experience and reserving problems. We limited our
investigations to statistical and machine learning models fit to the experience based on
the available fields.

5.1 Data partitions


Before we describe the modelling approaches, we will first explain the subdivision of
data into three sets. The data were subdivided by reporting year to represent a realistic
analysis:

—— Data relating to reporting years up to and including 2009 was labelled as the
“train” set.
—— Data relating to reporting year 2010 was labelled as the “validate” set.
—— Data for reporting years 2011 and 2012 was labelled as the “test” set.

Thus, the train dataset includes exposures for years up to 2009. It also includes claims
reported up to 2009 but not claims that were incurred in 2009 and reported with
delay 1. It also does not include claims reported with delay 2 that were incurred in 2008.
Similarly, the validate set includes exposures relating to 2010, but also exposures
relating to 2009, 2008 and 2007 (labelled with delay 0, 1, 2, and 3 respectively). It only
includes claims reported in 2010 but would include late reported claims incurred in
prior years.
Similarly, the test set includes claims incurred from prior periods but reported later.
Table 11 illustrates this.

    Table 11 Train, validate and test data


calendar_year Delay 0 Delay 1 Delay 2 Delay 3
<=2006 Train Train Train Train
2007 Train Train Train Validate
2008 Train Train Validate Test
2009 Train Validate Test Test
2010 Validate Test Test n/a
2011 Test Test n/a n/a
2012 Test n/a n/a n/a

It should be clear already that the data included in the cell relating to calendar year 2009,
delay 1, includes exposure relating to calendar year 2009 but only the claims relating to
that year that were reported in 2010.
The models will then be validated on the validate set (the “diagonal” reported in
2010). Once the final models are fitted they will be evaluated on the test dataset which
includes the diagonals related to reporting years 2011 and 2012. This approach allows
validation and testing on a mixture of future incidence rates and the incidence rates
for late reported claims.
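The partition rule of Table 11 amounts to a single function of calendar year and delay, since the reporting year is calendar_year + delay. A minimal Python sketch (the cut-off years are those stated above):

```python
def partition(calendar_year, delay):
    """Assign a (calendar year, delay) cell to a data partition by
    reporting year, as in Table 11."""
    reporting_year = calendar_year + delay
    if reporting_year <= 2009:
        return "train"
    if reporting_year == 2010:
        return "validate"
    if reporting_year <= 2012:
        return "test"
    return "n/a"  # reported after 2012: not present in the data
```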

5.2. General modelling approach


The general approach that was followed is representative of modern best practice when
fitting predictive models, and we refer the reader to Chapter 7 in Hastie et al. (2005)
for a more detailed exposition. In summary, our approach can be described as follows:
First, transform the data if required for the approach in question (for example, the
numerical inputs to the deep neural network were scaled to the range [0,1]) and then
fit a model to the train set. This may involve cross-validation on the various folds
of the train dataset (e.g. based on reporting year), to optimise the parameters of the
method. Then, observe the quality of the predictions of the model on the validate
set, as measured by the Poisson deviance and other metrics, such as the actual claims
versus the expected claims (AvE). Based on the outcome of cross-validation and/or
the observed outcomes on the validation set, the modelling approach and/or hyper-
parameters may be adjusted, and prior steps repeated. Once a satisfactory fit was
obtained on the validation set the same approach and hyperparameters were used as,
but this time the model was fit on the combined train and validate set (in other words,
in this step, the validation set is used to calibrate the model parameters, whereas in the
prior step, the validation set was used only to test model architectures). Predictions
were then produced over the full dataset and later evaluated on the test set.
In summary, the aim of the approach described above is to use the validation data to
find (empirically) a model that works well when projecting mortality and morbidity rates
forward, and then to test these models on data that is completely unseen (in other words,
not use it to fine tune the models) to assess the predictive capability of each model.
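The two evaluation metrics used throughout, the Poisson deviance and the aggregate actual-versus-expected ratio, can be computed as follows. This is an illustrative Python sketch, not the authors' code.

```python
import math

def poisson_deviance(actual, predicted):
    """Total Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)),
    with the y = 0 term reducing to 2*mu."""
    dev = 0.0
    for y, mu in zip(actual, predicted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - (y - mu))
    return dev

def actual_vs_expected(actual, predicted):
    """Simple A/E ratio on total claim counts."""
    return sum(actual) / sum(predicted)
```

The deviance is zero only for a saturated fit (predicted equals actual in every cell), which is why it is a stricter measure of accuracy than the aggregate A/E ratio.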

5.3 Variables used in GLMs


Below we describe some adjustments to the variables used in the GLMs.

5.3.1 USE OF SPLINES


The goal was to produce smooth mortality projections where possible and thus we also
included in the GLMs various splines based on the age, calendar year and underwriting
year variables. Natural splines were used to ensure that the extrapolations beyond
the range of the data were linear. Knots were generally set at the 25%, 50% and 75%
percentiles of expected claims with boundary knots at 5% or 10% and 90% or 95%
percentiles. Some judgement was also applied in this process.

5.3.2 FACTOR VARIABLES


Other variables were treated as factor levels, which included delay and policy year.

5.3.3 EXPECTED CLAIMS


All GLM models included the natural log of expected claims field as an “offset”
(which was set with reference to UK standard tables described before). In this context,
specifying a variable as an offset means that it is a variable included in the model, but
without any associated parameter estimate; in other words, the parameter estimate is
forced to be equal to 1. This means that we are trying to model an adjustment to the
expected claims as described in section 3 above.
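The effect of the offset can be seen in the simplest possible case: with only an intercept in the model, the fitted adjustment factor exp(b0) is exactly the aggregate actual-over-expected ratio. A minimal Newton-Raphson sketch in Python (an illustration of the offset mechanics, not the authors' R code):

```python
import math

def fit_intercept_offset(claims, expected, iters=50):
    """Newton-Raphson for b0 in the model claims_i ~ Poisson(expected_i * exp(b0)),
    i.e. a Poisson GLM with log(expected_claims) as an offset and only an
    intercept. The offset's coefficient is implicitly fixed at 1."""
    b0 = 0.0
    for _ in range(iters):
        mu = [e * math.exp(b0) for e in expected]
        grad = sum(y - m for y, m in zip(claims, mu))  # d loglik / d b0
        hess = -sum(mu)                                # d2 loglik / d b0^2
        b0 -= grad / hess
    return b0
```

Setting the gradient to zero gives exp(b0) = sum(claims) / sum(expected), the overall A/E ratio, which is what "modelling an adjustment to the expected claims" means in the intercept-only case.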

5.4 Data folds for cross-validation


Where cross-validation was employed (for the lasso regression as well as for the gradient
boosted trees model), training data was split, based on reporting year, into seven broadly
equally sized groups (or folds). We felt that cross-validation over time would produce
better results as we are trying to predict mortality in different time periods. Typically,
5 to 10 folds are used. Seven was chosen here as it resulted in broadly equally sized
groups (as measured by expected claims) when grouping together various reporting
years. These are tabulated in Table 12.

    Table 12 Cross-validation folds


Fold Reporting years Expected claims
1 1995–2002 4 194
2 2003–2004 5 593
3 2005 3 419
4 2006 3 687
5 2007 3 954
6 2008 4 251
7 2009 4 543
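The fold groupings of Table 12 can be expressed as a simple lookup (a Python sketch; the groupings are those tabulated above):

```python
# Cross-validation fold groupings of Table 12, chosen so that grouped
# reporting years carry broadly equal expected claims.
FOLDS = {
    1: range(1995, 2003),  # 1995–2002
    2: range(2003, 2005),  # 2003–2004
    3: range(2005, 2006),
    4: range(2006, 2007),
    5: range(2007, 2008),
    6: range(2008, 2009),
    7: range(2009, 2010),
}

def fold_for(reporting_year):
    """Map a reporting year in the training data to its fold number."""
    for fold, years in FOLDS.items():
        if reporting_year in years:
            return fold
    raise ValueError("reporting year outside the training data")
```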

5.5 Traditional IBNR approach


5.5.1 IBNR MODEL
The aim here was to use a traditional IBNR approach for estimating late reporting
claims, and then to fit to the estimates of incurred claims using a GLM.
The claims data as described were collapsed back to a more typical triangle format.
Data in the format of an incremental run-off triangle were constructed. This was done
separately for each company portfolio and benefit code. A relatively simple GLM
model including calendar year, delay, company, and benefit code was fit to the data as
described in section 2.4 above. This would be equivalent to creating a run-off triangle
for each combination of company and benefit code. We only included calendar years
2005 and later in the run-off triangle above.
The resultant model was used to predict future claims and “grossing up factors” by
calendar year were estimated from these. These are factors by which reported claims
need to be increased to represent estimates for both reported and unreported claims.
These factors were applied to reported claims. We also collapsed exposure into the
traditional format and then simply modelled incidence of the claims in the usual manner.
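The "grossing up factors" described above can be sketched with a basic chain ladder on an incremental run-off triangle. This is an illustrative Python implementation, not the authors' GLM-based estimate; the triangle is assumed to have one row per calendar year and one column per delay, with None marking cells not yet observed.

```python
def grossing_up_factors(triangle):
    """Return, for each delay d, the factor by which claims reported up to
    delay d must be multiplied to estimate ultimate (reported plus
    unreported) claims, using basic chain-ladder development factors."""
    n = len(triangle[0])
    # convert the incremental triangle to a cumulative one
    cum = []
    for row in triangle:
        c, total = [], 0.0
        for v in row:
            if v is None:
                c.append(None)
            else:
                total += v
                c.append(total)
        cum.append(c)
    # development factors f_d = cum(d+1) / cum(d), over rows where both observed
    f = []
    for d in range(n - 1):
        num = sum(r[d + 1] for r in cum if r[d + 1] is not None)
        den = sum(r[d] for r in cum if r[d + 1] is not None)
        f.append(num / den)
    # grossing-up factor at delay d = product of the remaining dev factors
    g = []
    for d in range(n):
        prod = 1.0
        for fd in f[d:]:
            prod *= fd
        g.append(prod)
    return g
```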

5.5.2 GLM MODEL


Variable selection was manual, guided by the output of the lasso regression; the delay
field is of course not available in this model, so this variable was not included here.
Some field interactions were also allowed for.
Predictions were then made by calendar year for total incidence. However, to
compare this model with other models, predictions by calendar year and delay were
required. This was done by estimating, based on the model of run-off, the proportions
of predicted claims that will/would have been reported by delay period.

This is similar to the Bornhuetter-Ferguson method of calculating IBNR. Claims that
are not reported are estimated using the basic chain ladder. The total “ultimate” claims
are then modelled using a GLM. This estimate for ultimate claims is then multiplied by
the expected proportion to be reported in each delay period to estimate the reported
claims by delay period.

5.6 Traditional EBNER Approach


A run-off triangle and associated estimates were obtained in the same way as for the
IBNR approach.
Based on the run-off triangle, estimates for the proportion of claims reported at
delay 0, 1, 2 and 3 respectively were obtained (separately for each company and benefit
code combination).
The new format data were then used with expected claims reduced by the proportion
estimated above. A model was then fit on the data in the usual way, except that the
delay variable was not used in the model. This results in an approach equivalent to the
EBNER approach. The same variables as per the IBNR approach were used.
Predictions were then made by calendar year for total incidence, and as per the
IBNR approach, estimates were allocated to various delays based on the estimated
proportion of claims for each delay period.
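The EBNER adjustment amounts to scaling the standard-table expected claims in each cell by the estimated proportion of claims reported at that delay. A minimal Python sketch (the cumulative proportions passed in would come from the run-off triangle estimates described above):

```python
def reporting_proportions(cumulative_reported):
    """Convert cumulative proportions reported by each delay
    (e.g. [0.50, 0.75, 0.875, 1.0]) into incremental proportions per delay."""
    props, prev = [], 0.0
    for c in cumulative_reported:
        props.append(c - prev)
        prev = c
    return props

def ebner_expected(expected_claims, delay, props):
    """Scale the standard-table expected claims in a (year, delay) cell by
    the proportion of claims expected to be reported at that delay, so a
    model fit without the delay variable targets reported claims."""
    return expected_claims * props[delay]
```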

5.7 GLM
The first two approaches above are traditional and do not use the combined model
formulation. The GLM model we fit here used the new approach and is fit to the data
as described in section 3 above.
Data were limited to reporting years 2000 and later for the same reasons as the other
two approaches. Reporting year was available in this case but not in the traditional
model cases.
Variable selection was influenced by the lasso regression.

5.8 Lasso regression


Lasso regression applies variable selection and regularisation; it aims to find the βj that
minimise the following:

\frac{1}{2}\sum_i \left( y_i - \beta_0 - \sum_j x_{ij}\beta_j \right)^2 + \lambda \sum_j \left| \beta_j \right|

The λ is a complexity parameter that controls the extent to which larger βj are
penalised. As λ increases, more and more βj are shrunk and eventually set to 0.
Thus, the lasso technique performs variable selection guided by the choice of λ, which
is usually set based on cross-validation, as discussed next. See Chapter 3 of Hastie
et al. (2005) for a more detailed description of lasso regression and other shrinkage
methods.

For the lasso regression the glmnet R package was used. The glmnet functions take
data in the form of numerical matrices as input, and thus variable selection operates
not at the level of data fields but at the level of the individual coefficients associated
with a field. In the case of factor variables these are the one-hot encoded variables
representing each level of the factor (but one); in the case of natural splines these are
the individual basis elements making up the spline. All two-way interactions between
variables were included as well.
The glmnet package produces models for a range of values of λ, and thus we employed
the cross-validating fitting procedure (cv.glmnet) to fit the model. This produces the
λ that minimises the cross-validation error. Cross-validation was done on the folds as
set by reporting year. This λ was not used directly; instead, we used a higher λ to select
fewer variables from the model, and used those variables to guide variable selection
for the other GLMs.
The final λ that was used to produce the model was the one estimated to produce
the lowest Poisson deviance when a model was fit on the training data and the Poisson
deviance evaluated on the validation data. The final model was fit using the same λ on
the combined train and validate datasets.
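The lasso objective in this section is minimised by cyclic coordinate descent with soft-thresholding, which is what produces exact zeros in the coefficients. A self-contained Python sketch on standardised inputs (Gaussian loss, as in the objective shown above; the authors' glmnet fit used the Poisson family, but the shrinkage mechanics are the same):

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z towards 0 by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, iters=200):
    """Minimise (0.5/n) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate
    descent. Columns of X are assumed standardised (mean 0, variance 1)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n)) / n
            b[j] = soft_threshold(z, lam)  # shrink, possibly to exactly 0
    return b
```

With a large enough lam every coefficient is set to zero; at intermediate values only the strongest predictors survive, which is the variable-selection behaviour exploited above.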

5.9 Gradient boosted trees


The gradient boosted tree model is an ensemble technique. This means it is a model
made up of multiple other models. The boosting part of the name refers to the concept
of combining multiple simple models to form a better combined model. The technique
uses successive, relatively simple decision tree models that are fitted in such a way as to
reduce the errors of the models up to that point. The “gradient” in the name refers to
gradient descent in the sense that additional trees are fitted in such a way as to reduce
the error of the overall model in the same way as the gradient descent algorithm is
designed to take small steps towards the minimum of a function by assessing the
current gradient and deciding on a direction of the next step.
The R xgboost package was used to fit gradient boosted trees to the data. See Chen &
Guestrin (2016) for more information on this package. The xgboost “count:poisson”
objective was used. This sets the “negative log-likelihood for Poisson regression” as the
metric to be optimised by the xgboost algorithms (eval_metric). This is equivalent to
optimising Poisson deviance.
This package allows the specification of a “base margin” that is used as an existing
prediction which the model then improves upon. This is broadly the equivalent to
using an offset in the GLM models. We again used the natural log of expected claims
as the base margin.
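What the count:poisson objective optimises, with the base margin added to the raw boosting score, can be written out explicitly. This is an illustration of the underlying calculus (the per-record gradient and hessian that drive each new tree), not xgboost's source code.

```python
import math

def poisson_grad_hess(score, base_margin, y):
    """Gradient and hessian of the Poisson negative log-likelihood with
    respect to the raw score s, where mu = exp(s + base_margin) and the
    base margin is log(expected_claims). Each new tree is fitted against
    these quantities, so the model learns an adjustment to the expected
    claims, analogous to a GLM offset."""
    mu = math.exp(score + base_margin)  # predicted claim count
    grad = mu - y                       # d(-loglik)/d score
    hess = mu                           # d2(-loglik)/d score^2
    return grad, hess
```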
Fitting the model firstly involved parameter tuning (using cross-validation) of some of
the major parameters available in the xgboost package. Parameter tuning was done with a
learning rate (eta) parameter set to 0.1 and number of trees set using five early stopping
rounds. This means that further trees will not be added if, after adding five trees,
no further improvement in the cross-validated error is seen. Various combinations
of parameters were explored (generally fitting between 150 and 200 trees) for each
combination.
This was repeated until parameters that resulted in the lowest cross-validation error
were chosen.
Other xgboost parameters were left at their default values.
The learning rate was then reduced to 0.01 with 25 early stopping rounds. Based on
that, 1 500 rounds (or trees) produced the lowest cross-validation error.
The above parameters and 1 500 rounds (without early stopping) were then used to
fit on the training data and evaluate the prediction on the validation data. Table 13 sets
out the final parameters. It also indicates a rough range of values tried. Only a handful
of distinct values were attempted in each of those ranges. This was also not a grid
search, but a sequential search on one or two parameters at a time.

     Table 13 xgboost parameters


Parameter Value Range explored
eta 0.01 n/a
nrounds 1 500 n/a
gamma 0.8 0–1
max_delta_step 0.7 0.5–1.25
max_depth 5 1–6
min_child_weight 15 5–15
subsample 0.55 0.55–1
colsample_bytree 1 0.65–1

The authors felt that the technique may not do well when projecting forward in time,
as it is not designed to extrapolate. The reason is essentially that tree-based methods
partition (or bin) variables, rather than fitting a function to the variable.
So, for variables that need to be projected beyond where there is data, for example
calendar year, the model is not going to project forward any trend that may be present
but will simply project forward whatever the last splits based on calendar year may be
in the various trees.
Because expected claims (based on mortality and incidence tables) were used
as the base margin this problem is going to be less noticeable at extreme ages, as it
will project forward using the expected claims as guidance and then would follow the
shape of the existing curve.

5.10 Deep learning


Deep learning refers to the modern approach to designing and fitting neural networks
(Goodfellow et al., 2016) that has achieved state-of-the-art results in several areas of
machine learning (LeCun et al., 2015), as well as actuarial applications (Richman, 2018).
Since deep neural networks can be constructed in different manners, depending
on the choice of depth, width, activation functions and regularisation techniques, the
empirical performance of several deep neural network architectures was tested to find
an architecture that performed well on this problem. As a baseline architecture, the
deep network presented in Richman and Wüthrich (2019) for forecasting the mortality
of multiple populations was used. We describe how this network was adapted in the
following; for more detailed discussion of the baseline network, we refer the reader to
Richman and Wüthrich (2019) and for explanation of the technical details, we refer to
Richman (2018).
The main network1 consists of five layers of 128 neurons with ReLU activation
functions in each layer and a skip connection between the fourth layer and the inputs.
Between each layer, a batch normalisation layer (Ioffe & Szegedy, 2015) and a dropout
layer (Srivastava et al., 2014) are inserted, to make the network easier to optimise
and to regularise the network. Regarding the inputs to the network, the categorical
variables are input to the network using five dimensional embedding layers, which are
then concatenated together and fed into the main network. The numerical variables
(consisting of the variables relating to time, which are policy, underwriting, reporting
and calendar year) are scaled to the range [0,1] and then processed through a smaller
sub-network, before entering the main network. The rationale for the sub-network is
that some learning of the relationships between the numerical variables should occur
before these are combined with the categorical variables (see Goldberg (2016) for a
similar explanation of why embedding layers are useful in natural language processing
and see Kazemi et al., (2019) for a different approach to allowing for time within neural
networks). The output of the network is a single variable, with an exponential activation,
that is multiplied by the expected claims. In other words, the network aims to learn
an optimal adjustment (in the range [0, ∞)) of the expected claims to best match the
data. This network design is shown in Figure 1 (overleaf). The plot was generated using
Netron software.
The network was trained using batches of 4 096 observations for 30 epochs, using
the Adam optimiser (Kingma & Ba, 2014). To stabilise the network, and achieve the
best test set performance, the network was trained eight times, and the predictions
were averaged (see Guo and Berkhahn (2016), who suggest this approach).

6. RESULTS
We first review results specific to various models, and then review overall results and
accuracy.
In this section we label the models as per Table 14.

1 Several different architectures were tested before a final architecture was selected. Variants of the network
that were tested include those using tanh activations, and wider networks with 256 neurons in each layer.


Figure 1 Design of the Deep Neural Network. Note that the inputs to the network
contain a question mark, indicating that an unknown batch size may be
passed through the network. Also, the exposure metric that flowed into
the exposure node (shown at the bottom of the diagram) was the expected
claim.


  Table 14 Model label key


Model Description
glm_trad_ibnr Traditional IBNR approach
glm_trad_ebner Traditional EBNER approach
glm GLM fitted using the combined formulation
glmnet Lasso regression using the combined formulation
xgb Gradient Boosted Tree fitted using the combined formulation
dl Deep Learning model fitted using the combined formulation

6.1 Results specific to various models


6.1.1 GLM MODELS
The traditional GLM models (both the model based on IBNR and on EBNER) used
the variables company, gender, current_age, benefit_code, smoker_status, country,
underwriting_year, policy_year, calendar_year and sum_assured_band. The models
also contained various interactions of these variables.
The GLM model based on the combined formulation contained the same variables
as the traditional models, but also contained the delay variable together with inter­
actions of the delay variable with some of the other variables in the model. We will not
investigate the parameter estimates here in detail, but we note that they are reasonably
easy to interpret and explain. The exception is that some work may be needed to
explain the impacts of the interaction terms as well as the spline coefficients (which
may require some calculation to illustrate before they are easy to understand).

6.1.2 LASSO REGRESSION


Figure 2 plots the cross-validated Poisson deviance for various values of λ, which
is the parameter that controls the complexity of the model, as discussed above. The
lambda producing the lowest Poisson deviance was 2.89 × 10⁻⁶. The glmnet package
also produces a λ equivalent to a 1 standard error increase in deviance from the error
associated with the best λ. This value was 1.90 × 10⁻⁴ and was used to extract a list of
coefficients to guide the choice of variables for use in other regressions, resulting in a
list of 39 coefficients with non-zero values in these regressions.
The final λ used for the lasso model was the value that produced the lowest Poisson
deviance when measured on the validation dataset. This was 1.01 × 10⁻⁵. This value
of λ was used when the final model was fit on the training and validation data in
combination.
This results in 240 coefficients in the final model with non-zero values. This number
is greater than the number of fields available because fields that are factor levels can
have multiple coefficients. For example, the company field has four distinct values and
would have three coefficients in the model. Also, where we included natural splines
for continuous variables, this would result in more parameters in the model. Lastly,
we should recall that the lasso regression included all possible two-way interactions of
variables. This results in 465 possible coefficients (including the intercept). The value of
λ used therefore “selected” 240 of these coefficients for inclusion in the final regression.

Figure 2 Cross-validation Poisson deviance for various λ for lasso regression

6.1.3 GRADIENT BOOSTED TREE MODEL


As part of the fitting process, the xgboost model produces a feature importance plot.
This is shown in Figure 3 and represents the portion of gain in improvement of the
model due to the use of a particular feature in the model (measured as a proportion of
the total gains associated with using all of the features). It is indicative of the relative
importance of the feature in the model.
It is clear that delay is an important feature. This is probably due to the fact that
different delay periods have significantly different claim rates with later delay periods
having very low incidence rates, whereas the first delay period has a relatively high
incidence. Thus, the use of the delay variable results in significant improvement of the
model. The next most important features are age and sum assured band B. Keeping in
mind that the exposure measure used in the model is the expected claims, the fact that
the age variable remains important tells us that the expected mortality rates used do
not fully reflect the shape of the incidence rates with respect to age in the data.
Note that this plot does not tell us anything about the direction of the effect of the
feature on the outcome, and, with this relatively complicated model, the effect may
go in different directions depending on the other variables relating to the data point.

6.2 Overall results


6.2.1 VALIDATION RESULTS
After models were trained on the train data (reporting year less than 2010) we
evaluated the models on the validation data (reporting year 2010). The results of this
are shown in Table 15. Note that the validation data may have been used for some of the
optimisation of model parameters for the XGB and DL models. We show the Poisson
deviance as well as a simple actual over predicted ratio (based on total number of
claims).

Figure 3 Feature importance from the gradient boosted tree model

  Table 15 Validation results


Model Poisson deviance Actual vs. predicted
glm_trad_ibnr 13 184 106.9%
glm_trad_ebner 13 181 107.1%
glm 13 135 103.9%
glmnet 13 082 105.3%
xgb 13 082 106.1%
dl 13 024 99.4%

From the above it can be observed that the deep learning model has the lowest Poisson
deviance. The gradient boosted tree model and lasso regression also performed well.
All models seem to underestimate the 2010 reported claims except the deep learning
model.

6.2.2 RESULTS OF FINAL MODELS ON TEST DATA


The results in Table 16 represent the success of models on unseen claims data. These
models were trained on both the training and validation datasets.

  Table 16 Test results


Model Poisson deviance Actual vs. predicted
glm_trad_ibnr 22 944 99.3%
glm_trad_ebner 22 947 99.5%
glm 22 883 97.0%
glmnet 22 826 98.3%
xgb 22 822 95.9%
dl 22 799 93.8%

All models appear to overshoot somewhat on the test experience. The deep learning
model has the best fit, followed by the gradient boosted tree and lasso regression. It is
interesting to note that the deep learning model has the largest difference when comparing
total actual vs. predicted number of claims, and we discuss this further in Section 7.

6.2.3 ACCURACY OF PREDICTION FOR IBNR CLAIMS


To get a sense of the overall accuracy on IBNR claims we repeat the above exercise but
focussing on claims that occurred in 2010 and prior but were reported in 2011 and
2012. Table 17 summarises these results.

  Table 17 Results on later reported claims


Model Poisson deviance Actual vs. predicted
glm_trad_ibnr 4 172 106.4%
glm_trad_ebner 4 173 106.6%
glm 4 063 98.5%
glmnet 4 028 97.5%
xgb 4 042 98.0%
dl 3 994 95.7%

The two traditional models underestimate the late reported claims in aggregate, while
the deep learning model overestimates them. However, the deep learning model is
most accurate as measured by the Poisson deviance. The gradient boosted tree model
is slightly less accurate in this instance than the lasso regression.

6.3 Selected figures on test data


The following figures were all compiled on the test data using the final models for
prediction. Figure 4 shows the actual and predicted claims by benefit type. These

numbers broadly reflect the overall actual vs. predicted ratios shown in the tables
above.
The models appear to be least accurate on the SACI benefit (at least proportionally),
most likely because this benefit has the fewest claims. The deep learning and
the gradient boosted tree models both seem to be overestimating experience on the
DEATH and DEATH_ACC benefit codes.
Figure 5 shows actual and predicted claims numbers by age on the test data. Figure 6
shows the aggregate rates for the best three models on the test data. On both charts the
fit appears reasonable, though noticeably weaker at ages below 30. The deep learning
model produces the least smooth rates, whereas the lasso regression produces the
smoothest.
Figure 7 shows the same three models, but on the train and validate dataset. Here the
models are predicting data they were trained on. The deep learning model may have
been slightly overfit as, visually, it appears to reproduce some of the bumps in the
train/validate data in its predictions. An alternative interpretation is that at least some
of the bumps are in fact not due only to noise (in other words, random variation in
the data), but reflect cohort or other effects that should be modelled.

Figure 4 Actual and predicted claims by benefit type



Figure 5 Actual and predicted claims by age

Figure 6 Actual and predicted mortality rates by age – best three models

Figure 7 Actual and predicted mortality rates by age – best three models (train and
validate sets)

6.4 Results of delay patterns


The figures below explore the delay patterns and how the final models predict them.
Figure 8 shows the actual vs. predicted ratio for the various models by delay on
the test data. The gradient boosted tree and deep learning models appear reasonably
accurate on delay periods 0, 1 and 2. All models perform poorly on delay 3, but it
should be noted that this is based on a small number of claims.
Figures 9 and 10 show the proportion of claims reported in the calendar year of
occurrence in the train and validate data and in the test data respectively.
From these charts it appears that most of the models manage to fit the train and
validate data reasonably, and all the models broadly predict the test data reasonably
as well. However, none of the combined models copes well with the apparent
slowdown in reporting associated with Company C. We had allowed the traditional
IBNR and EBNER approaches a separate run-off triangle for each company, and
hence they model this slowdown reasonably well.
Figure 11 represents the proportion of claims reported in the same calendar year
as they occurred by age across the train and validate datasets. It seems apparent that
the proportion varies by age. Claims on younger lives seem to be more likely to be
reported later. The models seem able to fit this pattern reasonably well, apart from
the two traditional approaches, which cannot capture it. Note that the IBNR and
EBNER models overlap on this chart.

Figure 8 Actual vs. predicted ratio for claims by delay (test data)

Figure 9 Proportion of claims reported in calendar year of occurrence by company
(train and validate data)

Figure 10 Proportion of claims reported in calendar year of occurrence by company
(test data)

Figure 11 Proportion of claims reported in calendar year of occurrence by age (train
and validate data)

Figure 12 Proportion of claims reported in calendar year of occurrence by age (test
data)

Figure 12 shows the models doing a reasonable job of predicting the same pattern on
the test data, though the data is certainly more volatile.

6.5 Results over time


Figures 13 and 14 below plot the number of claims and the actual vs. predicted claims
by reporting year respectively, over all data. It appears that the 2010 reporting year was
somewhat heavier than the preceding years. This may be the reason that the gradient
boosted tree and deep learning models overestimate claims the most in 2011
and 2012.

6.6 Rates produced by the models


In Figure 15 we plot a sample of the predicted rates for a particular combination of
rating factors. We can see that, visually at least, the deep learning model produces
the least smooth rates. The gradient boosted tree has distinct step changes where
it changes level relative to the expected table (presumably where the current_age
variable was used in a tree split).
Because our models include reporting delay explicitly, we can break any of the rates
above into delay periods. In Figure 16 we show the proportion of the total predicted
by each delay period for the deep learning model, for the same subset of risk factors.
This shows the impact of age on reporting delay for this particular combination of
risk factors.
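As a small numerical illustration of this decomposition (the rates below are hypothetical values, not output from the fitted models), the total rate for a risk profile is simply the sum of its per-delay components, and the same components give proportions of the kind plotted in Figure 16:

```python
import numpy as np

# Hypothetical predicted claim rates for two ages of one risk profile,
# split by reporting delay (columns: delay 0, 1, 2 and 3 years).
rates_by_delay = np.array([
    [0.00080, 0.00030, 0.00010, 0.00005],   # age 30
    [0.00200, 0.00050, 0.00010, 0.00004],   # age 50
])

total_rate = rates_by_delay.sum(axis=1)             # the rate used for pricing
proportions = rates_by_delay / total_rate[:, None]  # split of the kind in Figure 16

print(total_rate)                 # total rate per age
print(proportions.sum(axis=1))    # each row of proportions sums to 1
```

A reserving application would apply the components beyond delay 0 to the relevant exposures to project late reported claims.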

Figure 13 Number of claims reported by year

Figure 14 Actual vs. predicted claims ratio reported by year



Figure 15 Predicted rates by age for a particular set of rating factors

Figure 16 Proportion of the deep learning model rate from various delay periods, by
age, for a particular set of rating factors

6.7 Partial dependency plots


We selected 50 random sample data points from the 2012 calendar year and generated
more data points by varying a single field on these data points; for example, by varying
age between 20 and 100 we recreate the same 50 data points with only the age field
changing. Plotting the average prediction for this group of data points yields a partial
dependency plot, which allows the impact of a single variable to be explored in a
complex machine learning model.
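The construction just described can be sketched in a few lines of Python. The toy model and the 50 sampled rows below are assumptions for illustration, not the paper's data; the individual prediction curves correspond to the ICE curves introduced below, and their average over the sampled rows is the partial dependency.

```python
import numpy as np

def ice_and_pdp(predict, sample_rows, column, grid):
    """Individual conditional expectation (ICE) curves and their average (PDP).

    Each sampled row is copied once per grid value, with only the chosen
    column overwritten, as described in the text."""
    ice = np.empty((len(sample_rows), len(grid)))
    for j, value in enumerate(grid):
        rows = sample_rows.copy()
        rows[:, column] = value           # vary a single field only
        ice[:, j] = predict(rows)
    return ice, ice.mean(axis=0)          # PDP = average over sampled rows

# Toy stand-in for a fitted model: rate rises with feature 0 ("age")
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=(50, 2))          # 50 random data points
predict = lambda rows: 0.01 * rows[:, 0] + rows[:, 1]
ages = np.linspace(20, 100, 9)

ice, pdp = ice_and_pdp(predict, sample, column=0, grid=ages)
print(ice.shape)   # one curve per sampled point: (50, 9)
```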
Figure 17 shows a partial dependency plot by age. Generally, the models have a
similar shape; however, as observed before, the gradient boosted tree and deep learning
models do not produce results as smooth as the GLM models. The models diverge
significantly at ages below 30.
For the gradient boosted tree model we plot the 50 data samples for each age in
Figure 18. This is called an “Individual Conditional Expectation” (ICE) plot. This plot
shows the general pattern of age mortality as seen in Figure 17, but it also shows that
the pattern is not universal. Different data points show different patterns with regard
to age. The model has fitted different shapes, by age, for different subsets of the data
automatically. One could compare this to Figure 19 which contains the same plot (with
multiple overlapping lines) based on the expected tables. So, the gradient boosted tree
has expanded a handful of expected rates into multiple different rate tables.

Figure 17 Partial dependency plot by age



Figure 18 Individual conditional expectation plot of gradient boosted tree by age

Figure 19 Individual conditional expectation plot of expected claims



7. DISCUSSION
In this section, we firstly discuss the two main contributions of this paper, that the
combined model for run-off claims and mortality estimation is a useful and efficient
way to model mortality experience and that the application of machine learning
techniques to this problem can lead to more accurate predictions. We then discuss the
observation that Poisson deviance appears to be a more robust statistic to target when
fitting models than the traditional AvE ratio.

7.1 Combined model formulation


It is clear from the results that, as measured by Poisson deviance, the GLM combined
model (with the same variables as the traditional approaches) outperforms the
traditional GLM with IBNR or EBNER approach, which seems to indicate that, at
least on this dataset, there is an advantage to using the combined formulation. This
advantage is present not only in the overall accuracy but also when reviewing the
accuracy of predicting the IBNR claims. In other words, similar models perform better
under the combined approach than under the traditional approach.
This advantage is likely due to the fact that the combined formulation allows
reporting delay to interact with the other variables in the model, which is difficult to
achieve with approaches based on estimating IBNR claims using run-off triangles. It
could be argued that a more precise IBNR exercise could be undertaken to improve
the results of the traditional models, for example, by reserving for claims in each age
group separately, but this would be labour-intensive and likely suffer from issues of
lack of data credibility.
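To make the combined formulation concrete, the sketch below fits a Poisson log-link GLM by iteratively reweighted least squares to simulated data in which a reporting-delay indicator interacts with age, with log-exposure as an offset. The design, the coefficient values and the data are hypothetical; a production model would include many more rating factors and delay periods.

```python
import numpy as np

def fit_poisson_glm(X, y, offset, n_iter=25):
    """Poisson log-link GLM fitted by iteratively reweighted least squares.

    offset = log(exposure), so fitted counts are exposure * exp(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu   # working response, less the offset
        XtW = X.T * mu                       # IRLS weights are mu for Poisson
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated data: the delay effect varies with age, which is exactly the
# interaction the combined formulation can capture in one model.
rng = np.random.default_rng(1)
n = 5000
age = rng.uniform(20, 70, n)
delay = rng.integers(0, 2, n).astype(float)   # 0: reported in year of event
exposure = rng.uniform(0.5, 1.5, n)
X = np.column_stack([np.ones(n), age - 45, delay, (age - 45) * delay])
true_beta = np.array([-2.0, 0.05, -0.7, -0.01])
y = rng.poisson(exposure * np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y, offset=np.log(exposure))
print(np.round(beta_hat, 3))   # close to the true coefficients
```

Because the delay indicator and its interaction with age sit inside the one model, the fitted rates can be split by delay period directly, which is what allows the same model to serve both pricing and IBNR estimation.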
The combined formulation also has other practical advantages: it makes for a
simpler process, since only one model needs to be fitted, checked and reviewed. It is also
easier to understand the impact of a modelling decision when only one model is being
used, rather than separate models. The technique also allows more flexible run-off
estimation than could easily be achieved with run-off triangles, although we note that
recent advances in IBNR reserving using machine learning techniques might offer
similar advantages; see, for example, Wüthrich (2018). We showed that new insights in terms of claims
reporting delays are possible from these models (for example, that the extent of the
delays might depend on the claimants’ age). Finally, these models have the added
advantage that they can be used both to derive the mortality rates for pricing and to
project late reported claims to set IBNR provisions (taking account of the exposures
over the period, which should result in more accurate reserving).
The combined model does not require additional data when compared to the
traditional approach of creating a run-off triangle followed by fitting a GLM. It may,
however, require more data to obtain meaningful results if more interactions between
delay and other variables are included.
A disadvantage of the combined model is that the approach is somewhat more

complex but, to some degree, the traditional approaches have similar complexity
which is simply not apparent to the user. The method may not add much value in cases
where IBNR claims are an extremely small proportion of the overall experience or
where there is limited interest in time-based trends.
Considering the relative advantages and disadvantages of the new method, we
believe the combined method to be superior to the traditional manner of performing
a mortality investigation.

7.2 Using machine learning to model mortality


Within the context of this study, we have, somewhat arbitrarily, defined the lasso,
gradient boosted trees and deep neural network techniques as machine learning
techniques, whereas we have not defined a GLM model in this manner. We acknowledge
that the distinction between the models is somewhat difficult to define, and, indeed, one
might view machine learning techniques as being characterised by the goal of making
predictions, thus including GLM models and IBNR techniques applied to run-off
triangles. However, one might also view traditional statistical models such as GLMs as
not being machine learning techniques.2 Regardless of the definition adopted, though,
this study has shown that the combined model formulation outperforms traditional
methods of performing a mortality and morbidity investigation, independent of the
specific technique used for implementation.
Reflecting on the experience of building and fitting the models for this study, we
conclude that the statistical models were more difficult and demanding to fit to the
data than the machine learning models, requiring more judgement and expertise,
especially given the large number of variables involved. Thus we still had to rely on
lasso regression to provide some guidance on variable selection, as well as significant
domain expertise and prior work. However, the machine learning models were fit with
less application of domain knowledge, bearing in mind that these models were fit for
multiple countries, portfolios and products simultaneously. Practically, the machine
learning models required less time to fit; for example, the gradient boosted tree models
were relatively quick to achieve initial fits that were almost as good as the final model.
The authors did spend some time tuning the parameters of the xgboost model, but
this required more computing time than it did person time. The deep learning models
required a bit more time to decide on a structure but were still relatively efficient on
person time.
In the following sections, we consider other aspects of these models.

7.2.1 INTERPRETABILITY
On occasion, practitioners may characterise techniques such as gradient boosted
trees and deep learning models as so-called “black-box” models, although the exact

2 For more discussion of what characterises a machine learning model, we refer the reader to Richman
(2019) and Januschowski et al. (2019).


complaint is often poorly defined; see Lipton (2016) who attempts to give more
precise definitions to these issues. We acknowledge, however, that it is often difficult
to understand why these types of models are making their predictions, especially since
these models also can allow for complex interactions between variables.
To assist with this issue, we consider that the interpretability of these models might
be improved with techniques such as partial dependency plots (PDPs), individual
conditional expectation (ICE) plots, variable and feature importance plots, and other
techniques. Furthermore, there are also techniques specific to deep learning for
extracting the intermediate representations the models build within their
intermediate layers.
Given these concerns, one might favour the choice of GLM models for
interpretability; however, the large number of variables and interactions that need to be
specified for GLM models to be accurate when conducting an experience analysis on a
large-scale, such as in this example, might create new interpretability issues. The lasso
regression might, on the surface, appear to be a good compromise, both providing
accurate predictions and yielding an interpretable model. However, interpretation and
inference remain difficult for this model, given the manner in which it is optimised
(see Efron & Hastie (2016) for more discussion) and, in this case, the large number of
parameters involved.

7.2.2 SMOOTHNESS
From the results in this paper, it is apparent that the GLM- and lasso-based techniques
can produce smooth results, especially when used in conjunction with splines.
The gradient boosted tree method, which aggregates step-wise functions that are,
by definition, not smooth, might be expected not to produce smooth results, but in this
case did not fare too badly, in part because of the inclusion of expected claims from an
existing table as part of the method’s input (to allow for smoother modelling relative
to this expected table). However, the rates are still not smooth, because the
technique inherently bands or bins continuous variables such as age.
Among the techniques evaluated, the deep learning model appeared to produce the
least smooth results. This is probably because the age variable was treated as a
categorical variable to allow embedding layers to be applied. Embedding layers are a
useful technique in deep learning, and it was felt that applying them might aid the
fit of the model; however, this may have come at the cost of smoothness. We intend to
explore deep learning models where age is treated as a continuous numerical variable,
and also fitting these models with a penalty applied for lack of smoothness.

7.2.3 EXTRAPOLATION
Care should be taken with all models when extrapolating beyond the
range of the training data. The different machine learning techniques have different
characteristics when extrapolating beyond data they were fit on. This is less of a
problem with age, as the bulk of the data is always present in certain age ranges, and

the behaviour of mortality and morbidity rates as age increases is relatively well known,
but more of a problem with time-based variables such as calendar year. In this study,
we have shown that, based on our test data (which was sampled from a future period),
the machine learning approaches outperformed the traditional approaches, albeit on a
relatively short time scale (of two years).
The gradient boosted tree model has a particular disadvantage here, as it does
not fit trends explicitly but merely bins observations on a continuous scale. Its
projections for 2011 and 2012 therefore simply continue at the level the model
projected for 2010 (the last year it was fit on), so actuaries considering
models of this type should take care when extrapolating. Although deep learning
models can learn to extrapolate smoothly to future time periods, long-range forecasts
might become more unstable.
GLMs may be a better solution here as the form of the model could be limited
in such a way as to not produce unwanted trends, for example, by using natural
splines which are constrained to be linear at their extreme values. Furthermore, these
parameter values can also be explicitly reviewed, and potential problems spotted. Of
course, we note that future period projections also often require business input and
judgement which cannot be forecast using modelling techniques.
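To illustrate the constraint mentioned above, the sketch below builds a natural cubic spline basis (the standard construction: cubic between the knots, constrained to be linear beyond the boundary knots) and checks numerically that every basis column is linear past the last knot, so a fitted curve extrapolates as a straight line. The knot positions are arbitrary choices for the example.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis: cubic between knots, linear beyond the
    boundary knots, built from truncated cubic terms."""
    x = np.asarray(x, dtype=float)
    K = len(knots)
    def d(k):
        num = np.maximum(x - knots[k], 0) ** 3 - np.maximum(x - knots[-1], 0) ** 3
        return num / (knots[-1] - knots[k])
    cols = [np.ones_like(x), x]           # intercept and linear term
    for k in range(K - 2):
        cols.append(d(k) - d(K - 2))      # cubic terms cancel beyond last knot
    return np.column_stack(cols)

knots = np.array([25.0, 40.0, 55.0, 70.0])
x_out = np.array([80.0, 90.0, 100.0])     # ages past the boundary knot
B = natural_cubic_basis(x_out, knots)

# On an equally spaced grid, second differences of a linear function vanish,
# so every basis column being linear implies this is (numerically) zero.
second_diff = B[2] - 2 * B[1] + B[0]
print(np.allclose(second_diff, 0))        # True
```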
We intend to investigate techniques to fit simpler models for future projection
(e.g. simple mortality trends) and using the complex machine learning techniques to
explore variables that are not being extrapolated. A similar proposal in the context of
credit risk modelling is in Breeden and Leonova (2019).

7.2.4 RISK OF OVERFITTING TO THE DATA


With flexible machine learning techniques, there is an ever-present risk of overfitting
the data and producing spurious projections. Here, we tried to limit these risks
using cross-validation as well as testing on a validation set, as described in Section 4.
The validation set was somewhat less representative than we would have liked, since
the year chosen for the validation set appears to have experienced particularly heavy
claims, and this produced a few challenges in fitting models that extrapolated well.
With the detailed (correlational) relationships that machine learning techniques
explore, one should also consider the length of time over which these relationships
may continue to hold. Thus, a feature that is predictive in the data now may not
continue to be predictive in future. With simpler models and typical rating factors this
is arguably less of a risk but, as we increase the number of new variables, the predictive
power of these variables is less likely to stay static over time. We believe that strategies
to address this issue will be important topics for actuaries to consider.

7.2.5 USING MACHINE LEARNING TO ASSIST STATISTICAL MODELLING


One could also use more advanced machine learning techniques as a form of advanced
data exploration for the purpose of informing the fitting of statistical models. An
example from this paper is using lasso regression to identify variables to be used in

the more traditional GLMs. We could also use the variable importance plot from
gradient boosted tree models to identify new variables of interest and then explicitly
model them in a GLM. In addition, the features learned in deep neural networks could
be used as inputs to more traditional models, see, for example, Richman (2018). We
intend to test some of these possibilities.
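As a minimal sketch of lasso-based variable screening (Gaussian loss fitted by cyclic coordinate descent on simulated data; this is an illustration, not a reproduction of the paper's glmnet model), coefficients shrunk exactly to zero flag candidate variables that could be dropped from a subsequent GLM:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Lasso fitted by cyclic coordinate descent (Gaussian loss).

    Soft-thresholding shrinks small coefficients exactly to zero, so the
    zero/non-zero pattern can be read as a variable-selection screen."""
    beta = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r = y - X @ beta + X[:, j] * beta[j]         # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return beta

# Simulated screen: eight candidate variables, only the first two matter
rng = np.random.default_rng(3)
n, p = 500, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

beta = lasso_cd(X, y, lam=50.0)
selected = np.flatnonzero(np.abs(beta) > 1e-8)
print(selected)   # the screen recovers the two informative variables
```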

7.3 Criteria for model selection: Poisson deviance vs. AvE


In Section 6 we noted that the deep learning and xgboost models performed relatively
poorly when measured on the AvE ratio (93.8% and 95.9% respectively), despite
performing well when measured by the Poisson deviance. On the other hand, the other
models, which were not as competitive in terms of the Poisson deviance, performed
better on the AvE ratio. This is of particular interest to actuaries pricing using AvE
ratios, which are often used in practice to adjust mortality tables for actual experience.
To understand this issue, we note that the AvE metric is not a measure of goodness
of fit, unlike the Poisson deviance, but rather can be seen as a measure of bias. Thus,
while the deep learning and xgboost models accurately forecast the experience of
the test set at a detailed level, they may nonetheless exhibit some bias at
a portfolio level that would need to be remediated before these models can be used
practically for insurance pricing. This issue has been addressed in detail in Wüthrich
(2019), who writes that (page 2): “The resulting neural network model often provides
a much better predictive performance on an individual policy level compared to the
more rigid GLMs. However, also these neural networks suffer a major deficiency
which, usually, is not discussed in the related literature. Namely, the consequence of
exercising an early stopping rule in the GDM3 implies that the resulting regression
model may be biased on a portfolio level”. We also refer the reader to Zumel (2019)
who discusses this problem in the context of random forest and gradient boosted tree
models. To solve this issue, Wüthrich (2019) provides two potential solutions: firstly,
using the output of the last layer of the neural network as input into a GLM (this
technique is also discussed by Zumel (2019)), and secondly, adding a regularisation
term that penalises bias to the loss function of the neural network.
However, whereas Wüthrich (2019) examined the issue of bias in the context of
predicting out-of-sample claims, we believe that the issue we address here has an
additional aspect, since we seek to predict both out-of-sample and out-of-time
observations. In other words, intrinsic to the models addressed in this paper is the need
to forecast mortality and morbidity rates into the future. Indeed, the deep learning
model evaluated on the validation data showed the best AvE performance of all of the
models considered in this paper, and thus the problem to be addressed is one of potentially
biased forecasts at a portfolio level. From this perspective, we note that when forecasts
were made with the models, the deep learning and gradient boosted tree models were
more conservative than the regression models in aggregate (in other words, had AvE

3 gradient descent method


ratios less than 1) but despite this lower accuracy at an overall level, these models still
performed more accurately at a detailed level. Therefore, the solutions to the problem
of biased neural networks provided in Wüthrich (2019) may not be directly applicable
to this problem, due to the forecasting aspect.
To test this, the bias regularisation technique discussed in Wüthrich (2019)
was applied by adding a term to the loss function used to calibrate the deep neural
network. To apply this technique, it is necessary to set a parameter which defines how
much weight should be given to the bias regularisation term. We chose the value 10^-3
heuristically, noting that ideally this should be chosen using cross-validation or a
similar technique. As before, eight runs of the networks were averaged to produce the
final predictions. On the test set, the Poisson deviance was 22 836, which is slightly
worse than the lasso model, and the AvE was 99.7%, which is better than all the other
models. Thus, it appears that the problem of bias was indeed a major contributor to
the issue discussed in this section. With more tuning, it would appear that a more
competitive model in terms of the Poisson deviance could be found, while maintaining
a suitable AvE ratio. Nonetheless, the issue of biased forecasts when extrapolating
models is worthy of more investigation.
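The mechanics of the bias penalty can be illustrated with a toy numpy model fitted by plain gradient descent, standing in for the neural network; the data, the simple linear predictor and the weight λ = 10^-3 are illustrative assumptions. The loss is the Poisson deviance plus λ times the squared difference between total predicted and total actual claims.

```python
import numpy as np

def penalised_poisson_loss(beta, X, y, lam):
    """Poisson deviance plus a squared aggregate-bias penalty.

    The second term pushes total predicted claims towards total actual
    claims, addressing the portfolio-level bias discussed in the text."""
    mu = np.exp(X @ beta)
    dev = 2.0 * np.sum(np.where(y > 0, y * np.log(np.maximum(y, 1) / mu), 0.0)
                       - (y - mu))
    return dev + lam * (mu.sum() - y.sum()) ** 2

def gradient(beta, X, y, lam):
    mu = np.exp(X @ beta)
    g_dev = 2.0 * X.T @ (mu - y)                       # gradient of the deviance
    g_bias = 2.0 * (mu.sum() - y.sum()) * (X.T @ mu)   # gradient of the penalty
    return g_dev + lam * g_bias

# Simulated portfolio: intercept plus one covariate
rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(-1.0 + 0.3 * X[:, 1]))

beta = np.zeros(2)
for _ in range(2000):                 # plain gradient descent
    beta -= 1e-4 * gradient(beta, X, y, lam=1e-3)

mu = np.exp(X @ beta)
print(abs(mu.sum() / y.sum() - 1.0))  # aggregate bias driven towards zero
# fitted loss is far below the unfitted (beta = 0) loss
print(penalised_poisson_loss(beta, X, y, 1e-3)
      < penalised_poisson_loss(np.zeros(2), X, y, 1e-3))
```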

8. CONCLUSION
Experience analysis forms a fundamental part of the actuarial management of a
life insurance portfolio and the estimates of mortality and morbidity rates derived
in this process are key for subsequent pricing and reserving. Existing experience
analysis usually follows a relatively simple process of setting or updating assumptions
using ratios of actual to expected claims derived from existing tables, with separate
ad-hoc adjustments for IBNR claims. In this paper, we have shown that models of
mortality and morbidity can be combined with the models used for estimating IBNR
into a single combined model, simplifying the experience analysis process as well as
making it more accurate by enabling interactions between claims delays and other
variables in the mortality model. Furthermore, we have shown how machine learning
techniques can further enhance the performance of these combined mortality models
in predicting claims on new data.
Having illustrated the potential of machine learning techniques to derive mortality
and morbidity rates at a large scale, a number of open issues remain before these
techniques can be applied reliably in practice. The most obvious issue requiring
further research is resolving the discrepancy between the metrics used to measure
model performance (Poisson deviance and AvE ratio), which we believe will require
some further innovation of the existing techniques used to de-bias neural network
models. We believe that these de-biasing techniques might be further developed
in the context of forecasting. Machine learning techniques that produce smooth
outputs are seemingly more suited for actuarial experience analysis than those that
are unconstrained, and methods of formulating these models could be addressed by
future research. Lastly, fitting relatively simple models for projecting trends in rates

with (calendar) time, and using more complex machine learning techniques to model
the other variables would appear to be a suitable approach for practical experience
analysis, and further research could explore these proposals.

ACKNOWLEDGEMENTS
The authors wish to thank Gen Re South Africa and Gen Re UK for the data used in this study.
We also thank the developers of Netron software.

REFERENCES
Breeden, JL & Leonova, E (2019). “When Big Data Isn’t Enough: Solving the long-range
forecasting problem in supervised learning,” Paper presented at 2019 International
Conference on Modeling, Simulation, Optimization and Numerical Techniques (SMONT
2019). Atlantis Press.
Chen, T & Guestrin, C (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the
22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
785–794.
Efron, B & Hastie, T (2016). Computer age statistical inference (Vol. 5). Cambridge University
Press.
Gluck, SM (1997). Balancing development and trend in loss reserve analysis. In Proceedings of
the Casualty Actuarial Society (Vol. 84, pp. 482–532).
Goldberg, Y (2016). A primer on neural network models for natural language processing.
Journal of Artificial Intelligence Research, 57, 345–420.
Goodfellow, I, Bengio, Y & Courville, A (2016). Deep learning. MIT press.
Guo, C & Berkhahn, F (2016). Entity embeddings of categorical variables. arXiv preprint
arXiv:1604.06737.
Hastie, T, Tibshirani, R, Friedman, J & Franklin, J (2005). The elements of statistical learning:
data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 68–69, 83–85.
Institute and Faculty of Actuaries (2017). Continuous mortality investigation – Working Paper
94 – Final “08” Series Accelerated Critical illness and Term Mortality Tables.
Ioffe, S & Szegedy, C (2015). Batch normalization: Accelerating deep network training by
reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
Januschowski, T, Gasthaus, J, Wang, Y, Salinas, D, Flunkert, V, Bohlke-Schneider, M & Callot, L
(2019). Criteria for classifying forecasting methods. International Journal of Forecasting.
Kazemi, SM, Goel, R, Eghbali, S, Ramanan, J, Sahota, J, Thakur, S, … & Brubaker, M (2019).
Time2Vec: Learning a Vector Representation of Time. arXiv preprint arXiv:1907.05321.
Kingma, DP & Ba, J (2014). Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
LeCun, Y, Bengio, Y & Hinton, G (2015). Deep learning. Nature, 521(7553), 436.

Lewis, PL & Rossouw, LJ (unpublished). IBNR versus EBNER – An Alternative Method of
Allowing for Late Reported Claims. Presented at the Actuarial Society of South Africa
Convention 2007.
Lipton, ZC (2016). The mythos of model interpretability. arXiv preprint arXiv:1606.03490.
Mack, T (2002). Schadenversicherungsmathematik 2. Auflage: Schriftenreihe Angewandte
Versicherungsmathematik, DGVM.
Moultrie, TA & Timæus, IM (2013). Introduction to model life tables. In Moultrie, TA,
Dorrington, RE, Hill, AG, Hill, K, Timæus, IM & Zaba, B (eds). Tools for Demographic
Estimation. Paris: International Union for the Scientific Study of Population.
http://demographicestimation.iussp.org/content/introduction-model-life-tables.
Accessed 12/06/2019.
Renshaw, AE & Verrall, RJ (1998). A stochastic model underlying the chain-ladder technique.
British Actuarial Journal, 4(4), 903–923.
Richman, R (2018). AI in actuarial science. Presented at the Actuarial Society of South Africa
Convention 2018.
Richman, R (2019). Advances in time series forecasting – M4 and what it means for insurance.
The South African Actuary, 3(1), 7–14.
Richman, R & Wüthrich, M (2019). A neural network extension of the Lee–Carter model to
multiple populations. Annals of Actuarial Science, 1–21. doi:10.1017/S1748499519000071
Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I & Salakhutdinov, R (2014). Dropout: a
simple way to prevent neural networks from overfitting. The Journal of Machine Learning
Research, 15(1), 1929–1958.
Tomas, J & Planchet, F (2014). Prospective mortality tables and portfolio experience. In
A Charpentier (ed.). Computational actuarial science with R: CRC Press.
Wüthrich, MV & Merz, M (2008). Stochastic claims reserving methods in insurance (Vol. 435).
John Wiley & Sons.
Wüthrich, MV (2018). Neural networks applied to chain-ladder reserving. European Actuarial
Journal, 8(2), 407–436.
Wüthrich, MV (2019). Bias regularization in neural network models for general insurance
pricing. Available at SSRN: https://ssrn.com/abstract=3347177 or
http://dx.doi.org/10.2139/ssrn.3347177
Zumel, N (2019). An ad-hoc method for calibrating uncalibrated models [Blog post].
Retrieved from http://www.win-vector.com/blog/2019/07/an-ad-hoc-method-for-calibrating-
uncalibrated-models-2/

ACTUARIAL SOCIETY 2019 CONVENTION, SANDTON, 22–23 OCTOBER 2019

Electronic copy available at: https://ssrn.com/abstract=3465424
