Actuarial Learning for Pension Fund Mortality Forecasting
Eduardo Fraga L. de Melo, Helton Graziadei and Rodrigo Targino
Abstract
For the assessment of the financial soundness of a pension fund, it is necessary
to take into account mortality forecasting so that longevity risk is consistently
incorporated into future cash flows. In this article, we employ machine learn-
ing models applied to actuarial science (actuarial learning) to make mortality
predictions for a relevant sample of pension funds’ participants. Actuarial
learning represents an emerging field that involves the application of machine
learning (ML) and artificial intelligence (AI) techniques in actuarial science.
This encompasses the use of algorithms and computational models, such as
regression trees, random forests, boosting, XGBoost, CatBoost, and neural
networks (e.g., FNN, LSTM, and MHA), to analyze large sets of actuarial data.
Our results indicate that some ML/AI algorithms present competitive out-of-
sample performance when compared to the classical Lee-Carter model. This
may indicate interesting alternatives for consistent liability evaluation and
effective pension fund risk management.
1 Introduction
Consistent and realistic evaluation of pension fund liabilities demands updated mor-
tality rates and also the forecasting of these rates for future cash flow discounting.
These requirements align with contemporary solvency assessment principles and
financial reporting standards (see Sandström (2016) and IASB (2017)). Even though
pension funds represent the interests of hundreds of millions of people across the
globe, the majority of studies projecting mortality rates rely on national
population data (e.g.,
Cairns et al. (2009), Dowd et al. (2011), Lee and Carter (1992), Li et al. (2013),
Renshaw and Haberman (2006b), Li and Lee (2005)).
While using national mortality rates might not pose a significant issue in regions
where population mortality is relatively homogeneous, it may introduce basis risk
when disparities exist between the national population and a target group. Devel-
oped countries often exhibit more homogeneous mortality patterns, but in nations
marked by significant social disparities, such as varying income levels and access to
healthcare, mortality rates can diverge notably between the general population and
a specific group, like pension fund participants.
Brazil, a country characterized by social inequalities (OCDE (2018)), illustrates
this point. National mortality rates, reflecting the entire population, exceed those
of selected groups such as pension fund participants. According to regulatory bodies,
fewer than 8% of Brazilians (see footnote 1) hold a pension plan (open or closed),
and these are concentrated in the higher income segment. Our analysis indicates
that, on average, mortality rates among pensioners are 68% of those observed in
the broader Brazilian population for ages 30 and above.
In this study, interest lies in ages relevant to pension plans, particularly those
above 30 years: most participants and retirees exceed this age and, given the nature
of our pension fund sample, information for ages under 30 is insufficient. In this
article, we use machine learning models within the context of actuarial science,
known as actuarial learning, to predict mortality trends in individuals above 30
years old, using data from a sample of Brazilian pension funds. Most models were
implemented in R; CatBoost was implemented in Python, and the neural network
models (feed-forward, FNN; long short-term memory, LSTM; and multi-head attention,
MHA) were implemented using the Keras package in R. Our results indicate that
actuarial learning models generate consistent results for mortality projection,
especially for short projection horizons.
1 https://www.investidorinstitucional.com.br/sessoes/investidores/seguradoras/40265-previdencia-aberta-atinge-5-3-da-populacao-brasileira-em-maio.html (in Portuguese)
1.1 Actuarial Learning
Since its origins, artificial intelligence has been driven by the desire to understand
and reveal complex relationships in data, with the aim of developing models capable
not only of making accurate predictions but also of extracting knowledge in an
understandable way. In this quest, the field of machine learning has diversified
considerably, resulting in a wide range of research exploring different aspects and
methodologies.
Within the spectrum of machine learning methods, decision tree-based tech-
niques stand out for their effectiveness and usefulness, offering both reliable and in-
terpretable results for a wide variety of datasets. The development of decision trees
dates back to Morgan and Sonquist (1963), who introduced the concept through
the Automatic Interaction Detector (AID) method, aimed at handling non-additive
effects. This initial milestone was followed by significant developments and by the
creation of dedicated computer programs for data analysis, with notable
contributions from researchers such as Messenger and Mandell (1972).
The methodological evolution of decision trees was then driven by the
contributions of Breiman et al. (1984), Friedman et al. (1977),
Friedman (2001), and Quinlan (1979, 1986), who substantially enriched the field of
machine learning by developing pioneering decision tree algorithms. The frequent
choice of decision trees and related techniques stems from a number of advantageous
attributes that position them as efficient and accessible analytical tools:
• Intrinsic variable selection: efficiency in identifying and using only the most
relevant variables, increasing the model’s robustness against irrelevant data;
It is relevant to emphasize that decision trees form the foundation of a variety
of modern algorithms, such as random forests, boosting (Freund (1995); Freund
and Schapire (1997); Friedman (2001)), and XGBoost (Chen and Guestrin (2016)),
where they are used as building blocks for more complex models. For a comprehensive
overview of various machine learning techniques, we recommend James et al.
(2023). This article also explores the use of neural networks, an artificial
intelligence methodology organized into layered structures composed of
interconnected nodes, or neurons. The study of such networks is the branch of
machine learning (ML) known as deep learning.
In this sense, actuarial learning represents an innovative field that integrates
machine learning and artificial intelligence techniques into actuarial science, applying
advanced algorithms and computational models for the analysis of vast volumes of
actuarial data, including information on insurance, pensions, and financial risks.
These methodologies offer broad applications in the actuarial domain, such as:
• Risk and claims analysis: use of algorithms to predict risks and identify trends
that may signal fraud or claims patterns - Aslam et al. (2022).
predictive models capable of capturing the complexity of actuarial data, paving the
way for unprecedented innovations and efficiencies in the field.
Richman and Wüthrich (2021) show that a deep neural network architecture may
outperform the Lee-Carter model considering all countries in the Human Mortality
Database (HMD) for mortality rates since 1950. The architecture consists of a
covariate (feature) layer, five intermediate layers, and an output layer. The feature
layer takes as inputs the year, age, country, and gender of each mortality rate.
Makhonza et al. (2024) consider a modified version of this architecture, in which
the model is fitted on unscaled logarithms and the output layer uses a linear
activation function. The approach adopted by Richman and Wüthrich (2021) scales
the data using the minimum and maximum values of the entire dataset, which
introduces knowledge of the complete dataset to the model during the training
phase.
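The difference between the two scaling choices can be illustrated with a minimal Python sketch. It is not the code of either cited paper: the log rates and the train/test split are hypothetical, and the helper names `fit_min_max` and `apply_min_max` are introduced here only for illustration.

```python
def fit_min_max(values):
    """Return the (min, max) pair learned from the given values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values using previously learned bounds (train data maps to [0, 1])."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical log mortality rates, split by time.
train_log_rates = [-6.0, -5.0, -4.0, -3.0]
test_log_rates = [-2.5, -2.0]

# Variant 1: the bounds come from the training period only (no look-ahead).
lo, hi = fit_min_max(train_log_rates)
test_scaled_train_only = apply_min_max(test_log_rates, lo, hi)

# Variant 2: the bounds are computed on the entire dataset, so the scaling
# already "knows" the range of the test period.
lo_all, hi_all = fit_min_max(train_log_rates + test_log_rates)
test_scaled_full = apply_min_max(test_log_rates, lo_all, hi_all)
```

In the first variant, test values may fall outside [0, 1], which is the price of not using information from the forecast period.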
In this article, we investigate the application of various ML algorithms to predict
one-year ahead mortality rates, using covariates in a data-driven methodology. We
evaluate and compare the out-of-sample performance of various models, including
the Lee-Carter model and ML techniques such as regression trees, random forests,
boosting, XGBoost, and CatBoost, as well as Feedforward Neural Network (FNN),
Long Short-Term Memory (LSTM), and Multi-Head Attention (MHA) architectures,
to identify those that exhibit the best performance.
2 Models
In this section, we briefly describe the ML algorithms used in this study.
Regression Tree - RT
Decision trees are commonly used machine learning methods for both classification
and regression. They offer popular methods for measuring variable importance
and binary separation. For this method, we refer to Breiman et al. (1984) and
Denuit and Trufin (2019). Decision trees select variables to partition the covariate
space, and the importance of each variable is measured by its contribution
to the total drop in the objective function (the mean squared
error). A notable advantage of these methods is that binary splits are performed
additively, which simplifies the interpretation of the final model.
Regression trees form the basis for different regression algorithms, such as random
forests and boosting, which are described below.
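The splitting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it searches a single covariate for the binary split that produces the largest drop in the sum of squared errors, and the data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on x that most reduces the squared error of y.

    Returns (threshold, drop), where drop is the reduction in the sum of
    squared errors obtained by the binary split x <= threshold.
    """
    parent = sse(y)
    best = (None, 0.0)
    candidates = sorted(set(x))
    for left_value, right_value in zip(candidates, candidates[1:]):
        threshold = (left_value + right_value) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        drop = parent - sse(left) - sse(right)
        if drop > best[1]:
            best = (threshold, drop)
    return best


# Hypothetical data: ages and log mortality rates with a jump after age 60.
ages = [40, 50, 60, 70, 80]
log_rates = [-6.0, -6.0, -6.0, -3.0, -3.0]
threshold, drop = best_split(ages, log_rates)
```

Summing the drops attributed to each covariate over all splits of a fitted tree gives the kind of variable importance measure mentioned in the text.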
Random Forest - RF
Boosting
Similar to the random forest, boosting also consists of aggregating different es-
timators of the regression function. However, this combination is done differently.
There are some variations and implementations of boosting, but the estimator is
built incrementally. Initially, it is assigned the value of 0. This estimator has a high
bias but low variance (zero). At each step, the value of the estimator is updated
to decrease the bias and increase the variance of the new function. This is done by
adding to the estimator a function that predicts the residuals.
One way to do this is to use a regression tree; it is important that this
tree be shallow to avoid overfitting. Additionally, instead of adding
this function in full, it is added multiplied by λ (called the learning rate), a factor
between 0 and 1 aimed at preventing overfitting. Another implementation
of boosting used in this article, and known for its good performance, is XGBoost
(Chen and Guestrin, 2016).
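The incremental construction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the boosting or XGBoost implementation used in the study: the base learner is a depth-one tree (a stump) on a single covariate, the loss is squared error, and the data, the number of rounds and the function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a depth-one regression tree (stump) to the residuals."""
    best = None
    candidates = sorted(set(x))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=1000, learning_rate=0.1):
    """Build the estimator incrementally, starting from the zero function."""
    stumps = []
    predictions = [0.0] * len(y)
    for _ in range(n_rounds):
        # Each new stump is fitted to what the current estimator still misses.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Only a fraction (the learning rate) of the new function is added.
        predictions = [p + learning_rate * stump(xi)
                       for p, xi in zip(predictions, x)]
    return lambda xi: sum(learning_rate * s(xi) for s in stumps)


# Hypothetical ages and log mortality rates.
ages = [40, 50, 60, 70, 80]
log_rates = [-6.0, -5.5, -4.8, -4.0, -3.1]
model = boost(ages, log_rates)
training_error = max(abs(model(a) - r) for a, r in zip(ages, log_rates))
```

A smaller learning rate requires more rounds to reach the same training error, which is the usual trade-off when tuning these two quantities together.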
XGBoost
CatBoost
In our study, we applied deep learning using FNN, LSTM, and MHA architectures.
The FNN, consisting of interconnected neurons arranged in successive layers, was
trained and validated on the data. Each neuron processes weighted numeric
information through an activation function, and its output serves as input for other
neurons. FNNs are used for both classification and regression tasks.
The intermediate layers of neurons between the input information and the result
are called hidden layers. In this configuration, there is no circular relationship
between neurons. Neural networks with circular connections are called recurrent
(RNN). FNNs with only one hidden layer are called shallow networks, while FNNs
with more than two hidden layers are called deep networks. In actuarial science
and mortality forecasting, neural networks have been used in Hainaut (2018), Nigri
et al. (2019), and Richman and Wüthrich (2019).
RNN is a class of neural networks that can be used to model sequence data
(see Denuit and Trufin (2019) and its references). The connections between cells in
an RNN form a directed graph over a time sequence, and RNNs use the internal
state (memory) of the cells to capture dynamics and temporal dependencies. Long-
Short Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) is a special type
of RNN capable of learning long-term dependencies. An LSTM cell includes two
internal states: a cell state that is a vector designed to hold long-term memory and
a hidden state that is the output vector of the cell representing the current working
memory.
At each time step, an LSTM cell processes time
series data and receives two states from the previous LSTM cell as inputs.
Subsequently, it updates its internal states through an input gate, an output gate,
and a forget gate. The cell can retain values over long time intervals,
and the three gates regulate the flow of information into and out of
the cell, allowing efficient modeling of long-range dependencies in
sequential data. Details of the algorithm can be found in Hochreiter and
Schmidhuber (1997). This feature may have significant relevance and applicability
in actuarial science: it allows complex patterns in time series of financial and
actuarial data to be identified, helping actuaries and risk analysts
make more accurate and informed estimates. Examples of papers using LSTM in
actuarial science include Richman (2021) and Nigri et al. (2019).
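The gate mechanics described above can be made concrete with a minimal Python sketch of one step of an LSTM cell. To keep it short, all states are scalars rather than vectors, and the weights are hypothetical values chosen only so that the example runs; it is not the Keras implementation used in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    x is the current input, h_prev the previous hidden state and c_prev the
    previous cell state. params maps each gate name to (w_x, w_h, bias).
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))    # how much old memory to keep
    write = sigmoid(pre_activation("input"))      # how much new content to store
    read = sigmoid(pre_activation("output"))      # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))

    c_new = forget * c_prev + write * candidate   # cell state: long-term memory
    h_new = read * math.tanh(c_new)               # hidden state: working memory
    return h_new, c_new


# Hypothetical weights, identical for every gate.
params = {name: (0.5, 0.5, 0.0)
          for name in ("forget", "input", "output", "candidate")}

h, c = 0.0, 0.0
for log_rate in [-4.2, -4.1, -4.0]:   # hypothetical lagged log mortality rates
    h, c = lstm_step(log_rate, h, c, params)
```

The two values returned at each step are exactly the two states that the next step receives as inputs.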
More recently, other architectures for sequential data have been proposed in the
literature, in particular by the Natural Language Processing (NLP) community. One
of the major breakthroughs was the MHA mechanism of Vaswani et al. (2017), which
forms one of the building blocks of the Transformer architecture. The Transformer
revolutionized the field due to its ability to handle sequential data, such as
sentences or texts, very effectively. It stands out for some key features:
to each data point according to its importance to the sequence under analysis. This
helps the model better capture relationships between data points.
• Activation masks: to ensure the model does not “see” information that is not
yet available to it during output generation (in the decoder), masks are used
to hide parts of the input that should not be used in the current prediction.
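The weighting and masking ideas in the list above can be sketched in Python as a single attention head with a causal mask. This is a simplified illustration and an assumption on our part: multi-head attention runs several such heads in parallel with learned projections, which are omitted here, and the input sequence is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax that maps -inf scores to zero weight."""
    finite_max = max(s for s in scores if s != -math.inf)
    exps = [0.0 if s == -math.inf else math.exp(s - finite_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position t may only attend to positions 0..t, so nothing from the
    future of the sequence leaks into the output at position t.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        scores = []
        for s, k in enumerate(keys):
            if s > t:
                scores.append(-math.inf)          # masked: not yet available
            else:
                dot = sum(qi * ki for qi, ki in zip(q, k))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        output = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical sequence of three 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
```

Each row of `weights` sums to one and places zero weight on later positions, which is the effect the mask is meant to achieve.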
This model, due to its ability to handle sequences more efficiently and effectively
than other models, has become the basis for many state-of-the-art NLP models, in-
cluding the renowned GPT (Generative Pre-trained Transformer) and BERT (Bidi-
rectional Encoder Representations from Transformers). Additional details on these
models are available in Radford et al. (2018) and Devlin et al. (2018). These models
have revolutionized the NLP field with their ability to generate and interpret
human language. GPT, with its focus on
text generation, and BERT, with its ability to understand the bidirectional context
of words in a sentence, offer powerful foundations for various applications. In the
actuarial context, these models have been used in various studies, including
Wang et al. (2024) and Troxler and Schelldorfer (2024).
In the actuarial literature, several approaches to mortality forecasting
use covariates associated with previous ages or years; these are the age-period
models (see Lee and Carter (1992) and the CBD model of Cairns et al. (2006)). Cohort
effects are also used by some models (for example, Renshaw and Haberman (2006a)). In
this sense, LSTM and MHA neural networks seem promising, since these algorithms
use past or proxy information for prediction.
2.1 Data
The pension fund data includes exposure and the number of deaths from 2012 to
2021 (10 years), for each gender. Figure 1 shows the annual time series of total
exposure and number of deaths by gender, and the age distribution of these quanti-
ties by gender for the year 2021. As noted, exposure and the number of deaths are
significantly higher for males.
Figure 1: Top row: time series of (i) exposure and (ii) number of deaths per year
for the pension fund. Data from 2012 to 2021, both genders and all ages. Second
row: Pyramids of (i) exposure and (ii) number of deaths by age and gender in 2021.
Bottom row: Raw log of age-specific mortality rates for males and females for years
2012-2021.
As one may observe in Figure 1, there is an increase in the number of deaths in
2020 and 2021, possibly due to the COVID-19 pandemic. The
age distribution of the exposed population reflects a significant number of
individuals at retirement ages, a group that also includes pensioners,
who are spouses receiving income due to the death of the principal
insured. The last row of Figure 1 shows the logarithm of mortality rates by gender
and year. The curves by gender clearly show the difference between male
and female mortality, the latter having lower rates.
In this section, we detail the structure of the response variable and the covariates
used in formulating our models. Consider that:
• $D^i_{x,t}$ is the number of deaths at age $x$, calendar year $t$ for gender $i$ for the
sample of pension funds;
• $E^i_{x,t}$ is the exposure at age $x$, calendar year $t$ for gender $i$ for the same funds.
The variable we aim to model, i.e., our response variable, is the mortality rate,
defined as:
$$m^i_{x,t} = \frac{D^i_{x,t}}{E^i_{x,t}}$$
This rate reflects the proportion of deaths in relation to the exposure for each
specified demographic group. To predict these mortality rates using machine
learning methods, we selected age ($x$), year ($t$), gender ($i$), and the mortality rate
from the previous year ($m^i_{x,t-1}$) as covariates. This approach allows for capturing
temporal trends and demographic variations in mortality rates and thus forecasting them.
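As an illustration of how the response and the covariates fit together, the Python sketch below builds one row per (age, year, gender) cell from death and exposure counts. The counts and the function name `build_rows` are hypothetical; the sketch only assumes that a cell is used when the previous year's rate for the same age and gender is available.

```python
def build_rows(deaths, exposure):
    """Build (covariates, response) pairs from death and exposure counts.

    deaths and exposure map (age, year, gender) to counts. A row is produced
    only when the previous year's rate for the same age and gender exists.
    """
    rates = {key: deaths[key] / exposure[key]
             for key in deaths if exposure.get(key, 0) > 0}
    rows = []
    for (age, year, gender), rate in sorted(rates.items()):
        previous = rates.get((age, year - 1, gender))
        if previous is not None:
            # Covariates: age, year, gender and last year's mortality rate.
            rows.append(((age, year, gender, previous), rate))
    return rows


# Hypothetical counts for a single age and gender.
deaths = {(60, 2012, 1): 8, (60, 2013, 1): 9, (60, 2014, 1): 10}
exposure = {(60, 2012, 1): 1000, (60, 2013, 1): 1000, (60, 2014, 1): 1000}
rows = build_rows(deaths, exposure)
```

Note that the first calendar year in the data cannot be used as a response, since it has no lagged rate; it only enters the model as a covariate.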
Machine learning methods do not assume a specific distribution for mortality rates.
In addition to machine learning methods, we also apply the model introduced by Lee
and Carter (1992), as it is a benchmark in mortality forecasting literature, widely
used in different versions in academia and practical applications. In this model, the
number of deaths for gender i at age x in year t follows:
$$D^i_{x,t} \sim \mathrm{Poi}\left(e^{\mu^i_{x,t}} E^i_{x,t}\right)$$
The logarithm of the mortality rate ($\mu^i_{x,t}$) has a parametric structure that
depends on age and time:
For the model to be identifiable, the following conditions must be met: $\sum_x b^i_x = 1$
approach with the use of the StanMoMo package in R - Barigou and Goffard
(2022).
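For reference, the log-bilinear structure usually associated with Lee and Carter (1992) can be written in the notation of this section as below. The gender superscripts on the parameters, the symbols $a^i_x$ and $k^i_t$, and the constraint on $k^i_t$ are assumptions taken from the standard formulation of the model, not from the text above.

```latex
\mu^i_{x,t} = a^i_x + b^i_x \, k^i_t,
\qquad \sum_x b^i_x = 1,
\qquad \sum_t k^i_t = 0 .
```

Here $a^i_x$ is the average age profile of log mortality, $k^i_t$ is a period index, and $b^i_x$ measures how strongly each age responds to changes in that index.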
In this subsection, we present the architectures of the networks used in this article.
Additionally, we present here the two different ways in which LSTM and MHA
networks were used in order to learn about mortality rates by gender, age, and year.
The FNN network used in the article has its architecture presented in the plot
(i) of Figure 2. The LSTM and MHA networks, on the other hand, were applied
with input data organized in two different ways.
The first approach used for LSTM and MHA networks considered the mortality
rate data for gender i, at age x, in year t, as dependent on the respective mortality
rate of the same age x and gender i, but from previous years t − 1 and t − 2. The
models were called LSTM-1 and MHA-1. The second approach used for LSTM and
MHA networks considered the mortality rate data at age x as a long time series,
considering jointly age and year. The models were called LSTM-2 and MHA-2.
In order to predict the response variable, the mortality rate $m^i_{x,t}$
for a given age $x$, year $t$, and gender $i$, we define the vector of predictor variables,
denoted by $W^i_{x,t}$. This vector is composed of age $x$, year $t$, gender $i$ (where
$i \in \{0, 1\}$, indicating male if $i = 1$), in addition to a set of mortality rates from
previous years and/or ages that are directly considered in the LSTM and MHA
architectures. Specifically, to capture the temporal dynamics in mortality rates, in
the LSTM-1 and MHA-1 models, the mortality rates from the two immediately preceding
years ($t - 1$ and $t - 2$) are included for all ages $x$, for both male and female
genders. Thus, the vector $W^i_{x,t}$ is defined as:
$$W^i_{x,t} = [x, i, t, m^i_{x,t-1}, m^i_{x,t-2}]$$
In the LSTM-2 and MHA-2 models, mortality rates are stacked by age and by year,
in this order, as illustrated in Figure 3. For the mortality rate at age $x$ in a given
year $t$, the mortality rates from previous ages of the same year $t$ and all ages from
all previous years to $t$ are taken into account. Thus, the vector $W^i_{x,t}$ is defined as:
$$W^i_{x,t} = [x, i, t, m^i_{30:x-1,t}, m^i_{30:95,2012:t-1}]$$
In the LSTM-3 and MHA-3 models, the mortality rates from the two preceding
years ($t - 1$ and $t - 2$) are included for ages $x - 1$ and $x - 2$, respectively, for both
male and female genders. This approach aims to capture generational effects. The
vector $W^i_{x,t}$ is defined as follows:
$$W^i_{x,t} = [x, i, t, m^i_{x-1,t-1}, m^i_{x-2,t-2}]$$
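The difference between the lagged-year and the cohort-style inputs can be illustrated with a short Python sketch. The rates are hypothetical and the function names are introduced here for illustration; the sketch only assembles the vectors defined in the text and does not reproduce the network code.

```python
def features_lag_years(rates, age, gender, year):
    """LSTM-1/MHA-1 style input: same age, two preceding years."""
    return [age, gender, year,
            rates[(age, year - 1, gender)],
            rates[(age, year - 2, gender)]]


def features_cohort(rates, age, gender, year):
    """LSTM-3/MHA-3 style input: rates followed along the diagonal
    (age - 1, year - 1) and (age - 2, year - 2), i.e. the same cohort."""
    return [age, gender, year,
            rates[(age - 1, year - 1, gender)],
            rates[(age - 2, year - 2, gender)]]


# Hypothetical mortality rates keyed by (age, year, gender), gender 1 = male.
rates = {
    (60, 2017, 1): 0.0080, (60, 2018, 1): 0.0078,
    (59, 2018, 1): 0.0072, (58, 2017, 1): 0.0066,
}
w_lag = features_lag_years(rates, 60, 1, 2019)
w_cohort = features_cohort(rates, 60, 1, 2019)
```

Both vectors share the first three entries and differ only in which past cells of the age-by-year grid they read.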
Figure 2: Architectures of the networks (i) FNN and (ii) FNN and LSTM/MHA
jointly used for learning mortality rates.
Figure 3: Illustration of how the mortality rate series ($m^i_{x,t}$) is handled in LSTM-2
and MHA-2 models. The black line denotes the observed mortality rates arranged
by Year and Age, between 2012 and 2018, from 30 to 95 years, for Males. The blue
dashed line denotes the observed rates for 2019, from 30 to 95 years, for Males.
network layer to generate the final output ($m^i_{x,t}$). For both of these networks, the
loss function was the mean absolute error.
2.3 Performance
To evaluate one-year out-of-sample performance, predicted values were compared
with observed values, and the metrics of Mean Absolute Error (MAE) and Root
Mean Squared Error (RMSE) were calculated. MAE and RMSE are two metrics that
measure the differences between predicted and observed values. They are calculated
as follows:
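$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| m^i_{x,t^*} - \hat{m}^i_{x,t^*} \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( m^i_{x,t^*} - \hat{m}^i_{x,t^*} \right)^2}$$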
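Both metrics are straightforward to compute; a minimal Python sketch with hypothetical observed and predicted rates follows.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical observed and predicted mortality rates for one test year.
observed = [0.010, 0.020, 0.040]
predicted = [0.012, 0.018, 0.034]
mae_value = mae(observed, predicted)
rmse_value = rmse(observed, predicted)
```

Because squaring gives more weight to large deviations, the RMSE is never smaller than the MAE, and the two metrics can rank models differently.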
3 Results
In this section, we present the results of the methods applied to forecast the mortality
of our pension fund sample. To avoid misinterpretation of the results, we decided to
use data from 2012 to 2019 in a temporal cross-validation approach, shown
schematically in Figure 4. The process starts by assuming that the model is fitted
at the end of 2015, which allows the modeler to use data from 2013 to 2015
(inclusive). From this standpoint, the aim is to forecast the next year’s mortality,
so the performance metrics (MAE and RMSE) are computed for 2016. This is
represented in the first line of Figure 4. For the following year, data from 2016
are also available and the forecasting accuracy is computed for 2017. The
performance metrics (MAE and RMSE) for the out-of-sample cross-validation are
presented in Table 1 for each of the models tested in this paper. Ultimately, the
lowest averages of MAE and RMSE identify the best models.
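The expanding-window scheme described above can be sketched in Python as a generator of (training years, test year) pairs. The function name is hypothetical, and the choice of 2019 as the last test year is an assumption based on the data window stated in the text.

```python
def expanding_window_folds(first_train_year, first_test_year, last_test_year):
    """Return (training years, test year) pairs for time series cross-validation.

    Each fold trains on every year from first_train_year up to the year
    before the test year, so the training set grows by one year per fold.
    """
    folds = []
    for test_year in range(first_test_year, last_test_year + 1):
        train_years = list(range(first_train_year, test_year))
        folds.append((train_years, test_year))
    return folds


folds = expanding_window_folds(2013, 2016, 2019)
for train_years, test_year in folds:
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```

Averaging the per-fold MAE and RMSE then gives one summary figure per model, while no fold ever uses information from its own test year.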
Figures 5 – 9 present realized mortality rates for 2019 (black dots) and the
mortality curves (in blue) predicted by each one of the models when using data up
to 2018.
As shown in Table 1, CatBoost and Lee-Carter achieved roughly the same
out-of-sample MAE, which is smaller than that of any other model tested. When the
performance metric is the RMSE, the best model was the FNN, followed closely by
CatBoost. It should also be mentioned that even though none of the models impose
any structure on the resulting mortality curves, the FNN results in a highly desirable
smooth and strictly increasing curve. Additionally, it is observed that models based
on LSTM and MHA networks (where the mortality rate series were loaded in a
“stacked” manner), and XGBoost were quite competitive when compared to the
benchmark Lee-Carter model.
Figure 4: Schematic representation of the temporal cross-validation procedure. In
the first step, the training set, represented by the black nodes, consists of data from
2013 to 2015, and mortality predictions are obtained for 2016, represented by the
gray nodes, with performance metrics calculated for this year’s data. In subsequent
steps, data from the next year is added to the training set, and predictions and
metrics are obtained for the subsequent year. This procedure is called time series
cross-validation.
Figure 5: Lee-Carter Model. Observed mortality rates in 2019 (dots) with the
respective predicted mortality curve. On the left: Males. On the right: Females.
Figure 5 shows the mortality projection plots for 2019 and the predicted mortality
curve based on the traditional Lee-Carter model. Figures 6 and 7 present the plots of
machine learning models (regression tree, random forest, boosting, and XGBoost).
It is possible to observe the characteristics of the predicted mortality curves when
tree-based algorithms are used: curves produced are not smooth.
Figure 8 shows the plots of the FNN. For this methodology, smooth out-of-sample
curves are obtained. Based on the plot, an apparently good
fit to raw mortality rates may also be seen for both males and females.
Despite the performance metrics being similar when comparing Lee-Carter to
ML algorithms, one should note that FNN produced a 2019 out-of-sample predicted
mortality curve smoother than the others. This is an important feature when mod-
eling mortality even for sub (and selected) populations that are inherently smaller
than national populations. These results made us choose this algorithm to perform
applications in the next section.
Moreover, regarding the residuals heat maps presented in Figure 10, they show
that fitting in younger ages is better than in older ones (80+) for both models (FNN
and Lee-Carter) and both genders. It is an expected result since older ages have
much less data (exposure). For the same reason, one may also notice that the fit for
Males presents lower residuals for both models when compared to Females. The Lee-
Carter fit presents higher absolute residuals than FNN for Males. For Females, the
FNN fit presents more symmetric residuals than the Lee-Carter one. In summary,
the plots in Figure 10 also show a better performance of FNN for one-year ahead
Figure 6: RT - Regression Tree and RF - Random Forest. Observed mortality rates
in 2019 (dots) with the respective predicted mortality curve. On the left: Males.
On the right: Females. First row: Regression Tree. Second row: Random Forest.
Figure 7: BST - Boosting and XGB - XGBoost. Observed mortality rates in 2019
(dots) with the respective predicted mortality curve. On the left: Males. On the
right: Females. First row: Boosting. Second row: XGBoost.
Figure 9: CatBoost, LSTM, and MHA. Observed mortality rates in 2019 (dots) with
the respective predicted mortality curve for the Male gender. Order: (i) CatBoost
(ii) LSTM-1 (iii) MHA-1 (iv) LSTM-2 and (v) MHA-2.
Figure 10: Out-of-sample absolute residuals heat maps for FNN and Lee-Carter
models. Years: 2016-2019. First row: FNN. Second row: Lee-Carter.
[Figure 10 panels: FNN (Female), FNN (Male), Lee-Carter (Female), Lee-Carter (Male). Vertical axis: Age. Color scale: Residuals, from -0.10 to 0.10.]
Year    e60 Male    e60 Female
2022    24.89       27.86
2023    25.06       28.01
2024    25.24       28.17
2025    25.42       28.32
2026    25.60       28.47
Table 2: FNN. Forecast of life expectancy at age 60 ($e_{60}$). Training
sample: 2012-2019. Forecast period: 2022-2026. Ages: 60-100.
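One common way to turn a vector of forecast mortality rates into a life expectancy figure is sketched below in Python. It is not necessarily the calculation behind Table 2: the sketch assumes a constant force of mortality within each year of age, approximates the integral of the survival curve by the trapezoidal rule, truncates at the last age supplied, and uses hypothetical Gompertz-like rates.

```python
import math


def life_expectancy(rates):
    """Approximate life expectancy from central mortality rates.

    rates holds the rates for consecutive ages, starting at the age of
    interest. One-year survival is taken as exp(-m), and the area under the
    survival curve is approximated by the trapezoidal rule.
    """
    survival = 1.0
    expectancy = 0.0
    for m in rates:
        next_survival = survival * math.exp(-m)
        expectancy += 0.5 * (survival + next_survival)
        survival = next_survival
    return expectancy


# Hypothetical rates for ages 60 to 100 (41 ages).
rates_60_100 = [0.008 * math.exp(0.09 * k) for k in range(41)]
e60 = life_expectancy(rates_60_100)
```

Feeding the function with rates forecast for successive calendar years, instead of a single year's rates, would give a cohort rather than a period life expectancy.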
4 Applications
For practical application purposes, we consider in this section the FNN, which
obtained the best RMSE in Table 1. As applications of the results, we: (i) forecast
life expectancy at age 60 over time (Table 2), (ii) estimated the effect of the
pandemic on the pension fund sample in the years 2020 and 2021 (Figure 11), (iii)
measured a hypothetical mathematical provision for retirees over 60 years old in the
sample, considering $1 of annual income, and (iv) constructed the expected cash flow
for income granted to pensioners over 60 years old in 2021 for the following 10 years
(Figure 12). Such a cash flow is a necessary input for asset and liability management
(ALM) or market risk calculation (specifically, mismatch risk) purposes.
The mathematical provision, calculated consistently with current and realistic
mortality assumptions and considering $1 of annual income and a real interest rate
of 5% per year, was $376,825. For comparison purposes, if the BR-EMS 2021 mortality
table (see footnote 2) for Males’ survival is used, the value is $370,210 (a difference of 1.76%).
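The structure of such a provision can be sketched in Python as a sum of life annuity values over the retirees in the sample. This is an illustration under stated assumptions rather than the calculation that produced the figures above: payments of 1 are assumed to fall at the end of each year, one-year survival is taken as exp(-m), a single hypothetical rate table is reused for all future years, and the portfolio is invented.

```python
import math


def annuity_value(rates, interest=0.05):
    """Present value of 1 paid at the end of each year while alive.

    rates holds central mortality rates for consecutive ages, starting at
    the annuitant's current age.
    """
    discount = 1.0 / (1.0 + interest)
    survival, value = 1.0, 0.0
    for k, m in enumerate(rates, start=1):
        survival *= math.exp(-m)        # probability of being alive after k years
        value += discount ** k * survival
    return value


def provision(exposure_by_age, rates_by_age, interest=0.05):
    """Sum annuity values over retirees, one unit of annual income each."""
    ages = sorted(rates_by_age)
    total = 0.0
    for age, count in exposure_by_age.items():
        future_rates = [rates_by_age[a] for a in ages if a >= age]
        total += count * annuity_value(future_rates, interest)
    return total


# Hypothetical rates for ages 60 to 100 and a hypothetical portfolio.
rates_by_age = {60 + k: 0.008 * math.exp(0.09 * k) for k in range(41)}
portfolio = {60: 100, 70: 50}
reserve = provision(portfolio, rates_by_age)
```

Lower mortality rates raise the survival probabilities and therefore the provision, which is the direction of the difference reported between the two mortality bases.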
Based on the projection made for the years 2020 and 2021, with data up to 2019,
it is observed that there is a predicted exposure value higher than the observed one
for ages over 60 years for the Male gender. The difference was 0.5%. The five-year
age group that showed the highest relative difference was from 80 to 84 years.
Finally, exposures of the study population for the next 10 years (2022 to 2031)
were projected for each gender. Starting from the year 2021, exposures for ages 60
to 95 were projected for the years 2022 to 2031 using expected future mortality
rates obtained with the FNN model. These projected exposures can
provide future cash flows for current retirees. Figure 12 illustrates the total annual
future exposures considering the best fit (FNN) and also the fixed mortality table
from the insurance industry for 2021 (BR-EMS 2021). The projected exposures
are consistent with the results presented in this article. According to the trained
models, the mortality of the pension fund population is lower than that predicted
in the insurance industry table for 2021.
2 Available at http://www.susep.gov.br/setores-susep/cgpro/copep/Tabuas%20BR-EMS%202010%202015%202021-010721.xlsx
Figure 11: Exposure for years 2020 and 2021 - Males: observed (black line) x
predicted (red line) by FNN neural network.
Figure 12: Projected cash flow for the years 2022 to 2031 (10 years) of the pension
fund sample population at ages 60 to 95 in 2021, considering: (i) the FNN method,
and (ii) the 2021 insurance market mortality table (BR-EMS 2021).
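The exposure projection described in this section can be sketched in Python as a closed population rolled forward one year at a time. The base-year exposures and the rate surface are hypothetical, no new entrants are assumed, and one-year survival is taken as exp(-m); in the study, the rates would come from the fitted model or from a fixed mortality table.

```python
import math


def project_exposure(exposure, rate, n_years):
    """Roll a closed population forward, ageing survivors by one year per step.

    exposure maps age -> number of people in the base year.
    rate(age, step) returns the mortality rate applied during projection
    step `step` (1 for the first projected year) to people aged `age`.
    Returns a list with the total exposure in each projected year.
    """
    totals = []
    current = dict(exposure)
    for step in range(1, n_years + 1):
        current = {age + 1: count * math.exp(-rate(age, step))
                   for age, count in current.items()}
        totals.append(sum(current.values()))
    return totals


def surface(age, step):
    """Hypothetical rate surface with a 1% yearly mortality improvement."""
    return 0.008 * math.exp(0.09 * (age - 60)) * 0.99 ** step


# Hypothetical base-year exposure by age.
base = {60: 1000.0, 70: 600.0, 80: 200.0}
totals = project_exposure(base, surface, n_years=10)
```

With one unit of annual income per person, the list of yearly totals is also the projected cash flow, so comparing two rate surfaces amounts to comparing two such lists.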
5 Concluding remarks
A consistent evaluation of pension fund obligations requires the use of
realistic and updated mortality rates and also the prediction of these mortality
rates over time for cash flow discounting purposes. Most mortality rate projection
studies use national population data when performing applications. Despite the
basis risk, the use of national mortality rates may not pose a significant problem
when the difference between the national population and the selected population is
small. However, in countries with large social inequalities, mortality rates can
be quite different between a selected population and the average national population.
In Brazil, national mortality rates are higher than those of a selected subpopulation,
such as pension fund participants or insurance customers.
In this paper, several machine learning methods and neural networks were
applied to predict the mortality of participants over 30 years old from a pension
fund population. The use of machine learning in actuarial science has been termed
actuarial learning. The methods used in this paper were regression trees, random
forests, boosting, XGBoost, CatBoost, and FNN, LSTM and MHA neural networks. We
compared the results obtained with the Lee and Carter (1992) model, a widely used
benchmark for mortality rate forecasting purposes. Our results show that actuarial
learning models are a competitive alternative for mortality rate forecasting for a
selected population. Using RMSE as the performance metric, the best fit was achieved
using the FNN neural network. If MAE is considered, some models are competitive
with Lee-Carter, such as CatBoost, LSTM, and MHA.
The applications made in the article indicate, among other things, the
differentiation in mortality levels between Males and Females, the longevity
improvement in the evolution of life expectancy over the years, and an estimated
number of deaths for ages 60+ in the years 2020 and 2021 that may be due to the
COVID-19 pandemic. Furthermore, we highlight the consistent forecasting of cash
flows, which is a fundamental tool for pension fund risk management, as it is a
necessary step for ALM risk evaluation (called “mismatch risk”).
The code supporting the findings of this study is available on request from the
corresponding author, EFLM. The data are not publicly available because they belong
to Brazilian pension funds and were made available to the authors only under
a non-disclosure agreement (NDA).
References
Aslam, F., Hunjra, A. I., Ftiti, Z., Louhichi, W., and Shams, T. (2022). Insur-
ance fraud detection: Evidence from artificial intelligence and machine learning.
Research in International Business and Finance, 62.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and
Regression Trees. Wadsworth.
Cairns, A., Blake, D., and Dowd, K. (2006). Pricing death: frameworks for the
valuation and securitization of mortality risk. ASTIN Bulletin, 36(1):79–120.
Cairns, A. J., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A., and
Balevich, I. (2009). A quantitative comparison of stochastic mortality models
using data from England and Wales and the United States. North American Actuarial
Journal, 13(1):1–35.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pages 785–794.
Denuit, M. and Trufin, J. (2019). Effective statistical learning methods for actuaries.
Springer.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training
of deep bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805.
Dorogush, A. V., Ershov, V., and Gulin, A. (2018). CatBoost: gradient boosting
with categorical features support. https://doi.org/10.48550/arXiv.1810.11363.
Dowd, K., Cairns, A. J., Blake, D., Coughlan, G. D., and Khalaf-Allah, M. (2011).
A gravity model of mortality rates for two related populations. North American
Actuarial Journal, 15(2):334–356.
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information
and computation, 121(2):256–285.
James, G., Witten, D., Hastie, T., Tibshirani, R., and Taylor, J. (2023). An
introduction to statistical learning: With applications in Python. Springer Nature.
Li, N. and Lee, R. (2005). Coherent mortality forecasts for a group of populations:
An extension of the Lee-Carter method. Demography, 42:575–594.
Li, N., Lee, R., and Gerland, P. (2013). Extending the Lee-Carter method to model
the rotation of age patterns of mortality decline for long-term projections.
Demography, 50(6):2037–2051.
Makhonza, B., Mogodi, N. S., and Mbuvha, R. (2024). Mortality forecasting using
temporal fusion transformers. SSRN - electronic copy available at:
https://ssrn.com/abstract=4684436.
Maynard, T., Bordon, A., Berry, J. B., Baxter, D. B., Skertic, W., Gotch, B. T.,
Shah, N. T., Wilkinson, A. N., Khare, S. H., and Jones, K. B. (2019). What role
for AI in insurance pricing? Available online:
https://www.researchgate.net/publication/337110892_WHAT_ROLE_FOR_AI_IN_INSURANCE_PRICING_A_PREPRINT.
Messenger, R. and Mandell, L. (1972). A modal search technique for predictive nom-
inal scale multivariate analysis. Journal of the American statistical association,
67(340):768–772.
Nigri, A., Levantesi, S., Marino, M., Scognamiglio, S., and Perla, F. (2019). A deep
learning integrated Lee–Carter model. Risks, 7(1):33.
Noll, A., Salzmann, R., and Wuthrich, M. V. (2018). Case study: French motor
third-party liability claims. SSRN - Available online:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3164764.
Novykov, V., Bilson, C., Gepp, A., Harris, G., and Vanstone, B. (2023). Deep
learning applications in investment portfolio management: a systematic literature
review. Journal of Accounting Literature - https://doi.org/10.1108/JAL-07-2023-0119.
Quinlan, J. R. (1979). Induction over large data bases. Technical report, Computer
Science Department, School of Humanities and Sciences, Univ.
Richman, R. (2021). AI in actuarial science – a review of recent advances – part 2.
Annals of Actuarial Science, 15(2):230–258.
Sandström, A. (2016). Handbook of solvency for actuaries and risk managers: theory
and practice. CRC press.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser,
L., and Polosukhin, I. (2017). Attention is all you need. Advances in neural
information processing systems, 30.
Wang, J., Wen, L., Xiao, L., and Wang, C. (2024). Time-series forecasting of
mortality rates using transformer. Scandinavian Actuarial Journal, 2024(2):109–
123.