Scenario Analysis in Risk Management
Theory and Practice in Finance
Bertrand K. Hassani
Dr. Bertrand K. Hassani
Global Head of Research and
Innovation - Risk Methodology
Grupo Santander
Madrid, Spain
Associate Researcher
Université Paris 1 Panthéon Sorbonne
Labex ReFi
Paris, France
The opinions, ideas and approaches expressed or presented are those of the author and do
not necessarily reflect Santander’s position. As a result, Santander cannot be held responsible
for them. The values presented are just illustrations and do not represent Santander losses,
exposures or risks.
The objective of this book is to show that scenario analysis in financial institutions
can be addressed in various ways depending on what we would like to achieve.
No method is better than the others; some methods are simply more
appropriate in particular situations.
I have so often heard opinionated people select one scenario strategy over
another because everyone else was doing it. That is not an appropriate justification:
it may lead to the selection of an inappropriate methodology and consequently to
unusable results. Even worse, managers may lose faith in the process and tell everyone
that scenario analysis for risk management is useless.
Therefore, in this book, I present various approaches to performing scenario
analysis. Some rely on quantitative methods; others are more qualitative,
but once again, none is better than the others. Each has its pros and cons,
and its suitability depends on the maturity of your risk framework, the type of risk
that banks are willing to assess and manage and the information available. I have tried
to present them in the simplest way possible and to keep only the essence of the
methodologies since, in any case, managers will eventually have to fine-tune them
and make them their own. I hope this book will inspire them. One of my
objectives was also to make supposedly complicated methodologies accessible to
any risk manager; a basic understanding of mathematics should be all that is
needed.
Note that I have implemented all the methodologies presented in this book,
and all the figures presented are my own. Most of them have been implemented
in professional environments to answer practical issues. I am therefore giving
risk managers some tools to address scenario analysis and providing leads
for researchers to start proposing solutions, and I hope that the
clear perspective of combining the methodologies will lead to future academic and
professional developments.
1 Introduction
1.1 Is this War?
1.2 Scenario Planning: Why, What, Where, How, When
1.3 Objectives and Typology
1.4 Scenario Pre-requirements
1.5 Scenarios, a Living Organism
1.6 Risk Culture
References
2 Environment
2.1 The Risk Framework
2.2 The Risk Taxonomy: A Base for Story Lines
2.3 Risk Interactions and Contagion
2.4 The Regulatory Framework
References
3 The Information Set: Feeding the Scenarios
3.1 Characterising Numeric Data
3.1.1 Moments
3.1.2 Quantiles
3.1.3 Dependencies
3.2 Data Sciences
3.2.1 Data Mining
3.2.2 Machine Learning and Artificial Intelligence
3.2.3 Common Methodologies
References
4 The Consensus Approach
4.1 The Process
4.2 In Practice
4.2.1 Pre-workshop
4.2.2 The Workshops
8 Bayesian Networks
8.1 Introduction
8.2 Theory
8.2.1 A Practical Focus on the Gaussian Case
8.2.2 Moving Towards an Integrated System: Learning
8.3 For the Managers
References
9 Artificial Neural Network to Serve Scenario Analysis Purposes
9.1 Origins
9.2 In Theory
9.3 Learning Algorithms
9.4 Application
9.5 For the Manager: Pros and Cons
References
10 Forward-Looking Underlying Information: Working with Time Series
10.1 Introduction
10.2 Methodology
10.2.1 Theoretical Aspects
10.2.2 The Models
10.3 Application
References
11 Dependencies and Relationships Between Variables
11.1 Dependencies, Correlations and Copulas
11.1.1 Correlations Measures
11.1.2 Regression
11.1.3 Copula
11.2 For the Manager
References
Index
Chapter 1
Introduction
Scenarios have been used for years in many areas (economics, military, aeronautics,
public health, etc.) and are far from being limited to the financial industry. A scenario
is a postulated sequence or development of events, like a summary of the plot of a play,
including information about its stakeholders, characters, locations, scenes, weather,
etc., i.e., anything that could contribute to making it more realistic. One of the key
aspects of scenario analysis is the fact that, starting from one set of assumptions,
it is possible to evaluate and map the various outcomes of a particular situation. While
in this book we will limit ourselves to the financial industry for our applications
and examples, it would be a serious mistake not to draw inspiration from
what other industries can offer in terms of methodologies, procedures or
regulations.
Indeed, to illustrate the importance of scenario analysis in our world, let’s
start with famous historical examples combining geopolitics and military strategy.
The greatest leaders in the history of mankind based their decisions on the
outcome of scenarios. The Pearl Harbor attack was one of the outcomes of the scenario
analysed by Commanders Mitsuo Fuchida and Minoru Genda, considering that
their objective was to make US naval forces inoperative for at least 6 months
(Burbeck, 2013). Sir Winston Churchill analysed the possibility of attacking the
Soviet Union with Americans and West Germans as allies after World War II
(Operation Unthinkable; Lewis 2008). Scenarios are a very useful and powerful
tool to analyse all potential future outcomes and prepare ourselves for them. From
a counter-terrorism point of view, the protection scheme of nuclear plants against
terrorist attacks is clearly the result of a scenario analysis. For example, in France,
squadrons of fighter pilots are ready to take off and intercept a potential airborne
threat in less than 15 minutes. It is also really important to understand that the risk
assessment resulting from a scenario analysis may result in the acceptance of this
risk. The nuclear plant located in Fessenheim, next to the Swiss border, was
built in a seismic area, but the authorities came to the conclusion that the risk
was acceptable. Besides, it is one of the oldest nuclear plants in France, and one may
think that the likelihood of a failure and the age of a plant are correlated.1
In the military, most equipment results from field experience, scenarios
or past failures. In many industries, however, contrary to the financial sector,
we may not have the opportunity to wait for a failure in order to identify an issue,
fix it and learn from it: in industries such as aeronautics or
pharmaceuticals, if a failure occurs or a faulty product is released, people’s lives are
at risk.
Now, focusing on scenario analysis within financial institutions, it
usually takes one of the following forms. The first form is stress testing (Rebonato,
2010). Stress testing aims at assessing multiple outcomes resulting from adverse
stories of different magnitudes, for instance likely, mild and worst-case scenarios,
relying on macroeconomic variables. Indeed, it is quite common to analyse a
particular situation with respect to how macroeconomic variables would evolve.
The second form relates to operational risk management as prescribed in the current
regulation,2 where scenarios are required for capital calculations (Pillar I and Pillar
II; Rippel and Teply 2011). The recent crisis taught us that banks failing due
to extreme incidents may dramatically impact the real economy. Indeed, the Société
Générale rogue trading incident, a massive operational risk, resulted in a massive market
risk materialisation as all prices went down simultaneously, in a severe lack
of liquidity as the interbank market was failing (banks were not funding each
other) and consequently in the well-known credit crunch as banks were not funding
the real economy, the whole occurring within the context of the subprime crisis.
Impacted companies suffered, and some relatively healthy ones even went bankrupt.
The last use of scenarios is related to general risk management. It is probably the
most useful application of scenario analysis, as it is not necessarily a regulatory demand
and as such would only be used by risk managers to improve their risk framework,
removing the pressure of a potentially higher capital charge.
1 The idea behind these examples is neither to generate any controversy nor to feed any conspiracy
theory but to refer to examples which should speak to the largest number of readers.
2 Note that though the regulation might change, scenarios should still be required for risk
management purposes.
To present scenario analysis as a whole and not only in the financial industry,
the following paragraph describes military scenario planning. In this book, we
draw a parallel between the scenario process in the Army and in a financial
institution. The scenario planning as suggested in Aepli et al. (2010) is summarised
below. It can be broken down into 12 successive steps of equal importance, and we
would recommend that risk managers keep them in mind when undertaking such a process
(International Institute for Environment and Development (IIED) 2009; Gregory
Stone and Redmer, 2006).
1. Decide on the key question to be answered by the analysis. This creates
the framework for the analysis and conditions the next points.
2. Set both the time and the scope of the analysis, i.e. place the scenario in a period of
time, define the environment and specify the conditions.
3. Identify and select the major stakeholders to be engaged, i.e. people at the
origin of the risk, responsible or accountable for it, or impacted by it.
4. Map basic trends and driving forces such as industry, economic, political,
technological, legal and societal trends. Evaluate to what extent these trends
affect the issues to be analysed.
5. Find key uncertainties, assess the presence of relationships between the driving
forces and rule out any inappropriate scenarios.
6. Group the linked forces and try to reduce the forces to the most
relevant ones.
7. Identify the extreme outcomes of the driving forces. Check their consistency and
plausibility with respect to the time frame, the scope and the
environment of the scenario and the stakeholders' behaviours.
8. Define and write out the scenarios. The narrative is very important as it will be
a reference for all the stakeholders, i.e., a common ground for analysis.
9. Identify research needs (e.g. data, information, elements supporting the stories,
etc.).
10. Develop quantitative methods. Depending on the objectives, methodologies
may have to be refined or developed. This is the main focus of the book, which provides
multiple examples, although these are not exhaustive.
11. Assess the scenarios by implementing, for example, one of the strategies presented
in this book, such as the consensus approach.
12. Transform the outcome of the scenario analysis into key management actions
to prevent, control or mitigate the risks.
These steps can be applied almost as they stand to perform a reliable scenario analysis in
a financial institution. None of them should be set aside a priori.
Remark 1.2.1 An issue to bear in mind during the scenario planning phase, which
may affect both the model selection and the selection of the stakeholders,
is what we would refer to as the seniority bias. This is something we observed
while facilitating workshops: even if you have the best experts on a topic in the
room, the presence of a more senior person might lead to self-censorship. People
may censor themselves because of threats against them or their interests from their
line manager, shareholders, etc. Self-censorship also occurs when employees deliberately
distort their contributions, either to please the more senior manager or out of fear of
that manager, without any pressure other than their own perception of the situation.
To make this book more readable and to help risk managers sort issues
into a simple scenario taxonomy, we propose the following classification. The
most destructive risks a financial institution has to bear are those we will label
Conventional Warfare, Over Confident, Black Swans, Dinosaurs and Chimera.
By “conventional warfare”, we mean the traditional risks, those you
would face on a “Business as Usual” basis, such as credit risk and market risk.
Taken independently, they do not usually lead to dramatic issues, and
banks address them continuously. But when an event transforms their non-correlated
behaviour into a highly correlated one, i.e., each and every individual component fails
simultaneously, the consequences might be dramatic (and may fall in the last category). The Over
Confident label refers to types of incidents which have already materialised but whose
magnitude was really low, or which led to a near miss, so that practitioners assumed
that their framework was functioning until a similar but larger incident occurred.
The Black Swan is a reference to Nassim Taleb’s book, The Black Swan
(Taleb, 2010). The allegory of the Black Swan is that no one could ever believe that
black swans existed until someone saw one. For a financial institution it is “the
risk that can never materialise in a target entity” type of scenario, a judgment that
only a pure lack of experience led us to make. The Dinosaur is the risk that the
institution thought did not exist anymore but that suddenly “comes back to life” and
stomps on the financial institution. This is typically the exposure to the back book
that financial institutions are experiencing. The last one is the Chimera, the mythological
beast, the one which is not supposed to exist; it is the impossible, the thing that does
not make sense a priori. Here, we know it can happen, we just do not believe it will.
Examples are the Fessenheim nuclear plant mentioned earlier, a meteor striking the building
or a rogue wave, which until the middle of the twentieth century was considered
nonexistent by scientists despite having been reported by multiple witnesses. The
difference between the Black Swan and the Chimera types of scenarios is the following.
The Black Swan did exist; we just did not know it and did not even think about the
possibility of its existence. The Chimera is not supposed to exist:
we do not want to believe it can happen even if we could imagine it, as it is
mythological, and we have not yet been able to understand the underlying
phenomenon.
Scenarios can find their roots in both endogenous and exogenous issues.
Endogenous risks are those due to the intrinsic way of doing business, of
creating value, of dealing with customers, etc. Exogenous risks are those having
external roots, such as terrorist attacks and earthquakes. The main problem with
endogenous risk is that we may be able to point fingers at people if we experience
failures, which creates an adverse incentive: these people
may not want anyone to discover that there is a potential issue in their area. With
exogenous risk, we may experience another problem, in the sense that sometimes
not much can be done to control it, though awareness is still important. The human
aspect of scenario analysis briefly discussed here is really important and should
always be borne in mind. If the process is not clearly explained and the people
working in the financial institution do not buy in, we will face a major issue: the
scenarios will not be reliable because they will not be performed properly. People will do
them because it is compulsory, but they will never try to obtain any usable outcome,
as for them it is a waste of time. The first step of a good scenario process is to teach
and train people on why scenarios are useful and how to deal with them; in other words,
to market the process. The objective is to embed the process. The best evidence
of an embedded process is the transformation of a mandated “tick the box” exercise
into scenario analysis performed by the business units themselves, without being
requested to do so, because it has become part of their culture.
Another question worth addressing in the process is the moment when
we should capture the controls already in place. Indeed, when facilitating a scenario
analysis, you will often hear the following answer to the question “do you have
a risk?”: “no, we have controls in place”. To which the manager should reply: you
have controls because you have a risk. This comes from the confusion between
inherent and residual risk. The inherent risk is the one the entity faces
before putting any controls or mitigants in place. The residual risk is
the one the financial institution faces after the controls, i.e., the one it will face
even if the mitigants are functioning. When performing a scenario analysis, it is really
important to work with inherent risk as a first step, otherwise our perception of the
risk might be biased. Indeed, if we worked with the residual
risk and the control then failed, we would never have captured the real exposure
and would have assumed we were safe when we were not. Therefore,
we would recommend working with the inherent risk in the first place and capturing
the impact of the control in a second stage. The inherent risk will also support the
internal process of prioritisation.
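The distinction between inherent and residual risk can be illustrated with a small arithmetic sketch. The figures, the function name `residual_loss` and the idea of summarising a control by a single "effectiveness" fraction are all assumptions made for illustration; they are not taken from the book and real controls are rarely this simple.

```python
def residual_loss(inherent_loss, control_effectiveness):
    """Loss left after a control that removes a given fraction of the inherent loss.

    control_effectiveness is a hypothetical summary figure between 0 (the control
    does nothing) and 1 (the control removes the whole loss).
    """
    if not 0.0 <= control_effectiveness <= 1.0:
        raise ValueError("control_effectiveness must lie between 0 and 1")
    return inherent_loss * (1.0 - control_effectiveness)


# Hypothetical figures, for illustration only.
inherent = 10_000_000   # exposure before any control or mitigant
effectiveness = 0.9     # assumed share of the loss removed by the control

working = residual_loss(inherent, effectiveness)  # control functions as designed
failed = residual_loss(inherent, 0.0)             # control fails completely

print(f"inherent exposure:         {inherent:,.0f}")
print(f"residual, control working: {working:,.0f}")
print(f"residual, control failed:  {failed:,.0f}")
```

An analysis that only recorded the residual figure with the control working would miss the fact that a control failure brings the exposure back to the full inherent amount, which is why the inherent figure is captured first.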
Another question arises: should scenarios be analysed independently of one
another, or should we adopt a holistic methodology? Obviously, this depends not only
on the quality and the availability of the information, inputs, experts, timing
and feasibility, but also on the type of scenario you are interested in analysing.
Indeed, if your scenario is for stress testing purposes and a contagion channel has
been identified between various risks, you need to capture this phenomenon,
otherwise the full exposure will not be taken into account and your scenario will
not be representative of the threat. Now, if you are only working on scenarios of limited
scope and you only have a few weeks to do the analysis, you may want to
adopt an alternative strategy. Note that holistic approaches usually require a large
amount of inputs.
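To see why ignoring a contagion channel understates the exposure, the following minimal sketch compares the volatility of an aggregate loss when individual risks are treated as independent and when they move together. It assumes, purely for illustration, that each risk is summarised by a standard deviation and that all pairs share one common correlation `rho`; the function name `aggregate_std` and the figures are hypothetical and not taken from the book.

```python
import math
from itertools import combinations


def aggregate_std(stds, rho):
    """Standard deviation of a sum of losses sharing a common pairwise correlation.

    Uses Var(sum) = sum(s_i^2) + 2 * rho * sum over pairs i<j of s_i * s_j.
    rho is restricted to [0, 1] here to keep the variance non-negative.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1 in this sketch")
    variance = sum(s * s for s in stds)
    variance += 2.0 * rho * sum(a * b for a, b in combinations(stds, 2))
    return math.sqrt(variance)


# Hypothetical loss volatilities for two risks (same monetary unit).
stds = [3.0, 4.0]

independent = aggregate_std(stds, 0.0)  # risks analysed as unrelated
contagion = aggregate_std(stds, 1.0)    # risks fail together

print(f"independent: {independent:.2f}, full contagion: {contagion:.2f}")
```

Under these assumptions the aggregate volatility is larger whenever `rho` is positive, so analysing the scenarios one by one and adding them up as if they were unrelated would not reflect the full exposure.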
One of the key success factors of scenario analysis is the analysis of the underlying
inputs, for instance, the data. These are analysed prior to the scenario analysis;
this is the starting point for evaluating the extreme exposure. No one should ever
underestimate the importance of data in scenario analysis, in terms of both what it brings
and its associated limitations. Indeed, the information used for scenario analysis,
whether obtained internally (losses, customer data, etc.) or externally (macroeconomic
variables, external LGD, etc.), is key to the reliability of the scenario analysis,
but some major challenges may arise that could limit the use of these data and,
worse, may mislead the people owning the scenarios, i.e., those responsible for evaluating the
exposures and dealing with the outcomes. Some of the main issues we would need
to discuss are:
• Data security: This is the issue of individual privacy. While using the data we have
to be careful not to compromise the confidential nature of most data.
• Data integrity: Clearly, data analysis can only be as good as the data it relies upon.
A key implementation challenge is integrating conflicting or redundant data from
different sources. A data validation process should be undertaken. This is the
process of ensuring that a program operates on clean, correct and useful data,
checking the correctness, the meaningfulness and the security of the data used as
input into the system.
• Stationarity analysis: In mathematics and statistics, a stationary process is a
stochastic process whose joint probability distribution does not change when
shifted in time. Consequently, moments such as the mean and the variance, if they exist,
do not change over time and do not follow any trends. In other words, when the data
are stationary we can rely on past data to predict the future (up to a certain extent).
• Technical obsolescence: The requirement we all have to store large quantities of
data drives technological innovation in storage. This results in fast advances in
storage technology. However, the technologies that used to be the best not so
long ago are rapidly discarded by both suppliers and customers. Proper migration
strategies have to be anticipated, otherwise we risk not being able to access the data
anymore.
• Data relevance: How old should the data be? Can we assume a single horizon
of analysis for all the data or, depending on the question we are interested in
answering, should we use different horizons? This question is almost rhetorical,
as obviously we need to use the data that are appropriate and consistent with what
we are interested in analysing. It also means that the quantity of data and
their reliability depend on the possibility of using outdated data.
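The stationarity point in the list above can be made concrete with a crude check that compares the mean and the variance across consecutive windows of a series. This is only a heuristic sketch on the first two moments, not a formal statistical test; the function names, the number of windows and the two thresholds are arbitrary assumptions chosen for illustration, and the data are simulated.

```python
import random
import statistics


def window_moments(series, n_windows=4):
    """Return the (mean, variance) of each of n_windows consecutive windows."""
    size = len(series) // n_windows
    if size < 2:
        raise ValueError("series is too short for the requested number of windows")
    windows = [series[i * size:(i + 1) * size] for i in range(n_windows)]
    return [(statistics.fmean(w), statistics.pvariance(w)) for w in windows]


def looks_stationary(series, n_windows=4, max_mean_drift=0.5, max_var_ratio=2.0):
    """Heuristic: window means and variances must stay close to each other.

    The thresholds are illustrative assumptions, not calibrated values.
    """
    moments = window_moments(series, n_windows)
    means = [m for m, _ in moments]
    variances = [v for _, v in moments]
    overall_std = statistics.pstdev(series)
    if overall_std == 0.0 or min(variances) == 0.0:
        # Degenerate case: only a fully constant series is accepted.
        return overall_std == 0.0
    mean_drift = (max(means) - min(means)) / overall_std
    var_ratio = max(variances) / min(variances)
    return mean_drift <= max_mean_drift and var_ratio <= max_var_ratio


# Simulated data: pure noise, and the same noise with a linear trend added.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
trended = [0.005 * t + x for t, x in enumerate(noise)]

print("noise looks stationary:  ", looks_stationary(noise))
print("trended looks stationary:", looks_stationary(trended))
```

A series that fails such a check has moments that drift over time, so older observations are a weaker guide to the future, which ties back to the data relevance question of how far back the history should go.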
in the genetic code, as in the savannah, the bank that is going to survive the longest is
not the biggest or the strongest, but the one most likely to adapt, and scenario
analysis allows adaptation through understanding of the environment.
Darwin’s theory of evolution describes a slow, gradual process. Darwin wrote, “Natural
selection acts only by taking advantage of slight successive variations; she can never
take a great and sudden leap, but must advance by short and sure, though slow
steps” formed by numerous, successive, slight modifications. Transposing
evolution to a financial institution tells us that scenarios may evolve slowly,
but they will evolve as long as practices do. To be plausible, a scenario should capture
the largest number of impacts and interactions. As for Darwin’s theoretical starting
point for evolution, the starting point of a scenario analysis process is always quite
crude, but by digging deeper every time and learning from experience, this
heuristic process leads to better ways of assessing the risk, better outcomes,
better controls, etc.
Indeed, we usually observe that the scenario analysis process in a financial
institution matures in parallel with the framework. The first time the process is
undertaken, it is never based on the most advanced strategy or the latest
methodologies, and it does not necessarily provide the most precise results. But this
phase is really important and necessary as it is the ignition phase, i.e., the one
that triggers a cultural change in terms of risk management procedure. The process
will constantly evolve towards the most appropriate strategy for the target financial
institution as the stakeholders come to own the process.
Scenario analysis is not a box-ticking process.
It is widely agreed that failures of culture (Ashby et al., 2013), which permitted
excessive and uncontrolled risk-taking and a loss of focus on the end customer, were
at the heart of the financial crisis. The cultural dimensions of risk-taking and
control in financial organisations have been widely discussed, with the argument that, for
all the many formal frameworks and technical modelling expertise of modern
financial risk management, risk-taking behaviour and questionable ethics were
misunderstood by individuals, companies and regulators. The growing interest in
financial institution risk culture since 2008 has been symptomatic of a desire to
reconnect risk-taking, related management and appropriate return. The risk-return
pair, which had somehow been forgotten, came back not as a pair but as a single
polymorphic organism in which risk and return are indivisible elements.
When risk culture change programs were led by risk functions, the reshaping
of the organisational risk management was at the centre of these programs. Risk
culture is a way of framing and perceiving risk issues in an organisation. In addition,
risk culture is itself a composite of a number of interrelated factors involving many
trade-offs. Risk culture is not static but dynamic, a continuous process which repeats
and renews itself constantly. The risk culture is permanently subject to shocks that
lead to permanent questioning. The informal aspect is probably the most important,
i.e., small behaviours and habits which in the aggregate constitute the state of
risk culture. Note that risk culture can be taken in a more general sense, as risk
culture is what makes us fasten our seat-belts in our cars. Risk culture is usually
transorganisational, and different risk cultures may be found within organisations or
across the financial industry.
The most fundamental issue at stake in the risk culture debate is an organisation's
self-awareness of its balance between risk-taking and control. It is clear that
many organisational actors prior to the financial crisis were either unaware of,
or indifferent to, the risk profile of the organisation as a whole as long as the
return generated was appropriate or sufficient according to their own standard.
Indeed, inefficient control functions, combined with revenue-generating functions considered
more important, created an unbalanced relationship leading to the disaster we know.
The risk appetite framework now helps articulate these relationships with more
clarity.
The risk culture discussion shows the desire to make risk and risk management a
more prominent feature of organisational decision-making and governance, with
the embedded idea of moving towards a more involved risk framework, i.e., a
framework in which the risk department is engaged before rather than after a
business decision is made. The usual structure of the risk management framework
currently relies on
• a three Lines of Defence backbone,
• risk oversight units and capabilities, and
• increased attention to risk information consolidation and aggregation.
Risk representatives engage directly with the businesses, acting as trusted advisors;
they usually propose risk training programs and general awareness-raising activities.
Naturally this is only possible if the risk function is credible. This approach
involves acting on the capabilities of the risk function and developing greater
business fluency and credibility. Combining the independence of the second line
of defence with the construction of partnerships might be perceived as inconsistent,
though one may argue that effective supervision requires proper explanations and
clear statements of the expectations to the supervisee. Consequently, both sides need to
have good relationships and regular interactions (structured or ad hoc).
According to Ashby et al. (2013), two kinds of attitude towards interactions have been
observed: enthusiastic and realistic. Enthusiasts develop tools
on their own and invest time and resources in building informal internal
networks. Realists tend to think that too much interaction can inhibit
decision-making, and they have more respect for the lines of defence model than
enthusiasts, who continually work across the first and second lines. Limits and related
risk management policies and rules unintentionally become a system in their own
right. The impact of history and of the collective memory of past incidents should not be
underestimated, as it is a constituent part of the culture of the company and may
drive future risk management behaviours.
Regulation has undoubtedly been a big driver of risk culture change programmes.
Though a lot of organisations were frustrated by the weight of the regulatory
demand, they had no choice but to cooperate, and most of them sooner or later
accepted the new regulatory climate and worked with it more actively; however, it
is still unclear whether the extent of the regulatory footprint on the business has been fully
understood.
Behaviour alteration related to cultural change requires repositioning customer
service at the centre of financial institutions’ activities, and good behaviour should
be incentivised for faster changes. Martial artists say that it requires 1000 repetitions
of a single move to make it a reflex, and 10,000 to change it. Therefore it
is critical to adjust behaviours before they become reflexes.
Scenario analysis will impact the risk culture within a financial institution, as it
will change the perception of some risks and will consequently lead to the creation,
amendment or enhancement of controls, which in turn reinforce
the risk culture. As mentioned previously, scenarios will evolve and the risk
culture will evolve simultaneously. We believe that the current three lines of defence
model will slowly fade away as the empowerment of the first line grows.
References
Aepli, P., Summerfield, E., & Ribaux, O. (2010). Decision making in policing: Operations and
management. Lausanne: EPFL Press.
Ashby, S., Palermo, T., & Power, M. (2013). Risk culture in financial organisations - a research
report. London: London School of Economics.
Burbeck, J. (2013). Pearl Harbor - a World War II summary. http://www.wtj.com/articles/pearl_harbor/.
Darwin, C. (1859). On the origin of species by means of natural selection, or the preservation of
favoured races in the struggle for life (1st ed.). London: John Murray.
Gregory Stone, A., & Redmer, T. A. O. (2006). The case study approach to scenario planning.
Journal of Practical Consulting, 1(1), 7–18.
International Institute for Environment and Development (IIED). (2009). In Profiles of tools and
tactics for environmental mainstreaming. Scenario planning, No. 9.
Lewis, J. (2008). Changing direction: British military planning for post-war strategic defence (2nd
ed.). London: Routledge.
Rebonato, R. (2010). Coherent stress testing: A Bayesian approach to the analysis of financial
stress. London: Wiley.
Rippel, M., & Teply, P. (2011). Operational risk - scenario analysis. Prague Economic Papers, 1,
23–39.
Taleb, N. (2010). The black swan: The impact of the highly improbable (2nd ed.). New York: Random
House and Penguin.
Chapter 2
Environment
Banks’ risk strategy drives the management framework as it sets the tone for
risk appetite, policies, controls and “business as usual” risk management processes.
Policies should be efficiently and effectively cascaded at all levels as well as across
the entity to ensure homogeneous risk management.
Risk governance is the process by which the Board of Directors sets
objectives and oversees the framework and its execution by management. A successful
risk strategy means that risk is embedded at every level of a financial
institution. Governance sets the precedence for strategy, structure and execution. An
ideal risk management process ensures that organisational behaviour is consistent
with its risk appetite or tolerance, i.e., the risk an institution is willing to take to
generate a particular return. In other words, the risk appetite has two components:
risk and return. Through the risk appetite process, we see that risk management
clearly informs business decisions.
In financial institutions, it is necessary to evaluate the effectiveness of risk management
regularly to ensure its quality in the long term, and to test stressed situations
to ensure its reliability when extreme incidents materialise. Here, we realise that
scenario analysis is inherent to risk management, as we are talking about situations
which have never materialised.
The appropriate risk management execution requires risk measurement tools
relying on the information obtained through risk control self-assessments, data
collection, etc., to better replicate the company risk profile. Indeed, appropriate risk
mitigation and internal control procedures are established in the first line such that
the risk is mitigated. “Key Risk Indicators” are established to ensure timely warning
is received prior to the occurrence of an event (COSO, 2004).
2.2 The Risk Taxonomy: A Base for Story Lines
In this section we present the main risks to which scenario analysis is usually or can
be applied in financial institutions. This list is non-exhaustive but gives a good idea
of the task to be accomplished.
Credit risk is defined as the risk of default on a debt
that may arise from a borrower failing to make contractual payments, such as the
principal and/or the interest. The loss may be total or partial. Credit risk can itself
be split as follows:
• Credit default risk is the risk of loss arising from a debtor being unable to pay its
debt, for example when the debtor is more than 90 days past due on any material
credit obligation. A potential story line would be an increase in the probability of
default of a signature due to a decrease in the profit generated.
• Concentration risk is the risk associated with a single type of counterparty
(signature or industry) having the potential to produce losses large enough to
lead to the failure of the financial institution. An example of a story line would be
a breach in concentration appetite due to a position taken by the target entity for
the sake of another entity of the same group.
• Country risk is the risk of loss arising from a sovereign state freezing foreign
currency payments or defaulting on its obligations. The relationship between
this risk, macroeconomics and countries’ stability is non-negligible. Political risk
analysis lies at the intersection between politics and business, and it deals with
the probability that political decisions, events or conditions significantly affect
the profitability of a business actor or the expected value of a given economic
action. An acceptable story line would be that the bank has invested in a country in
which the government has changed and the new government has nationalised some of the companies.
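To make the credit story lines above more concrete, the following minimal sketch quantifies an increase in the probability of default. It assumes the usual expected-loss decomposition EL = PD × LGD × EAD, which is not stated in the text above; the class, the function names and all the figures are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    name: str    # hypothetical counterparty ("signature") label
    pd: float    # probability of default over the horizon, between 0 and 1
    lgd: float   # loss given default, between 0 and 1
    ead: float   # exposure at default, in currency units


def expected_loss(e: Exposure) -> float:
    """Assumed decomposition EL = PD * LGD * EAD (not taken from the text)."""
    return e.pd * e.lgd * e.ead


def is_past_due_default(days_past_due: int, threshold: int = 90) -> bool:
    """Flag a default when the debtor is more than `threshold` days past due."""
    return days_past_due > threshold


def apply_pd_story_line(e: Exposure, pd_multiplier: float) -> Exposure:
    """Story line: profits fall, so the PD is scaled up (capped at 1)."""
    return Exposure(e.name, min(1.0, e.pd * pd_multiplier), e.lgd, e.ead)


# Illustrative numbers only.
base = Exposure("signature_A", pd=0.02, lgd=0.45, ead=1_000_000.0)
stressed = apply_pd_story_line(base, pd_multiplier=3.0)
print(expected_loss(base), expected_loss(stressed))
```

The multiplier applied to the PD is where the expert judgement of the story line enters; everything else is mechanical.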
Market risk is the risk of a loss in positions arising from movements in market
prices. It can be split into:
• Equity risk: the risk associated with changes in stock or stock index prices.
• Interest rate risk: the risk associated with changes in interest rates.
• Currency risk: the risk associated with changes in foreign exchange rates.
• Commodity risk: the risk associated with changes in commodity prices.
• Margining risk: the risk resulting from uncertain future cash outflows due to margin calls
covering adverse value changes of a given position.
A potential story line would be a simultaneous drop in all the indexes, rates and the currency
of a country due to a sudden decrease in GDP.
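A story line of this kind can be translated into numbers by shocking several risk factors at once. The sketch below is only an illustration under strong assumptions: each position is taken to move one-for-one with a single risk factor (a first-order approximation), interest rate positions are left out because they would need a duration-based treatment, and the positions and shock sizes are invented.

```python
# Hypothetical positions: market value and the risk factor each one is exposed to.
positions = [
    {"name": "equity_index_fund", "factor": "equity", "value": 5_000_000.0},
    {"name": "fx_holding", "factor": "fx", "value": 2_000_000.0},
    {"name": "commodity_future", "factor": "commodity", "value": 1_000_000.0},
]

# Story line: relative move of each factor after a sudden fall in GDP (illustrative).
scenario = {"equity": -0.30, "fx": -0.15, "commodity": -0.10}


def scenario_pnl(positions, scenario):
    """First-order P&L: each position moves one-for-one with its risk factor."""
    return {p["name"]: p["value"] * scenario.get(p["factor"], 0.0) for p in positions}


pnl = scenario_pnl(positions, scenario)
total_loss = -sum(pnl.values())
print(pnl, total_loss)
```

Factors absent from the scenario dictionary are treated as unchanged, which makes it easy to compare a single-factor shock with the simultaneous drop described above.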
Liquidity risk is the risk that, over a given period of time, a particular financial
asset cannot be traded quickly enough without impacting the market price. A
story line could be that a portfolio of structured notes that was performing correctly
suddenly crashes as the index on which the notes have been built drops, but
the structured notes have no market and therefore the products can only be sold at a
huge loss. It might make more sense to analyse liquidity risk at the micro level
(portfolio level). Regarding the risk of illiquidity at the macro level,
a bank transforms money with a short duration, such as savings, into money
with a longer one through lending, i.e., it operates a maturity transformation.
As a result, banks have an unfavourable liquidity position, as they do not
have access to the money they lent while the money they owe to customers can be
withdrawn at any time on demand. Through “asset and liability management”, banks
manage this mismatch; however, and we cannot emphasise this point enough,
this implies that banks are structurally illiquid (Guégan and Hassani, 2015).
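The maturity mismatch described above can be pictured with a cumulative liquidity gap. The following sketch is an assumption-laden illustration rather than an asset and liability management method: the maturity buckets, the balance sheet figures and the `survives_run` helper are all hypothetical.

```python
# Hypothetical balance sheet grouped by contractual maturity bucket (currency units).
buckets = ["on_demand", "1y", "5y", "10y"]
assets = {"on_demand": 10.0, "1y": 20.0, "5y": 40.0, "10y": 30.0}      # cash and loans
liabilities = {"on_demand": 60.0, "1y": 25.0, "5y": 10.0, "10y": 5.0}  # deposits and debt


def cumulative_gap(assets, liabilities, buckets):
    """Cumulative (assets - liabilities) maturing up to and including each bucket."""
    gaps, running = {}, 0.0
    for b in buckets:
        running += assets[b] - liabilities[b]
        gaps[b] = running
    return gaps


def survives_run(withdrawal_rate: float) -> bool:
    """Story line: a share of on-demand deposits is withdrawn at once."""
    return assets["on_demand"] >= withdrawal_rate * liabilities["on_demand"]


gaps = cumulative_gap(assets, liabilities, buckets)
print(gaps, survives_run(0.1), survives_run(0.5))
```

A negative gap in the short buckets is the numerical counterpart of the structural illiquidity mentioned in the text: the balance sheet balances overall, but not at short horizons.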
Operational risk is defined as the risk of loss resulting from inadequate or failed
internal processes, people and systems or from external events. This definition
includes legal risk, but excludes strategic and reputational risk (BCBS, 2004). It
also includes other classes of risk, such as fraud, security, privacy protection, cyber
risks, physical and environmental risks and, currently one of the most dramatic, conduct
risk. Contrary to other risks such as those related to credit or market, operational
risks are usually not willingly incurred, nor are they revenue driven (i.e. they do
not result from a voluntary position); they are not necessarily diversifiable, but
they are manageable. An example of a story line would be the occurrence of a rogue
trading incident on the “delta one” desk, on which a trader took an illegal position. Note that
for some banks this might not be a scenario as it has already happened, but for others it might be
an interesting case to test their resilience.
Financial institutions’ misconduct, or the perception of misconduct, leads to conduct
risk. Indeed, the term “conduct risk” gathers various processes and
behaviours which fall into operational risk Basel category 4 (Clients, Products
and Business Practices), but it goes beyond this as it generally implies a non-negligible
reputational risk. Conduct risk can lead to huge losses, usually resulting from
compensation, fines or remediation costs, and the reputational impact (see below)
might be non-negligible. Contrary to other operational risks, conduct risk is connected
to the activity of the financial institution, i.e. the way the business is driven.
Legal risk is a component of operational risk. It is the risk of loss which is
primarily caused by a defective transaction, a claim, a change in law, an inadequate
management of non-contractual rights or a failure to meet non-contractual obligations,
among other things (McCormick, 2011). Some may define it as any incident
involving litigation.
Model risk is the risk of loss resulting from using models to make decisions
(Hassani, 2015). Understanding this risk partly as probability and partly as impact
provides insight into the other risks measured. A potential story line would be a
model not properly adjusted following a paradigm shift in the market, leading to an
inappropriate hedge of some positions.
Reputational risk is the risk of loss resulting from damage to a firm’s reputation,
in terms of revenue, operating costs, capital or regulatory costs, or destruction of
shareholder value, resulting from an adverse or potentially criminal event, even if the
company is not found guilty. A good reputational risk scenario would be
a loss of income due to the discovery that the target entity is funding illegal activities
in a banned country. Once again, for some banks this might not be a scenario as the
incident has already materialised, but the lessons learnt might be useful for others.
Systemic risk is defined as the risk of collapse of an entire financial
system, as opposed to the risk associated with the failure of one of its components
without jeopardising the entire system. Financial system instability,
potentially caused or exacerbated by idiosyncratic events or conditions in financial
intermediaries, may lead to the destruction of the system (Piatetsky-Shapiro, 2011).
The materialisation of a systemic risk implies the presence of interdependencies
in the financial system, i.e. the failure of a single entity may trigger a cascading
failure, which could potentially bankrupt or bring down the entire system or market
(Schwarcz, 2008).
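The cascading failure mentioned above can be illustrated with a toy interbank network. This is a deliberately simple sketch and not a model taken from the cited references: a bank is assumed to fail as soon as its losses on loans to failed banks exceed its capital, and the bank names, capital levels and exposures are invented.

```python
def cascade(capital, exposures, initial_failure, loss_rate=1.0):
    """Return the set of failed banks after an initial failure.

    capital:   {bank: loss-absorbing capital}
    exposures: {lender: {borrower: amount lent}}
    loss_rate: share of an exposure lost when the borrower fails (assumption)
    """
    failed = {initial_failure}
    changed = True
    while changed:  # repeat until no additional bank fails
        changed = False
        for bank, cap in capital.items():
            if bank in failed:
                continue
            loss = sum(
                amount * loss_rate
                for borrower, amount in exposures.get(bank, {}).items()
                if borrower in failed
            )
            if loss > cap:
                failed.add(bank)
                changed = True
    return failed


# Illustrative system: B lent to A, C lent to B, D lent to A and C.
capital = {"A": 10.0, "B": 5.0, "C": 8.0, "D": 20.0}
exposures = {"B": {"A": 6.0}, "C": {"B": 9.0}, "D": {"A": 3.0, "C": 4.0}}
print(cascade(capital, exposures, "A"))
```

In this example the failure of A brings down B and then C, while D absorbs its losses; the same system is unaffected by a failure of D, to which nobody has lent.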
In fact this aspect is too often left aside when it should be at the centre of the
topic. The combined effect due to contagion can lead to larger losses than the sum of the
impacts of each component taken separately. Consequently, capturing the contagion
effect between the risks may be a first way of tackling systemic risks.
Originally, financial contagion referred to the spread of market disturbances from
one country to another. Financial contagion is a natural risk for countries whose
financial systems are integrated in international financial markets, as what
occurs in one country mechanically impacts the others in one way or another. The
impact is usually proportional to the incident; in other words, the larger the issue,
the larger the impact on the other countries belonging to the same system, unless
some mitigants are in place to at least confine the smaller exposures. The contagion
phenomenon is usually one of the main reasons why a crisis is not
contained and may pass across borders and affect an entire region of the globe.
Financial contagion may occur at any level of a particular economy and may be
triggered by various things. Note that lately, banks have been at the centre of a
dramatic contagion process (subprime crisis), but inappropriate political decisions
may lead to even larger issues. At the domestic level, usually the failure of a
domestic bank or financial intermediary triggers a transmission when it defaults on
interbank liabilities and sells assets in a fire sale, thereby undermining confidence
in similar banks. International financial contagion, which happens in both advanced
and developing economies, is the transmission of a financial crisis across financial
markets to directly and indirectly connected economies. However, in today’s
financial system, due to both cross-regional and cross-border operations of banks,
financial contagion usually happens simultaneously at the domestic level and across
borders.
Financial contagion usually generates financial volatility and may damage the
economy of countries. The mechanisms of financial contagion are usually classified
into two branches: spillover effects, and financial
crises caused by the behaviour of four types of agent, namely
governments, financial institutions, investors and borrowers (Dornbusch et al., 2000).
The first branch, spillover effects, can be seen as a negative externality. Spillover
effects are also known as fundamental-based contagion. These effects can occur
globally, i.e., affecting several countries simultaneously, or regionally, only impacting
adjacent countries. As a general rule, the larger the country, the more global the effect;
conversely, smaller countries tend to trigger regional effects.
Though some debates arose regarding the difference between co-movements and
contagion, here we will state that if what happens in a particular location directly or
indirectly impacts the situation in another geographical region with a time lag (which
might be extremely short), then we should refer to it as contagion.
At the micro level, from a risk management perspective, contagion should be
considered when the materialisation of a first risk (say operational risk) triggers the
materialisation of a subsequent risk (for instance, market or credit). This is typically
what happened in the Société Générale rogue trading issue, as briefly discussed in the
previous chapter.
From a macroeconomic point of view, contagion effects have repercussions on
an international scale transmitted through channels such as trade links, competitive
devaluations and financial links. “A financial crisis in one country can lead to direct
financial effects, including reductions in trade credits, foreign direct investment, and
other capital flows abroad”. Financial links come from globalisation since countries
try to be more economically integrated with global financial markets. Many authors
have analysed financial contagions. Allen and Gale (2000) and Lagunoff and
Schreft (2001) analyse financial contagion as a result of linkages among financial
intermediaries.
Trade links are another transmission channel, with similarities to common shocks
and financial links. Shocks of this type arise from trade integration and
cause local impacts. Kaminsky and Reinhart (2000) document evidence that
trade links in goods and services and exposure to a common creditor can explain
earlier crisis clusters, not only the debt crises of the early 1980s and 1990s, but also
the observed historical pattern of contagion.
Irrational phenomena might also cause financial contagion. Co-movements
are considered irrational when there is no global shock to trigger them and no
interdependence to channel them. The cause is related to the behaviour of one of the four agents
presented earlier. The causes of contagion are increased risk aversion, lack of confidence
and financial fears. The transmission channel can be typical correlations or
liquidation processes (i.e. selling in one country to fund a position in another) (King
and Wadhwani, 1990; Calvo, 2004).
Remark 2.3.1 Investors’ behaviour seems to be one of the biggest issues that can
impact a country’s financial system.
So to summarise, a contagion may be caused by:
1. Irrational co-movements related to crowd psychology (Shiller, 1984; Kirman,
1993)
2. Rational but excessive co-movements
3. Liquidity problems
4. Information asymmetry and coordination problems
5. Shift of equilibrium
6. Change in the international financial system, or in the rules of the game
7. Geographic factors or neighbourhood effect (De Gregorio and Valdes, 2001)
8. The development of sophisticated financial products, such as credit default
swaps and collateralised debt obligations, which spread the exposure across the
world (sub-prime crisis).
Capturing interactions and contagion effects leads to analysing financial crises.
The term financial crisis refers to a variety of situations resulting in a loss of
paper wealth, which may ultimately affect the real economy. An interesting way
of representing financial contagion is to extend the models used to represent
epidemics, as illustrated by Figs. 2.1 and 2.2.
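To give a flavour of what borrowing from epidemic models can look like, the sketch below runs a discrete-time SIR (susceptible, infected, recovered) recursion. The choice of the SIR model is our assumption, as the text does not specify which epidemic model is extended; the mapping of compartments to sound, distressed and resolved entities, the function name and the parameter values are all illustrative.

```python
def sir_contagion(s0, i0, r0, beta, gamma, steps):
    """Discrete-time SIR recursion.

    s, i, r: shares of sound, distressed and resolved entities (they sum to 1).
    beta:    transmission rate per step (assumption).
    gamma:   resolution rate per step (assumption).
    """
    s, i, r = s0, i0, r0
    path = [(s, i, r)]
    for _ in range(steps):
        new_distress = beta * s * i   # sound entities hit through contact with distressed ones
        resolved = gamma * i          # distressed entities that are resolved or recapitalised
        s, i, r = s - new_distress, i + new_distress - resolved, r + resolved
        path.append((s, i, r))
    return path


# Illustrative parameters: 1% of entities are initially distressed.
path = sir_contagion(s0=0.99, i0=0.01, r0=0.0, beta=0.5, gamma=0.1, steps=100)
peak_distress = max(i for _, i, _ in path)
print(round(peak_distress, 3))
```

The ratio beta / gamma plays the role of a reproduction number: when it is above one, distress initially spreads faster than it is resolved, which is the regime in which a local incident turns into a crisis.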
[Figure 2.1: network plot; legend entries: USA, UK, France, China, Brazil]
Fig. 2.1 In order to graphically represent a financial contagion, I took inspiration from a model
created to represent the way epidemics move from one geographic region to another
(Oganisian, 2015)
[Figure 2.2: network plot with nodes labelled by country; legend entries: Impact, Catalyst, Trigger]
Fig. 2.2 This figure is similar to Fig. 2.1, though here the representation is more granular
and sorts the countries involved into three categories: Trigger (origin), Catalyst (enabler or
transmission channel) and Impact (countries impacted)
2.4 The Regulatory Framework
One approach is to view the business from a portfolio perspective, with capital
management, liquidity management and financial performance integrated into the
process. Comprehensive stress testing and scenario analysis must take into account
all risk factors, including credit, market, liquidity, operational, funding, interest,
foreign exchange and trading risks. To these must be added operational risks due to
inadequate systems and controls, insurance risk (including catastrophes), business
risk factors (including interest rate, securitisation and residual risks), concentration
risk, high-impact low-probability events, cyclicality and capital planning.
In the following paragraphs, we extract quotes from multiple regulatory documents
or international associations discussing scenario analysis requests to emphasise
how important the process is considered. We analysed documents from multiple
countries and multiple industries. These documents are also used to give some
perspectives and illustrate the relationships between scenario analysis, stress testing
and risk management.
In IAA (2013), the International Actuarial Association points out the differences
between scenario analysis and stress testing: “A scenario is a possible future
environment, either at a point in time or over a period of time. A projection of
the effects of a scenario over the time period studied can either address a particular
firm or an entire industry or national economy. To determine the relevant aspects of
this situation to consider, one or more events or changes in circumstances may be
forecast, possibly through identification or simulation of several risk factors, often
over multiple time periods. The effect of these events or changes in circumstances
in a scenario can be generated from a shock to the system resulting from a sudden
change in a single variable or risk factor. Scenarios can also be complex, involving
changes to and interactions among many factors over time, perhaps generated by a
set of cascading events. It can be helpful in scenario analysis to provide a narrative
(story) behind the scenario, including the risks (events) that generated the scenario.
Because the future is uncertain, there are many possible scenarios. In addition
there may be a range of financial effects on a firm arising from each scenario. The
projection of the financial effects during a selected scenario will likely differ from
those seen using the modeler’s best expectation of the way the current state of the
world is most likely to evolve. Nevertheless, an analysis of alternative scenarios can
provide useful information to involved stakeholders. While the study of the effect
of likely scenarios is useful for business planning and for the estimation of expected
profits or losses, it is not useful for assessing the impact of rare and/or catastrophic
future events, or even moderately adverse scenarios. A scenario with significant or
unexpected adverse consequences is referred to as a stress scenario.”
“A stress test is a projection of the financial condition of a firm or economy
under a specific set of severely adverse conditions that may be the result of several
risk factors over several time periods with severe consequences that can extend
over months or years. Alternatively, it might be just one risk factor and be short
in duration. The likelihood of the scenario underlying a stress test has been referred
to as extreme but plausible.”
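The distinction drawn in these quotes can be sketched in a few lines: a scenario carries a narrative and a projected path, and a stress test projects the financial condition of the firm under a severely adverse path. The data structure below is a hypothetical illustration and not the IAA's specification; the field names, the capital floor and the figures are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    name: str
    narrative: str               # the story behind the scenario
    pnl_by_period: List[float]   # projected profit (+) or loss (-) per period


def project_capital(start_capital: float, scenario: Scenario) -> List[float]:
    """Capital path obtained by accumulating the projected P&L period by period."""
    path, capital = [start_capital], start_capital
    for pnl in scenario.pnl_by_period:
        capital += pnl
        path.append(capital)
    return path


def breaches_floor(path: List[float], floor: float) -> bool:
    """True if capital falls below a minimum level at any point of the projection."""
    return min(path) < floor


# Illustrative numbers only.
baseline = Scenario("baseline", "business as expected", [5.0, 5.0, 5.0, 5.0])
stress = Scenario("stress", "severe recession lasting three periods", [-30.0, -25.0, -10.0, 2.0])

for sc in (baseline, stress):
    path = project_capital(100.0, sc)
    print(sc.name, path, breaches_floor(path, floor=40.0))
```

Keeping the narrative next to the numbers follows the recommendation quoted above to provide a story behind each scenario.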
In the United Kingdom, a firm must carry out an Internal Capital Adequacy Assessment
Process (ICAAP) in accordance with the Prudential Regulation Authority’s (PRA) rules.
These include requirements on the firm to assess,
on an ongoing basis, the amounts, types and distribution of capital that it considers
adequate to cover the level and nature of the risks to which it is exposed. This
assessment should cover the major sources of risks to the firm’s ability to meet
its liabilities as they fall due, and should incorporate stress testing and scenario
analysis. If a firm is merely attempting to replicate the PRA’s own methodologies, it
will not be carrying out its own assessment in accordance with the ICAAP rules.
The ICAAP should be documented and updated annually by the firm, or more
frequently if changes in the business, strategy, nature or scale of its activities or
operational environment suggest that the current level of financial resources is no
longer adequate.
Specifically PRA (2015) says that firms have “to develop a framework for stress
testing, scenario analysis and capital management that captures the full range of
risks to which they are exposed and enables these risks to be assessed against a
range of plausible yet severe scenarios. The ICAAP document should outline how
stress testing supports capital planning for the firm”.
In the European Union (Single Supervisory Mechanism jurisdiction), the regulatory technical
standards (RTS) (EBA, 2013), and later the final guidelines (EBA, 2014), were prepared taking into
account the FSB Key Attributes of Effective Resolution Regimes for Financial Institutions
and current supervisory practices. The draft RTS covers the key elements and
essential issues that should be addressed by institutions when developing financial
distress scenarios against which the recovery plan will be tested.
Quoting: “Drafting a recovery plan is a duty of institutions or groups undertaken
prior to a crisis in order to assess the potential options that an institution or a
group could itself implement to restore financial strength and viability should
the institution or group come under severe stress. A key assumption is that
recovery plans shall not assume that extraordinary public financial support would
be provided.
The plan is drafted and owned by the financial institution, and assessed by the
relevant competent authority or authorities. The objective of the recovery plan is
not to forecast the factors that could prompt a crisis. Rather it is to identify the
options that might be available to counter; and to assess whether they are sufficiently
robust and if their nature is sufficiently varied to cope with a wide range of shocks
of different natures. The objective of preparing financial distress scenarios is to
define a set of hypothetical and forward-looking events against which the impact
and feasibility of the recovery plan will be tested. Institutions or groups should use
an appropriate number of system wide financial distress scenarios and idiosyncratic
financial distress scenarios to test their recovery planning. More than one of each
scenario is useful, as well as scenarios that combine both systemic and idiosyncratic
events. Financial distress scenarios used for recovery planning shall be designed
such that they would threaten failure of the institution or group, in the case recovery
measures are not implemented in a timely manner by the institution or group”.
scenarios that have been identified. A range of techniques is available for eliciting
these assessments from business managers and subject matter experts, each with
its own strengths and weaknesses. More than 30 years of academic literature is
available in the area of eliciting probability assessments from experts. Much of
this literature is informed by psychologists, economists and decision analysts, who
have done research into the difficulties people face when trying to make probability
assessments. The literature provides insight into the sources of uncertainty and bias
surrounding scenario assessments, and the methods available for their mitigation.”
The purpose of APRA (2007) was “to increase awareness of the techniques that are
available to ensure scenario analysis is conducted in a structured and robust manner.
Banks should be aware of the variety of methods available, and should consider
applying a range of techniques as appropriate”.
Besides, the COAG (Council of Australian Governments) Energy Council in
COAG (2015) requires some specific scenario analysis: “The Council tasked officials
with a scenario analysis exercise and to come back to it with recommendations,
if necessary, about the need for further work. At its July 2015 meeting, the
Council considered these recommendations and tasked officials to further explore
the implications of key issues that emerged from the initial stress-testing exercise.
This piece of work is being considered as part of the Council’s strategic work
program to ensure regulatory frameworks are ready to cope with the effects of
emerging technologies”. This is an example of a scenario analysis requirement for
risk management in an industry other than the financial sector.
In the USA, in the nuclear industry, the US Nuclear Regulatory Commission
(NRC) requested scenario analysis in USNRC (2004) and USNRC (2012). “The
U.S. Nuclear Regulatory Commission (NRC) will use these Regulatory Analysis
Guidelines (“Guidelines”) to evaluate proposed actions that may be needed to protect
public health and safety. These evaluations will aid the staff and the Commission
in determining whether the proposed actions are needed, in providing adequate
justification for the proposed action, and in documenting a clear explanation of why
a particular action was recommended. The Guidelines establish a framework for
(1) identifying the problem and associated objectives, (2) identifying alternatives
for meeting the objectives, (3) analysing the consequences of alternatives, (4)
selecting a preferred alternative, and (5) documenting the analysis in an organised
and understandable format. The resulting document is referred to as a regulatory
analysis”.
Specifically for the financial industry, “the Comprehensive Capital Analysis and
Review (CCAR) (Fed, 2016b) is an annual exercise by the Federal Reserve to assess
whether the largest bank holding companies operating in the United States have
sufficient capital to continue operations throughout times of economic and financial
stress and that they have robust, forward-looking capital-planning processes that
account for their unique risks”.
As part of this exercise, the Federal Reserve evaluates institutions’ capital
adequacy, internal capital adequacy assessment processes and their individual plans
to make capital distributions, such as dividend payments or stock repurchases.
Dodd-Frank Act (Fed, 2016a) stress testing (DFAST)—a complementary exercise
References
Allen, F., & Gale, D. (2000). Financial contagion. Journal of Political Economy, 108(1), 1–33.
APRA. (2007). Applying a structured approach to operational risk scenario analysis in Australia.
Sydney: Australian Prudential Regulation Authority.
BCBS. (2004). International convergence of capital measurement and capital standards. Basel:
Bank for International Settlements.
Calvo, G. A. (2004). Contagion in emerging markets: When wall street is a carrier. In E. Bour, D.
Heymann, & F. Navajas (Eds.), Latin American economic crises: Trade and labour (pp. 81–91).
London, UK: Palgrave Macmillan.
COAG. (2015). Electricity network economic regulation; scenario analysis. In Council of
Australian Governments, Energy Council, Energy Working Group, Network Strategy Working
Group.
COSO. (2004). Enterprise risk management - integrated framework executive summary. In
Committee of Sponsoring Organizations of the Treadway Commission.
De Gregorio, J., & Valdes, R.O. (2001). Crisis transmission: Evidence from the debt, tequila, and
Asian flu crises. World Bank Economic Review, 15(2), 289–314.
Dornbusch, R., Park, Y., & Claessens, S. (2000). Contagion: Understanding how it spreads. The
World Bank Research Observer, 15(2), 177–197.
EBA. (2013). Draft regulatory technical standards specifying the range of scenarios to be used
in recovery plans under the draft directive establishing a framework for the recovery and
resolution of credit institutions and investment firms. London: European Banking Authority.
EBA. (2014). Guidelines on the range of scenarios to be used in recovery plans. London: European
Banking Authority.
FAO. (2012). South Asian forests and forestry to 2020. In Food and Agriculture Organisation of
the United Nations.
Fed. (2016a). 2016 supervisory scenarios for annual stress tests required under the Dodd-Frank
act stress testing rules and the capital plan rule. Washington, DC: Federal Reserve Board.
Fed. (2016b). Comprehensive capital analysis and review 2016 summary instructions. Washington,
DC: Federal Reserve Board.
Guégan, D., & Hassani, B. (2015). Stress testing engineering: The real risk measurement? In A.
Bensoussan, D. Guégan, & C. Tapiero (Eds.), Future perspectives in risk models and finance.
New York: Springer.
Hassani, B. (2015). Model risk - from epistemology to management. Working paper, Université
Paris 1.
IAA. (2013). Stress testing and scenario analysis. In International Actuarial Association.
Kaminsky, G. L., & Reinhart, C. M. (2000). On crises, contagion, and confusion. Journal of
International Economics, 51(1), 145–168.
King, M. A., & Wadhwani, S. (1990). Transmission of volatility between stock markets. Review of
Financial Studies, 3(1), 5–33.
Kirman, A. (1993). Ants, rationality, and recruitment. Quarterly Journal of Economics, 108(1),
137–156.
Lagunoff, R. D., & Schreft, S. L. (2001). A model of financial fragility. Journal of Economic
Theory, 99(1), 220–264.
Markowitz, H. M. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
McCormick, R. (2011). Legal risk in the financial markets (2nd ed.). Oxford: Oxford University
Press.
Oganisian, A. (2015). Modeling ebola contagion using airline networks in R. www.r-bloggers.com.
Piatetsky-Shapiro, G. (2011). Modeling systemic and sovereign risk. In A. Berd (Ed.), Lessons
from the financial crisis (pp. 143–185). London: RISK Books.
PRA. (2015). The internal capital adequacy assessment process (ICAAP) and the supervisory
review and evaluation process (SREP). In Prudential Regulation Authority, Bank of England.
Quagliariello, M. (2009). Stress-testing the banking system - methodologies and applications.
Cambridge: Cambridge University Press.
Schwarcz, S. L. (2008). Systemic risk. Georgetown Law Journal, 97(1), 193–249.
Shiller, R. J. (1984). Stock prices and social dynamics. Brookings Papers on Economic Activity,
1984(2), 457–498.
USNRC. (2004). Regulatory analysis guidelines of the U.S. nuclear regulatory commission. In
NUREG/BR-0058, U.S. Nuclear Regulatory Commission.
USNRC. (2012). Modeling potential reactor accident consequences - state-of-the-art reactor
consequence analyses: Using decades of research and experience to model accident progression,
mitigation, emergency response, and health effects. In U.S. Nuclear Regulatory Commission.
Chapter 3
The Information Set: Feeding the Scenarios
A point needs to be made absolutely clear before any further presentation. None of
the methodologies presented in the following chapters can be used unless they are
fed with appropriate inputs. Therefore, we will start this chapter by characterising and
defining data; then we will discuss pre-processing these inputs to make them ready
for further processing.
Data are a set of qualitative or quantitative pieces of information. Data are
engendered or obtained by both observation and measurement. They are collected,
reported, analysed and visualised. Data as a general concept refers to the fact
that some existing information or knowledge is represented in some form suitable
for better or different processing. Raw data, or unprocessed data, are a collection
of numbers and characters; data processing commonly occurs by stages, and the
processed data from one stage may become the raw data of the next one. Field data
are raw data that are collected in an uncontrolled environment. Experimental data
are data generated within the context of a scientific investigation by observation
and recording; in other words, these are data generated by carrying out an analysis
or implementing a model. It is important to understand, in particular for scenario
analysis, that the data used to support the process are, most of the time, not numeric
values. Indeed, these are usually pieces of information gathered to support a story
line, such as articles, media, incidents experienced by other financial institutions or
expert perceptions.
More specifically, data are any facts, numbers or text that can be
processed. Nowadays, organisations are capturing and gathering growing quantities
of data in various formats. We can split the data into three categories:
• operational or transactional data, such as sales, cost, inventory, payroll and
accounting
• non-operational data, such as industry sales, forecast data and macroeconomic
data
• meta data—data about the data itself, such as logical database design or data
dictionary definitions
Recent regulatory documents, for instance the risk data aggregation principles (BCBS,
2013b), aim at ensuring the quality of the data used for regulatory purposes.
However, one may argue that any piece of data could be used for regulatory
purposes; consequently, this piece of regulation should lead in the long term to
a wider capture of data for risk measurement and therefore to better risk
management.
Indeed, BCBS (2013b) requires that the information banks use in their decision-making
processes captures all risks accurately and in a timely manner. This piece of
regulation sets out principles of effective and efficient risk management by pushing
banks to adopt the right systems and develop the right skills and capabilities instead
of ticking regulatory boxes to be compliant at a certain date.
It is important to understand that this piece of regulation cannot be dealt with in
a silo. It has to be regarded as part of the larger library of regulations. To give
some illustrations, compliance with BCBS (2013b), commonly referred to as BCBS 239,
is required to ensure a successful Comprehensive Capital Analysis and Review (CCAR;
Fed, 2016) in the USA, a Firm Data Submission Framework (FDSF; BoE, 2013) in the UK,
the European Banking Authority stress tests (EBA, 2016) or the Fundamental Review
of the Trading Book (FRTB; BCBS, 2013a). The previous chapter introduced some of
these regulatory processes in more detail. The resources required for these
exercises are quite significant and should not be underestimated. If banks are not
able to demonstrate compliant solutions for data management and data governance
across multiple units such as risk, finance and the businesses, they will have to
change their risk measurement strategies and, as a corollary, their risk framework. In
the short term, these rules may imply larger capital charges for financial institutions,
but in the long term the better risk management processes implied by this regulation
should help reduce capital charges for banks using internal models, or at least the
banks' exposures.
With the level of change implied, BCBS 239 might be considered as the core
of regulatory transformation. However, the task banks face in evolving their operating
model remains significant, and adapting their technology infrastructures will not be
straightforward. Both banks and regulators acknowledge these challenges.
The principles are an enabler to transform the business strategically in order to
survive in the new market environment. Furthermore, combining BCBS 239 specific
requirements and business-as-usual tasks across business units and geographical
locations will not be easy and will require appropriate change management.
In the meantime, a nebula emerged, usually referred to as big data. Big data
is a broad term for data sets so large or complex that traditional data processing
applications are inadequate. Challenges include analysis, capture, cleansing, search,
sharing, storage, transfer, visualisation and information privacy. The term often
refers simply to the use of predictive analytics or certain other advanced methods
to extract valuable information from data, and rarely to a particular size of data
set. Accuracy in big data may lead to more confidence in the decision-making
process and consequently improvement in operational efficiency, reduction of costs
and better risk management.
3.1 Characterising Numeric Data 27
Data analysis is the key to the future of banking; our environment will move from
traditional to rational through a path which might be emotional. Data analysis allows
looking at a particular situation from different angles. Besides, the possibilities are
unlimited as long as the underlying data are of good quality. Indeed, data analysis
may lead to the detection of correlations, trends, etc., and can be used in multiple
areas and industries. Dealing with large data sets is not necessarily easy. Most of the
time it is quite complicated, as many issues arise related to data completeness, size
or the reliability of the IT infrastructure.
In other words, “big data” combines capabilities, users' objectives, tools deployed
and methodologies implemented. The field evolves quickly, as what is considered big
data one year becomes “business as usual” the next (Walker 2015). Depending on
the organisation, the infrastructure to put in place will not be the same, as the needs
are not identical from one entity to another, e.g., parallel computing is not always
necessary. There is no “one-size-fits-all” infrastructure.
3.1.1 Moments
where $X$ is a random variable, $F(X)$ its cumulative distribution and $E$ denotes the
expectation.
When
\[
E\left[|X^n|\right] = \int_{-\infty}^{\infty} |x^n| \, dF(x) = \infty, \tag{3.1.3}
\]
then the moment does not exist (we will see example of such problems in Chap. 5
with the Generalised Pareto and the $\alpha$-stable distributions and with the Generalised
Extreme Value distribution in Chap. 6). If the $n$th moment exists so does the $(n-1)$th
moment as well as all lower-order moments.
Note that the zeroth moment of any probability density function is 1, since
\[
\int_{-\infty}^{\infty} f(x) \, dx = 1. \tag{3.1.4}
\]
[Footnote 1] Moments can be defined in a more general way than only considering real.
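The existence condition above can be illustrated with a minimal sketch. It assumes a Pareto density $f(x) = \alpha x^{-\alpha-1}$ on $[1, \infty)$, chosen here purely as an example; the function names are hypothetical. For this density the $n$th raw moment equals $\alpha/(\alpha - n)$ when $n < \alpha$ and diverges otherwise, so lower-order moments may exist while higher-order ones do not.

```python
import math


def pareto_raw_moment(alpha: float, n: int) -> float:
    """nth raw moment of the density alpha * x**(-alpha - 1) on [1, inf).

    The integral of x**n * f(x) converges only when n < alpha; otherwise the
    moment does not exist and infinity is returned.
    """
    if alpha <= 0 or n < 0:
        raise ValueError("alpha must be positive and n non-negative")
    if n >= alpha:
        return math.inf
    return alpha / (alpha - n)


def sample_raw_moment(values, n: int) -> float:
    """Empirical counterpart: the average of x**n over the observations."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(x ** n for x in values) / len(values)


# Illustrative tail index (an assumption, not a value from the text).
zeroth = pareto_raw_moment(2.5, 0)  # the zeroth moment of a density
mean = pareto_raw_moment(2.5, 1)    # exists because 1 < 2.5
third = pareto_raw_moment(2.5, 3)   # does not exist because 3 >= 2.5
```

Note that a sample moment computed on a finite data set is always finite, even when the theoretical moment does not exist; this is one reason why the empirical value alone can be misleading for heavy-tailed risks.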
3.1.2 Quantiles
Quantiles divide a set of observations into groups of equal sizes. There is one
quantile fewer than the number of groups created; for example, quartiles have only
three points that allow dividing a dataset into four groups of equal size, each
containing 25 % of the data. If there are ten different buckets, each of them
representing 10 %, we will talk about deciles.
More generally quantiles are values that split a finite set of values into $q$ subsets
of equal sizes. There are $q - 1$ of the $q$-quantiles, one for each integer $k$ satisfying
$0 < k < q$. In some cases the value of a quantile may not be uniquely determined,
for example, for the median of a uniform probability distribution on a set of even
size. Quantiles can also be applied to continuous distributions, providing a way to
generalise rank statistics to continuous variables. When the cumulative distribution
function of a random variable is known, the $q$-quantiles are the application of the
quantile function (the inverse function of the cumulative distribution function) to
the values $\left\{ \frac{1}{q}, \frac{2}{q}, \ldots, \frac{q-1}{q} \right\}$.
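As a minimal sketch of the definition above, the $q - 1$ cut points can be obtained by evaluating a quantile function at $k/q$ for $k = 1, \ldots, q - 1$. The exponential distribution is used here only because its inverse cumulative distribution function has a simple closed form; the function names are hypothetical.

```python
import math


def q_quantiles(inverse_cdf, q: int):
    """Return the q - 1 cut points F^{-1}(k / q) for k = 1, ..., q - 1."""
    if q < 2:
        raise ValueError("q must be at least 2")
    return [inverse_cdf(k / q) for k in range(1, q)]


def exponential_inverse_cdf(p: float, rate: float = 1.0) -> float:
    """Quantile function of an exponential distribution with the given rate."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p) / rate


quartiles = q_quantiles(exponential_inverse_cdf, 4)  # three cut points
deciles = q_quantiles(exponential_inverse_cdf, 10)   # nine cut points
```

The second quartile is the median, which for a unit-rate exponential distribution equals $\ln 2$.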
Understanding the quantiles of a distribution is particularly important as it is
a way to represent how the data are positioned. The larger the
quantile at a given probability level, the larger the risk. Quantiles are the theoretical
foundation of the Value-at-Risk and the Expected Shortfall, which will be developed
in the next chapter. Quantiles are in fact risk measures and are therefore very useful
for evaluating exposures to a specific risk as soon as we have enough information
to ensure the robustness of these quantiles; if we have few data, then the
occurrence of a single event will materially impact the quantiles. Note that this situation
might be acceptable for tail events, but this is generally not the case for risks more
representative of the body of the distribution.
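The robustness remark can be made concrete with a small sketch. It assumes one simple convention for the empirical quantile (the smallest observation such that at least a share $p$ of the data lies at or below it); other conventions exist, and all figures below are invented for illustration.

```python
import math


def empirical_quantile(values, p: float) -> float:
    """Smallest observation x such that at least a share p of the data is <= x."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    index = math.ceil(p * len(ordered)) - 1
    return ordered[index]


extreme_loss = 1_000_000.0  # hypothetical severe event

small = [float(x) for x in range(1, 11)]    # 10 observations
large = [float(x) for x in range(1, 1001)]  # 1000 observations

# Change in the 95 % quantile caused by adding the single extreme event.
small_shift = (empirical_quantile(small + [extreme_loss], 0.95)
               - empirical_quantile(small, 0.95))
large_shift = (empirical_quantile(large + [extreme_loss], 0.95)
               - empirical_quantile(large, 0.95))
```

With ten observations the 95 % quantile jumps to the extreme loss itself, whereas with a thousand observations it barely moves.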
3.1.3 Dependencies
The previous paragraphs built the path to introduce data sciences. Most methodologies
presented in the next chapters either rely on or are introduced in this
section. Data science is a generic term gathering data mining, machine learning,
artificial intelligence, statistics, etc., under a single banner.
Data mining (Hastie et al., 2009) is a field belonging to computer science. The
purpose of data mining is to extract information from data sets and transform it
into an understandable structure with respect to the ultimate use of these data.
The embedded computational process of discovering patterns in large data sets
combines methods from artificial intelligence (Russell and Norvig, 2009), machine
learning (Mohri et al., 2012), statistics, and database systems and management. The
automatic or semi-automatic analysis of large quantities of data makes it possible to
detect interesting patterns such as clusters (Everitt et al., 2011), anomalies and
dependencies. The outcome of the analysis can then be perceived as the essence or the
quintessence of the original input data, and may be used for further analysis in
machine learning, predictive analytics or more traditional modelling.
Usually, the term data mining refers to the process of analysing raw data and
summarising them into information used for further modelling. In data mining the
data are analysed from many different dimensions. More precisely, data mining
aims at finding correlations or dependence patterns between multiple fields in
large relational databases. The patterns, associations or relationships among all these
data can provide information usable to prepare and support the scenario analysis
program of a financial institution. While the methodologies, the statistics and the
mathematics behind them are not new, data mining was not reaching the goals set until
the very recent innovations in computer processing, disk storage and statistical
software.
Advances in data capture, processing power, data transmission and storage
capabilities are enabling organisations to integrate their various databases into
data warehouses or data lakes. Data warehousing is a process of centralised data
management and retrieval. Data warehousing, like data mining, is a relatively new
term although the concept itself has been around for years. Data warehousing
represents an ideal vision of maintaining a central repository of all organisational
data. Centralisation of data is needed to maximise user access and analysis. Data
lakes in some sense generalise the concept and allow structured and unstructured
data as well as any piece of information (PDF documents, emails, etc.) that are not
necessarily instantly usable for pre-processing.
Until now, data mining was mainly used by companies with a strong consumer
focus, in other words, retail, financial, communication and marketing organisations
(Palace, 1996). These types of companies were using data mining to analyse
relationships between endogenous and exogenous factors such as price, product
positioning, economic indicators, competition or customer demographics, as well
as their impacts on sales, reputation, corporate profits, etc. Besides, it permitted
summarising the information analysed. It is interesting to note that nowadays retailers
and suppliers have joined forces to analyse even more relationships at a deeper level.
The National Basketball Association developed a data mining application to support
more efficient coaching. Billy Beane from the Oakland Athletics used data mining
and statistics to select the players forming his team.
Data mining enables analysing relationships and patterns in stored data based on
open-ended user queries. Generally, any of four types of relationships are sought:
• Classes: This is the simplest kind of relationship, as stored data is used to analyse
subgroups.
• Clusters: Data items are gathered according to logical relationships related to
their intrinsic characteristics. More generally, a cluster analysis aims at grouping
a set of similar objects (in some sense) in one particular group (Everitt et al.,
2011).
• Associations: Data can be analysed to identify associations. Association rule
learning is intended to identify strong rules discovered in databases measuring
how interesting they are for our final purpose (Piatetsky-Shapiro, 1991).
• Sequential patterns: Data are analysed to forecast and anticipate behaviours,
trends or schemes, such as the likelihood of a purchase given what someone
already has in their Amazon basket.
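To make the notion of an association concrete, the sketch below computes two common interestingness measures for a rule "antecedent implies consequent": its support and its confidence. The choice of these two measures, the function names and the baskets are assumptions made for illustration, not a description of any particular method in this book.

```python
def support(transactions, items) -> float:
    """Share of transactions containing every item in `items`."""
    items = set(items)
    if not transactions:
        raise ValueError("transactions must not be empty")
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0.0:
        raise ValueError("antecedent never occurs in the transactions")
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"printer", "ink"},
    {"printer", "ink", "paper"},
    {"printer", "paper"},
    {"ink"},
]

rule_support = support(baskets, {"printer", "ink"})          # 2 of the 4 baskets
rule_confidence = confidence(baskets, {"printer"}, {"ink"})  # 2 of the 3 printer baskets
```

A rule would typically be retained only if both measures exceed thresholds chosen by the analyst.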
Data mining consists of several major steps. We would recommend following these
steps to make sure that the data used to support the scenario analysis (if some data
are used) are appropriate and representative of the risk to be assessed.
• Data capture: In a first step data are collected from various sources and gathered
in a database.
Once these data have been analysed and formatted, these can be further used for
prediction, forecasting and evaluation, in other words, for modelling.
3.2 Data Sciences 33
Machine learning deals with the study of pattern recognition and computational
learning theory in artificial intelligence. Machine learning aims at building
algorithms that can learn from data and make predictions from them, i.e., algorithms
which operate dynamically, adapting themselves to changes in the data, relying not
only on statistics but also on mathematical optimisation. Automation is the keyword
of this paragraph: the objective is to make machines think, possibly by mimicking the
way human brains function (see Chap. 10).
Machine learning tasks are usually classified into four categories (Russell and
Norvig, 2009) depending on the inputs and the objectives:
• In supervised learning (Mohri et al., 2012), the goal is to infer a general rule from
example data mapped to the desired output. The example data are usually called
training data. These consist of pairs of an input and a desired output (the supervisory
signal). Once the algorithm has analysed the training data and inferred the function, it
can be used to map new examples and generalise its use to previously unknown
situations. Ideally, algorithms should react correctly to new instances by
providing an unbiased and accurate outcome, i.e., an outcome that
proves to be accurate once it can be compared with the actual future
occurrences.
• The second possibility is unsupervised learning, in which no labelled training data
are given to the learning algorithm; consequently, it will have to extract patterns from
the input. Unsupervised learning can actually be used to find hidden structures
and patterns embedded within the data. Therefore, unsupervised learning aims
at inferring a function describing hidden patterns from unlabelled data (Hastie
et al., 2009). In the case of unsupervised learning, it is complicated to evaluate
the quality of the solution as initially no benchmark is available.
• When the initial training information (i.e. data and/or targets) is incomplete, an
intermediate strategy called semi-supervised learning may be used.
• In reinforcement learning (Sutton and Barto, 1998), a program interacts and
evolves within a dynamic environment in which it is supposed to achieve a
specific task. However, as for unsupervised learning, there is no training data
and no benchmark. This approach aims at learning what to do, i.e., how to map
situations to actions, so as to optimise a numerical function, i.e., the output. The
algorithm has to discover which actions lead to the best output signal by trying
them. These strategies allow capturing situations in which actions may affect all
subsequent steps with or without any delay, which might be of interest.
Another way of classifying machine learning strategies is by desired output
(Bishop, 2006). We will illustrate that classification by briefly introducing some
strategies and methodologies used in the next chapters. Our objective is to show how
interconnected all the methodologies are, as one may leverage some of them to
achieve other purposes. Indeed, all the methodologies belonging to data sciences can
be used as a base for scenario analysis.
The first methodology (actually presented in the previous section) is
classification, in which inputs are divided into at least two different classes, and the
learning algorithm has to assign unseen inputs to at least one of these classes. This
is a good example of supervised learning, but it could be adapted and fall into the
semi-supervised alternative. The second methodology is regression, which also
belongs to supervised learning and focuses on the relationship between a
dependent variable and at least one independent variable (Chap. 11). In clustering,
inputs have to be divided into groups of similar data. Contrary to classification, the
groups are unknown a priori; therefore this methodology belongs to the unsupervised
strategies. Density estimation (Chap. 5) provides the distribution of input data and
belongs in essence to the family of unsupervised learning, though if we use
a methodology such as Bayesian inference it would be more of a semi-supervised
strategy.
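The clustering case, where groups are not known a priori, can be sketched with a one-dimensional version of the k-means procedure (Lloyd's algorithm). This is only one possible clustering technique, picked for brevity; the starting centres and the loss amounts are hypothetical.

```python
def k_means_1d(values, centres, iterations: int = 20):
    """Lloyd's algorithm in one dimension, starting from the given centres."""
    centres = list(centres)
    groups = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centre.
        groups = [[] for _ in centres]
        for x in values:
            nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            groups[nearest].append(x)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    return centres, groups


# Hypothetical loss amounts forming two visibly separate groups.
losses = [1.0, 1.2, 0.8, 9.0, 10.0, 11.0]
centres, groups = k_means_1d(losses, centres=[0.0, 5.0])
```

No labels are supplied: the two groups emerge from the distances alone, which is what distinguishes this setting from classification.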
As mentioned before, machine learning is closely related to optimisation. Most
learning problems are formulated as optimising (i.e. minimising or maximising) an
objective function. Objective functions express the difference between the output of
the trained model and the actual values. Contrary to data mining, machine learning
does not only aim at detecting patterns or at a good adjustment of a model to some
data, but at a good adjustment of this model to previously unknown situations, which
is a far more complicated task. The goal of machine learning models is accurate
prediction, generalising patterns originally detected and refined by experience.
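The link between learning and optimisation can be sketched as follows, assuming a linear model fitted by plain gradient descent on the mean squared error. The data, learning rate and number of steps are illustrative assumptions; the quality of the fit is then checked on inputs that were not used for training.

```python
def fit_line(xs, ys, learning_rate: float = 0.05, steps: int = 5000):
    """Fit y = a * x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


def mean_squared_error(a, b, xs, ys):
    """Objective function: average squared gap between model output and actual values."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(train_x, train_y)

# Generalisation is judged on inputs that were not used for fitting.
test_error = mean_squared_error(a, b, [4.0, 5.0], [9.0, 11.0])
```

A low error on the training data alone would only show a good adjustment to known data; the test error is the quantity that speaks to previously unknown situations.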
Machine learning and data mining often rely on identical methodologies and/or
overlap quite significantly, though they have different objectives. As mentioned in the
previous paragraphs, machine learning aims at prediction using properties learned
from training data, while data mining focuses on the discovery of unknown patterns
embedded in the data. In this section, we briefly introduce methodologies used in
data mining and machine learning, as some of them will be implemented in the next
chapters: scenario analysis requires first analysing data to identify the important
patterns embedded in them and second making predictions from them. The following list
is far from being exhaustive; however, it provides a good sample of traditional
methodologies:
• Decision tree learning (deVille, 2006) is a predictive model. The purpose is
to predict the values of a target variable based on several inputs, which are
graphically represented by nodes. Each edge of a node leads to children
representing each of the possible values the variable can take
given the input provided. A decision tree may be implemented for classification
purposes or for regression purposes, respectively, to identify to which class the
input belongs or to evaluate a real-valued outcome (prices, etc.). Some examples of
decision tree strategies are Bagging decision trees (Breiman, 1996), Random
Forest classifier, Boosted Trees (Hastie et al., 2009) or Rotation forest. In Chap. 7,
a related strategy (a fault tree) is implemented, though in our case the root
will be reverse engineered.
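The simplest possible decision tree for classification has a single node with two children, often called a decision stump. The sketch below fits one by exhaustive search over thresholds; it is a toy illustration under that assumption, not the fault tree approach of Chap. 7, and the data and names are hypothetical.

```python
def fit_stump(xs, labels):
    """Find the threshold on a single input that misclassifies the fewest points.

    Returns (threshold, left_label, right_label): inputs <= threshold go to the
    left child, all others to the right child.
    """
    classes = sorted(set(labels))
    ordered = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct inputs.
    candidates = [(lo + hi) / 2.0 for lo, hi in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:
        for left in classes:
            for right in classes:
                errors = sum(
                    1 for x, y in zip(xs, labels)
                    if (left if x <= threshold else right) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, left, right)
    if best is None:
        raise ValueError("need at least two distinct input values")
    return best[1], best[2], best[3]


def predict(stump, x):
    """Follow the edge matching x and return the label of the child reached."""
    threshold, left, right = stump
    return left if x <= threshold else right


# Hypothetical example: classify a loss as "low" or "high" from its amount.
amounts = [1.0, 2.0, 3.0, 10.0, 12.0, 15.0]
severity = ["low", "low", "low", "high", "high", "high"]
stump = fit_stump(amounts, severity)
```

Deeper trees repeat the same split search inside each child, and the ensemble strategies listed above combine many such trees.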
References
Aggarwal, C. C., & Yu, P. S. (1998). A new framework for itemset generation. In Symposium on
Principles of Database Systems, PODS 98 (pp. 18–24).
Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules in large databases.
In J.B. Bocca, M. Jarke, & C. Zaniolo (Eds.), Proceedings of the 20th International Conference
on Very Large Data Bases (VLDB) (pp. 487–499).
BCBS. (2013a). Fundamental review of the trading book: A revised market risk framework. Basel:
Basel Committee for Banking Supervision.
BCBS. (2013b). Principles for effective risk data aggregation and risk reporting. Basel: Basel
Committee for Banking Supervision.
Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal
of Machine Learning Research, 2, 125–137.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
BoE. (2013). A framework for stress testing the UK banking system. London: Bank of England.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication
rules for market basket data. In ACM SIGMOD record, June 1997 (Vol. 26, No. 2, pp. 255–264).
New York: ACM.
Chechik, G., Sharma, V., Shalit, U., & Bengio, S. (2010). Large scale online learning of image
similarity through ranking. Journal of Machine Learning Research, 11, 1109–1135.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Deng, L., & Yu, D. (2013). Deep learning methods and applications. Foundations and Trends in
Signal Processing, 7(3–4), 197–387.
deVille, B. (2006). Decision trees for business intelligence and data mining: Using SAS enterprise
miner. Cary: SAS Press.
EBA. (2016). 2016 EU wide stress test - methodological note. London: European Banking
Authority.
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis (5th ed.). New York:
Wiley.
Fed. (2016). Comprehensive capital analysis and review 2016 summary instructions. Washington,
DC: Federal Reserve Board.
Goldberg, D. (2002). The design of innovation: Lessons from and for competent genetic algorithms.
Norwell: Kluwer Academic Publishers.
Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In ACM
SIGMOD record (Vol. 29, No. 2, pp. 1–12). New York: ACM.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining,
inference, and prediction. New York: Springer.
Holland, J. (1992). Adaptation in natural and artificial systems. Cambridge: MIT.
Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with
categorical values. Data Mining and Knowledge Discovery, 2, 283–304.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of machine learning.
Cambridge: MIT.
Muggleton, S. (1991). Inductive logic programming. New Generation Computing, 8(4), 295–318.
Omiecinski, E. R. (2003). Alternative interest measures for mining associations in databases. IEEE
Transactions on Knowledge and Data Engineering, 15(1), 57–69.
Palace, W. (1996). Data mining: What is data mining? www.anderson.ucla.edu/faculty_pages/
jason.frand.
Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In G. Piatetsky-
Shapiro & W. Frawley (Eds.), Knowledge discovery in databases (pp. 229–248). Menlo Park:
AAAI.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the
American Statistical Association, 66(336), 846–850.
Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach (3rd ed.). London:
Pearson.
Shapiro, E. Y. (1983). Algorithmic program debugging. Cambridge: MIT.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge:
A Bradford Book/MIT.
Walker, R. (2015). From big data to big profits: Success with data and analytics. Oxford: Oxford
University Press.
Zaki, M. J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge
and Data Engineering, 12(3), 372–390.
Chapter 4
The Consensus Approach
In this chapter, we will present the so-called consensus approach in which the
scenarios are analysed in a workshop and a decision is made if a consensus is
reached.
Formally, consensus decision-making is a group process in which members
gather, discuss, agree on, implement and afterwards support a decision in the best
interest of the whole; in this case the whole can be an entity, a branch, a group,
etc. A consensus is an acceptable resolution, i.e., a common ground that might
not be optimal for each individual but is the lowest common denominator.
In other words, it is a general agreement, and the term consensus describes both
the decision and the process. Therefore, the consensus decision-making process
involves deliberations, finalisation and the effects of the application of the decision.
For scenario analysis purposes, this is typically the strategy implied when a
workshop is organised and the experts gathered are supposed to evaluate a potential
exposure together. Coming back to the methodology itself, being a decision-making
process, the consensus strategy (Avery, 1981; Hartnett, 2011) aims to be all of the
following:
1. Agreement seeking—The objective is to reach the largest possible number of
endorsements and approvals or at least no dramatic antagonism. The keyword
being “seeking” as it is not given that a unanimous position will be reached.
2. Collaborative—Members of the panels discuss proposals to reach a global decision
that at least addresses the largest number of participants' concerns. Once again, it
is highly unlikely that all the issues will be tackled through this process, though
an attempt should at least be made to do so.
3. Cooperative—Participants should not be competing for their own benefit; the
objective is to reach the best possible decision for the greater good (up to a certain
extent). In our case, this strategy is particularly appropriate if the global exposure
is lower for all participants when they collaborate than when they do not, in
other words, if the outcome altogether is lower than the sum of all parties. Here,
a game theory aspect appears, as we can draw a parallel between consensus
agreement seeking and a generalised version of the prisoner’s dilemma (Fehr and
Fischbacher, 2003).
4. Balanced—All members are allowed to express their opinions, present their
views and propose amendments. This process is supposed to be democratic.
This will be discussed in the manager’s section as the democratic character of
a company is still to be demonstrated.
5. Inclusive—As many stakeholders as possible should be involved as soon as they
add value to the conversation. Their seniority should not be the only reason for
their presence in the panel. It is really important that the stakeholders are
open-minded and able to put their seniority aside to listen to other people despite their
potential youth or lack of experience.
6. Participatory—All decision-makers are required to propose ideas. This point is a
corollary of the previous one. No one should be sitting in the conference room
for the sake of being there. Besides, ideas proposed should be constructive, i.e.,
they should be solution seeking and not destruction oriented.
Now that the necessary principles to reach a consensus have been presented, we can
focus on the process to be implemented.
As mentioned previously, the objective of the process is to generate widespread
levels of participation and agreement. There are variations regarding the degree
of agreement necessary to finalise a group decision, i.e., to determine if it is
representative of the group decision. However, the deliberation process demands
including any individual proposal. Concerns and alternatives raised or proposed by
any group member should be discussed as this will usually lead to the amendment
of the proposal. Indeed, each individual’s preferences should be voiced so that
the group can incorporate all concerns into an emerging proposal. Individual
preferences should not obstruct the progress of the group. A consensus process
makes a concerted attempt to reach full agreement.
There are multiple stepwise models supporting the consensus decision-making
process. They vary mainly in what these steps require as well as in how decisions
are finalised. The basic model involves collaboratively generating a proposal,
identifying unsatisfied concerns and then modifying the proposal to generate as
much agreement as possible. The process described in this paragraph and the
previous one can be summarised in the following six-step process, which can either
loop back or exit with a solution:
1. A discussion is always the initial step. A moderator and a coordinator are usually
required to ensure that the discussions are going in the right direction and are not
diverging.
2. A proposal should result from the discussion, i.e., an initial optimal position
(likely to be sub-optimal in a first stage though).
3. All the concerns should be raised, considered and addressed the best way
possible. If these are show-stoppers it is required to circle back to the first step
before moving to the next step.
4. Then the initial proposal should be revised. (It might be necessary to go through
the second or the third point again, as new issues might arise and these should be
dealt with).
5. Then the level of support is assessed. If the criterion selected is not satisfied, then
it is necessary to circle back at least to points 3 and 4.
6. Outcomes and key decisions: This level represents the agreement. It is really
important to bear in mind that we cannot circle back and forth indefinitely, as a
decision is ultimately required. It is necessary that after some time (or a number
of iterations) the proposal is submitted to an arbitral committee to rule.
Depending on the company culture, the sensitivity of the scenarios or the temper
of the participants, the agreement level required to consider that we successfully
reached a consensus may differ. Various possibilities are generally accepted to
assess if the general consensus has been reached, these are enumerated in what
follows:
• The ultimate goal is unanimous agreement (Welch Cline, 1990). However,
reaching it is highly unlikely, especially if the number of participants is
large, as the number of concerns raised is at least proportional to the number of
participants. If the number of participants is limited, it is probably the strategy
which should be selected.
• Another possibility is to obtain unanimity minus a certain number of disagreements.
This may overcome some issues, though it is necessary to make sure that
the issues overruled are not show-stoppers.
• Another possibility is the use of majority thresholds (qualified, simple, etc.). This
alternative strategy is very close to what you would expect from a poll requiring
a vote (Andersen and Jaeger, 1999). It is important to note (and this point is valid for all
the strategies presented in this book) that the consensus only ensures the quality
of the decision made to a certain extent.
• The last possibility is a decision made by the executive committee or an
accountable person. This option should only be considered in last resort as in our
experience this may antagonise participants and jeopardise the implementation
of the decision.
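The agreement criteria listed above, together with the requirement that the loop cannot run indefinitely, can be illustrated with a minimal sketch. The function names, the rule labels and the boolean encoding of votes are hypothetical and chosen for illustration only; they are not part of the methodology itself.

```python
from fractions import Fraction


def support_reached(votes, rule="unanimity", max_dissent=0,
                    threshold=Fraction(1, 2)):
    """Check whether a list of votes (True = agrees) satisfies a decision rule.

    The rule labels are illustrative: "unanimity", "unanimity_minus"
    (at most max_dissent disagreements) and "majority" (share of
    agreement at least equal to threshold).
    """
    if not votes:
        raise ValueError("at least one participant is required")
    agree = sum(1 for v in votes if v)
    disagree = len(votes) - agree
    if rule == "unanimity":
        return disagree == 0
    if rule == "unanimity_minus":
        return disagree <= max_dissent
    if rule == "majority":
        return Fraction(agree, len(votes)) >= threshold
    raise ValueError(f"unknown rule: {rule}")


def run_workshop(rounds, rule, max_rounds=3, **rule_args):
    """Go through successive voting rounds and stop at the first agreement.

    If no round satisfies the rule within max_rounds, the proposal is
    escalated (for example to an arbitral committee).
    """
    used = rounds[:max_rounds]
    for number, votes in enumerate(used, start=1):
        if support_reached(votes, rule, **rule_args):
            return ("consensus", number)
    return ("escalate", len(used))
```

For example, with seven participants in favour and two against, the unanimity rule fails, whereas unanimity minus two or a two-thirds majority is satisfied. Exact fractions are used for the threshold so that a vote sitting precisely on the threshold is not misclassified by rounding.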
Each of the previous possibilities has pros and cons. For instance, trying to reach
unanimous decisions gives participants the option of blocking the process but, on
the other hand, if the consensus is reached, the likelihood of it leading to a
good decision is higher. Indeed, unless someone steps back for irrational reasons, the
microeconomist would say that they all maximised their utility.
The rules of engagement for such a solution have to be properly stated prior to the
workshop; otherwise, we may end up with a situation in which the participants are left
in a closed environment, forbidden to leave the room until they find an agreement.
In principle, with this strategy, the group is placed over and above the individual,
and it is in the interest of each individual to compromise for the greater good, and
both dissenters and aligned participants are mechanically encouraged to collaborate.
No one has a veto right in the panel. Common “blocking rules” are as follows:
• Limiting the option to block consensus to issues that are fundamental to
the group’s mission or potentially disastrous to the group, though it is often
complicated to draw the line.
• Providing an option for those who do not support a proposal to “stand aside”
rather than block.
• Requiring two or more people to block for a proposal to be put aside.
• Requiring the blocking party to supply an alternative proposal or at least an
outlined solution.
• Limiting each person’s option to block consensus to a handful of times in a given
session.
Unanimity is achieved when the full group consents to a decision. Giving consent
does not necessarily mean the proposal being considered is one’s first choice.
Group members can vote their consent to a proposal because they choose to
cooperate with the direction of the group, rather than insist on their personal
preference. This relaxed threshold for a yes vote can help make unanimity easier
to achieve. Alternatively, a group member can choose to stand aside. Standing aside
communicates that while a participant does not necessarily support a group decision,
he does not wish to block it.
Note that critics of consensus blocking have a tendency to object to giving the
possibility to individuals to block proposals widely accepted by the group. They
believe that this can result in a group experience of widespread disagreement, the
opposite of a consensus process’s primary goal. Further, they believe group decision
making may stagnate because of the high threshold of unanimity. Important decisions may
take too long to make, or the status quo may become virtually impossible to change.
The resulting tension may undermine group functionality and harm relationships
between group members as well as the future execution of the decision (Heitzig and
Simmons, 2012).
Defenders of consensus blocking believe that decision rules short of unanimity
do not ensure a rigorous search for full agreement before finalising decisions. They
value the commitment to reach unanimity and the full collaborative effort this
goal requires. They believe that under the right conditions unanimous consent is
achievable and the process of getting there strengthens group relationships. In our
opinion, these arguments are only justifiable if we do not have any time constraint,
which realistically almost never happens.
The goals of requiring unanimity are only fully realised when a group is
successful in reaching it. Thus, it is important to consider what conditions make full
agreement more likely. Here are some of the most important factors that improve
the chances of successfully reaching unanimity:
• Small group size: The smaller the group, the more easily a consensus will
be reached. However, the universality of the decision might become questionable,
as one may wonder if this small group is representative of the entire entity to
which the decision will be applied.
• Clear common purpose: The objective should be clearly stated to avoid diverging
discussions.
• High levels of trust: This is a prerequisite. If people do not trust each other or the
methodology owner, they will question the proposals and the decisions made. Or
worse, they will undermine the process.
• Participants well trained in consensus processes: Training is key in the sense
that we should explain to people what is expected from them before the workshop.
A lack of training inevitably results in participants mishandling the
concepts and misunderstanding the process, in scenarios not being properly analysed and,
as a result, in a waste of participants’ time.
• Participants willing to put the best interest of the group before their own.
It may therefore take time to reach a consensus. Patience is a virtue...
• Participants willing to spend sufficient time in meetings.
• Appropriate facilitation and preparation: particularly in the long term. If the
workshops are led by unskilled people, the seriousness and the professionalism
of the process will be questioned. Note that the time of preparation of a workshop
should not be underestimated. The general rule is the more thorough the ground
work, the smoother the workshops.
• Multiplying decision rules to avoid blockages might also be a good idea,
particularly when the scenario to be analysed is complex.
Most institutions implementing a consensus decision-making process consider
non-unanimous decision rules. The consensus process can help prevent problems
associated with Robert’s Rules of Order or top-down decision making (Robert,
2011). This allows hierarchical organisations to benefit from the collaborative
efforts of the whole group and the resulting joint ownership of final proposals.
A small business owner may convene a consensus decision-making discussion
among her staff to generate proposals of changes to the business. However, after
the proposal is given the business owner may retain the authority to accept or reject
it, obviously up to a certain extent. Note that if a person accountable rejects a
decision representative of the group, he might put himself in a difficult position
as his authority would be questioned.
The benefits of consensus decision making are lost if the final decision is made
without regard to the efforts of the whole group. When group leaders or majority
factions reject proposals that have been developed with widespread agreement of a
group, the goals of consensus decision making will not be realised.
4.2 In Practice
risk subject to analysis. A story line representing each horizon ought to be selected
consistently with entity risk profile discussed in the previous chapter, and will be
presented to the businesses for evaluation. The business stakeholders will be chosen
with regard to the business area in which the selected risk may materialise, on the
one hand, and the business area supposed to control that risk (these two areas might
be identical), on the other hand.
4.2.1 Pre-workshop
The type of scenario analysis discussed in this chapter requires multiple steps.
The first one is the identification of the scenarios to be analysed. In this first
stage, the previous chapter dealing with data analysis might be useful as it should
provide stakeholders with benchmarks and key metrics to support their selection.
The department responsible for the scenario analysis program in any given entity
(it might be the risk department or more specifically the operational risk department,
or a strategic department) is also usually in charge of the ground work, the material
for the workshops and the facilitation of the workshops themselves.
These departments are supposed to define the question to be answered and to
formulate the issue to be analysed (see Chap. 1 - Scenario Planning). They suggest
the story lines, but they do not own them; ownership lies with the stakeholders
or more specifically with the risk owners (Fig. 4.1). Owners are fully entitled to
amend, modify or change the scenarios to be analysed if they believe that they are
not representative of the target issue to be analysed.
Before scheduling the workshop, a set of scenarios should be written and potentially
pre-submitted depending on the maturity of the business experts regarding that
process. These scenarios should consider both technical and organisational aspects
during the analysis.
Remark 4.2.1 It is really important to understand that scenario analysis is necessary
to find a solution to a problem; raising the issues is just a step towards solving it.
To organise the workshops, the presence of three kinds of people is necessary: a
planning manager, a facilitator and the experts. The facilitator guarantees that the
workshops are held in a proper fashion, for example, ensuring that all participants
have the same time allowed for expressing their views, or that the discussion is not
diverging. Experts should be knowledgeable, open-minded and good communicators
with an overview of their field. The person responsible for planning has the overall
responsibility of making sure that the process is transparent and has been clearly
communicated to the experts before the workshop. As mentioned before, one of the
key success factors is that the process is properly documented and communicated to
the stakeholders, in particular what is expected from them.
4.3.1 Sponsorship
In this section, we discuss the question of sponsorship of the scenario program. The
most important tasks an executive sponsor has to achieve are the following (Fed,
2016; Prosci, 2009):
• Take the lead in establishing a budget and assigning the right resources for the
project, including: (1) set priorities and balance between project work and day-to-day
work, (2) ensure that the appropriate budget is allocated, (3) appoint an
experienced change manager to support the process.
• Be active with the project team throughout the project: (1) support the definition
of the program and the scope, (2) attend key meetings, (3) set deadlines and
expectations, (4) control deliverables, (5) be available to the team
members, (6) set expectations and hold the team accountable and (7) transform a
vision into objectives.
• Engage and create support with other senior managers: (1) represent the project
in front of his or her peers, (2) ensure that key stakeholders are properly trained, (3) sell
the process to other business leaders and ensure good communication, (4) hold
mid-level managers accountable, (5) form, lead and drive a steering committee
of key stakeholders and (6) ensure that resistance from other senior managers is
dealt with prior to the initialisation of the process.
• Be an active and visible spokesperson for the change: (1) help the team understand
the political landscape and hot spots, (2) use authority when necessary.
Participants cited the following areas as the most common mistakes made by
executive sponsors that they would advise other senior managers to avoid. Note
that each one of them may lead to a failed scenario analysis:
• Not visibly supporting the change throughout the entire process. The sponsor
should ensure that he does not become disconnected from the project.
• Abdicating responsibility or delegating too much.
• Not communicating properly to explain why the task undertaken is necessary.
• Failing to build a coalition of business leaders and stakeholders to support the
project.
• Moving on to the next change before the current change is in place or changing
priorities too soon after the project has started.
• Underestimating the resistance of managers and not addressing it properly.
• Failing to set expectations with mid-level managers and front-line supervisors
related to the change and change process.
• Spending too little time on the project to keep it on track and with the project
team to help them overcome obstacles.
4.3.2 Buy-In
Employee buy-in occurs when employees are committed to the mission and/or goals set
by their company, and/or also find the day-to-day work personally meaningful. Buy-in
promotes engagement and a willingness to go the extra mile on the job (Davis,
2016).
Most of the time, when a request is made by a perfect stranger, even those
who comply will give the person asking a really odd look. The main reason why
so few comply, and those who do still show reluctance, is that no one knows why
they are supposed to do something on demand, especially if doing so seems rather
pointless. They are not committed to following the instruction, and have thus not
“bought into” the goal of the request. Now, if you were asked to do something that
you know is important, or that you feel committed to doing, you would very likely
comply because you buy into the aims and goals underlying the request. In fact,
you would comply willingly, and perhaps even eagerly, because of how much the
request resonates with you.
Obtaining stakeholders’ buy-in provides more assurance that the process will lead
to decisions of better quality, as they would be committed to the success of the
process.
4.3.3 Validation
The validation aspect is also very important as the idea is to tackle the issues
mentioned earlier such as the fact that potentially the consensus would lead to
sub-optimal outcomes and therefore would have a limited reliability. Indeed, this
would jeopardise future use of scenarios but even more dramatically may limit the
applicability or the usefulness of the process in terms of risk management.
One way to validate would be to use a challenger-champion approach (Hassani,
2015; BoE, 2013) and therefore to implement, for example, one or more strategies
suggested in the next chapters. The second is to use internal and external data
available as benchmarks.
4.3.4 Sign-Offs
All projects need at some stage or other a formal sign-off. This step of the process
is the final stamp given by people ultimately accountable. This is the guarantee that
the consensus is now accepted by top executives (Rosenhead, 2012).
It is rather important to note that following the workshops and therefore the selection
of the rules, a pre sign-off should be provided, i.e., mid to top managers in the
scale of accountability should sign off the results as they are, before any challenge
process or any piece of validation, as this would demonstrate the ownership of the
scenarios. This demonstrates that the accountability for the materialisation of these
scenarios lies with them. Furthermore, speaking from experience and from a more
pragmatic point of view, if someone does not pre sign-off the initial outcomes and
these are challenged following the validation process, for instance if the latter requires
the scenario to be reviewed, the managers will be reluctant to sign them off
afterwards, and the entire process will be jeopardised.
4.4 Alternatives and Comparison
Consensus decision making addresses the problems observed in the previous two
alternatives. To summarise, the consensus approach should lead to better decisions
as the inputs of various stakeholders are considered, consequently issued proposals
are more likely to tackle most concerns and issues raised during the workshops and
therefore be more reliable for the group. In this collaborative process, the wider the
agreement, the better the implementation of the resulting decision. As a corollary, the
quality of relationships, the cohesion and the collaboration among or between factions,
groups of people or departments would largely be enhanced.
To conclude this chapter, more elaborate models of consensus decision making
exist, such as the consensus-oriented decision-making model (Hartnett, 2011), as this
field is in perpetual evolution. However, as they are not the focal point of this
book, we refer the reader to the appropriate bibliography.
References
Andersen, I.-E., & Jaeger, B. (1999). Scenario workshops and consensus conferences: Towards
more democratic decision-making. Science and Public Policy, 26(5), 331–340.
Avery, M. (1981). Building united judgment: A handbook for consensus decision making. North
Charleston: CreateSpace Independent Publishing Platform.
BoE. (2013). A framework for stress testing the UK banking system. London: Bank of England.
Davis, O. (2016). Employee buy-in: Definition & explanation. study.com/academy.
Fed. (2016). Comprehensive capital analysis and review 2016 summary instructions. Washington,
DC: Federal Reserve Board.
Fehr, E., & Fischbacher, U. (2003). The nature of human altruism. Nature, 425(6960), 785–791.
Hartnett, T. (2011). Consensus oriented decision-making. Gabriola Island: New Society Publishers.
Hassani, B. (2015). Model risk - From epistemology to management. Working paper, Université
Paris 1.
Heitzig, J., & Simmons, F. W. (2012). Some chance for consensus: Voting methods for which
consensus is an equilibrium. Social Choice and Welfare, 38(1), 43–57.
Postma, T., & Liebl, F. (2005). How to improve scenario analysis as a strategic management tool?
Technological Forecasting and Social Change, 72, 161–173.
Prosci (2009). Welcome to the change management tutorial series.
www.change-management.com/tutorial-change-sponsorship.htm.
Robert, H. M. (2011). Robert’s rules of order newly revised (11th ed.). Philadelphia: Da Capo
Press.
Rosenhead, R. (2012). Project sign off - do people really know what this means?
www.ronrosenhead.co.uk.
Wang, H., & Suter, D. (2007). A consensus-based method for tracking: Modelling background
scenario and foreground appearance. Pattern Recognition, 40(3), 1091–1105.
Welch Cline, R. J. (1990). Detecting groupthink: Methods for observing the illusion of unanimity.
Communication Quarterly, 38(2), 112–126.
Chapter 5
Tilting Strategy: Using Probability Distribution
Properties
5.1 Theoretical Basis
In this section we introduce the concepts required to implement a tilting strategy, for
instance, the distributions and the risk measures as well as the estimation approaches
required to parametrise these distributions.
5.1.1 Distributions
This section proposes several alternatives for the fitting of a proper distribution
to the information set related to a risk (losses, incidents, etc.). Understanding the
distributions characterising each risk is necessary to understand the associated
measures. The elliptical domain (Gaussian or Student distribution) should not be
left aside but, as its properties are well known, we will focus on distributions which
are asymmetric and leptokurtic such as the generalised hyperbolic distributions
(GHD), the generalised Pareto distributions or the extreme value distributions
among others.1 But before discussing parametric distributions, we will introduce
non-parametric approaches as these allow representing the data as they are and may
support the selection of a parametric distribution if necessary.
Non-parametric statistics are a very useful and practical alternative to represent
the data (Müller et al., 2004), either using a histogram or a kernel density. A his-
togram (Silverman, 1986) gives a good representation of the empirical distribution,
but the kernel density has the major advantage of enabling the transformation of a
discrete empirical distribution into a continuous one (Wand and Jones, 1995).
To introduce this method, we give the density estimator formula. Let $X_1, \ldots, X_n$
be an empirical distribution. Its unknown density function is denoted $f$, and we
assume that $f$ has continuous derivatives of all orders required, denoted $f', f'', \ldots$.
Then the estimated density of $f$ is
\[
\hat{f}(x; h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{5.1.1}
\]
where $K$ is the kernel function satisfying $\int_{-\infty}^{+\infty} K(t)\,dt = 1$, $\int_{-\infty}^{+\infty} tK(t)\,dt = 0$
and $\int_{-\infty}^{+\infty} t^2 K(t)\,dt = k_2 \neq 0$, $k_2$ is a constant denoting the variance of the kernel
distribution and $h$ is the bandwidth.
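As an illustration, the estimator (5.1.1) can be coded in a few lines. The sketch below assumes a Gaussian kernel (for which $k_2 = 1$); the function names and the loss values used in the example are hypothetical, and the bandwidth is supplied by the user rather than optimised.

```python
import math


def gaussian_kernel(t):
    """Standard normal density: integrates to 1, zero mean, unit variance."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - X_i) / h)."""
    if h <= 0:
        raise ValueError("the bandwidth h must be positive")
    if not sample:
        raise ValueError("the sample must not be empty")
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)


if __name__ == "__main__":
    # Purely illustrative loss amounts
    losses = [1.0, 2.0, 2.5, 4.0, 7.0]
    for h in (0.3, 0.8, 2.0):
        print(h, round(kde(3.0, losses, h), 4))
```

Running the example with several values of h shows how strongly the estimate at a given point depends on the bandwidth, which is the sensitivity discussed next.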
The choice of the kernel nature has no particular importance; however, the
resulting density is very sensitive to the bandwidth selection. The global error of
the density estimator $\hat{f}(x; h)$ may be measured by the mean square error (MSE):
\[
\mathrm{MSE}(\hat{f}(x; h)) = \mathrm{Var}(\hat{f}(x; h)) + \left(E[\hat{f}(x; h)] - f(x)\right)^2, \tag{5.1.3}
\]
where,
\[
\mathrm{Var}\,\hat{f}(x; h) = \frac{1}{nh} f(x) \int_{-\infty}^{+\infty} K(t)^2\,dt + O\left(\frac{1}{n}\right) \tag{5.1.8}
\]
\[
\approx \frac{1}{nh} f(x) \int_{-\infty}^{+\infty} K(t)^2\,dt. \tag{5.1.9}
\]
1 Note that the elliptic domain is part of the GH family.
Indeed, estimating the bandwidth, we face a trade-off between the bias and the
variance, but this decomposition allows easier analysis and interpretation of the
performance of the kernel density estimator.
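The most widely used way of placing a measure on the global accuracy of $\hat{f}(x; h)$
is the mean integrated squared error (MISE):
\[
\mathrm{MISE}(\hat{f}(x; h)) = \int_{-\infty}^{+\infty} E\left[\left(\hat{f}(x; h) - f(x)\right)^2\right] dx \tag{5.1.11}
\]
\[
= \int_{-\infty}^{+\infty} \mathrm{MSE}(\hat{f}(x; h))\,dx \tag{5.1.12}
\]
\[
= \int_{-\infty}^{+\infty} \mathrm{bias}_h(x)^2\,dx + \int_{-\infty}^{+\infty} \mathrm{Var}\,\hat{f}(x; h)\,dx. \tag{5.1.13}
\]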
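Let $\vartheta(K(t)) = \int_{-\infty}^{+\infty} t^2 K(t)\,dt$ and $\psi(f(x)) = \int_{-\infty}^{+\infty} f(x)^2\,dx$, for any square
integrable function $f$, then the relation (5.1.14) becomes
\[
\mathrm{AMISE}(\hat{f}(x; h)) = \frac{1}{nh}\,\psi(K(t)) + \frac{1}{4} h^4 k_2^2\,\psi(f''(x)). \tag{5.1.15}
\]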
The minimisation of the AMISE with respect to the parameter h permits the
selection of the appropriate bandwidth. As the optimal bandwidth selection is not in
the core of this book, we will only refer the reader to the bibliography included
in this section. Now that the non-parametric distributions have been properly
introduced, we can present other families of distributions that will be of interest
for the methodology presented in this chapter.
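For readers who want the minimisation step spelled out, here is a short sketch. It writes $R(g) = \int g(x)^2\,dx$ for any square integrable function $g$ (a notation introduced here for convenience) and $k_2$ for the variance of the kernel, so that the AMISE reads $\frac{1}{nh} R(K) + \frac{1}{4} h^4 k_2^2 R(f'')$. Setting the derivative with respect to $h$ to zero gives

\[
-\frac{R(K)}{n h^2} + h^3 k_2^2 R(f'') = 0
\quad \Longrightarrow \quad
h_{\mathrm{AMISE}} = \left( \frac{R(K)}{k_2^2\, R(f'')\, n} \right)^{1/5}.
\]

The optimal bandwidth therefore shrinks at the rate $n^{-1/5}$. Since $R(f'')$ depends on the unknown density, a reference density is needed in practice. As an example under that assumption, with a Gaussian kernel ($R(K) = 1/(2\sqrt{\pi})$, $k_2 = 1$) and a Gaussian reference density with standard deviation $\sigma$ ($R(f'') = 3/(8\sqrt{\pi}\,\sigma^5)$), the formula reduces to the familiar rule of thumb

\[
h = \left( \frac{4}{3n} \right)^{1/5} \sigma \approx 1.06\,\sigma\, n^{-1/5}.
\]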
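The GHD is a continuous probability distribution defined as a normal variance-mean
mixture in which the mixing distribution is the generalised inverse Gaussian
distribution. The density function associated with the GHD is
\[
f(x; \theta) = \frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\, K_{\lambda}(\delta\gamma)}\; e^{\beta(x-\mu)}\; \frac{K_{\lambda-1/2}\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right)}{\left(\sqrt{\delta^2 + (x-\mu)^2}/\alpha\right)^{1/2-\lambda}}, \tag{5.1.16}
\]
with $0 \le |\beta| < \alpha$, where $\theta = (\lambda, \alpha, \beta, \delta, \mu)$, $\gamma = \sqrt{\alpha^2 - \beta^2}$ and $K_{\lambda}$ denotes the
modified Bessel function of the third kind. This class of distributions is very
interesting as it relies on five parameters. If the shape parameter $\lambda$ is fixed then
several well-known distributions can be distinguished: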
1. $\lambda = 1$: Hyperbolic distribution
2. $\lambda = -1/2$: NIG distribution
3. $\lambda = 1$ and $\xi \to 0$: Normal distribution
4. $\lambda = 1$ and $\xi \to 1$: Symmetric and asymmetric Laplace distribution
5. $\lambda = 1$ and $\chi \to \pm\xi$: Inverse Gaussian distribution
6. $\lambda = 1$ and $|\chi| \to 1$: Exponential distribution
7. $-\infty < \lambda < -2$: Asymmetric Student
8. $-\infty < \lambda < -2$ and $\beta = 0$: Symmetric Student
9. $\gamma = 0$ and $0 < \lambda < \infty$: Asymmetric Normal Gamma distribution
The four other parameters can then be associated with the first four moments,
permitting a very good fit of the distributions to the corresponding losses as they
capture the intrinsic features of the latter.
The next interesting class of distributions makes it possible to model extremes relying on a
data set defined above a particular threshold. Let $X$ be a r.v. with distribution function
$F$ and right end point $x_F$, and fix $u < x_F$. Then,
\[
F_u(x) = P[X - u \le x \mid X > u], \quad x \ge 0,
\]
is the excess distribution function of the r.v. $X$ (with the df $F$) over the threshold $u$,
and the function
\[
e(u) = E[X - u \mid X > u]
\]
is called the mean excess function of $X$, which can play a fundamental role in risk
management. The limit of the excess distribution has the distribution $G_{\xi}$ defined by:
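\[
G_{\xi}(x) =
\begin{cases}
1 - (1 + \xi x)^{-1/\xi} & \xi \neq 0, \\
1 - e^{-x} & \xi = 0,
\end{cases}
\]
where,
\[
\begin{cases}
x \ge 0 & \xi \ge 0, \\
0 \le x \le -1/\xi & \xi < 0.
\end{cases}
\]
The function $G_{\xi}(x)$ is the standard generalised Pareto distribution (Pickands, 1975;
Danielsson et al., 2001; Luceno, 2007). One can introduce the related location-scale
family $G_{\xi,\nu,\beta}(x)$ by replacing the argument $x$ by $(x - \nu)/\beta$ for $\nu \in \mathbb{R}$, $\beta > 0$. The
support has to be adjusted accordingly. We refer to $G_{\xi,\nu,\beta}(x)$ as GPD.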
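The GPD and the mean excess function can be explored numerically with the minimal sketch below. It relies on the Python standard library only; the function names and parameter values are hypothetical. The closed form used for the mean excess function of a GPD, $(\beta + \xi(u - \nu))/(1 - \xi)$ for $\xi < 1$, is a standard result that is not stated in the text above.

```python
import math
import random


def gpd_cdf(x, xi, nu=0.0, beta=1.0):
    """GPD distribution function with shape xi, location nu and scale beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (x - nu) / beta
    if z <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-z)
    if xi < 0 and z >= -1.0 / xi:
        return 1.0  # beyond the right end point of the support
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)


def gpd_quantile(p, xi, nu=0.0, beta=1.0):
    """Inverse of gpd_cdf for a probability p in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if xi == 0:
        return nu - beta * math.log(1.0 - p)
    return nu + beta * ((1.0 - p) ** (-xi) - 1.0) / xi


def gpd_sample(n, xi, nu=0.0, beta=1.0, seed=0):
    """Draw n values by inverse transform sampling (seeded for repeatability)."""
    rng = random.Random(seed)
    return [gpd_quantile(rng.random(), xi, nu, beta) for _ in range(n)]


def gpd_mean_excess(u, xi, nu=0.0, beta=1.0):
    """Theoretical mean excess over u for a GPD with xi < 1 (u in the support)."""
    if xi >= 1:
        raise ValueError("the mean excess is infinite for xi >= 1")
    return (beta + xi * (u - nu)) / (1.0 - xi)


def empirical_mean_excess(sample, u):
    """Average of X - u over the observations exceeding the threshold u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)
```

Plotting empirical_mean_excess against a grid of thresholds and looking for a roughly linear region is a common informal way of choosing u, since the theoretical mean excess of a GPD is linear in the threshold.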
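The next class of distributions is the class of $\alpha$-stable distributions (McCulloch,
1996) defined through their characteristic function also relying on several parameters.
For $0 < \alpha \le 2$, $\sigma > 0$, $\beta \in [-1, 1]$ and $\mu \in \mathbb{R}^{+}$, $S_{\alpha}(\sigma, \beta, \mu)$ denotes
the stable distribution with the characteristic exponent (index of stability) $\alpha$, the
scale parameter $\sigma$, the symmetric index (skewness parameter) $\beta$ and the location
parameter $\mu$. $S_{\alpha}(\sigma, \beta, \mu)$ is the distribution of a r.v. $X$ with characteristic function,
\[
E[e^{ixX}] =
\begin{cases}
\exp\left(i\mu x - \sigma^{\alpha}|x|^{\alpha}\left(1 - i\beta\,\mathrm{sign}(x)\tan(\pi\alpha/2)\right)\right) & \alpha \neq 1, \\
\exp\left(i\mu x - \sigma|x|\left(1 + (2/\pi)\, i\beta\,\mathrm{sign}(x)\ln|x|\right)\right) & \alpha = 1.
\end{cases}
\tag{5.1.17}
\]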
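Because $\alpha$-stable distributions are specified through their characteristic function rather than a closed-form density, a direct implementation of (5.1.17) is a convenient starting point. The sketch below uses the standard library only; the function name is hypothetical and the parameter checks simply mirror the constraints on $\alpha$, $\sigma$ and $\beta$ given above.

```python
import cmath
import math


def stable_cf(x, alpha, sigma, beta, mu):
    """Characteristic function E[exp(i x X)] of S_alpha(sigma, beta, mu)."""
    if not (0 < alpha <= 2 and sigma > 0 and -1 <= beta <= 1):
        raise ValueError("invalid stable parameters")
    if x == 0:
        return complex(1.0, 0.0)  # any characteristic function equals 1 at 0
    sign = 1.0 if x > 0 else -1.0
    if alpha != 1:
        skew = 1 - 1j * beta * sign * math.tan(math.pi * alpha / 2)
        return cmath.exp(1j * mu * x - (sigma * abs(x)) ** alpha * skew)
    skew = 1 + 1j * (2 / math.pi) * beta * sign * math.log(abs(x))
    return cmath.exp(1j * mu * x - sigma * abs(x) * skew)
```

Two familiar special cases provide a sanity check: with $\alpha = 2$ and $\beta = 0$ the function reduces to $\exp(i\mu x - \sigma^2 x^2)$, i.e. a Gaussian with variance $2\sigma^2$, and with $\alpha = 1$ and $\beta = 0$ it reduces to the Cauchy case $\exp(i\mu x - \sigma|x|)$.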
Thus
This transformation allows for asymmetry and heavy tails. The parameter $g$
determines the direction and the amount of asymmetry. A positive value of $g$
corresponds to a positive skewness. The special symmetric case which is obtained
for $g = 0$ is known as the $h$ distribution. For $h > 0$ the distribution is leptokurtic with
the mass in the tails increasing with $h$.
Now, with respect to the risks, we need to assess whether the estimates and the fitting of
the univariate distributions are adapted to the data sets. The models will be different
depending on the kind of risks we would like to investigate.
It is important to bear in mind that the list of distributions presented in this chapter
is non-exhaustive, and other kinds of distributions might be more appropriate in
specific situations. We focused on these distributions as their characteristics make
them appropriate to capture risk data properties, in particular the asymmetry and
the thickness of the tails. Besides, in the next chapter, we present another scenario
strategy relying on generalised extreme value distributions.
Scenario analysis for risk management cannot be separated from the concept of risk
measure, as there is no risk management without measurement; in other words, to
evaluate the quality of the risk management, it needs to be benchmarked.
Initially, risks in financial institutions were evaluated using the standard deviation.
Nowadays, the industry has moved towards quantile-based downside risk measures
including the Value-at-Risk ($\mathrm{VaR}_{\alpha}$ for confidence level $\alpha$) or the Expected Shortfall (ES).
The $\mathrm{VaR}_{\alpha}$ measures the losses that may be expected for a given probability, and
corresponds to the quantile of the distribution which characterises the asset or the
type of events for which the risk has to be measured, while the ES represents the
average loss above the VaR. Consequently, the fit of an adequate distribution to the
risk factor is definitely an important task to obtain a reliable risk measure.
The definitions of these two risk measures are recalled below:
Definition 5.1.1 Given a confidence level $\alpha \in (0, 1)$, the $\mathrm{VaR}_{\alpha}$ is the relevant
quantile2 of the loss distribution, $\mathrm{VaR}_{\alpha}(X) = \inf\{x \mid P[X > x] \le 1 - \alpha\} =
\inf\{x \mid F_X(x) \ge \alpha\}$, where $X$ is a risk factor admitting a loss distribution $F_X$.
2 $\mathrm{VaR}_{\alpha}(X) = q_{1-\alpha} = F_X^{-1}(\alpha)$.
Definition 5.1.2 The Expected Shortfall ($\mathrm{ES}_{\alpha}$) is defined as the average of all losses
which are equal to or greater than $\mathrm{VaR}_{\alpha}$:
\[
\mathrm{ES}_{\alpha}(X) = \frac{1}{1 - \alpha} \int_{\alpha}^{1} \mathrm{VaR}_{p}(X)\,dp.
\]
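To make the two definitions concrete, the sketch below computes empirical counterparts on a sample of losses. It is only one possible convention: the VaR is taken as the smallest observation x whose empirical distribution function reaches the confidence level, and the ES as the average of the observations equal to or greater than that VaR. The function names are hypothetical.

```python
def empirical_var(losses, alpha):
    """Smallest observed loss x such that the empirical df F_n(x) >= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not losses:
        raise ValueError("the loss sample must not be empty")
    ordered = sorted(losses)
    n = len(ordered)
    for rank, x in enumerate(ordered, start=1):
        if rank / n >= alpha:  # F_n(x) for the rank-th smallest observation
            return x
    raise AssertionError("unreachable: F_n equals 1 at the largest loss")


def empirical_es(losses, alpha):
    """Average of the losses equal to or greater than the empirical VaR."""
    var = empirical_var(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)
```

For instance, on the illustrative sample 1, 2, ..., 100 with a confidence level of 0.95, the empirical VaR is 95 and the empirical ES is the mean of 95, ..., 100, that is 97.5. Other empirical quantile conventions (interpolated quantiles in particular) would give slightly different figures.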
The Value-at-Risk, initially used to measure financial institutions’ market risk, was
popularised by Morgan (1996). This measure indicates the maximum probable loss
given a confidence level and a time horizon.3 The expected shortfall has a number of
advantages over the $\mathrm{VaR}_{\alpha}$ because it takes into account the tail risk and fulfils the
sub-additivity property. It has been widely dealt with in the literature, for instance, in
Artzner et al. (1999), Rockafellar and Uryasev (2000, 2002) and Delbaen (2000).
Nevertheless, even if regulators require banks to use the $\mathrm{VaR}_{\alpha}$ and recently the
$\mathrm{ES}_{\alpha}$ to measure their risks and ultimately provide the capital requirements to avoid
bankruptcy, these risk measures are not entirely satisfactory:
• They provide a risk measure for an $\alpha$ which is too restrictive considering the risk
associated with the various financial products.
• The fit of the distribution functions can be complex or inadequate, in particular
for the practitioners who want to follow regulatory guidelines (Basel II/III
guidelines). Indeed, in the operational risk case, the suggestion is to fit a GPD,
which often does not provide a good fit and whose implementation turns
out to be difficult.
• It may be quite challenging to capture extreme events, although taking these
events into account when modelling the tails of the distributions is decisive.
• Finally, all the risks are computed considering unimodal distributions, which may
be unrealistic in practice.
Recently, several extensions have been analysed to overcome these limitations
and to propose new routes for risk measures. These new techniques are briefly
recalled here, and we refer to Guégan and Hassani (2015) for more details, developments
and applications:
• We suggest that practitioners use several values of $\alpha$ to obtain
a spectrum of their expected shortfall and to visualise the evolution of the ES
with respect to these different values. A unique measure can then be obtained
as a convex combination of these different ES values with appropriate weights.
This measure is called a spectral measure (Acerbi and Tasche, 2002).
• In the univariate approach, if we want to take into account the information contained
in the tails, we cannot restrict ourselves to the GPD as suggested in the guidelines provided
by the regulators. As mentioned before, there exist other classes of distributions
³ The $\mathrm{VaR}_\alpha$ is sometimes referred to as the “unexpected” loss.
58 5 Tilting Strategy: Using Probability Distribution Properties
which are very interesting, for instance, the generalised hyperbolic distribution
(Barndorff-Nielsen and Halgreen, 1977), the extreme value distributions
including the Gumbel, the Fréchet and the Weibull distributions (Leadbetter,
1983), the $\alpha$-stable distributions (Taqqu and Samorodnisky, 1994) or the g-and-h
distributions (Huggenberger and Klett, 2009), among others.
• Nevertheless, the previous distributions are not always sufficient to properly fit the
information in the tails, and another approach is to build new distributions by
shifting the original distribution to the right or to the left in order to capture
different information in the tails. Wang (2000) proposes such a transformation
of the initial distribution, which provides a new symmetrical distribution. Sereda
et al. (2010) extend this approach to distinguish the right and left parts of the
distribution, taking more extreme events into account. The function applied to
the initial distribution for shifting is called a distortion function. This idea is
interesting as the information in the tails is captured in a different way using the
previous classes of distributions.
• Nevertheless, when the distribution is shifted with a function close to the Gaussian
one, as in Wang (2000) and Sereda et al. (2010), the shifted distribution remains
unimodal. Thus we propose to distort the initial distribution with polynomials of
odd degree in order to create several humps in the distribution. This makes it possible to
capture all the information in the extremes of the distributions, and to introduce a
new coherent risk measure $\rho(X)$ computed under the $g \circ f(x)$ distribution, where
$g$ is the distortion operator and $f(x)$ the initial distribution ($F_X$ represents the
cumulative distribution function), thus we get
All these previous risk measures can be included within a scenario analysis
process or a stress-testing strategy.
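The first suggestion in the list above, a spectrum of ES values combined into a single measure, can be sketched as follows. This is only a discrete illustration under simple assumptions: the ES is an empirical tail average, and the confidence levels, weights and function names are hypothetical choices rather than values recommended by the text.

```python
import math


def expected_shortfall(losses, alpha):
    """Empirical ES: average of the ordered losses from the alpha-quantile upwards."""
    xs = sorted(losses)
    k = max(math.ceil(alpha * len(xs) - 1e-9), 1)
    tail = xs[k - 1:]
    return sum(tail) / len(tail)


def es_spectrum(losses, alphas):
    """ES evaluated at several confidence levels."""
    return {alpha: expected_shortfall(losses, alpha) for alpha in alphas}


def combined_es(losses, weights):
    """Convex combination of ES values; `weights` maps alpha -> weight."""
    total = sum(weights.values())
    if any(w < 0 for w in weights.values()) or abs(total - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * expected_shortfall(losses, a) for a, w in weights.items())


# Hypothetical loss sample and weights (illustrative values only).
losses = [float(i) for i in range(1, 101)]
spectrum = es_spectrum(losses, [0.90, 0.95, 0.99])
weights = {0.90: 0.5, 0.95: 0.3, 0.99: 0.2}
measure = combined_es(losses, weights)
print(spectrum, measure)
```

Plotting `spectrum` against the confidence levels gives the visualisation of the ES evolution mentioned above, while `measure` is one possible single-number summary; different weights express a different emphasis on the far tail.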
5.1.3 Fitting
In order to use the distributions presented above and the associated risk measures
discussed in the previous section, their parameters have to be estimated, i.e.,
the parameters allowing an appropriate representation of the phenomenon to be
modelled. In the next paragraphs, several methodologies which could be implemented
to estimate the parameters of the selected distributions, depending on the
situation (i.e. the data, the properties of the distributions, etc.), are presented. The
first methodology is maximum likelihood estimation (MLE)
(Aldrich, 1997). It can be formalised as follows:
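Let $x_1, x_2, \ldots, x_n$ be $n$ independent and identically distributed (i.i.d.) observations,
of probability density function $f(\cdot \mid \theta)$, where $\theta$ is a vector of parameters. In
order to use the maximum likelihood approach, the joint density function for all
observations is required. For an i.i.d. sample it factorises, which gives the likelihood
$$L(\theta; x_1, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta). \qquad (5.1.21)$$
Taking logarithms gives the log-likelihood
$$\ln L(\theta; x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta), \qquad (5.1.22)$$
and the average log-likelihood
$$\hat{\ell} = \frac{1}{n} \ln L. \qquad (5.1.23)$$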
For some distributions the maximum likelihood estimator can be written as a
closed-form formula, while for others a numerical method has to be implemented.
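To make the distinction between closed-form and numerical maximum likelihood concrete, the sketch below fits a lognormal distribution both ways. The closed-form estimators are the mean and standard deviation of the log-observations; the numerical route maximises the log-likelihood over the scale parameter with a golden-section search, which is an assumption made here for simplicity and not a method prescribed by the text. The simulated data and all function names are hypothetical.

```python
import math
import random


def lognormal_mle(xs):
    """Closed-form MLE of (mu, sigma) for a lognormal sample."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_loglik(xs, mu, sigma):
    """Log-likelihood: sum of ln f(x_i | mu, sigma)."""
    return sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in xs
    )


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise a unimodal function f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


random.seed(0)
# Simulated (hypothetical) losses from a lognormal with mu = 0 and sigma = 1.
xs = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
mu_hat, sigma_hat = lognormal_mle(xs)
# Numerical route: maximise the likelihood in sigma, holding mu at mu_hat.
sigma_num = golden_section_max(lambda s: lognormal_loglik(xs, mu_hat, s), 0.05, 5.0)
print(mu_hat, sigma_hat, sigma_num)
```

Both routes should agree up to numerical tolerance. For distributions without closed-form estimators only the numerical route is available, usually with a multivariate optimiser instead of the one-dimensional search used here.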
Bayesian estimation may be used to fit the distribution, though it will only
be briefly introduced here as the maximum likelihood estimator coincides with the
most probable Bayesian estimator (Berger, 1985) given a uniform prior distribution
on the parameters. Note that the Bayesian philosophy differs from the more traditional
frequentist approach. Indeed, the maximum a posteriori estimate of $\theta$ is obtained by
maximising the probability of $\theta$ given the data:
$$P(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\, P(\theta)}{P(x_1, x_2, \ldots, x_n)} \qquad (5.1.25)$$
where $P(\theta)$ is the prior distribution of the parameter $\theta$ and where $P(x_1, x_2, \ldots, x_n)$
is the probability of the data averaged over all parameters. Since the denominator is
independent of $\theta$, the Bayesian estimator is obtained by maximising
$f(x_1, x_2, \ldots, x_n \mid \theta)\, P(\theta)$ with respect to $\theta$. If the prior $P(\theta)$ is a uniform distribution, the Bayesian
estimator is obtained by maximising the likelihood function $f(x_1, x_2, \ldots, x_n \mid \theta)$
as presented above. We only wanted to introduce that aspect of the maximum
$$m(\theta_0) \equiv E[g(z_t, \theta_0)] = 0, \qquad (5.1.26)$$
where $E$ denotes expectation. Moreover, the function $m(\theta)$ must differ from zero for
$\theta \neq \theta_0$. The basic idea behind the GMM is to replace the theoretical expected value
$E[\cdot]$ with its empirical sample average:
$$\hat{m}(\theta) \equiv \frac{1}{T} \sum_{t=1}^{T} g(z_t, \theta) \qquad (5.1.27)$$
and then to minimise the norm of this expression with respect to $\theta$. The value (or
set of values) minimising the norm of the expression above is our estimate for $\theta_0$.
By the law of large numbers, $\hat{m}(\theta) \approx E[g(z_t, \theta)] = m(\theta)$ for a large data sample, and thus
we expect that $\hat{m}(\theta_0) \approx m(\theta_0) = 0$. The GMM looks for a number $\hat{\theta}$ which would make
$\hat{m}(\hat{\theta})$ as close to zero as possible.⁴ The properties of the resulting estimator will
depend on the particular choice of the norm function, and therefore the theory of
GMM considers an entire family of norms, defined as
$$\|\hat{m}(\theta)\|_W^2 = \hat{m}(\theta)'\, W\, \hat{m}(\theta), \qquad (5.1.28)$$
⁴ The norm of $m$, denoted as $\|m\|$, measures the distance between $m$ and zero.
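The GMM idea recalled above can be illustrated on a deliberately simple case. The sketch below assumes an exponential model with rate `lam` and two moment conditions, $E[z] = 1/\lambda$ and $E[z^2] = 2/\lambda^2$, and minimises the quadratic form of the sample moments with an identity weighting matrix. The model, the weighting matrix, the search interval and the function names are all illustrative assumptions, not choices made in the text.

```python
import math
import random


def sample_moments(z):
    """First two empirical moments of the sample."""
    T = len(z)
    return sum(z) / T, sum(v * v for v in z) / T


def m_hat(moments, lam):
    """Sample average of g(z_t, lam) for the exponential model."""
    m1, m2 = moments
    return (m1 - 1.0 / lam, m2 - 2.0 / lam ** 2)


def gmm_objective(moments, lam, W):
    """Quadratic form m_hat' W m_hat."""
    m = m_hat(moments, lam)
    return sum(m[i] * W[i][j] * m[j] for i in range(2) for j in range(2))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimise a unimodal function f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


random.seed(1)
# Simulated (hypothetical) data from an exponential distribution with rate 2.
z = [random.expovariate(2.0) for _ in range(2000)]
moments = sample_moments(z)
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weighting matrix (assumption)
lam_hat = golden_section_min(lambda lam: gmm_objective(moments, lam, W), 0.05, 50.0)
print(lam_hat)
```

With two moment conditions and one parameter the system is over-identified, so the sample moments cannot in general both be set to zero; the estimate is the rate that brings them jointly closest to zero under the chosen norm.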
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[-\infty, x]}(X_i) \qquad (5.1.30)$$
where $I_{[-\infty, x]}(X_i)$ is the indicator function, equal to 1 if $X_i \leq x$ and 0 otherwise. The
statistic for a given cumulative distribution function $F(x)$ is
different distance
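$$A^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x) - F(x))^2}{F(x)\,(1 - F(x))}\, dF(x), \qquad (5.1.33)$$
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = N \sum_{i=1}^{n} \frac{(O_i/N - p_i)^2}{p_i}. \qquad (5.1.34)$$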
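As a small illustration of two of the quantities above, the sketch below computes the empirical distribution function of Eq. (5.1.30) and the $\chi^2$ statistic of Eq. (5.1.34) in both of its forms. The observed counts, the cell probabilities and the function names are hypothetical.

```python
def ecdf(sample):
    """Return F_n, the empirical distribution function of the sample."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # Proportion of observations X_i with X_i <= x.
        return sum(1 for v in xs if v <= x) / n

    return F_n


def chi_square_statistic(observed, probs):
    """Sum over cells of (O_i - E_i)^2 / E_i with E_i = N * p_i."""
    N = sum(observed)
    return sum((o - N * p) ** 2 / (N * p) for o, p in zip(observed, probs))


def chi_square_statistic_alt(observed, probs):
    """Equivalent form N * sum over cells of (O_i / N - p_i)^2 / p_i."""
    N = sum(observed)
    return N * sum((o / N - p) ** 2 / p for o, p in zip(observed, probs))


F_n = ecdf([1.0, 2.0, 3.0, 4.0])
# Hypothetical counts in four cells, tested against equal cell probabilities.
observed = [18, 22, 20, 40]
probs = [0.25, 0.25, 0.25, 0.25]
chi2 = chi_square_statistic(observed, probs)
chi2_alt = chi_square_statistic_alt(observed, probs)
print(F_n(2.5), chi2, chi2_alt)
```

The two functions return the same value up to rounding, which is simply the identity stated in Eq. (5.1.34).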
5.2 Application
Fig. 5.1 This figure represents three types of data; as illustrated, these data sets combined (as
discussed in the first section) may lead to a multimodal distribution
[Figure: “Combination of Data”; axis: Frequency]
Fig. 5.2 This figure represents the same data as the previous one, though here the data are not
juxtaposed but combined
[Figure: “Kernel Density Adjusted on the Data”; axis: Probabilities]
Fig. 5.3 This figure represents how the empirical distributions should have been modelled if the
data were not combined
[Figure; axis: Losses]
Fig. 5.4 This figure illustrates a kernel density estimation on the combined data set
Once these have been represented, the first strategy to be implemented to fit the
data is a kernel density estimation. In that case, assuming an Epanechnikov kernel,
it is possible to see that the shape of the densities adjusted on each individual
distribution (Fig. 5.3), as well as the one adjusted on the combined data sets
(Fig. 5.4), is similar to the histogram represented in Fig. 5.2. Therefore these could
be adequate solutions to characterise the initial distribution. However, as these
methodologies are non-parametric, it is not possible to shock the parameters, but
the shape of the represented distribution may help in selecting the right family as
introduced earlier in this chapter.
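A kernel density estimate with an Epanechnikov kernel, as mentioned above, can be sketched in a few lines. The data set and the bandwidth `h` below are hypothetical; in practice the bandwidth would be selected with a dedicated rule, which is not covered here.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_density(sample, h):
    """Return the estimated density x -> (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)

    def density(x):
        return sum(epanechnikov((x - xi) / h) for xi in sample) / (n * h)

    return density


# Hypothetical combined data set with two clusters (illustrative values only).
sample = [1.0, 1.5, 2.0, 2.2, 8.0, 8.5, 9.0, 9.3]
density = kernel_density(sample, h=1.0)

# Crude Riemann sum over [-1, 12] to check that the estimate integrates to about one.
step = 0.01
total_mass = sum(density(-1.0 + i * step) for i in range(1301)) * step
print(density(1.5), density(5.0), total_mass)
```

With this bandwidth the estimate has two separate humps and is zero between the clusters, the kind of multimodal shape that can then guide the choice of a parametric family.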
Therefore, once the right distribution has been selected, such as a lognormal, an
$\alpha$-stable or any other suitable distribution, we can compare the fittings. Figure 5.5
shows different adjustments on a single plot. As depicted depending on the
5.3 For the Manager: Pros and Cons 65
Fig. 5.5 In this figure four distributions (X1 to X4) are represented, illustrating how data would
be fitted and represented by these distributions (horizontal axis: Losses; vertical axis:
Probabilities). This figure illustrates how, by tilting the data, we could
move from an initial thin-tailed distribution (X1) to a fat-tailed distribution (X4). The fat-tail
representation will lead by construction to higher risk measures
5.3.1 Implementation
In this section, we discuss the pros and cons of the methodology from a manager's
point of view, and in particular the added value of the methodology. Indeed, this
methodology is very useful in some cases but it is not appropriate in others. The
right question, once again, is: what are the objectives? For example, for some
stress-testing purposes, this is quite powerful as some of the distributions have properties
that can capture asymmetric shocks, extreme values, etc.
As the name suggests, the generalised hyperbolic family has a very general
form combining various distributions, for instance, the Student’s t-distribution,
the Laplace distribution, the hyperbolic distribution, the normal-inverse Gaussian
distribution and the variance-gamma distribution, among others. It is mainly applied to
areas requiring the capture of larger probabilities in the tails, a property the normal
distribution does not possess. However, the five parameters required may make this
distribution complicated to fit.
To apply the second distribution presented above, the GPD, the choice of the
threshold might be extremely complicated (Guégan et al., 2011). Besides, the
estimation of the shape parameter may lead to infinite mean models (shape parameter
greater than 1), which might be complicated to use in practice.
Finally, stable distributions generalise the central limit theorem to random variables
without second moments. Once again, we might experience some problems:
if $\alpha \leq 1$, the first moment does not exist, and therefore the distribution might be
inappropriate in practice.
VaR has been controversial since 1994, the date of its creation by Morgan (1996).
Indeed, the main issue is that VaR is not sub-additive (Artzner et al., 1999). In other
words, the VaR of a combined portfolio can be larger than the sum of the VaRs of
its components.
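The lack of sub-additivity can be checked on a standard textbook-style counter-example, sketched below. It assumes two independent, identical loans that each lose 100 with probability 4% and nothing otherwise; these figures and the function names are hypothetical.

```python
def var_discrete(pmf, alpha):
    """Smallest x with P[X <= x] >= alpha for a discrete loss distribution."""
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= alpha - 1e-12:
            return x
    return max(pmf)


def sum_of_independent(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete losses."""
    out = {}
    for xa, pa in pmf_a.items():
        for xb, pb in pmf_b.items():
            out[xa + xb] = out.get(xa + xb, 0.0) + pa * pb
    return out


# Hypothetical loan: loss of 100 with probability 4%, no loss otherwise.
loan = {0.0: 0.96, 100.0: 0.04}
var_single = var_discrete(loan, 0.95)

# Portfolio of two independent, identical loans.
portfolio = sum_of_independent(loan, loan)
var_portfolio = var_discrete(portfolio, 0.95)
print(var_single, var_portfolio)
```

Each loan taken alone has a 95% VaR of zero, since the probability of no loss (96%) already exceeds the confidence level, whereas the portfolio has a probability of no loss of only 92.16% and therefore a 95% VaR of 100, which is larger than the sum of the stand-alone VaRs.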
References
Acerbi, C., & Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking and
Finance, 26(7), 1487–1503.
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statistical
Science, 12(3), 162–176.
Anderson, J. A., & Blair, V. (1982). Penalized maximum likelihood estimation in logistic
regression and discrimination. Biometrika, 69(1), 123–136.
Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain “goodness-of-fit” criteria
based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193–212.
Artzner, P., Delbaen, F., Eber, J. M., & Heath, D. (1999). Coherent measures of risk. Mathematical
Finance 9(3), 203–228.
Barndorff-Nielsen, O., & Halgreen, C. (1977). Infinite divisibility of the hyperbolic and general-
ized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte
Gebiete, 38(4), 309–311.
Berger, J. O., (1985). Statistical decision theory and Bayesian analysis. New York: Springer.
Cramér, H. (1928). On the composition of elementary errors. Scandinavian Actuarial Journal,
1928(1), 13–74.
Danielsson, J., et al. (2001). Using a bootstrap method to choose the sample fraction in tail index
estimation. Journal of Multivariate Analysis, 76, 226–248.
Delbaen, F. (2000). Coherent risk measures. Blätter der DGVFM 24(4), 733–739.
Donsker, M. D. (1952). Justification and extension of Doob’s heuristic approach to the
Kolmogorov–Smirnov theorems. Annals of Mathematical Statistics, 23(2), 277–281.
Guégan, D., & Hassani, B. (2015). Distortion risk measures or the transformation of unimodal
distributions into multimodal functions. In A. Bensoussan, D. Guégan, & C. Tapiro (Eds.),
Future perspectives in risk models and finance. New York: Springer.
Guégan, D., & Hassani, B. (2016). More accurate measurement for enhanced controls: VaR vs
ES? In Documents de travail du Centre d’Economie de la Sorbonne 2016.15 (Working Paper)
[ISSN: 1955-611X. 2016] <halshs-01281940>.
Guégan, D., Hassani, B. K. & Naud, C. (2011). An efficient threshold choice for the computation
of operational risk capital. The Journal of Operational Risk, 6(4), 3–19.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators.
Econometrica, 50(4), 1029–1054.
Huggenberger, M., & Klett, T. (2009). A g-and-h Copula approach to risk measurement in
multivariate financial models, University of Mannheim, Germany, Preprint
Leadbetter, M. R. (1983). Extreme and local dependence in stationary sequences. Zeitschrift für
Wahrscheinlichkeitstheorie und Verwandte Gebiete, 65, 291–306.
Lindsay, B. G. (1988). Composite likelihood methods. Contemporary Mathematics, 80, 221–239.
Luceno, A. (2007). Likelihood moment estimation for the generalized Pareto distribution.
Australian and New Zealand Journal of Statistics, 49, 69–77.
McCulloch, J. H. (1996). On the parametrization of the afocal stable distributions. Bulletin of the
London Mathematical Society, 28, 651–655.
Müller, M., Sperlich, S., & Werwatz, A. (2004). Nonparametric and semiparametric models.
Springer series in statistics. Berlin: Springer.
Patterson, H. D., & Thompson, R. (1971) Recovery of inter-block information when block sizes
are unequal. Biometrika, 58(3), 545–554.
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3,
119–131.
Morgan, J. P. (1996). Riskmetrics technical document.
Rockafellar, R. T., & Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of
Risk, 2(3), 21–41.
Rockafellar, R. T., & Uryasev, S. (2002). Conditional value at risk for general loss distributions.
Journal of Banking and Finance, 26(7), 1443–1471.
Sereda, E. N., et al. (2010). Distortion risk measures in portfolio optimisation. Business and
economics (Vol. 3, pp. 649–673). Springer, New York
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman
and Hall/CRC.
Smirnov, N. (1948). Table for estimating the goodness of fit of empirical distributions. Annals of
Mathematical Statistics, 19, 279–281.
Taqqu, M., & Samorodnisky, G. (1994). Stable non-Gaussian random processes. New York:
Chapman and Hall.
Tucker, H. G. (1959). A generalization of the Glivenko-Cantelli theorem. The Annals of
Mathematical Statistics, 30(3), 828–830.
Wand, M. P., & Jones, M. C. (1995). Kernel smoothing. London: Chapman and Hall/CRC.
Wang, S. S. (2000). A class of distortion operators for pricing financial and insurance risks. Journal
of Risk and Insurance, 67(1), 15–36.
Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Supplement to the
Journal of the Royal Statistical Society, 1(2), 217–235.
Chapter 6
Leveraging Extreme Value Theory
6.1 Introduction
1
The parameters of the GEV distributions are estimated by maximum likelihood (Hoel, 1962).
While some aspects of extreme value theory have been discussed in the previous
chapter, here we will present its application in a different context and theoretical
framework.
2
Sometimes known as the extreme value theorem.
6.2 The Extreme Value Framework 73
In probability theory and statistics, the generalised extreme value (GEV) distribution
(sometimes called the Fisher–Tippett distribution) is a family of continuous
probability distributions developed within extreme value theory, combining the
Gumbel, Fréchet and Weibull families, also known as type I, II and III extreme value
distributions.
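The generalised extreme value distribution has cumulative distribution function
$$F(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (6.2.2)$$
for $1 + \xi(x - \mu)/\sigma > 0$, where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ the scale
parameter and $\xi \in \mathbb{R}$ the shape parameter. Thus for $\xi > 0$, the expression just given
for the cumulative distribution function is valid for $x > \mu - \sigma/\xi$, while for $\xi < 0$ it
is valid for $x < \mu + \sigma/(-\xi)$. For $\xi = 0$ the expression just given for the cumulative
distribution function is not defined and is replaced, taking the limit as $\xi \to 0$, by
$$F(x; \mu, \sigma, 0) = \exp\left\{-\exp\left(-\frac{x - \mu}{\sigma}\right)\right\}, \qquad (6.2.3)$$
without any restriction on $x$.
The resulting density function is
$$f(x; \mu, \sigma, \xi) = \frac{1}{\sigma}\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{(-1/\xi) - 1} \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (6.2.4)$$
again, for $x > \mu - \sigma/\xi$ in the case $\xi > 0$, and for $x < \mu + \sigma/(-\xi)$ in the case $\xi < 0$.
The density is zero outside of the relevant range. In the case $\xi = 0$ the density is
positive on the whole real line and equal to
$$f(x; \mu, \sigma, 0) = \frac{1}{\sigma} \exp\left(-\frac{x - \mu}{\sigma}\right) \exp\left\{-\exp\left(-\frac{x - \mu}{\sigma}\right)\right\}. \qquad (6.2.5)$$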
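The distribution and density functions of Eqs. (6.2.2) to (6.2.5) translate directly into code. The sketch below is a plain transcription with hypothetical function names; outside the support it returns 0 or 1 for the distribution function and 0 for the density, as implied by the ranges stated above.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function, Eqs. (6.2.2) and (6.2.3)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Below the lower end point (xi > 0) or above the upper end point (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_pdf(x, mu, sigma, xi):
    """GEV density, Eqs. (6.2.4) and (6.2.5)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-z) * math.exp(-math.exp(-z)) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / xi - 1.0) * math.exp(-t ** (-1.0 / xi)) / sigma


# Illustrative parameter values (hypothetical): Frechet-type tail with xi = 0.3.
p = gev_cdf(1.3, 0.5, 2.0, 0.3)
d = gev_pdf(1.3, 0.5, 2.0, 0.3)
print(p, d)
```

A quick sanity check is that a finite-difference derivative of `gev_cdf` reproduces `gev_pdf` inside the support, for positive as well as negative values of the shape parameter.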
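The first four moments as well as the mode and the median are
• Mean -
$$\begin{cases} \mu + \sigma\,\dfrac{\Gamma(1 - \xi) - 1}{\xi} & \text{if } \xi \neq 0,\ \xi < 1, \\ \mu + \sigma\gamma & \text{if } \xi = 0, \\ \infty & \text{if } \xi \geq 1, \end{cases} \qquad (6.2.6)$$
where $\gamma$ denotes Euler's constant.
• Median -
$$\begin{cases} \mu + \sigma\,\dfrac{(\ln 2)^{-\xi} - 1}{\xi} & \text{if } \xi \neq 0, \\ \mu - \sigma \ln \ln 2 & \text{if } \xi = 0. \end{cases} \qquad (6.2.7)$$
• Mode -
$$\begin{cases} \mu + \sigma\,\dfrac{(1 + \xi)^{-\xi} - 1}{\xi} & \text{if } \xi \neq 0, \\ \mu & \text{if } \xi = 0. \end{cases} \qquad (6.2.8)$$
• Variance -
$$\begin{cases} \sigma^2 (g_2 - g_1^2)/\xi^2 & \text{if } \xi \neq 0,\ \xi < \tfrac{1}{2}, \\ \sigma^2\,\dfrac{\pi^2}{6} & \text{if } \xi = 0, \\ \infty & \text{if } \xi \geq \tfrac{1}{2}, \end{cases} \qquad (6.2.9)$$
where $g_k = \Gamma(1 - k\xi)$.
• Skewness -
$$\begin{cases} \dfrac{g_3 - 3 g_1 g_2 + 2 g_1^3}{(g_2 - g_1^2)^{3/2}} & \text{if } \xi > 0, \\ -\dfrac{g_3 - 3 g_1 g_2 + 2 g_1^3}{(g_2 - g_1^2)^{3/2}} & \text{if } \xi < 0, \\ \dfrac{12\sqrt{6}\,\zeta(3)}{\pi^3} & \text{if } \xi = 0, \end{cases} \qquad (6.2.10)$$
where $\zeta$ denotes the Riemann zeta function.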
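The closed-form expressions for the mean, median, mode and variance can be evaluated with the gamma function from the standard library. The sketch below uses hypothetical function names and assumes $\xi > -1$ for the mode formula, so that $(1 + \xi)^{-\xi}$ is real; it also makes the infinite-mean case ($\xi \geq 1$) explicit, since that case matters when the fitted distribution is used for risk measurement.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gev_mean(mu, sigma, xi):
    if xi >= 1.0:
        return math.inf  # infinite-mean model
    if xi == 0.0:
        return mu + sigma * EULER_GAMMA
    return mu + sigma * (math.gamma(1.0 - xi) - 1.0) / xi


def gev_median(mu, sigma, xi):
    if xi == 0.0:
        return mu - sigma * math.log(math.log(2.0))
    return mu + sigma * (math.log(2.0) ** (-xi) - 1.0) / xi


def gev_mode(mu, sigma, xi):
    # Assumes xi > -1 so that (1 + xi) ** (-xi) is a real number.
    if xi == 0.0:
        return mu
    return mu + sigma * ((1.0 + xi) ** (-xi) - 1.0) / xi


def gev_variance(sigma, xi):
    if xi >= 0.5:
        return math.inf
    if xi == 0.0:
        return sigma ** 2 * math.pi ** 2 / 6.0
    g1 = math.gamma(1.0 - xi)
    g2 = math.gamma(1.0 - 2.0 * xi)
    return sigma ** 2 * (g2 - g1 ** 2) / xi ** 2


# Illustrative parameter values (hypothetical).
mean_02 = gev_mean(1.0, 2.0, 0.2)
median_03 = gev_median(1.0, 2.0, 0.3)
print(mean_02, median_03, gev_mode(1.0, 2.0, 0.3), gev_variance(2.0, 0.2))
```

The $\xi \neq 0$ branches converge to the $\xi = 0$ branches as the shape parameter tends to zero, which provides a simple consistency check on the formulas.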
In order to apply the methodology, the first step is to build the data set, considering
for example a banking group which possesses several branches, subsidiaries and
legal entities all over the world. Note that this kind of structure is typical of
what we can find with systemically important financial institutions (SIFIs) or
large insurance companies. In each branch, subsidiary, legal entity or business
unit, the group has experts responsible for managing the risks, the so-called first
line of defence. This methodology is particularly appropriate for operational risk
management as the Basel Matrix provides each and every entity with a base
taxonomy of the risks (BCBS, 2004).
We assume that we have $i = 1, \ldots, p$ subsidiaries or branches, each one being
represented by a risk manager. This manager can provide $j = 1, \ldots, n$ quotations
per risk in a year (for instance) or any relevant period of time. Thus, for a given date,
we can have $np$ quotations for a risk type. These quotations can also be obtained for
different levels of granularity. Then, these $np$ quotations per risk provide a data set
which corresponds to a sequence we will refer to as a maxima data set (MDS).
Remark 6.2.1 Once the data collection process has been properly explained to risk managers,
the information can be collected by email or through the risk management system;
they do not necessarily need to meet on a regular basis. Consequently, this
methodology is particularly appropriate for large, complex and global companies
and is relatively inexpensive.
Given the MDS created in the previous section, we will estimate the parameters of
the GEV distribution whose density is given by Eq. (6.2.4).
As mentioned before, this distribution contains the Fréchet distribution for $\xi > 0$,
the Gumbel distribution for $\xi = 0$ and the Weibull distribution for $\xi < 0$ (Fisher
and Tippett, 1928; Gnedenko, 1943). Therefore, the shape parameter governs the
tail behaviour of the distribution. The sub-families defined above have the following
cumulative distribution functions:
Gumbel or type I extreme value distribution ($\xi = 0$)
$$F(x; \mu, \sigma, 0) = e^{-e^{-(x - \mu)/\sigma}} \quad \text{for } x \in \mathbb{R}. \qquad (6.2.12)$$
where $\sigma > 0$.
Remark 6.2.2 Though we are working with maxima, the theory is equally valid for
minima. Indeed, a generalised extreme value distribution can be fitted the same way.
³ The estimation procedure is a very important aspect of the approach. Under regularity conditions
the maximum likelihood estimate can be unbiased; consequently, if it is possible to use it, it does
not make sense to opt for another approach. Unfortunately, this approach may lead to an
infinite estimated mean model. To avoid this problem we can use a “probability weighted moment”
estimation approach, as this would enable constraining the shape parameter within $[0, 1]$.
However, as discussed in the following sections, estimation procedures are not the main
problem because they are linked to the information set used.
6.3 Summary of Results Obtained 77
In this section, the main results obtained in Guégan and Hassani (2012) are
summarised to illustrate the approach.⁴
In this paper, the information provided by the experts is sorted according to the
Basel taxonomy for operational risk, which has three levels of granularity (BCBS,
2004).
• In a first risk category of the Basel Matrix, for instance, the “Payment and
Settlement”/“Internal Fraud” cell, the estimated value for $\xi$ is 4.30 for the first
level of granularity. Consequently, this estimated GEV distribution has an infinite
mean and is therefore inapplicable. Working on the second level of granularity,
even if the value decreases, it remains larger than 1 and therefore the fitted
GEV distribution cannot be used for risk management purposes, or at least the
outcomes might be very complicated to justify. This means that we need to
consider a lower level of granularity to conclude: the third one, for instance.
Unfortunately, this information set is not available for the present exercise. So
the methodology is not always applicable, particularly if the data are not adapted.
• The second application is far more successful. Applying the methodology to the
“Retail Banking”/“Clients, Products and Business Practices/Improper Business
or Market Practice” cell and disaggregating the data set from the first to the
subsequent level of granularity, i.e., from the highest level of granularity to the
lowest, the value of $\xi$ increases from $\xi = 0.02657934$ to $\xi = 0.04175584$ for
the first subcategory, $\xi = 3.013321$ for the second subcategory, $\xi = 0.06523997$
for the fourth subcategory and $\xi = 0.08904748$ for the fifth. Again, the influence
of the data set built for estimation purposes is highlighted. The aggregation of
different risk natures in a single cell (the definition behind this sub-event covers
many kinds of incidents) does not make it possible to provide an adequate risk measure.
For the first level of granularity, $\xi$ is less than 1, and this is probably due to the
fact that the corresponding information set is biased by the combination of data.
In this specific case, we have four cells in the second level of granularity for
which some quotations are available, i.e., the bank may consider that some major
threats may arise from these categories; as a result, working at a lower level of
granularity tends to make sense. Note that the data for the third subcategory at
the second level of granularity were not available.
• In a successful third case, the methodology has been applied to the cell “Payment
and Settlement”/“Execution, Delivery and Process Management”. In this case,
$\xi = 2.08$ for the first level of granularity, and $\xi = 0.23$ for the subcategory
quoted at the next level, i.e., the “Payment and Settlement”/“Vendors and
Suppliers” cell. Note that some cells are empty, because the bank's top risk
managers dealt with these risks in different ways and did not ask the
risk managers for quotations. In these situations, we would recommend switching to alternative
methodologies. We also noted in our analysis that the shape parameter was
positive in all cases, thus the quotations’ distributions follow Fréchet distributions
given in relationship (6.2.2).
⁴ Note that this methodology has been tested and/or is used in multiple large banking groups.
Thus, using MDS from different cells makes it possible to anticipate incidents, losses and
the corresponding capital requirements, and to prioritise the key management decisions to be
undertaken. Besides, it shows the necessity of having precise information.
In the summarised piece of analysis, comparing the risk measures obtained using
experts' opinions with the ones obtained from the collected losses using the classical
loss distribution approach (LDA) (Lundberg, 1903; Frachot et al., 2001; Guégan and
Hassani, 2009), we observe that even when focusing on extreme losses, the methodology
proposed in this chapter does not always provide larger risk measures than those
obtained by implementing more traditional approaches. This outcome is particularly
important as it means that using an appropriate framework, even one focusing on extreme
events, does not necessarily imply that the risk measures will be higher. This tackles
one of the main clichés regarding the over-conservativeness of the EVT, and risk
managers should be aware of that feature.
On the other hand, comparing the EVT approach with the LDA (Frachot et al., 2001), which
relies on past incidents, we see that even if the outcomes may vary, the ranking of these
with respect to the class of incidents is globally maintained. Regarding the volatility
between the results obtained from the two methods, we observe that the experts tend
to provide quotations embedding the entire information available at the moment
they are giving their quotations as well as their expectations, whereas historical
information sets are biased by the delays between the moment an incident occurs,
the moment it is detected and the moment it is entered in the collection tool.
Another reason explaining the differences between the two procedures is the
fact that experts anticipate the maximum loss values with respect to the internal
policy of risk management, such as the efficiency of the operational risk control
system, the quality of the communication from the top management, the lack
of insight regarding a particular risk, or the effectiveness of the risk framework.
For example, on the “Retail Banking” business line for the “Internal Fraud” event
type, a VaR of 7,203,175 euros using experts' opinions is obtained against a VaR of
190,193,051 euros with the LDA. The difference between these two amounts may
be interpreted as a failure inside the operational risk control system to prevent these
frauds.⁵
The summarised paper highlights the importance of considering the a priori
knowledge of the experts together with an a posteriori backtesting based on
collected incidents.
5
Theoretically, the two approaches (Experts vs. LDA) are different, therefore this way of thinking
may be easily challenged, nevertheless it might lead practitioners to question their system of
control.
6.4 Conclusion
In this chapter, a new methodology based on experts' opinions and extreme value
theory to evaluate risks has been developed. This method does not suffer from
the drawbacks of numerical methods and provides analytical risk measures, though
the estimation of the GEV parameters might sometimes be challenging.
With this method, practitioners' judgements have been transformed into computational
values and risk measures. The information set might only be biased by
people’s personality, risk aversion and perception, but not by obsolete data. It is
clear that these values include an evaluation of the risk framework and might be
used to evaluate how the culture is embedded.
The potential unexploitability of the GEV ($\xi > 1$) may just be caused by the fact
that several risk types are mixed in a single unit of measure, for example, “Theft
and Fraud” and “System Security” within the “External Fraud” event type. But
splitting the data set may raise some other challenges, as this will require a
procedure to deal with the dependencies, such as the approach presented in Guégan
and Hassani (2013).
However, it is important to bear in mind that the reliability of the results mainly
depends on the risk management quality and particularly on the risk managers'
capability to work as a team.
References
BCBS. (2004). International convergence of capital measurement and capital standards. Basel:
Bank for International Settlements.
Coles, S. (2004). An introduction to statistical modeling of extreme values. Berlin: Springer.
Embrechts, P., Klüppelberg, C., & Mikosch, T. (1997). Modelling extremal events: For insurance
and finance. Berlin: Springer.
Fisher, R. A., & Tippett, L. H. C. (1928). Limiting forms of frequency distributions of the largest or
smallest member of a sample. Proceedings of the Cambridge Philosophical Society, 24, 180–190.
Frachot, A., Georges, P., & Roncalli, T. (2001). Loss distribution approach for operational risk.
Working paper, GRO, Crédit Lyonnais, Paris.
Gnedenko, B. V. (1943). Sur la distribution limite du terme d’une série aléatoire. Annals of
Mathematics, 44, 423–453.
Guégan, D., & Hassani, B. K. (2009). A modified Panjer algorithm for operational risk capital
computation. The Journal of Operational Risk, 4, 53–72.
Guégan, D., & Hassani, B. K. (2012). A mathematical resurgence of risk management: An extreme
modeling of expert opinions. To appear in Frontier in Economics and Finance, Documents de
travail du Centre d’Economie de la Sorbonne 2011.57 - ISSN:1955-611X.
Guégan, D., & Hassani, B. K. (2013). Multivariate VaRs for operational risk capital computa-
tion: A vine structure approach. International Journal of Risk Assessment and Management
(IJRAM), 17(2), 148–170.
Guégan, D., Hassani, B. K., & Naud, C. (2011). An efficient threshold choice for the computation
of operational risk capital. The Journal of Operational Risk, 6(4), 3–19.
Haan, L. de, & Ferreira, A. (2010). Extreme value theory: An introduction. Springer Series in
Operations Research and Financial Engineering. New York: Springer.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals
of Statistics, 3, 1163–1174.
Hoel, P. G. (1962). Introduction to mathematical statistics (3rd ed.). New York: Wiley.
Leadbetter, M. R., Lindgren, G., & Rootzen, H. (1983). Extremes and related properties of random
sequences and processes. New York: Springer.
Lundberg, F. (1903). Approximerad framställning av sannolikhetsfunktionen Aterförsäkring av
kollektivrister. Uppsala: Akad. Afhandling. Almqvist och Wiksell.
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3, 119–
131.
Resnick, S. I. (1987). Extreme values, regular variation, and point processes. New York: Springer.
Chapter 7
Fault Trees and Variations
In order to analyse the process leading to a failure, we have seen various strategies.
In this chapter we present another approach which is also very intuitive and
likely to obtain business buy-in as it is, by design, built and informed by risk owners:
the fault tree analysis (FTA) (Barlow et al., 1975; Roberts et al., 1981; Ericson,
1999a; Lacey, 2011). This methodology relies on a binary system which makes the
underlying mathematics quite simple and easy to implement.
The FTA is a top-down, deductive (and not inductive) failure analysis
in which an undesired state of a system is analysed using Boolean logic to combine
a series of lower-level events (DeLong, 1970; Larsen, 1974; Martensen and Butler,
1975; FAA, 1998). This methodology is mainly used in the fields of safety and
reliability engineering to analyse how systems may fail, to mitigate and manage the
risks or to determine event rates of a safety accident or a particular system-level
failure. This methodology is directly applicable to financial institutions (Benner,
1975; Andrews and Moss, 1993; Vesely, 2002; Lacey, 2011).
More specifically, FTA can be used to:
1. understand the logic, events and conditions as well as their relationships leading
to an undesired event (i.e. root cause analysis (RCA)).
2. show compliance with the system safety and reliability requirements.
3. identify the sequence of causal factors leading to the top event.
4. monitor and control the safety performance to design safety requirements.
5. optimise resources.
6. assist in designing a system. Indeed, the FTA may be used to design a system
while identifying the potential causes of failures.
7. identify and correct causes of the undesired event. The FTA is a diagnosis tool.
8. quantify the exposure by calculating the probability of the undesired event (risk
assessment).
7.1 Methodology
In a fault tree, events are associated with probabilities, e.g., a particular failure may
occur at some constant rate $\lambda$. Consequently, the probability of failure depends on
the rate $\lambda$ and the moment of occurrence $t$:
$$P = 1 - \exp(-\lambda t) \tag{7.1.1}$$
$$P \approx \lambda t, \quad \lambda t < 0.1.$$
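As a quick numerical check of the relation above, the following Python sketch compares the exact failure probability with its first-order approximation. The function names and the rate value are illustrative assumptions and are not taken from the text.

```python
import math


def failure_probability(rate: float, t: float) -> float:
    """Exact probability of a failure before time t for a constant rate."""
    return 1.0 - math.exp(-rate * t)


def failure_probability_linear(rate: float, t: float) -> float:
    """First-order approximation, reasonable only while rate * t stays small."""
    return rate * t


# Hypothetical inputs: 0.05 failures per year, one-year horizon.
rate, horizon = 0.05, 1.0
exact = failure_probability(rate, horizon)
approx = failure_probability_linear(rate, horizon)
print(f"exact={exact:.6f} approx={approx:.6f} gap={approx - exact:.6f}")
```

The linear form always overstates the exact value slightly, and the gap widens as the product of rate and time grows, which is why the approximation is restricted to small values.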
If the probabilities of a failure on fault trees are very small (negligible), $P(A \cap B)$
may be discarded in the calculations.1 As a result, the output of an OR gate may be
approximated by:
$$P(A \cup B) \approx P(A) + P(B),$$
assuming that the two sets are mutually exclusive. An exclusive OR gate represents
the probability that one or the other input, but not both, occurs:
$$P(A \oplus B) = P(A) + P(B) - 2P(A \cap B).$$
7.2 In Practice
7.2.1 Symbols
The basic symbols used in FTA are grouped as events, gates and transfer symbols
(Roberts et al., 1981).
Remark 7.2.1 Depending on the software used these symbols may vary as they
may have been borrowed from alternative approaches to represent causality such
as circuit diagrams.
1 It becomes an error term.
Fig. 7.1 Event symbols (symbols not reproduced): basic event, external event, undeveloped event, conditioning event and intermediate event
Event symbols are used for primary events and intermediate (or secondary)
events. Primary events are not developed any further on the fault tree. Intermediate
events are located after the output of a gate. The event symbols are represented in
Fig. 7.1.
The primary event symbols are typically used as follows:
• Basic event—failure or error root.
• External event—exogenous impact (usually expected).
• Undeveloped event—an event for which we do not have enough information or
which has no impact on our analysis of the main problem.
• Conditioning event—conditions affecting logic gates.
Gate symbols describe the relationship between input and output events. The
symbols are derived from Boolean logic symbols (Parkes, 2002; Givant and Halmos,
2009). These are represented in Fig. 7.2.
The gates work as follows:
• OR gate—the output occurs if any input occurs
• AND gate—the output occurs only if all inputs occur
• Exclusive OR gate—the output occurs if exactly one input occurs
• Priority AND gate—the output occurs if the inputs occur in a specific sequence
specified by a conditioning event
• Inhibit gate—the output occurs if the input occurs under an enabling condition
specified by a conditioning event.
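The gate definitions above translate into simple probability rules once a dependence assumption is made. The sketch below assumes that all inputs are statistically independent, which is an assumption of this illustration and not a statement from the text; the function names are hypothetical. The priority AND gate is left out because it also needs information on the ordering of events.

```python
import math
from typing import Sequence


def and_gate(probs: Sequence[float]) -> float:
    """All inputs occur: product of the input probabilities."""
    return math.prod(probs)


def or_gate(probs: Sequence[float]) -> float:
    """At least one input occurs: 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def or_gate_rare_events(probs: Sequence[float]) -> float:
    """Rare-event approximation: intersections are treated as negligible."""
    return sum(probs)


def xor_gate(p_a: float, p_b: float) -> float:
    """Exactly one of two inputs occurs."""
    return p_a + p_b - 2.0 * p_a * p_b


def inhibit_gate(p_input: float, p_condition: float) -> float:
    """Input occurs while the enabling condition holds."""
    return p_input * p_condition


inputs = [1e-3, 2e-3]  # hypothetical basic-event probabilities
print(or_gate(inputs), or_gate_rare_events(inputs), and_gate(inputs))
```

For small input probabilities the exact OR result and the rare-event sum are nearly identical, which is why the intersection term is often dropped.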
In a first step, it is necessary to explain the difference between a fault and a
failure. A failure is related to a basic component, it is the result of an internal
Fig. 7.2 Gate symbols (symbols not reproduced), including the OR gate, the AND gate and the exclusive OR gate
truncation is the fact of not considering particular segments during the evaluation
of the fault tree. Cut sets are usually truncated when they exceed a specific order
and/or probability.
A transfer event indicates a subtree branch that is used elsewhere in the tree.
A transfer always involves a gate event node on the tree, and is symbolically
represented by a triangle. The transfer has various purposes: (1) it starts a
new page (for plots), (2) it indicates where a branch is used in various places in the
same tree without being repeatedly drawn (internal transfer) (MOB) and (3) it indicates
an input module from a separate analysis (external transfer).
Transfer symbols are used to connect the inputs and outputs of related fault trees,
such as the fault tree of a subsystem to its system. Figure 7.3 exhibits an example of
a simple FTA regarding a building on fire.
The construction of a fault tree is an iterative process with six clearly defined
steps (Ericson, 1999b):
1. Review the gate event under investigation
2. Identify all the possible causes of this event and ensure that none are missed
3. Identify the cause–effect relationship for each event
4. Structure the tree considering your findings
5. Ensure regularly that identified events are not repeated
6. Repeat the process for the next gate.
Informing each gate node, in turn, involves three steps:
• Step 1—Immediate, necessary and sufficient (INS)
• Step 2—Primary, secondary and command (PSC)
• Step 3—State of the system or component.
Analysing the first step in detail, the question to be answered is: are the
factors INS to cause the intermediate event? Immediate means that we do not
skip past events, necessary means that we only include what is actually necessary
and sufficient means that we do not include more than the minimum necessary.
Regarding the second step, it is necessary to consider the fault path for each enabling
event and to identify each causing event, establishing whether it is a primary fault,
a secondary fault or a command fault (or even an induced fault or a sequential fault).
Then, it is possible to structure the subevents and gate logic from the path type.
Finally, the third step requires answering the question: is the intermediate event a
state of the system or a state of the component? If it is a “state of the component”,
we are at the lowest level of that issue, while if the answer is “state of
the system”, this implies subsequent or intermediate issues.
Fig. 7.3 Simple fault tree: this fault tree gives a simplified representation of what could lead to
a building on fire. In this graph, we can see that the building is on fire if and only if a fire has
been triggered, the safety system malfunctioned and the doors have been left open. The
“Fire Triggered” node, located in the upper right part of the diagram, results from three
potential issues, namely a faulty electrical appliance, someone smoking in the building or an
arsonist, while the safeguard system is not functioning if the smoke alarms are not going off or the
fire extinguishers are not functioning
7.2.3 Analysis
An FTA can be modelled in different manners; the usual way is summarised below.
A single fault tree permits analysing only one undesired event, but this event may
subsequently be fed into another fault tree as a basic event. Whatever the nature of the
undesired event, an FTA is applicable as the methodology is universal.
FTA involves five steps (note that each and every step should be
properly documented):
1. Define the undesired event to study
• Identify the undesired event to be analysed, and draft the story line leading to
that event.
• Analyse the system and the threat, i.e. what might be the consequences of the
materialisation of the undesired event. This step is necessary to prioritise the
scenarios to be analysed.
2. Obtain an understanding of the system
• Obtain the intermediate probabilities of failure to be fed into the fault tree in
order to evaluate the likelihood of materialisation of the undesired event.
• Analyse the courses, i.e., the critical path, etc.
• Analyse the causal chain, i.e. obtain a prior understanding of what conditions
are necessary and which intermediate events have to occur to lead to the
materialisation of the undesired event.
3. Construct the fault tree
• Replicate the causal chain identified in the previous step of the analysis, from
the basic events to the top
• Use the appropriate gates where necessary, OR, AND etc. (see section 7.2.1.)
4. Evaluate the fault tree
• Evaluate the final probability of the undesired event to occur
• Analyse the impact of dealing with the causal factors. This is a “what if” stage
during which we identify the optimal positioning of the controls.
5. Control the hazards identified
• Key management actions (what controls should be put in place).
The implementation of appropriate key management actions is the end game of a
proper scenario analysis. The objective is to manage and if possible, mitigate the
potential threats.
In this subsection, the objective is to outline the calculations, i.e., to evaluate the
probability that the top event occurs assuming the probabilities of the bottom events
are known. We use the fault tree presented in Fig. 7.3. Let’s assume the trigger events
at the bottom have the following probabilities:
• Outdated fire extinguisher: 1e-06
• Faulty fire extinguisher: 1e-06
• Battery remained unchecked: 1e-06
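To illustrate how such bottom-event probabilities propagate to the top event, the following Python sketch evaluates a tree shaped like the one described in the caption of Fig. 7.3. Several points are assumptions of this illustration: inputs are treated as independent, the way the three listed events attach to the safeguard branch is guessed from their names, the garbled figures for those three events are read as 1e-06, and every probability marked hypothetical is invented for the example.

```python
import math

# Structure inferred from the caption of Fig. 7.3 (an assumption, see lead-in).
TREE = ("AND", [
    ("OR", ["faulty_electrical_appliance", "smoking_in_building", "arsonist"]),
    ("OR", [
        "battery_remained_unchecked",  # smoke alarm does not go off
        ("OR", ["outdated_fire_extinguisher", "faulty_fire_extinguisher"]),
    ]),
    "doors_left_open",
])

BASIC_EVENTS = {
    "outdated_fire_extinguisher": 1e-6,
    "faulty_fire_extinguisher": 1e-6,
    "battery_remained_unchecked": 1e-6,
    "faulty_electrical_appliance": 1e-4,  # hypothetical
    "smoking_in_building": 1e-3,          # hypothetical
    "arsonist": 1e-5,                     # hypothetical
    "doors_left_open": 1e-1,              # hypothetical
}


def evaluate(node) -> float:
    """Probability of a node, assuming independent inputs at every gate."""
    if isinstance(node, str):
        return BASIC_EVENTS[node]
    gate, children = node
    probs = [evaluate(child) for child in children]
    if gate == "AND":
        return math.prod(probs)
    if gate == "OR":
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")


top_event = evaluate(TREE)
print(f"P(building on fire) = {top_event:.3e}")
```

Re-running the evaluation after lowering one basic-event probability is a simple way to carry out the "what if" stage and to compare candidate controls.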
7.3 Alternatives
RCA aims at solving problems by dealing with their origination (Wilson et al., 1993;
Vanden Heuvel et al., 2008; Horev, 2010; Barsalou, 2015). A root cause defines
itself by the fact that if it is removed from a causal sequence, the final undesirable
event does not occur; whereas a causal factor affects an event’s outcome, but is not a
root cause as it does not prevent the undesired event from occurring. Though dealing
with a causal factor usually benefits an outcome, such as reducing the magnitude of
a potential loss, it does not prevent it. Note that several measures may effectively
deal with root causes.
RCA allows methodically identifying and correcting the root causes of events,
rather than dealing with the symptoms. The ultimate objective of dealing with root
causes is to prevent problem recurrence. However, RCA users acknowledge that
complete prevention of recurrence through a corrective action might not always be achievable.
The analysis is usually done after an event has occurred, therefore the insights in
RCA make it very useful to feed a scenario analysis process. It is indeed compatible
with the other approaches presented in this book. RCA can be used to predict a
failure and is a prerequisite to manage the occurrence effectively and efficiently.
The general principles and usual goal of the RCA are the following:
1. to identify the factors leading to the failure: magnitude, location, timing,
behaviours, actions, inactions or conditions.
2. to prevent recurrence of similar harmful outcomes, focusing on what has been
learnt from the process.
3. RCA must be performed systematically as part of an investigation. Root causes
identified must be properly documented.
4. The best solution to be selected is the one that is the most likely to prevent the
recurrence of a failure at the lowest cost.
5. Effective problem statements and event descriptions are a must to ensure the
appropriateness of the investigations conducted.
6. Hierarchical clustering data-mining solutions can be implemented to capture root
causes (see Chap. 3).
7. The sequence of events leading to the failures should be clearly identified,
represented and documented to support the most effective positioning of controls.
8. Transform a reactive culture into a forward-looking culture (see Chap. 1).
However, the cultural changes implied by the RCA might not be warmly welcomed
as they may lead to the identification of personnel’s accountability. The association
of the RCA with a no-blame culture might be required, as well as strong
sponsorship (see Chap. 4).
The quality of RCA depends on the data quality as well as on the capability to
use the data and transform the outcome into management actions. One of the main
issues that RCA may suffer from is the so-called analyst bias, i.e., the selection and
the interpretation of the data supporting a prior opinion. Process transparency
should be ensured to avoid that problem. Note that RCA, like most of the factor
models presented in this book, is highly data consuming (Shaqdan et al., 2014).
However, RCA is not necessarily the best approach to estimate the likelihood
and the magnitudes of future impacts.
The why-because analysis has been developed to analyse accidents (Ladkin and
Loer, 1998). It is an a posteriori analysis which aims at ensuring objectivity,
verifiability and reproducibility of results. A why-because graph presents causal
relationships between factors of an accident. It is a directed acyclic graph in which
the factors are represented by nodes and relationships between factors by directed
edges.
“What?” is always the first question to ask. It is usually quite easy to define as
the consequences are understood. The following steps form an iterative process to
determine each and every potential cause. Once the causes of the accident have
been identified, formal tests are applied to all potential cause–effect relationships.
This process can be broken down for each cause identified until the targeted level is
reached, such as the level of granularity the management can have an effect on.
Remark 7.3.1 For each node, each contributing cause must be a necessary condition
to cause the accident, while all causes taken together must be sufficient to cause it.
In the previous paragraph, we mentioned the use of some tests to evaluate whether
the potential causes are necessary or sufficient. The counterfactual
test addresses the root character of the cause, i.e., is the cause necessary for the
incident to occur? Then, the causal sufficiency test deals with the combination of
causes and aims at analysing whether a set of causes is sufficient for an incident
to occur, and therefore helps identify missing causes. Causes taken independently
must be necessary, and all causes taken together must be sufficient.
This solution is straightforward and may support the construction of scenarios,
but it might not be particularly efficient to deal with situations that never crystallised.
Good illustrations of WBAs can be found in Ladkin (2005).
Ishikawa diagrams are causal diagrams depicting the causes of a specific event
created by Ishikawa (1968) for quality management purposes. Ishikawa diagrams
are usually used to design a product and to identify potential factors causing a bigger
problem. As illustrated this methodology can easily be extended to operational risk
or conduct risk scenario analysis, for example. Causal factors are usually sorted into
general categories. These traditionally include
1. People: Anyone involved in the process.
2. Process: How the process is performed: policies, procedures, rules, regulations,
laws, etc.
3. Equipment: Tools required to achieve a task.
4. Materials: Raw materials used to produce the final product (in our case these
would be risk catalysts).
5. Management and measurements: Data used to evaluate the exposure.
6. Environment: The conditions to be met so the incident may happen.
Remark 7.3.2 Ishikawa’s diagram is also known as a fishbone diagram because of
its shape, similar to the side view of a fish skeleton.
Cause-and-effect diagrams are useful to analyse relationships between multiple
factors, and the analysis of the possible causes provides additional information
regarding the behaviour of the processes. As in Chap. 4, potential causes can be defined
in workshops. These groups can then be labelled as categories of the fishbone;
in our case, we used the traditional ones to illustrate what the analysis of a fire
exposure would look like (Fig. 7.4).
Fig. 7.4 Ishikawa diagram for a building on fire (diagram not reproduced); surviving branch labels: Security, Materials, Environment, Measurement & Management
In this section, we present a methodology that has been widely used at the early
stages of scenario analysis for risk management: fuzzy logic. In fuzzy logic, the value
representing the “truth” of a variable is a real number lying between 0 and 1,
contrary to Boolean logic in which the “truth” can only be represented by 0 or
1. The objective is to capture the fact that the “truth” is a conceptual objective and
can only be partially reached, and therefore the outcome of an analysis may range
between completely true and completely false (Zadeh, 1965; Biacino and Gerla,
2002; Arabacioglu, 2010).
Classical logic does not capture situations in which answers may vary,
in particular when we are dealing with people’s perceptions and only a spectrum
of answers may lead to a consensual “truth”, which should converge to the “truth”.
This approach makes a lot of sense when we only have partial information at our
disposal.
Most people instinctively apply “fuzzy” estimates in daily situations, based
upon previous experience, to determine how to park their car in a very narrow space,
for example.
Fuzzy logic systems can be very powerful when input values are not available
or are not trustworthy, and can be used and adapted in a workshop such as those
described in Chap. 4, as this method aims for a consensus.
Cipiloglu Yildiz (2008) provides the following algorithm to implement fuzzy
logic:
1. Define the linguistic variables, i.e., variables that represent some characteristics
of an element (color, temperature, etc.). Such a variable takes words as values.
2. Build the membership functions, which represent the degree of truth.
3. Design the rulebase, i.e. the set of rules, such as IF-THEN rules.
4. Convert input data into fuzzy values using the membership functions.
5. Evaluate the rules in the rulebase.
6. Combine the results of each rule evaluated in the previous step.
7. Convert back the output data into non-fuzzy values so these can be used for
further processing or management in our case.
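The seven steps above can be sketched in a few lines of Python. Everything specific in this example is an assumption made for illustration: the linguistic variable (a control-effectiveness score between 0 and 10), the membership functions, the three rules and the output levels. The last step uses a weighted average of representative output values, a simple defuzzification choice made for brevity and not necessarily the one used in the cited tutorial.

```python
def falling(x: float, start: float, end: float) -> float:
    """Membership equal to 1 at or below start, decreasing to 0 at end."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x: float, start: float, end: float) -> float:
    """Membership equal to 0 at or below start, increasing to 1 at end."""
    return 1.0 - falling(x, start, end)


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership equal to 1 at peak and 0 outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Steps 1-2: linguistic terms for "control effectiveness" and their memberships.
MEMBERSHIP = {
    "weak": lambda s: falling(s, 0.0, 5.0),
    "adequate": lambda s: triangular(s, 0.0, 5.0, 10.0),
    "strong": lambda s: rising(s, 5.0, 10.0),
}
# Step 3: rulebase, IF control is <term> THEN exposure is <label>.
RULES = {"weak": "high", "adequate": "medium", "strong": "low"}
# Representative crisp value for each output label (hypothetical).
OUTPUT_LEVEL = {"low": 0.1, "medium": 0.5, "high": 0.9}


def assess_exposure(score: float) -> float:
    degrees = {term: mu(score) for term, mu in MEMBERSHIP.items()}  # step 4
    fired = {RULES[term]: degree for term, degree in degrees.items()}  # step 5
    total = sum(fired.values())  # step 6: combine rule strengths
    return sum(OUTPUT_LEVEL[label] * w for label, w in fired.items()) / total  # step 7


print(assess_exposure(2.5))
```

In a workshop setting, the score could be the average rating given by participants, and the membership functions would be agreed with them beforehand.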
References
Andrews, J. D., & Moss, T. R. (1993). Reliability and risk assessment. London: Longman Scientific
and Technical.
Arabacioglu, B. C. (2010). Using fuzzy inference system for architectural space analysis. Applied
Soft Computing, 10(3), 926–937.
Barlow, R. E., Fussell, J. B., & Singpurwalla, N. D. (1975). Reliability and fault tree analysis,
conference on reliability and fault tree analysis. UC Berkeley: SIAM Pub.
Barsalou, M. A. (2015). Root cause analysis: A step-by-step guide to using the right tool at the
right time. Boca Raton: CRC Press/Taylor and Francis.
Benner, L. (1975). Accident theory and accident investigation. In Proceedings of the Society of Air
Safety Investigators Annual Seminar.
Biacino, L., & Gerla, G. (2002). Fuzzy logic, continuity and effectiveness. Archive for Mathemat-
ical Logic, 41(7), 643–667.
Cipiloglu Yildiz, Z. (2008). A short fuzzy logic tutorial. http://cs.bilkent.edu.tr/~zeynep.
DeLong, T. (1970). TA fault tree manual. (Master’s thesis) Texas A and M University.
Ericson, C. (1999a). Fault tree analysis - a history. In Proceedings of the 17th International Systems
Safety Conference.
Ericson, C. A., (Ed.) (1999b). Fault tree analysis. www.thecourse-pm.com.
FAA. 1998. Safety risk management. In ASY-300, Federal Aviation Administration.
Givant, S. R., & Halmos, P. R. (2009). Introduction to Boolean algebras. Berlin: Springer.
Horev, M. (2010). Root cause analysis in process-based industries. Bloomington: Trafford
Publishing.
Ishikawa, K. (1968). Guide to quality control. Tokyo: Asian Productivity Organization.
Koch, J. E. (1990). Jet propulsion laboratory reliability analysis handbook. In Project Reliability
Group, Jet Propulsion Laboratory, Pasadena, California JPL-D-5703.
Lacey, P. (2011). An application of fault tree analysis to the identification and management of
risks in government funded human service delivery. In Proceedings of the 2nd International
Conference on Public Policy and Social Sciences.
Ladkin, P. (2005). The Glenbrook why-because graphs, causal graphs, and accimap. Working
paper, Faculty of Technology, University of Bielefeld, Germany.
Ladkin, P., & Loer, K. (1998). Analysing aviation accidents using WB-analysis - an application of
multimodal reasoning. (AAAI Technical Report) SS-98-0 (pp. 169–174)
Larsen, W. (1974). Fault tree analysis. Picatinny Arsenal (Technical Report No. 4556).
Martensen, A. L., & Butler, R. W. (1975). The fault-tree compiler. In Langley Research Center,
NTRS.
Parkes, A. (2002). Introduction to languages, machines and logic: Computable languages, abstract
machines and formal logic. Berlin: Springer.
Roland, H. E., & Moriarty, B. (Eds.), (1990). System safety engineering and management. New
York: Wiley.
Shaqdan, K., et al. (2014). Root-cause analysis and health failure mode and effect analysis: Two
leading techniques in health care quality assessment. Journal of the American College of
Radiology, 11(6), 572–579.
Vanden Heuvel, L. N., Lorenzo, D. K., & Hanson, W. E. (2008). Root cause analysis handbook: A
guide to efficient and effective incident management (3rd ed.). New York: Rothstein Publishing.
Vesely, W. (2002). Fault tree handbook with aerospace applications. In National Aeronautics and
Space Administration.
Vesely, W. E., Goldberg, F. F., Roberts, N. H., & Haasl, D. F. (1981). Fault tree handbook (No.
NUREG-0492). Washington, DC: Nuclear Regulatory Commission.
Wilson, P. F., Dell, L. D., Anderson, G. F. (1993). Root cause analysis: A tool for total quality
management (Vol. SS-98-0, pp. 8–17). Milwaukee: ASQ Quality Press.
Woods, D. D., Hollnagel, D. D., & Leveson, N. (Eds.). (2006). Resilience engineering: Concepts
and precepts (New Ed ed.). New York: CRC Press.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Chapter 8
Bayesian Networks
8.1 Introduction
This chapter introduces Bayesian belief and decision networks (Koski and Noble,
2009) as quantitative tools for risk measurement and management. Bayesian
networks are a powerful statistical tool which can be applied to risk management
in financial institutions at various stages (Pourret et al., 2008). As stated in the third
chapter, this methodology belongs to the field of data science and can be applied to
various situations beyond scenario analysis.
To effectively and efficiently manage risks, influencing factors from triggers to
catalysts must be clearly identified. Once the key drivers have been identified, the
second stage regards the controls in place to mitigate these risks and ideally to
reduce the exposures. But before initiating these tasks, and assuming that the risk
appetite of the company has been taken into account, three main components need to
be analysed: control effectiveness, the potential negative impact of the controls
on associated risks and the cost of these controls (Alexander, 2003):
1. Effectiveness: Bayesian network factor modelling may help in understanding the
impact of a factor (control, risk or trigger) on the overall exposure. Bayesian
networks are designed to deal with such situations.
2. Dependency: It is possible that the reduction of one risk increases the risks
in another area or risks of a different kind. Bayesian networks provide
practitioners with a solution to analyse that possibility. This aspect is particularly
important for practitioners as, most of the time, dealing with risk implies various
trade-offs and usually requires compromise.
3. Cost: Would the controls reduce the risk sufficiently to at least cover the
investment? This question is fully related to the question of the firm’s risk appetite.
Do we want to accept the risk, or are we willing to offset it?
Addressing now the core topic of this chapter, we can start with the definition
of a Bayesian network. A Bayesian network is a probabilistic graphical
model representing random variables and their conditional dependencies (hence the
Bayesian terminology) via a directed acyclic graph (DAG). Formally, the nodes
represent random variables in the Bayesian sense, i.e., these may be observable
quantities, latent variables, unknown parameters, hypotheses, etc. Arcs or edges
represent conditional dependencies; nodes that are not connected represent variables
that are conditionally independent of each other. Each node is associated with
a probability function which takes a particular set of values from the node’s
parent variables, and returns the probability of the variable represented by the
node. Figure 8.1 illustrates a simple Bayesian network presenting how three initial
conditionally independent variables may lead to an issue.
The node where the arc originates is called the parent, while the node where the
arc ends is called the child. In our example (Fig. 8.1), A is a parent of C, and C is
a child of A. Nodes that can be reached from a given node are called its descendants.
Nodes from which a path leads to a given node are called its ancestors. Here, C and E are
descendants of B, and B and C are ancestors of E. Note that a node cannot be
its own ancestor or descendant. Bayesian networks will generally include tables
providing the probabilities for the true/false values of the variables. The main point
of Bayesian networks is to allow for probabilistic inference (Pearl, 2000) to be
performed. This means that the probability of each value of a node in the Bayesian
network can be computed when the values of the other variables are known. Also,
because independence among the variables is easy to recognise, since conditional
relationships are clearly defined by graph edges, not all joint probabilities in the
Bayesian system need to be calculated in order to make a decision.
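The parent, child, ancestor and descendant vocabulary can be checked mechanically. The sketch below encodes the graph of Fig. 8.1 as an edge list, using the dependencies stated in its caption, and derives ancestors and descendants by graph traversal. The helper names are hypothetical.

```python
from collections import defaultdict

# Edges (parent, child) as described in the caption of Fig. 8.1.
EDGES = [("A", "C"), ("B", "C"), ("D", "F"), ("C", "E"), ("F", "E")]


def _reachable(start, adjacency):
    """All nodes reachable from start by following adjacency links."""
    seen, stack = set(), list(adjacency[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency[node])
    return seen


def descendants(node, edges=EDGES):
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return _reachable(node, children)


def ancestors(node, edges=EDGES):
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return _reachable(node, parents)


print(sorted(descendants("B")), sorted(ancestors("E")))
```

Because the graph is acyclic, a node never appears in its own set of ancestors or descendants.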
In order to present Bayesian networks practically, we will rely on a simple
example related to IT failures as depicted in Fig. 8.2. Assume that two events in the
IT department could lead to a business disruption and a subsequent financial loss:
Fig. 8.1 Illustration: a simple directed acyclic graph. This graph contains six nodes from A to F.
C depends on A and B, F depends on D, and E depends on C and F and, through these nodes, on A, B
and D
Fig. 8.2 This figure represents a Bayesian network, allowing to analyse the exposure to a financial
loss due to a business disruption caused by two potential root causes, for instance, an IT failure
and/or a cyber attack. The conditional probabilities are also provided allowing to move from one
node to the next
either the entity endures an IT failure or the entity suffers a cyber attack. Also, it is
possible to assume that the cyber attack may impact the IT system too (e.g. the system
is disrupted). A Bayesian network can then model the situation, as represented in
Fig. 8.2. We assume that the variables have only two possible outcomes,
True or False. The joint probability function is given as follows:
$$P(L, F, C) = P(L \mid F, C)\, P(F \mid C)\, P(C), \tag{8.1.1}$$
where L represents the business disruption and the financial loss, F represents the
IT failure and C the cyber attack. The model should then be able to answer the
question “What is the probability that a cyber attack occurred given that we
suffered a business disruption?” by using the conditional probability formula and summing over
all nuisance variables:
$$P(C = T \mid L = T) = \frac{P(L = T, C = T)}{P(L = T)} = \frac{\sum_{F \in \{T, F\}} P(L = T, F, C = T)}{\sum_{F, C \in \{T, F\}} P(L = T, F, C)}. \tag{8.1.2}$$
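Using the expansion for the joint probability function $P(L, F, C)$ and the conditional
probabilities as presented in the diagram, we can compute any combination.
For example,
$$P(L = T, F = T, C = F) = P(L = T \mid F = T, C = F)\, P(F = T \mid C = F)\, P(C = F),$$
which leads to $0.7 \times 0.2 \times 0.7 = 0.098$. Then the numerical results are
$$P(C = T \mid L = T) = \frac{0.189_{TTT} + 0.027_{TFT}}{0.189_{TTT} + 0.098_{TTF} + 0.027_{TFT} + 0.0_{TFF}} \approx 68.79\,\%,$$
where the subscripts give the values of $(L, F, C)$.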
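The calculation above can be reproduced by brute-force enumeration of the joint distribution. In the Python sketch below, the conditional probability tables are assumptions: only the factors 0.7, 0.2 and 0.7 and the four joint terms are quoted in the text, and the remaining entries were chosen so that those joint terms are recovered. The diagram of Fig. 8.2 may use different individual entries.

```python
from itertools import product

# Assumed tables, consistent with the joint terms quoted in the text.
P_C = {True: 0.3, False: 0.7}            # P(C)
P_F_GIVEN_C = {True: 0.7, False: 0.2}    # P(F = T | C)
P_L_GIVEN_FC = {                         # P(L = T | F, C), keyed by (F, C)
    (True, True): 0.9,
    (True, False): 0.7,
    (False, True): 0.3,
    (False, False): 0.0,
}


def joint(l: bool, f: bool, c: bool) -> float:
    """P(L, F, C) = P(L | F, C) P(F | C) P(C)."""
    p_f = P_F_GIVEN_C[c] if f else 1.0 - P_F_GIVEN_C[c]
    p_l = P_L_GIVEN_FC[(f, c)] if l else 1.0 - P_L_GIVEN_FC[(f, c)]
    return p_l * p_f * P_C[c]


def cyber_given_loss() -> float:
    """P(C = T | L = T), summing the joint over the nuisance variable F."""
    numerator = sum(joint(True, f, True) for f in (True, False))
    denominator = sum(
        joint(True, f, c) for f, c in product((True, False), repeat=2)
    )
    return numerator / denominator


print(f"P(C=T | L=T) = {cyber_given_loss():.4f}")
```

Enumeration is only practical for small networks such as this one, since the number of joint terms doubles with every additional binary variable.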
8.2 Theory
In this section, we will address the Bayesian network from a theoretical point of
view, not only focusing on our problem, i.e., scenario analysis, but also discussing its
use beyond scenario analysis, or in other words, its use for automated and integrated
risk management.
The first point to introduce is the concept of joint probability, i.e., the probability
that a series of events will happen subsequently or simultaneously. The joint
probability distribution can be expressed either in terms of a joint cumulative
distribution function or in terms of a joint probability density function in the
continuous case or joint probability mass function in the discrete case.1 These in turn
can be used to find two other types of distributions: the marginal distributions giving
the probabilities for any of the variables, and the conditional probability distribution
for the remaining variables.
The joint probability mass function of two discrete random variables X, Y is
given by
$$P(X = x, Y = y) = P(Y = y \mid X = x)\, P(X = x) = P(X = x \mid Y = y)\, P(Y = y). \tag{8.2.1}$$
In parallel, the joint probability density function $f_{X,Y}(x, y)$ for continuous random
variables is
$$f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x)\, f_X(x) = f_{X \mid Y}(x \mid y)\, f_Y(y), \tag{8.2.2}$$
where $f_{Y \mid X}(y \mid x)$ and $f_{X \mid Y}(x \mid y)$ give the conditional distributions of Y given $X = x$ and
of X given $Y = y$, respectively, and $f_X(x)$ and $f_Y(y)$ give the marginal distributions
for X and Y, respectively.
1 Chapter 11 provides alternative solutions to build joint probability functions.
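For the discrete case, the link between joint, marginal and conditional probabilities can be verified on a small table. The joint probabilities below are invented for the illustration, and the function names are hypothetical.

```python
# Hypothetical joint probability mass function of (X, Y).
JOINT = {
    ("x0", "y0"): 0.10,
    ("x0", "y1"): 0.30,
    ("x1", "y0"): 0.20,
    ("x1", "y1"): 0.40,
}


def marginal(joint: dict, axis: int) -> dict:
    """Sum the joint over the other variable (axis 0 for X, axis 1 for Y)."""
    result = {}
    for outcome, p in joint.items():
        result[outcome[axis]] = result.get(outcome[axis], 0.0) + p
    return result


def conditional_y_given_x(joint: dict, x: str, y: str) -> float:
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return joint[(x, y)] / marginal(joint, 0)[x]


p_x = marginal(JOINT, 0)
p_y = marginal(JOINT, 1)
# The product rule recovers each joint entry from a conditional and a marginal.
rebuilt = conditional_y_given_x(JOINT, "x0", "y1") * p_x["x0"]
print(p_x, p_y, rebuilt)
```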
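In the case of a Bayesian network, the joint probability of the multiple variables
can be obtained from the product of the individual probabilities of the nodes,
each taken conditionally on its parents:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i)). \tag{8.2.3}$$
Bayes’ theorem can then be written as
$$P(A \mid I, S) = \frac{P(A \mid S)\, P(I \mid A, S)}{P(I \mid S)}, \tag{8.2.4}$$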
where our belief in assumption A can be refined given the additional information
available I as well as secondary inputs S. $P(A \mid I, S)$ is the posterior probability, i.e.,
the probability of A being true considering the initial information available as well as
the added information. $P(A \mid S)$ is the prior probability, or the probability of A being
true given S. $P(I \mid A, S)$ is the likelihood component and gives the probability of the
evidence assuming that both A and S are true. Finally, the last term $P(I \mid S)$ is called
the expectedness, or how expected the evidence is, given only S. It is independent
of A, therefore it is usually considered as a scaling factor, and may be rewritten as
$$P(I \mid S) = \sum_{i=1}^{n} P(I \mid A_i, S)\, P(A_i \mid S), \tag{8.2.5}$$
where i denotes the index of a particular assumption $A_i$, and the summation is taken
over a set of hypotheses which are mutually exclusive and exhaustive. It is important
to note that all these probabilities are conditional. They specify the degree of
belief in propositions assuming that some other propositions are true. Consequently,
without prior determination of the probability of the previous propositions, the
approach cannot function.
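A short numerical example may help to see how the prior, the likelihood and the expectedness interact. In the Python sketch below, the three hypotheses, their prior probabilities and the likelihoods of the observed evidence are all invented for illustration; the only requirement is that the hypotheses be mutually exclusive and exhaustive.

```python
# Hypothetical priors P(A_i | S) over mutually exclusive, exhaustive causes.
PRIOR = {"it_failure": 0.5, "cyber_attack": 0.2, "human_error": 0.3}
# Hypothetical likelihoods P(I | A_i, S) of the evidence under each cause.
LIKELIHOOD = {"it_failure": 0.1, "cyber_attack": 0.6, "human_error": 0.2}


def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' theorem with the expectedness computed as a sum over hypotheses."""
    expectedness = sum(likelihood[a] * prior[a] for a in prior)
    return {a: likelihood[a] * prior[a] / expectedness for a in prior}


updated = posterior(PRIOR, LIKELIHOOD)
for hypothesis, probability in updated.items():
    print(f"{hypothesis}: {probability:.4f}")
```

Although the cyber attack starts with the lowest prior in this example, it becomes the most probable cause once the evidence is taken into account, because the evidence is far more likely under that hypothesis.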
Going one step further, we can now briefly present statistical inference.
Given some data $x$ and a parameter $\theta$, a simple Bayesian analysis starts with a
prior probability $p(\theta)$ and a likelihood $p(x \mid \theta)$ to compute a posterior probability
$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$ (Shevchenko, 2011).
Usually the prior distributions depend on other parameters $\varphi$ (not mentioned in
the likelihood), referred to as hyperparameters. So, the prior $p(\theta)$ must be replaced
by a likelihood $p(\theta \mid \varphi)$, and a prior $p(\varphi)$ on the newly introduced parameters $\varphi$ is
required, resulting in a posterior probability
$$p(\theta, \varphi \mid x) \propto p(x \mid \theta)\, p(\theta \mid \varphi)\, p(\varphi). \tag{8.2.6}$$
The process may be repeated multiple times if necessary; for example, the parameters
$\varphi$ may depend in turn on additional parameters $\psi$, which will require their own
prior. Eventually the process must terminate, with priors that do not depend on any
other unmentioned parameters.2
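For example, suppose we have measured the quantities $x_1, \ldots, x_n$, each with
normally distributed errors of known standard deviation $\sigma$,
$$x_i \sim N(\theta_i, \sigma^2). \tag{8.2.7}$$
If the quantities are independent, each $\theta_i$ can be estimated separately, and the
maximum likelihood estimate is simply
$$\theta_i = x_i. \tag{8.2.8}$$
However, if the quantities are not independent, a model combining the $\theta_i$ is required,
such as
$$x_i \sim N(\theta_i, \sigma^2), \tag{8.2.9}$$
$$\theta_i \sim N(\varphi, \tau^2), \tag{8.2.10}$$
with improper priors on $\varphi$ and $\tau \in (0, \infty)$. When $n \geq 3$, this is an identified model
(i.e. there exists a unique solution for the model’s parameters), and the posterior
distributions of the individual $\theta_i$ will tend to converge towards their common mean.3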
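The pull of the individual estimates towards a common mean can be illustrated with a deliberately simplified version of the hierarchical model. In the Python sketch below, the common mean and the between-quantity standard deviation are fixed at plug-in values instead of receiving improper priors, which is a simplification made for this illustration; under that assumption the posterior mean of each quantity is a precision-weighted average of its own measurement and the common mean. The symbols tau and phi, the function name and the data are assumptions.

```python
from statistics import mean


def shrunk_estimate(x_i: float, phi: float, sigma: float, tau: float) -> float:
    """Posterior mean of theta_i given x_i when phi, sigma and tau are fixed."""
    precision_data = 1.0 / sigma**2
    precision_prior = 1.0 / tau**2
    weight = precision_data / (precision_data + precision_prior)
    return weight * x_i + (1.0 - weight) * phi


observations = [1.0, 2.0, 6.0]   # hypothetical measurements
sigma, tau = 1.0, 1.0            # hypothetical standard deviations
phi = mean(observations)         # plug-in value for the common mean

estimates = [shrunk_estimate(x, phi, sigma, tau) for x in observations]
print(estimates)
```

Each estimate lies between the raw measurement and the common mean. A smaller tau pulls the estimates closer together, while a larger tau leaves them close to the separate estimates.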
2 The symbol $\propto$ means proportional to, and to draw a parallel with the previous paragraph related
to Bayes’ theorem, we see that the scaling factor does not have any impact on the search for the
appropriate values of the parameters.
3 This shrinkage is a typical behaviour in hierarchical Bayes’ models (Wang-Shu, 1994).
In order to specify the Bayesian network and therefore represent the joint probability
distribution, the probability distribution for X conditional upon X’s parents has to
be specified for each node X. These distributions may take any form, though it is
common to work with discrete or Gaussian distributions since these simplify the
calculations.
In the following we develop the Gaussian case because of the so-called conjugate
property. Indeed, if the posterior distributions $p(\theta \mid x)$ are in the same family as the
prior probability distribution $p(\theta)$, the prior and posterior are then called conjugate
distributions, and the prior is called a conjugate prior for the likelihood function. The
Gaussian distribution is conjugate to itself with respect to its likelihood function.
Consequently, the conjugate prior of the mean vector is another multivariate normal
distribution, and the conjugate prior of the covariance matrix is an inverse-Wishart
distribution $\mathcal{W}^{-1}$ (Haff, 1979). Suppose then that $n$ observations have been gathered
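where
$$X = \{x_1, \ldots, x_n\} \sim N(\mu, \Sigma),$$
and that a conjugate prior has been assigned,
$$p(\mu, \Sigma) = p(\mu \mid \Sigma)\,p(\Sigma), \qquad p(\mu \mid \Sigma) \sim N(\mu_0, m^{-1}\Sigma), \qquad p(\Sigma) \sim \mathcal{W}^{-1}(\Psi, n_0).$$
Then,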
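$$p(\mu \mid \Sigma, X) \sim N\!\left(\frac{n\bar{x} + m\mu_0}{n+m},\ \frac{1}{n+m}\,\Sigma\right),$$
$$p(\Sigma \mid X) \sim \mathcal{W}^{-1}\!\left(\Psi + nS + \frac{nm}{n+m}(\bar{x}-\mu_0)(\bar{x}-\mu_0)',\ n + n_0\right), \tag{8.2.15}$$
where
$$\bar{x} = n^{-1}\sum_{i=1}^{n} x_i, \qquad S = n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'. \tag{8.2.16}$$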
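As a hedged illustration, the posterior parameters of Eq. (8.2.15) can be computed directly from a data matrix. The sketch below assumes the prior parameters are named as in the equation ($\mu_0$, $m$, $\Psi$, $n_0$); the function name `niw_posterior` and the toy data are hypothetical.

```python
import numpy as np

def niw_posterior(X, mu0, m, Psi, n0):
    """Posterior parameters of Eq. (8.2.15) for data X (n rows, d columns)."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    centred = X - xbar
    S = centred.T @ centred / n                  # Eq. (8.2.16), divisor n
    diff = (xbar - mu0).reshape(-1, 1)
    mean_loc = (n * xbar + m * mu0) / (n + m)    # location of p(mu | Sigma, X)
    cov_factor = 1.0 / (n + m)                   # mu | Sigma, X has covariance cov_factor * Sigma
    Psi_post = np.asarray(Psi, dtype=float) + n * S + (n * m / (n + m)) * (diff @ diff.T)
    return mean_loc, cov_factor, Psi_post, n + n0

# Hypothetical toy data: two observations in two dimensions.
X_toy = np.array([[0.0, 0.0], [2.0, 2.0]])
mean_loc, cov_factor, Psi_post, dof_post = niw_posterior(
    X_toy, mu0=np.zeros(2), m=2, Psi=np.eye(2), n0=3
)
print(mean_loc, cov_factor, dof_post)
print(Psi_post)
```

The posterior location is a compromise between the sample mean and the prior mean, with weights $n$ and $m$.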
104 8 Bayesian Networks
$$\bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2) \tag{8.2.20}$$
$$\bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. \tag{8.2.21}$$
This matrix is the Schur complement (Zhang, 2005) of $\Sigma_{22}$ in $\Sigma$. This means that
to compute the conditional covariance matrix, the overall covariance matrix needs to
be inverted, the rows and columns corresponding to the variables being conditioned
upon have to be dropped, and the result inverted back to get the conditional covariance
matrix. Here $\Sigma_{22}^{-1}$ is the generalised inverse of $\Sigma_{22}$.
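The two routes to the conditional covariance (the Schur complement of Eq. (8.2.21), and "invert, drop, invert back") can be compared numerically. The following is a minimal sketch; the function name, the index convention and the numerical values are assumptions made for illustration, and `numpy.linalg.pinv` stands in for the generalised inverse.

```python
import numpy as np

def conditional_gaussian(mu, Sigma, idx1, idx2, a):
    """Mean and covariance of x[idx1] given x[idx2] = a, Eqs. (8.2.20)-(8.2.21)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22_inv = np.linalg.pinv(Sigma[np.ix_(idx2, idx2)])   # generalised inverse
    mu_bar = mu[idx1] + S12 @ S22_inv @ (np.asarray(a, dtype=float) - mu[idx2])
    Sigma_bar = S11 - S12 @ S22_inv @ S12.T               # Schur complement
    return mu_bar, Sigma_bar

# Hypothetical three-dimensional example, conditioning on the third variable.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
mu_bar, Sigma_bar = conditional_gaussian(mu, Sigma, [0, 1], [2], [3.0])

# Alternative route: invert Sigma, keep the block of the free variables, invert back.
precision = np.linalg.inv(Sigma)
Sigma_bar_alt = np.linalg.inv(precision[np.ix_([0, 1], [0, 1])])
print(np.allclose(Sigma_bar, Sigma_bar_alt))
```

Both routes agree whenever the full covariance matrix is invertible.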
In the simplest case, a Bayesian network is specified by an expert and is then used
to perform inference, as briefly introduced in the first section. In more complicated
situations, the network structure and the parameters of the local distributions must
be learned from the data.
As discussed in Chap. 4, Bayesian networks are part of the machine learning
field of research. Originally developed by Rebane and Pearl (1987), automated
learning relies on the distinction between the three possible types of adjacent triplets
allowed in a DAG:
• Type 1: $X \rightarrow Y \rightarrow Z$
• Type 2: $X \leftarrow Y \rightarrow Z$
• Type 3: $X \rightarrow Y \leftarrow Z$
In Types 1 and 2, X and Z are independent given Y; these two types therefore encode
the same dependencies and are indistinguishable. On the other hand, Type 3 can be
uniquely identified, as X and Z are marginally independent and all other pairs are
dependent. Thus, while the undirected representations of these three triplets are
identical, the direction of the arrows defines the causal relationship and is therefore
of particular importance. Algorithms have been developed to determine the structure of
the graph in a first step and orient the arrows according to the conditional independence
observed in a second step (Verma and Pearl 1991; Spirtes and Glymour 1991; Spirtes et al.
1993; Pearl 2000).
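The difference between a chain (Type 1) and a collider (Type 3) can be checked on simulated data. The sketch below assumes linear Gaussian relationships, for which a zero partial correlation is equivalent to conditional independence; the coefficients, sample size and helper name are hypothetical.

```python
import numpy as np

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing each on `given`."""
    res_a = a - np.polyval(np.polyfit(given, a, 1), given)
    res_b = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(res_a, res_b)[0, 1]

rng = np.random.default_rng(0)
n = 20_000

# Type 1 (chain): X -> Y -> Z
x1 = rng.normal(size=n)
y1 = x1 + rng.normal(size=n)
z1 = y1 + rng.normal(size=n)

# Type 3 (collider): X -> Y <- Z
x3 = rng.normal(size=n)
z3 = rng.normal(size=n)
y3 = x3 + z3 + rng.normal(size=n)

chain_marginal = np.corrcoef(x1, z1)[0, 1]       # clearly non-zero
chain_given_y = partial_corr(x1, z1, y1)         # close to zero
collider_marginal = np.corrcoef(x3, z3)[0, 1]    # close to zero
collider_given_y = partial_corr(x3, z3, y3)      # clearly non-zero
print(chain_marginal, chain_given_y, collider_marginal, collider_given_y)
```

Conditioning on Y removes the dependence in the chain but creates one in the collider, which is what makes Type 3 identifiable from data.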
Alternatively, it is possible to use structural learning methods which require
a scoring function and a search strategy, such as Markov Chain Monte Carlo
(MCMC), to avoid being trapped in local minima. Another method consists in focusing
on the sub-class of models for which the MLE has a closed form, supporting
the discovery of a consistent structure for hundreds of variables (Petitjean et al.,
2013).
Nodes and edges can be added using rule-based machine learning techniques,
inductive logic programming or statistical relational learning approaches (Nassif
et al., 2012, 2013).
Often the conditional distributions require parameter estimation, using, for
example, a maximum likelihood approach (see Chap. 5), though any maximisation
problem (likelihood or posterior probability) might be complex if some variables
are unobserved. The expectation–maximisation algorithm is particularly helpful in
that case: it alternates between evaluating the expected values of the unobserved
variables conditional on the observed data, and maximising the complete likelihood
(or posterior) assuming that the previously computed expected values are correct.
Alternatively, it is possible to estimate the parameters by treating them as additional
unobserved variables and to compute a full posterior distribution over all nodes
conditional upon observed data, but this usually leads to large dimensional models,
which are complicated to implement in practice.
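The alternation between the two steps can be sketched on the smallest possible network with an unobserved node: a hidden binary class pointing to one observed Gaussian variable (a two-component mixture). This is an illustrative example chosen here, not a model from the text; the initialisation, the number of iterations and the simulated data are assumptions.

```python
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """EM for a hidden binary class C -> observed x, with x | C = k ~ N(mu_k, sd_k^2)."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])          # crude initialisation
    sd = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: expected class membership given the data and current parameters.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximise the complete likelihood, treating memberships as known.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / x.size
    return pi, mu, sd

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
pi_hat, mu_hat, sd_hat = em_two_gaussians(data)
print(pi_hat, mu_hat, sd_hat)
```

With well-separated groups the estimated means settle near the values used in the simulation; with overlapping groups the result depends more heavily on the initialisation.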
Bayesian networks are complete models capturing relationships between variables
and can be used to evaluate probabilities at various stages of the causal chain.
Computing the posterior distribution of variables considering the information gathered
about them is referred to as probabilistic inference. To summarise, a Bayesian
network allows Bayes’ theorem to be applied automatically to complex problems.
The most common exact inference methods are: (1) variable elimination, which
eliminates the non-observed non-query variables one by one, by integration or
summation, distributing the sum over the product; (2) clique tree propagation
(Zhang and Yan 1997), which stores the computation in memory so that multiple
variables can be queried simultaneously and new evidence propagated quickly; and
(3) recursive conditioning, which allows for a space-time trade-off and matches the
efficiency of variable elimination when enough space is used (Darwiche 2001). The
complexity of all of these methods grows exponentially with the network’s treewidth.
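Variable elimination can be sketched on a three-node chain, here labelled W (weak IT system), B (data breach) and F (regulatory fine). The node names and all probabilities below are hypothetical values chosen for illustration only.

```python
import numpy as np

# Hypothetical conditional probability tables; index 0 = "no", index 1 = "yes".
p_w = np.array([0.8, 0.2])                  # P(W)
p_b_given_w = np.array([[0.95, 0.05],       # P(B | W): rows indexed by W
                        [0.60, 0.40]])
p_f_given_b = np.array([[0.99, 0.01],       # P(F | B): rows indexed by B
                        [0.30, 0.70]])

# Variable elimination: sum out W first, then B, pushing sums inside the product.
p_b = p_w @ p_b_given_w                     # P(B) = sum_w P(w) P(B | w)
p_f = p_b @ p_f_given_b                     # P(F) = sum_b P(b) P(F | b)

# Brute-force check: build the full joint table and marginalise.
joint = p_w[:, None, None] * p_b_given_w[:, :, None] * p_f_given_b[None, :, :]
p_f_brute = joint.sum(axis=(0, 1))

# Probabilistic inference: P(W | F = yes) by Bayes' theorem.
p_w_and_fine = p_w * (p_b_given_w @ p_f_given_b[:, 1])
p_w_given_fine = p_w_and_fine / p_w_and_fine.sum()
print(p_f, p_w_given_fine)
```

Eliminating one variable at a time never builds the full joint table, which is what keeps the cost manageable when the network is sparse.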
The most common approximate inference algorithms are importance sampling,
stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation,
In this section, we discuss the added value of Bayesian networks for risk practitioners.
As these are models, the possibilities are almost unlimited as long
as the information and the strategies used to feed the nodes are both accurate and
appropriate. Indeed, the number of nodes leading to an outcome can be as large as
practitioners would like, though it will require more research to feed the probabilities
required for each node.
The network in Fig. 8.3 shows how, starting from a weak IT system, we may
analyse the likelihood of putting customers’ data at risk and therefore getting a
regulatory fine, of losing customers due to the reputational impact, and of suffering an
opportunistic rogue trading incident, up to the systemic incident. In that example, we
can see a macro contagion somewhat mimicking the domino effect observed after the
Société Générale rogue trading incident in 2008. Note that each node can be analysed and/or
informed by either discrete or continuous distributions. It is also interesting to note
how the two illustrations in this chapter start from similar underlying issues though
aim at analysing different scenarios (i.e., comparing Figs. 8.2 and 8.3).
The key to the use of this network is the evaluation of the probabilities and
conditional probabilities at each node. Note once again that this kind of method-
ology is highly data consuming, as to be reliable we need evidence and information
Fig. 8.3 In this figure, we illustrate the possibility to analyse, by implementing a Bayesian
network, the cascading outcomes resulting from a weak IT system, i.e. the likelihood of putting
customers’ data at risk and therefore getting a regulatory fine, of losing customers due to the
reputational impact and, in parallel, the probability of suffering an opportunistic rogue trading
incident (node labels recovered from the figure: Weak IT Systems, Credit Crunch)
8.3 For the Managers 107
must be calculated. While the resulting ability to describe the network can be
performed in linear time, this process of network discovery is a hard task which
might either be too costly to perform, or impossible given the number and
combination of variables.
• Calculations and probabilities using Bayes’ rule and marginalisation can become
complex; calculations should therefore be undertaken carefully.
• The system’s users might be tempted to violate the distribution of probabilities upon
which the system is built.
References
Alexander, C. (2003). Managing operational risks with Bayesian networks. Operational Risk:
Regulation, Analysis and Management, 1, 285–294.
Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. By
the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F.
R. S. Philosophical Transactions of the Royal Society of London, 53, 370–418.
Darwiche, A. (2001). Recursive conditioning. Artificial Intelligence, 126(1–2), 5–41.
Haff, L. R. (1979). An identity for the Wishart distribution with applications. Journal of
Multivariate Analysis, 9(4), 531–544.
Hassani, B. K., & Renaudin, A. (2013). The cascade Bayesian approach for a controlled
integration of internal data, external data and scenarios. Working Paper, Université Paris 1.
ISSN:1955-611X [halshs-00795046 - version 1].
Holmes, D. E. (Ed.). (2008). Innovations in Bayesian networks: Theory and applications. Berlin:
Springer.
Koski, T., & Noble, J. (2009). Bayesian networks: An introduction (1st ed.). London: Wiley.
MacKay, D. (2003). Information theory, inference, and learning algorithms. Cambridge: Cam-
bridge University Press.
Nassif, H., Wu, Y., Page, D., & Burnside, E. (2012). Logical differential prediction Bayes net,
improving breast cancer diagnosis for older women. In AMIA Annual Symposium Proceedings
(Vol. 2012, p. 1330). American Medical Informatics Association.
Nassif, H., Kuusisto, F., Burnside, E. S., Page, D., Shavlik, J., & Costa, V. S. (2013). Score
as you lift (SAYL): A statistical relational learning approach to uplift modeling. In Joint
European conference on machine learning and knowledge discovery in databases, September
2013 (pp. 595–611). Berlin/Heidelberg: Springer.
Pearl, J. (Ed.). (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge
University Press.
Petitjean, F., Webb, G. I., & Nicholson, A. E. (2013). Scaling log-linear analysis to high
dimensional data. In International Conference on Data Mining, Dallas, TX (pp. 597–606).
Pourret, O., Naim, P., & Marcot, B. (Eds.). (2008). Bayesian networks: A practical guide to
applications (1st ed.). London: Wiley.
Rebane, G., & Pearl, J. (1987). The recovery of causal poly-trees from statistical data. In
Proceedings 3rd Workshop on Uncertainty in AI, Seattle, WA.
Shevchenko, P. V. (2011). Modelling operational risk using Bayesian inference. Berlin: Springer.
Spirtes, P., & Glymour, C. N. (1991). An algorithm for fast recovery of sparse causal graphs. Social
Science Computer Review, 9(1), 62–72.
Spirtes, P., Glymour, C. N., & Scheines, R. (1993). Causation, prediction, and search. New York:
Springer.
Verma, T., & Pearl, J. (1991). Equivalence and synthesis of causal models. In P. Bonissone
et al. (Eds.), UAI 90 Proceedings of the Sixth Annual Conference on Uncertainty in Artificial
Intelligence. Amsterdam: Elsevier.
Wang-Shu, L. (1994). Approximate Bayesian shrinkage estimation. Annals of the Institute of
Statistical Mathematics, 46(3), 497–507.
Zhang, F. (2005). The Schur complement and its applications. New York: Springer.
Zhang, N. L., & Yan, L. (1997). Independence of causal influence and clique tree propagation. In
Proceedings of the thirteenth conference on uncertainty in artificial intelligence, August 1997
(pp. 481–488). Los Altos: Morgan Kaufmann Publishers Inc.
Chapter 9
Artificial Neural Network to Serve Scenario
Analysis Purposes
Artificial neural networks (ANN), though originally inspired by the way brains
function, now largely rely on approaches based on statistics and signal
processing, but the philosophy remains the same. Consequently, and as briefly
introduced in the third chapter, artificial neural networks are a family of statistical
learning models.
An artificial neural network is an interconnected group of nodes (“neurons”)
mimicking neural connections in a brain, though it is not clear to what degree
artificial neural networks mirror brain functions. As represented in Fig. 9.1 a circular
node characterises an artificial neuron and an arrow depicts the fact that the output
of one neuron is the input of the next. Such networks are used to estimate or approximate
functions that can depend on a large number of inputs. The connections have
weights that can be modified, fine-tuned or adapted according to experience or new
situations: this is the learning scheme.
To summarise the process, neurons are activated when they receive a signal, i.e.,
a set of information. After being weighted and transformed, the activated neurons
pass the modified information, message or signal onto other neurons. This process is
reiterated until an output neuron is triggered, which determines the outcome of the
process. Neural networks (Davalo and Naim 1991) have been used to solve multiple
tasks that cannot be adequately addressed using ordinary rule-based programming,
such as handwriting recognition (Matan et al. 1990), speech recognition (Hinton
et al. 2012) or climate change scenario analysis (Knutti et al. 2003), among others.
Neural networks are a family or class of processes that have the following
characteristics:
• They contain weights which are modified during the process based on the new
information available, i.e., numerical parameters that are tuned by a learning
algorithm.
• They can approximate non-linear functions of their inputs.
• The adaptive weights are connection strengths between neurons, which are
activated during training and prediction by the appropriate signal.
Fig. 9.1 This figure illustrates a neural network with four inputs (X1 to X4) and one output
node. In this illustration, only one hidden layer has been represented
9.1 Origins
1
Turing’s B machine already existed (sic!).
9.2 In Theory
2
According to prespecified criteria.
Training a neural network model essentially means selecting, from the set of
allowed models, the one that minimises the objective function criterion. There are
numerous algorithms available for training neural network models; most of them
can be viewed as a straightforward application of optimisation theory and statistical
estimation. Most of the algorithms used in training artificial neural networks employ
some form of gradient descent, using backpropagation to compute the actual
gradients. This is done by taking the derivative of the objective function with
respect to the network parameters and then changing those parameters in a
gradient-related direction. The backpropagation training algorithms are usually classified in
three categories: steepest descent (with variable learning rate, with variable learning
rate and momentum, with resilient backpropagation), quasi-Newton
(Broyden–Fletcher–Goldfarb–Shanno, one step secant, Levenberg–Marquardt) and conjugate
gradient (Fletcher–Reeves update, Polak–Ribière update, Powell–Beale restart,
scaled conjugate gradient) (Forouzanfar et al. 2010).
Evolutionary methods (Rigo et al. 2005), gene expression programming (Ferreira
2006), simulated annealing (Da and Xiurun 2005), expectation–maximisation, non-
parametric methods and particle swarm optimisation (Wu and Chen 2009) are some
commonly used methods for training neural networks.
Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary
function approximation mechanism that “learns” from observed data. However,
using them is not so straightforward, and a relatively good understanding of the
underlying theory is essential.
Obviously, the approximation accuracy will depend on the data representation
and the application. Complex models tend to lead to problems with learning. Indeed,
there are numerous issues with learning algorithms. Almost any algorithm will work
well with the correct hyperparameters for training on a particular fixed data set.
However, selecting and tuning an algorithm for training on unseen data requires a
significant amount of experimentation.
If the model, objective function and learning algorithm are selected appropriately,
the resulting ANN might be quite robust. With the correct implementation,
ANNs might be used naturally for online learning and large data set applications.
9.3 Learning Algorithms 115
Their simple structure and the existence of mostly local dependencies exhibited in
the structure allow for fast parallel implementations.
The utility of artificial neural network models lies in the fact that they can be
used to infer a function from observations. This is particularly useful in applications
where the complexity of the data or task makes the design of such a function by
hand impractical. Indeed, the properties presented in the next paragraphs support
the capability of neural networks to capture particular behaviours embedded within
data sets and infer a function from them.
Artificial neural network models have a property called “capacity”, which roughly
corresponds to their ability to model any given function; it is related to the quantity
of information that can be stored in the network and to the notion of complexity.
Addressing the question of convergence is complicated since it depends on a
number of factors: (1) many local minima may exist, (2) it depends on the objective
function and the model, (3) the optimisation method used might not converge when
starting far from a local minimum, (4) for a very large number of data points or
parameters, some methods become impractical.
In applications where the goal is to create a system which works well in unseen
situations, the problem of overtraining has emerged. This arises in convoluted or
over-specified systems when the capacity of the network significantly exceeds the
needed free parameters.
There are two schools of thought to deal with that issue. The first suggests using
cross-validation and similar techniques to check for the presence of overtraining
and optimally select hyperparameters so as to minimise the generalisation error.
The second recommends using some form of regularisation. This is a concept that
emerges naturally in a probabilistic framework, where the regularisation can be
performed by selecting a larger prior probability over simpler models; but also in
statistical learning theory, where the goal is to minimise over two quantities: the
“empirical risk” and the “structural risk”, which roughly correspond to the error
over the training set and the predicted error in unseen data due to overfitting.
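Both schools of thought can be combined in a few lines. The sketch below uses a linear model with an L2 penalty (ridge regression) instead of a neural network to keep the example short, and selects the penalty by k-fold cross-validation; the data, the grid of penalties and the function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty lam on the weights (a form of regularisation)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean squared error on held-out folds, an estimate of the generalisation error."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: 60 observations, 20 inputs, only 3 of which matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(size=60)

lams = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: cv_error(X, y, lam) for lam in lams}
best_lam = min(errors, key=errors.get)
print(errors, best_lam)
```

The same loop applies to a neural network by replacing `ridge_fit` with the network's training routine and the penalty with, for example, a weight-decay coefficient.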
Supervised neural networks that use a mean squared error (MSE) objective
function can use formal statistical methods to determine the confidence of the
trained model. The MSE on a validation set can be used as an estimate for variance.
This value can then be used to calculate the confidence interval of the output of the
network, assuming a normal distribution. A confidence analysis made this way is
statistically valid as long as the output probability distribution stays the same and
the network is not modified.
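A minimal sketch of this confidence analysis follows, assuming normally distributed errors with constant variance estimated by the validation MSE. The function name, the 1.96 quantile (a two-sided 95% level) and the numbers are illustrative assumptions.

```python
import numpy as np

def output_interval(y_pred, val_targets, val_preds, z=1.96):
    """Interval around a network output, using the validation MSE as the variance."""
    residuals = np.asarray(val_targets, dtype=float) - np.asarray(val_preds, dtype=float)
    mse = np.mean(residuals ** 2)
    half_width = z * np.sqrt(mse)
    return y_pred - half_width, y_pred + half_width

# Hypothetical validation targets and network outputs.
val_targets = [1.0, 2.0, 3.0, 4.0]
val_preds = [1.5, 1.5, 3.5, 3.5]
lower, upper = output_interval(2.0, val_targets, val_preds)
print(lower, upper)
```

The interval is only meaningful while the network is left unchanged and the new inputs come from the same distribution as the validation set.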
It is also possible to assign a generalisation of the logistic function, referred to
as the softmax activation function so that the output can be interpreted as posterior
probabilities (see Chap. 8).
The softmax activation function is
$$y_i = \frac{e^{x_i}}{\sum_{j=1}^{c} e^{x_j}}. \tag{9.3.1}$$
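Eq. (9.3.1) can be sketched as follows. Subtracting the largest input before exponentiating is a standard numerical precaution added here; it leaves the result unchanged because the common factor cancels between numerator and denominator.

```python
import numpy as np

def softmax(x):
    """Softmax of Eq. (9.3.1); outputs are positive and sum to one."""
    x = np.asarray(x, dtype=float)
    shifted = x - x.max()          # avoids overflow, does not change the result
    e = np.exp(shifted)
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])   # hypothetical output-layer inputs
print(probs, probs.sum())
```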
9.4 Application
In this section, our objective is to apply neural networks to scenario analysis. Indeed,
scenario analysis includes many tasks that can be independently performed by
neural networks, such as function approximation, regression analysis, time series
prediction, classification (pattern and sequence recognition), novelty detection and
sequential decision making. Neural networks can also be used in data processing for
tasks such as mining, filtering, clustering, knowledge discovery in databases, blind source
separation and compression. After training, the networks could predict multiple
outcomes from unrelated inputs (Ganesan 2010).
Applications of neural networks to risk management are not new. Indeed, Trippi
and Turban (1992) provide multiple chapters presenting methodologies using neural
networks to predict bank failures. In that book, the neural network strategy is also
compared to more traditional approaches. Relying on the results presented in these
chapters, we see that neural networks can be used as follows.
Neural networks rely on units. Each unit $i$ receives input signals from other units,
aggregates these signals using the input function $U_i$ and
generates an output signal using the output function $O_i$. The output signal is then directed
to other units consistently with the topology of the network. Although the form of the
input/output functions at each node has no constraint other than to be continuous
and differentiable, we use the functions from Rumelhart et al. (1996):
$$U_i = \sum_j w_{ij} O_j + \theta_i \tag{9.4.1}$$
and
$$O_i = \frac{1}{1 + e^{-U_i}}, \tag{9.4.2}$$
where
1. $U_i$ = input of unit $i$,
2. $O_i$ = output of unit $i$,
3. $w_{ij}$ = connection weight between units $i$ and $j$,
4. $\theta_i$ = bias of unit $i$.
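Eqs. (9.4.1) and (9.4.2) translate directly into code for a single unit. The function name and the numerical inputs below are hypothetical.

```python
import numpy as np

def unit_output(weights_i, outputs_j, theta_i):
    """Output O_i of unit i from the outputs O_j of the units feeding into it."""
    u_i = np.dot(weights_i, outputs_j) + theta_i   # Eq. (9.4.1): weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-u_i))              # Eq. (9.4.2): logistic output

o_i = unit_output(np.array([0.4, -0.6]), np.array([1.0, 0.5]), theta_i=0.1)
print(o_i)
```

The logistic function maps any real input to the interval (0, 1), and a zero input gives an output of exactly 0.5.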
Here, the neural network can be represented by a weighted directed graph where
the units introduced in the previous paragraph represent the nodes and the links
represent connections. Each link is assigned the weight of the corresponding
connection. A special class of neural networks, referred to as feedforward networks,
is used in the chapters in question.
A feedforward network contains three types of processing units, namely
input, output and hidden. Input units, initialising the network, receive the seed
information from some data. Hidden units do not directly interact with the environment:
they are invisible, though they are located in the subsequent intermediate layers.
Finally, output units provide signals to the environment and are located in the final
layers. Note that layers can be skipped, but we cannot move backward.
The weight vector $W$, i.e., weights associated with the connections, is the core
of the neural network. $W$ represents what a neural network knows and permits responding
to any input provided. “A feedforward network with an appropriate W can be used
to model the causal relationship between a set of variables”. The fitting and the
subsequent learning is done by modifying the connections’ weights.
Determining the appropriate $W$ is not usually easy, especially when the
characteristics of the entire population are barely known. As mentioned previously, the
network is trained using examples. The objective is to obtain a set of weights $W$
leading to the best fit of the model to the data used initially. The backpropagation
algorithm has been selected here to perform the learning as it is able to train
multi-layer networks. Its effectiveness comes from the fact that it is capable of exploiting
regularities and exceptions contained in the initial sample. The backpropagation
algorithm consists of two phases: forward propagation and backward propagation.
Mechanically speaking, let $s$ be the size of the training sample, each example being
described by an input vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ and an output vector
$D_i = (d_{i1}, d_{i2}, \ldots, d_{in})$, $1 \leq i \leq s$. In forward propagation, $X_i$ is fed to the input layer, and
an output $Y_i = (y_{i1}, y_{i2}, \ldots, y_{in})$ is obtained using $W$, in other words $Y = f(W)$
where $f$ characterises any appropriate function. The value of $Y_i$ is then compared
with the desired output $D_i$ by computing the squared error $(y_{ij} - d_{ij})^2$, $1 \leq j \leq n$,
for each output unit. Output differences are aggregated to form the error function
SSE (sum squared error).
$$SSE = \sum_{i=1}^{s}\sum_{j=1}^{n}\frac{(y_{ij} - d_{ij})^2}{2}. \tag{9.4.3}$$
The objective is to minimise the SSE with respect to $W$ so that all input vectors
are correctly mapped into their corresponding output vectors. As a matter of fact,
the learning process can be considered as a minimisation problem with objective
function SSE defined in the space of $W$, i.e., $\arg\min_W SSE$.
The second phase consists in evaluating the gradient of the function in the weight
space to locate the optimal solution. Both the direction and the magnitude of the change
$\Delta w_{ij}$ of each $w_{ij}$ are obtained using
$$\Delta w_{ij} = -\varepsilon\,\frac{\partial SSE}{\partial w_{ij}}, \tag{9.4.4}$$
where $0 < \varepsilon < 1$ is a parameter controlling the convergence rate of the algorithm.
The sum squared error calculated in the first phase is propagated back, layer
by layer, from the output units to the input units in the second phase. Weight
adjustments are obtained through propagation at each level. As $U_i$, $O_i$ and SSE are
continuous and differentiable, $\partial SSE / \partial w_{ij}$ can be evaluated at each level applying
In this process, $W$ can be updated in two manners: either $W$ is updated
sequentially for each couple $(X_i, D_i)$, or the changes $\Delta w_{ij}$ are aggregated and applied
after a complete run of all examples. For each iteration of the back-propagation algorithm,
the two phases are executed until the SSE converges.
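The two phases can be sketched for a network with one hidden layer of logistic units, using the aggregated (batch) update. This is a minimal illustration under stated assumptions, not the implementation used in the chapters cited: the hidden-layer size, the rate `eps` (the convergence-rate parameter of Eq. (9.4.4)), the number of epochs, the random initialisation and the XOR training data are all hypothetical choices.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(X, W1, b1, W2, b2):
    """Forward propagation: hidden outputs H and network outputs Y."""
    H = logistic(X @ W1 + b1)
    Y = logistic(H @ W2 + b2)
    return H, Y

def sse(Y, D):
    return 0.5 * np.sum((Y - D) ** 2)              # Eq. (9.4.3)

def gradients(X, D, W1, b1, W2, b2):
    """Backward propagation: derivatives of the SSE with respect to all weights."""
    H, Y = forward(X, W1, b1, W2, b2)
    delta2 = (Y - D) * Y * (1.0 - Y)               # error signal at the output units
    delta1 = (delta2 @ W2.T) * H * (1.0 - H)       # propagated back to the hidden units
    return X.T @ delta1, delta1.sum(axis=0), H.T @ delta2, delta2.sum(axis=0)

def train(X, D, n_hidden=3, eps=0.5, n_epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, D.shape[1]))
    b2 = np.zeros(D.shape[1])
    history = []
    for _ in range(n_epochs):
        history.append(sse(forward(X, W1, b1, W2, b2)[1], D))
        gW1, gb1, gW2, gb2 = gradients(X, D, W1, b1, W2, b2)
        W1 -= eps * gW1                            # Eq. (9.4.4), aggregated over examples
        b1 -= eps * gb1
        W2 -= eps * gW2
        b2 -= eps * gb2
    return (W1, b1, W2, b2), history

# Hypothetical training sample: the XOR function, which is not linearly separable.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
D = np.array([[0.0], [1.0], [1.0], [0.0]])
params, history = train(X, D)
print(history[0], history[-1])
```

The sequential variant would apply the same update inside a loop over the individual couples instead of summing the gradients over the whole sample first.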
In Trippi and Turban (1992), neural networks are presented as a viable alternative for
scenario analysis, and the model is applied to bankruptcy prediction. The
results exhibited for neural networks show a better predictive accuracy than those
obtained from implementing a linear discriminant model, a logistic regression,
a k-nearest-neighbour strategy and a decision tree. Applying their model to the
prediction of bank failures, the authors have modified the original backpropagation
algorithm to capture prior probabilities and misclassification. Indeed, the error
of misclassifying a failed bank into the non-failed group (type I error) is more
severe than the reverse. The original function SSE is generalised to $SSE_w$ by
multiplying each error term by $Z_i$, in other words by weighting it. The comparison of
the methodologies is based on a training set with an equal proportion of failed and
non-failed banks, though quite often the number of defaults constitutes a smaller
portion of the whole population than the non-failed entities. Since the matching process
may bias the model, they recommended using the entire population
as the training set. As mentioned in earlier chapters, neural networks
can be helpful to identify a single group from a large set of alternatives.
Alternatively, Fig. 9.2 provides another application of neural networks with two
hidden layers. In that model, the data provided are related to cyber security. The
Fig. 9.2 This figure illustrates a neural network applied to IT security issues, considering
information coming from anti-virus update frequency, industry reputation (how likely it is to be
threatened), the budget of security programs within financial institutions, the number of malware
attacks, the number of security patches, the level of training of managers, the traffic to unwanted
addresses, the quality of security checks and the number of daily users. Input nodes legible in the
figure: Antivirus_Updates, Industry_Reputation, Budget_Security_Program,
Number_Of_Malware_Attack, Level_Of_Formation_Of_Managers, Traffic_To_Unwanted_Addresses,
Quality_Of_Security_Checks, Number_Of_Daily_Users (edge weights omitted)
9.5 For the Manager: Pros and Cons 119
In this section we discuss the main issues and advantages of implementing a neural
network strategy for scenario analysis purposes, starting with the issues.
To be properly applicable, neural networks require sufficient representative data
to capture the appropriate underlying structure, which will allow a generalisation to
new situations. Data issues of this kind can be dealt with in various manners, such as
randomly shuffling the training data, using a numerical optimisation or reclassifying the data.
From a computational and IT infrastructure point of view, to implement large,
efficient and effective neural network strategies, considerable processing and storage
resources are required (Balcazar 1997). Simulating even the most simplified neural
network may require filling large databases and may consume huge amounts of
memory and hard disk space. Besides, neural network methodologies will usually
require some simulations to deal with the signal transmission between the neurons,
and this may need huge amounts of CPU processing power and time.
Neural network methodologies are nevertheless questioned, as it is possible to create
a successful net without understanding how it works. However, it is arguable that an
unreadable table that a useful machine could read would still be well worth having
(NASA 2013). Indeed, the discriminant capability of a neural network is difficult to
express in symbolic form. Moreover, neural networks are limited if one wants to test
the significance of individual inputs.
Remark 9.5.1 In that case we are somehow already talking about artificial intelli-
gence.
Other limitations reside in the fact that there is no formal method to derive a
network configuration for a given classification task. Although it was shown that
only one hidden layer is enough to approximate any continuous function, the
number of hidden units can be arbitrarily large, and the risk of overfitting the network is
real, especially if the size of the training sample is insufficient. Researchers exploring
learning algorithms for neural networks are uncovering generic principles allowing
a successful fitting, learning, analysis and prediction. A new school of thought
actually considers that hybrid models (combining neural networks and symbolic
approaches) can improve on the outcomes of neural networks alone (Bengio
and LeCun 2007; Sun 1994).
However, on the positive side, a neural network allows adaptive adjustments
of the predictive model as new information becomes available. This is the core
property of this methodology, especially when the underlying group of distributions
is evolving. Statistical methods do not generally weigh the information and assume
that old and new examples are equally valid, and the entire set is used to construct
a model. However, when a new sample is obtained from a new distribution, keeping
the old information (likely to be obsolete) may bias the outcome and lead to a model
of low accuracy. Therefore, the adaptive feature of a neural network is that past
information is not ignored but receives a lower weight than the latest information
received and fed into the model. To be more effective a rolling window might be
used in practice. The proportion of the old data to be kept depends on considerations
related to stability, homogeneity, adequacy and noise of the sample.
Neural networks have other particularly useful properties. Indeed, the non-linear
discriminant function represented by the net provides a better approximation of the
sample distribution, especially when the latter is multimodal. Many classification
tasks have been reported to have a non-linear relationship between variables, and as
mentioned previously, neural networks are particularly robust as they do not assume
any probability distribution. Besides, there is no restriction regarding input/output
functions other than that these have to be continuous and differentiable.
Research in that field is ongoing. In fact, one of the outcomes led to the
application of genetic algorithms (Whitley 1994). Applying genetic algorithms to
network design might be quite powerful, as they mechanically retain and combine
good configurations in the next generation. The nature of the algorithm allows the
search for good configurations while reducing the possibility of ending up with
a local optimum.
As presented in this chapter, neural networks can be used for scenario analysis,
for bankruptcy detection and can be easily extended to managerial applications.
Note that the topic is currently highly discussed as it is particularly relevant for the
trendy big data topic.
References
Davalo, E., & Naim, P. (1991). Neural networks. MacMillan computer science series. London:
Palgrave.
Deng, L., & Yu, D. (2014). Deep learning: Methods and applications. Foundations and Trends in
Signal Processing, 7(3–4), 1–199.
Farley, B. G., & Clark, W. A. (1954). Simulation of self-organizing systems by digital computer.
IRE Transactions on Information Theory 4(4), 76–84.
Ferreira, C. (2006). Designing neural networks using gene expression programming. In A. Abra-
ham, et al. (Eds.), Applied soft computing technologies: The challenge of complexity (pp. 517–
536). New York: Springer.
Forouzanfar, M., Dajani, H. R., Groza, V. Z., Bolic, M., & Rajan, S. (2010). Comparison of feed-
forward neural network training algorithms for oscillometric blood pressure estimation. In 2010
4th international workshop on soft computing applications (SOFA), July 2010 (pp. 119–123).
New York: IEEE.
Ganesan, N. (2010). Application of neural networks in diagnosing cancer disease using demo-
graphic data. International Journal of Computer Applications, 1(26), 76–85.
Hebb, D. (1949). The organization of behavior. New York: Wiley.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., et al. (2012). Deep neural
networks for acoustic modeling in speech recognition: The shared views of four research
groups. IEEE Signal Processing Magazine, 29(6), 82–97.
Knutti, R., Stocker, T. F., Joos, F., & Plattner, G. K. (2003). Probabilistic climate change projections
using neural networks. Climate Dynamics, 21(3–4), 257–272.
Matan, O., Kiang, R. K., Stenard, C. E., Boser, B., Denker, J. S., Henderson, D., et al. (1990).
Handwritten character recognition using neural network architectures. In Proceedings of the
4th USPS advanced technology conference, November 1990 (pp. 1003–1011).
McCulloch, W., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity.
Bulletin of Mathematical Biophysics, 5(4), 115–133.
NASA (2013). NASA neural network project passes milestone. www.nasa.gov.
Rochester, N., Holland, J., Haibt, L., & Duda, W. (1956). Tests on a cell assembly theory of the
action of the brain, using a large digital computer. IRE Transactions on Information Theory,
2(3), 80–93.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organiza-
tion in the brain. Psychological Review, 65(6), 386–408.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1996). Learning representations by backprop-
agating errors. Nature, 323, 533–536.
Sun, R. (1994). A two-level hybrid architecture for common sense reasoning. In R. Sun &
L. Bookman (Eds.), Computational architectures integrating neural and symbolic processes.
Dordrecht: Kluwer Academic Publishers.
Trippi, R. R., & Turban, E. (Eds.), (1992). Neural networks in finance and investing: Using
artificial intelligence to improve real-world performance. New York: McGraw-Hill Inc.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral
sciences. Ph.D. thesis, Harvard University.
Whitley, D. (1994). A genetic algorithm tutorial. Statistics and computing, 4(2), 65–85.
Wilson, W. (2012). The machine learning dictionary. www.cse.unsw.edu.au/~billw.
Wu, J., & Chen, E. (2009). A novel nonparametric regression ensemble for rainfall forecasting
using particle swarm optimization technique coupled with artificial neural network. In H. Wang,
et al. (Eds.), 6th International Symposium on Neural Networks. Berlin: Springer.
Chapter 10
Forward-Looking Underlying Information:
Working with Time Series
10.1 Introduction
In order to capture serially related events, banks may need to consider the complete
dependence scheme. This is the reason why this chapter focuses on time series. It is
important to note that autocorrelation is not necessarily present, and the
independence assumption should not be rejected a priori: if there is no
statistical evidence against independence, it should
not be rejected for the sake of it. Besides, dependencies may take various
forms and may be detected at various time steps. We will come back to that point
in the next paragraphs. In this chapter, we assume that serial dependence exists
and we model it using time series processes (McCleary 1980; Hamilton 1994; Box
et al. 2015). In many cases, scenario analysis has to integrate macro-economic
factors, and here time series models are particularly useful. The literature on this
topic is colossal (the bibliographies of this chapter and of the previous ones list
some interesting articles). But strategies relying on time series should not be
limited to macro-economic factors or stock indexes, for instance. In this chapter, we
illustrate the models with applications but, in order not to bias the manager trying to
implement the methodologies presented, we do not emphasise the data to which we
applied them, though in this case they were macro-economic data.
Our objective is to capture the risks associated with the loss intensity, which
may increase during crises or turmoil, taking into account correlations, embedded
dynamics and large events thanks to adequate distributions fitted on the residuals.
Using time series permits capturing the embedded autocorrelation phenomenon
without losing any of the characteristics captured by traditional methodologies, such
as fat tails.
Consequently, a time series is a sequence of data points, typically consisting
in successive measurements made over a period of time. Time series are usually
represented using line charts. A traditional application of time series processes is
forecasting, which in our language can be translated into scenario analysis. Time
10.2 Methodology
There exist several models to represent various patterns and behaviours. Variations
in the level of a process can be obtained using the following approaches or a
combination of them. Time series processes can be split into various classes, each of
them having their own variations, for instance, the autoregressive (AR) models, the
integrated (I) models and the moving average (MA) models. These three classes
depend linearly on past data points (Gershenfeld, 1999). Combinations of these lead
to autoregressive moving average (ARMA) and autoregressive integrated moving
average (ARIMA) models. The autoregressive fractionally integrated moving
average (ARFIMA) model combines and enlarges the scope of the previous approaches.
VAR¹ strategies are an extension of these classes to deal with vector-valued data
(multivariate time series); besides, these might be extended to capture exogenous
impacts.
Non-linear strategies might also be of interest, as empirical investigations have
shown that predictions derived from non-linear models might be more appropriate
than those from linear models (Rand 1971 and Holland 1992). Among
these non-linear time series models, those capturing the evolution of variance over
time (heteroskedasticity) are of particular interest. These models are referred to as
autoregressive conditional heteroskedasticity (ARCH) models, and the family
contains a wide variety of variants such as GARCH, TARCH, EGARCH,
FIGARCH and CGARCH. In these models, the changes in variability are related to recent past
values of the observed series.
Originally, the theory was built on two sets of conditions, namely
stationarity and its generalisation, ergodicity. However, the idea of stationarity must be
refined into strict stationarity and second-order stationarity. Models can be developed
under each of these conditions, but in the latter case the models are usually regarded
as partially specified. Nowadays, many time series models have been developed to
deal with seasonally stationary or non-stationary series.
1
Vector autoregression.
10.2.1.2 Autocorrelation
$$R(s,t) = \frac{E\left[(X_t - \mu_t)(X_s - \mu_s)\right]}{\sigma_t \sigma_s}, \tag{10.2.3}$$
where $E$ is the expected value operator, and $\mu_t$ and $\sigma_t$ denote the mean and the
standard deviation of $X_t$. Note that this expression cannot be evaluated
for all time series, as the variance may be zero (e.g. for a constant process), infinite or
nonexistent. If the function $R$ is computable, the returned value lies in the range $[-1, 1]$,
where $1$ indicates a perfect correlation and $-1$ a perfect anti-correlation.
If $X_t$ is a wide-sense stationary process, then $\mu$ and $\sigma^2$ are not time-dependent.
The autocorrelation only depends on the lag between $t$ and $s$, i.e., the time-distance
between two values. Therefore the autocorrelation can be expressed as a function of
the time-lag $\tau = s - t$, i.e.,
$$R(\tau) = \frac{E\left[(X_t - \mu)(X_{t+\tau} - \mu)\right]}{\sigma^2}. \tag{10.2.4}$$
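In practice the expectation is replaced by a sample average. The following is a minimal sketch of a sample autocorrelation function for a series assumed to be wide-sense stationary; the function name `sample_acf` and the toy series are illustrative assumptions, and the estimator divides by the full sample size, which is one common convention among several.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag.

    Uses the sample mean and variance of the whole series, i.e. it assumes
    wide-sense stationarity (mean and variance not time-dependent).
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance = sum(c * c for c in centred) / n
    if variance == 0.0:
        # Mirrors the remark in the text: undefined for a constant process.
        raise ValueError("autocorrelation is undefined for a constant series")
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / n
        acf.append(cov / variance)
    return acf


# Toy alternating series (hypothetical data): strong negative lag-1 dependence.
alternating = [1.0, -1.0] * 4
acf_values = sample_acf(alternating, max_lag=2)
```

For the alternating toy series the lag-1 value is strongly negative and the lag-2 value strongly positive, as one would expect from a series that flips sign at every step.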
The framework in which we are evolving implies that observed data series are
the combination of a path dependent process (some may say “deterministic”) and
random noise (error) terms. Then an estimation procedure is implemented to param-
eterise the model using observations. The noise (error) values are assumed mutually
uncorrelated with a mean equal to zero and the same probability distribution, i.e.,
the noise is white. Traditionally, a Gaussian white noise is assumed, i.e. the error
term follows a Gaussian distribution, but it is possible to have the noise represented
by other distributions and the process transformed.
If the noise terms underlying different observations are correlated, then the
parameter estimates are still unbiased; however, uncertainty measures will be biased. This
is also true if the noise is heteroskedastic, i.e., if its variance varies over time. This
fact may lead to the selection of an alternative time series process.
10.2.1.4 Estimation
There are many ways of estimating the coefficients or parameters, such as the
ordinary least squares procedure or the method of moments (through Yule–Walker
equations).
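For example, the AR($p$) model is given by the equation
$$X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \tag{10.2.5}$$
where the $\varphi_i$ are the parameters and $\varepsilon_t$ is the noise term. The Yule–Walker equations are
$$\gamma_m = \sum_{k=1}^{p} \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2 \delta_{m,0}, \tag{10.2.6}$$
where $\gamma_m$ is the autocovariance of $X_t$ at lag $m$, $\sigma_\varepsilon^2$ the variance of the noise and
$\delta_{m,0}$ the Kronecker delta. Writing these equations for $m = 1, \ldots, p$ gives a linear system
which can be solved for all $\{\varphi_m;\ m = 1, 2, \ldots, p\}$. The remaining equation for $m = 0$ is
$$\gamma_0 = \sum_{k=1}^{p} \varphi_k \gamma_{-k} + \sigma_\varepsilon^2, \tag{10.2.8}$$
which, once the $\varphi_m$ are known, can be solved for $\sigma_\varepsilon^2$. In terms of the
autocorrelation function $\rho$, the same relationship reads
$$\rho(\tau) = \sum_{k=1}^{p} \varphi_k \rho(k - \tau). \tag{10.2.9}$$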
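As a hedged illustration of the method of moments, the Yule–Walker system can be solved recursively with the Levinson–Durbin algorithm, which is a standard technique but not one named in the text. The function name `yule_walker` and the illustrative coefficients are assumptions; in practice the autocovariances would be estimated from data rather than computed from known parameters.

```python
def yule_walker(gamma):
    """Solve the Yule-Walker equations by Levinson-Durbin recursion.

    gamma: autocovariances [gamma_0, gamma_1, ..., gamma_p].
    Returns (phi, noise_variance) with phi = [phi_1, ..., phi_p].
    """
    p = len(gamma) - 1
    phi = []
    noise_variance = gamma[0]
    for m in range(1, p + 1):
        # Part of gamma_m not explained by the current order-(m-1) model.
        acc = gamma[m] - sum(phi[k] * gamma[m - 1 - k] for k in range(m - 1))
        reflection = acc / noise_variance
        # Update the lower-order coefficients and append the new one.
        phi = [phi[k] - reflection * phi[m - 2 - k] for k in range(m - 1)]
        phi.append(reflection)
        noise_variance *= 1.0 - reflection ** 2
    return phi, noise_variance


# Theoretical autocorrelations of an AR(2) with illustrative coefficients
# 0.5 and 0.4 (rho_1 = phi_1 / (1 - phi_2), rho_2 = phi_1 * rho_1 + phi_2).
rho_1 = 0.5 / (1.0 - 0.4)
rho_2 = 0.5 * rho_1 + 0.4
phi_hat, scaled_noise_variance = yule_walker([1.0, rho_1, rho_2])
```

Because autocorrelations (not autocovariances) are passed in, the second returned value is the noise variance expressed as a fraction of the process variance.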
10.2.1.5 Seasonality
As mentioned before, time series data are collected at regular intervals, implying
that some peculiar schemes might be observed multiple times over a long period.
Indeed, some patterns tend to repeat themselves over known, fixed periods of time
within the data set. These might characterise seasonality, seasonal variation, periodic
variation or periodic fluctuations (risk cycle).
Seasonality may be the result of multiple factors and consists of periodic,
repetitive, relatively regular and predictable patterns in a time series. Seasonality
can repeat on a weekly, monthly or quarterly basis. These periods of time are
structured, whereas cyclical patterns extend beyond a single year and may not repeat
themselves over fixed periods of time. It is necessary for organisations to identify
and measure seasonal variations within their risks to support strategic plans and
to understand their true exposure rather than their point-in-time exposure. Indeed, if
the volume drives the exposure (credit card fraud is a good
example: the larger the number of credit cards sold, the larger the exposure),
then the risk tends to increase when the volumes increase. The seasonality in the
volume will mechanically imply larger losses, but it does not necessarily mean that
the institution is facing more risks.
Multiple graphical techniques can be used to detect seasonality: (1) a run
sequence plot, (2) a seasonal plot (each season is overlapped), (3) a seasonal
subseries plot, (4) multiple box plots, (5) an autocorrelation plot (ACF) or
(6) a seasonal index measuring the difference between a
particular period and its expected value.
A simple run sequence plot is usually a good first step to analyse time series
seasonality, although seasonality appears more clearly on the seasonal subseries
plot or the box plot. Besides, the seasonal subseries plot exhibits the evolution of
the seasons over time, contrary to the box plot, but the box plot is more readable for
large data sets.
Seasonal, seasonal subseries and box plots rely on the fact that seasonal periods
are known, e.g., for monthly data we have 12 regular periods in a year. However, if
the period is unknown, the autocorrelation plot is probably the best solution. If there
is significant seasonality, the autocorrelation plot should show regular spikes (i.e. at
the same period every year).
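When the period is unknown, the spikes of the autocorrelation function can be located numerically rather than read off a plot. The sketch below is only indicative: the synthetic monthly series, the helper names and the rule "take the lag of the largest autocorrelation beyond lag 1" are assumptions made for illustration, and real data would normally be detrended first.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (divides by n)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance = sum(c * c for c in centred) / n
    return [
        sum(centred[t] * centred[t + lag] for t in range(n - lag)) / n / variance
        for lag in range(max_lag + 1)
    ]


def dominant_period(series, max_lag):
    """Lag (>= 2) with the largest autocorrelation, a crude seasonality guess."""
    acf = sample_acf(series, max_lag)
    # Lag 0 is always 1 and lag 1 mostly reflects short-term persistence.
    return max(range(2, max_lag + 1), key=lambda lag: acf[lag])


# Ten years of synthetic monthly data with a yearly cycle (hypothetical).
monthly = [10.0 + 3.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(120)]
period = dominant_period(monthly, max_lag=24)
```

On this noise-free series the largest spike is found at lag 12, with a smaller echo at lag 24, which is the regular pattern the paragraph refers to.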
10.2.1.6 Trends
Dealing with time series, it is really important to analyse the tendencies in the data, relating the
measurements to the times at which they occurred. In particular, it
is useful to understand whether measurements exhibiting increasing or decreasing patterns
are statistically distinct from random behaviours.²
Considering a data set for modelling purposes, various functions can be chosen
to represent them. Without prior knowledge of the data, the simplest function (once
again) to fit is an affine function ($Y = aX + b$) for which the magnitudes are given
on the vertical axis, while the time is represented on the horizontal axis.
Once the strategy has been selected, the parameters need to be estimated, usually
by implementing a least-squares approach, as presented earlier in this book. Applying
it to our case, we minimise the following sum of squares,
$$\sum_t \left[(at + b) - y_t\right]^2, \tag{10.2.10}$$
where $y_t$ are the observed data, and $a$ and $b$ are to be estimated. The difference
between $y_t$ and $at + b$ provides the residual set. Therefore, $y_t = at + b + \varepsilon_t$ is
supposed to be able to represent any set of data (though the error might be huge).
If the errors are stationary, then the non-stationary series $y_t$ is referred to as
trend stationary. It is usually simpler if the $\varepsilon_t$ are identically distributed, but if this
is not the case and some points are less certain than others, a weighted least squares
methodology can be implemented to obtain more accurate parameters.
In most cases, for a simple time series, the variance of the error term is calculated
empirically by removing the trend from the data to obtain the residuals. Once the
“noise” of the series has been properly captured, the significance of the trend can be
addressed by testing the null hypothesis that the trend $a$ is not significantly different
from 0.
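A minimal sketch of this procedure is given below: an affine trend is fitted by ordinary least squares and the slope is compared with its standard error. The function name, the simulated data and the reading of the ratio as a Student t statistic with n − 2 degrees of freedom (which assumes independent, identically distributed Gaussian errors) are assumptions for illustration.

```python
import math
import random


def fit_linear_trend(y):
    """Least-squares fit of y_t = a*t + b + eps_t for t = 0, 1, ..., n-1.

    Returns (a, b, t_stat, residuals), where t_stat = a / SE(a).
    """
    n = len(y)
    t_bar = (n - 1) / 2.0
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y[t] - y_bar) for t in range(n))
    a = sxy / sxx
    b = y_bar - a * t_bar
    residuals = [y[t] - (a * t + b) for t in range(n)]
    # Empirical error variance obtained after removing the trend.
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    se_a = math.sqrt(sigma2 / sxx)
    if se_a == 0.0:
        t_stat = 0.0 if a == 0.0 else math.copysign(math.inf, a)
    else:
        t_stat = a / se_a
    return a, b, t_stat, residuals


# Simulated series with a true slope of 0.5 and unit-variance noise
# (hypothetical data; the seed only makes the run reproducible).
rng = random.Random(0)
observations = [2.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(50)]
slope, intercept, slope_t_stat, trend_residuals = fit_linear_trend(observations)
```

A large absolute value of `slope_t_stat` suggests rejecting the null hypothesis of no trend; the exact threshold depends on the chosen confidence level and on the validity of the error assumptions.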
The presented methodology has been the subject of criticisms related to the non-
linearity of the time trend, the impact of this non-linearity on the parameters, the
2
In the latter case, homogeneity problems may have to be dealt with.
where $\alpha$ is a constant, $\beta$ the coefficient on a time trend and $p$ the lag order of
the autoregressive process. Note that setting $\alpha = 0$ and $\beta = 0$
corresponds to modelling a random walk, while only setting $\beta = 0$ leads to modelling a
random walk with a drift.
Remark 10.2.1 Note that the order of the lags ($p$) allows higher-order
autoregressive processes to be captured. The order has to be determined either using the t-value of
the coefficient or using the Akaike information criterion (AIC) (Akaike, 1974), the Bayesian
information criterion (BIC) (Schwarz, 1978) or the Hannan–Quinn information
criterion (Hannan and Quinn, 1979).
The null hypothesis $\gamma = 0$ is tested against the alternative $\gamma < 0$, where $\gamma$ is the
coefficient on the lagged level of the series in the test regression. The test statistic
$$\mathrm{DF} = \frac{\hat{\gamma}}{\mathrm{SE}(\hat{\gamma})} \tag{10.2.12}$$
is then computed, and compared to the relevant critical value for the Dickey–Fuller
test. A test statistic lower than the critical value implies a rejection of the null
hypothesis, i.e., the absence of a unit root.
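The statistic can be sketched with a deliberately simplified test regression: a constant, the lagged level and no lagged differences or time trend. This is an assumption made to keep the example short, not the full augmented specification, and the quoted critical value of −2.86 is the commonly tabulated large-sample 5% value for the constant-only case. The function name and the simulated series are illustrative.

```python
import math
import random


def dickey_fuller_stat(y):
    """DF statistic from the regression  dy_t = alpha + gamma * y_{t-1} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lagged = y[:-1]
    n = len(dy)
    x_bar = sum(lagged) / n
    d_bar = sum(dy) / n
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    sxy = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, dy))
    gamma_hat = sxy / sxx
    alpha_hat = d_bar - gamma_hat * x_bar
    rss = sum((d - alpha_hat - gamma_hat * x) ** 2 for x, d in zip(lagged, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma_hat / se_gamma


rng = random.Random(42)

# Stationary AR(1) with coefficient 0.5 (hypothetical data).
stationary = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))

# Random walk, i.e. a series with a unit root (hypothetical data).
random_walk = [0.0]
for _ in range(500):
    random_walk.append(random_walk[-1] + rng.gauss(0.0, 1.0))

CRITICAL_VALUE_5PCT = -2.86  # approximate, constant-only case, large sample
stat_stationary = dickey_fuller_stat(stationary)
stat_random_walk = dickey_fuller_stat(random_walk)
reject_unit_root = stat_stationary < CRITICAL_VALUE_5PCT
```

The stationary series yields a strongly negative statistic, well below the critical value, whereas the random walk typically does not; a production analysis would rely on a tested statistical package and on the full lag selection described in the remark.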
A widely used alternative is the Kwiatkowski–Phillips–Schmidt–Shin (KPSS)3
test (Kwiatkowski et al., 1992) which tests the null hypothesis that a time series
is stationary around a deterministic trend. The series is the sum of deterministic
trend, random walk and stationary error, and the test is the Lagrange multiplier
test of the hypothesis that the random walk has zero variance. The founding paper
actually states that by testing both unit root hypothesis and stationarity hypothesis
simultaneously, it is possible to distinguish series that appear to be stationary, series
that have a unit root and series for which the data are not sufficiently informative to
be sure whether they are stationary or integrated.
3
The KPSS test is included in many statistical software packages (R, etc.).
$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \tag{10.2.13}$$
where $\varphi_1, \ldots, \varphi_p$ are parameters, $c$ is a constant and the random variable $\varepsilon_t$
represents a white noise.
The parameters of the model have to be constrained to ensure the model
remains stationary. For instance, an AR(1) process is not stationary if $|\varphi_1| \geq 1$.
• Moving average model: the notation MA(q) refers to the moving average model
of order q:
$$X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \tag{10.2.14}$$
where $\theta_1, \ldots, \theta_q$ are the parameters of the model, $\mu$ equals $E[X_t]$ and the
$\varepsilon_t, \varepsilon_{t-1}, \ldots$ are white noise error terms. In this process the next value of $X_t$ builds
on past combined errors.
• ARMA model: the notation ARMA(p, q) refers to the model with p autoregres-
sive terms and q moving average terms. This model contains the AR(p) and
MA(q) models,
$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \tag{10.2.15}$$
$$(1 - B)^2 = 1 - 2B + B^2, \tag{10.2.16}$$
where
$$B^2 X_t = X_{t-2}, \tag{10.2.17}$$
so that
$$(1 - B)^2 X_t = X_t - 2X_{t-1} + X_{t-2}. \tag{10.2.18}$$
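Both ARFIMA and ARIMA (Palma, 2007) models have the same form,
though $d \in \mathbb{N}^+$ for the ARIMA while $d \in \mathbb{R}$ for the ARFIMA:
$$\left(1 - \sum_{i=1}^{p} \varphi_i B^i\right)(1 - B)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i B^i\right)\varepsilon_t. \tag{10.2.19}$$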
ARFIMA models have the intrinsic capability to capture long range depen-
dencies, i.e., the fact that present data points are linked to information captured a
long time ago.
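• ARCH(q): $\varepsilon_t$ denotes the error terms, which in our case are the series terms. These
$\varepsilon_t$ are divided into two pieces: a stochastic component $z_t$ and a time-dependent
standard deviation $\sigma_t$,
$$\varepsilon_t = \sigma_t z_t. \tag{10.2.20}$$
The random variable $z_t$ is a strong white noise process. The series $\sigma_t^2$ is
formalised as follows:
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2. \tag{10.2.21}$$
In practice, the squared residuals $\hat{\varepsilon}_t^2$ of a fitted model are regressed on a constant
and their $q$ lagged values,
$$\hat{\varepsilon}_t^2 = \hat{\alpha}_0 + \sum_{i=1}^{q} \hat{\alpha}_i \hat{\varepsilon}_{t-i}^2, \tag{10.2.22}$$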
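In that case, the GARCH($p$, $q$) model (where $p$ is the order of the GARCH
terms $\sigma^2$ and $q$ is the order of the ARCH terms $\varepsilon^2$) is given by
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2. \tag{10.2.23}$$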
To test for heteroskedasticity in econometric models, the White (1980) test is
usually implemented. However, when dealing with time series data, this means
to test for ARCH errors (as described above) and GARCH errors (below).
where $\alpha, \beta \geq 0$ and $\omega > 0$.
where $g(Z_t) = \theta Z_t + \lambda(|Z_t| - E(|Z_t|))$, $\sigma_t^2$ is the conditional variance, $\omega$, $\beta$,
$\alpha$, $\theta$ and $\lambda$ are coefficients and $Z_t$ is a representation of the error term which
may take multiple forms. $g(Z_t)$ allows the sign and the magnitude of $Z_t$ to have
different effects on the volatility.
Remark 10.2.2 As $\log \sigma_t^2$ can take negative values, the restrictions on parameters
are limited.
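– GARCH-in-mean (Kroner and Lastrapes 1993): In this model a heteroskedasticity
term is added in the mean equation of the GARCH, such that,
$$\varepsilon_t = \sigma_t z_t, \tag{10.2.30}$$
$$\sigma_t^2 = K + \delta \sigma_{t-1}^2 + \alpha \varepsilon_{t-1}^2 + \phi \varepsilon_{t-1}^2 I_{t-1}, \tag{10.2.32}$$
where $\varepsilon_{t-1}^{+} = \varepsilon_{t-1}$ if $\varepsilon_{t-1} > 0$, and $\varepsilon_{t-1}^{+} = 0$ if $\varepsilon_{t-1} \leq 0$. Likewise,
$\varepsilon_{t-1}^{-} = \varepsilon_{t-1}$ if $\varepsilon_{t-1} \leq 0$, and $\varepsilon_{t-1}^{-} = 0$ if $\varepsilon_{t-1} > 0$.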
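– the Gegenbauer process (Gray et al., 1989):
$$f(X_{t-1}, \ldots) = \sum_{j=1}^{\infty} \psi_j \varepsilon_{t-j}, \tag{10.2.34}$$
$$\psi_j = \sum_{k=0}^{[j/2]} \frac{(-1)^k\, \Gamma(d + j - k)\, (2\nu)^{j-2k}}{\Gamma(d)\, \Gamma(k + 1)\, \Gamma(j - 2k + 1)},$$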
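Returning to the conditional-variance models listed above, the GARCH recursion is easy to simulate. The following is a minimal sketch of a GARCH(1, 1) process with Gaussian innovations; the parameter values, the function name `simulate_garch11` and the choice to start the recursion at the unconditional variance are illustrative assumptions, not a calibration.

```python
import math
import random


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
    """Simulate eps_t = sigma_t * z_t with
    sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * sigma_{t-1}^2.

    Returns (eps, variances). z_t is standard Gaussian white noise.
    """
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        raise ValueError("need alpha0 > 0, alpha1, beta1 >= 0, alpha1 + beta1 < 1")
    rng = random.Random(seed)
    # Start from the unconditional variance alpha0 / (1 - alpha1 - beta1).
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)
    eps, variances = [], []
    for _ in range(n):
        shock = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        eps.append(shock)
        variances.append(sigma2)
        # Conditional variance for the next step depends on the recent past.
        sigma2 = alpha0 + alpha1 * shock * shock + beta1 * sigma2
    return eps, variances


# Illustrative parameters giving an unconditional variance of 1.
garch_eps, garch_var = simulate_garch11(5000, alpha0=0.1, alpha1=0.1, beta1=0.8, seed=1)
sample_variance = sum(e * e for e in garch_eps) / len(garch_eps)
```

Plotting `garch_var` against time would show volatility clustering: large shocks raise the conditional variance for several subsequent periods.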
10.3 Application
In this section, we illustrate some of the models presented in the previous section as
well as some of their properties. We start with Fig. 10.1, which represents an autocorrelation
function (ACF). It presents a rapid decay towards zero, characterising
an autoregressive process.
Figure 10.2 exhibits an AR(2) process with two parameters $\varphi_1 = 1$ and $\varphi_2 = -0.5$,
which ensure the stationarity of the underlying model. In that case, the event
occurring in $X_t$ is related to the two previous occurrences recorded in $X_{t-1}$ and $X_{t-2}$.
In real-life applications, losses generated by identical generating processes usually
lead to that kind of situation. It is also important to note that even if the series is
really volatile, it may still be stationary as long as the moments remain stable
over time.
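The stationarity of an AR(2) process can be checked directly from its two coefficients, and a path can then be simulated. The sketch below uses the standard stationarity triangle for AR(2) models; the function names, the burn-in length and the coefficients 0.5 and 0.4 (the values quoted in the caption of Fig. 10.2) are used purely for illustration.

```python
import random


def ar2_is_stationary(phi1, phi2):
    """Standard AR(2) stationarity conditions (the 'stationarity triangle')."""
    return (phi1 + phi2 < 1.0) and (phi2 - phi1 < 1.0) and (abs(phi2) < 1.0)


def simulate_ar2(n, phi1, phi2, sigma=1.0, seed=0, burn_in=200):
    """Simulate X_t = phi1 * X_{t-1} + phi2 * X_{t-2} + eps_t.

    The first burn_in values are discarded so that the arbitrary
    starting values (zeros) have little influence on the returned path.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn_in):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, sigma))
    return x[-n:]


stationary_flag = ar2_is_stationary(0.5, 0.4)
ar2_path = simulate_ar2(300, 0.5, 0.4, seed=7)
```

A pair such as (1.0, 0.5) fails the first condition, so a simulated path with those coefficients would drift away instead of fluctuating around a stable level.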
[Figure 10.1: ACF of the weekly aggregated series; horizontal axis: lag (0 to 7), vertical axis: ACF]
Fig. 10.1 This figure represents an autocorrelation function (ACF). It presents a rapid
decay towards zero, characterising an autoregressive process
Fig. 10.2 This figure exhibits an AR(2) process with two parameters $\varphi_1 = 0.5$ and $\varphi_2 = 0.4$,
which ensure the stationarity of the underlying model. In that case, the event occurring in $X_t$ is
related to the two previous occurrences recorded in $X_{t-1}$ and $X_{t-2}$
[Figure 10.3: simulated path x of an ARIMA(1, 1, 1) process with φ = 0.5 and θ = 0.5]
Fig. 10.3 This figure illustrates an ARIMA process, i.e., a process that contains an integrated
autoregressive model and an MA process
[Figure 10.4: two panels for the series x, ACF (top) and PACF (bottom), each plotted against lags 0 to 30]
Fig. 10.4 This figure presents the ACF and the PACF of an AR(2) process. The top quadrant
exhibits an ACF plot quickly decreasing to zero, denoting an autoregressive process, and the bottom
quadrant exhibits the partial autocorrelation function (PACF) of the series, showing the order of
the process
[Figure 10.5: PACF of the weekly aggregated series; horizontal axis: lag (0 to 7), vertical axis: partial ACF]
Fig. 10.5 The PACF represented here exhibits the presence of long memory, i.e., the loss $X_t$ is
related to events which occurred a long time ago
[Figure 10.6: residual diagnostics with four panels: standardized residuals over time, ACF of the residuals against lag, QQ-plot against theoretical quantiles, and p values of the Ljung–Box statistic against lag]
Fig. 10.6 Following the adjustment of a SARIMA model to macro-economic data (selected for
illustration purposes), this figure provides the analysis of the residuals, showing their evolution over
time, and demonstrating their stationarity. The residuals are independent according to the ACF,
the QQ-plot suggests that the residuals are normally distributed, and the Ljung–Box statistic
provides evidence that the data are independent
to the ACF and the QQ-plot advocates that the residuals are normally distributed.
The Ljung–Box statistic provides evidence that the data are independent.
Time series are particularly interesting as once it has been established that Xt
is related to past incidents, and we are interested in a particular scenario, then
the scenarios can be analysed by shocking the time series, the parameters or the
distribution representing the residuals.
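The three kinds of shock mentioned above can be sketched on the simplest possible model. The example below projects an AR(1) process forward and compares a baseline path with paths obtained by shocking the residual, the parameter or the starting level; the model, the numbers and the function name `project_ar1` are hypothetical and only meant to show the mechanics.

```python
def project_ar1(last_value, phi, c, horizon, shocks=None):
    """Project X_t = c + phi * X_{t-1} + shock_t over `horizon` steps.

    shocks: optional list of residual shocks, one per step (default: none).
    """
    if shocks is None:
        shocks = [0.0] * horizon
    if len(shocks) != horizon:
        raise ValueError("one shock per projected step is required")
    path = []
    x = last_value
    for step in range(horizon):
        x = c + phi * x + shocks[step]
        path.append(x)
    return path


HORIZON = 6

# Baseline: the series starts at its long-run level c / (1 - phi) = 10.
baseline = project_ar1(10.0, phi=0.6, c=4.0, horizon=HORIZON)

# Scenario 1: a one-off shock of 5 on the first residual.
residual_shock = project_ar1(10.0, 0.6, 4.0, HORIZON, shocks=[5.0, 0, 0, 0, 0, 0])

# Scenario 2: a shocked persistence parameter (0.8 instead of 0.6).
parameter_shock = project_ar1(10.0, 0.8, 4.0, HORIZON)

# Scenario 3: a shock applied directly to the last observed value.
level_shock = project_ar1(15.0, 0.6, 4.0, HORIZON)
```

The residual shock decays geometrically back to the baseline, whereas the parameter shock shifts the level towards which the series converges (here from 10 to 20), so the two scenarios tell rather different stories about the persistence of losses.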
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control, 19(6), 716–723.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econo-
metrics, 31(3), 307–327.
Box, G., & Jenkins, G. (1970). Time series analysis: Forecasting and control. San Francisco, CA:
Holden-Day.
Box, G. E. P., & Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-
integrated moving average time series models. Journal of the American Statistical Association,
65, 1509–1526.
Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis:
Forecasting and control. New York: Wiley.
Cameron, S. (2005). Making regression analysis more useful, II. Econometrics. Maidenhead:
McGraw Hill Higher Education.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series
with a unit root. Journal of the American Statistical Association, 74(366), 427–431.
Dickey, D. A., & Said, S. E. (1984). Testing for unit roots in autoregressive-moving average models
of unknown order. Biometrika, 71(366), 599–607.
Engle, R. F., & Granger, C. W. J. (1987). Co-integration and error correction: Representation,
estimation and testing. Econometrica, 55(2), 251–276.
Engle, R. F., & Ng, V. K. (1991). Measuring and testing the impact of news on volatility. Journal
of Finance, 48(5), 1749–1778.
Gershenfeld, N. (1999). The nature of mathematical modeling. New York: Cambridge University
Press.
Glosten, L. R., Jagannathan, D. E., & Runkle, D. E. (1993). On the relation between the expected
value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48(5),
1779–1801.
Gray, H., Zhang, N., & Woodward, W. (1989). On generalized fractional processes. Journal of
Time Series Analysis, 10, 233–257.
Guégan, D. (2003). Les chaos en finance. Approche statistique. Paris: Economica.
Hamilton, J. D. (1994). Time series analysis (Vol. 2). Princeton: Princeton University Press.
Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal
of the Royal Statistical Society, Series B, 41(2), 190–195.
Hentschel, L. (1995). All in the family nesting symmetric and asymmetric GARCH models.
Journal of Financial Economics, 39(1), 71–104.
Holland, J. (1992). Adaptation in natural and artificial systems. Cambridge, MA: MIT.
Kroner, K. F., & Lastrapes, W. D. (1993). The impact of exchange rate volatility on international
trade: reduced form estimates using the GARCH-in-mean model. Journal of International
Money and Finance, 12(3), 298–318.
Kwiatkowski, D., Phillips, P. C., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of
stationarity against the alternative of a unit root: How sure are we that economic time series
have a unit root?. Journal of Econometrics, 54(1–3), 159–178.
McCleary, R., Hay, R. A., Meidinger, E. E., & McDowall, D. (1980). Applied time series analysis
for the social sciences. Beverly Hills, CA: Sage.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Economet-
rica, 59(2), 347–370.
Palma, W. (2007). Long-memory time series: Theory and methods. New York: Wiley.
Rabemananjara, R., & Zakoian, J. M. (1993). Threshold ARCH models and asymmetries in
volatility. Journal of Applied Econometrics, 8(1), 31–49.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the
American Statistical Association, 66(336), 846–850.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Sentana, E. (1995). Quadratic ARCH models. The Review of Economic Studies, 62(4), 639–661.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometrica, 48(4), 817–838.
Yule, U., & Walker, G. (1927). On a method of investigating periodicities in disturbed series, with
special reference to Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society
of London, Series A, 226, 267–298.
Zadeh, L. A. (1953). Theory of filtering. Journal of the Society for Industrial and Applied
Mathematics, 1, 35–51.
Zaki, M. J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge
and Data Engineering, 12(3), 372–390.
Chapter 11
Dependencies and Relationships Between
Variables
In this chapter we address the topic of the capture of dependencies, as these are
intrinsically connected to scenario analysis. Indeed, as implied in the previous
chapters, the materialisation of large losses usually results from multiple issues,
faults or failures occurring simultaneously. As seen, in some approaches, the
magnitude of the correlations and the dependencies are not explicitly evaluated
though they are the core of some strategies such as neural networks or Bayesian
networks. Here, we discuss the concepts of correlation and dependencies explicitly,
i.e., these are measured and specific models or functions are built, in order to capture
them and reflect them in risk measurement.
Statistically speaking, a dependence is a relationship between random variables
or data sets (at least two). The related concept of correlation refers to statistical
relationships embedding dependencies. Correlations are useful as they indicate a
relationship that can be exploited in practice for forecasting purposes, for example.
However, statistical dependence does not necessarily imply the presence of a causal
relationship. Besides, issues related to non-linear behaviours may arise. These will
be developed in the following paragraphs.
Formally, dependencies refer to any situation in which random variables do
not satisfy a mathematical condition of probabilistic independence. This may
seem quite obvious, though this definition implies that the emphasis is placed on
independence: if the variables are not independent, they are somehow
dependent. The literature counts several correlation measures and coefficients
(usually denoted by Greek letters such as $\rho$) allowing the degree of these relationships to be evaluated. The
most famous of these is the Pearson (1900) correlation coefficient, which captures linear
relationships between two variables. This measure is usually what practitioners
and risk managers have in mind when the question of correlation is addressed;
the related coefficient takes its values between $-1$ and $1$. Other correlation
coefficients have been developed to address issues related to the Pearson approach,
such as the capture of non-linear relationships and the correlations between more
than two factors simultaneously.
In this chapter, we will present the theoretical foundations of the various concepts
surrounding dependencies—from correlations to copula and regressions, as well
as the characteristics and properties which may help practitioners analysing risk
scenarios. We will also illustrate them with figures and examples.
1
The Cauchy–Schwarz inequality (Dragomir 2003) implies that this correlation coefficient cannot
exceed 1 in absolute value.
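the ranked variables. For example, considering a data sample containing $n$ data
points, the data points $X_i, Y_i$ are ranked and become $x_i, y_i$, and $\rho$ is calculated as
follows:
$$\rho_{(X,Y)} = 1 - \frac{6 \sum \delta_i^2}{n(n^2 - 1)}, \tag{11.1.2}$$
where $\delta_i = x_i - y_i$ is the difference between the two ranks of observation $i$.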
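To make the difference between the linear and the rank-based coefficient concrete, the sketch below computes both on a small monotone but non-linear data set. The function names and the toy data are assumptions, and the rank formula is applied in its simple form, which presumes there are no ties.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Rank correlation: 1 - 6 * sum(delta_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    deltas = [rx - ry for rx, ry in zip(ranks(x), ranks(y))]
    return 1.0 - 6.0 * sum(d * d for d in deltas) / (n * (n * n - 1))


# Monotone but non-linear relationship (hypothetical data): y = x^3.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [v ** 3 for v in x_values]

linear_corr = pearson(x_values, y_values)
rank_corr = spearman(x_values, y_values)
```

The rank correlation equals 1 because the relationship is perfectly monotone, while the Pearson coefficient stays below 1 because the relationship is not linear.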
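$$G = \frac{N_s - N_d}{N_s + N_d}, \tag{11.1.4}$$
with $N_s$ and $N_d$ the numbers of pairs of observations ranked in the same and in the opposite
order, respectively. Its theoretical counterpart is
$$\gamma = \frac{P_s - P_d}{P_s + P_d}, \tag{11.1.5}$$
where $P_s$ and $P_d$ are the probabilities that a random couple of observations will
position itself in the same or opposite order, respectively, when ranked by both
variables.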
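The sample statistic $G$ can be obtained by simply counting concordant and discordant pairs. The sketch below does this with a double loop; the function name and the small controls/losses data set are hypothetical, and tied pairs are ignored, which is the usual convention for this statistic.

```python
def goodman_kruskal_gamma(x, y):
    """G = (Ns - Nd) / (Ns + Nd), counting pairs ranked in the same (Ns)
    or opposite (Nd) order by both variables; tied pairs are skipped."""
    concordant = 0
    discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("all pairs are tied; G is undefined")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical scores: level of controls against level of losses.
controls = [1, 2, 2, 3, 4]
losses = [5, 4, 4, 2, 3]
gamma_statistic = goodman_kruskal_gamma(controls, losses)
```

Here one pair is concordant and eight are discordant, so the statistic is clearly negative, in line with the intuition that stronger controls go with lower losses.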
Critical values for the statistic are obtained using the Student t distribution, as
follows:
$$t \approx G \sqrt{\frac{N_s + N_d}{n(1 - G^2)}}, \tag{11.1.6}$$
where $n$ is the number of observations (not the number of pairs), i.e.,
$$n \neq N_s + N_d, \tag{11.1.7}$$
11.1.2 Regression
[Figure 11.1: pairwise correlation plot of the variables Bonus, Income, Office Hours, Market Volume, Losses, Adventurous Positions, Desk Volume, Experience, Economics and Controls]
Fig. 11.1 This figure shows correlations pair by pair. The circles represent the magnitude of
the correlations. These are equivalent to a correlation matrix, providing a representation of the
Pearson correlations. This figure allows pairwise correlations between various elements
related to rogue trading in the front office to be analysed
2
Sometimes called predictors.
[Figure 11.2: scatterplot of Losses (vertical axis) against Controls (horizontal axis)]
Fig. 11.2 This figure is a scatterplot representing losses with respect to controls. Here, we have
the expected behaviour, i.e., the level of losses decreases when the level of controls increases
[Figure 11.3: 3D scatterplot of Desk Volume, Office Hours and Controls]
Fig. 11.3 This figure is similar to Fig. 11.2, i.e., it is a scatterplot, though compared to the
previous figure, it represents three variables
[Figure 11.4: scatter plot matrix. Variables shown: Office Hours, Losses, Adventurous Positions,
Number of People on the Desk.]
Fig. 11.4 This figure illustrates a scatter plot matrix, plotting pairwise relationships between
components of rogue trading issues
3
The factors have to be linearly independent.
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n. \tag{11.1.9} \]
When a term in $x_i^2$ is added, the expression is still linear in the parameters but is now
quadratic in $x_i$. In both cases, $\varepsilon_i$ is an error term and the subscript i refers to a
particular observation. Multiple linear regressions are built the same way; however, these contain
several independent variables or functions of independent variables.
Fitting the first model to some data, we obtain $\hat\beta_0$ and $\hat\beta_1$, the estimates,
respectively, of $\beta_0$ and $\beta_1$. Equation (11.1.9) becomes
\[ \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i. \tag{11.1.11} \]
\[ SSE = \sum_{i=1}^{n} \hat\varepsilon_i^2, \tag{11.1.12} \]
where $\hat\varepsilon_i = y_i - \hat{y}_i$ is the ith residual.
A set of linear equations in the parameters is solved to obtain $\hat\beta_0, \hat\beta_1$. For a simple
affine regression, the least squares estimates are given by
\[ \hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}, \tag{11.1.13} \]
where $\bar{x}$ represents the mean of the $x_i$ values, and $\bar{y}$ the mean of the $y_i$ values. The
estimate of the variance of the error terms is given by the mean square error (MSE):
\[ \hat\sigma_\varepsilon^2 = \frac{SSE}{n - p}. \tag{11.1.14} \]
These can be used to create confidence intervals and test the parameters.
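The closed-form estimates (11.1.13) and the MSE (11.1.14) can be sketched as follows. This is only an illustration: it assumes p = 2, i.e., that the two estimated parameters of the simple affine model are what is subtracted from n, and the function name and data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least squares fit of y = beta0 + beta1 * x; returns estimates, SSE and MSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    # Residuals are observed minus fitted values.
    residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    mse = sse / (n - 2)  # assumption: p = 2 estimated parameters
    return beta0, beta1, sse, mse


# Illustrative data only: losses decreasing with the level of controls.
controls = [1, 2, 3, 4]
losses = [2.1, 1.4, 1.1, 0.4]
beta0, beta1, sse, mse = fit_simple_regression(controls, losses)
```

For these four points the fitted line is approximately 2.6 − 0.54 x, with SSE close to 0.032 and MSE close to 0.016.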
The previous regression models can be generalised. Indeed, the general multiple
regression model contains p independent variables:
where xij is the ith observation on the jth independent variable. The residuals can be
written as
Another very popular regression widely used in risk management is the logistic
regression which has a categorical dependent variable (Cox 1958 and Freedman
2009). The logistic model is used to estimate the probability of a binary response
based on some predictor(s), i.e., 0 or 1.
The logistic regression measures the relationship between the categorical depen-
dent variable and some independent variable(s), estimating the probabilities using
the c.d.f. of the logistic distribution. The residuals of this model are logistically
distributed.
The logistic regression is a particular case of the generalised linear model and
thus analogous to the linear regression presented earlier. However, the underlying
assumptions are different from those of the linear regression. Indeed, the conditional
distribution y j x is a Bernoulli distribution rather than a Gaussian distribution,
because the dependent variable is binary, and the predicted values are probabilities
and are therefore restricted to the interval [0, 1].
The logistic regression can be binomial, ordinal or multinomial. In a binomial
logistic regression only two possible outcomes can be observed for a dependent
variable. In a multinomial logistic regression we may have more than two possible
outcomes. In an ordinal logistic regression the dependent variables are ordered.
The logistic regression is traditionally used to predict the odds of obtaining “true”
(1) to the binary question based on the values of the independent variables. The odds
are given by the ratio of the probability of obtaining a positive outcome to the
probability of obtaining “false” (0).
As implied previously, here, most assumptions of the linear regression do not
hold. Indeed, the residuals cannot be normally distributed. Furthermore, linear
regression may lead to predictions making no sense for a binary dependent variable.
To convert a binary variable into a continuous one which may take any real value,
the logistic regression uses the odds of the event happening for different levels of
each independent variable, the ratio of those odds and then takes the logarithm of
that ratio. This function is usually referred to as logit.
The logit function is then fitted to the predictors using linear regression analysis.
The predicted value of the logit is then transformed into predicted odds using
the inverse of the natural logarithm, i.e., the exponential function. Although the
observed dependent variable in a logistic regression is a binary variable, the related
odds are continuous.
The logistic regression can be translated into finding the set of ˇ parameters that
best fit:
4
The associated latent variable is $y' = \beta_0 + \beta_1 x + \varepsilon$. Note that $\varepsilon$ is not observed; consequently $y'$
is not observed.
Formalising the concept presented before, the logistic function $\sigma(t)$ is defined as
follows:
\[ \sigma(t) = \frac{e^t}{e^t + 1} = \frac{1}{1 + e^{-t}}. \tag{11.1.21} \]
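Taking t as a linear function of the predictor x,
\[ t = \beta_0 + \beta_1 x, \tag{11.1.22} \]
the logistic function can be written as
\[ F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}. \tag{11.1.23} \]
The inverse of the logistic function, g, is then
\[ g(F(x)) = \ln\left(\frac{F(x)}{1 - F(x)}\right) = \beta_0 + \beta_1 x, \tag{11.1.24} \]
and, after exponentiating both sides,
\[ \frac{F(x)}{1 - F(x)} = e^{\beta_0 + \beta_1 x}. \tag{11.1.25} \]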
$g(\cdot)$ is the logit function. Here $g(F(x))$ is equivalent to the linear regression
expression, ln denotes the natural logarithm, and $F(x)$ is the probability that the
dependent variable equals “true” considering a linear combination of the predictors.
$F(x)$ shows that the probability of the dependent variable representing a success is
equal to the value of the logistic function of the linear regression expression. $\beta_0$ is
the intercept from the linear regression equation (the value of the criterion when
the predictor is equal to zero). $\beta_1$ is the regression coefficient multiplying the
predictor x, and e denotes the exponential function.
From above we can conclude that the odds of the dependent variable leading to
a success are given by $e^{\beta_0 + \beta_1 x}$, as in (11.1.25).
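The relationships between the logistic function, the logit and the odds can be checked numerically. The sketch below is illustrative only: the coefficient values are arbitrary assumptions and the function names are hypothetical.

```python
import math


def logistic(t):
    """Logistic function of (11.1.21)."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds: natural logarithm of p / (1 - p)."""
    return math.log(p / (1.0 - p))


def success_probability(x, beta0, beta1):
    """F(x): probability that the binary response equals 1."""
    return logistic(beta0 + beta1 * x)


def odds(x, beta0, beta1):
    """Odds of success, F(x) / (1 - F(x)) = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


beta0, beta1 = -1.0, 0.5          # arbitrary illustrative coefficients
p = success_probability(2.0, beta0, beta1)
odds_at_2 = odds(2.0, beta0, beta1)
# A one-unit increase in x multiplies the odds by exp(beta1).
odds_ratio = odds(3.0, beta0, beta1) / odds_at_2
expected_ratio = math.exp(beta1)
```

With these coefficients the linear predictor is zero at x = 2, so the probability is 0.5 and the odds are 1. The odds ratio for a one-unit increase in x is exp(β1), which is the usual way a fitted coefficient is read.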
11.1.3 Copula
While in the first section we have measured the dependence, in the second we have
captured the impact of a variable on another, in this section, we propose building
multivariate functions.
Following (Guegan and Hassani 2013), a robust way to measure the dependence
between large data sets is to compute their joint distribution function using copula
functions. Indeed, a copula is a multivariate distribution function linking large
data sets through their standard uniform marginal distributions (Sklar 1959; Bedford
and Cooke 2001; Berg and Aas 2009). The literature often states that the use
of copulas is complicated in high dimensions except when implementing elliptic
structures (Gaussian or Student) (Gourier et al. 2009). However, they fail to capture
asymmetric shocks. For example, using a Student copula with three degrees of
freedom5 to capture a dependence between the largest losses (as implied by the
regulation (EBA 2014)), would also be translated into higher correlations between
the smallest losses. An alternative is found in Archimedean copulas (Joe 1997)
which are interesting as they are able to capture the dependence embedded in
different parts of the marginal distributions. The marginal distributions might be
those presented in Chap. 5. However, as soon as we are interested in measuring
a dependence between more than two sets (Fig. 11.4), the use of this class of
copulas becomes limited as these are usually driven by a single parameter. Therefore
traditional estimation methods may fail to capture the intensity of the “true”
dependence. Therefore, a large number of multivariate Archimedean structures have
been developed, for instance, the fully nested structures, the partially nested copulas
and the hierarchical ones. Nevertheless, all these structures have restrictions on the
parameters and impose only using an Archimedean copula at each node (junction)
making their use limited in practice. Indeed, the parameters have to decrease as the
level of nesting increases.
An intuitive approach proposed by Joe (1997), based on a pair-copula decom-
position, might be implemented (Kurowicka and Cooke 2004; Dissmann et al.
2013). This approach rewrites the n-density function associated with the n-copula,
as a product of conditional marginal and copula densities. All the conditioning
pair densities are built iteratively to get the final one representing the complete
dependence structure. The approach is easy to implement,6 and has no restriction
for the choice of functions and their parameters. Its only limitation is the number
of decompositions we have to consider as the number of vines grows exponentially
with the dimension of the data sample and thus requires the user to select a vine
5
A low number of degrees of freedom implies a higher dependence in the tail of the marginal
distributions.
6
Recent packages have been developed to carry out this approach, for instance the R package
VineCopula (Schepsmeier et al. https://github.com/tnagler/VineCopula) and the R package vines
(Gonzalez-Fernandez et al. https://github.com/yasserglez/vines).
from $n!/2$ possible vines (Antoch and Hanousek 2000; Bedford and Cooke 2002;
Brechmann et al. 2012; Guégan and Maugis 2011).
To be more accurate, the formal representation of copulas is defined in the
following way. Let $X = [X_1, X_2, \ldots, X_n]$ be a vector of random variables, with joint
distribution F and marginal distributions $F_1, F_2, \ldots, F_n$; then Sklar's (1959) theorem
ensures the existence of a function C mapping the individual distributions $F_1, \ldots, F_n$
to the joint one F:
\[ F(x_1, \ldots, x_n) = C(F_1(x_1), \ldots, F_n(x_n)). \]
where
and $c_{1,2}(F(x_1), F(x_2))$ is the copula density associated with the copula C which links
the two marginal distributions $F(x_1)$ and $F(x_2)$. With the same notations we have
Then,
That last formula is called vine decomposition (Fig. 11.7). Many other decompo-
sitions are possible using different permutations. Details can be found in Berg and
Aas (2009), Guégan and Maugis (2010) and Dissmann et al. (2013).
In the applications below, we focus on these vine copulas and in particular the
D-vine whose density $f(x_1, \ldots, x_n)$ may be written as,
\[ f(x_1, \ldots, x_n) = \prod_{k=1}^{n} f(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{i,i+j|i+1,\ldots,i+j-1}\big(F(x_i|x_{i+1}, \ldots, x_{i+j-1}), F(x_{i+j}|x_{i+1}, \ldots, x_{i+j-1})\big). \tag{11.1.28} \]
Other vines exist such as the C-vine:
\[ f(x_1, \ldots, x_n) = \prod_{k=1}^{n} f(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{j,j+i|1,\ldots,j-1}\big(F(x_j|x_1, \ldots, x_{j-1}), F(x_{j+i}|x_1, \ldots, x_{j-1})\big), \tag{11.1.29} \]
where index j identifies the trees, while i runs over the edges in each tree (Figs. 11.5,
11.6, and 11.7).
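To make the index structure of the D-vine and C-vine products more concrete, the following sketch enumerates the pair-copulas appearing in each decomposition. It only lists indices and fits nothing. The function names are hypothetical, and the C-vine indices are taken from the arguments $x_j$ and $x_{j+i}$ of (11.1.29).

```python
def d_vine_pairs(n):
    """Pairs (i, i+j | i+1, ..., i+j-1) of a D-vine on n variables."""
    pairs = []
    for j in range(1, n):                # j identifies the tree
        for i in range(1, n - j + 1):    # i runs over the edges of tree j
            conditioning = tuple(range(i + 1, i + j))
            pairs.append((i, i + j, conditioning))
    return pairs


def c_vine_pairs(n):
    """Pairs (j, j+i | 1, ..., j-1) of a C-vine on n variables."""
    pairs = []
    for j in range(1, n):
        for i in range(1, n - j + 1):
            conditioning = tuple(range(1, j))
            pairs.append((j, j + i, conditioning))
    return pairs


d_pairs = d_vine_pairs(4)
c_pairs = c_vine_pairs(4)
```

For n = 4 both structures contain n(n − 1)/2 = 6 pair-copulas. The last D-vine pair is (1, 4 | 2, 3), whereas the last C-vine pair is (3, 4 | 1, 2).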
[Diagram: $C_{abcd}(C_{abc}, u_d)$, built on $C_{abc}(C_{ab}, u_c)$, built on $C_{ab}(u_a, u_b)$, over $u_a, u_b, u_c, u_d$.]
[Diagram: $C_{abcd}(C_{ab}, C_{cd})$, built on $C_{ab}(u_a, u_b)$ and $C_{cd}(u_c, u_d)$, over $u_a, u_b, u_c, u_d$.]
[Diagram: $C_{abc}(C_{ab}, C_{bc})$, built on $C_{ab}(u_a, u_b)$ and $C_{bc}(u_b, u_c)$, over $u_a, u_b, u_c$.]
Fig. 11.7 Three-dimensional D-vine illustration: it represents another kind of structure we could
have considering a decomposition similar to (11.1.27), considering the CDFs
[Figure 11.8: four scatter plots of copula samples on the unit square, with axes X and Y each
running from 0.0 to 1.0. Panels: Gumbel (top left), Galambos (top right), Gaussian (bottom left),
Clayton (bottom right).]
Fig. 11.8 This figure represents four types of copulas. Starting from the top left-hand corner,
the Gumbel copula, which is an Archimedean copula, is upper tail dependent. The top right-hand
corner copula is the Galambos, an extreme value copula. The bottom left-hand corner represents
the Gaussian copula belonging to the elliptic family. Mathematically, this copula is obtained by
evaluating the multivariate Gaussian distribution at the inverses of the univariate Gaussian
c.d.f.s. The last one represents the Clayton copula, an Archimedean copula which is lower tail
dependent
Some usual copulas (Fig. 11.8) are provided in the following (Ali et al. 1978; Joe
1997; Nelsen 2006):
• Gaussian: $C^{\Phi}_{\Sigma}(u) = \Phi_{\Sigma}(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d))$, $\Sigma$ being a correlation matrix.
• Student-t: $C^{t}_{\Sigma,\nu}(u) = t_{\Sigma,\nu}(t_{\nu}^{-1}(u_1), \ldots, t_{\nu}^{-1}(u_d))$, $\Sigma$ being a correlation matrix
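As an illustration of the Gaussian copula above, the sketch below draws pairs from its bivariate version: two correlated standard normal variables are generated and each is mapped to the unit interval through the standard normal c.d.f. This is a minimal sketch restricted to two dimensions; the function name, the correlation value and the seed are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, size, seed=0):
    """Draw `size` pairs (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0 and standard deviation 1
    samples = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal c.d.f. turns each margin into a uniform on [0, 1].
        samples.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return samples


pairs = sample_gaussian_copula(rho=0.8, size=2000, seed=1)
```

Arbitrary marginal distributions can then be imposed by applying their quantile functions to each uniform coordinate, while the dependence structure remains that of the Gaussian copula.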
In this section, we discuss points to bear in mind when these methodologies are
implemented. Following the structure of the chapter, we start with the correlation
coefficients, in particular the most commonly used one, the Pearson correlation,
which measures the strength of the linear association between two variables.
The first point of interest is that outliers can heavily influence linear correlation
coefficients and may lead to spurious correlations between two quantitative
variables. Besides, Pearson’s correlation relates to covariances, i.e., variables moving
together, but this does not mean that a real relationship exists.
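The effect of a single outlier can be checked on a small constructed example. The sketch below is an illustration under assumed data: twenty points spread evenly on the unit circle (no linear association), to which one extreme observation at (10, 10) is added. The helper `pearson` and the data are hypothetical and only serve the illustration.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 20 points evenly spread on the unit circle: no linear association at all.
n_points = 20
x = [math.cos(2 * math.pi * k / n_points) for k in range(n_points)]
y = [math.sin(2 * math.pi * k / n_points) for k in range(n_points)]
r_clean = pearson(x, y)

# The same sample with one extreme observation added at (10, 10).
r_outlier = pearson(x + [10.0], y + [10.0])

print(f"without outlier:  {r_clean:.3f}")
print(f"with one outlier: {r_outlier:.3f}")
```

With these data the coefficient moves from essentially zero to roughly 0.9, although twenty of the twenty-one observations carry no linear relationship.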
Moreover, the correlation coefficient is a numerical way to quantify the relationship
between two variables and always lies between −1 and 1, thus $-1 \le \rho \le 1$. Larger
correlation coefficients in absolute value, i.e., closer to −1 or 1, suggest a stronger
relationship between the variables, whilst values closer to 0 suggest a weaker one.
This makes the outcomes easy to interpret.
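The bound can be justified in one line. Assuming that $\rho$ denotes Pearson's coefficient of two variables $X$ and $Y$ with finite, non-zero variances (the notation is an assumption made here), the Cauchy–Schwarz inequality applied to the centred variables gives:

```latex
\[
|\operatorname{Cov}(X,Y)| \le \sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}
\quad\Longrightarrow\quad
|\rho| = \frac{|\operatorname{Cov}(X,Y)|}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} \le 1 .
\]
```

Equality holds when one variable is an affine function of the other, which is why the endpoints −1 and 1 can be attained.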
It is important to remember that correlation coefficients do not imply causality.
If two variables are strongly correlated, it does not mean that the first is responsible
for the other’s occurrence and conversely.
Turning to the performance of regression analysis methods in practice, this
depends on the data-generating process and on the model used to represent
it. As the first component, i.e., the data-generating process, is usually unknown,
the appropriateness of the regression analysis depends on the assumptions made
regarding this process. These are sometimes verifiable if enough data are available.
Regression models for prediction are often useful even when the assumptions are
moderately violated, although they may not perform optimally, but we should
beware of the misleading results that may arise in these situations.
Sensitivity analysis, such as varying the initial assumptions, may help
measure the usefulness of the model and its applicability.
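One simple form of sensitivity check can be sketched as follows. The text does not prescribe a specific procedure, so leave-one-out refitting is swapped in here as an assumption: a simple linear regression is refitted once per observation, each time leaving that observation out, and the spread of the resulting slopes indicates how much the fit depends on individual data points. The function names and the data are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_out_slopes(x, y):
    """Refit the line once per observation, each time leaving that one out."""
    return [
        ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[0]
        for i in range(len(x))
    ]


# Hypothetical data: roughly y = 2x, except for an anomalous last observation.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]

full_slope, _ = ols_fit(x, y)
loo = leave_one_out_slopes(x, y)
print(f"slope on all data: {full_slope:.2f}")
print(f"leave-one-out slopes range from {min(loo):.2f} to {max(loo):.2f}")
```

A wide range of leave-one-out slopes, as produced here by the last observation, signals that the conclusions drawn from the model should be treated with caution.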
Now focusing on the use of copulas, it is important to understand that, though
they are powerful tools, they are not a panacea. Some would actually argue that
the application of the Gaussian copula to CDOs acted as a catalyst in the spreading
of the sub-prime crisis, even though attempts were made to address the limitations
of copula functions, such as the lack of dependence dynamics and the poor
representation of extreme events.
Note that Gaussian and Student copulas have another problem: despite being
widely used, they are symmetric structures, i.e., if we have asymmetric negative
shocks, these will automatically be mirrored on the other side. In other words, if
only large negative events have a tendency to occur simultaneously, the structure
will also consider that large positive events occur simultaneously, which, as
mentioned previously, might not be the case.
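The symmetry issue can be made visible by simulation. The sketch below, under assumed parameters, compares the frequency of joint lower-tail and joint upper-tail events for a bivariate Gaussian copula and for a Clayton copula, the latter sampled by conditional inversion. The function names, the correlation of 0.7, the Clayton parameter of 2, the 10% tail threshold and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_pairs(rho, n, rng):
    """Bivariate Gaussian copula sample via correlated standard Gaussians."""
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), phi(z2)))
    return pairs


def clayton_copula_pairs(theta, n, rng):
    """Bivariate Clayton copula sample (theta > 0) by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids a zero base below
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def joint_tail_frequencies(pairs, q=0.1):
    """Share of pairs with both coordinates below q, and both above 1 - q."""
    n = len(pairs)
    lower = sum(1 for u, v in pairs if u < q and v < q) / n
    upper = sum(1 for u, v in pairs if u > 1 - q and v > 1 - q) / n
    return lower, upper


rng = random.Random(123)
gauss_lower, gauss_upper = joint_tail_frequencies(gaussian_copula_pairs(0.7, 20000, rng))
clay_lower, clay_upper = joint_tail_frequencies(clayton_copula_pairs(2.0, 20000, rng))
print(f"Gaussian: lower {gauss_lower:.3f}, upper {gauss_upper:.3f}")
print(f"Clayton:  lower {clay_lower:.3f}, upper {clay_upper:.3f}")
```

The Gaussian copula yields nearly identical frequencies in both tails, whereas the Clayton copula concentrates joint events in the lower tail, which is the kind of asymmetry a symmetric structure cannot reproduce.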
The alternatives briefly presented in this chapter are not necessarily easier to use, as the
parametrisation might be complicated.
Further to the brief discussions of the methodologies presented, and as they are
related to the analysis of correlations, we thought it might be of interest to briefly
address and illustrate exploratory data analysis methodologies, for instance,
the principal component analysis (PCA) and the correspondence analysis (CA).
PCA (Jolliffe 2002) is an orthogonal linear transformation of the data. These are
transferred to a new set of coordinates, ranked by the variance of each component, such
that the component with the largest variance is represented on the first axis, the
second largest variance on the second axis, and so on. On the other
hand, correspondence analysis (CA) is a multivariate statistical technique (Hair
2010; Hirschfeld 1935; Benzécri 1973) which is similar to principal component
analysis, but applies to categorical rather than continuous data. Like PCA, it allows
a set of data to be represented in a two-dimensional graphical form.
In other words, these methodologies break down existing dependencies in large
data sets. Basically, they group together highly correlated variables.
Though the accuracy is reduced, the simplification and the dimension reduction
make the outcome usable in practice. These methodologies are illustrated in
Figs. 11.9 and 11.10.
These approaches may be very useful to break down a set of correlated variables
into linearly uncorrelated variables, making them ready for further analysis. This may
help practitioners reduce the number of variables to be analysed, focusing only on
the most important ones while reducing the noise.
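The decorrelation property can be sketched on a toy case. The example below is restricted to two variables so that the eigen decomposition of the covariance matrix can be written in closed form; it is a minimal illustration, not the procedure used to produce the figures. The function names, the simulated data and the seed are assumptions.

```python
import math
import random


def covariance(x, y):
    """Sample covariance of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def pca_two_variables(x, y):
    """PCA of two variables; assumes their sample covariance is non-zero."""
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    # Eigenvalues of the covariance matrix [[a, b], [b, c]], largest first.
    centre, radius = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    variances = (centre + radius, centre - radius)
    # Unit eigenvector of the largest eigenvalue, and the orthogonal direction.
    norm = math.hypot(b, variances[0] - a)
    axis1 = (b / norm, (variances[0] - a) / norm)
    axis2 = (-axis1[1], axis1[0])
    # Scores: centred data projected on the two new axes.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    scores1 = [(p - mx) * axis1[0] + (q - my) * axis1[1] for p, q in zip(x, y)]
    scores2 = [(p - mx) * axis2[0] + (q - my) * axis2[1] for p, q in zip(x, y)]
    return variances, (axis1, axis2), (scores1, scores2)


# Hypothetical data: two strongly correlated variables.
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.8 * xi + 0.3 * rng.gauss(0.0, 1.0) for xi in x]

variances, axes, (scores1, scores2) = pca_two_variables(x, y)
print("share of variance on the first axis:", variances[0] / sum(variances))
print("covariance between the two components:", covariance(scores1, scores2))
```

The two components are uncorrelated by construction, and when the first one carries most of the variance the second can be dropped with little loss, which is the dimension reduction described above.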
[Figure 11.9: PCA biplot on the axes Comp.1 and Comp.2. The traders T1 to T32 are plotted as points; the variables shown include Bonus, Controls, Income, Economics, Experience, Desk Volume, Losses, Number of People on the Desk and Adventurous Positions.]
Fig. 11.9 This figure represents a PCA providing an analysis of a rogue trading exposure. Each
trader is characterised by a value in each field
[Figure 11.10: correspondence analysis map; horizontal axis Dimension 1 (4%), vertical axis Dimension 2 (4%); the numeric point labels are not reproduced.]
Fig. 11.10 This figure represents a CA providing an analysis of a rogue trading exposure
References
Ali, M. M., Mikhail, N. N., & Haq, M. S. (1978). A class of bivariate distributions including the
bivariate logistic. Journal of Multivariate Analysis, 8, 405–412.
Antoch, J., & Hanousek, J. (2000). Model selection and simplification using lattices. CERGE-EI
Working Paper Series (164).
Bedford, T., & Cooke, R. M. (2001). Probability density decomposition for conditionally
dependent random variables modeled by vines. Annals of Mathematics and Artificial Intelligence, 32,
245–268.
Bedford, T., & Cooke, R. (2002). Vines: A new graphical model for dependent random variables.
The Annals of Statistics, 30(4), 1031–1068.
Benzécri, J.-P. (1973). L’Analyse des Données. Volume II: L’Analyse des Correspondances. Paris:
Dunod.
Berg, D., & Aas, K. (2009). Models for construction of multivariate dependence - a comparison
study. The European Journal of Finance, 15, 639–659.
Brechmann, E. C., Czado, C., & Aas, K. (2012). Truncated regular vines in high dimensions with
application to financial data. Canadian Journal of Statistics, 40(1), 68–85.
Capéraà, P., Fougères, A. L., & Genest, C. (2000). Bivariate distributions with given extreme value
attractor. Journal of Multivariate Analysis, 72, 30–49.
Chatterjee, S., & Hadi, A. S. (2015). Regression analysis by example. New York: Wiley.
Cox, D. R. (1958). The regression analysis of binary sequences (with discussion). Journal of Royal
Statistical Society B, 20, 215–242.
Dissmann, J., Brechmann, E. C., Czado, C., & Kurowicka, D. (2013). Selecting and estimating
regular vine copulae and application to financial returns. Computational Statistics & Data
Analysis, 59, 52–69.
Dowdy, S., Wearden, S., & Chilko, D. (2011). Statistics for research (Vol. 512). New York: Wiley.
Dragomir, S. S. (2003). A survey on Cauchy–Bunyakovsky–Schwarz type discrete inequalities.
JIPAM - Journal of Inequalities in Pure and Applied Mathematics, 4(3), 1–142.
EBA. (2014). Draft regulatory technical standards on assessment methodologies for the advanced
measurement approaches for operational risk under Article 312 of Regulation (EU) No 575/2013.
London: European Banking Authority.
Freedman, D. A. (2009). Statistical models: Theory and practice. Cambridge: Cambridge
University Press.
Galambos, J. (1978). The asymptotic theory of extreme order statistics. Wiley series in probability
and mathematical statistics. New York: Wiley.
Gonzalez-Fernandez, Y., Soto, M., & Meys, J. https://github.com/yasserglez/vines.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications.
Journal of the American Statistical Association, 49(268), 732–764.
Gourier, E., Farkas, W., & Abbate, D. (2009). Operational risk quantification using extreme value
theory and copulas: from theory to practice. The Journal of Operational Risk, 4, 1–24.
Guégan, D., & Maugis, P.-A. (2010). New prospects on vines. Insurance Markets and Companies:
Analyses and Actuarial Computations, 1, 4–11.
Guégan, D., & Maugis, P.-A. (2011). An econometric study for vine copulas. International Journal
of Economics and Finance, 2(1), 2–14.
Guégan, D., & Hassani, B. K. (2013). Multivariate VaRs for operational risk capital computation:
a vine structure approach. International Journal of Risk Assessment and Management, 17(2),
148–170.
Hair, J. F. (2010). Multivariate data analysis. Pearson College Division.
Hirschfeld, H. O. (1935). A connection between correlation and contingency. Proceedings of
Cambridge Philosophical Society, 31, 520–524.
Joe, H. (1997). Multivariate models and dependence concepts. Monographs on statistics, applied
probability. London: Chapman and Hall.
Jolliffe, I. (2002). Principal component analysis. New York: Wiley.
Kendall, M. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–89.
Kurowicka, D., & Cooke, R. M. (2004). Distribution-free continuous Bayesian belief nets.
In Fourth international conference on mathematical methods in reliability methodology and
practice. Santa Fe, New Mexico.
Kutner, M. H., Nachtsheim, C. J., & Neter, J. (2004). Applied linear regression models (4th ed.).
Boston: McGraw-Hill/Irwin.
Mendes, B., de Melo, E., & Nelsen, R. (2007). Robust fits for copula models. Communications in
Statistics: Simulation and Computation, 36, 997–1017.
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression: A second course in statistics.
Addison-Wesley series in behavioral science: Quantitative methods. Reading, MA:
Addison-Wesley.
Nelsen, R. B. (2006). An introduction to copulas. Springer series in statistics. Berlin: Springer.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case
of a correlated system of variables is such that it can be reasonably supposed to have arisen
from random sampling. Philosophical Magazine Series 5, 50(302), 157–175.
Schepsmeier, U., Stoeber, J., Brechmann, E. C., Graeler, B., Nagler, T., Erhardt, T., et al.
https://github.com/tnagler/VineCopula.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publication Institute of
Statistics, 8, 229–231.
Spearman, C. (1904). The proof and measurement of association between two things. American
Journal of Psychology, 15, 72–101.