
Bertrand K. Hassani

Scenario Analysis in Risk Management
Theory and Practice in Finance

Dr. Bertrand K. Hassani
Global Head of Research and
Innovation - Risk Methodology
Grupo Santander
Madrid, Spain

Associate Researcher
Université Paris 1 Panthéon Sorbonne
Labex ReFi
Paris, France

The opinions, ideas and approaches expressed or presented are those of the author and do
not necessarily reflect Santander’s position. As a result, Santander cannot be held responsible
for them. The values presented are just illustrations and do not represent Santander losses,
exposures or risks.

ISBN 978-3-319-25054-0    ISBN 978-3-319-25056-4 (eBook)
DOI 10.1007/978-3-319-25056-4

Library of Congress Control Number: 2016950567

© Springer International Publishing Switzerland 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG Switzerland
To my sunshines, Lila, Liam and Jihane
To my parents, my brother, my family, my
friends and my colleagues without whom I
would not be where I am
To Pr. Dr. Dominique Guégan, who believed
in me...
Preface

The objective of this book is to show that scenario analysis in financial institutions
can be addressed in various ways depending on what we would like to achieve. No
single method is better than the others; some methods are simply more appropriate
in particular situations.
I have heard opinionated people select one scenario strategy over another so many
times simply because everyone else was doing it; that is not an appropriate rationale,
and it may lead to selecting an unsuitable methodology and consequently to unusable
results. Even worse, the managers may lose faith in the process and tell everyone
that scenario analysis for risk management is useless.
Therefore, in this book, I present various approaches to performing scenario
analysis; some rely on quantitative techniques, others are more qualitative, but once
again none of them is better than another. Each has pros and cons, and the choice
depends on the maturity of your risk framework, the type of risk the bank is willing
to assess and manage, and the information available. I have tried to present them as
simply as possible and to keep only the essence of the methodologies, since in any
case the managers will eventually have to fine-tune them and make the approaches
their own. I hope this book will inspire them. One of my objectives was also to make
supposedly complicated methodologies accessible to any risk manager: readers need
only a basic understanding of mathematics.
Note that I have implemented all the methodologies presented in this book, and
all the figures presented are my own. Most of them have been implemented in
professional environments to answer practical issues. Therefore, I am giving risk
managers some tools to address scenario analysis, I am providing leads for
researchers to start proposing solutions, and I hope that the clear perspective of
combining these methodologies will lead to future academic and professional
developments.


As failures of risk management related to failures of scenario analysis programmes
may have disastrous impacts, note that all the proceeds of this book are going to
charities to contribute to the relief of suffering people.

Bertrand K. Hassani
Global Head of Research and Innovation - Risk Methodology
Grupo Santander
Madrid, Spain

Associate Researcher
Université Paris 1 Panthéon Sorbonne
Labex ReFi
Paris, France
Biography

Bertrand is a risk measurement and management specialist (Credit, Market,
Operational, Liquidity, Counterparty, etc.) for SIFIs. He is also an active associate
researcher at Paris Panthéon-Sorbonne University. He has written several articles
dealing with risk measures, risk modelling and risk management. He is still studying
to obtain the D.Sc. degree (French H.D.R.). He spent two years working in the
bond/structured notes market (Eurocorporate), four years in the banking industry in
a risk management/modelling department (BPCE) and one year as a Senior Risk
Consultant (Aon-AGRC within Unicredit in Milan). He is currently working for
Santander, where he successively held the Head of Major Risk Management position
(San UK), the Head of Change and Consolidated Risk Management position
(San UK) and the Global Head of Advanced and Alternative Analytics position
(Grupo Santander), and is now Global Head of Research and Innovation
(Grupo Santander) for the risk division. In this role, Bertrand aims at developing
novel approaches to measure risk (financial and non-financial) and at integrating
them in the decision-making process of the bank (business-orientated convoluted
risk management), relying on methodologies coming from the field of data science
(data mining, machine learning, frequentist statistics, A.I., etc.).

Contents

1 Introduction
   1.1 Is this War?
   1.2 Scenario Planning: Why, What, Where, How, When...
   1.3 Objectives and Typology
   1.4 Scenario Pre-requirements
   1.5 Scenarios, a Living Organism
   1.6 Risk Culture
   References
2 Environment
   2.1 The Risk Framework
   2.2 The Risk Taxonomy: A Base for Story Lines
   2.3 Risk Interactions and Contagion
   2.4 The Regulatory Framework
   References
3 The Information Set: Feeding the Scenarios
   3.1 Characterising Numeric Data
      3.1.1 Moments
      3.1.2 Quantiles
      3.1.3 Dependencies
   3.2 Data Sciences
      3.2.1 Data Mining
      3.2.2 Machine Learning and Artificial Intelligence
      3.2.3 Common Methodologies
   References
4 The Consensus Approach
   4.1 The Process
   4.2 In Practice
      4.2.1 Pre-workshop
      4.2.2 The Workshops
   4.3 For the Manager
      4.3.1 Sponsorship
      4.3.2 Buy-In
      4.3.3 Validation
      4.3.4 Sign-Offs
   4.4 Alternatives and Comparison
   References
5 Tilting Strategy: Using Probability Distribution Properties
   5.1 Theoretical Basis
      5.1.1 Distributions
      5.1.2 Risk Measures
      5.1.3 Fitting
      5.1.4 Goodness-of-Fit Tests
   5.2 Application
   5.3 For the Manager: Pros and Cons
      5.3.1 Implementation
      5.3.2 Distribution Selection
      5.3.3 Risk Measures
   References
6 Leveraging Extreme Value Theory
   6.1 Introduction
   6.2 The Extreme Value Framework
      6.2.1 Fisher–Tippett Theorem
      6.2.2 The GEV
      6.2.3 Building the Data Set
      6.2.4 How to Apply It?
   6.3 Summary of Results Obtained
   6.4 Conclusion
   References
7 Fault Trees and Variations
   7.1 Methodology
   7.2 In Practice
      7.2.1 Symbols
      7.2.2 Construction Steps
      7.2.3 Analysis
      7.2.4 For the Manager
      7.2.5 Calculations: An Example
   7.3 Alternatives
      7.3.1 Failure Mode and Effects Analysis
      7.3.2 Root Cause Analysis
      7.3.3 Why-Because Strategy
      7.3.4 Ishikawa's Fishbone Diagrams
      7.3.5 Fuzzy Logic
   References
8 Bayesian Networks
   8.1 Introduction
   8.2 Theory
      8.2.1 A Practical Focus on the Gaussian Case
      8.2.2 Moving Towards an Integrated System: Learning
   8.3 For the Managers
   References
9 Artificial Neural Network to Serve Scenario Analysis Purposes
   9.1 Origins
   9.2 In Theory
   9.3 Learning Algorithms
   9.4 Application
   9.5 For the Manager: Pros and Cons
   References
10 Forward-Looking Underlying Information: Working with Time Series
   10.1 Introduction
   10.2 Methodology
      10.2.1 Theoretical Aspects
      10.2.2 The Models
   10.3 Application
   References
11 Dependencies and Relationships Between Variables
   11.1 Dependencies, Correlations and Copulas
      11.1.1 Correlations Measures
      11.1.2 Regression
      11.1.3 Copula
   11.2 For the Manager
   References

Index

Chapter 1
Introduction

1.1 Is this War?

Scenarios have been used for years in many areas (economics, military, aeronautics,
public health, etc.) and are far from being limited to the financial industry. A scenario
is a postulated sequence or development of events, a summary of the plot of a play,
including information about its stakeholders, characters, locations, scenes, weather,
etc., i.e., anything that could contribute to making it more realistic. One of the key
aspects of scenario analysis is that, starting from one set of assumptions, it is possible
to evaluate and map various outcomes of a particular situation. While in this book
we limit ourselves to the financial industry for our applications and examples, it
would be a serious mistake not to draw inspiration from other industries in terms of
methodologies, procedures or regulations.
Indeed, to illustrate the importance of scenario analysis in our world, let's start
with famous historical examples combining geopolitics and military strategy. The
greatest leaders in the history of mankind based their decisions on the outcomes of
scenarios: the Pearl Harbor attack was one of the outcomes of the scenario analysed
by Commanders Mitsuo Fuchida and Minoru Genda, whose objective was to make
US naval forces inoperative for at least 6 months (Burbeck, 2013), and Sir Winston
Churchill analysed the possibility of attacking the Soviet Union with the Americans
and West Germans as allies after World War II (Operation Unthinkable, Lewis 2008).
Scenarios are a very useful and powerful tool to analyse potential future outcomes
and prepare ourselves for them. From a counter-terrorism point of view, the
protection scheme of nuclear plants against terrorist attacks is clearly the result of a
scenario analysis; for example, in France squadrons of fighter pilots are ready to take
off and intercept an airborne potential threat in less than 15 min. It is also really
important to understand that the risk assessment resulting from a scenario analysis
may result in the acceptance of that risk. The nuclear plant located in Fessenheim,
next to the Swiss border, was built in a seismic area, but the authorities came to the
conclusion that the risk was acceptable; besides, it is one of the oldest nuclear plants
in France, and one may think that the likelihood of a failure and age are correlated.[1]

[1] The idea behind these examples is neither to generate any controversy nor to feed
any conspiracy theory but to refer to examples which should speak to the largest
number of readers.

In the military, most equipment is the result of field experience, scenarios or past
failures. In many industries, however, contrary to the financial sector, there is no
opportunity to wait for a failure in order to identify an issue, fix it and learn from it:
in industries such as aeronautics or pharmaceuticals, if a failure occurs or a faulty
product is released, people's lives are at risk.
Focusing now on scenario analysis within financial institutions, it usually takes one
of the following forms. The first form is stress testing (Rebonato, 2010). Stress
testing aims at assessing multiple outcomes resulting from adverse stories of
different magnitudes, for instance likely, mild and worst-case scenarios relying on
macroeconomic variables. Indeed, it is quite frequent to analyse a particular situation
with respect to how macroeconomic variables would evolve. The second form
relates to operational risk management as prescribed in the current regulation,[2]
where scenarios are required for capital calculations (Pillar I and Pillar II; Rippel
and Teply 2011). The recent crisis taught us that banks failing due to extreme
incidents may dramatically impact the real economy: the Société Générale rogue
trading, a massive operational risk, resulted in a massive market risk materialisation
as all the prices went down simultaneously, in a huge lack of liquidity as the
interbank market was failing (banks were not funding each other) and consequently
in the well-known credit crunch as banks were not funding the real economy, the
whole occurring within the context of the subprime crisis. Impacted companies were
suffering, and some relatively healthy ones even went bankrupt. The last use of
scenarios is related to general risk management. It is probably the most useful
application of scenario analysis, as it is not necessarily a regulatory demand and as
such would only be used by risk managers to improve their risk framework,
removing the pressure of a potential higher capital charge.

[2] Note that though the regulation might change, scenarios should still be required
for risk management purposes.
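
To make the stress-testing form concrete, the sketch below applies macroeconomic
shocks of increasing severity to a portfolio through fixed sensitivities. It is a
deliberately simplified illustration rather than the methodology developed later in
this book, and every number (shock sizes, sensitivities) is assumed for the example.

# Simplified, hypothetical sensitivity-based stress test (Python).
# All shock sizes and sensitivities below are assumed for illustration only.
scenarios = {                      # macroeconomic shocks (decimal changes)
    "likely":     {"gdp": -0.005, "unemployment": 0.002, "equity": -0.05},
    "mild":       {"gdp": -0.015, "unemployment": 0.010, "equity": -0.15},
    "worst_case": {"gdp": -0.040, "unemployment": 0.030, "equity": -0.40},
}
# Portfolio P&L sensitivity to a one-unit move in each variable (EUR millions).
sensitivities = {"gdp": 2500.0, "unemployment": -1800.0, "equity": 600.0}

for name, shocks in scenarios.items():
    pnl = sum(sensitivities[k] * shocks[k] for k in shocks)
    print(f"{name:>10}: projected P&L impact = {pnl:8.1f} EUR m")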

1.2 Scenario Planning: Why, What, Where, How, When...

Presenting scenario analysis in its globality, and not only in the financial industry,
the following paragraphs describe military scenario planning. In this book, we draw
a parallel between the scenario process in the Army and in a financial institution.
The scenario planning process suggested in Aepli et al. (2010) is summarised below.
It can be broken down into 12 successive steps of equal importance, and we would
recommend risk managers keep them in mind when undertaking such a process
(International Institute for Environment and Development (IIED) 2009; Gregory
Stone and Redmer, 2006).
1. Decide on the key question to be answered by the analysis. This creates the
framework for the analysis and conditions the next points.
2. Set both the time frame and the scope of the analysis, i.e. place the scenario in a
period of time, define the environment and specify the conditions.
3. Identify and select the major stakeholders to be engaged, i.e. the people at the
origin of the risk, responsible or accountable for it, or impacted by it.
4. Map basic trends and driving forces such as industry, economic, political,
technological, legal and societal trends. Evaluate to what extent these trends
affect the issues to be analysed.
5. Find key uncertainties, assess the presence of relationships between the driving
forces and rule out any inappropriate scenarios.
6. Group the linked forces and try to reduce the forces to the most relevant ones.
7. Identify the extreme outcomes of the driving forces. Check their consistency and
plausibility with respect to the time frame, the scope and the environment of the
scenario, and the stakeholders' behaviours.
8. Define and write out the scenarios. The narrative is very important as it will be
a reference for all the stakeholders, i.e., a common ground for analysis.
9. Identify research needs (e.g. data, information, elements supporting the stories,
etc.).
10. Develop quantitative methods. Depending on the objectives, methodologies
may have to be refined or developed. This is the book's main focus, and multiple
examples are provided, though they are not exhaustive.
11. Assess the scenarios, implementing, for example, one of the strategies presented
in this book, such as the consensus approach.
12. Transform the outcome of the scenario analysis into key management actions
to prevent, control or mitigate the risks.
These steps are almost directly applicable to performing a reliable scenario analysis
in a financial institution. None of the questions should be left aside a priori. A
minimal illustration of how the outputs of these steps can be recorded follows.
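
As an illustration only, and not a structure prescribed by the book, the outputs of
the planning steps above can be captured in a simple record so that every scenario
carries the same metadata; all field names and values below are hypothetical.

# Illustrative container for the outputs of the planning steps (Python).
# Field names and example values are hypothetical, not taken from the book.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ScenarioPlan:
    key_question: str                     # step 1: question the analysis must answer
    horizon_years: int                    # step 2: time frame of the analysis
    scope: str                            # step 2: environment and conditions
    stakeholders: List[str]               # step 3: people engaged or impacted
    driving_forces: List[str]             # steps 4-6: trends and key uncertainties
    extreme_outcomes: Dict[str, str]      # step 7: extreme outcome per driving force
    narrative: str = ""                   # step 8: the written story line
    data_needs: List[str] = field(default_factory=list)           # step 9
    assessment_method: str = "consensus workshop"                 # steps 10-11
    management_actions: List[str] = field(default_factory=list)   # step 12

plan = ScenarioPlan(
    key_question="What would a prolonged interbank funding freeze cost us?",
    horizon_years=3,
    scope="Retail and wholesale funding, group level",
    stakeholders=["Treasury", "Risk", "Finance"],
    driving_forces=["interbank spreads", "deposit outflows"],
    extreme_outcomes={"interbank spreads": "+300bp", "deposit outflows": "-15%"},
)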
Remark 1.2.1 An issue to bear in mind during the scenario planning phase, as it may
impact both the model selection and the selection of the stakeholders, is what we
would refer to as the seniority bias. This is something we observed while facilitating
workshops: even if you have the best experts on a topic in the room, the presence of
a more senior person might lead to self-censorship. People may censor themselves
because of threats against them or their interests from their line manager,
shareholders, etc. Self-censorship also occurs when employees deliberately distort
their contributions, either to please the more senior manager or out of fear of that
person, without any pressure other than their own perception of the situation.

1.3 Objectives and Typology

Now that we have presented examples of scenarios, a fundamental question needs
to be raised: to what extent should the scenario be realistic? Indeed, what should a
financial institution focus on? A science-fiction type of scenario, such as a meteor
striking the building, is not really manageable beyond a reliable business continuity
plan. Another example relates to something that has already happened, for which
the institution now has good controls in place to prevent or mitigate the issue and
has therefore not suffered any incident in the past 20 years: should it have a
scenario? Obviously, these questions are both rhetorical and provocative. What is
the point of a scenario if the outcome is not manageable or is already fully
controlled? We do not learn anything from the process, and it might be considered
a waste of time. Indeed, it is important that in its scenario selection process a bank
identifies its immediate largest exposures, those which could have a tremendous
impact in the short term, even if the event is assessed over the longer term, and
prioritises those requiring immediate attention.
Remark 1.3.1 The usual denomination “1 in 100 years” characterises a tail event,
but it carries no information about when the event may occur. Indeed, contrary to a
common mistake, “1 in 100 years” refers to a large magnitude, not to the date at
which the scenario may materialise; it may occur the very next day.
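
As a back-of-the-envelope illustration, not taken from the book and assuming
occurrences are independent from one year to the next, a “1 in 100 years” event has
an annual exceedance probability p = 0.01, so the probability of observing it at least
once over the next n years is

    P(at least one occurrence in n years) = 1 - (1 - p)^n,

e.g. 1 - 0.99^10 ≈ 9.6% over 10 years, while the probability that it materialises
within the coming year remains 1%.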
Scenario analysis may have a high impact on regulatory capital calculations
(operational risks), but this is not the focal point of this book; scenario analysis
should in any case be used thoroughly for management purposes. We would argue
that scenario analysis is the purest risk management tool: once a risk materialises,
it is not a risk any more; in the best case, it is an incident. Consequently, contrary
to people mainly dealing with the aftermath (accountants, for instance, except for
what relates to provisions), risk managers deal with exposures, i.e., positions which
may ultimately result in some damage for financial institutions. These may be
financial losses (but not necessarily, if the risk is insured), reputational impacts, etc.
The most important point is that a risk may actually never materialise into an
incident. We may draw a parallel between a risk and a volcano. Indeed, an incident
is the crystallisation of a risk, so metaphorically it is the eruption of the volcano
(especially if the volcano was considered dormant). But this eruption may not
engender any damage or losses if the lava only flows down one side and nothing is
on its path; it may even generate some positive things, as it may provide good
fertiliser. However, if the eruption results in a glowing cloud which destroys
everything in its path, the impact might be dramatic.
The ultimate objective of scenario analysis is to prevent and/or mitigate risks and
losses; therefore, in a first stage, it is important to identify the risks, to make sure
that controls are in place to prevent incidents and, if they still materialise, to mitigate
the losses. At the risk of sounding overly dramatic, it is really important that
financial institutions follow a rigorous process, as eventually we are discussing the
protection of the consumer, the competitiveness of the bank and the security of the
financial system.

To make this book more readable and to help risk managers sort issues into a
simple scenario taxonomy, we propose the following classification. The most
destructive risks a financial institution has to bear are those we will label
Conventional Warfare, Over Confident, Black Swans, Dinosaurs and Chimera.
By “conventional warfare”, we are talking about the traditional risks, those you
would face on a “business as usual” basis, such as credit risk and market risk.
Taken independently, they do not usually lead to dramatic issues and the bank
addresses them permanently, but when an event transforms their non-correlated
behaviour into a highly correlated one, i.e., when each and every individual
component fails simultaneously, the consequences might be dramatic (and may fall
into the last category). The Over Confident label refers to types of incidents which
have already materialised but with a really low magnitude, or which led to a near
miss, so that practitioners assumed their framework was functioning until a similar
but larger incident occurred. The Black Swan is a reference to Nassim Taleb's book
of the same name (Taleb, 2010). The allegory of the Black Swan is that no one could
ever believe black swans existed until someone saw one. For a financial institution,
it is the “risk that can never materialise in the target entity” type of scenario, where
only pure lack of experience makes us form that judgement. The Dinosaur is the
risk that the institution thought did not exist any more but suddenly “comes back to
life” and stomps on the financial institution; this is typically the exposure to the
back book that financial institutions are experiencing. The last one is the Chimera,
the mythological beast, the one which is not supposed to exist: it is the impossible,
the things that do not make sense a priori. Here, we know it can happen, we just do
not believe it will, such as the Fessenheim nuclear plant example above, a meteor
striking the building, or a rogue wave, which until the middle of the twentieth
century was considered nonexistent by scientists despite having been reported by
multiple witnesses. The difference between the Black Swan and the Chimera types
of scenarios is that the Black Swan did exist, we just did not know it and did not
even think about its possible existence, while the Chimera is not supposed to exist:
we do not want to believe it can happen even if we could imagine it, as it is
mythological and we have not been able to understand the underlying phenomenon
yet.
Scenarios can find their roots in both endogenous and exogenous issues. Examples
of endogenous risks are those due to the intrinsic way of doing business, of creating
value, of dealing with customers, etc. Exogenous risks are those having external
roots, such as terrorist attacks and earthquakes. The main problem with endogenous
risks is that we may be able to point fingers at people if we experience some failures
and, therefore, we may create an adverse incentive, as these people may not want
anyone to discover that there is a potential issue in their area. With exogenous risks,
we may experience another problem, in the sense that sometimes not much can be
done to control them, though awareness is still important. The human aspect of
scenario analysis briefly discussed here is really important and should always be
borne in mind. If the process is not clearly explained and the people working in the
financial institution do not buy in, then we will face a major issue: the scenarios will
not be reliable as they will not be performed properly; people will do them because
it is compulsory, but they will never try to obtain any usable outcome as, for them,
it is a waste of time. The first step of a good scenario process is to teach and train
people on why scenarios are useful and how to deal with them, in other words to
market the process. The objective is to embed the process. The best evidence of an
embedded process is the transformation of a demanded “tick the box” kind of
process into scenario analyses performed by business units themselves without
being requested to do so, as it has become part of their culture.
Another question worth addressing in the process is the moment when we should
capture the controls already in place. Indeed, when facilitating a scenario analysis,
you will often hear the following answer to the question “do you have a risk?”:
“no, we have controls in place”. To which the manager should reply: you have
controls because you have a risk. This comes from the confusion made between
inherent and residual risk. Indeed, the inherent risk is the one the entity faces, the
one it has before putting any controls or mitigants in place. The residual risk is the
one the financial institution faces after the controls, the one it will still face even if
the mitigants are functioning. Performing a scenario analysis, it is really important
to work with the inherent risk in a first step, otherwise our perception of the risk
might be biased. Indeed, assume we worked with the residual risk instead and a
control then failed: we would never have captured the real exposure, and we would
have assumed we were safe when we were not. Therefore, we would recommend
working with the inherent risk in the first place and capturing the impact of the
controls in a second stage. The inherent risk will also support the internal process
of prioritisation.
Another question arises: should scenarios be analysed independently of one
another, or should we adopt a holistic methodology? Obviously, the answer depends
not only on the quality and availability of the information, inputs, experts, timing
and feasibility, but also on the type of scenario you are interested in analysing.
Indeed, if your scenario is for stress-testing purposes and a contagion channel has
been identified between various risks, you will need to capture this phenomenon,
otherwise the full exposure will not be taken into account and your scenario will not
be representative of the threat. If, however, you are only working on a limited scope
of scenarios and you only have a few weeks to do the analysis, you may want to
adopt an alternative strategy. Note that holistic approaches are usually highly input
consuming.

1.4 Scenario Pre-requirements

One of the key success factors of scenario analysis is the analysis of the underlying
inputs, for instance the data. These are analysed prior to the scenario analysis; this
is the starting point for evaluating the extreme exposure. No one should ever
underestimate the importance of data in scenario analysis, in both what it brings
and the limitations associated with it. Indeed, the information used for scenario
analysis, obtained internally (losses, customer data, etc.) or externally
(macroeconomic variables, external LGD, etc.), is key to the reliability of the
scenario analysis, but some major challenges may arise that could limit the use of
these data and, worse, may mislead the people owning the scenarios, i.e., those
responsible for evaluating the exposures and dealing with the outcomes. Some of
the main issues we need to discuss are:
• Data security: This is the issue of individual privacy. While using the data, we
have to be careful not to compromise the confidential character of most data.
• Data integrity: Clearly, data analysis can only be as good as the data it relies
upon. A key implementation challenge is integrating conflicting or redundant data
from different sources. A data validation process should be undertaken. This is the
process of ensuring that a program operates on clean, correct and useful data,
checking the correctness, the meaningfulness and the security of the data used as
input into the system.
• Stationarity analysis: In mathematics and statistics, a stationary process is a
stochastic process whose joint probability distribution does not change when
shifted in time. Consequently, moments such as the mean and variance, if they
exist, do not change over time and do not follow any trends. In other words, we
can rely on past data to predict the future (up to a certain extent). A minimal
stationarity check is sketched after this list.
• Technical obsolescence: The requirement we all have to store large quantities of
data drives technological innovation in storage. This results in fast advances in
storage technology. However, the technologies that used to be the best not so
long ago are rapidly discarded by both suppliers and customers. Proper migration
strategies have to be anticipated, at the risk of not being able to access the data
any more.
• Data relevance: How old should the data be? Can we assume a single horizon of
analysis for all the data, or should we use different horizons depending on the
question we are interested in answering? This question is almost rhetorical, as
obviously we need to use data that are appropriate and consistent with what we
are interested in analysing. It also means that the quantity of data and their
reliability depend on whether outdated data can be used.
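
The following minimal sketch, which is not taken from the book, shows one common
way to check the stationarity assumption before feeding a series into a scenario
model: an Augmented Dickey-Fuller test applied to a synthetic stationary series and
a synthetic random walk. The series names and the 5% threshold are illustrative
choices.

# Minimal stationarity check (Python), illustrative only.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.02, 500)        # synthetic series, stationary by construction
prices = 100 * np.exp(np.cumsum(returns))   # random walk in logs, non-stationary

for name, series in [("returns", returns), ("prices", prices)]:
    stat, pvalue, *_ = adfuller(series)
    # ADF null hypothesis: the series has a unit root (i.e. it is non-stationary).
    verdict = "looks stationary" if pvalue < 0.05 else "looks non-stationary"
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f} -> {verdict}")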

1.5 Scenarios, a Living Organism

It is extremely important to understand that scenario analysis is like a living
organism. It is alive, self-feeding, evolving, and may become something completely
different from what we originally intended to achieve. It is possible to draw a
parallel between a recurring scenario analysis process in a company and Charles
Darwin's theory of evolution (up to a certain extent). Darwin's general theory
presumes the development of life from non-life and stresses a purely naturalistic
(undirected) descent with modification (Darwin, 1859). Complex creatures evolve
from more simplistic ancestors naturally over time, and these mutations are passed
on to the next generation. Over time, beneficial mutations accumulate and the result
is an entirely different organism. In a bank, it is the same: the mutation is embedded
in the genetic code and, as in the savannah, the bank that is going to survive the
longest is not the biggest or the strongest, but the one most able to adapt, and
scenario analysis allows adaptation through understanding of the environment.
Darwin's theory of evolution describes a slow, gradual process. Darwin wrote,
“Natural selection acts only by taking advantage of slight successive variations; she
can never take a great and sudden leap, but must advance by short and sure, though
slow steps”, evolution being formed by numerous, successive, slight modifications.
The transcription of this evolution into a financial institution tells us that scenarios
may evolve slowly, but they will evolve, as will practices. To be plausible, a scenario
should capture the largest number of impacts and interactions. As with Darwin's
theoretical starting point for evolution, the starting point of a scenario analysis
process is always quite gross, but by digging deeper every time and learning from
experience, this heuristic process leads to better ways of assessing the risk, better
outcomes, better controls, etc.
Indeed, we usually observe that the scenario analysis process in a financial
institution matures in parallel with the framework. The first time the process is
undertaken, it is never based on the most advanced strategy or the latest
methodologies and does not necessarily provide the most precise results. But this
phase is really important and necessary, as it is the ignition phase, i.e., the one that
triggers a cultural change in terms of risk management procedures. The process will
constantly evolve towards the most appropriate strategy for the target financial
institution as the stakeholders take ownership of it.
Scenario analysis is not a box-ticking process.

1.6 Risk Culture

It is widely agreed that failures of culture (Ashby et al., 2013), which permitted
excessive and uncontrolled risk-taking and a loss of focus on the end customer, were
at the heart of the financial crisis. The cultural dimensions of risk-taking and control
in financial organisations have been widely discussed, arguing that, for all the many
formal frameworks and technical modelling expertise of modern financial risk
management, risk-taking behaviour and questionable ethics were misunderstood by
individuals, companies and regulators. The growing interest in financial institutions'
risk culture since 2008 has been symptomatic of a desire to reconnect risk-taking,
its management and appropriate return. The risk-return couple, which had somehow
been forgotten, came back not as a couple but as a single polymorphic organism in
which risk and return are indivisible elements.
When risk culture change programmes were being led by risk functions, the
reshaping of organisational risk management was at the centre of these programmes.
Risk culture is a way of framing and perceiving risk issues in an organisation. In
addition, risk culture is itself a composite of a number of interrelated factors
involving many trade-offs. Risk culture is not static but dynamic, a continuous
process which repeats and renews itself constantly. It is permanently subject to
shocks that lead to permanent questioning. The informal aspect is probably the most
important, i.e., the small behaviours and habits which in the aggregate constitute the
state of the risk culture. Note that risk culture can be taken in a more general sense,
as risk culture is what makes us fasten our seat-belts in our cars. Risk culture is
usually transorganisational, and different risk cultures may be found within
organisations or across the financial industry.
The most fundamental issue at stake in the risk culture debate is an organisation's
self-awareness of its balance between risk-taking and control. It is clear that many
organisational actors prior to the financial crisis were either unaware of, or
indifferent to, the risk profile of the organisation as a whole, as long as the return
generated was appropriate or sufficient according to their own standards. Indeed,
inefficient control functions and revenue-generating functions considered more
important created an unbalanced relationship, leading to the disaster we know. The
risk appetite framework now helps articulate these relationships with more clarity.
The risk culture discussion shows the desire to make risk and risk management a
more prominent feature of organisational decision-making and governance, with
the underlying idea of moving towards a more convoluted risk framework, i.e., a
framework in which the risk department is engaged before rather than after a
business decision is made. The usual structure of the risk management framework
currently relies on
• a three Lines of Defence backbone,
• risk oversight units and capabilities, and
• increased attention to risk information consolidation and aggregation.
Risk representatives engage directly with the businesses, acting as trusted advisors;
they usually propose risk training programmes and general awareness-raising
activities. Naturally, this is only possible if the risk function is credible. This
approach involves acting on the capabilities of the risk function and developing
greater business fluency and credibility. Combining the independence of the second
line of defence with the construction of partnerships might be perceived as
inconsistent, though one may argue that effective supervision requires proper
explanations and clear statements of the expectations to the supervisee.
Consequently, they need to have good relationships and regular interactions
(structured or ad hoc).
According to Ashby et al. (2013), two kinds of attitude have been observed towards
such interactions: enthusiastic and realistic. The former develop tools on their own
and invest time and resources in building informal internal networks. Realists have
a tendency to think that too much interaction can inhibit decision-making; they have
more respect for the lines of defence models than enthusiasts, who continually work
across the first and second lines. Limits and related risk management policies and
rules can unintentionally become a system in their own right. The impact of history
and the collective memory of past incidents should not be underestimated, as these
are a constituent part of the culture of the company and may drive future risk
management behaviours.

Regulation has undoubtedly been a big driver of risk culture change programmes.
Though a lot of organisations were frustrated by the weight of the regulatory
demands, they had no choice but to cooperate, and most of them sooner or later
accepted the new regulatory climate and worked with it more actively; however, it
is still unclear whether the extent of the regulatory footprint on the business has been
fully understood.
Behaviour alteration related to cultural change requires repositioning customer
service at the centre of financial institutions' activities, and good behaviour should
be incentivised for faster change. Martial artists say that it requires 1,000 repetitions
of a single move to make it a reflex, and 10,000 to change it. Therefore it is critical
to adjust behaviours before they become reflexes.
Scenario analysis will impact the risk culture within a financial institution, as it will
change the perception of some risks and will consequently lead to the creation,
amendment or enhancement of controls, themselves leading to the reinforcement of
the risk culture. As mentioned previously, scenarios will evolve and the risk culture
will evolve simultaneously. We believe that the current three lines of defence model
will slowly fade away as the empowerment of the first line grows.

References

Aepli, P., Summerfield, E., & Ribaux, O. (2010). Decision making in policing: Operations and
management. Lausanne: EPFL Press.
Ashby, S., Palermo, T., & Power, M. (2013). Risk culture in financial organisations - a research
report. London: London School of Economics.
Burbeck, J. (2013). Pearl Harbor - a World War II summary. http://www.wtj.com/articles/pearl_harbor/.
Darwin, C. (1859). On the origin of species by means of natural selection, or the preservation of
favoured races in the struggle for life (1st ed.). London: John Murray.
Gregory Stone, A., & Redmer, T. A. O. (2006). The case study approach to scenario planning.
Journal of Practical Consulting, 1(1), 7–18.
International Institute for Environment and Development (IIED). (2009). Scenario planning. In
Profiles of tools and tactics for environmental mainstreaming, No. 9.
Lewis, J. (2008). Changing direction: British military planning for post-war strategic defence (2nd
ed.). London: Routledge.
Rebonato, R. (2010). Coherent stress testing: A Bayesian approach to the analysis of financial
stress. London: Wiley.
Rippel, M., & Teply, P. (2011). Operational risk - scenario analysis. Prague Economic Papers, 1,
23–39.
Taleb, N. (2010). The black swan: The impact of the highly improbable (2nd ed.). New York:
Random House and Penguin.
Chapter 2
Environment

2.1 The Risk Framework

As introduced in the previous chapter, risk management is a central element of
banking: integrating risk management practices into processes, systems and culture
is key. As a proactive partner to senior management, risk management's value lies
in supporting and challenging them to align the business control environment with
the bank's strategy by measuring and mitigating risk exposure, thereby contributing
to optimal returns for stakeholders. For instance, some banks have invested heavily
in understanding customer behaviour through new systems initially designed for
fraud detection, which are now being leveraged beyond compliance to provide more
effective customer service.
The risk department of an organisation keeps its people up to date on problems that
have happened to other financial institutions, allowing it to take a more proactive
approach. As mentioned previously, the risk framework of a financial institution is
usually split into three layers, usually referred to as the three lines of defence. The
first line, which sits in the business, is supposed to manage the risks; the second line
is supposed to control the risks; and the third line, characterised by the audit
department, is supposed to provide oversight. The target is to embed the risk
framework, i.e., empower the first line of defence to identify, assess, manage, report,
etc. Ultimately, each and every person working in the bank is a risk manager, and
any piece of data is risk data.
Contrary to what the latest regulatory documents suggest, there is no one-size-fits-all
approach to risk management, as every company has a framework specific to its own
internal operating environment. A bank should aim for integrated risk frameworks
and models supporting behavioural improvements. Understanding the risks should
mechanically lead to a better decision-making process and to better performance,
i.e., better or more efficient returns (in the portfolio theory sense; Markowitz 1952).


Banks' risk strategy drives the management framework, as it sets the tone for risk
appetite, policies, controls and "business as usual" risk management processes.
Policies should be efficiently and effectively cascaded at all levels, as well as across
the entity, to ensure homogeneous risk management.
Risk governance is the process by which the Board of Directors sets objectives and
oversees the framework and the management execution. A successful risk strategy
is equivalent to risk being embedded at every level of a financial institution.
Governance sets the precedent for strategy, structure and execution. An ideal risk
management process ensures that organisational behaviour is consistent with the
institution's risk appetite or tolerance, i.e., the risk an institution is willing to take to
generate a particular return. In other words, the risk appetite has two components:
risk and return. Through the risk appetite process, we see that risk management
clearly informs business decisions.
In financial institutions, it is necessary to evaluate risk management effectiveness
regularly to ensure its quality in the long term, and to test stressed situations to
ensure its reliability when extreme incidents materialise. Here, we see that scenario
analysis is inherent to risk management, as we are talking about situations which
have never materialised.
Appropriate risk management execution requires risk measurement tools relying
on the information obtained through risk control self-assessments, data collection,
etc., to better replicate the company's risk profile. Indeed, appropriate risk mitigation
and internal control procedures are established in the first line such that the risk is
mitigated. "Key Risk Indicators" are established to ensure a timely warning is
received prior to the occurrence of an event (COSO, 2004).

2.2 The Risk Taxonomy: A Base for Story Lines

In this section we present the main risks to which scenario analysis is usually
applied, or can be applied, in financial institutions. This list is non-exhaustive but
gives a good idea of the task to be accomplished.
Starting with credit risk, this is defined as the risk of default on a debt that may
arise from a borrower failing to make contractual payments, such as the principal
and/or the interest. The loss may be total or partial. Credit risk can itself be split as
follows (a standard first-order quantification is sketched after the list):
• Credit default risk is the risk of loss arising from a debtor being unable to pay its
debt. For example, if the debtor is more than 90 days past due on any material
credit obligation. A potential story line would be an increase in the probability of
default of a signature due to a decrease in the profit generated.
• Concentration risk is the risk associated with a single type of counterparty
(signature or industry) having the potential to produce losses large enough to
lead to the failure of the financial institution. An example of story line would be
a breach in concentration appetite due to a position taken by the target entity for
the sake of another entity of the same group.
• Country risk is the risk of loss arising from a sovereign state freezing foreign
currency payments or defaulting on its obligations. The relationship between
this risk, macroeconomics and countries' stability is non-negligible. Political risk
analysis lies at the intersection between politics and business, and it deals with
the probability that political decisions, events or conditions significantly affect
the profitability of a business actor or the expected value of a given economic
action. An acceptable story line would be that the bank has invested in a country
in which the government has changed and has nationalised some of the companies.
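
As a purely illustrative, standard first-order quantification, using the usual Basel-style
notation rather than anything specific to this chapter, the expected loss on a credit
exposure combines a probability of default (PD), a loss given default (LGD) and an
exposure at default (EAD):

    EL = PD × LGD × EAD,   e.g.   0.02 × 0.45 × 1,000,000 = 9,000,

i.e., a 2% probability of default on an exposure of 1,000,000 with a 45% loss given
default carries an expected loss of 9,000. Scenario analysis typically stresses one or
more of these components.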
Market risk is the risk of a loss in positions arising from movements in market
prices. It can be split into:
• Equity risk: the risk associated with changes in stock or stock index prices.
• Interest rate risk: the risk associated with changes in interest rates.
• Currency risk: the risk associated with changes in foreign exchange rates.
• Commodity risk: the risk associated with changes in commodity prices.
• Margining risk results from uncertain future cash outflows due to margin calls
covering adverse value changes of a given position.
A potential story line would be a simultaneous drop in all the indexes, rates and
currencies of a country due to a sudden decrease in GDP.
Liquidity risk is the risk that given a certain period of time, a particular financial
asset cannot be traded quickly enough without impacting the market price. A
story line could be a portfolio of structured notes that was performing correctly
is suddenly crashing as the index on which they have been built is dropping, but
the structured notes have no market and therefore the products can only be sold at a
huge loss. It might make more sense to analyse liquidity risk at the micro level (portfolio level). Regarding the risk of illiquidity at the macro level, a bank transforms money with a short duration, such as savings, into money with a longer duration through lending; in other words, a bank operates a maturity transformation. This leaves banks with an unfavourable liquidity position, as they do not have access to the money they lent, while the money they owe to customers can be withdrawn on demand at any time. Through "asset and liability management", banks manage this mismatch; however, and we cannot emphasise this point enough, this implies that banks are structurally illiquid (Guégan and Hassani, 2015).
Operational risk is defined as the risk of loss resulting from inadequate or failed
internal processes, people and systems or from external events. This definition
includes legal risk, but excludes strategic and reputational risk (BCBS, 2004). It
also includes other classes of risk, such as fraud, security, privacy protection, cyber
risks, physical, environmental risks and currently one of the most dramatic, conduct
risk. Contrary to other risks such as those related to credit or market, operational risks are usually not willingly incurred, nor are they revenue driven (i.e. they do not result from a voluntary position); they are not necessarily diversifiable, but they are manageable. An example of story line would be a rogue trading incident on the "delta one" desk, on which a trader took an illegal position. Note that
for some banks this might not be a scenario as it has already happened, but for others it might be an interesting case to test their resilience.
Financial institutions' misconduct, or the perception of misconduct, leads to conduct risk. Indeed, the terminology "conduct risk" gathers various processes and behaviours which fall into operational risk Basel category 4 (Clients, Products and Business Practices), but goes beyond it as it generally implies a non-negligible reputational risk. Conduct risk can lead to huge losses, usually resulting from compensations, fines or remediation costs, and the reputational impact (see below) might be non-negligible. Contrary to other operational risks, conduct risk is connected to the activity of the financial institution, i.e. the way the business is driven.
Legal risk is a component of operational risk. It is the risk of loss which is
primarily caused by a defective transaction, a claim, a change in law, an inadequate
management of non-contractual rights or a failure to meet non-contractual obligations, among other things (McCormick, 2011). Some may define it as any incident implying litigation.
Model risk is the risk of loss resulting from using models to make decisions
(Hassani, 2015). Understanding this risk partly as a probability and partly as an impact provides insight into the other risks measured. A potential story line would be a
model not properly adjusted due to a paradigm shift in the market leading to an
inappropriate hedge of some positions.
Reputational risk is the risk of loss resulting from damage to a firm's reputation, in terms of revenue, operating costs, capital or regulatory costs, or destruction of shareholder value, following an adverse or potentially criminal event, even if the company is not found guilty. In that case, a good reputational risk scenario would be a loss of income due to the discovery that the target entity is funding illegal activities in a banned country. Once again, for some banks this might not be a scenario as the incident has already materialised, but the lesson learnt might be useful for others.
Systemic risk is defined as the risk of collapse of an entire financial system, as opposed to the risk associated with the failure of one of its components without jeopardising the entire system. Financial system instability, potentially caused or exacerbated by idiosyncratic events or conditions in financial intermediaries, may lead to the destruction of the system (Piatetsky-Shapiro, 2011). The materialisation of a systemic risk implies the presence of interdependencies in the financial system, i.e. the failure of a single entity may trigger a cascading failure, which could potentially bankrupt or bring down the entire system or market (Schwarcz, 2008).

2.3 Risk Interactions and Contagion

It is not possible to discuss scenario analysis without addressing contagion effects. Indeed, it is not always possible or appropriate to deal with a particular risk by analysing it in a silo. It is important to capture the impact of one risk on another, i.e., a spread or a spillover effect.
In fact, this aspect is too often left aside when it should be at the centre of the topic. Combined effects due to contagion can lead to larger losses than the sum of the impacts of each component taken separately. Consequently, capturing the contagion effect between risks may be a first way of tackling systemic risks.
Originally, financial contagion referred to the spread of market disturbances from one country to another. Financial contagion is a natural risk for countries whose financial systems are integrated in international financial markets, as what occurs in one country mechanically impacts the others in one way or another. The impact is usually proportional to the incident; in other words, the larger the issue, the larger the impact on the other countries belonging to the same system, unless some mitigants are in place to at least confine the smaller exposures. The contagion phenomenon is usually one of the main components explaining why a crisis is not contained and may pass across borders and affect an entire region of the globe.
Financial contagion may occur at any level of a particular economy and may be
triggered by various things. Note that lately, banks have been at the centre of a dramatic contagion process (the subprime crisis), but inappropriate political decisions may lead to even larger issues. At the domestic level, usually the failure of a
domestic bank or financial intermediary triggers a transmission when it defaults on
interbank liabilities and sells assets in a fire sale, thereby undermining confidence
in similar banks. International financial contagion, which happens in both advanced
and developing economies, is the transmission of a financial crisis across financial
markets to directly and indirectly connected economies. However, in today’s
financial system, due to both cross-regional and cross-border operations of banks,
financial contagion usually happens simultaneously at the domestic level and across
borders.
Financial contagion usually generates financial volatility and may damage the economies of countries. There are several branches of classification that explain the mechanism of financial contagion: spillover effects, and financial crises caused by the influence of the behaviour of four agents, namely governments, financial institutions, investors and borrowers (Dornbusch et al., 2000).
The first branch, spillover effects, can be seen as a negative externality. Spillover effects are also known as fundamental-based contagion. These effects can occur globally, i.e., affecting several countries simultaneously, or regionally, only impacting adjacent countries. As a general rule, the larger the countries involved, the more global the effect; conversely, smaller countries are those triggering regional effects. Though some debates arose regarding the difference between co-movements and contagion, here we will state that if what happens in a particular location directly or indirectly impacts the situation in another geographical region with a time lag,¹ then we should refer to it as contagion.
At the micro level, from a risk management perspective, contagion should be
considered when the materialisation of a first risk (say operational risk) triggers the
materialisation of a subsequent risk (for instance, market or credit risk). This is typically what happened in the Société Générale rogue trading incident, as briefly discussed in the previous chapter.

¹ This one might be extremely short.
From a macroeconomic point of view, contagion effects have repercussions on
an international scale transmitted through channels such as trade links, competitive
devaluations and financial links. “A financial crisis in one country can lead to direct
financial effects, including reductions in trade credits, foreign direct investment, and
other capital flows abroad”. Financial links come from globalisation since countries
try to be more economically integrated with global financial markets. Many authors
have analysed financial contagions. Allen and Gale (2000) and Lagunoff and
Schreft (2001) analyse financial contagion as a result of linkages among financial
intermediaries.
Trade links are another type of shock, sharing similarities with common shocks and financial links. These types of shocks are more focused on integration, causing local impacts. Kaminsky and Reinhart (2000) document the evidence that trade links in goods and services and exposure to a common creditor can explain earlier crisis clusters, not only the debt crises of the early 1980s and 1990s, but also the observed historical pattern of contagion.
Irrational phenomena might also cause financial contagion. Co-movements are considered irrational when there is no triggering global shock and no interdependence channel; the cause is instead related to one of the four agents' behaviours presented earlier. Typical contagion causes are increased risk aversion, lack of confidence and financial fears. Transmission channels can be correlations or liquidation processes (i.e. selling in one country to fund a position in another) (King and Wadhwani, 1990; Calvo, 2004).
Remark 2.3.1 Investor’s behaviour seems to be one of the biggest issues that can
impact a country’s financial system.
So to summarise, a contagion may be caused by:
1. Irrational co-movements related to crowd psychology (Shiller, 1984; Kirman,
1993)
2. Rational but excessive co-movements
3. Liquidity problems
4. Information asymmetry and coordination problems
5. Shift of equilibrium
6. Change in the international financial system, or in the rules of the game
7. Geographic factors or neighbourhood effect (De Gregorio and Valdes, 2001)
8. The development of sophisticated financial products, such as credit default swaps and collateralised debt obligations, which spread exposures across the world (sub-prime crisis).
Capturing interactions and contagion effects leads to analysing financial crises.
The term financial crisis refers to a variety of situations resulting in a loss of
paper wealth, which may ultimately affect the real economy. An interesting way of representing financial contagion is to extend models used to represent epidemics, as illustrated by Figs. 2.1 and 2.2 (a minimal simulation sketch is given after the figures).
[Figure: network diagram entitled "Financial Crisis: Contagion From A Country To Another"; labelled nodes include the USA, the UK, France, China and Brazil.]
Fig. 2.1 In order to represent a financial contagion graphically, I took inspiration from a model created to represent the way epidemics move from one geographic region to another (Oganisian, 2015)

2.4 The Regulatory Framework

In this section, we will briefly discuss the regulatory framework surrounding scenario analysis. Indeed, scenario analysis can be found in multiple regulatory processes, such as stress testing and operational risk management, and not only in the financial industry. As introduced in the previous sections, we believe that some clarification regarding stress testing might be useful to understand the scope of the pieces of regulation below.
The stress-testing term generally refers to examining how a company’s finances
respond to an extreme scenario. The stress-testing process is important for prudent
business management, as it looks at the “what if” scenarios companies need
to explore to determine their vulnerabilities. Since the early 1990s, catastrophe
modelling, which is a form of scenario analysis for providing insight into the
magnitude and probabilities of potential business disasters, has become increasingly
sophisticated. Regulators globally are increasingly encouraging the use of stress
testing to evaluate capital adequacy (Quagliariello, 2009).
[Figure: network diagram entitled "Contagion From Country to Country"; each country shown is classified as a Trigger, a Catalyst or an Impact.]

Fig. 2.2 This figure is similar to Fig. 2.1, though here the representation is more granular and sorts the countries involved into three categories: Trigger (origin), Catalyst (enabler or transmission channel) and Impact (countries impacted)
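To make this idea more concrete, the sketch below simulates a shock propagating through a small network of countries, in the spirit of a simple epidemic (susceptible–infected) model. It is only a minimal illustration: the adjacency list, the transmission probability and the number of steps are hypothetical assumptions of ours and are not taken from the model used to produce the figures.

```python
import random

random.seed(42)

# Illustrative exposure network: which countries' financial systems are
# directly linked (purely hypothetical adjacency list).
links = {
    "USA":    ["UK", "France", "China", "Brazil"],
    "UK":     ["USA", "France"],
    "France": ["UK", "USA", "Brazil"],
    "China":  ["USA", "Brazil"],
    "Brazil": ["USA", "France", "China"],
}

def simulate_contagion(trigger, p_transmit=0.4, n_steps=5):
    """Propagate a shock from the trigger country through the network.

    At each step, every distressed country transmits the shock to each of
    its neighbours with probability p_transmit, mimicking a simple
    susceptible-infected epidemic model.
    """
    distressed = {trigger}
    for _ in range(n_steps):
        newly_hit = set()
        for country in distressed:
            for neighbour in links[country]:
                if neighbour not in distressed and random.random() < p_transmit:
                    newly_hit.add(neighbour)
        if not newly_hit:
            break
        distressed |= newly_hit
    return distressed

print(simulate_contagion("USA"))  # e.g. a set of affected countries
```

In practice, the network would be calibrated on actual exposures (interbank lending, trade links, common creditors) and the transmission probability on observed spillovers; the sketch only conveys the mechanics.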

Although financial institutions monitor and forecast various risks (operational, market and credit), as well as measure the sensitivities to determine how much capital they should hold, it seems that many of them ignored the risks of overextended credit in this case. When new regulations are brought into play, financial institutions adapt
themselves, but adaptation is not the only way forward. They must learn how to best
use the data that they already possess to enable them to embrace regulatory change
without seeing it as a burden. Although companies seek to increase reliability and
profitability, and regulation can be a drain on costs, the seamless integration of
risk management processes and tools—which includes stress testing and scenario
analysis—should give them a competitive advantage and enable them to become
more sustainable. Ongoing business planning is dependent on accurate forecasting.
Without good stress testing and scenario analysis, big corporations cannot make
accurate business forecasts.
One approach is to view the business from a portfolio perspective, with capital
management, liquidity management and financial performance integrated into the
process. Comprehensive stress testing and scenario analysis must take into account
all risk factors, including credit, market, liquidity, operational, funding, interest,
foreign exchange and trading risks. To these must be added operational risks due to
inadequate systems and controls, insurance risk (including catastrophes), business
risk factors (including interest rate, securitisation and residual risks), concentration
risk, high impact low-probability events, cyclicality and capital planning.
In the following paragraphs, we extract quotes from multiple regulatory documents or international associations discussing scenario analysis requirements, to emphasise how important the process is considered to be. We analysed documents from multiple
countries and multiple industries. These documents are also used to give some
perspectives and illustrate the relationships between scenario analysis, stress testing
and risk management.
In IAA (2013), the International Actuarial Association points out the differences
between scenario analysis and stress testing: “A scenario is a possible future
environment, either at a point in time or over a period of time. A projection of
the effects of a scenario over the time period studied can either address a particular
firm or an entire industry or national economy. To determine the relevant aspects of
this situation to consider, one or more events or changes in circumstances may be
forecast, possibly through identification or simulation of several risk factors, often
over multiple time periods. The effect of these events or changes in circumstances
in a scenario can be generated from a shock to the system resulting from a sudden
change in a single variable or risk factor. Scenarios can also be complex, involving
changes to and interactions among many factors over time, perhaps generated by a
set of cascading events. It can be helpful in scenario analysis to provide a narrative
(story) behind the scenario, including the risks (events) that generated the scenario.
Because the future is uncertain, there are many possible scenarios. In addition
there may be a range of financial effects on a firm arising from each scenario. The
projection of the financial effects during a selected scenario will likely differ from
those seen using the modeler’s best expectation of the way the current state of the
world is most likely to evolve. Nevertheless, an analysis of alternative scenarios can
provide useful information to involved stakeholders. While the study of the effect
of likely scenarios is useful for business planning and for the estimation of expected
profits or losses, it is not useful for assessing the impact of rare and/or catastrophic
future events, or even moderately adverse scenarios. A scenario with significant or
unexpected adverse consequences is referred to as a stress scenario.”
“A stress test is a projection of the financial condition of a firm or economy
under a specific set of severely adverse conditions that may be the result of several
risk factors over several time periods with severe consequences that can extend
over months or years. Alternatively, it might be just one risk factor and be short
in duration. The likelihood of the scenario underlying a stress test has been referred
to as extreme but plausible.”
Analysing the case of the United Kingdom, a firm must carry out an Internal Capital Adequacy Assessment Process (ICAAP) in accordance with the rules of the Prudential Regulation Authority (PRA). These include requirements on the firm to assess,
on an ongoing basis the amounts, types and distribution of capital that it considers
adequate to cover the level and nature of the risks to which it is exposed. This
assessment should cover the major sources of risks to the firm’s ability to meet
its liabilities as they fall due, and should incorporate stress testing and scenario
analysis. If a firm is merely attempting to replicate the PRA’s own methodologies, it
will not be carrying out its own assessment in accordance with the ICAAP rules.
The ICAAP should be documented and updated annually by the firm, or more
frequently if changes in the business, strategy, nature or scale of its activities or
operational environment suggest that the current level of financial resources is no
longer adequate.
Specifically PRA (2015) says that firms have “to develop a framework for stress
testing, scenario analysis and capital management that captures the full range of
risks to which they are exposed and enables these risks to be assessed against a
range of plausible yet severe scenarios. The ICAAP document should outline how
stress testing supports capital planning for the firm”.
In the European Union (Single Supervisory Mechanism jurisdiction), the draft Regulatory Technical Standards (RTS) (EBA, 2013)—and later the final guideline (EBA, 2014)—were prepared taking into
account the FSB Key Attributes of Effective Resolution Regimes for Financial Insti-
tutions and current supervisory practices. The draft RTS covers the key elements and
essential issues that should be addressed by institutions when developing financial
distress scenarios against which the recovery plan will be tested.
Quoting: “Drafting a recovery plan is a duty of institutions or groups undertaken
prior to a crisis in order to assess the potential options that an institution or a
group could itself implement to restore financial strength and viability should
the institution or group come under severe stress. A key assumption is that
recovery plans shall not assume that extraordinary public financial support would
be provided.
The plan is drafted and owned by the financial institution, and assessed by the
relevant competent authority or authorities. The objective of the recovery plan is
not to forecast the factors that could prompt a crisis. Rather it is to identify the
options that might be available to counter; and to assess whether they are sufficiently
robust and if their nature is sufficiently varied to cope with a wide range of shocks
of different natures. The objective of preparing financial distress scenarios is to
define a set of hypothetical and forward-looking events against which the impact
and feasibility of the recovery plan will be tested. Institutions or groups should use
an appropriate number of system wide financial distress scenarios and idiosyncratic
financial distress scenarios to test their recovery planning. More than one of each
scenario is useful, as well as scenarios that combine both systemic and idiosyncratic
events. Financial distress scenarios used for recovery planning shall be designed
such that they would threaten failure of the institution or group, in the case recovery
measures are not implemented in a timely manner by the institution or group”.

Article 4. Range of scenarios of financial distress


1. The range of scenarios of financial distress shall include at least one scenario for each of the
following types of events:
(a) a system wide event;
(b) an idiosyncratic event;
(c) a combination of system wide and idiosyncratic events which occur simultaneously and
interactively.
2. In designing scenarios based on system wide events, institutions and groups shall take into
consideration the relevance of at least the following system wide events:
(a) the failure of significant counterparties affecting financial stability;
(b) a decrease in liquidity available in the interbank lending market;
(c) increased country risk and generalised capital outflow from a significant country of
operation of the institution or the group;
(d) adverse movements in the prices of assets in one or several markets;
(e) a macroeconomic downturn.
3. In designing scenarios based on idiosyncratic events, institutions and groups shall take into
consideration the relevance of at least the following idiosyncratic events:
(a) the failure of significant counterparties;
(b) damage to the institution’s or group’s reputation;
(c) a severe outflow of liquidity;
(d) adverse movements in the prices of assets to which the institution or group is predomi-
nantly exposed;
(e) severe credit losses;
(f) a severe operational risk loss.

“These Guidelines aim at specifying the range of scenarios of severe macroeconomic and financial distress against which institutions shall test the impact and
feasibility of their recovery plans. The recovery plans detail the arrangements which
institutions have in place and the early action steps that would be taken to restore
their long-term viability in the event of a material deterioration of financial situation
under severe stress conditions. When the consultation was launched, there was an
existing mandate in the Bank Recovery and Resolution Directive (BRRD) for the
EBA to develop technical standards for the range of scenarios to be used by firms to
test their recovery plans. During legislative process the mandate has been amended
and the EBA was asked to develop Guidelines instead”.
In Australia, the Australian Prudential Regulation Authority (APRA) requested the following from banks with respect to the implementation of operational risk models (APRA, 2007). “Banks intending to apply the Advanced
Measurement Approach (AMA) to Operational Risk are required to use scenario
analysis as one of the key data inputs into their capital model. Scenario analysis
is a forward-looking approach, and it can be used to complement the banks’ short
recorded history of operational risk losses, especially for low frequency high
impact events (LFHI). A common approach taken by banks is to ask staff with
relevant business expertise to estimate the frequency and impact for the plausible
scenarios that have been identified. A range of techniques is available for eliciting
these assessments from business managers and subject matter experts, each with
its own strengths and weaknesses. More than 30 years of academic literature is
available in the area of eliciting probability assessments from experts. Much of
this literature is informed by psychologists, economists and decision analysts, who
have done research into the difficulties people face when trying to make probability
assessments. The literature provides insight into the sources of uncertainty and bias
surrounding scenario assessments, and the methods available for their mitigation.”
The purpose of APRA (2007) was “to increase awareness of the techniques that are
available to ensure scenario analysis is conducted in a structured and robust manner.
Banks should be aware of the variety of methods available, and should consider
applying a range of techniques as appropriate”.
Besides, the COAG (Council of Australian Governments) Energy Council in
COAG (2015) requires some specific scenario analysis: “The Council tasked offi-
cials with a scenario analysis exercise and to come back to it with recommendations,
if necessary, about the need for further work. At its July 2015 meeting, the
Council considered these recommendations and tasked officials to further explore
the implications of key issues that emerged from the initial stress-testing exercise.
This piece of work is being considered as part of the Council’s strategic work
program to ensure regulatory frameworks are ready to cope with the effects of
emerging technologies”. This is an example of scenario analysis requirement for
risk management in an industry different from the financial sector.
In the USA, in the nuclear industry, the US Nuclear Regulatory Commission
(NRC) requested scenario analysis in USNRC (2004) and USNRC (2012). “The
U.S. Nuclear Regulatory Commission (NRC) will use these Regulatory Analysis
Guidelines (“Guidelines”) to evaluate proposed actions that may be needed to pro-
tect public health and safety. These evaluations will aid the staff and the Commission
in determining whether the proposed actions are needed, in providing adequate
justification for the proposed action, and in documenting a clear explanation of why
a particular action was recommended. The Guidelines establish a framework for
(1) identifying the problem and associated objectives, (2) identifying alternatives
for meeting the objectives, (3) analysing the consequences of alternatives, (4)
selecting a preferred alternative, and (5) documenting the analysis in an organised
and understandable format. The resulting document is referred to as a regulatory
analysis”.
Specifically for the financial industry, “the Comprehensive Capital Analysis and
Review (CCAR) (Fed, 2016b) is an annual exercise by the Federal Reserve to assess
whether the largest bank holding companies operating in the United States have
sufficient capital to continue operations throughout times of economic and financial
stress and that they have robust, forward-looking capital-planning processes that
account for their unique risks”.
As part of this exercise, the Federal Reserve evaluates institutions’ capital
adequacy, internal capital adequacy assessment processes and their individual plans
to make capital distributions, such as dividend payments or stock repurchases.
Dodd-Frank Act (Fed, 2016a) stress testing (DFAST)—a complementary exercise
to CCAR—is a forward-looking component conducted by the Federal Reserve
and financial companies supervised by the Federal Reserve to help assess whether
institutions have sufficient capital to absorb losses and support operations during
adverse economic conditions.
While DFAST is complementary to CCAR, both efforts are distinct testing exer-
cises that rely on similar processes, data, supervisory exercises and requirements.
The Federal Reserve coordinates these processes to reduce duplicative requirements
and to minimise regulatory burden.
International organisations such as the Food and Agriculture Organisation of
the United Nations use scenarios. In FAO (2012), they state that “a scenario is a
coherent, internally consistent and plausible description of a possible future state
of the world. Scenarios are not predictions or forecasts (which indicate outcomes
considered most likely), but are alternative images without ascribed likelihoods of
how the future might unfold. They may be qualitative, quantitative or both. An
overarching logic often relates several components of a scenario, for example, a
storyline and/or projections of particular elements of a system. Exploratory (or
descriptive) scenarios describe the future according to known processes of change,
or as extrapolations of past trends. Normative (or prescriptive) scenarios describe a
prespecified future, optimistic, pessimistic or neutral and a set of actions that might
be required to achieve (or avoid) it. Such scenarios are often developed using an
inverse modelling approach, by defining constraints and then diagnosing plausible
combinations of the underlying conditions that satisfy those constraints”.
This last section provided a snapshot of the regulatory environment surrounding
scenario analysis. In our discussion, we do not really distinguish scenario analysis from stress testing, as the latter requires and relies on similar methodologies to be effective.

References

Allen, F., & Gale, D. (2000). Financial contagion. Journal of Political Economy, 108(1), 1–33.
APRA. (2007). Applying a structured approach to operational risk scenario analysis in Australia.
Sydney: Australian Prudential Regulation Authority.
BCBS. (2004). International convergence of capital measurement and capital standards. Basel:
Bank for International Settlements.
Calvo, G. A. (2004). Contagion in emerging markets: When wall street is a carrier. In E. Bour, D.
Heymann, & F. Navajas (Eds.), Latin American economic crises: Trade and labour (pp. 81–91).
London, UK: Palgrave Macmillan.
COAG. (2015). Electricity network economic regulation; scenario analysis. In Council of Aus-
tralian Governments, Energy Council, Energy Working Group, Network Strategy Working
Group.
COSO. (2004). Enterprise risk management - integrated framework executive summary. In
Committee of Sponsoring Organizations of the Treadway Commission.
De Gregorio, J., & Valdes, R.O. (2001). Crisis transmission: Evidence from the debt, tequila, and
Asian flu crises. World Bank Economic Review, 15(2), 289–314.
Dornbusch, R., Park, Y., & Claessens, S. (2000). Contagion: Understanding how it spreads. The
World Bank Research Observer, 15(2), 177–197.
EBA. (2013). Draft regulatory technical standards specifying the range of scenarios to be used
in recovery plans under the draft directive establishing a framework for the recovery and
resolution of credit institutions and investment firms. London: European Banking Authority.
EBA. (2014). Guidelines on the range of scenarios to be used in recovery plans. London: European
Banking Authority.
FAO. (2012). South Asian forests and forestry to 2020. In Food and Agriculture Organisation of
the United Nations.
Fed. (2016a). 2016 supervisory scenarios for annual stress tests required under the Dodd-Frank
act stress testing rules and the capital plan rule. Washington, DC: Federal Reserve Board.
Fed. (2016b). Comprehensive capital analysis and review 2016 summary instructions. Washington,
DC: Federal Reserve Board.
Guégan, D., & Hassani, B. (2015). Stress testing engineering: The real risk measurement? In A.
Bensoussan, D. Guégan, & C. Tapiero (Eds.), Future perspectives in risk models and finance.
New York: Springer.
Hassani, B. (2015). Model risk - from epistemology to management. Working paper, Université
Paris 1.
IAA. (2013). Stress testing and scenario analysis. In International Actuarial Association.
Kaminsky, G. L., & Reinhart, C. M. (2000). On crises, contagion, and confusion. Journal of
International Economics, 51(1), 145–168.
King, M. A., & Wadhwani, S. (1990). Transmission of volatility between stock markets. Review of
Financial Studies, 3(1), 5–33.
Kirman, A. (1993). Ants, rationality, and recruitment. Quarterly Journal of Economics, 108(1),
137–156.
Lagunoff, R. D., & Schreft, S. L. (2001). A model of financial fragility. Journal of Economic
Theory, 99(1), 220–264.
Markowitz, H. M. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
McCormick, R. (2011). Legal risk in the financial markets (2nd ed.). Oxford: Oxford University
Press.
Oganisian, A. (2015). Modeling ebola contagion using airline networks in R. www.r-bloggers.com.
Piatetsky-Shapiro, G. (2011). Modeling systemic and sovereign risk. In A. Berd (Ed.), Lessons
from the financial crisis (pp. 143–185). London: RISK Books.
PRA. (2015). The internal capital adequacy assessment process (ICAAP) and the supervisory
review and evaluation process (SREP). In Prudential Regulation Authority, Bank of England.
Quagliariello, M. (2009). Stress-testing the banking system - methodologies and applications. Cambridge: Cambridge University Press.
Schwarcz, S. L. (2008). Systemic risk. Georgetown Law Journal, 97(1), 193–249.
Shiller, R. J. (1984). Stock prices and social dynamics. Brookings Papers on Economic Activity,
1984(2), 457–498.
USNRC. (2004). Regulatory analysis guidelines of the U.S. nuclear regulatory commission. In
NUREG/BR-0058, U.S. Nuclear Regulatory Commission.
USNRC. (2012). Modeling potential reactor accident consequences - state-of-the-art reactor con-
sequence analyses: Using decades of research and experience to model accident progression,
mitigation, emergency response, and health effects. In U.S. Nuclear Regulatory Commission.
Chapter 3
The Information Set: Feeding the Scenarios

A point needs to be made absolutely clear before any further presentation. None of
the methodologies presented in the following chapters can be used if these are not
fed by appropriate inputs. Therefore, we will start this chapter characterising and
defining data, then we will discuss pre-processing these inputs to make them ready
for further processing.
Data are a set of qualitative or quantitative pieces of information. Data are
engendered or obtained by both observation and measurement. They are collected,
reported, analysed and visualised. Data as a general concept refers to the fact
that some existing information or knowledge is represented in some form suitable
for better or different processing. Raw data, or unprocessed data, are a collection
of numbers and characters; data processing commonly occurs by stages, and the
processed data from one stage may become the raw data of the next one. Field data
are raw data that is collected in an uncontrolled environment. Experimental data
are data generated within the context of a scientific investigation by observation
and recording, in other words these are data generated carrying out an analysis
or implementing a model. It is important to understand, in particular for scenario
analysis, that the data used to support the process are not most of the time numeric
values. Indeed, these are usually pieces of information gathered to support a story
line, such as articles, media, incidents experienced by other financial institutions or
expert perceptions.
Indeed, to be more specific, data are any facts, numbers or text that can be processed. Nowadays, organisations are capturing and gathering growing quantities of data in various formats. We can split the data into three categories:
• operational or transactional data, such as sales, cost, inventory, payroll and accounting
• non-operational data, such as industry sales, forecast data and macroeconomic
data
• meta data—data about the data itself, such as logical database design or data
dictionary definitions

Recent regulatory documents, for instance the Risk Data Aggregation principles (BCBS, 2013b), aim at ensuring the quality of the data used for regulatory purposes. However, one may argue that any piece of data could be used for regulatory purposes; consequently, this piece of regulation should lead in the long term to a wider capture of data for risk measurement and, as a result, to better risk management.
Indeed, BCBS (2013b) requires that the information banks use in their decision-making process captures all risks accurately and in a timely manner. This piece of
regulation sets out principles of effective and efficient risk management by pushing
banks to adopt the right systems and develop the right skills and capabilities instead
of ticking regulatory boxes to be compliant at a certain date.
It is important to understand that this piece of regulation cannot be dealt with in a silo. It has to be regarded as part of the larger library of regulations. This paragraph provides some illustrations; indeed, BCBS 239 compliance is required to ensure a successful Comprehensive Capital Analysis and Review (CCAR—Fed 2016) in the USA, a Firm Data Submission Framework (FDSF—BoE 2013) in the UK, the European Banking Authority stress tests (EBA, 2016) or the Fundamental Review of the Trading Book (FRTB—BCBS 2013a). The previous chapter introduced some of these regulatory processes in more detail. The resources required for these exercises are quite significant and should not be underestimated. If banks are not able to demonstrate compliant solutions for data management and data governance across multiple units such as risk, finance and the businesses, they will have to change their risk measurement strategies and, as a corollary, their risk frameworks. In the short term, these rules may imply larger capital charges for financial institutions, but in the long term the better risk management processes implied by this regulation should help reduce capital charges for banks using internal models, or at least reduce the banks' exposures.
With the level of change implied, BCBS 239 might be considered the core of the regulatory transformation. However, the task for banks of evolving their operating model remains significant, and adapting their technology infrastructures will not be straightforward. Both banks and regulators acknowledge the challenges. The principles are an enabler to transform the business strategically, in order to survive in the new market environment. Furthermore, combining BCBS 239-specific requirements and business-as-usual tasks across business units and geographical locations will not be easy and will require appropriate change management.
In the meantime, a nebula emerged, usually referred to as big data. Big data
is a broad term for data sets so large or complex that traditional data processing
applications are inadequate. Challenges include analysis, capture, cleansing, search,
sharing, storage, transfer, visualisation and information privacy. The term often
refers simply to the use of predictive analytics or certain other advanced methods to extract valuable information from data, and rarely to a particular size of data set. Accuracy in big data may lead to more confidence in the decision-making process and consequently to improvements in operational efficiency, reductions of costs and better risk management.
Data analysis is key to the future of banking; our environment will move from traditional to rational through a path which might be emotional. Data analysis allows looking at a particular situation from different angles. Besides, the possibilities are unlimited as long as the underlying data are of good quality. Indeed, data analysis may lead to the detection of correlations, trends, etc., and can be used in multiple areas and industries. Dealing with large data sets is not necessarily easy; most of the time it is quite complicated, as many issues arise related to data completeness, size or the reliability of the IT infrastructure.
In other words, "big data" combines capabilities, user objectives, tools deployed and methodologies implemented. The field evolves quickly, as what is considered big data one year becomes "business as usual" the next (Walker 2015). Depending on the organisation, the infrastructure to put in place will not be the same, as the needs are not identical from one entity to another, e.g., parallel computing is not always necessary. There is no "one-size-fits-all" infrastructure.

3.1 Characterising Numeric Data

Before introducing any methodological aspects, it is necessary to discuss how to represent and characterise the data. Here we will focus on numerical data; however, it is important to bear in mind that, as mentioned previously, the information used for scenario analysis should not be limited to this kind of data.
Understanding numerical data boils down to statistical analysis. This task can be
broken down into the following:
1. Describe the nature of the data; in other words, what data are we working on? This first point is quite important as practitioners usually have a priori understandings of the data; they have some expertise relying on their experience and can therefore help orientate the characterisation of the data.
2. Explore the relationship of the data with the underlying population, i.e., to what extent is the sample representative of a population?
3. Create, fit or adjust a model on the sample so that it is representative of the underlying population, i.e., fit the data and extract or extrapolate the required information.
4. Assess the validity of the model, for example, using goodness-of-fit tests (a minimal sketch of steps 3 and 4 is given after this list).
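As a minimal sketch of steps 3 and 4, the following Python code fits a candidate model to a sample and assesses its validity with a goodness-of-fit test; the simulated losses and the choice of a lognormal distribution are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=10.0, sigma=1.5, size=1000)  # simulated loss amounts

# Step 3: fit a candidate model (here a lognormal) to the sample.
shape, loc, scale = stats.lognorm.fit(sample, floc=0)

# Step 4: assess the validity of the fit, e.g. with a Kolmogorov-Smirnov test
# (estimating the parameters on the same sample biases the p-value, so in
# practice a parametric bootstrap would be preferable).
ks_stat, p_value = stats.kstest(sample, "lognorm", args=(shape, loc, scale))
print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.4f}")
```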
In this section, we will introduce some concepts that will be helpful in understanding the data and will support the selection of the appropriate scenario analysis strategy. Indeed, most numerical data sets can be represented by an empirical distribution. These distributions can be described in various ways, using moments, quantiles or the way the data interact with each other. Therefore, we will briefly introduce these notions in the following sections (the latter will actually be discussed in subsequent chapters).

3.1.1 Moments

In mathematics, a moment is a specific quantitative measure of the shape of a set of points. If these points are representative of a probability density, then the first moment is the mean, the second (central) moment is the variance, the third (standardised) moment is the skewness, measuring the asymmetry of the distribution, and the fourth is the kurtosis, providing some information regarding the thickness of the tails through the flattening of the distribution.
For a bounded distribution of mass or probability, the collection of all the moments uniquely determines the distribution. The $n$th moment of a continuous function $f(x)$ defined on $\mathbb{R}$, given $c \in \mathbb{R}$, is
\[
\mu_n = \int_{-\infty}^{+\infty} (x - c)^n \, f(x) \, dx. \tag{3.1.1}
\]

The moment¹ of a function usually refers to the above expression considering $c = 0$. The $n$th moment about zero, or raw moment, of a probability density function $f(x)$ is the expected value of $X^n$. For the second and higher moments, the central moments are usually used rather than the moments about zero, because they provide clearer information about the distribution's shape. The moments about its mean $\mu$ are called central moments; these describe the shape of the function, independently of translation. It is actually possible to define other moments, such as the $n$th inverse moment about zero $E[X^{-n}]$ or the $n$th logarithmic moment about zero $E[\ln^n(X)]$.
If $f$ is a probability density function, then the value of the previous integral is called the $n$th moment of the probability distribution. More generally, if $F$ is a cumulative probability distribution function of any probability distribution, which may not have a density function, then the $n$th moment of the probability distribution is given by the Riemann–Stieltjes integral
\[
\mu'_n = E[X^n] = \int_{-\infty}^{+\infty} x^n \, dF(x) \tag{3.1.2}
\]

where $X$ is a random variable, $F(X)$ its cumulative distribution and $E$ denotes the expectation. When
\[
E\left[ |X^n| \right] = \int_{-\infty}^{+\infty} |x^n| \, dF(x) = \infty, \tag{3.1.3}
\]

then the moment does not exist (we will see examples of such problems in Chap. 5 with the Generalised Pareto and the α-stable distributions, and with the Generalised Extreme Value distribution in Chap. 6). If the $n$th moment exists, so does the $(n-1)$th moment, as well as all lower-order moments.

¹ Moments can be defined in a more general way than only considering real numbers.
Note that the zeroth moment of any probability density function is 1, since
\[
\int_{-\infty}^{+\infty} f(x) \, dx = 1. \tag{3.1.4}
\]
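As an illustration of these four moments, the minimal sketch below computes them for a simulated right-skewed sample; the data and the distribution used to generate them are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # right-skewed sample

print("mean     :", np.mean(x))          # first moment
print("variance :", np.var(x))           # second central moment
print("skewness :", stats.skew(x))       # third standardised moment (asymmetry)
print("kurtosis :", stats.kurtosis(x))   # fourth standardised moment (excess, tail thickness)
```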

3.1.2 Quantiles

Quantiles divide a set of observations into groups of equal size. There is one quantile fewer than the number of groups created; for example, the quartiles are only three points that allow dividing a dataset into four groups, each containing 25 % of the observations. If there are ten different buckets, each of them representing 10 %, we talk about deciles.
More generally, quantiles are values that split a finite set of values into $q$ subsets of equal size. There are $q - 1$ of the $q$-quantiles, one for each integer $k$ satisfying $0 < k < q$. In some cases the value of a quantile may not be uniquely determined, for example, for the median of a uniform probability distribution on a set of even size. Quantiles can also be applied to continuous distributions, providing a way to generalise rank statistics to continuous variables. When the cumulative distribution function of a random variable is known, the $q$-quantiles are the application of the quantile function (the inverse function of the cumulative distribution function) to the values $\left\{\frac{1}{q}, \frac{2}{q}, \ldots, \frac{q-1}{q}\right\}$.
Understanding the quantiles of a distribution is particularly important as it is a manner of representing the way the data are positioned: the larger the quantiles at a particular point, the larger the risk. Indeed, quantiles are the theoretical foundation of the Value-at-Risk and the Expected Shortfall, which will be developed in the next chapter. Quantiles are in fact risk measures and are therefore very useful for evaluating exposures to a specific risk, as soon as we have enough information to ensure the robustness of these quantiles; i.e., if we do not have much data, then the occurrence of a single event will materially impact the quantiles. Note that this situation might be acceptable for tail events, but this is generally not the case for risks more representative of the body of the distribution.
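The link between quantiles and the risk measures mentioned above can be sketched as follows; the simulated loss distribution and the 99.9 % level are illustrative assumptions, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(2)
losses = rng.lognormal(mean=8.0, sigma=2.0, size=100_000)  # simulated losses

# Quartiles split the sample into four groups of equal size.
print("quartiles:", np.quantile(losses, [0.25, 0.50, 0.75]))

# A high quantile of the loss distribution is the empirical Value-at-Risk,
# and the mean of the losses beyond it is the empirical Expected Shortfall.
var_999 = np.quantile(losses, 0.999)
es_999 = losses[losses >= var_999].mean()
print(f"VaR 99.9% = {var_999:,.0f}, ES 99.9% = {es_999:,.0f}")
```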

3.1.3 Dependencies

In statistics, a dependence depicts any statistical relationship between sets of data. Correlation refers to any statistical relationship involving dependence. Correlations are useful because they can indicate a predictive relationship that can be exploited.
There are several correlation coefficients, often denoted $\rho$ or $r$, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is only sensitive to a linear relationship between two variables. Alternative correlation coefficients have been developed to deal with the problems caused by the Pearson correlation. These correlations will be presented in detail in Chap. 11.
Dependencies embrace many concepts, such as correlations, autocorrelations, copulas, contagion and causal chains. Understanding them will help in understanding how an incident materialises, what early warning indicator could be used prior to the materialisation and what could lead to it, supporting the implementation of controls. As a corollary, understanding the causal effect will help support the selection of the strategy to implement for scenario analysis purposes.
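A minimal sketch contrasting the Pearson coefficient with a rank-based alternative (the Spearman coefficient) on a monotonic but non-linear relationship is given below; the simulated data are an assumption made purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = np.exp(x) + 0.1 * rng.normal(size=1000)  # monotonic but non-linear link

pearson_rho, _ = stats.pearsonr(x, y)    # sensitive to linear relationships only
spearman_rho, _ = stats.spearmanr(x, y)  # rank-based, captures monotonic dependence
print(f"Pearson  = {pearson_rho:.3f}")
print(f"Spearman = {spearman_rho:.3f}")
```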

3.2 Data Sciences

The previous paragraphs built the path to introduce data sciences. Most of the methodologies presented in the next chapters either rely on or are somehow introduced in this section. Data science is a generic term gathering data mining, machine learning,
artificial intelligence, statistics, etc., under a single banner.

3.2.1 Data Mining

Data mining (Hastie et al., 2009) is a field belonging to computer science. The
purpose of data mining is to extract information from data sets and transform them
into an understandable structure with respect to the ultimate use of these data.
The embedded computational process of discovering patterns in large data sets
combines methods from artificial intelligence (Russell and Norvig, 2009), machine
learning (Mohri et al., 2012), statistics, and database systems and management. The
automatic or semi-automatic analysis of large quantities of data permits the detection of interesting patterns such as clusters (Everitt et al., 2011), anomalies and dependencies; the outcome of the analysis can then be perceived as the essence or the
quintessence of the original input data, and may be used for further analysis in
machine learning, predictive analytics or more traditional modelling.
Usually, the term data mining refers to the process of analysing raw data and
summarising them into information used for further modelling. In data mining the
data are analysed from many different dimensions. More precisely, data mining
aims at finding correlations or dependence patterns between multiple fields in
large relational databases. The patterns, associations or relationships among all this
data can provide information usable to prepare and support the scenario analysis
program of a financial institution. While the methodologies, the statistics and the mathematics behind data mining are not new, until recent innovations in computer processing, disk storage and statistical software, data mining was not reaching the goals set.
Advances in data capture, processing power, data transmission and storage
capabilities are enabling organisations to integrate their various databases into
data warehouses or data lakes. Data warehousing is a process of centralised data
management and retrieval. Data warehousing, like data mining, is a relatively new
term although the concept itself has been around for years. Data warehousing
represents an ideal vision of maintaining a central repository of all organisational
data. Centralisation of data is needed to maximise user access and analysis. Data
lakes in some sense generalise the concept and allow structured and unstructured
data as well as any piece of information (PDF documents, emails, etc.) that are not
necessarily instantly usable for pre-processing.
Until now, data mining has mainly been used by companies with a strong consumer focus, in other words retail, financial, communication and marketing organisations (Palace, 1996). These types of companies use data mining to analyse relationships between endogenous and exogenous factors, such as price, product positioning, economic indicators, competition or customer demographics, as well as their impacts on sales, reputation, corporate profits, etc. Besides, it permits summarising the information analysed. It is interesting to note that nowadays retailers and suppliers have joined forces to analyse even more relationships at a deeper level. The National Basketball Association developed a data mining application to support more efficient coaching. Billy Beane of the Oakland Athletics used data mining and statistics to select the players forming his team.
Data mining enables analysing relationships and patterns in stored data based on
open-ended user queries. Generally, any of four types of relationships are sought:
• Classes: This is the simplest kind of relationship, as stored data is used to analyse
subgroups.
• Clusters: Data items are gathered according to logical relationships related to
their intrinsic characteristics. More generally, a cluster analysis aims at grouping
a set of similar objects (in some sense) in one particular group (Everitt et al.,
2011).
• Associations: Data can be analysed to identify associations. Association rule
learning is intended to identify strong rules discovered in databases measuring
how interesting they are for our final purpose (Piatetsky-Shapiro, 1991).
• Sequential patterns: Data are analysed to forecast and anticipate behaviours, trends or schemes, such as the likelihood of a purchase given what someone already has in his Amazon basket.
Data mining consists of several major steps. We would recommend following these steps to make sure that the data used to support the scenario analysis (if some data are used) are appropriate and representative of the risk to be assessed; a minimal sketch illustrating some of these steps is given after the list.
• Data capture: In a first step, data are collected from various sources and gathered in a database.
• Data pre-processing, i.e., before proper mining:
– Data selection: Given the ultimate objective, only a subset of the data available
might be necessary for further analysis.
– Data cleansing and anomalies detection: Collected data may contain errors,
may be incomplete, inconsistent, outdated, erroneous, etc. These issues need
to be identified, investigated and dealt with prior to any further analysis.
– Data transformation: Even following the previous stage, the data may not be ready for mining; they may require transformations such as kernel smoothing, aggregation, normalisation and interpolation.
• Data processing is only possible once the data have been cleansed and are fit for purpose. This step combines the following:
– Outlier detection, i.e., an observation point that is distant (in some sense)
from other observations. Note that we make the distinction between an outlier
and an extreme value, as an outlier is related to a sample while an extreme value is related to the whole set of possible values a realisation could take. Though large, an extreme value is normal, while an outlier might be abnormal. An extreme value is usually an outlier in a sample, whereas an outlier is not necessarily an extreme value.
– Relationship analysis, as indicated before, gathering the data with similar
characteristics, classification or analysing interactions.
– Pattern recognition, such as regression analysis, time series analysis and
distributions.
– Summarisation and knowledge presentation: This step deals with visualisation; one should ensure that key aspects are not lost during the process and that the results exhibited are representative.
• Decision-making process integration: This step enables using the knowledge obtained from the previous manipulations and analyses. This is the ultimate objective of data mining.
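To illustrate two of the processing steps above (outlier detection and relationship analysis through clustering), a minimal sketch is given below; the data set, the z-score threshold and the use of k-means are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Hypothetical data set: two features per observation (e.g. loss amount, duration).
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(200, 2)),
    [[20.0, 20.0]],                     # an artificial outlier
])

# Outlier detection: flag points more than 4 standard deviations from the mean.
z_scores = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
outliers = np.where((z_scores > 4).any(axis=1))[0]
print("outlier indices:", outliers)

# Relationship analysis: group the remaining observations into clusters.
clean = np.delete(data, outliers, axis=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(clean)
print("cluster sizes:", np.bincount(labels))
```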
Remark 3.2.1 The infrastructure required to be able to mine the data is driven by two main technological aspects: the size of the database and the complexity of the queries. These should not be underestimated, as the reliability of the analysis directly depends on the quality of the infrastructure. The larger the quantity of data to be processed and the more complex the queries, the more powerful the system required.

3.2.2 Machine Learning and Artificial Intelligence

Once these data have been analysed and formatted, they can be further used for
prediction, forecasting and evaluation, in other words, for modelling.
Machine learning deals with the study of pattern recognition and computa-
tional learning theory in artificial intelligence. Machine learning aims at building
algorithms that can learn from data and make predictions from them, i.e., which
operate dynamically, adapting themselves to changes in the data, relying not only on
statistics but also on mathematical optimisation. Automation is the keyword of this
paragraph: the objective is to make machines think, possibly by mimicking the way
human brains function (see Chap. 10).
Machine learning tasks are usually classified into four categories (Russell and
Norvig, 2009) depending on the inputs and the objectives:
• In supervised learning (Mohri et al., 2012), the goal is to infer a general rule from
example data mapped to the desired output. The example data are usually called
training data and consist of pairs of inputs and desired outputs, or supervisory
signals. Once the algorithm has analysed the training data and inferred the function,
it can be used to map new examples and generalise to previously unknown
situations. Optimally, algorithms should react correctly to new instances,
providing an unbiased and accurate outcome, e.g., outcomes which prove
accurate once they can be compared with future real occurrences.
• The second possibility is unsupervised learning in which no training data are
given to the learning algorithm, consequently it will have to extract patterns from
the input. Unsupervised learning can actually be used to find hidden structures
and patterns embedded within the data. Therefore, unsupervised learning aims
at inferring a function describing hidden patterns from unlabelled data (Hastie
et al., 2009). In the case of unsupervised learning, it is complicated to evaluate
the quality of the solution as initially no benchmark is available.
• When the initial training information (i.e. data and/or targets) is incomplete, an
intermediate strategy called semi-supervised learning may be used.
• In reinforcement learning (Sutton and Barto, 1998), a program interacts and
evolves within a dynamic environment in which it is supposed to achieve a
specific task. However, as for unsupervised learning, there is no training data
and no benchmark. This approach aims at learning what to do, i.e., how to map
situations to actions, so as to optimise a numerical function, i.e., the output. The
algorithm has to discover which actions lead to the best output signal by trying
them. These strategies allow capturing situations in which actions may affect all
subsequent steps with or without any delay, which might be of interest.
Another way of classifying machine learning strategies is by desired output
(Bishop, 2006). We will illustrate that classification by briefly introducing some
strategies and methodologies used in the next chapters. Our objective is to show how
interconnected all the methodologies are, as one may leverage some of them to
achieve other purposes. Indeed, all the methodologies belonging to data sciences can
be used as a base for scenario analysis.
The first methodology (actually presented in the previous section) is
classification, in which inputs are divided into at least two different classes, and the
learning algorithm has to assign unseen inputs to at least one of these classes. This
is a good example of supervised learning, though it could be adapted to fall into the
semi-supervised alternative. The second methodology is regression, which also
belongs to supervised learning and focuses on the relationship between a
dependent variable and at least one independent variable (Chap. 11). In clustering,
inputs have to be divided into groups of similar data. Contrary to classification, the
groups are unknown a priori, therefore this methodology belongs to the unsupervised
strategies. Density estimation (Chap. 5) provides the distribution of input data and
belongs in essence to the family of unsupervised learning, though if we use
a methodology such as Bayesian inference it becomes more of a semi-supervised
strategy.
As mentioned before, machine learning is closely related to optimisation. Most
learning problems are formulated as optimising (i.e. minimising or maximising) an
objective function. Objective functions express the difference between the output of
the trained model and the actual values. Contrary to data mining, machine learning
does not only aim at detecting patterns or at adjusting a model well to some data,
but at making this model generalise well to previously unknown situations, which
is a far more complicated task. The goal of machine learning models is accurate
prediction, generalising patterns originally detected and refined by experience.
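As a minimal illustration of this formulation, the sketch below fits a linear model by gradient descent on a squared-error objective; the simulated data, the learning rate and the number of iterations are arbitrary choices made for the example.

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                  # simulated features
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

w = np.zeros(2)                                # model parameters
lr = 0.1                                       # learning rate (arbitrary)
for _ in range(500):
    residual = X @ w - y                       # model output minus actual values
    gradient = 2 * X.T @ residual / len(y)     # gradient of the mean squared error
    w -= lr * gradient                         # descent step

print(w)  # should end up close to the true coefficients (1.5, -2.0)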

3.2.3 Common Methodologies

Machine learning and data mining often rely on identical methodologies and/or
overlap quite significantly, though they have different objectives. As mentioned in the
previous paragraphs, machine learning aims at prediction using properties learned
from training data, while data mining focuses on the discovery of unknown patterns
embedded in the data. In this section, we briefly introduce methodologies used in
data mining and machine learning, as some of them will be implemented in the next
chapters: scenario analysis requires first analysing data to identify the important
patterns embedded in them, and second making predictions from them. The following
list is far from exhaustive; however, it provides a good sample of traditional
methodologies:
• Decision tree learning (deVille, 2006) is a predictive model. The purpose is
to predict the values of a target variable based on several inputs, which are
graphically represented by nodes. Each edge of a node leads to children,
respectively, representing each of the possible values the variable can take
given the input provided. A decision tree may be implemented for classification
purposes or for regression purposes, respectively, to identify to which class the
input belongs or to evaluate a real outcome (prices, etc.). Some examples of
decision tree strategies are Bagging decision trees (Breiman, 1996), Random
Forest classifier, Boosted Trees (Hastie et al., 2009) or Rotation forest. In Chap. 7,
a related strategy (a fault tree) has been implemented, though in our case the root
will be reverse engineered.
• Association rule learning aims at discovering hidden or embedded relationships
between variables in databases. To assess how interesting and significant these
relationships are, various measures have to be implemented, such as Confidence,
All-confidence (Omiecinski, 2003), Collective strength (Aggarwal and Yu, 1998),
Conviction (Brin et al., 1997) and Leverage (Piatetsky-Shapiro, 1991), among others;
this step is crucial to avoid misleading outcomes and conclusions. Multiple
algorithms have been developed to generate association rules, such as the Apriori
algorithm (Agrawal and Srikant, 1994), the Eclat algorithm (Zaki, 2000) and the
FP-growth algorithm (Han et al., 2000).
• Artificial neural networks are learning algorithms that are inspired by the struc-
ture and the functional aspects of biological neural networks, i.e., brains. Modern
neural networks are non-linear statistical data modelling tools. They are usually
used to model complex relationships between inputs and outputs, to find patterns
in data or to capture the statistical structure in an unknown joint probability
distribution between observed variables. Artificial neural networks are generally
presented as systems of interconnected “neurons” which exchange messages
between each other. The connections have numeric weights that can be tuned
based on experience, making them adaptive to inputs and capable of learning.
Neural networks might be used for function approximation, regression analysis,
time series, classification, including pattern recognition, filtering, clustering,
among others. Neural networks are discussed in more detail and applied in
Chap. 9. Note that the current definition of deep learning consists in using
multi-layer neural networks (Deng and Yu, 2013).
• Inductive logic programming (Muggleton, 1991; Shapiro, 1983) uses logic
programming as a uniform representation for input examples, background
knowledge and hypotheses. Given an encoding of both background knowledge
and examples provided as a logical database of facts, the system will derive
a logic program that implies all positive and no negative examples. Inductive
logic programming is frequently used in bioinformatics and natural language
processing.
• Support vector machines (Ben-Hur et al., 2001; Cortes and Vapnik, 1995) are
supervised learning models in which algorithms analyse data and recognise
patterns, usually used for classification and regression analysis. Given a set of
training data, each of them associated with one of two categories, the algorithm
binarily assigns new examples to one of these. This methodology is quite pow-
erful though it requires fully labelled input data. Besides, the parameterisation is
quite complicated to interpret. This strategy can also be extended to more than
two classes, though the algorithm is more complex. The literature provides us
with other interesting extensions, such as support vector clustering (an unsupervised
version), the transductive support vector machine (a semi-supervised version)
or the structured support vector machine, among others.
• Cluster analysis (Huang, 1998; Rand, 1971) consists in assigning observations
into subsets (clusters) so that each subset is similar according to some criteria.
Clustering is a method of unsupervised learning. Cluster analysis depicts the
general task to be solved. This can be achieved carrying out various methods
which significantly differ in their definition of what constitutes a cluster and
how to determine them. Both the appropriate clustering algorithm and the proper
parameter settings depend on the data and on the intended use of the results.
Cluster analysis is an iterative process of knowledge discovery involving trial and
error. Indeed, it will often be necessary to fine tune the data pre-processing and
the model parameters until the results are appropriate according to a prespecified
set of criteria. Usual methodologies are Connectivity models, Centroid models,
Distribution models, Density models, Subspace models, Group models and
Graph-based models (a brief centroid-based illustration is sketched after this list).
• A Bayesian network is a probabilistic graphical model that represents a set
of random variables and their conditional independence through a directed
acyclic graph (DAG). The nodes represent random variables, observable
quantities, latent variables, unknown parameters or hypotheses. Edges represent
conditional dependencies; nodes that are not connected represent variables that
are conditionally independent from each other. Each node is associated with a
probability function that takes, as input, a particular set of values from the parent
nodes, and provides the probability (or distribution) of the variable represented
by the node. Multiple extensions of Bayesian networks have been developed,
such as dynamic Bayesian networks or influence diagrams. Bayesian networks
are introduced in more detail in Chap. 8.
• In similarity and metric learning (Chechik et al., 2010), the learning algorithm
is provided with a couple of training sets. The first contains similar objects,
while the second contains dissimilar ones. Considering a similarity function (i.e.
a particular objective function), the objective is to measure how similar newly
arriving data are. Similarity learning is an area of supervised machine learning in
artificial intelligence. It is closely related to regression and classification (see
Chap. 11). These kinds of algorithms can be used for face recognition to prevent
impersonation fraud, for example.
• Genetic algorithms (Goldberg, 2002; Holland, 1992; Rand, 1971)—A genetic
algorithm (GA) is a heuristic search that imitates the natural selection process
(see Chap. 1), and uses methods such as mutation, selection, inheritance and
crossover to generate new genotypes in order to find solutions to a problem, such
as optimisation or search problems. While genetic algorithms supported the evolution
of the machine learning field, in return machine learning techniques have
been used to improve the performance of genetic and evolutionary algorithms.
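As announced in the cluster analysis item above, the following sketch illustrates a centroid-based clustering (k-means) on simulated two-dimensional features; the feature construction and the choice of three clusters are assumptions made purely for the illustration and would normally be challenged (e.g., with silhouette or stability criteria).

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Simulated feature set, e.g. (log frequency, log severity) per risk cell.
features = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(2.0, 1.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0.5, 3.0), scale=0.3, size=(50, 2)),
])

# The choice of three clusters is an assumption made for the illustration.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
print(model.cluster_centers_)   # estimated centroids
print(model.labels_[:10])       # cluster assignment of the first observations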

References

Aggarwal, C. C., & Yu, P. S. (1998). A new framework for itemset generation. In Symposium on
Principles of Database Systems, PODS 98 (pp. 18–24).
Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules in large databases.
In J.B. Bocca, M. Jarke, & C. Zaniolo (Eds.), Proceedings of the 20th International Conference
on Very Large Data Bases (VLDB) (pp. 487–499).
BCBS. (2013a). Fundamental review of the trading book: A revised market risk framework. Basel:
Basel Committee for Banking Supervision.
BCBS. (2013b). Principles for effective risk data aggregation and risk reporting. Basel: Basel
Committee for Banking Supervision.
Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal
of Machine Learning Research, 2, 125–137.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
BoE. (2013). A framework for stress testing the UK banking system. London: Bank of England.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication
rules for market basket data. In ACM SIGMOD record, June 1997 (Vol. 26, No. 2, pp. 255–264).
New York: ACM.
Chechik, G., Sharma, V., Shalit, U., & Bengio, S. (2010). Large scale online learning of image
similarity through ranking. Journal of Machine Learning Research, 11, 1109–1135.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Deng, L., & Yu, D. (2013). Deep learning methods and applications. Foundations and Trends in
Signal Processing, 7(3–4), 197–387.
deVille, B. (2006). Decision trees for business intelligence and data mining: Using SAS enterprise
miner. Cary: SAS Press.
EBA. (2016). 2016 EU wide stress test - methodological note. London: European Banking
Authority.
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis (5th ed.). New York:
Wiley.
Fed. (2016). Comprehensive capital analysis and review 2016 summary instructions. Washington,
DC: Federal Reserve Board.
Goldberg, D. (2002). The design of innovation: Lessons from and for competent genetic algorithms.
Norwell: Kluwer Academic Publishers.
Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In ACM
SIGMOD record (Vol. 29, No. 2, pp. 1–12). New York: ACM.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining,
inference, and prediction. New York: Springer.
Holland, J. (1992). Adaptation in natural and artificial systems. Cambridge: MIT.
Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with
categorical values. Data Mining and Knowledge Discovery, 2, 283–304.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of machine learning. Cam-
bridge: MIT.
Muggleton, S. (1991). Inductive logic programming. New Generation Computing, 8(4), 295–318.
Omiecinski, E. R. (2003). Alternative interest measures for mining associations in databases. IEEE
Transactions on Knowledge and Data Engineering, 15(1), 57–69.
Palace, W. (1996). Data mining: What is data mining? www.anderson.ucla.edu/faculty_pages/
jason.frand.
Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In G. Piatetsky-
Shapiro & W. Frawley (Eds.), Knowledge discovery in databases (pp. 229–248). Menlo Park:
AAAI.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the
American Statistical Association, 66(336), 846–850.
Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach (3rd ed.). London:
Pearson.
Shapiro, E. Y. (1983). Algorithmic program debugging. Cambridge: MIT.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge:
A Bradford Book/MIT.
Walker, R. (2015). From big data to big profits: Success with data and analytics. Oxford: Oxford
University Press.
Zaki, M. J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge
and Data Engineering, 12(3), 372–390.
Chapter 4
The Consensus Approach

In this chapter, we will present the so-called consensus approach in which the
scenarios are analysed in a workshop and a decision is made if a consensus is
reached.
Formally, consensus decision-making is a group process in which members
gather, discuss, agree, implement and support afterwards, a decision in the best
interest of the whole, in that case the whole can be an entity, a branch, a group,
etc. A consensus is an acceptable resolution, i.e., a common ground that might
not be optimal for each individual but it is the smallest common denominator.
In other words, it is a general agreement and the term consensus describes both
the decision and the process. Therefore, the consensus decision-making process
involves deliberations, finalisation and the effects of the application of the decision.
For scenario analysis purposes, this is typically the strategy implied when a
workshop is organised and the experts gathered are supposed to evaluate a potential
exposure together. Coming back to the methodology itself, being a decision-making
process, the consensus strategy (Avery, 1981; Hartnett, 2011) aims to be all of the
following:
1. Agreement seeking—The objective is to reach the largest possible number of
endorsements and approvals, or at least no dramatic antagonism. The keyword
is "seeking", as it is not a given that a unanimous position will be reached.
2. Collaborative—Members of the panel discuss proposals to reach a global decision
that at least addresses the largest number of participants' concerns. Once again, it
is highly unlikely that all the issues will be tackled through this process, though
it should at least be attempted.
3. Cooperative—Participants should not be competing for their own benefit; the
objective is to reach the best possible decision for the greater good (up to a certain
extent). In our case, this strategy is particularly appropriate if the global exposure
is lower for all participants when they collaborate than when they do not, in
other words, if the outcome for the group altogether is lower than the sum of the
individual parties' outcomes. Here, a game theory aspect appears, as we can draw
a parallel between consensus agreement seeking and a generalised version of the
prisoner's dilemma (Fehr and
Fischbacher, 2003).
4. Balanced—All members are allowed to express their opinions, present their
views and propose amendments. This process is supposed to be democratic.
This will be discussed in the manager’s section as the democratic character of
a company is still to be demonstrated.
5. Inclusive—As many stakeholders as possible should be involved, as long as they
add value to the conversation. Their seniority should not be the only reason for
their presence in the panel. It is really important that the stakeholders are open-minded
and able to put their seniority aside to listen to other people despite their
potential youth or lack of experience.
6. Participatory—All decision-makers are required to propose ideas. This point is a
corollary of the previous one. No one should be sitting in the conference room
for the sake of being there. Besides, ideas proposed should be constructive, i.e.,
they should be solution seeking and not destruction oriented.

4.1 The Process

Now that the necessary principles to reach a consensus have been presented, we can
focus on the process to be implemented.
As mentioned previously, the objective of the process is to generate widespread
levels of participation and agreement. There are variations regarding the degree
of agreement necessary to finalise a group decision, i.e., to determine if it is
representative of the group decision. However, the deliberation process demands
including any individual proposal. Concerns and alternatives raised or proposed by
any group member should be discussed as this will usually lead to the amendment
of the proposal. Indeed, each individual’s preferences should be voiced so that
the group can incorporate all concerns into an emerging proposal. Individual
preferences should not obstruct the progress of the group. A consensus process
makes a concerted attempt to reach full agreement.
There are multiple stepwise models supporting the consensus decision-making
process. They merely vary in what these steps require as well as in how decisions
are finalised. The basic model involves collaboratively generating a proposal,
identifying unsatisfied concerns and then modifying the proposal to generate as
much agreement as possible. The process described in this paragraph and the
previous one can be summarised in the following six-step process, which can either
loop or exit with a solution:
1. A discussion is always the initial step. A moderator and a coordinator are usually
required to ensure that the discussions are going in the right direction and are not
diverging.
2. A proposal should result from the discussion, i.e., an initial optimal position
(likely to be sub-optimal in a first stage though).
3. All the concerns should be raised, considered and addressed in the best way
possible. If some of them are show-stoppers, it is required to circle back to the
first step before moving to the next one.
4. Then the initial proposal should be revised. (It might be necessary to go through
the second or the third point again, as new issues might arise and these should be
dealt with).
5. Then the level of support is assessed. If the criterion selected is not satisfied, then
it is necessary to circle back at least to points 3 and 4.
6. Outcomes and key decisions: This level represents the agreement. It is really
important to bear in mind that we cannot circle back and forth indefinitely, as a
decision is ultimately required. It is necessary that after some time (or a number
of iterations) the proposal is submitted to an arbitral committee to rule.
Depending on the company culture, the sensitivity of the scenarios or the temper
of the participants, the agreement level required to consider that we successfully
reached a consensus may differ. Various possibilities are generally accepted to
assess if the general consensus has been reached, these are enumerated in what
follows:
• The ultimate goal is unanimous agreement (Welch Cline, 1990); however,
reaching it is highly unlikely, especially if the number of participants is
large, as the number of concerns raised is at least proportional to the number of
participants. But if the number of participants is limited, it is probably the strategy
which should be selected.
• Another possibility is to obtain unanimity minus a certain number of disagree-
ments. This may overcome some issues, though it is necessary to make sure that
issues overruled are not show-stoppers.
• Another possibility is the use of majority thresholds (qualified, simple, etc.). This
alternative strategy is very close to what you would expect from a poll requiring
a vote (Andersen and Jaeger, 1999); a toy illustration of such threshold rules is
sketched after this list. It is important (and that point is valid for all the strategies
presented in this book) to note that the consensus only ensures the quality of the
decision made to a certain extent.
• The last possibility is a decision made by the executive committee or an
accountable person. This option should only be considered in last resort as in our
experience this may antagonise participants and jeopardise the implementation
of the decision.
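As a toy illustration of the threshold-based options above, the function below checks whether a set of recorded positions satisfies a chosen agreement rule; the vote labels, the tolerated number of stand-asides and the two-thirds qualified majority are assumptions made for the example only.

from collections import Counter

def consensus_reached(votes, rule="qualified", threshold=2 / 3):
    """Check an agreement rule over votes in {'agree', 'stand_aside', 'block'}."""
    counts = Counter(votes)
    if rule == "unanimity":
        return counts["agree"] == len(votes)
    if rule == "unanimity_minus_n":
        # Unanimity minus a tolerated number of stand-asides, and no blocks.
        return counts["block"] == 0 and counts["stand_aside"] <= 2
    # Default: qualified majority of the expressed positions.
    return counts["agree"] / len(votes) >= threshold

votes = ["agree"] * 8 + ["stand_aside", "block"]
print(consensus_reached(votes))                    # True: 8/10 >= 2/3
print(consensus_reached(votes, rule="unanimity"))  # False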
Each of the previous possibilities has pros and cons; for instance, trying to reach
unanimous decisions gives participants the option of blocking the process, but, on
the other hand, if the consensus is reached the likelihood of it leading to a
good decision is higher. Indeed, unless someone steps back for an irrational reason, the
micro-economist would say that they all maximised their utility.
The rules of engagement for such a solution have to be properly stated prior to the
workshop; otherwise, we may end up with a situation in which the participants are left
in a closed environment, forbidden to leave the room until they find an agreement.
In principle, with this strategy, the group is placed over and above the individual,
and it is in the interest of each individual to compromise for the greater good; both
dissenters and aligned participants are mechanically encouraged to collaborate.
No one has a veto right in the panel. Common “blocking rules” are as follows:
• Limiting the option to block consensus to issues that are fundamental to
the group’s mission or potentially disastrous to the group, though it is often
complicated to draw the line.
• Providing an option for those who do not support a proposal to “stand aside”
rather than block.
• Requiring two or more people to block for a proposal to be put aside.
• Requiring the blocking party to supply an alternative proposal or at least an
outlined solution.
• Limiting each person’s option to block consensus to a handful of times in a given
session.
Unanimity is achieved when the full group consents to a decision. Giving consent
does not necessarily mean the proposal being considered is one’s first choice.
Group members can vote their consent to a proposal because they choose to
cooperate with the direction of the group, rather than insist on their personal
preference. This relaxed threshold for a yes vote can help make unanimity easier
to achieve. Alternatively, a group member can choose to stand aside. Standing aside
communicates that while a participant does not necessarily support a group decision,
he does not wish to block it.
Note that critics of consensus blocking have a tendency to object to giving
individuals the possibility to block proposals widely accepted by the group. They
believe that this can result in a group experience of widespread disagreement, the
opposite of a consensus process's primary goal. Further, they believe group decision
making may stagnate because of the high threshold of unanimity. Important decisions may
take too long to make, or the status quo may become virtually impossible to change.
The resulting tension may undermine group functionality and harm relationships
between group members as well as the future execution of the decision (Heitzig and
Simmons, 2012).
Defenders of consensus blocking believe that decision rules short of unanimity
do not ensure a rigorous search for full agreement before finalising decisions. They
value the commitment to reach unanimity and the full collaborative effort this
goal requires. They believe that under the right conditions unanimous consent is
achievable and the process of getting there strengthens group relationships. In our
opinion, these arguments are only justifiable if we do not have any time constraint,
which realistically almost never happens.
The goals of requiring unanimity are only fully realised when a group is
successful in reaching it. Thus, it is important to consider what conditions make full
agreement more likely. Here are some of the most important factors that improve
the chances of successfully reaching unanimity:
• Small group size: The smaller the size of the group, the easier it is to reach a
consensus; however, the universality of the decision might become questionable,
as one may wonder if this small group is representative of the entire entity to
which the decision will be applied.
• Clear common purpose: The objective should be clearly stated to avoid diverging
discussions.
• High levels of trust: This is a prerequisite. If people do not trust each other or the
methodology owner, they will question the proposals and the decisions made. Or
worse, they will undermine the process.
• Participants well trained in consensus processes: training is key in the sense
that we should explain to people what is expected from them before the workshop.
The lack of training inevitably results in participants not handling the
concepts properly, misunderstanding the process and scenarios not being properly
analysed, and as a result a waste of participants' time.
• Participants willing to put the best interest of the group before their own,
therefore, it may take time to reach a consensus. Patience is a virtue. . .
• Participants willing to spend sufficient time in meetings.
• Appropriate facilitation and preparation: particularly in the long term. If the
workshops are led by unskilled people, the seriousness and the professionalism
of the process will be questioned. Note that the time of preparation of a workshop
should not be underestimated. The general rule is the more thorough the ground
work, the smoother the workshops.
• Multiplying decision rules to avoid blockages might also be a good idea,
particularly when the scenario to be analysed is complex.
Most institutions implementing a consensus decision-making process consider
non-unanimous decision rules. The consensus process can help prevent problems
associated with Robert’s Rules of Order or top-down decision making (Robert,
2011). This allows hierarchical organisations to benefit from the collaborative
efforts of the whole group and the resulting joint ownership of final proposals.
A small business owner may convene a consensus decision-making discussion
among her staff to generate proposals of changes to the business. However, after
the proposal is given, the business owner may retain the authority to accept or reject
it, obviously up to a certain extent. Note that if an accountable person rejects a
decision representative of the group, he might put himself in a difficult position
as his authority would be questioned.
The benefits of consensus decision making are lost if the final decision is made
without regard to the efforts of the whole group. When group leaders or majority
factions reject proposals that have been developed with widespread agreement of a
group, the goals of consensus decision making will not be realised.

4.2 In Practice

Applying this methodology within financial institutions, the goal is to obtain
a consensus on key values (usually percentiles or moments) through scenario
workshops, for instance, the biggest exposure in 10, 40 and/or 100 years, for each
risk subject to analysis. A story line representing each horizon ought to be selected
consistently with the entity's risk profile discussed in the previous chapter, and will be
presented to the businesses for evaluation. The business stakeholders will be chosen
with regard to the business area in which the selected risk may materialise, on the
one hand, and the business area supposed to control that risk (these two areas might
be identical), on the other hand.

4.2.1 Pre-workshop

The type of scenario analysis discussed in this chapter requires multiple steps.
The first one is the identification of the scenarios to be analysed. In this first
stage, the previous chapter dealing with data analysis might be useful as it should
provide stakeholders with benchmarks and key metrics to support their selection.
The department responsible for the scenario analysis program in any given entity—
it might be the risk department or more specifically the operational risk department,
or a strategic department—is also usually in charge of the ground work, the material
for the workshops and the facilitation of the workshops themselves.
These departments are supposed to define the question to be answered and to
formulate the issue to be analysed (see Chap. 1 - Scenario Planning). They suggest
the story lines, but they do not own them; ownership lies with the stakeholders
or more specifically with the risk owners (Fig. 4.1). Owners are fully entitled to
amend, modify or change the scenarios to be analysed if they believe that they are
not representative of the target issue to be analysed.
Before scheduling the workshop, a set of scenarios should be written and potentially
pre-submitted, depending on the maturity of the business experts regarding that
process. These scenarios should consider both technical and organisational aspects
during the analysis.
Remark 4.2.1 It is really important to understand that scenario analysis is necessary
to find a solution to a problem; raising the issues is just a step towards solving it.
To organise the workshops, the presence of three kinds of people is necessary: a
planning manager, a facilitator and the experts. The facilitator guarantees that the
workshops are held in a proper fashion, for example, ensuring that all participants
have the same time allowed for expressing their views, or that the discussion is not
diverging. Experts should be knowledgeable, open-minded and good communicators
with an overview of their field. The person responsible for planning has the overall
responsibility of making sure that the process is transparent and has been clearly
communicated to the experts before the workshop. As mentioned before, one of the
key success factors is that the process is properly documented and communicated to
the stakeholders, in particular what is expected from them.
Fig. 4.1 Illustration (for example) of a pre-workshop template

4.2.2 The Workshops

A scenario workshop is a meeting in which business representatives and a risk
facilitator discuss the question to answer, i.e., the risk to analyse. The participants
carry out assessments of the impact, outcomes and aftermath of the risk
materialisation, as well as solutions to these problems such as controls, mitigants
and monitoring indicators (KRI or KPI, for instance) (Fig. 4.2).
In the workshop, the scenarios are used as visions and as a source of inspiration.
The participants are asked to criticise and comment on them to enable the development
of visions of their own—and not necessarily to choose among, or prioritise, the
scenarios.
The risk facilitator has the duty of correcting misunderstandings and factual
errors, but is not allowed to influence the views of the business representatives as
they are the owners of the scenarios, and complete independence needs to
be observed. The risk facilitators should only make sure that the question is properly
addressed by the business stakeholders, to prevent them from going off track.
The process is guided by a facilitator and takes place in “role” groups, “theme”
groups and plenary sessions. Dialogue among participants with different knowledge,
views and experience is central. Various techniques can be used to ensure good
discussions and the production of actionable results.
The scenario facilitators have four principal objectives:
• to ease the conversation, making sure that all participants have an opportunity to
express themselves;
• to comment on, and criticise the scenarios to make sure that the scenarios are
representative of the risk profile of the target perimeter;
Fig. 4.2 Illustration (for example) of a workshop supporting template

• to develop and deepen participants' proposals;
• to develop plans of action such as controls, mitigants and insurances.
During the workshop, if a secretary has not been designated, the participants must
nominate someone. The minutes of the workshop are quite important, though for
obvious reasons they might be complicated to take (especially if the debate has a
tendency to diverge); they need to be as complete as possible, highlighting
the key points and the intermediate conclusions. They will be used for future
reference, to improve the process and to show the Audit department how the
scenario analysis has been performed (audit trail) (Wang and Suter, 2007).

4.3 For the Manager

Scenarios are supposed to support strategic decision-making processes (Postma and
Liebl, 2005), i.e., the establishment of long-term risk frameworks; therefore misleading
conclusions arising from this process may be dramatic. In this section, the keys to a
reliable process are discussed, for instance, sponsorship, buy-in and sign-offs
from the owner before the validation.
4.3.1 Sponsorship

In this section, we discuss the question of sponsorship of the scenario program. The
most important tasks an executive sponsor has to achieve are the following (Fed,
2016; Prosci, 2009):
• Take the lead in establishing a budget and assigning the right resources for the
project including, (1) set priorities and balance between project work and day-
to-day work, (2) ensure that the appropriate budget is allocated, (3) appoint an
experienced change manager to support the process.
• Be active with the project team throughout the project: (1) support the definition
of the program and the scope, (2) attend key meetings, (3) set deadlines and
expectations, (4) control deliverables, (5) make himself available to the team
members and (6) set expectations and hold the team accountable, (7) transform a
vision into objectives.
• Engage and create support with other senior managers: (1) represent the project
in front of its peers, (2) ensure that key stakeholders are properly trained, (3) sell
the process to other business leaders and ensure good communication, (4) hold
mid-level managers accountable, (5) form, lead and drive a steering committee
of key stakeholders and (6) ensure that resistance from other senior managers is
dealt with prior to the initialisation of the process.
• Be an active and visible spokesperson for the change: (1) help the team under-
stand the political landscape and hot spots, (2) use authority when necessary.
Participants cited the following areas as the most common mistakes made by
executive sponsors that they would advise other senior managers to avoid. Note
that each one of them may lead to a failing scenario analysis:
• Not visibly supporting the change throughout the entire process. The sponsor
should ensure that he does not become disconnected from the project.
• Abdicating responsibility or delegating too much.
• Not communicating properly to explain why the task undertaken is necessary.
• Failing to build a coalition of business leaders and stakeholders to support the
project.
• Moving on to the next change before the current change is in place or changing
priorities too soon after the project has started.
• Underestimating the resistance of managers and not addressing it properly.
• Failing to set expectations with mid-level managers and front-line supervisors
related to the change and change process.
• Spending too little time on the project to keep it on track and with the project
team to help them overcome obstacles.
4.3.2 Buy-In

Employee buy-in is when employees are committed to the mission and/or goals set
by their company, and/or also find the day-to-day work personally meaningful. Buy-in
promotes engagement and a willingness to go the extra mile on the job (Davis,
2016).
Most of the time, when a request is made from a perfect stranger, even those
who comply will give the person asking a really odd look. The main reason why
so few comply, and those who do still show reluctance, is that no one knows why
they are supposed to do something on demand, especially if doing so seems rather
pointless. They are not committed to following the instruction, and have thus not
“bought into” the goal of the request. Now, if you were asked to do something that
you know is important, or that you feel committed to doing, you would very likely
comply because you buy into the aims and goals underlying the request. In fact,
you would comply willingly, and perhaps even eagerly, because of how much the
request resonates with you.
Obtaining stakeholders' buy-in provides more assurance that the process will lead
to decisions of better quality, as they will be committed to the success of the
process.

4.3.3 Validation

The validation aspect is also very important as the idea is to tackle the issues
mentioned earlier, such as the fact that the consensus could potentially lead to
sub-optimal outcomes and therefore have limited reliability. Indeed, this
would jeopardise the future use of scenarios but, even more dramatically, may limit the
applicability or the usefulness of the process in terms of risk management.
One way to validate would be to use a challenger-champion approach (Hassani,
2015; BoE, 2013) and therefore to implement, for example, one or more of the
strategies suggested in the next chapters. The second is to use available internal and
external data as benchmarks.

4.3.4 Sign-Offs

All projects need at some stage or other a formal sign-off. This step of the process
is the final stamp given by people ultimately accountable. This is the guarantee that
the consensus is now accepted by top executives (Rosenhead, 2012).
It is rather important to note that following the workshops and therefore the selection
of the rules, a pre-sign-off should be provided, i.e., mid to top managers in the
scale of accountability should sign off the results as they are, before any challenge
process or any piece of validation, as this demonstrates their ownership of the
scenarios and that the accountability for the materialisation of these scenarios lies
with them. Furthermore, speaking from experience and from a more pragmatic point
of view, if someone does not pre-sign-off the initial outcomes and these are
challenged following the validation process (for instance, if this one requires that the
scenario be reviewed), the managers will be reluctant to sign them off afterwards,
and the entire process will be jeopardised.

4.4 Alternatives and Comparison

In this section, we aim at discussing the limitations of the strategies as well as
potential alternative solutions. If an entity has adopted a book of rules (or policies)
for conducting its meetings, it is still free to adopt its own rules, which supersede
any rules in the adopted policy with which they conflict. The only limitations might
come from the rules of a parent organisation or from the law. Otherwise, the policies
are binding on the organisation.
Consensus decision making is an alternative to commonly practiced non-collaborative
decision-making processes. Robert's Rules of Order (Robert, 2011),
for instance, is a process used in many institutions. The objective of Robert's Rules is to
structure the debate. Proposals are then submitted to a vote for selection and/or
approval. This process does not aim at reaching a full agreement, nor does it enable
or imply some collaboration among a group of people or the inclusion of concerns
from the minority in the resulting proposals. The process involves adversarial
debate and consequently the emergence of confronting parties. This may impact the
relationships between groups of people and undermine the capability of a group to
carry out the controversial decision.
Besides, as implied before, consensus decision making is an alternative to the
hierarchical approach in which the group implements what the top management
deems appropriate. This decision-making process does not include the participation
of all interested stakeholders. The leaders may gather inputs, but the group of
intended stakeholders does not participate in the key decisions. There is no coalition
formation, and the agreement of a majority is not an objective. However, it is important
to nuance that this does not necessarily mean that the decision is bad.
The process may induce rebellion or complaisance from the group members
towards the top managers and therefore may lead to a split of the larger group
into two factions. The success of the decision to be implemented also relies
on the strength, the authority or the power of the senior management. Indeed, senior
managers being challenged by subordinates may lead to the poor implementation
of key decisions, especially if the subordinates do not openly challenge the senior
manager. Besides, the resulting decisions may overlook important concerns of those
directly affected, resulting in poor group relationship dynamics and implementation
problems.
Consensus decision making addresses the problems observed in the previous two
alternatives. To summarise, the consensus approach should lead to better decisions
as the inputs of various stakeholders are considered; consequently, the issued proposals
are more likely to tackle most concerns and issues raised during the workshops and
therefore be more reliable for the group. In this collaborative process, the wider the
agreement, the better the implementation of the resulting decision. As a corollary, the
quality of relationships, the cohesion and the collaboration among or between factions,
groups of people or departments would be largely enhanced.
To conclude this chapter, more elaborate models of consensus decision making
exist, as this field is in perpetual evolution, such as the consensus-oriented decision-making
model (Hartnett, 2011); however, as they are not the focal point of this
book, we refer the reader to the appropriate bibliography.

References

Andersen, I.-E., & Jaeger, B. (1999). Scenario workshops and consensus conferences: Towards
more democratic decision-making. Science and Public Policy, 26(5), 331–340.
Avery, M. (1981). Building united judgment: A handbook for consensus decision making. North
Charleston: CreateSpace Independent Publishing Platform.
BoE. (2013). A framework for stress testing the UK banking system. London: Bank of England.
Davis, O. (2016). Employee buy-in: Definition & explanation. study.com/academy.
Fed. (2016). Comprehensive capital analysis and review 2016 summary instructions. Washington,
DC: Federal Reserve Board.
Fehr, E., & Fischbacher, U. (2003). The nature of human altruism. Nature, 425(6960), 785–791.
Hartnett, T. (2011). Consensus oriented decision-making. Gabriola Island: New Society Publishers.
Hassani, B. (2015). Model risk - From epistemology to management. Working paper, Université
Paris 1.
Heitzig, J., & Simmons, F. W. (2012). Some chance for consensus: Voting methods for which
consensus is an equilibrium. Social Choice and Welfare, 38(1), 43–57.
Postma, T., & Liebl, F. (2005). How to improve scenario analysis as a strategic management tool?
Technological Forecasting and Social Change, 72, 161–173.
Prosci (2009). Welcome to the change management tutorial series.
www.change-management.com/tutorial-change-sponsorship.htm.
Robert, H. M. (2011). Robert’s rules of order newly revised (11th ed.). Philadelphia: Da Capo
Press.
Rosenhead, R. (2012). Project sign off - do people really know what this means? www.
ronrosenhead.co.uk.
Wang, H., & Suter, D. (2007). A consensus-based method for tracking: Modelling background
scenario and foreground appearance. Pattern Recognition, 40(3), 1091–1105.
Welch Cline, R. J. (1990). Detecting groupthink: Methods for observing the illusion of unanimity.
Communication Quarterly, 38(2), 112–126.
Chapter 5
Tilting Strategy: Using Probability Distribution
Properties

As implied in the previous chapter, scenario analysis cannot be disconnected from
the concept of statistical distributions. Indeed, by using the term scenarios, we are
specifically dealing with situations that never materialised in a target institution, either
at all or at least in magnitude; therefore the exposure analysed cannot be dissociated
from a likelihood. A scenario is nothing more than the realisation of a random
variable, and as such follows the distribution representative of the underlying loss
generating process. A probability distribution (or probability mass function for
discrete random variables) assigns a probability to each measurable subset of the
possible outcomes of a story line.
Considering the data analysis solutions provided in the previous chapters, it
is possible to fit some distributions and to use these distributions to model the
scenarios. Indeed, the scenarios can be represented by tilting the parameters
obtained by fitting the distributions. These parameters are usually representative of
some characteristics of the underlying data, for instance, the mean, the median,
the variance, the skewness, the kurtosis, the location, the shape, the scale, etc.
Therefore, the scenarios can be applied to these parameters, for example, the median
traditionally representative of a typical loss might be increased by 20 % and we
could re-evaluate consistently other risk measures to understand such an impact on
the global exposure.
By tilting, we change the slope of the distribution, raising one end or inclining the
other. The parameters of the distributions are impacted positively or negatively to
represent the scenario to be analysed. Then the impact on the risk measure is assessed.
Therefore in this chapter, we will analyse the theoretical foundations of such an approach,
i.e., the distributions, the estimation procedure and the risk measures. Besides, we
will provide some illustrations related to real cases. A last section will provide the
managers with the pros and cons of using this approach as well as the methodological
issues.
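The following minimal sketch illustrates the tilting idea on simulated data: a lognormal distribution is fitted, its scale parameter (its median) is increased by 20 % and a high quantile is re-evaluated. The distribution choice, the simulated data and the 99.9 % level are assumptions made for the illustration and do not anticipate the methodology detailed later in the chapter.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
losses = rng.lognormal(mean=10.0, sigma=1.8, size=5000)   # simulated loss data

# Fit a lognormal distribution (location fixed at zero).
shape, loc, scale = stats.lognorm.fit(losses, floc=0)

# Baseline: for a lognormal with loc = 0, the median equals the scale parameter.
base_q = stats.lognorm.ppf(0.999, shape, loc=0, scale=scale)

# Tilt: increase the median (scale) by 20 % and re-evaluate the quantile.
tilted_q = stats.lognorm.ppf(0.999, shape, loc=0, scale=1.2 * scale)

print(f"99.9% quantile before tilt: {base_q:,.0f}")
print(f"99.9% quantile after +20% median tilt: {tilted_q:,.0f}")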

5.1 Theoretical Basis

In this section we introduce the concepts required to implement a tilting strategy,
namely the distributions and the risk measures, as well as the estimation approaches
required to parametrise these distributions.

5.1.1 Distributions

This section proposes several alternatives for the fitting of a proper distribution
to the information set related to a risk (losses, incidents, etc.). Understanding the
distributions characterising each risk is necessary to understand the associated
measures. The elliptical domain (Gaussian or Student distribution) should not be
left aside, but as its properties are well known, we will focus on distributions which
are asymmetric and leptokurtic, such as the generalised hyperbolic distributions
(GHD), the generalised Pareto distributions or the extreme value distributions
among others (note that the elliptic domain is part of the GH family). But before
discussing parametric distributions, we will introduce non-parametric approaches as
these allow representing the data as they are and may support the selection of a
parametric distribution if necessary.
Non-parametric statistics are a very useful and practical alternative to represent
the data (Müller et al., 2004), either using a histogram or a kernel density. A his-
togram (Silverman, 1986) gives a good representation of the empirical distribution,
but the kernel density has the major advantage of enabling the transformation of a
discrete empirical distribution into a continuous one (Wand and Jones, 1995).
To introduce this method, we give the density estimator formula. Let $X_1, \ldots, X_n$
be an empirical sample. Its unknown density function is denoted $f$, and we
assume that $f$ has continuous derivatives of all orders required, denoted $f', f'', \ldots$.
Then the estimated density of $f$ is
$$\hat{f}(x; h) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \qquad (5.1.1)$$
where $K$ is the kernel function satisfying $\int_{-\infty}^{+\infty} K(t)\,dt = 1$, $\int_{-\infty}^{+\infty} t K(t)\,dt = 0$
and $\int_{-\infty}^{+\infty} t^2 K(t)\,dt = k_2 \neq 0$; $k_2$ is a constant denoting the variance of the kernel
distribution and $h$ is the bandwidth.
The choice of the kernel nature has no particular importance; however, the
resulting density is very sensitive to the bandwidth selection. The global error of
the density estimator $\hat{f}(x; h)$ may be measured by the mean square error (MSE):
$$\mathrm{MSE}(\hat{f}(x; h)) = E\big[\hat{f}(x; h) - f(x)\big]^2. \qquad (5.1.2)$$

This one can be decomposed as
$$\mathrm{MSE}(\hat{f}(x; h)) = \mathrm{Var}(\hat{f}(x; h)) + \big(E[\hat{f}(x; h)] - f(x)\big)^2, \qquad (5.1.3)$$
where
$$\mathrm{bias}_h(x) = E[\hat{f}(x; h)] - f(x) \qquad (5.1.4)$$
$$= \int_{-\infty}^{+\infty} K(t)\big(f(x - ht) - f(x)\big)\,dt \qquad (5.1.5)$$
$$= \frac{1}{2} h^2 f''(x) k_2 + \text{higher-order terms in } h, \qquad (5.1.6)$$
is the bias, and the integrated square bias is approximately
$$\int_{-\infty}^{+\infty} \mathrm{bias}_h(x)^2\,dx \approx \frac{1}{4} h^4 k_2^2 \int_{-\infty}^{+\infty} f''(x)^2\,dx. \qquad (5.1.7)$$
In addition,
$$\mathrm{Var}\,\hat{f}(x; h) = \frac{1}{nh} f(x) \int_{-\infty}^{+\infty} K(t)^2\,dt + O\!\left(\frac{1}{n}\right) \qquad (5.1.8)$$
$$\approx \frac{1}{nh} f(x) \int_{-\infty}^{+\infty} K(t)^2\,dt \qquad (5.1.9)$$
is the variance of the estimator, and the integrated variance is approximately
$$\int_{-\infty}^{+\infty} \mathrm{Var}\,\hat{f}(x; h)\,dx \approx \frac{1}{nh} \int_{-\infty}^{+\infty} K(t)^2\,dt. \qquad (5.1.10)$$

Indeed, estimating the bandwidth, we face a trade-off between the bias and the
variance, but this decomposition allows easier analysis and interpretation of the
performance of the kernel density estimator.
The most widely used way of placing a measure on the global accuracy of $\hat{f}(x; h)$
is the mean integrated squared error (MISE):
$$\mathrm{MISE}(\hat{f}(x; h)) = \int_{-\infty}^{+\infty} E\big[\hat{f}(x; h) - f(x)\big]^2\,dx \qquad (5.1.11)$$
$$= \int_{-\infty}^{+\infty} \mathrm{MSE}(\hat{f}(x; h))\,dx \qquad (5.1.12)$$
$$= \int_{-\infty}^{+\infty} \mathrm{bias}_h(x)^2\,dx + \int_{-\infty}^{+\infty} \mathrm{Var}\,\hat{f}(x; h)\,dx. \qquad (5.1.13)$$
But, as the previous expressions depend on the bandwidth, it is difficult to
interpret the influence of this one on the performance of the kernel; therefore, we

derive an approximation of the MISE which is the asymptotic MISE or AMISE,


Z C1 Z C1
1 1
AMISE. fO.xI h// D K.t/2 dt C h4 k22 f 00 .x/2 : (5.1.14)
nh 1 4 1

R C1 R C1
Let #.K.t// D 1 t2 K.t/dt and . f .x// D 1 f .x/2 dx, for any square
integrable function f , then the relation (5.1.14) becomes

1 1
AMISE. fO.xI h// D .K.t// C h4 k22 . f 00 .x//: (5.1.15)
nh 4
The minimisation of the AMISE with respect to the parameter h permits the
selection of the appropriate bandwidth. As optimal bandwidth selection is not at
the core of this book, we only refer the reader to the bibliography included
in this section. Now that the non-parametric distributions have been properly
introduced, we can present other families of distributions that will be of interest
for the methodology presented in this chapter.
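In practice the estimator (5.1.1) and the bandwidth trade-off discussed above can be exercised with standard tools; the sketch below, assuming simulated log-losses, compares a rule-of-thumb bandwidth with a deliberately narrower one to expose the bias–variance trade-off. The bandwidth values are illustrative and are not the AMISE-optimal choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
log_losses = np.log(rng.lognormal(mean=9.0, sigma=1.5, size=2000))

# Gaussian kernel density with Scott's rule-of-thumb bandwidth ...
kde_scott = stats.gaussian_kde(log_losses, bw_method="scott")
# ... and with a deliberately smaller bandwidth (less bias, more variance).
kde_narrow = stats.gaussian_kde(log_losses, bw_method=0.1)

grid = np.linspace(log_losses.min(), log_losses.max(), 200)
smooth, rough = kde_scott(grid), kde_narrow(grid)
print(smooth[:3], rough[:3])   # two density estimates over the same grid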
The GHD is a continuous probability distribution defined as a mixture of
an inverse Gaussian distribution and a normal distribution. The density function
associated with the GHD is
$$f(x; \lambda, \alpha, \beta, \delta, \mu) = \frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\,K_{\lambda}(\delta\gamma)}\; e^{\beta(x-\mu)}\; \frac{K_{\lambda-1/2}\big(\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}\big)}{\big(\sqrt{\delta^{2}+(x-\mu)^{2}}/\alpha\big)^{1/2-\lambda}}, \qquad \gamma = \sqrt{\alpha^{2}-\beta^{2}}, \qquad (5.1.16)$$
with $0 \le |\beta| < \alpha$. This class of distributions is very interesting as it relies on five
parameters. If the shape parameter $\lambda$ is fixed, then several well-known distributions
can be distinguished ($\xi$ and $\chi$ denoting the usual shape-triangle parameters):
1. $\lambda = 1$: Hyperbolic distribution
2. $\lambda = -1/2$: NIG distribution
3. $\lambda = 1$ and $\xi \to 0$: Normal distribution
4. $\lambda = 1$ and $\xi \to 1$: Symmetric and asymmetric Laplace distribution
5. $\lambda = 1$ and $\chi \to \pm\xi$: Inverse Gaussian distribution
6. $\lambda = 1$ and $|\chi| \to 1$: Exponential distribution
7. $-\infty < \lambda < -2$: Asymmetric Student
8. $-\infty < \lambda < -2$ and $\beta = 0$: Symmetric Student
9. $\delta = 0$ and $0 < \lambda < \infty$: Asymmetric Normal Gamma distribution
The four other parameters can then be associated with the first four moments,
permitting a very good fit of the distributions to the corresponding losses as it
captures all the intrinsic features of these ones.
The next interesting class of distributions permits modelling extremes, relying on a
data set defined above a particular threshold. Let $X$ be a r.v. with distribution function
$F$ and right end point $x_F$, and a fixed $u < x_F$. Then,
$$F_u(x) = P[X - u \le x \mid X > u], \qquad x \ge 0,$$
is the excess distribution function of the r.v. $X$ (with the df $F$) over the threshold $u$,
and the function
$$e(u) = E[X - u \mid X > u]$$
is called the mean excess function of $X$, which can play a fundamental role in risk
management. The limit of the excess distribution has the distribution $G_{\xi}$ defined by:
$$G_{\xi}(x) = \begin{cases} 1 - (1 + \xi x)^{-1/\xi} & \xi \neq 0, \\ 1 - e^{-x} & \xi = 0, \end{cases}$$
where
$$\begin{cases} x \ge 0 & \xi \ge 0, \\ 0 \le x \le -1/\xi & \xi < 0. \end{cases}$$
The function $G_{\xi}(x)$ is the standard generalised Pareto distribution (Pickands, 1975;
Danielsson et al., 2001; Luceno, 2007). One can introduce the related location-scale
family $G_{\xi,\mu,\beta}(x)$ by replacing the argument $x$ by $(x - \mu)/\beta$ for $\mu \in \mathbb{R}$, $\beta > 0$. The
support has to be adjusted accordingly. We refer to $G_{\xi,\mu,\beta}(x)$ as GPD.
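To make the peaks-over-threshold construction concrete, the sketch below fits a GPD to the exceedances of simulated losses above a threshold u; the threshold (a high empirical quantile) is an assumption that would, in practice, be supported by mean excess plots and stability checks.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
losses = rng.lognormal(mean=8.0, sigma=2.0, size=10000)   # simulated losses

u = np.quantile(losses, 0.95)          # threshold (assumed, to be challenged)
excesses = losses[losses > u] - u      # exceedances over the threshold

# Fit the GPD to the excess distribution, location fixed at zero.
xi, loc, beta = stats.genpareto.fit(excesses, floc=0)
print(f"shape (xi) = {xi:.3f}, scale (beta) = {beta:,.1f}")

# Empirical mean excess function at u, e(u) = E[X - u | X > u].
print(f"mean excess e(u) = {excesses.mean():,.1f}")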
The next class of distributions is the class of α-stable distributions (McCulloch, 1996), defined through their characteristic function and also relying on several parameters. For 0 < α ≤ 2, σ > 0, β ∈ [−1, 1] and μ ∈ ℝ, S_α(σ, β, μ) denotes the stable distribution with characteristic exponent (index of stability) α, scale parameter σ, symmetry index (skewness parameter) β and location parameter μ. S_α(σ, β, μ) is the distribution of a r.v. X with the characteristic function
E[e^{ixX}] = \begin{cases} \exp\left(i\mu x - \sigma^{\alpha}|x|^{\alpha}\left(1 - i\beta\,\mathrm{sign}(x)\tan(\pi\alpha/2)\right)\right) & \alpha \neq 1, \\ \exp\left(i\mu x - \sigma|x|\left(1 + (2/\pi)\, i\beta\,\mathrm{sign}(x)\ln|x|\right)\right) & \alpha = 1, \end{cases}   (5.1.17)

where x ∈ ℝ, i² = −1 and sign(x) is the sign of x, defined by sign(x) = 1 if x > 0, sign(0) = 0 and sign(x) = −1 otherwise. A closed form expression for the density f(x) of the distribution S_α(σ, β, μ) is available in the following cases: α = 2 (Gaussian distribution), α = 1 and β = 0 (Cauchy distribution), and α = 1/2 and β = ±1 (Lévy distributions). The index of stability α characterises the tail heaviness of the stable distribution S_α(σ, β, μ).
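The hedged sketch below illustrates the α-stable family using SciPy's levy_stable (parameterised by α, β, a location and a scale). The parameter values are arbitrary, and the quantile computations can be slow since no closed-form density is available in general.

import numpy as np
from scipy import stats

alpha, beta, mu, sigma = 1.7, 0.5, 0.0, 1.0       # purely illustrative parameters
dist = stats.levy_stable(alpha, beta, loc=mu, scale=sigma)

sample = dist.rvs(size=10_000, random_state=2)    # simulated draws from the stable law

# The smaller alpha, the heavier the tails: compare an extreme quantile with the
# alpha = 2 (Gaussian) member of the family.
q_stable = dist.ppf(0.999)
q_gaussian = stats.levy_stable(2.0, 0.0, loc=mu, scale=sigma).ppf(0.999)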
Finally, we introduce the g-and-h random variable X_{g,h}, obtained by transforming a standard normal random variable with the transformation function T_{g,h}:

T_{g,h}(y) = \begin{cases} \frac{\exp(gy) - 1}{g}\exp\!\left(\frac{hy^2}{2}\right) & g \neq 0, \\ y\exp\!\left(\frac{hy^2}{2}\right) & g = 0. \end{cases}   (5.1.18)

Thus

X_{g,h} = T_{g,h}(Y),  where Y ∼ N(0, 1).
This transformation allows for asymmetry and heavy tails. The parameter g determines the direction and the amount of asymmetry: a positive value of g corresponds to a positive skewness. The special symmetric case obtained for g = 0 is known as the h distribution. For h > 0 the distribution is leptokurtic, with the mass in the tails increasing with h.
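Because the g-and-h variable is defined through a simple transformation of a standard normal draw, it is straightforward to simulate; the minimal sketch below implements T_{g,h} directly (the parameter values are illustrative).

import numpy as np

def t_gh(y, g, h):
    """Transform standard normal draws y into g-and-h draws via T_{g,h}."""
    if g == 0:
        return y * np.exp(h * y ** 2 / 2)
    return np.expm1(g * y) / g * np.exp(h * y ** 2 / 2)

rng = np.random.default_rng(3)
y = rng.standard_normal(100_000)
x = t_gh(y, g=0.5, h=0.2)   # positive g: right skew; h > 0: heavier tails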
Now, with respect to the risks, we need to assess whether the estimates and the fitting of the univariate distributions are adapted to the data sets. The models will differ depending on the kind of risks we would like to investigate.
It is important to bear in mind that the distributions presented in this chapter are not exhaustive, and other kinds of distributions might be more appropriate in specific situations. We focused on these distributions as their characteristics make them appropriate to capture risk data properties, in particular the asymmetry and the thickness of the tails. Besides, in the next chapter, we present another scenario strategy relying on generalised extreme value distributions.
5.1.2 Risk Measures

Scenario analysis for risk management cannot be separated from the concept of risk measure, as there is no risk management without measurement; in other words, to evaluate the quality of the risk management, it needs to be benchmarked. Initially, risks in financial institutions were evaluated using the standard deviation. Nowadays, the industry has moved towards quantile-based downside risk measures such as the Value-at-Risk (VaR_α for confidence level α) or the Expected Shortfall (ES). The VaR_α measures the losses that may be expected for a given probability, and corresponds to the quantile of the distribution which characterises the asset or the type of events for which the risk has to be measured, while the ES represents the average loss above the VaR. Consequently, the fit of an adequate distribution to the risk factor is definitely an important task to obtain a reliable risk measure.
The definitions of these two risk measures are recalled below:

Definition 5.1.1 Given a confidence level α ∈ (0, 1), the VaR_α is the relevant quantile² of the loss distribution, VaR_α(X) = inf{x | P[X > x] ≤ 1 − α} = inf{x | F_X(x) ≥ α}, where X is a risk factor admitting a loss distribution F_X.

² VaR_α(X) = q_{1−α} = F_X^{−1}(α).

Definition 5.1.2 The Expected Shortfall (ES_α) is defined as the average of all losses which are equal to or greater than VaR_α:

ES_\alpha(X) = \frac{1}{1-\alpha}\int_{\alpha}^{1} \mathrm{VaR}_p(X)\, dp.
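The two definitions translate directly into empirical estimators; the sketch below computes them on a simulated loss sample (the distributional assumption and the confidence level are illustrative only).

import numpy as np

rng = np.random.default_rng(4)
losses = rng.lognormal(mean=10.0, sigma=1.5, size=50_000)   # hypothetical loss sample

alpha = 0.99
var_alpha = np.quantile(losses, alpha)               # VaR_alpha: the alpha-quantile of the losses
es_alpha = losses[losses >= var_alpha].mean()        # ES_alpha: average loss beyond the VaR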

The Value-at-Risk, initially used to measure financial institutions' market risk, was popularised by Morgan (1996). This measure indicates the maximum probable loss given a confidence level and a time horizon.³ The expected shortfall has a number of advantages over the VaR_α because it takes into account the tail risk and fulfils the sub-additivity property. It has been widely dealt with in the literature, for instance, in Artzner et al. (1999), Rockafellar and Uryasev (2000, 2002) and Delbaen (2000).
Nevertheless, even if regulators require banks to use the VaR_α, and recently the ES_α, to measure their risks and ultimately provide the capital requirements to avoid bankruptcy, these risk measures are not entirely satisfactory:
• They provide a risk measure for an α which is too restrictive considering the risk associated with the various financial products.
• The fit of the distribution functions can be complex or inadequate, in particular for practitioners who want to follow regulatory guidelines (Basel II/III guidelines). Indeed, in the operational risk case, the suggestion is to fit a GPD, which often does not provide a good fit and whose implementation turns out to be difficult.
• It may be quite challenging to capture extreme events, whereas taking these events into account when modelling the tails of the distributions is determinant.
• Finally, all the risks are computed considering unimodal distributions, which may be unrealistic in practice.
Recently, several extensions have been analysed to overcome these limitations and to propose new routes for the risk measures. These new techniques are briefly recalled and we refer to Guégan and Hassani (2015) for more details, developments and applications:
• Following our proposal, we suggest that practitioners use several α to obtain a spectrum of their expected shortfall and to visualise the evolution of the ES with respect to these different values. Then, a unique measure can be provided by making a convex combination of these different ES with appropriate weights. This measure is called a spectral measure (Acerbi and Tasche, 2002); a short numerical sketch is provided after this list.
• In the univariate approach, if we want to take into account the information contained in the tails, we cannot restrict ourselves to the GPD as suggested in the guidelines provided by the regulators. As mentioned before, there exist other classes of distributions which are very interesting, for instance, the generalised hyperbolic distribution (Barndorff-Nielsen and Halgreen, 1977), the extreme value distributions including the Gumbel, the Fréchet and the Weibull distributions (Leadbetter, 1983), the α-stable distributions (Taqqu and Samorodnisky, 1994) or the g-and-h distributions (Huggenberger and Klett, 2009), among others.

³ The VaR_α is sometimes referred to as the “unexpected” loss.
• Nevertheless, the previous distributions are not always sufficient to properly fit the information in the tails, and another approach could be to build new distributions by shifting the original distribution to the right or to the left in order to take different information into account in the tails. Wang (2000) proposes such a transformation of the initial distribution, which provides a new symmetrical distribution. Sereda et al. (2010) extend this approach to distinguish the right and left parts of the distribution, taking into account more extreme events. The function applied to the initial distribution for shifting is called a distortion function. This idea is interesting as the information in the tails is captured in a different way than with the previous classes of distributions.
• Nevertheless, when the distribution is shifted with a function close to the Gaussian one, as in Wang (2000) and Sereda et al. (2010), the shifted distribution remains unimodal. Thus we propose to distort the initial distribution with polynomials of odd degree in order to create several humps in the distribution. This permits catching all the information in the extremes of the distribution, and introducing a new coherent risk measure ρ(X) computed under the g ∘ f(x) distribution, where g is the distortion operator and f(x) the initial distribution (F_X represents the cumulative distribution function); thus we get

ρ(X) = E_g[F_X^{−1}(x) | F_X^{−1}(x) > F_X^{−1}(δ)].   (5.1.19)

All these previous risk measures can be included within a scenario analysis
process or a stress-testing strategy.
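As an illustration of the first extension listed above, the hedged sketch below builds a simple spectral-type measure as a convex combination of expected shortfalls computed at several confidence levels. The levels, the weights and the simulated data are illustrative choices, not a prescription.

import numpy as np

def expected_shortfall(losses, alpha):
    """Average of the losses equal to or greater than the alpha-quantile."""
    var_alpha = np.quantile(losses, alpha)
    return losses[losses >= var_alpha].mean()

rng = np.random.default_rng(5)
losses = rng.lognormal(mean=10.0, sigma=1.5, size=50_000)   # hypothetical loss sample

alphas = [0.95, 0.99, 0.999]
weights = np.array([0.2, 0.3, 0.5])                  # convex combination: weights sum to one

spectral_measure = sum(w * expected_shortfall(losses, a) for w, a in zip(weights, alphas))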

5.1.3 Fitting

In order to use the distributions presented above and the associated risk measures discussed in the previous section, their parameters have to be estimated, i.e., the parameters allowing an appropriate representation of the phenomenon to be modelled. In the next paragraphs, several methodologies to estimate the parameters of the selected distributions are presented; which one to implement depends on the situation (i.e. the data, the properties of the distributions, etc.). The first methodology to be presented is the maximum likelihood estimation (MLE) (Aldrich, 1997). This one can be formalised as follows:
Let x_1, x_2, ..., x_n be n independent and identically distributed (i.i.d.) observations, of probability density function f(·|θ), where θ is a vector of parameters. In order to use the maximum likelihood approach, the joint density function for all observations is specified. For an i.i.d. sample, this one is

f(x_1, x_2, ..., x_n | θ) = f(x_1|θ) · f(x_2|θ) ⋯ f(x_n|θ).   (5.1.20)

Then the likelihood function is obtained using x_1, x_2, ..., x_n as parameters of this function, whereas θ becomes the variable:

L(θ; x_1, ..., x_n) = f(x_1, x_2, ..., x_n | θ) = \prod_{i=1}^{n} f(x_i | θ).   (5.1.21)

In practice, a monotonic and strictly increasing transformation using a logarithm function makes it easier to use and does not change the outcome of the methodology:

\ln L(θ; x_1, ..., x_n) = \sum_{i=1}^{n} \ln f(x_i | θ),   (5.1.22)

or the average log-likelihood,

\hat{\ell} = \frac{1}{n}\ln L,   (5.1.23)

which estimates the expected log-likelihood of a single observation in the model. \hat{θ} is or are the value(s) that maximise \hat{\ell}(θ; x). If a maximum does exist, then the estimator is

\{\hat{θ}_{mle}\} \subseteq \{\arg\max_{θ \in Θ} \hat{\ell}(θ; x_1, ..., x_n)\}.   (5.1.24)

For some distributions the maximum likelihood estimator can be written as a closed
form formula, while for some others a numerical method has to be implemented.
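As an illustration of the numerical route, the sketch below maximises the log-likelihood of Eq. (5.1.22) for a lognormal model using a general-purpose optimiser; the model choice, starting values and data are illustrative only.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

rng = np.random.default_rng(6)
x = rng.lognormal(mean=9.0, sigma=1.1, size=2000)    # hypothetical data

def neg_log_likelihood(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf                                # keep the optimiser in the valid region
    return -np.sum(lognorm.logpdf(x, s=sigma, scale=np.exp(mu)))

start = [np.log(x).mean(), np.log(x).std()]          # moment-based starting values
res = minimize(neg_log_likelihood, x0=start, method="Nelder-Mead")
mu_hat, sigma_hat = res.x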
Bayesian estimation may be used to fit the distribution, though it will only be briefly introduced here, as the maximum likelihood estimator coincides with the most probable Bayesian estimator (Berger, 1985) given a uniform prior distribution on the parameters. Note that the Bayesian philosophy differs from the more traditional frequentist approach. Indeed, the maximum a posteriori estimate of θ is obtained maximising the probability of θ given the data:

P(θ | x_1, x_2, ..., x_n) = \frac{f(x_1, x_2, ..., x_n | θ)\, P(θ)}{P(x_1, x_2, ..., x_n)}   (5.1.25)

where P(θ) is the prior distribution of the parameter θ and where P(x_1, x_2, ..., x_n) is the probability of the data averaged over all parameters. Since the denominator is independent of θ, the Bayesian estimator is obtained maximising f(x_1, x_2, ..., x_n | θ)P(θ) with respect to θ. If the prior P(θ) is a uniform distribution, the Bayesian estimator is obtained maximising the likelihood function f(x_1, x_2, ..., x_n | θ) as presented above. We only wanted to introduce that aspect of the maximum likelihood estimator to show how everything is related. Indeed, the Bayesian framework will be discussed in a subsequent chapter. Note that Bayesian estimation might be quite powerful in situations where the number of data points is very small.
Multiple variations of the maximum likelihood approach already exist, such as the quasi maximum likelihood (Lindsay, 1988), the restricted maximum likelihood (Patterson and Thompson, 1971) or the penalised maximum likelihood (Anderson and Blair, 1982), and these may be more appropriate in some particular situations.
Another popular alternative approach to estimate parameters is the generalised method of moments (Hansen, 1982). This one can be formalised as follows:
Consider a data set {z_t, t = 1, ..., T} representing realisations of a random variable. This random variable follows a distribution which is driven by an unknown parameter (or set of parameters) θ ∈ Θ. In order to be able to apply the GMM, moment conditions g(z_t, θ) are required such that

m(θ_0) ≡ E[g(z_t, θ_0)] = 0,   (5.1.26)

where E denotes the expectation. Moreover, the function m(θ) must differ from zero for θ ≠ θ_0. The basic idea behind the GMM is to replace the theoretical expected value E[·] with its empirical sample average:

\hat{m}(θ) ≡ \frac{1}{T}\sum_{t=1}^{T} g(z_t, θ)   (5.1.27)

and then to minimise the norm of this expression with respect to θ. The θ value (or set of values) minimising the norm of the expression above is our estimate of θ_0. By the law of large numbers, \hat{m}(θ) ≈ E[g(z, θ)] = m(θ) for large data samples, and thus we expect that \hat{m}(θ_0) ≈ m(θ_0) = 0. The GMM looks for a number \hat{θ} which would make \hat{m}(\hat{θ}) as close to zero as possible.⁴ The properties of the resulting estimator will depend on the particular choice of the norm function, and therefore the theory of the GMM considers an entire family of norms, defined as

\|\hat{m}(θ)\|_W^2 = \hat{m}(θ)'\, W\, \hat{m}(θ),   (5.1.28)

where W is a positive-definite weighting matrix and \hat{m}' denotes the transposition of \hat{m}. In practice, the weighting matrix W is obtained using the available data set and will be denoted \hat{W}. Thus, the GMM estimator can be written as

\hat{θ} = \arg\min_{θ \in Θ} \left(\frac{1}{T}\sum_{t=1}^{T} g(z_t, θ)\right)' \hat{W} \left(\frac{1}{T}\sum_{t=1}^{T} g(z_t, θ)\right)   (5.1.29)

⁴ The norm of m, denoted ||m||, measures the distance between m and zero.

Under suitable conditions this estimator is consistent, asymptotically normal and, with the appropriate weighting matrix \hat{W}, also asymptotically efficient.
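A hedged sketch of the GMM machinery above, in its simplest form: two moment conditions for a lognormal model and an identity weighting matrix (i.e., a plain method-of-moments special case). The model, moment conditions and data are illustrative only.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
z = rng.lognormal(mean=9.0, sigma=1.1, size=5000)    # hypothetical data

def sample_moments(theta):
    """Empirical moment conditions m_hat(theta), based on log-moments of a lognormal."""
    mu, sigma = theta
    m1 = np.log(z).mean() - mu                               # E[ln Z] = mu
    m2 = (np.log(z) ** 2).mean() - (mu ** 2 + sigma ** 2)    # E[(ln Z)^2] = mu^2 + sigma^2
    return np.array([m1, m2])

def objective(theta):
    m = sample_moments(theta)
    w = np.eye(2)                                    # identity weighting matrix W
    return m @ w @ m                                 # quadratic form m' W m

res = minimize(objective, x0=[9.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x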

5.1.4 Goodness-of-Fit Tests

To ensure the quality of a distribution adjustment, it has to be assessed. Indeed, an inappropriate fitting will mechanically lead to inappropriate outcomes. Therefore, goodness-of-fit tests have to be implemented. The goodness of fit of a statistical model describes how well it fits a set of observations. Goodness-of-fit measures summarise the discrepancy between observed values and the values expected under the tested model. Four of the most common tests are presented below: the Kolmogorov–Smirnov test (Smirnov, 1948), the Anderson–Darling test (Anderson and Darling, 1952), the Cramér–von Mises test (Cramér, 1928) and the chi-square test (Yates, 1934).
For the first one, i.e., the Kolmogorov–Smirnov test, the empirical distribution function F_n for n i.i.d. observations X_i is defined as

F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{[-\infty,x]}(X_i)   (5.1.30)

where I_{[-\infty,x]}(X_i) is the indicator function, equal to 1 if X_i ≤ x and 0 otherwise. The statistic for a given cumulative distribution function F(x) is

D_n = \sup_x |F_n(x) - F(x)|   (5.1.31)

where \sup_x is the supremum of the set of distances. The Glivenko–Cantelli theorem (Tucker, 1959) tells us that D_n converges to 0 almost surely in the limit when n goes to infinity, if the data come from the distribution F(x). Kolmogorov and Donsker (Donsker, 1952) strengthened this result, providing the convergence rate. In practice, the statistic requires a relatively large number of data points to properly reject the null hypothesis.
Then, the Anderson–Darling and Cramér–von Mises tests can be presented. Both statistics belong to the class of quadratic empirical distribution function statistics. Let F be the assumed distribution and F_n the empirical cumulative distribution function; then both statistics measure the distance between F and F_n:

n \int_{-\infty}^{\infty} (F_n(x) - F(x))^2 w(x)\, dF(x),   (5.1.32)

where w(x) is a weighting function. When w(x) = 1, the previous equation represents the Cramér–von Mises statistic. The Anderson–Darling test is based on a different distance,

AD_n = n \int_{-\infty}^{\infty} \frac{(F_n(x) - F(x))^2}{F(x)(1 - F(x))}\, dF(x),   (5.1.33)

for which the weight function is given by w(x) = [F(x)(1 − F(x))]^{−1}. As a consequence, the Anderson–Darling statistic puts more weight on the tails than the Cramér–von Mises one. This might be of interest considering the fat-tailed distributions presented earlier.
Remark 5.1.1 These tests are non-parametric: the larger the number of data points, the lower the chance that the tests accept the distribution. This could be a major drawback as the other way around is also true: the lower the number of data points, the larger the chance that the test is going to accept the distribution. However, in this case the robustness of the fitting would be highly questionable.
While the previous tests are usually preferred to evaluate the fit of continuous distributions to a data sample, the next test is usually implemented on discrete distributions, and might be of interest to compare two data samples. Indeed, in this paragraph, the χ² test statistic is presented. Mathematically, the statistic is given as follows:

χ^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = N \sum_{i=1}^{n} \frac{(O_i/N - p_i)^2}{p_i},   (5.1.34)

where O_i is the number of observations of type i, N is the total number of observations, E_i = N p_i is the theoretical frequency of type i asserted by the null hypothesis that the fraction of type i in the population is p_i, and n is the number of buckets of possible outcomes.
In reality this statistic is Pearson's cumulative test statistic, which asymptotically approaches a χ² distribution. The χ² statistic can then be used to calculate a p-value by comparing the value of the statistic to a χ² distribution. The number of degrees of freedom, which has to be used to compute the appropriate critical value, is equal to the number of buckets n minus p.
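The sketch below runs the Kolmogorov–Smirnov and Cramér–von Mises tests against a fitted lognormal, and a chi-square test on equiprobable buckets obtained through the probability integral transform. The model, the bucketing and the data are illustrative, and the p-values are only approximate since the parameters are estimated from the same sample.

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.lognormal(mean=9.0, sigma=1.1, size=2000)     # hypothetical data

# Fit the candidate distribution (lognormal with location fixed at zero).
sigma_hat, loc_hat, scale_hat = stats.lognorm.fit(x, floc=0)
fitted = stats.lognorm(sigma_hat, loc=loc_hat, scale=scale_hat)

# Kolmogorov-Smirnov and Cramer-von Mises tests against the fitted cdf.
ks = stats.kstest(x, fitted.cdf)
cvm = stats.cramervonmises(x, fitted.cdf)

# Chi-square test on ten equiprobable buckets via the probability integral transform.
u = fitted.cdf(x)                                     # approximately uniform under the null
observed, _ = np.histogram(u, bins=np.linspace(0, 1, 11))
expected = np.full(10, x.size / 10)
chi2 = stats.chisquare(observed, expected, ddof=2)    # ddof accounts for the 2 estimated parameters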

5.2 Application

In this section, we propose to show the impact of some scenarios on the parameters, and we represent the shift and the distortion of the distributions as well as the impacts on percentiles and risk measures. Therefore, we will take a right-skewed and leptokurtic data set, i.e., the tail is asymmetric on the right and fatter than the equivalent tail of a Gaussian distribution.

[Figure: histogram entitled “Split Data”; x-axis: Losses, y-axis: Frequency.]
Fig. 5.1 This figure represents three types of data; as illustrated, these data sets combined (as discussed in the first section) may lead to a multimodal distribution

[Figure: histogram entitled “Combination of Data”; x-axis: Losses, y-axis: Frequency.]
Fig. 5.2 This figure represents the same data as the previous one, though here the data are not juxtaposed but combined

Following the process described in the previous paragraphs, in a first step we use a histogram to represent the data, to see how these are empirically distributed. Figure 5.1 represents the data. As shown, these data are representative of the same story line but triggered by three different processes. The three colours represent the distributions of each data set taken independently, while Fig. 5.2 represents the histogram of the data once combined.
It is interesting to note that the empirical distributions taken independently have completely different features. This is not particularly unusual depending on the granularity of the event we are interested in modelling; for example, if we are interested in analysing the exposure of the target financial institution to external fraud, this may combine cyber attacks, credit card frauds, Ponzi schemes, credit application fraud and so on. Consequently it is not unlikely to be confronted with multimodal distributions.

[Figure: kernel densities entitled “Kernel Density Adjusted on the Data”; x-axis: Losses, y-axis: Probabilities.]
Fig. 5.3 This figure represents how the empirical distributions should have been modelled if the data were not combined

[Figure: kernel density entitled “Kernel Density Adjusted on Combined Data”; x-axis: Losses, y-axis: Probabilities.]
Fig. 5.4 This figure illustrates a kernel density estimation on the combined data set

Once these have been represented, the first strategy to be implemented to fit the data is a kernel density estimation. In that case, assuming an Epanechnikov kernel, it is possible to see that the shape of the densities adjusted on each individual distribution (Fig. 5.3), as well as the one adjusted on the combined data sets (Fig. 5.4), is similar to the histogram represented in Fig. 5.2. Therefore these could be adequate solutions to characterise the initial distribution. However, as these methodologies are non-parametric, it is not possible to shock the parameters, but the shape of the represented distribution may help selecting the right family, as introduced earlier in this chapter.
Therefore, once the right distribution has been selected, such as a lognormal, an α-stable or any other suitable distribution, we can compare the fittings. Figure 5.5 shows different adjustments on a single plot.

[Figure: four fitted densities labelled X1 to X4; x-axis: Losses, y-axis: Probabilities.]
Fig. 5.5 In this figure four distributions are represented, illustrating how data would be fitted and represented by these distributions. It shows how, by tilting the data, we could move from an initial thin-tailed distribution (X1) to a fat-tailed distribution (X4). The fat-tail representation will lead by construction to higher risk measures

As depicted, depending on the adjustment, we will capture slightly different characteristics of the underlying data, and therefore obtain different scenario values for a given distribution. Note that the goodness-of-fit tests described in the previous section may support the selection of one distribution over another; however, practitioners' expertise may also contribute to the selection, particularly in the case of emerging risks, i.e., risks which never materialised or for which no data have ever been collected yet.
Besides, Fig. 5.5 also illustrates the fact that, considering the same data set, fitting different distributions may lead to various risk measures. For instance, in our example, 57,895,158 euros represents the 95th percentile of X1 (the VaR), the 96th of X2, the 99th of X3 and the 75th of X4.
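The kind of comparison illustrated by Fig. 5.5 can be reproduced along the lines of the sketch below; the candidate families, the simulated data and the percentile are illustrative, and in practice the goodness-of-fit tests and expert judgement discussed above would guide the final choice.

import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
data = rng.lognormal(mean=11.0, sigma=1.3, size=5000)   # hypothetical losses

candidates = {
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "generalised Pareto": stats.genpareto,
}

for name, dist in candidates.items():
    params = dist.fit(data)                  # generic maximum likelihood fit
    q95 = dist.ppf(0.95, *params)            # loss level at the 95th percentile
    print(f"{name:20s} 95th percentile: {q95:,.0f}")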
Finally, for a given percentile the scenarios may be evaluated, as well as various risk measures. It is important to note that in the case of multimodal distributions, distortion risk measures, combinations of distributions or a generalised Pareto distribution might be very useful in practice.

5.3 For the Manager: Pros and Cons

5.3.1 Implementation

In this section, we discuss the pros and cons of the methodology from a manager's point of view, and in particular the added value of the methodology. Indeed, this methodology is very useful in some cases but it is not appropriate in others. The right question, once again, is: what are the objectives? For example, for some stress-testing purposes, this approach is quite powerful as some of the distributions have properties that can capture asymmetric shocks, extreme values, etc.

However, the manager would need to have an understanding of probabilities, statistics and mathematics. He would have to understand the limitations of the approach. Alternatively, the manager could rely on a quant team. Furthermore, the tilts have to rely on some particular rationale led by business owners, external data or regulatory requirements. They would also have to transform these pieces of information into parameters.
In other words, the engineering behind it is more complicated; however, in some particular situations it is able to capture multiple features of a particular risk. The training of practitioners and business owners is essential and primordial, as otherwise the outcome will never be transformed into key management actions, since the methodology might be seen as too complicated or, worse, not representative.
The understanding of the parameter transformation induced by a scenario may sometimes be quite difficult, as it may be dramatically different from one class of distributions to another. Therefore the selection of the distribution used to model a risk plays a major role, and it might be heavily challenged by the top management if the process is not properly elaborated and documented.

5.3.2 Distribution Selection

As the name suggests, the generalised hyperbolic family has a very general form combining various distributions, for instance, the Student's t-distribution, the Laplace distribution, the hyperbolic distribution, the normal-inverse Gaussian distribution and the variance-gamma distribution, among others. It is mainly applied to areas requiring the capture of larger probabilities in the tails, a property the normal distribution does not possess. However, the five parameters required may make this distribution complicated to fit.
To apply the second distribution presented above, the GPD, the choice of the threshold might be extremely complicated (Guégan et al., 2011). Besides, the estimation of the shape parameter may lead to infinite mean models (shape parameter greater than 1), which might be complicated to use in practice.
Finally, stable distributions generalise the central limit theorem to random variables without second moments. Once again, we might experience some problems: if α ≤ 1, the first moment does not exist, and therefore the distribution might be inappropriate in practice.

5.3.3 Risk Measures

VaR has been controversial since 1994, the date of its creation by Morgan (1996). Indeed, the main issue is that VaR is not sub-additive (Artzner et al., 1999). In other words, the VaR of a combined portfolio can be larger than the sum of the VaRs of its components.

VaR users agree that it can be misleading if misinterpreted:
1. Referring to the VaR as a “worst case” is inappropriate, as it represents a loss given a probability.
2. By making VaR reduction the central concern of risk management, practitioners would miss the point: although it is important to reduce the risk, it might be more important to understand what happens if the VaR is breached.
3. When losses are extremely large, it is sometimes impossible to define the VaR as the level of losses for which a risk manager starts preparing for anything.
4. A VaR based on inappropriate assumptions, such as always using a Gaussian distribution no matter the risk profile, or fitting any other inappropriate distribution to model a specific risk, might have dramatic consequences as the risk taken might not be properly evaluated.
Consequently, the VaR may lead to excessive risk-taking for financial insti-
tutions, as practitioners focus on the manageable risks near the centre of the
distribution and ignore the tails. Besides, it has the tendency to create an incentive to
take “excessive but remote risks” and could be catastrophic when its use engenders
a false sense of security among senior executives.
Finally, as discussed in Guégan and Hassani (2016), depending on the chosen distributions, VaR_α can be lower than ES_β even with β < α. Therefore, neither the risk measure selected nor the level of confidence ensures with certainty that the measurement will be conservative.

References

Acerbi, C., & Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking and
Finance, 26(7), 1487–1503.
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science, 12(3), 162–176.
Anderson, J. A., & Blair, V. (1982). Penalized maximum likelihood estimation in logistic
regression and discrimination. Biometrika, 69(1), 123–136.
Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain “goodness-of-fit” criteria
based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193–212.
Artzner, P., Delbaen, F., Eber, J. M., & Heath, D. (1999). Coherent measures of risk. Mathematical
Finance 9(3), 203–228.
Barndorff-Nielsen, O., & Halgreen, C. (1977). Infinite divisibility of the hyperbolic and general-
ized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte
Gebiete, 38(4), 309–311.
Berger, J. O., (1985). Statistical decision theory and Bayesian analysis. New York: Springer.
Cramér, H. (1928). On the composition of elementary errors. Scandinavian Actuarial Journal,
1928(1), 13–74.
Danielsson, J., et al. (2001). Using a bootstrap method to choose the sample fraction in tail index
estimation. Journal of Multivariate Analysis, 76, 226–248.
Delbaen, F. (2000). Coherent risk measures. Blätter der DGVFM 24(4), 733–739.
Donsker, M. D. (1952). Justification and extension of Doob’s heuristic approach to the
Kolmogorov–Smirnov theorems. Annals of Mathematical Statistics, 23(2), 277–281.

Guégan, D., & Hassani, B. (2015). Distortion risk measures or the transformation of unimodal
distributions into multimodal functions. In A. Bensoussan, D. Guégan, & C. Tapiro (Eds.),
Future perspectives in risk models and finance. New York: Springer.
Guégan, D., & Hassani, B. (2016). More accurate measurement for enhanced controls: VaR vs
ES? In Documents de travail du Centre d’Economie de la Sorbonne 2016.15 (Working Paper)
[ISSN: 1955-611X. 2016] <halshs-01281940>.
Guégan, D., Hassani, B. K. & Naud, C. (2011). An efficient threshold choice for the computation
of operational risk capital. The Journal of Operational Risk, 6(4), 3–19.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4), 1029–1054.
Huggenberger, M., & Klett, T. (2009). A g-and-h Copula approach to risk measurement in
multivariate financial models, University of Mannheim, Germany, Preprint
Leadbetter, M. R. (1983). Extreme and local dependence in stationary sequences. Zeitschrift für
Wahrscheinlichkeitstheorie und Verwandte Gebiete, 65, 291–306.
Lindsay, B. G. (1988). Composite likelihood methods. Contemporary Mathematics, 80, 221–239.
Luceno, A. (2007). Likelihood moment estimation for the generalized Pareto distribution.
Australian and New Zealand Journal of Statistics, 49, 69–77.
McCulloch, J. H. (1996). On the parametrization of the afocal stable distributions. Bulletin of the
London Mathematical Society, 28, 651–655.
Müller, M., Sperlich, S., & Werwatz, A. (2004). Nonparametric and semiparametric models.
Springer series in statistics. Berlin: Springer.
Patterson, H. D., & Thompson, R. (1971) Recovery of inter-block information when block sizes
are unequal. Biometrika, 58(3), 545–554.
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3,
119–131.
Morgan, J. P. (1996). Riskmetrics technical document.
Rockafellar, R. T., & Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of
Risk, 2(3), 21–41.
Rockafellar, R. T., & Uryasev, S. (2002). Conditional value at risk for general loss distributions.
Journal of Banking and Finance, 26(7), 1443–1471.
Sereda, E. N., et al. (2010). Distortion risk measures in portfolio optimisation. Business and
economics (Vol. 3, pp. 649–673). Springer, New York
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman
and Hall/CRC.
Smirnov, N. (1948). Table for estimating the goodness of fit of empirical distributions. Annals of
Mathematical Statistics, 19, 279–281.
Taqqu, M., & Samorodnisky, G. (1994). Stable non-Gaussian random processes. New York:
Chapman and Hall.
Tucker, H. G. (1959). A generalization of the Glivenko-Cantelli theorem. The Annals of
Mathematical Statistics, 30(3), 828–830.
Wand, M. P., & Jones, M. C. (1995). Kernel smoothing. London: Chapman and Hall/CRC.
Wang, S. S. (2000). A class of distortion operators for pricing financial and insurance risks. Journal
of Risk and Insurance, 67(1), 15–36.
Yates, F. (1934). Contingency table involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2), 217–235.
Chapter 6
Leveraging Extreme Value Theory

6.1 Introduction

Relying on the Guégan and Hassani (2012) proposal, in this chapter we suggest an approach to build a data set focusing specifically on extreme events arising from any risk. We will show that using an alternative approach which focuses on extreme events may be more relevant and more reliable for risk measurement purposes. We discuss here the type of information to be considered in order to be able to work within the extreme value theory framework.
The solution is based on the knowledge that has been gained by risk managers who experience risks on a daily basis, from their root causes to their consequences. Indeed, in a three-lines-of-defense configuration, the first line managing the risk, i.e., facing these issues, dealing with them, controlling and mitigating these situations and their corresponding exposures, gathers a lot of experience and understanding of these problems. Their knowledge of these events leads to the construction of a new data set which, through its analysis and its results, may be used in parallel to more traditional approaches. This statement makes the potential flaws quite obvious: the more mature the risk framework and the larger the number of risk managers, the better the information gathered and the more reliable the approach. The converse is unfortunately also valid.
As implied, we first consider the expertise of local risk managers, who are the guardians of the system's efficiency and provide useful information to the department responsible for the permanent control of the system. Some of them collect the losses and the incidents, others are in charge of deploying plans to prevent operational risks; therefore they have a real experience of these risks and are able to anticipate them. Their opinions incorporate different types of information, such as which behaviours are important to consider (the persistence, the seasonality, the cycles and so on), how strong the activity is in a specific entity in a particular period, how efficient and effective the measures and the mitigants in place to prevent these risks are, etc. We have a real opportunity to use their expertise several times a year,

either to understand the evolution of the risks, to estimate a capital allocation, or in a forward-looking exercise, i.e., the scenario analysis in our case.
In this approach we are not working with traditional data sets, in the sense that
these are not a combination of losses, incidents or market data, in other words events
which already occurred. These are representative of the risk perception of the market
experts. Besides, collecting the data does not ensure that we will get any extreme
points.
Actually, we may argue that working with historical data sets biases our vision of extreme events, as their frequency is much lower than for regular events (small and medium sized) and does not reflect the real risk exposures. Consequently, large losses are difficult to model and analyse with traditional approaches. A solution consists in modelling extreme events in a specific framework which has been specifically created to capture them, for instance, considering the generalised Pareto distribution to model the severities (Pickands, 1975; Coles, 2004; Guégan et al., 2011), as presented in the previous chapter. Nevertheless, this last method requires large data sets to ensure the robustness of the estimations. As it might be complicated to fit these distributions, whose information is contained in the tails, using historical data, a possibility is to build a scenario data set based on expert opinions.
Traditionally, to assess the scenarios, workshops are organised within financial institutions, as introduced in the fourth chapter of this book. According to the risk taxonomy consistent with the target entity's risk profile, some story lines representing the major exposures are proposed to a panel of business experts who are supposed to evaluate them. As for the consensus approach, the session leader may ask for the largest loss that may occur in a number of years, 10, 25, 50, etc. Then, the information provided can be transformed into distribution percentiles. However, contrary to the consensus approach, we do not seek an agreement; we are actually more interested in gathering the individual opinions of each business expert.
Indeed, in this strategy each and every opinion matters, and if we are not bound to use a consensus approach, then we should select this methodology as it tackles at once almost all the issues identified in the previous chapters.
From a more behavioural point of view, because of human nature, a more charismatic person may take over the session and prevent the others from giving their opinions (see the seniority bias in Chap. 1), whereas their experience may lead to different assessments, as they may come from different business units. These facts may be seen as drawbacks, but in fact they are a real opportunity to capture a lot of information and to have an alternative, creating another set of data to explore the behaviour of extreme risk events. Why should we eclipse some experts' opinions? Indeed, by labelling them experts, we mechanically acknowledge and recognise their understanding and experience of the risks. Not trusting them would be similar to not trusting the first officer in a plane and only relying on the captain no matter what, even if the latter cannot fly the plane. It does not make any sense to hire experts if they are not listened to.
The information obtained from the experts may be heterogeneous because they do not have the same experience, the same information quality or the same location. This might be seen as a drawback but, in our case, if justified by the various exposures, this heterogeneity is what we are looking for, up to a certain extent. In order to reduce the impact of huge biases, we will only keep the maximum value observed or forecasted for a particular event type occurring in a particular business unit in a specific period of time (a week, a month, etc.). Therefore, each expert is to provide several maxima, for each risk class of the approved taxonomy, and also for different levels of granularity and a prespecified horizon.
The objective is to provide risk measures associated with the various risks of the taxonomy built with these data sets. As soon as we work with sequences of maxima, we will consider the extreme value theory (EVT) results (Leadbetter et al., 1983; Resnick, 1987; Embrechts et al., 1997; Haan and Ferreira, 2010) to compute them. We rely on the theoretical result stating that, under some regularity conditions, a series of maxima follows a generalised extreme value (GEV) distribution, given in (6.2.2).¹

6.2 The Extreme Value Framework

Extreme value theory is a statistical framework created to deal with extreme


deviations from the median of probability distributions (extreme values). The
objective is to assess, from a given ordered sample of a given random variable,
the probability of extreme events.
Two approaches exist in practice regarding extreme values. The first method relies on deriving block maxima (minima) series, i.e., the largest value observed at regular intervals. The second method relies on extracting the peak values reached during any period, i.e., values exceeding a certain threshold; it is referred to as the “Peak Over Threshold” (POT) method (Embrechts et al., 1997) and can lead to several or no values being extracted in any given year. This second method is actually more interesting to fit a generalised Pareto distribution, as presented in the previous chapter. Indeed, using exceedances over a threshold, the analysis involves fitting two distributions: one for the number of events in a basic time period and a second for the exceedances. The fitting of the tail here can rely on Pickands (1975) and Hill (1975). We will not focus on this strategy in this chapter.
Regarding the first approach, the analysis relies on a corollary of the Fisher–Tippett–Gnedenko theorem, leading to the fit of the generalised extreme value distribution, as the theorem relates to the limiting distributions of the minimum or the maximum of a very large collection of realisations of i.i.d. random variables obtained from an unknown distribution. However, distributions belonging to the maximum domain of attraction of this family of distributions might also be of interest, as the number and the types of incidents may actually lead to different distributions anyway.

¹ The parameters of the GEV distributions are estimated by maximum likelihood (Hoel, 1962).

This strategy is particularly interesting as it allows capturing the possible occurrences of extreme incidents which are high-profile, hard-to-predict, rare events beyond normal expectations. The way the workshops are led may help dealing with psychological biases which may push people to refuse the reality of an exposure because it never happened or because, for them, it cannot happen (denial). Usually, they have the reflex to look at past data, but the fact that something did not happen before does not mean that it will not happen in the future.

6.2.1 Fisher–Tippett Theorem

In statistics, the Fisher–Tippett–Gnedenko theorem² is a fundamental result of extreme value theory (almost the founding result) regarding the asymptotic distribution of extreme order statistics. The maximum of a sample of normalised i.i.d. random variables converges to one of three possible distributions: the Gumbel distribution, the Fréchet distribution or the Weibull distribution.
This theorem (stated below) is to maxima what the central limit theorem is to averages, though the central limit theorem applies to the sample average of any distribution with finite variance, while the Fisher–Tippett–Gnedenko theorem only states that, if the distribution of the normalised maximum converges, the limit belongs to a particular class of distributions. It does not state that the distribution of the normalised maximum does converge.
We denote by X a random variable (r.v.) with a cumulative distribution function (c.d.f.) F. Let X_1, ..., X_n be a sequence of independent and identically distributed (i.i.d.) r.v., and let M_n = max(X_1, ..., X_n). Then, the Fisher and Tippett (1928) theorem says:

Theorem 6.2.1 If there exist constants c_n > 0 and d_n ∈ ℝ such that

P\left(\frac{M_n - d_n}{c_n} \leq x\right) = F^n(c_n x + d_n) \xrightarrow{d} H   (6.2.1)

for some non-degenerate distribution H, then H belongs to the family of generalised extreme value distributions presented in the following section.

6.2.2 The GEV

While some aspects of extreme value theory have been discussed in the previous
chapter, here we will present its application in a different context and theoretical
framework.

² Sometimes known as the extreme value theorem.

In probability theory and statistics, the generalised extreme value (GEV) distribution (sometimes called the Fisher–Tippett distribution) is a family of continuous probability distributions developed within extreme value theory combining the Gumbel, Fréchet and Weibull families, also known as the type I, II and III extreme value distributions.
The generalised extreme value distribution has cumulative distribution function

F(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}   (6.2.2)

for 1 + ξ(x − μ)/σ > 0, where μ ∈ ℝ is the location parameter, σ > 0 the scale parameter and ξ ∈ ℝ the shape parameter. Thus for ξ > 0, the expression just given for the cumulative distribution function is valid for x > μ − σ/ξ, while for ξ < 0 it is valid for x < μ + σ/(−ξ). For ξ = 0 the expression just given for the cumulative distribution function is not defined and is replaced, taking the limit as ξ → 0, by

F(x; \mu, \sigma, 0) = \exp\left\{-\exp\left(-\frac{x-\mu}{\sigma}\right)\right\},   (6.2.3)

without any restriction on x.
The resulting density function is

f(x; \mu, \sigma, \xi) = \frac{1}{\sigma}\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{(-1/\xi)-1} \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}   (6.2.4)

again, for x > μ − σ/ξ in the case ξ > 0, and for x < μ + σ/(−ξ) in the case ξ < 0. The density is zero outside of the relevant range. In the case ξ = 0 the density is positive on the whole real line and equal to

f(x; \mu, \sigma, 0) = \frac{1}{\sigma}\exp\left(-\frac{x-\mu}{\sigma}\right)\exp\left\{-\exp\left(-\frac{x-\mu}{\sigma}\right)\right\}.   (6.2.5)
The first four moments as well as the mode and the median are:
• Mean:
\mu + \sigma\,\frac{\Gamma(1-\xi) - 1}{\xi} if ξ ≠ 0, ξ < 1;  \mu + \sigma\gamma if ξ = 0;  ∞ if ξ ≥ 1,   (6.2.6)
where γ is Euler's constant.
• Median:
\mu + \sigma\,\frac{(\ln 2)^{-\xi} - 1}{\xi} if ξ ≠ 0;  \mu - \sigma\ln\ln 2 if ξ = 0.   (6.2.7)
• Mode:
\mu + \sigma\,\frac{(1+\xi)^{-\xi} - 1}{\xi} if ξ ≠ 0;  \mu if ξ = 0.   (6.2.8)
• Variance:
\sigma^2\,\frac{g_2 - g_1^2}{\xi^2} if ξ ≠ 0, ξ < 1/2;  \sigma^2\,\frac{\pi^2}{6} if ξ = 0;  ∞ if ξ ≥ 1/2,   (6.2.9)
where g_k = \Gamma(1 - k\xi).
• Skewness:
\frac{g_3 - 3g_1 g_2 + 2g_1^3}{(g_2 - g_1^2)^{3/2}} if ξ > 0;  -\frac{g_3 - 3g_1 g_2 + 2g_1^3}{(g_2 - g_1^2)^{3/2}} if ξ < 0;  \frac{12\sqrt{6}\,\zeta(3)}{\pi^3} if ξ = 0,   (6.2.10)
where ζ(x) is the Riemann zeta function.
• Excess kurtosis:
\frac{g_4 - 4g_1 g_3 + 6g_2 g_1^2 - 3g_1^4}{(g_2 - g_1^2)^2} - 3 if ξ ≠ 0, ξ < 1/4;  \frac{12}{5} if ξ = 0;  ∞ if ξ ≥ 1/4.   (6.2.11)

6.2.3 Building the Data Set

In order to apply the methodology, the first step is to build the data set, considering for example a banking group which possesses several branches, subsidiaries and legal entities all over the world. Note that this kind of structure is typical of what we can find with systemically important financial institutions (SIFIs) or large insurance companies. In each branch, subsidiary, legal entity or business unit, the group has experts responsible for managing the risks, the so-called first line of defense. This methodology is particularly appropriate for operational risk management, as the Basel Matrix provides each and every entity with a base taxonomy of the risks (BCBS, 2004).
We assume that we have i = 1, ..., p subsidiaries or branches, each one being represented by a risk manager. This manager can provide j = 1, ..., n quotations per risk in a year (for instance) or any relevant period of time. Thus, for a given date, we can have np quotations for a risk type. These quotations can also be obtained for different levels of granularity. Then, these np quotations per risk provide a data set which corresponds to a sequence we will refer to as a maxima data set (MDS).
Remark 6.2.1 Once the data collection process has been properly explained to the risk managers, the information can be collected by email or through the risk management system; they do not necessarily need to meet on a regular basis. Consequently, this methodology is particularly appropriate for large, complex and global companies and is relatively costless.

6.2.4 How to Apply It?

Given the MDS created in the previous section, we will estimate the parameters of
the GEV distribution whose density is given by Eq. (6.2.4).
As mentioned before, this distribution contains the Fréchet distribution for ξ > 0, the Gumbel distribution for ξ = 0 and the Weibull distribution for ξ < 0 (Fisher and Tippett, 1928; Gnedenko, 1943). Therefore, the shape parameter governs the tail behaviour of the distribution. The sub-families defined above have the following cumulative distribution functions:
Gumbel or type I extreme value distribution (ξ = 0):

F(x; \mu, \sigma, 0) = e^{-e^{-(x-\mu)/\sigma}} for x ∈ ℝ.   (6.2.12)

Fréchet or type II extreme value distribution, if ξ = α^{−1} > 0:

F(x; \mu, \sigma, \xi) = \begin{cases} 0 & x \leq \mu, \\ e^{-((x-\mu)/\sigma)^{-\alpha}} & x > \mu. \end{cases}   (6.2.13)

Reversed Weibull or type III extreme value distribution, if ξ = −α^{−1} < 0:

F(x; \mu, \sigma, \xi) = \begin{cases} e^{-(-(x-\mu)/\sigma)^{\alpha}} & x < \mu, \\ 1 & x \geq \mu, \end{cases}   (6.2.14)

where σ > 0.
Remark 6.2.2 Though we are working with maxima, the theory is equally valid for
minima. Indeed, a generalised extreme value distribution can be fitted the same way.

Remark 6.2.3 Considering the variable change t = μ − x, the ordinary Weibull distribution is mechanically obtained. Note that this change of variable provides a strictly positive support. This is due to the fact that the Weibull distribution is usually preferred to deal with minima. The distribution has an additional parameter and is transformed so that it has an upper bound rather than a lower bound.
Remark 6.2.4 In terms of support specificity, the Gumbel distribution is unlimited,
the Fréchet distribution has a lower limit, while the GEV version of the Weibull
distribution has an upper limit.
Remark 6.2.5 It is interesting to note that if ξ > 1 in (6.2.2), then the distribution has no first moment, as for the GPD presented in the previous chapter. This property is fundamental in the applications, because in that case we cannot use the GEV in our application, as some of the risk measures cannot be calculated. Therefore, we have to pay attention to the value of the parameter ξ.³
It is interesting to note that the distributions might be linked. Indeed, assuming a type II cumulative distribution function of a random variable X, with positive support, i.e., F(x; 0, σ, α), the cumulative distribution function of ln X is of type I, with the form F(x; ln σ, 1/α, 0). Similarly, if the cumulative distribution function of X is of type III, with negative support, i.e., F(x; 0, σ, α), then the cumulative distribution function of ln(−X) is of type I, with the form F(x; −ln σ, 1/α, 0).
Besides, as stated earlier, many distributions are related to the GEV:
• If X ∼ GEV(μ, σ, 0), then mX + b ∼ GEV(mμ + b, mσ, 0)
• If X ∼ Gumbel(μ, σ) (Gumbel distribution), then X ∼ GEV(μ, σ, 0)
• If X ∼ Weibull(σ, μ) (Weibull distribution), then μ(1 − σ ln(X/σ)) ∼ GEV(μ, σ, 0)
• If X ∼ GEV(μ, σ, 0), then σ exp(−(X − μ)/σ) ∼ Weibull(σ, μ) (Weibull distribution)
• If X ∼ Exponential(1) (Exponential distribution), then μ − σ ln X ∼ GEV(μ, σ, 0)
• If X ∼ GEV(α, β, 0) and Y ∼ GEV(α, β, 0), then X − Y ∼ Logistic(0, β) (Logistic distribution)
• If X ∼ GEV(α, β, 0) and Y ∼ GEV(α, β, 0), then X + Y ≁ Logistic(2α, β) (the sum of two Gumbel r.v. is not logistic)

³ The estimation procedure is a very important aspect of the approach. Under regular conditions the maximum likelihood estimate can be unbiased; consequently, if it is possible to use it, it makes no sense to opt for another approach. Unfortunately, this approach may lead to an infinite estimated mean model. To avoid this problem we can use a “probability weighted moment” estimation approach, as this enables constraining the shape parameter within [0, 1]. But, as discussed in the following sections, we will see that estimation procedures are not the main problem, because they are linked to the information set used.
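A hedged sketch of the mechanics described in this section: fit a GEV to a (purely illustrative) maxima data set and read off a high quantile, checking the shape parameter first. Note that SciPy's genextreme uses the shape parameter c = −ξ, i.e., the sign convention is flipped relative to Eq. (6.2.2), and that a maximum likelihood fit on a handful of quotations will be imprecise.

import numpy as np
from scipy import stats

# Hypothetical maxima data set (MDS) of expert quotations, in euros.
mds = np.array([1.2e6, 0.8e6, 2.5e6, 1.9e6, 0.6e6, 3.4e6,
                1.1e6, 0.9e6, 4.2e6, 1.5e6, 2.2e6, 0.7e6])

c, loc, scale = stats.genextreme.fit(mds)    # SciPy convention: c = -xi
xi = -c                                      # back to the xi of Eq. (6.2.2)

# Mean-based measures such as the ES are only defined when xi < 1.
if xi < 1:
    var_999 = stats.genextreme.ppf(0.999, c, loc=loc, scale=scale)
else:
    var_999 = None                           # the fitted model would be hard to exploit here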

6.3 Summary of Results Obtained

In this section, the main results obtained in Guégan and Hassani (2012) are summarised to illustrate the approach.⁴ In this paper, the information provided by the experts is sorted according to the Basel taxonomy for operational risk, which has three levels of granularity (BCBS, 2004).
• In a first risk category of the Basel Matrix, for instance the “Payment and Settlement”/“Internal Fraud” cell, the estimated value of ξ is 4.30 for the first level of granularity. Consequently, this estimated GEV distribution has an infinite mean and is therefore inapplicable. Working on the second level of granularity, even if the value decreases, it remains larger than 1 and therefore the fitted GEV distribution cannot be used for risk management purposes, or at least the outcomes might be very complicated to justify. This means that we need to consider a lower level of granularity to conclude: the third one, for instance. Unfortunately, this information set is not available for the present exercise. So the methodology is not always applicable, particularly if the data are not adapted.
• The second application is far more successful. Indeed, applying the approach to the “Retail Banking”/“Clients, Products and Business Practices/Improper Business or Market Practice” cell and disaggregating the data set from the first to the subsequent level of granularity, i.e., from the highest level of granularity to the lowest, the value of ξ increases from ξ = 0.02657934 to ξ = 0.04175584 for the first subcategory, ξ = 3.013321 for the second subcategory, ξ = 0.06523997 for the fourth subcategory and ξ = 0.08904748 for the fifth. Again, the influence of the data set built for estimation purposes is highlighted. The aggregation of different risk natures (the definition behind this sub-event covers many kinds of incidents) in a single cell does not permit providing an adequate risk measure. For the first level of granularity, ξ is less than 1 and this is probably due to the fact that the corresponding information set is biased by the combination of data. In this specific case, we have four cells in the second level of granularity for which some quotations are available, i.e., the bank may consider that some major threats may arise from these categories; as a result, working at a lower level of granularity tends to make sense. Note that the data for the third subcategory at the second level of granularity were not available.
• In a successful third case, the methodology has been applied to the cell “Payment and Settlement”/“Execution, Delivery and Process Management”. In this case, ξ = 2.08 for the first level of granularity, and ξ = 0.23 for the subcategory quoted at the next level, i.e., the “Payment and Settlement”/“Vendors and Suppliers” cell. Note that some cells are empty, because the bank's top risk managers dealt with these risks in different ways and did not ask the risk managers for quotations. In these situations, we would recommend switching to alternative methodologies. We also noted in our analysis that the shape parameter was positive in all cases, thus the quotations' distributions follow Fréchet distributions given in relationship (6.2.2).

⁴ Note that this methodology has been tested and/or is used in multiple large banking groups.
Thus, using the MDS from different cells permits anticipating incidents, losses and the corresponding capital requirements, and prioritising the key management decisions to be undertaken. Besides, it shows the necessity of having precise information.
In the summarised piece of analysis, comparing the risk measures obtained using expert opinions with the ones obtained from the collected losses using the classical loss distribution approach (LDA) (Lundberg, 1903; Frachot et al., 2001; Guégan and Hassani, 2009), we observe that, even focusing on extreme losses, the methodology proposed in this chapter does not always provide larger risk measures than those obtained implementing more traditional approaches. This outcome is particularly important as it means that using an appropriate framework, even one focusing on extreme events, does not necessarily imply that the risk measures will be higher. This tackles one of the main clichés regarding the over-conservativeness of the EVT, and risk managers should be aware of that feature.
On the other hand, comparing the EVT approach with the LDA (Frachot et al., 2001), which relies on past incidents, even if the outcomes may vary, the ranking of the risk measures with respect to the classes of incidents is globally maintained. Regarding the volatility between the results obtained from the two methods, we observe that the experts tend to provide quotations embedding the entire information available at the moment they are giving their quotations, as well as their expectations, whereas historical information sets are biased by the delays between the moment an incident occurs, the moment it is detected and the moment it is entered in the collection tool.
Another reason explaining the differences between the two procedures is the fact that experts anticipate the maximum loss values with respect to the internal risk management policy, such as the efficiency of the operational risk control system, the quality of the communication from the top management, the lack of insight regarding a particular risk, or the effectiveness of the risk framework. For example, on the “Retail Banking” business line for the “Internal Fraud” event type, a VaR of 7,203,175 euros using expert opinions is obtained against a VaR of 190,193,051 euros with the LDA. The difference between these two amounts may be interpreted as a failure of the operational risk control system to prevent these frauds.⁵
The summarised paper highlighted the importance of considering the a priori knowledge of the experts together with an a posteriori backtesting based on collected incidents.

⁵ Theoretically, the two approaches (experts vs. LDA) are different, therefore this way of thinking may be easily challenged; nevertheless it might lead practitioners to question their system of controls.

6.4 Conclusion

In this chapter, a new methodology based on expert opinions and extreme value theory to evaluate risks has been developed. This method does not rely on heavy numerical schemes and provides analytical risk measures, though the estimation of the GEV parameters might sometimes be challenging.
With this method, practitioners' judgements have been transformed into computational values and risk measures. The information set might only be biased by people's personality, risk aversion and perception, but not by obsolete data. It is clear that these values include an evaluation of the risk framework and might be used to evaluate how well the risk culture is embedded.
The potential unexploitability of the GEV (ξ > 1) may just be caused by the fact that several risk types are mixed in a single unit of measure, for example, “Theft and Fraud” and “System Security” within the “External Fraud” event type. But splitting the data set may create other challenges, as this will require a procedure to deal with the dependencies, such as the approach presented in Guégan and Hassani (2013).
However, it is important to bear in mind that the reliability of the results mainly depends on the quality of the risk management and particularly on the risk managers' capability to work as a team.

References

BCBS. (2004). International convergence of capital measurement and capital standards. Basel:
Bank for International Settlements.
Coles, S. (2004). An introduction to statistical modeling of extreme values. Berlin: Springer.
Embrechts, P., Klüppelberg, C., & Mikosh, T. (1997). Modelling extremal events: For insurance
and finance. Berlin: Springer.
Fisher, R. A., & Tippett, L. H. C. (1928). Limiting forms of frequency distributions of the largest or
smallest member of a sample. Proceedings of the Cambridge Philosophical Society, 24, 180–190.
Frachot, A., Georges, P., & Roncalli, T. (2001). Loss distribution approach for operational risk.
Working paper, GRO, Crédit Lyonnais, Paris.
Gnedenko, B. V. (1943). Sur la distribution limite du terme d’une série aléatoire. Annals of
Mathematics, 44, 423–453.
Guégan, D., & Hassani, B. K. (2009). A modified Panjer algorithm for operational risk capital
computation. The Journal of Operational Risk, 4, 53–72.
Guégan, D., & Hassani, B. K. (2012). A mathematical resurgence of risk management: An extreme
modeling of expert opinions. To appear in Frontier in Economics and Finance, Documents de
travail du Centre d’Economie de la Sorbonne 2011.57 - ISSN:1955-611X.
Guégan, D., & Hassani, B. K. (2013). Multivariate VaRs for operational risk capital computa-
tion: A vine structure approach. International Journal of Risk Assessment and Management
(IJRAM), 17(2), 148–170.
Guégan, D., Hassani, B. K., & Naud, C. (2011). An efficient threshold choice for the computation
of operational risk capital. The Journal of Operational Risk, 6(4), 3–19.
Haan, L. de, & Ferreira, A. (2010). Extreme value theory: An introduction. Springer Series in
Operations Research and Financial Engineering. New York: Springer.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals
of Statistics, 3, 1163–1174.
Hoel, P. G. (1962). Introduction to mathematical statistics (3rd ed.). New York: Wiley.
Leadbetter, M. R., Lindgren, G., & Rootzen, H. (1983). Extreme and related properties of random
sequences and series. New York: Springer.
Lundberg, F. (1903). Approximerad framställning av sannolikhetsfunktionen Aterförsäkring av
kollektivrister. Uppsala: Akad. Afhandling. Almqvist och Wiksell.
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3, 119–
131.
Resnick, S. I. (1987). Extreme values, regular variation, and point processes. New York: Springer.
Chapter 7
Fault Trees and Variations

In order to analyse the process leading to a failure, we have seen various strategies. In this chapter we present another approach which is also very intuitive and would obtain business buy-in as it is, by design, built and informed by risk owners:
the fault tree analysis (FTA) (Barlow et al., 1975; Roberts et al., 1981; Ericson,
1999a; Lacey, 2011). This methodology relies on a binary system which makes the
underlying mathematics quite simple and easy to implement.
Therefore, the FTA is a top down, deductive (and not inductive) failure analysis
in which an undesired state of a system is analysed using Boolean logic to combine
a series of lower-level events (DeLong, 1970; Larsen, 1974; Martensen and Butler,
1975; FAA, 1998). This methodology is mainly used in the fields of safety and reliability engineering to analyse how systems may fail, to mitigate and manage the
risks or to determine event rates of a safety accident or a particular system level
failure. This methodology is directly applicable to financial institutions (Benner,
1975; Andrews and Moss, 1993; Vesely, 2002; Lacey, 2011).
To be more specific regarding how the FTA can be used, the following enumeration should be enlightening:
1. understand the logic, events and conditions as well as their relationships leading
to an undesired event (i.e. root cause analysis (RCA)).
2. show compliance with the system safety and reliability requirements.
3. identify the sequence of causal factors leading to the top event.
4. monitor and control the safety performance to design safety requirements.
5. optimise resources.
6. assist in designing a system. Indeed, the FTA may be used to design a system
while identifying the potential causes of failures.
7. identify and correct causes of the undesired event. The FTA is a diagnosis tool.
8. quantify the exposure by calculating the probability of the undesired event (risk
assessment).

Any complex system is subject to potential failures as a result of subsystems failing. However, the likelihood and the magnitude of a failure can often be mitigated by improving the system design. FTA allows drawing the relationships between faults, subsystems and redundant safety design elements by creating a logic diagram of the overall system. Note that FTA has a global coverage, i.e., it permits dealing with failures, fault events, normal events, environmental effects, systems, subsystems, system components (hardware, software, human and instructions), timing (mission time, single phase and multi-phase) and repair.
The undesired outcome is located at the top of the tree, for example, the fact that there is no light in the room. Working backward, it is possible to determine that this could happen if the power is off or the lights are not functioning. This condition is a logical OR. Considering the branch analysing when the power is off, this may happen if the network is down or if a fuse is burnt. Once again, we are in the presence of another logical OR. On the other part of the tree, the lights might be burnt. Assuming there are three lights, these should all be burnt simultaneously, i.e., light 1, light 2 and light 3 are burnt. Here the relationship takes the form of a logical AND. When fault tree events are associated with failure probabilities, it is possible to calculate the likelihood of the undesired event occurring.
When a specific event impacts several subsystems, it is called a common cause (or common mode). On the diagram, this event will appear in several locations of the tree. Common causes mechanically embed dependencies between events. The computation of the probabilities in a tree containing common causes is slightly more complicated than in trees for which events are considered independent. To avoid creating confusion, we will not address that issue in this chapter and refer to the bibliography provided.
The diagram is usually drawn using conventional logic gate symbols. The path
between an event and a causal factor in the tree is called a cut set. The shortest
credible way from the fault to the triggering event is usually referred to as a minimal
cut set.
Some industries use both fault trees and event trees. An event tree starts from an undesired causal issue and climbs up a tree to a series of final consequences (bottom-up approach). Contrary to the FTA, an event tree is an inductive process of investigation and therefore may be used for scenario analysis (Woods et al., 2006), though the problem here is that we do not know a priori which extreme event we want to analyse, as this one may change while we are climbing the tree.

7.1 Methodology

In a fault tree, events are associated with probabilities, e.g., a particular failure may occur at some constant rate λ. Consequently, the probability of failure depends on the rate λ and the moment of occurrence t:

P = 1 − exp(−λt)    (7.1.1)

For small values, P ≈ λt when λt < 0.1.
Fault trees are generally normalised to a given time interval.
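As a quick illustration of relationship (7.1.1) and of its linear approximation, the short sketch below may help; it is an addition for illustration purposes only, and the failure rate and time horizon used are arbitrary assumptions.

```python
import math

def failure_probability(rate, t):
    """Exact probability of observing the failure over the horizon t,
    for a constant failure rate (relationship (7.1.1))."""
    return 1.0 - math.exp(-rate * t)

def failure_probability_approx(rate, t):
    """Linear approximation P ~ rate * t, reasonable when rate * t < 0.1."""
    return rate * t

# Arbitrary illustrative values: a failure rate of 2e-05 per hour over 1,000 hours.
rate, horizon = 2e-05, 1_000
print(failure_probability(rate, horizon))         # ~0.0198
print(failure_probability_approx(rate, horizon))  # 0.02
```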


Unlike traditional logic gate diagrams in which inputs and outputs may only take TRUE or FALSE values, the gates in a fault tree output probabilities related to the set operations of Boolean logic. Given a gate, the probability of the output event depends on the probabilities of the inputs.
An AND gate is a combination of independent events, i.e., the probability of any input event to an AND gate is not impacted by any other input linked to the same gate. The AND gate is equivalent to an intersection of sets in mathematics, and the probability is given by:

P(A ∩ B) = P(A) P(B)    (7.1.2)

An OR gate can be represented by a union of sets:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)    (7.1.3)

If the probabilities of failure on fault trees are very small (negligible), P(A ∩ B) may be discarded in the calculations.1 As a result, the output of an OR gate may be approximated by:

P(A ∪ B) ≈ P(A) + P(B),    P(A ∩ B) ≈ 0,    (7.1.4)

assuming that the two sets are mutually exclusive. An exclusive OR gate represents the probability that one or the other input, but not both, occurs:

P(A ⊕ B) = P(A) + P(B) − 2P(A ∩ B)    (7.1.5)

As before, if P(A ∩ B) is considered negligible, it might be disregarded. Consequently, the exclusive OR gate has limited value in a fault tree.
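To make the gate formulas concrete, the following helper functions (a sketch added here, not part of the original text) combine two input probabilities for the AND, OR and exclusive OR gates under the independence assumption used above.

```python
def and_gate(p_a, p_b):
    """AND gate: intersection of independent events, relationship (7.1.2)."""
    return p_a * p_b

def or_gate(p_a, p_b):
    """OR gate: union of independent events, relationship (7.1.3)."""
    return p_a + p_b - p_a * p_b

def xor_gate(p_a, p_b):
    """Exclusive OR gate, relationship (7.1.5), assuming independent inputs."""
    return p_a + p_b - 2 * p_a * p_b

# With very small inputs, the OR gate is close to the simple sum (7.1.4).
print(or_gate(1e-06, 1e-06))                   # ~2e-06
print(and_gate(0.2, 0.3), xor_gate(0.2, 0.3))  # 0.06 0.38
```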

7.2 In Practice

7.2.1 Symbols

The basic symbols used in FTA are grouped as events, gates and transfer symbols
(Roberts et al., 1981).
Remark 7.2.1 Depending on the software used these symbols may vary as they
may have been borrowed from alternative approaches to represent causality such
as circuit diagrams.

1 It becomes an error term.
Fig. 7.1 Event taxonomy: symbols for the basic event, external event, undeveloped event, conditioning event and intermediate event

Event symbols are used for primary events and intermediate (or secondary)
events. Primary events are not developed any further on the fault tree. Intermediate
events are located after the output of a gate. The event symbols are represented in
Fig. 7.1.
The primary event symbols are typically used as follows:
• Basic event—failure or error root.
• External event—exogenous impact (usually expected).
• Undeveloped event—an event for which we do not have enough information or
which has no impact on our analysis of the main problem.
• Conditioning event—conditions affecting logic gates.
Gate symbols describe the relationship between input and output events. The
symbols are derived from Boolean logic symbols (Parkes, 2002; Givant and Halmos,
2009). These are represented in Fig. 7.2.
The gates work as follows:
• OR gate—the output occurs if any input occurs
• AND gate—the output occurs only if all inputs occur
• Exclusive OR gate—the output occurs if exactly one input occurs
• Priority AND gate—the output occurs if the inputs occur in a specific sequence
specified by a conditioning event
• Inhibit gate—the output occurs if the input occurs under an enabling condition
specified by a conditioning event.
Fig. 7.2 Gate taxonomy: symbols for the OR gate, AND gate, exclusive OR gate, priority AND gate, inhibit gate, transfer in and transfer out

As a first step, it is necessary to explain the difference between a fault and a failure. A failure is related to a basic component: it is the result of an internal mechanism pertaining to the component in question. A fault corresponds to the undesired state of a component, resulting from a failure, a chain of failures and/or a chain of faults which can be further broken down. It is important to note that the component may function correctly but at the wrong time, potentially engendering itself a bigger issue (Roland and Moriarty, 1990).
A primary event (fault or failure) is an issue that cannot be defined further down the tree, i.e., at a lower level. A secondary event (fault or failure) is a failure that can be defined at a lower level, but not in detail. A command fault/failure is a fault state that is commanded by an upstream fault/failure, such as the normal operation of a component in an inadvertent or untimely manner; in other words, the normal but undesired state of a component at a particular point in time.
To clarify subsequent readings of the bibliography provided, we define in the following paragraphs some other terms that are traditionally used. A multiple occurring event (MOE) is an event or failure mode that occurs in more than one place in the fault tree, also known as a redundant or repeated event. A multiple occurring branch (MOB) is a tree branch that is used in more than one place in the fault tree; all of the basic events within the branch are actually multiple occurring events. A branch is a subsection of the tree, similar to a limb on a real tree. A module is a subtree or branch, i.e., an independent subtree that contains no outside MOE or MOB and is not an MOB itself.
Regarding the cut set terms, a cut set is a set of events that together cause the tree top undesired event to occur, while the minimal cut set (MCS) is characterised by the minimum number of events that can still cause the top event. A super set is a cut set that contains an MCS plus additional events that cause the top undesired event. The critical path is the highest-probability cut set, which drives the top undesired event probability. The cut set order is the number of components in a cut set. A cut set truncation is the fact of not considering particular segments during the evaluation of the fault tree; cut sets are usually truncated when they exceed a specific order and/or probability.
A transfer event indicates a subtree branch that is used elsewhere in the tree.
A transfer always involves a gate event node on the tree, and is symbolically
represented by a triangle. The transfer has various purposes, such as (1) starting a new page (for plots), (2) indicating where a branch (MOB) is used in various places in the same tree but is not repeatedly drawn (internal transfer) and (3) indicating an input module from a separate analysis (external transfer).
Transfer symbols are used to connect the inputs and outputs of related fault trees,
such as the fault tree of a subsystem to its system. Figure 7.3 exhibits an example of
simple FTA regarding a building on fire.

7.2.2 Construction Steps

The construction of a fault tree is an iterative process, which has 6 clearly defined
steps, for instance (Ericson, 1999b):
1. Review the gate event under investigation
2. Identify all the possible causes of this event and ensure that none are missed
3. Identify the cause–effect relationship for each event
4. Structure the tree considering your findings
5. Ensure regularly that identified events are not repeated
6. Repeat the process for the next gate.
Informing each gate node involves three steps:
• Step 1—Immediate, necessary and sufficient (INS)
• Step 2—Primary, secondary and command (PSC)
• Step 3—State of the system or component.
Analysing the first step in detail, the question to be answered is: are the factors INS to cause the intermediate event? Immediate means that we do not skip past events, necessary means that we only include what is actually necessary and sufficient means that we do not include more than the minimum necessary. Regarding the second step, it is necessary to consider the fault path for each enabling event and to identify each causing event, determining whether it is a primary fault, a secondary fault or a command fault (or even an induced fault or a sequential fault). Then, it is possible to structure the subevents and gate logic from the path type. Finally, the third step requires answering the question: is the intermediate event a state of the system or a state of the component? If it is a “state of the component”, we are at the lowest level of that issue, while if the answer to the previous question is “state of the system”, this implies subsequent or intermediate issues.
Fig. 7.3 Simple fault tree: this fault tree gives a simplified representation of what could lead to a building on fire. In this graph, we can see that the building is on fire if and only if a fire has been triggered, the safety system malfunctioned and the doors have been left open. Analysing the “Fire Triggered” node located in the upper right part of the diagram, this one results from three potential issues, for instance, a faulty electrical appliance, someone smoking in the building or an arsonist, while the safeguard system is not functioning if the smoke alarms are not going off or the fire extinguishers are not functioning
7.2.3 Analysis

An FTA can be modelled in different manners; the usual way is summarised below. A single fault tree permits analysing only one undesired event, but this one may subsequently be fed into another fault tree as a basic event. Whatever the nature of the undesired event, an FTA is applicable as the methodology is universal.
An FTA involves five steps (note that each and every step should be properly documented):
1. Define the undesired event to study
• Identify the undesired event to be analysed, and draft the story line leading to
that event.
• Analyse the system and the threat, i.e., what might be the consequences of the materialisation of the undesired event. This step is necessary to prioritise the scenarios to be analysed.
2. Obtain an understanding of the system
• Obtain the intermediate probabilities of failure to be fed into the fault tree in
order to evaluate the likelihood of materialisation of the undesired event.
• Analyse the courses, i.e., the critical path, etc.
• Analyse the causal chain, i.e. obtain a prior understanding of what conditions
are necessary and intermediate events have to occur to lead to the materialisa-
tion of the undesired event.
3. Construct the fault tree
• Replicate the causal chain identified in the previous step of the analysis, from
the basic events to the top
• Use the appropriate gates where necessary: OR, AND, etc. (see Sect. 7.2.1).
4. Evaluate the fault tree
• Evaluate the final probability of the undesired event to occur
• Analyse the impact of dealing with the causal factors. This is a “what if” stage
during which we identify the optimal positioning of the controls.
5. Control the hazards identified
• Key management actions (what controls should be put in place).
The implementation of appropriate key management actions is the end game of a proper scenario analysis. The objective is to manage and, if possible, mitigate the potential threats.
7.2.4 For the Manager

The main positive aspects of FTA are the following:
• It is a visual model representing cause/effect relationships.
• It is easy to learn, do and follow, and consequently easy to present to the senior management of the financial institution.
• It models complex system relationships in an understandable manner
– Follows paths across system boundaries
– Combines hardware, software, environment and human interaction
• As presented in section 7.1, it is a simple probability model.
• It is scientifically sound
– Boolean algebra, logic, probability, reliability
– Physics, chemistry and engineering
• Commercial software packages are available and are generally not too costly.
• Fault trees can provide value despite incomplete information.
• It is a proven technique.
However, this methodology should not be considered as
• a hazard analysis, as this approach is deductive and not inductive, targeting the root cause. This may seem obvious considering that the methodology is top down.
• a failure mode and effects analysis (FMEA), which is a bottom-up single-thread analysis.
• an un-reliability analysis. It is not an inverse success tree.
• a model of all system failures, as it only includes issues and failures relevant with respect to the analysis of the top event.
• an absolute representation of the reality either. It is only the representation of a perception of the reality.
Alternatives are actually presented in the next sections.

7.2.5 Calculations: An Example

In this subsection, the objective is to outline the calculations, i.e., to evaluate the probability of the top event occurring, assuming the probabilities of the bottom events are known. We use the fault tree presented in Fig. 7.3. Let's assume the trigger events at the bottom have the following probabilities:
• Outdated fire extinguisher: 1e-06
• Faulty fire extinguisher: 1e-06
• Battery remained unchecked: 1e-06
• Faulty smoke detector: 1e-06
• Doors left open: 1e-05
• Unapproved device plugged: 1e-05
• Approved device untested: 1e-06
• Employee smoked in the building: 7e-06
• Arsonist: 3e-06
Therefore, applying the formulas provided above, the likelihood of having:
• A fire extinguisher not functioning = P(Outdated fire extinguisher) + P(Faulty fire extinguisher) = 1e-06 + 1e-06 = 2e-06
• A smoke detector not functioning = P(Battery remained unchecked) + P(Faulty smoke detector) = 1e-06 + 1e-06 = 2e-06
• A faulty electrical device = P(Unapproved device plugged) + P(Approved device untested) = 1e-05 + 1e-06 = 11e-06
Moving to the next level, the likelihood of having:
• A fire undetected and unattacked = P(A fire extinguisher not functioning) + P(A smoke detector not functioning) = 2e-06 + 2e-06 = 4e-06
• A fire triggered = P(Arsonist) + P(Employee smoked in the building) + P(A faulty electrical device) = 3e-06 + 7e-06 + 11e-06 = 21e-06
Therefore, the likelihood of having a fire spreading and therefore having the building on fire is given by:
• Building on fire = P(A fire triggered) × P(A fire undetected and unattacked) × P(Doors left open) = 21e-06 × 4e-06 × 1e-05 = 8.4e-16
These calculations are replicated in the short sketch below.
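The sketch below is an illustration added here; the event names and probabilities are those listed above, and the OR gates use the rare-event approximation of relationship (7.1.4).

```python
# OR gates under the rare-event approximation: sum of the input probabilities.
def or_gate_approx(*probabilities):
    return sum(probabilities)

# AND gate: product of independent input probabilities.
def and_gate(*probabilities):
    result = 1.0
    for p in probabilities:
        result *= p
    return result

extinguisher_ko = or_gate_approx(1e-06, 1e-06)    # outdated + faulty extinguisher
smoke_detector_ko = or_gate_approx(1e-06, 1e-06)  # unchecked battery + faulty detector
faulty_device = or_gate_approx(1e-05, 1e-06)      # unapproved + untested device

fire_undetected = or_gate_approx(extinguisher_ko, smoke_detector_ko)  # 4e-06
fire_triggered = or_gate_approx(3e-06, 7e-06, faulty_device)          # 2.1e-05

# Top event: fire triggered AND undetected AND doors left open.
building_on_fire = and_gate(fire_triggered, fire_undetected, 1e-05)
print(building_on_fire)  # ~8.4e-16
```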

7.3 Alternatives

As discussed before, the FTA is a deductive methodology; in other words, it is a top-down method aiming at analysing the effects of initiating faults or failures and events on a top and final incident given a complex system. This differs from various alternatives that are briefly introduced in the following for consideration, such as the FMEA, which is an inductive, bottom-up analysis method aiming at analysing the effects of single component or function failures on equipment or subsystems, the dependence diagram (DD) (also known as reliability block diagram (RBD) or success tree analysis), the RCA, the why-because analysis (WBA) or Ishikawa diagrams.
7.3.1 Failure Mode and Effects Analysis

The FMEA might be used to systematically analyse postulated component failures and identify the resultant effects on system operations. The analysis might be represented by a combination of two sub-components, the first being the FMEA itself and the second the criticality analysis (Koch, 1990). All significant failure modes for each element of the system should be included for the system to be reliable. FMEA's primary benefit is the early identification of all critical system failures, which can be mitigated by modifying the design at the earliest stage; therefore, the FMEA should be done at the system level initially and later extended to lower levels.
The major benefits of FMEA are the following:
• Maximise the chance of designing a successful process.
• Assessing potential failure mechanisms, failure modes and their impacts allows ranking them according to their severity and their likelihood of occurrence. This leads to the prioritisation of the issues to be dealt with.
• Early identification of single points of failure critical to the success of a project
or a process, for instance.
• Appropriate method to evaluate controls effectiveness.
• “In-flight” issue identification and troubleshooting procedures.
• Criteria for early planning of tests.
• Easy to implement.

7.3.2 Root Cause Analysis

RCA aims at solving problems by dealing with their origin (Wilson et al., 1993; Vanden Heuvel et al., 2008; Horev, 2010; Barsalou, 2015). A root cause is defined by the fact that, if it is removed from a causal sequence, the final undesirable
event does not occur; whereas a causal factor affects an event’s outcome, but is not a
root cause as it does not prevent the undesired event from occurring. Though dealing
with a causal factor usually benefits an outcome, such as reducing the magnitude of
a potential loss, it does not prevent it. Note that several measures may effectively
deal with root causes.
RCA allows methodically identifying and correcting the root causes of events, rather than dealing with the symptoms. Dealing with root causes has as its ultimate objective the prevention of problem recurrence. However, RCA users acknowledge that the complete prevention of recurrence through a corrective action might not always be achievable. The analysis is usually done after an event has occurred; therefore, the insights from RCA make it very useful for feeding a scenario analysis process. It is indeed compatible with the other approaches presented in this book. RCA can be used to predict a failure and is a prerequisite to managing the occurrence effectively and efficiently.
The general principles and usual goal of the RCA are the following:
1. to identify the factors leading to the failure: magnitude, location, timing,
behaviours, actions, inactions or conditions.
2. to prevent recurrence of similar harmful outcomes, focusing on what has been
learnt from the process.
3. RCA must be performed systematically as part of an investigation. Root causes
identified must be properly documented.
4. The best solution to be selected is the one that is the most likely to prevent the
recurrence of a failure at the lowest cost.
5. Effective problem statements and event descriptions are a must to ensure the
appropriateness of the investigations conducted.
6. Hierarchical clustering data-mining solutions can be implemented to capture root
causes (see Chap. 3).
7. The sequence of events leading to the failures should be clearly identified,
represented and documented to support the most effective positioning of controls.
8. Transform a reactive culture into a forward-looking culture (see Chap. 1). However, the cultural changes implied by the RCA might not be welcomed readily, as they may lead to the identification of personnel's accountability. The association of the RCA with a no-blame culture might be required, as well as a strong sponsorship (see Chap. 4).
The quality of an RCA depends on the data quality as well as on the capability to use the data and to transform the outcome into management actions. One of the main issues that an RCA may suffer from is the so-called analyst bias, i.e., the selection and the interpretation of the data supporting a prior opinion. The transparency of the process should be ensured to avoid that problem. Note that RCA, like most of the factor models presented in this book, is highly data consuming (Shaqdan et al., 2014). However, the RCA is not necessarily the best approach to estimate the likelihood and the magnitudes of future impacts.

7.3.3 Why-Because Strategy

The why-because analysis has been developed to analyse accidents (Ladkin and
Loer, 1998). It is an a posteriori analysis which aims at ensuring objectivity,
verifiability and reproducibility of results. A why-because graph presents causal
relationships between factors of an accident. It is a directed acyclic graph in which
the factors are represented by nodes and relationships between factors by directed
edges.
“What?” is always the first question to ask. It is usually quite easy to define as the consequences are understood. The following steps are an iterative process to determine each and every potential cause. Once the causes of the accident have been identified, formal tests are applied to all potential cause–effect relationships.
This process can be broken down for each cause identified until the targeted level is
reached, such as the level of granularity the management can have an effect on.
Remark 7.3.1 For each node, each contributing cause must be a necessary condition to cause the accident, while all of the causes taken together must be sufficient to cause it.
In the previous paragraph, we mentioned the use of some tests to evaluate whether the potential causes are necessary or sufficient. Indeed, the counterfactual test addresses the root character of the cause, i.e., is the cause necessary for the incident to occur? Then, the causal sufficiency test deals with the combination of causes and aims at analysing whether a set of causes is sufficient for an incident to occur, and therefore helps identifying missing causes. Causes taken independently must be necessary, and all causes taken together must be sufficient.
This solution is straightforward and may support the construction of scenarios, but it might not be particularly efficient to deal with situations that have never crystallised. Good illustrations of WBAs can be found in Ladkin (2005).

7.3.4 Ishikawa’s Fishbone Diagrams

Ishikawa diagrams, created by Ishikawa (1968) for quality management purposes, are causal diagrams depicting the causes of a specific event. Ishikawa diagrams are usually used to design a product and to identify potential factors causing a bigger problem. As illustrated below, this methodology can easily be extended to operational risk or conduct risk scenario analysis, for example. Causal factors are usually sorted into general categories. These traditionally include
1. People: Anyone involved in the process.
2. Process: How the process is performed: policies, procedures, rules, regulations, laws, etc.
3. Equipment: Tools required to achieve a task.
4. Materials: Raw materials used to produce the final product (in our case these
would be risk catalysts).
5. Management and measurements: Data used to evaluate the exposure.
6. Environment: The conditions to be met so the incident may happen.
Remark 7.3.2 Ishikawa’s diagram is also known as a fishbone diagram because of
its shape, similar to the side view of a fish skeleton.
Cause-and-effect diagrams are useful to analyse relationships between multiple factors, and the analysis of the possible causes provides additional information regarding the process's behaviour. As in Chap. 4, potential causes can be defined in workshops. These groups can then be labelled as categories of the fishbone; in our case, we used the traditional ones to illustrate what the analysis of a fire exposure would look like (Fig. 7.4).
Fig. 7.4 Ishikawa diagram illustration: a fishbone diagram for the “Building on Fire” event, with causes grouped under Equipment (fire extinguisher, smoke detector, fire doors), Process (people smoking, doors left open, plugging electric devices, security), People (arsonist, employee, fire response unit), Materials (carpet, papers, devices' plastic), Environment (combustants, humidity, temperature) and Measurement & Management (calibration, triggers, scenarios)

7.3.5 Fuzzy Logic

In this section, we present a methodology that has been widely used at the early stages of scenario analysis for risk management: fuzzy logic. In fuzzy logic, the value representing the “truth” of a variable is a real number lying between 0 and 1, contrary to Boolean logic in which the “truth” can only be represented by 0 or 1. The objective is to capture the fact that the “truth” is a conceptual objective and can only be partially reached, and therefore the outcome of an analysis may range between completely true and completely false (Zadeh, 1965; Biacino and Gerla, 2002; Arabacioglu, 2010).
Classical logic does not permit capturing situations in which answers may vary, in particular when we are dealing with people's perceptions, and only a spectrum of answers may lead to a consensual “truth”, which should converge to the “truth”. This approach makes a lot of sense when we only have partial information at our disposal. Most people instinctively apply “fuzzy” estimates in daily situations, based upon previous experience, to determine how to park their car in a very narrow space, for example.
Fuzzy logic systems can be very powerful when input values are not available
or are not trustworthy, and can be used and adapted in a workshop such as those
described in Chap. 4, as this method aims for a consensus.
Cipiloglu Yildiz (2008) provides the following algorithm to implement fuzzy logic (a minimal illustration in code follows the list):
1. Define the linguistic variables, i.e., variables that represent some characteristics of an element (color, temperature, etc.). These variables take words as values.
2. Build the membership functions, which represent the degree of truth.
3. Design the rulebase, i.e., the set of rules, such as IF-THEN rules.
4. Convert input data into fuzzy values using the membership functions.
5. Evaluate the rules in the rulebase.
6. Combine the results of each rule evaluated in the previous step.
7. Convert back the output data into non-fuzzy values so these can be used for further processing or management in our case.
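The toy sketch below is an illustration added here, not taken from Cipiloglu Yildiz (2008): the linguistic variable (“control quality”), the membership functions and the pair of rules are hypothetical choices, and the defuzzification uses a simple weighted average.

```python
# Step 1: a linguistic variable "control quality" taking the values "weak" and "strong".
# Step 2: overlapping triangular membership functions on a 0-10 scale (hypothetical).
def weak(x):
    return max(0.0, min(1.0, (6.0 - x) / 4.0))

def strong(x):
    return max(0.0, min(1.0, (x - 4.0) / 4.0))

# Step 3: a hypothetical rulebase: IF control is weak THEN risk is high (score 8),
#                                  IF control is strong THEN risk is low (score 2).
RULES = [(weak, 8.0), (strong, 2.0)]

def assess_risk(control_score):
    # Steps 4-5: fuzzify the crisp input and evaluate each rule.
    activations = [(membership(control_score), output) for membership, output in RULES]
    # Steps 6-7: combine the rules and defuzzify with a weighted average.
    total_weight = sum(weight for weight, _ in activations)
    return sum(weight * output for weight, output in activations) / total_weight

print(assess_risk(2.0))  # weak control   -> risk score 8.0
print(assess_risk(5.0))  # in-between     -> risk score 5.0
print(assess_risk(9.0))  # strong control -> risk score 2.0
```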

References

Andrews, J. D., & Moss, T. R. (1993). Reliability and risk assessment. London: Longman Scientific
and Technical.
Arabacioglu, B. C. (2010). Using fuzzy inference system for architectural space analysis. Applied
Soft Computing, 10(3), 926–937.
Barlow, R. E., Fussell, J. B., & Singpurwalla, N. D. (1975). Reliability and fault tree analysis,
conference on reliability and fault tree analysis. UC Berkeley: SIAM Pub.
Barsalou, M. A. (2015). Root cause analysis: A step-by-step guide to using the right tool at the
right time. Boca Raton: CRC Press/Taylor and Francis.
Benner, L. (1975). Accident theory and accident investigation. In Proceedings of the Society of Air
Safety Investigators Annual Seminar.
Biacino, L., & Gerla, G. (2002). Fuzzy logic, continuity and effectiveness. Archive for Mathemat-
ical Logic, 41(7), 643–667.
Cipiloglu Yildiz, Z. (2008). A short fuzzy logic tutorial. http://cs.bilkent.edu.tr/~zeynep.
DeLong, T. (1970). A fault tree manual (Master's thesis). Texas A&M University.
Ericson, C. (1999a). Fault tree analysis - a history. In Proceedings of the 17th International Systems
Safety Conference.
Ericson, C. A., (Ed.) (1999b). Fault tree analysis. www.thecourse-pm.com.
FAA. (1998). Safety risk management. In ASY-300, Federal Aviation Administration.
Givant, S. R., & Halmos, P. R. (2009). Introduction to Boolean algebras. Berlin: Springer.
Horev, M. (2010). Root cause analysis in process-based industries. Bloomington: Trafford
Publishing.
Ishikawa, K. (1968). Guide to quality control. Tokyo: Asian Productivity Organization.
Koch, J. E. (1990). Jet propulsion laboratory reliability analysis handbook. In Project Reliability
Group, Jet Propulsion Laboratory, Pasadena, California JPL-D-5703.
Lacey, P. (2011). An application of fault tree analysis to the identification and management of
risks in government funded human service delivery. In Proceedings of the 2nd International
Conference on Public Policy and Social Sciences.
Ladkin, P. (2005). The Glenbrook why-because graphs, causal graphs, and accimap. Working
paper, Faculty of Technology, University of Bielefeld, Germany.
Ladkin, P., & Loer, K. (1998). Analysing aviation accidents using WB-analysis - an application of
multimodal reasoning. (AAAI Technical Report) SS-98-0 (pp. 169–174)
Larsen, W. (1974). Fault tree analysis. Picatinny Arsenal (Technical Report No. 4556).
Martensen, A. L., & Butler, R. W. (1975). The fault-tree compiler. In Langley Research Center,
NTRS.
Parkes, A. (2002). Introduction to languages, machines and logic: Computable languages, abstract
machines and formal logic. Berlin: Springer.
Roland, H. E., & Moriarty, B. (Eds.), (1990). System safety engineering and management. New
York: Wiley.
Shaqdan, K., et al. (2014). Root-cause analysis and health failure mode and effect analysis: Two
leading techniques in health care quality assessment. Journal of the American College of
Radiology, 11(6), 572–579.
Vanden Heuvel, L. N., Lorenzo, D. K., & Hanson, W. E. (2008). Root cause analysis handbook: A
guide to efficient and effective incident management (3rd ed.). New York: Rothstein Publishing.
Vesely, W. (2002). Fault tree handbook with aerospace applications. In National Aeronautics and
Space Administration.
Vesely, W. E., Goldberg, F. F., Roberts, N. H., & Haasl, D. F. (1981). Fault tree handbook (No.
NUREG-0492). Washington, DC: Nuclear Regulatory Commission.
Wilson, P. F., Dell, L. D., Anderson, G. F. (1993). Root cause analysis: A tool for total quality
management (Vol. SS-98-0, pp. 8–17). Milwaukee: ASQ Quality Press.
Woods, D. D., Hollnagel, D. D., & Leveson, N. (Eds.). (2006). Resilience engineering: Concepts
and precepts (New Ed ed.). New York: CRC Press.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Chapter 8
Bayesian Networks

8.1 Introduction

This chapter introduces Bayesian belief and decision networks (Koski and Noble,
2009) as quantitative tools for risk measurement and management. Bayesian
networks are a powerful statistical tool which can be applied to risk management
in financial institutions at various stages (Pourret et al., 2008). As stated in the third
chapter, this methodology belongs to the field of data science and can be applied to
various situations beyond scenario analysis.
To effectively and efficiently manage risks, influencing factors, from triggers to catalysts, must be clearly identified. Once the key drivers have been identified, the second stage regards the controls in place to mitigate these risks and, ideally, to reduce the exposures. But before initiating these tasks, and assuming that the risk appetite of the company has been taken into account, three main components need to be analysed: the control effectiveness, the potential negative impact of the controls on associated risks and the cost of these controls (Alexander, 2003):
1. Effectiveness: Bayesian network factor modelling may help in understanding the impact of a factor (control, risk or trigger) on the overall exposure. Bayesian networks are designed to deal with such situations.
2. Dependency: It is possible that the reduction of one risk increases the risks in another area or risks of a different kind. Bayesian networks provide practitioners with a solution to analyse that possibility. This aspect is particularly important for practitioners as, most of the time, dealing with risk implies various trade-offs and usually requires compromise.
3. Cost: Would the controls reduce the risk significantly enough to at least cover the investment? This question is fully related to the question of the firm's risk appetite. Do we want to accept the risk, or are we willing to offset it?

Addressing now the core topic of this chapter, we can start with the defi-
nition of a Bayesian network. A Bayesian network is a probabilistic graphical
model representing random variables and their conditional dependencies (hence the
Bayesian terminology) via a directed acyclic graph (DAG). Formally, the nodes
represent random variables in the Bayesian sense, i.e., these may be observable
quantities, latent variables, unknown parameters, hypotheses, etc. Arcs or edges
represent conditional dependencies; nodes that are not connected represent variables
that are conditionally independent from each other. Each node is associated with
a probability function which takes a particular set of values from the node’s
parent variables, and returns the probability of the variable represented by the
node. Figure 8.1 illustrates a simple Bayesian network presenting how three initial
conditionally independent variables may lead to an issue.
The node where the arc originates is called the parent, while the node where the arc ends is called the child. In our example (Fig. 8.1), A is a parent of C, and C is a child of A. Nodes that can be reached from other nodes are called descendants. Nodes that lead a path to a specific node are called ancestors. Here, C and E are descendants of B, and B and C are ancestors of E. Note that a node cannot be its own ancestor or descendant. Bayesian networks will generally include tables providing the probabilities for the true/false values of the variables. The main point of Bayesian networks is to allow for probabilistic inference (Pearl, 2000) to be performed. This means that the probability of each value of a node in the Bayesian network can be computed when the values of the other variables are known. Also, because independence among the variables is easy to recognise, since conditional relationships are clearly defined by graphic edges, not all joint probabilities in the Bayesian system need to be calculated in order to make a decision.
Fig. 8.1 Illustration: a simple directed acyclic graph. This graph contains six nodes, A to F. C depends on A and B, F depends on D, and E depends on C and F and, through these nodes, on A, B and D

Fig. 8.2 This figure represents a Bayesian network allowing the analysis of the exposure to a financial loss due to a business disruption caused by two potential root causes: an IT failure and/or a cyber attack. The conditional probabilities required to move from one node to the next are also provided

In order to present Bayesian networks practically, we will rely on a simple example related to IT failures, as depicted in Fig. 8.2. Assume that two events in the IT department could lead to a business disruption and a subsequent financial loss:

either the entity endures an IT failure or it suffers a cyber attack. Also, it is possible to assume that the cyber attack may impact the IT system too (e.g. this one is disrupted). A Bayesian network can then model the situation, as represented in Fig. 8.2. We assume that the variables have only two possible outcomes, True or False. The joint probability function is given as follows:

P(L, F, C) = P(L | F, C) P(F | C) P(C),    (8.1.1)

where L represents the business disruption and the financial loss, F represents the
IT failure and C the cyber attack. The model should then be able to answer the
question “What is the probability of suffering a business disruption given that we had a cyber attack?” by using the conditional probability formula and summing over all nuisance variables:

P(C = T | L = T) = P(L = T, C = T) / P(L = T) = Σ_{F ∈ {T,F}} P(L = T, F, C = T) / Σ_{F,C ∈ {T,F}} P(L = T, F, C).    (8.1.2)

Using the expansion of the joint probability function P(L, F, C) and the conditional probabilities as presented in the diagram, we can compute any combination. For example,

P(L = T, F = T, C = T) = P(L = T | F = T, C = T) P(F = T | C = T) P(C = T),

which leads to 0.9 × 0.7 × 0.3 = 0.189. Or,

P(L = T, F = T, C = F) = P(L = T | F = T, C = F) P(F = T | C = F) P(C = F),

which leads to 0.7 × 0.2 × 0.7 = 0.098. Then the numerical result is

P(C = T | L = T) = (0.189 + 0.027) / (0.189 + 0.098 + 0.027 + 0.0) ≈ 68.78%,

where the four terms in the denominator correspond to the TTT, TTF, TFT and TFF combinations of (L, F, C).
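A minimal sketch of this inference in code is given below (an addition for illustration). The conditional probabilities are those of the worked example; the two values not spelled out above, P(L = T | F = F, C = T) = 0.3 and P(L = T | F = F, C = F) = 0.0, are inferred from the 0.027 and 0.0 terms and should be read as assumptions consistent with Fig. 8.2.

```python
from itertools import product

# Conditional probability tables (True-probabilities only), as in the example.
P_C = {True: 0.3, False: 0.7}                                  # P(C)
P_F_given_C = {True: 0.7, False: 0.2}                          # P(F=T | C)
P_L_given_FC = {(True, True): 0.9, (True, False): 0.7,         # P(L=T | F, C)
                (False, True): 0.3, (False, False): 0.0}

def joint(l, f, c):
    """P(L, F, C) = P(L | F, C) P(F | C) P(C), relationship (8.1.1)."""
    p_c = P_C[c]
    p_f = P_F_given_C[c] if f else 1.0 - P_F_given_C[c]
    p_l_true = P_L_given_FC[(f, c)]
    p_l = p_l_true if l else 1.0 - p_l_true
    return p_l * p_f * p_c

# P(C=T | L=T): sum the joint over the nuisance variable F, relationship (8.1.2).
numerator = sum(joint(True, f, True) for f in (True, False))
denominator = sum(joint(True, f, c) for f, c in product((True, False), repeat=2))
print(numerator / denominator)  # ~0.6879, i.e. roughly 68.8%
```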

8.2 Theory

In this section, we will address the Bayesian network from a theoretical point of
view, not only focusing on our problem, i.e., scenario analysis, but also discussing its
use beyond scenario analysis, or in other words, its use for automated and integrated
risk management.
The first point to introduce is the concept of joint probability, i.e., the probability that a series of events will happen sequentially or simultaneously. The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function in the continuous case or a joint probability mass function in the discrete case.1 These in turn can be used to find two other types of distributions: the marginal distributions, giving the probabilities for any of the variables, and the conditional probability distributions for the remaining variables.
The joint probability mass function of two discrete random variables X, Y is given by

P(X = x and Y = y) = P(Y = y | X = x) P(X = x) = P(X = x | Y = y) P(Y = y),    (8.2.1)

1 Chapter 11 provides alternative solutions to build joint probability functions.
where P(Y = y | X = x) is the probability of Y = y given that X = x. The generalisation to n discrete random variables X₁, X₂, …, Xₙ is

P(X₁ = x₁, …, Xₙ = xₙ) = P(X₁ = x₁) P(X₂ = x₂ | X₁ = x₁) P(X₃ = x₃ | X₁ = x₁, X₂ = x₂) ⋯ P(Xₙ = xₙ | X₁ = x₁, X₂ = x₂, …, Xₙ₋₁ = xₙ₋₁).

In parallel, the joint probability density function f_{X,Y}(x, y) for continuous random variables is

f_{X,Y}(x, y) = f_{Y|X}(y | x) f_X(x) = f_{X|Y}(x | y) f_Y(y),    (8.2.2)

where f_{Y|X}(y | x) and f_{X|Y}(x | y) give the conditional distributions of Y given X = x and of X given Y = y, respectively, and f_X(x) and f_Y(y) give the marginal distributions for X and Y, respectively.
In the case of a Bayesian network, the joint probability of the multiple variables can be obtained from the product of the individual probabilities of the nodes:

P(X₁, …, Xₙ) = ∏_{i=1}^{n} P(Xᵢ | parents(Xᵢ)).    (8.2.3)
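As a short illustration (an addition to the text; it simply applies relationship (8.2.3) to the graph described in Fig. 8.1, in which A, B and D have no parents, C has parents A and B, F has parent D and E has parents C and F), the factorisation reads

P(A, B, C, D, E, F) = P(A) P(B) P(D) P(C | A, B) P(F | D) P(E | C, F),

which makes explicit that the three root nodes A, B and D enter the product without any conditioning.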

The second requirement to understand how the network functions is understanding Bayes' theorem (Bayes and Price, 1763), expressed as:

P(A | I, S) = [P(A | S) P(I | A, S)] / P(I | S),    (8.2.4)

where our belief in assumption A can be refined given the additional information available I as well as the secondary inputs S. P(A | I, S) is the posterior probability, i.e., the probability of A being true considering the initial information available as well as the added information. P(A | S) is the prior probability, or the probability of A being true given S. P(I | A, S) is the likelihood component and gives the probability of the evidence assuming that both A and S are true. Finally, the last term P(I | S) is called the expectedness, or how expected the evidence is given only S. It is independent of A, therefore it is usually considered as a scaling factor, and may be rewritten as

P(I | S) = Σᵢ P(I | Aᵢ, S) P(Aᵢ | S),    (8.2.5)

where i denotes the index of a particular assumption Aᵢ, and the summation is taken over a set of hypotheses which are mutually exclusive and exhaustive. It is important to note that all these probabilities are conditional. They specify the degree of
belief in propositions assuming that some other propositions are true. Consequently, without prior determination of the probability of the previous propositions, the approach cannot function.
Going one step further, we can now briefly present the statistical inference. Given some data x and a parameter θ, a simple Bayesian analysis starts with a prior probability p(θ) and a likelihood p(x | θ) to compute a posterior probability p(θ | x) ∝ p(x | θ) p(θ) (Shevchenko, 2011).
Usually the prior distributions depend on other parameters φ (not mentioned in the likelihood), referred to as hyperparameters. So, the prior p(θ) must be replaced by a likelihood p(θ | φ), and a prior p(φ) on the newly introduced parameters φ is required, resulting in a posterior probability

p(θ, φ | x) ∝ p(x | θ) p(θ | φ) p(φ).    (8.2.6)

The process may be repeated multiple times if necessary; for example, the parameters φ may depend in turn on additional parameters ψ, which will require their own prior. Eventually the process must terminate, with priors that do not depend on any other unmentioned parameters.2
For example, suppose we have measured the quantities x₁, …, xₙ, each with normally distributed errors of known standard deviation σ,

xᵢ ~ N(θᵢ, σ²).    (8.2.7)

Suppose we are interested in estimating the θᵢ. An approach would be to estimate the θᵢ using a maximum likelihood approach; since the observations are independent, the likelihood factorises and the maximum likelihood estimate is simply

θᵢ = xᵢ.    (8.2.8)

However, if the quantities are not independent, a model combining the θᵢ is required, such as

xᵢ ~ N(θᵢ, σ²),    (8.2.9)
θᵢ ~ N(φ, τ²),    (8.2.10)

with improper priors: φ flat and τ ∈ (0, ∞). When n ≥ 3, this is an identified model (i.e. there exists a unique solution for the model's parameters), and the posterior distributions of the individual θᵢ will tend to converge towards their common mean.3

2 The symbol ∝ means proportional to and, to draw a parallel with the previous paragraph related to Bayes' theorem, we see that the scaling factor does not have any impact on the search for the appropriate values of the parameters.
3 This shrinkage is a typical behaviour in hierarchical Bayes' models (Wang-Shu, 1994).
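As a complement (added for illustration; it is an empirical-Bayes simplification of the model above in which both σ and τ are assumed known rather than given improper priors, and the common mean is estimated by the sample mean), the sketch below shows how the posterior means of the θᵢ shrink towards the common mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed quantities x_i with known error standard deviation sigma,
# generated from the hierarchical model (8.2.9)-(8.2.10) with known tau.
sigma, tau, phi_true, n = 2.0, 1.0, 10.0, 5
theta_true = rng.normal(phi_true, tau, size=n)
x = rng.normal(theta_true, sigma)

# Empirical-Bayes simplification: estimate the common mean by the sample mean,
# then combine each observation with it using precision weighting.
phi_hat = x.mean()
weight = (1 / sigma**2) / (1 / sigma**2 + 1 / tau**2)
theta_posterior_mean = weight * x + (1 - weight) * phi_hat

print(np.round(x, 2))                     # raw maximum likelihood estimates (8.2.8)
print(np.round(theta_posterior_mean, 2))  # shrunk towards the common mean phi_hat
```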
8.2.1 A Practical Focus on the Gaussian Case

In order to specify the Bayesian network and therefore represent the joint probability distribution, the probability distribution for X conditional upon X's parents has to be specified for each node X. These distributions may take any form, though it is common to work with discrete or Gaussian distributions since these simplify the calculations.
In the following we develop the Gaussian case because of the so-called conjugate property. Indeed, if the posterior distributions p(θ | x) are in the same family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function. The Gaussian distribution is conjugate to itself with respect to its likelihood function. Consequently, the conjugate prior of the mean vector is another multivariate normal distribution, and the conjugate prior of the covariance matrix is an inverse-Wishart distribution W⁻¹ (Haff, 1979). Suppose then that n observations have been gathered,

X = {x₁, …, xₙ} ~ N(μ, Σ)    (8.2.11)

and that a conjugate prior has been assigned, where

p(μ, Σ) = p(μ | Σ) p(Σ),    (8.2.12)

where

p(μ | Σ) ~ N(μ₀, m⁻¹ Σ),    (8.2.13)

and

p(Σ) ~ W⁻¹(Ψ, n₀).    (8.2.14)

Then,

p(μ | Σ, X) ~ N((n x̄ + m μ₀)/(n + m), (1/(n + m)) Σ),
p(Σ | X) ~ W⁻¹(Ψ + n S + (n m)/(n + m) (x̄ − μ₀)(x̄ − μ₀)′, n + n₀),    (8.2.15)

where

x̄ = n⁻¹ ∑_{i=1}^{n} xᵢ,
S = n⁻¹ ∑_{i=1}^{n} (xᵢ − x̄)(xᵢ − x̄)′.    (8.2.16)
If the N-dimensional vector x is partitioned as follows:

x = [x₁; x₂], where x₁ is q × 1 and x₂ is (N − q) × 1,    (8.2.17)

and accordingly μ and Σ are partitioned as follows:

μ = [μ₁; μ₂], where μ₁ is q × 1 and μ₂ is (N − q) × 1,    (8.2.18)

Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂], where Σ₁₁ is q × q, Σ₁₂ is q × (N − q), Σ₂₁ is (N − q) × q and Σ₂₂ is (N − q) × (N − q),    (8.2.19)

then the distribution of x₁ conditional on x₂ = a is multivariate normal, (x₁ | x₂ = a) ~ N(μ̄, Σ̄), where

μ̄ = μ₁ + Σ₁₂ Σ₂₂⁻¹ (a − μ₂)    (8.2.20)

and the covariance matrix is

Σ̄ = Σ₁₁ − Σ₁₂ Σ₂₂⁻¹ Σ₂₁.    (8.2.21)

This matrix is the Schur complement (Zhang, 2005) of Σ₂₂ in Σ. This means that to compute the conditional covariance matrix, the overall covariance matrix is inverted, the rows and columns corresponding to the variables being conditioned upon are dropped, and the result is inverted back to obtain the conditional covariance matrix. Here Σ₂₂⁻¹ is the generalised inverse of Σ₂₂.
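A compact numerical sketch of relationships (8.2.20) and (8.2.21) is given below; it is an addition for illustration only, and the mean vector, covariance matrix and conditioning value a are arbitrary assumptions.

```python
import numpy as np

# Hypothetical 3-dimensional Gaussian: condition the first component (q = 1)
# on the last two components being equal to a.
mu = np.array([0.0, 1.0, 2.0])
sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
q = 1
a = np.array([1.5, 1.0])

mu1, mu2 = mu[:q], mu[q:]
s11, s12 = sigma[:q, :q], sigma[:q, q:]
s21, s22 = sigma[q:, :q], sigma[q:, q:]

s22_inv = np.linalg.pinv(s22)              # generalised inverse of Sigma_22
mu_bar = mu1 + s12 @ s22_inv @ (a - mu2)   # conditional mean, relationship (8.2.20)
sigma_bar = s11 - s12 @ s22_inv @ s21      # Schur complement, relationship (8.2.21)

print(mu_bar, sigma_bar)
```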

8.2.2 Moving Towards an Integrated System: Learning

In the simplest case, a Bayesian network is specified by an expert and is then used
to perform inference, as briefly introduced in the first section. In more complicated
situations, the network structure and the parameters of the local distributions must
be learned from the data.
As discussed in Chap. 4, Bayesian networks are part of the machine learning field of research. Originally developed by Rebane and Pearl (1987), the automated learning relies on the distinction between the three possible types of adjacent triplets allowed in a DAG:
• Type 1: X → Y → Z
• Type 2: X ← Y → Z
• Type 3: X → Y ← Z
In types 1 and 2, X and Z are independent given Y; therefore, these two triplets are indistinguishable. On the other hand, type 3 can be uniquely identified, as X and Z are marginally independent and all other pairs are dependent. Thus, while the undirected representations of these three triplets are identical, the direction of the arrows defines the causal relationship and is therefore of particular importance. Algorithms have been developed to determine the structure of the graph in a first step and to orient the arrows according to the conditional independencies observed in a second step (Verma and Pearl 1991; Spirtes and Glymour 1991; Spirtes et al. 1993; Pearl 2000).
Alternatively, it is possible to use structural learning methods, which require a scoring function and a search strategy, such as a Markov Chain Monte Carlo (MCMC) one to avoid being trapped in local minima. Another method consists in focusing on the sub-class of models for which the MLE has a closed form, supporting the discovery of a consistent structure for hundreds of variables (Petitjean et al., 2013). Nodes and edges can be added using rule-based machine learning techniques, inductive logic programming or statistical relational learning approaches (Nassif et al., 2012, 2013).
Often the conditional distributions require a parameter estimation, using, for example, a maximum likelihood approach (see Chap. 5), though any maximisation problem (likelihood or posterior probability) might be complex if some variables are unobserved. To solve this problem, the implementation of the expectation–maximisation algorithm, which iteratively alternates between evaluating expected values of the unobserved variables conditional on observed data and maximising the complete likelihood (or posterior) assuming that previously computed expected values are correct, is particularly helpful. Alternatively, it is possible to estimate the parameters by treating them as additional unobserved variables and to compute a full posterior distribution over all nodes conditional upon observed data, but this usually leads to large dimensional models, which are complicated to implement in practice.
Bayesian networks are complete models capturing relationships between vari-
ables and can be used to evaluate probabilities at various stages of the causal chain.
Computing the posterior distribution of variables considering the information gath-
ered about them is referred to as probabilistic inference. To summarise, a Bayesian
network allows automatically applying Bayes’ theorem to complex problems.
The most common exact inference methods are: (1) variable elimination, which eliminates, either by integration or by summation, the non-observed non-query variables one by one by distributing the sum over the product; (2) clique tree propagation (Zhang and Yan 1997), which stores the computation in the computer's memory so that multiple variables can be queried simultaneously and new evidence propagated quickly; and (3) recursive conditioning, which allows for a space–time trade-off and matches the efficiency of variable elimination when enough space is used (Darwiche 2001). All of these methods see their complexity grow with the network's tree width.
The most common approximate inference algorithms are importance sampling,
stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation,
generalised belief propagation and variational methods (MacKay, 2003; Hassani


and Renaudin, 2013).

8.3 For the Managers

In this section, we discuss the added value of Bayesian networks for risk practitioners. As these are models, the possibilities are almost unlimited as long as the information and the strategies used to feed the nodes are both accurate and appropriate. Indeed, the number of nodes leading to an outcome can be as large as practitioners would like, though it will require more research to feed the probabilities required for each node.
The network in Fig. 8.3 shows how, starting from a weak IT system, we may analyse the likelihood of putting customers' data at risk and therefore getting a regulatory fine, of losing customers due to the reputational impact, of suffering an opportunistic rogue trading incident, up to a systemic incident. In that example, we can see a macro contagion somewhat mimicking the domino effect observed after the Societe Generale rogue trading incident in 2008. Note that each node can be analysed and/or informed by either discrete or continuous distributions. It is also interesting to note how the two illustrations in this chapter start from similar underlying issues though aim at analysing different scenarios (i.e., comparing Figs. 8.2 and 8.3).
The key to the use of this network is the evaluation of the probabilities and
conditional probabilities at each node. Note once again that this kind of method-
ology is highly data consuming, as to be reliable we need evidence and information

[Fig. 8.3 depicts a Bayesian network with nodes Weak IT Systems; Customers Data At Risk; Rogue Trading; Regulatory Fine; Impact on Reputation; Market Confidence Impact; Loss of Customers; Financial Loss; Liquidity Issue; Credit Crunch; Real Economy not funded; and Financial Loss due to defaults.]

Fig. 8.3 In this figure, we illustrate the possibility of analysing the cascading outcomes resulting from a weak IT system, i.e. the likelihood of putting customers' data at risk and therefore getting a regulatory fine, of losing customers due to the reputational impact, and in parallel analysing the probability of suffering an opportunistic rogue trading incident, implementing a Bayesian network

supporting these probabilities; otherwise, it would be highly judgemental and therefore likely to be unreliable. Besides, to be used for risk assessment, the right questions need to be asked: are we interested in the potential loss amount or in the probability of failure? In other words, what is our target?
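To make this concrete, the sketch below works through a small branch of Fig. 8.3 in Python, using purely hypothetical probabilities (the node names follow the figure, but none of the values comes from real data): the middle node is summed out to obtain the likelihood of a regulatory fine given weak IT systems, and Bayes' theorem is then applied in the diagnostic direction.

# Minimal sketch of probabilistic inference on a small causal chain,
# Weak IT Systems -> Customers Data At Risk -> Regulatory Fine,
# using illustrative (hypothetical) probabilities only.

p_weak_it = 0.30                                   # P(Weak IT = yes)
p_data_at_risk = {True: 0.60, False: 0.05}         # P(Data at risk = yes | Weak IT)
p_fine = {True: 0.40, False: 0.02}                 # P(Regulatory fine = yes | Data at risk)

def p_fine_given_weak_it(weak_it: bool) -> float:
    """P(Regulatory fine = yes | Weak IT), summing out the middle node."""
    return sum(
        (p_data_at_risk[weak_it] if risk else 1.0 - p_data_at_risk[weak_it]) * p_fine[risk]
        for risk in (True, False)
    )

# Posterior of the root node given the observed outcome, via Bayes' theorem.
p_fine_yes = (p_weak_it * p_fine_given_weak_it(True)
              + (1 - p_weak_it) * p_fine_given_weak_it(False))
p_weak_it_given_fine = p_weak_it * p_fine_given_weak_it(True) / p_fine_yes

print(p_fine_given_weak_it(True))   # scenario reading: likelihood of a fine given weak IT systems
print(p_weak_it_given_fine)         # diagnostic reading of the same network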
One advantage of Bayesian networks is that it is intuitively easier for a manager
to build, to explain and to understand direct dependencies and local distributions
than complete joint distributions, and to defend it in front of senior managers. In the
following, the pros and cons of the methodology are detailed.
The advantages of Bayesian networks are as follows:
• Bayesian networks represent all the relationships between variables in a system
with connecting arcs. It is quite simple for a professional to build his own causal
network representative of the risk he is trying to model, from the triggers to the
contagion nodes up to the outcome in case of the materialisation of the risk, e.g.,
loss, system failure, time of recovery, etc.
• It is simple to identify dependent and independent nodes. This would help, for
example, determining where some more controls should be put in place and
prioritise the tasks.
• Bayesian networks function even if the data sets are incomplete, as the model takes into account dependencies between all variables. This makes them faster to implement and allows practitioners to use multiple sources of information to inform the nodes.
• Bayesian networks can map scenarios where it is not feasible or practical to measure all variables due to system constraints, especially in situations where they are integrated in a machine learning environment and the mapping is identified automatically.
• They can help bring order out of chaos in complicated models (i.e. models containing many variables).
• They can be used for any system model, from fully known parameters to unknown parameters.
However, and from a more scenario centric perspective, the limitations of
Bayesian networks are as follows (Holmes, 2008):
• All branches must be calculated in order to compute the probability of any one branch. That might be highly complicated, and a node that is not properly informed may lead to unreliable global outcomes and therefore potentially mislead the management.
• The second problem regards the quality and the extent of the prior beliefs used in
Bayesian inference processing. A Bayesian network is only as useful as this prior
knowledge is reliable. Either an excessively optimistic or pessimistic expectation
of the quality of these prior beliefs will distort the entire network and invalidate
the results. Related to this concern is the selection of the statistical distribution used in modelling the data. Selecting the proper distribution model to describe the data has a notable effect on the quality of the resulting network.
• It is difficult, computationally speaking, to explore a previously unknown
network. To calculate the probability of any branch of the network, all branches must be calculated. While the resulting ability to describe the network can be
performed in linear time, this process of network discovery is a hard task which
might either be too costly to perform, or impossible given the number and
combination of variables.
• Calculations and probabilities using Bayes' rule and marginalisation can become complex; therefore, calculations should be undertaken carefully.
• System’s users might be keen to violate the distribution of probabilities upon
which the system is built.

References

Alexander, C. (2003). Managing operational risks with Bayesian networks. Operational Risk:
Regulation, Analysis and Management, 1, 285–294.
Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. By
the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F.
R. S. Philosophical Transactions of the Royal Society of London, 53, 370–418.
Darwiche, A. (2001). Recursive conditioning. Artificial Intelligence, 126(1–2), 5–41.
Haff, L. R. (1979). An identity for the Wishart distribution with applications. Journal of
Multivariate Analysis, 9(4), 531–544.
Hassani, B. K., & Renaudin, A. (2013). The cascade Bayesian approach for a controlled
integration of internal data, external data and scenarios. Working Paper, Université Paris 1.
ISSN:1955-611X [halshs-00795046 - version 1].
Holmes, D. E. (Ed.). (2008). Innovations in Bayesian networks: Theory and applications. Berlin:
Springer.
Koski, T., & Noble, J. (2009). Bayesian networks: An introduction (1st ed.). London: Wiley.
MacKay, D. (2003). Information theory, inference, and learning algorithms. Cambridge: Cam-
bridge University Press.
Nassif, H., Wu, Y., Page, D., & Burnside, E. (2012). Logical differential prediction Bayes net,
improving breast cancer diagnosis for older women. In AMIA Annual Symposium Proceedings
(Vol. 2012, p. 1330). American Medical Informatics Association.
Nassif, H., Kuusisto, F., Burnside, E. S., Page, D., Shavlik, J., & Costa, V. S. (2013). Score
as you lift (SAYL): A statistical relational learning approach to uplift modeling. In Joint
European conference on machine learning and knowledge discovery in databases, September
2013 (pp. 595–611). Berlin/Heidelberg: Springer.
Pearl, J. (Ed.). (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge
University Press.
Petitjean, F., Webb, G. I., & Nicholson, A. E. (2013). Scaling log-linear analysis to high
dimensional data. In International Conference on Data Mining, Dallas, TX (pp. 597–606).
Pourret, O., Naim, P., & Marcot, B. (Eds.). (2008). Bayesian networks: A practical guide to
applications (1st ed.). London: Wiley.
Rebane, G., & Pearl, P. (1987). The recovery of causal poly-trees from statistical data. In
Proceedings 3rd Workshop on Uncertainty in AI, Seattle, WA.
Shevchenko, P. V. (2011). Modelling operational risk using Bayesian inference. Berlin: Springer.
Spirtes, P., & Glymour, C. N. (1991). An algorithm for fast recovery of sparse causal graphs. Social
Science Computer Review, 9(1), 62–72.
Spirtes, P., Glymour, C. N., & Scheines, R. (1993). Causation, prediction, and search. New York:
Springer.
Verma, T., & Pearl, J. (1991). Equivalence and synthesis of causal models. In P. Bonissone
et al. (Eds.), UAI 90 Proceedings of the Sixth Annual Conference on Uncertainty in Artificial
Intelligence. Amsterdam: Elsevier.
Wang-Shu, L. (1994). Approximate Bayesian shrinkage estimation. Annals of the Institute of
Statistical Mathematics, 46(3), 497–507.
Zhang, F. (2005). The Schur complement and its applications. New York: Springer.
Zhang, N. L., & Yan, L. (1997). Independence of causal influence and clique tree propagation. In
Proceedings of the thirteenth conference on uncertainty in artificial intelligence, August 1997
(pp. 481–488). Los Altos: Morgan Kaufmann Publishers Inc.
Chapter 9
Artificial Neural Network to Serve Scenario
Analysis Purposes

Artificial neural networks (ANN), though originally inspired by the way brains function, are nowadays largely formulated in terms of statistics and signal processing, but the philosophy remains the same. Consequently, and as briefly
introduced in the third chapter, artificial neural networks are a family of statistical
learning models.
An artificial neural network is an interconnected group of nodes (“neurons”)
mimicking neural connections in a brain, though it is not clear to what degree
artificial neural networks mirror brain functions. As represented in Fig. 9.1 a circular
node characterises an artificial neuron and an arrow depicts the fact that the output
of one neuron is the input of the next. They are used to estimate or approximate
functions that can depend on a large number of inputs. The connections have
weights that can be modified, fine tuned or adapted according to experience or new
situations: this is the learning scheme.
To summarise the process, neurons are activated when they receive a signal, i.e.,
a set of information. After being weighted and transformed, the activated neurons
pass the modified information, message or signal onto other neurons. This process is
reiterated until an output neuron is triggered, which determines the outcome of the
process. Neural networks (Davalo and Naim 1991) have been used to solve multiple
tasks that cannot be adequately addressed using ordinary rule-based programming,
such as handwriting recognition (Matan et al. 1990), speech recognition (Hinton
et al. 2012) or climate change scenario analysis (Knutti et al. 2003), among others.
Neural networks are a family or class of processes that have the following
characteristics:
• It contains weights which are modified during the process based on the new
information available, i.e., numerical parameters that are tuned by a learning
algorithm.
• It allows approximating non-linear functions of their inputs.
• The adaptive weights are connection strengths between neurons, which are
activated during training and prediction by the appropriate signal.


[Fig. 9.1 depicts a feed-forward network with four input nodes (X1 to X4), one hidden layer and a single output node.]
Fig. 9.1 This figure illustrates a neural network. In this illustration, only one hidden layer has
been represented

9.1 Origins

In this section we will briefly provide a historical overview of artificial neural networks.
McCulloch and Pitts (1943) created a computational model for neural networks based on mathematics and algorithms usually referred to as threshold logic. This approach led to the split of neural network research into two different axes. The first focused on biological processes in the brain, while the second aimed at applying neural networks to artificial intelligence. The psychologist Hebb (1949) created the typical unsupervised learning rule referred to as Hebbian learning, later leading to new neuroscience models. A Hebbian network was simulated for the first time at MIT by Farley and Clark (1954) using an ancestor of the computer,1 and this work was later extended by Rochester et al. (1956).
Then Rosenblatt (1958) created the perceptron, a two-layer learning network algorithm using additions and subtractions, designed for pattern recognition purposes; however, it could not be properly processed at the time. It is only when Werbos (1974) created the back-propagation algorithm that it became possible to process situations previously impossible to model with the perceptron. Besides, this new algorithm revived the use of ANNs as it solved the exclusive-or issue. However, the use of ANNs remained limited due to the lack of processing power. It is only in the early 2000s that neural networks really came back to life with the theorisation of deep learning (Deng and Yu 2014).
To summarise, ANNs are far from new, but the necessary computational power has only recently become available.

1 Turing's B machine already existed (sic!).
9.2 In Theory

Neural network models in artificial intelligence are essentially mathematical models defining a function f : X → Y (i.e. y = f(x)), or a distribution over X, or both X and Y.
The first layer contains input nodes which transfer data to the subsequent layers
of neurons via synapses until the signal reaches the output neurons. The most
complex architecture has multiple layers of neurons, various input neurons layers
and output neurons. The synapses embed weight parameters that modify the data in
the calculations.
An ANN relies on three important features:
• The connection between the different layers of neurons
• The learning process (i.e. updating the weights)
• The activation function which transforms the weighted inputs into transferable
values.
Mathematically, the network function f(x) is a combination of other functions g_i(x), which might themselves be combinations of other functions. This network structure is conveniently represented using arrows depicting the synaptic connections, i.e., the variable relationships. Besides, f(x) is traditionally represented as a non-linear weighted sum:

f(x) = K\left( \sum_i w_i g_i(x) \right),    (9.2.1)

where K is the prespecified activation function (Wilson 2012).
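As an illustration, the following minimal Python/NumPy sketch evaluates equation (9.2.1) for a single hidden layer, taking the sigmoid as the prespecified activation K; the input vector and the randomly drawn weights are purely illustrative assumptions.

import numpy as np

def sigmoid(z):
    # Logistic activation used here as the prespecified function K.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.2, -1.0, 0.5])              # illustrative input vector
W_hidden = rng.normal(scale=0.1, size=(4, 3))   # weights of 4 hidden units g_i
w_output = rng.normal(scale=0.1, size=4)        # weights w_i of the output combination

g = sigmoid(W_hidden @ x)                   # hidden-unit outputs g_i(x)
f_x = sigmoid(w_output @ g)                 # f(x) = K(sum_i w_i g_i(x))

print(f_x)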


In the traditional probabilistic view, the random variable F = f(G) depends on the random variable (r.v.) G = g(H), which itself relies upon the r.v. H = h(X), depending on the r.v. X.
Considering this architecture, the components of individual layers are indepen-
dent from each other. Therefore some intermediate operations can be performed in
parallel.
Networks used in this chapter are referred to as feed forward, as their graph is
a directed acyclic graph (DAG) as the Bayesian networks presented in the previous
chapter.
Neural networks are very interesting as they can learn, i.e., given a problem and
a class of functions F, the learning process uses a set of observations to find the
optimal2 function f* ∈ F solving the problem, achieving the task or assessing the likely outcome of a scenario storyline.
This requires defining an objective function C : F → ℝ such that, for the optimal solution f*, C(f*) ≤ C(f) for all f ∈ F (i.e. no solution is better than the optimal

2 According to prespecified criteria.
solution). The objective function C is very important as it measures the distance of a particular solution from an optimal solution given the task to be achieved. The
objective function has to be a function of the input data and is usually defined as a
statistic that can only be approximated.
While it is possible to define some ad hoc objective function, it is highly unusual.
A specific objective function is traditionally used, either because of its desirable
properties (e.g. convexity) or because the formulation of the problem led to it, i.e.,
this one depends on the desired task.

9.3 Learning Algorithms

Training a neural network model essentially means selecting one model from the
set of allowed models that minimise the objective function criterion. There are
numerous algorithms available for training neural network models; most of them
can be viewed as a straightforward application of optimisation theory and statistical
estimation. Most of the algorithms used in training artificial neural networks employ
some form of gradient descent, using backpropagation to compute the actual
gradients. This is done by simply taking the derivative of the objective function with
respect to the network parameters and then changing those parameters in a gradient-
related direction. The backpropagation training algorithms are usually classified in
three categories: steepest descent (with variable learning rate, with variable learning
rate and momentum, with resilient backpropagation), quasi-Newton (Broyden–
Fletcher–Goldfarb–Shanno, one step secant, Levenberg–Marquardt) and conjugate
gradient (Fletcher–Reeves update, Polak–Ribiére update, Powell–Beale restart,
scaled conjugate gradient) (Forouzanfar et al. 2010).
Evolutionary methods (Rigo et al. 2005), gene expression programming (Ferreira
2006), simulated annealing (Da and Xiurun 2005), expectation–maximisation, non-
parametric methods and particle swarm optimisation (Wu and Chen 2009) are some
commonly used methods for training neural networks.
Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary
function approximation mechanism that “learns” from observed data. However,
using them is not so straightforward, and a relatively good understanding of the
underlying theory is essential.
Obviously, the approximation accuracy will depend on the data representation
and the application. Complex models tend to lead to problems with learning. Indeed,
there are numerous issues with learning algorithms. Almost any algorithm will work
well with the correct hyperparameters for training on a particular fixed data set.
However, selecting and tuning an algorithm for training on unseen data requires a
significant amount of experimentation.
If the model’s, objective function and learning algorithm are selected appropri-
ately the resulting ANN might be quite robust. With the correct implementation,
ANNs might be used naturally for online learning and large data set applications.
Their simple structure and the existence of mostly local dependencies exhibited in
the structure allows for fast parallel implementations.
The utility of artificial neural network models lies in the fact that they can be
used to infer a function from observations. This is particularly useful in applications
where the complexity of the data or task makes the design of such a function by
hand impractical. Indeed, the properties presented in the next paragraphs support the capability of neural networks to capture particular behaviours embedded within data sets and to infer a function from them.
Artificial neural network models have a property called “capacity”, which means
that they can model any function despite the quantity of information, its type or its
complexity.
Addressing the question of convergence is complicated since it depends on a
number of factors: (1) many local minima may exist, (2) it depends on the objective
function and the model, (3) the optimisation method used might not converge when
starting far from a local minimum, (4) for a very large number of data points or
parameters, some methods become impractical.
In applications where the goal is to create a system which works well in unseen
situations, the problem of overtraining has emerged. This arises in convoluted or
over-specified systems when the capacity of the network significantly exceeds the
needed free parameters.
There are two schools of thought to deal with this issue. The first suggests using
cross-validation and similar techniques to check for the presence of overtraining
and optimally select hyperparameters such as to minimise the generalisation error.
The second recommends using some form of regularisation. This is a concept that
emerges naturally in a probabilistic framework, where the regularisation can be
performed by selecting a larger prior probability over simpler models; but also in
statistical learning theory, where the goal is to minimise over two quantities: the
“empirical risk” and the “structural risk”, which roughly corresponds to the error
over the training set and the predicted error in unseen data due to overfitting.
Supervised neural networks that use a mean squared error (MSE) objective
function can use formal statistical methods to determine the confidence of the
trained model. The MSE on a validation set can be used as an estimate for variance.
This value can then be used to calculate the confidence interval of the output of the
network, assuming a normal distribution. A confidence analysis made this way is
statistically valid as long as the output probability distribution stays the same and
the network is not modified.
It is also possible to apply a generalisation of the logistic function, referred to as the softmax activation function, so that the output can be interpreted as posterior
probabilities (see Chap. 8).
The softmax activation function is

y_i = \frac{e^{x_i}}{\sum_{j=1}^{c} e^{x_j}}.    (9.3.1)
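A minimal sketch of (9.3.1) in Python is given below; the scores are shifted by their maximum before exponentiation, a standard and mathematically equivalent trick used to avoid numerical overflow. The input values are illustrative only.

import numpy as np

def softmax(x):
    # Numerically stable version of (9.3.1): subtracting max(x) does not change the result.
    z = np.exp(x - np.max(x))
    return z / z.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative output-layer pre-activations
print(softmax(scores))               # sums to 1 and can be read as posterior probabilities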

9.4 Application

In this section, our objective is to apply neural networks to scenario analysis. Indeed
scenario analysis includes many tasks that can be independently performed by
neural networks such as function approximation, regression analysis, time series
prediction, classification (pattern and sequence recognition), novelty detection and
sequential decision making and can also be used in data processing for tasks such
as mining, filtering, clustering, knowledge discovery in databases, blind source
separation and compression. After training, the networks could predict multiple
outcomes from unrelated inputs (Ganesan 2010).
Applications of neural networks to risk management are not new. Indeed, Trippi
and Turban (1992) provide multiple chapters presenting methodologies using neural
networks to predict bank failures. In this book, the neural network strategy is also
compared to more traditional approaches. Relying on the results presented in these
chapters, we see that neural networks can be used as follows.
Neural networks rely on units. Each unit i receives input signals from other units, aggregates these signals based on the input function U_i and generates an output signal based on an output function O_i. The output signal is then directed to other units consistently with the topology of the network. Although the form of the input/output functions at each node has no constraint other than being continuous and differentiable, we use the functions obtained from Rumelhart et al. (1996):
U_i = \sum_j w_{ij} O_j + \theta_i    (9.4.1)

and

O_i = \frac{1}{1 + e^{-U_i}},    (9.4.2)

where
1. U_i = input of unit i,
2. O_i = output of unit i,
3. w_{ij} = connection weight between units i and j,
4. θ_i = bias of unit i.
Here, the neural network can be represented by a weighted directed graph where
the units introduced in the previous paragraph represent the nodes and the links
represent connections. To the links are assigned the weights of the corresponding
connection. A special class of neural networks referred to as feedforward networks
are used in the chapters in question.
A feedforward network contains three types of processing units, for instance,
input, output and hidden. Input units, initialising the network, receive the seed infor-
mation from some data. Hidden units do not directly interact with the environment,
they are invisible, though they are located in the subsequent intermediate layers.
Finally, output units provide signals to the environment and are located in the final
layers. Note that layers can be skipped, but we cannot move backward.
The weight vector W, i.e., weights associated with the connections, is the core
of the neural network. W represents what a neural network knows and permits responding to any input provided. "A feedforward network with an appropriate W can be used to model the causal relationship between a set of variables". The fitting and the
subsequent learning is done by modifying the connections’ weights.
Determining the appropriate W is not usually easy, especially when the charac-
teristics of the entire population are barely known. As mentioned previously, the
network is trained using examples. The objective is to obtain a set of weights W
leading to the best fit of the model to the data used initially. The backpropagation
algorithm has been selected here to perform the learning as it is able to train multi-
layer networks. Its effectiveness comes from the fact that it is capable of exploiting
regularities and exceptions contained in the initial sample. The backpropagation
algorithm consists in two phases: forward-propagation and backward-propagation.
Mechanically speaking, let s be the size of the training sample, each piece of information being described by an input vector X_i = (x_{i1}, x_{i2}, ..., x_{im}) and an output vector D_i = (d_{i1}, d_{i2}, ..., d_{in}), 1 ≤ i ≤ s. In forward propagation, X_i is fed to the input layer, and an output Y_i = (y_{i1}, y_{i2}, ..., y_{in}) is obtained using W, in other words Y = f(W), where f characterises any appropriate function. The value of Y_i is then compared with the desired output D_i by computing the squared error (y_{ij} − d_{ij})^2, 1 ≤ j ≤ n, for each output unit. Output differences are aggregated to form the error function SSE (sum of squared errors).

SSE = \sum_{i=1}^{s} \sum_{j=1}^{n} \frac{(y_{ij} - d_{ij})^2}{2}.    (9.4.3)

The objective is to minimise the SSE with respect to W so that all input vectors
are correctly mapped into their corresponding output vectors. As a matter of fact,
the learning process can be considered as a minimisation problem with the objective function SSE defined in the space of W, i.e., \arg\min_W SSE.
The second phase consists in evaluating the gradient of the function in the weight
space to locate the optimal solution. Both the direction and the magnitude of the change \Delta w_{ij} of each w_{ij} are obtained using

\Delta w_{ij} = -\eta \, \frac{\delta SSE}{\delta w_{ij}},    (9.4.4)

where 0 < η < 1 is a parameter controlling the convergence rate of the algorithm.
The sum squared error calculated in the first phase is propagated back, layer
by layer, from the output units to the input units in the second phase. Weight
adjustments are obtained through propagation at each level. As U_i, O_i and SSE are continuous and differentiable, \delta SSE / \delta w_{ij} can be evaluated at each level applying
the following chain rule:

\frac{\delta SSE}{\delta w_{ij}} = \frac{\delta SSE}{\delta O_i} \, \frac{\delta O_i}{\delta U_i} \, \frac{\delta U_i}{\delta w_{ij}}.    (9.4.5)

In this process, W can be updated in two manners: either W is updated sequentially for each couple (X_i, D_i), or by aggregating the \Delta w_{ij} after a complete run of all examples. The two phases are executed at each iteration of the back-propagation algorithm until the SSE converges.
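The following NumPy sketch puts equations (9.4.1)-(9.4.5) together for a single hidden layer: sigmoid units, the SSE objective and gradient-descent weight updates. The simulated sample, the layer sizes and the learning rate are illustrative assumptions, and the bias terms θ_i are omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Illustrative training sample: X (s x m inputs), D (s x n desired outputs).
X = rng.normal(size=(50, 4))
D = (X.sum(axis=1, keepdims=True) > 0).astype(float)

m, h, n = 4, 5, 1                              # input, hidden and output layer sizes
W1 = rng.normal(scale=0.5, size=(m, h))        # weights w_ij, input -> hidden
W2 = rng.normal(scale=0.5, size=(h, n))        # weights w_ij, hidden -> output
eta = 0.5                                      # learning rate, 0 < eta < 1

for _ in range(2000):
    # Forward propagation: U_i = sum_j w_ij O_j, O_i = 1 / (1 + exp(-U_i)); biases omitted.
    O1 = sigmoid(X @ W1)
    Y = sigmoid(O1 @ W2)

    # Backward propagation: delta w_ij = -eta * dSSE/dw_ij via the chain rule (9.4.5),
    # with the gradients averaged over the sample for stability.
    err = Y - D                                # dSSE/dY, with SSE = sum (y - d)^2 / 2
    delta2 = err * Y * (1.0 - Y)               # dSSE/dU at the output layer
    delta1 = (delta2 @ W2.T) * O1 * (1.0 - O1)
    W2 -= eta * O1.T @ delta2 / len(X)
    W1 -= eta * X.T @ delta1 / len(X)

sse = 0.5 * np.sum((sigmoid(sigmoid(X @ W1) @ W2) - D) ** 2)
print(sse)   # the SSE should have decreased as the two phases were iterated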
Neural networks thus offer a viable alternative for scenario analysis; here, this model is applied to bankruptcy prediction. In Trippi and Turban (1992), the results exhibited for neural networks show a better predictive accuracy than those obtained from implementing a linear discriminant model, a logistic regression, a k-nearest-neighbour strategy and a decision tree. Applying their model to the
prediction of bank failures, the authors have modified the original backpropagation
algorithm to capture prior probabilities and misclassification. Indeed, the error
of misclassifying a failed bank into the non-failed group (type I error) is more
severe than the other way around. The original function SSE is generalised to SSE_w by multiplying each error term by a weight Z_i, in other words by weighting it. The comparison of the methodologies is based on a training set with an equal proportion of failed and non-failed banks, though quite often the number of defaults constitutes a smaller portion of the whole population than the non-failed entities. The matching process may bias the model; consequently, they recommended that the entire population be used as the training set. As mentioned in earlier chapters, neural networks
can be helpful to identify a single group from a large set of alternatives.
Alternatively, Fig. 9.2 provides another application of neural networks with two
hidden layers. In that model, the data provided are related to cyber security. The

[Fig. 9.2 shows a feed-forward network with input nodes Antivirus_Updates, Industry_Reputation, Budget_Security_Program, Number_Of_Malware_Attack, Number_Of_Security_Patches, Level_Of_Formation_Of_Managers, Traffic_To_Unwanted_Addresses, Quality_Of_Security_Checks and Number_Of_Daily_Users, two hidden layers, and an output node Losses; the estimated connection weights are displayed on the arcs.]

Fig. 9.2 This figure illustrates a neural network applied to IT security issues, considering information coming from anti-virus update frequency, industry reputation (how likely it is to be threatened), the budget of security programs within financial institutions, the number of malware attacks, the number of security patches, the level of training of managers, the traffic to unwanted addresses, the quality of security checks and the number of daily users
objective is to evaluate the likelihood of a financial loss. The information regarding anti-virus updates, industry reputation, budget of the security program, number of malware attacks, number of security patches, level of training of the managers, traffic to unwanted addresses, quality of security checks and number of daily users is used as input.
Implementing a strategy as described before, the weights are calculated for each and every connection. Then, moving from one layer to the next, we can evaluate the probability of a loss related to cyber attacks given the initial issues identified. In our second example, we are already moving toward deep learning strategies, as the neural network has two hidden layers.
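A hedged sketch of such a set-up is given below, using scikit-learn's MLPClassifier as a stand-in for the two-hidden-layer network of Fig. 9.2; the feature names mirror the figure, but the data are simulated placeholders and the rule generating the "loss" labels is a pure assumption, not an estimated relationship.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n = 500
features = [
    "antivirus_updates", "industry_reputation", "budget_security_program",
    "number_of_malware_attacks", "number_of_security_patches",
    "level_of_formation_of_managers", "traffic_to_unwanted_addresses",
    "quality_of_security_checks", "number_of_daily_users",
]
X = rng.normal(size=(n, len(features)))
# Hypothetical rule: losses driven by malware attacks and unwanted traffic,
# mitigated by the quality of security checks.
y = (X[:, 3] + X[:, 6] - X[:, 7] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Two hidden layers, echoing the structure of Fig. 9.2 (sizes are illustrative).
model = MLPClassifier(hidden_layer_sizes=(8, 4), max_iter=2000, random_state=0)
model.fit(X, y)

new_case = rng.normal(size=(1, len(features)))
print(model.predict_proba(new_case)[0, 1])   # estimated probability of a loss for a new profile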

9.5 For the Manager: Pros and Cons

In this section we discuss the main issues and advantages of implementing a neural
network strategy for scenario analysis purposes, starting with the issues.
To be properly applicable, neural networks require sufficiently representative data to capture the appropriate underlying structure, which will allow a generalisation to
new situations. These issues can be dealt with in various manners such as randomly
shuffling the training data, using a numerical optimisation or reclassifying the data.
From a computational and IT infrastructure point of view, to implement large,
efficient and effective neural network strategies, considerable processing and storage
resources are required (Balcazar 1997). Simulating even the most simplified neural
network may require filling large databases and may consume huge amounts of memory and hard disk space. Besides, neural network methodologies will usually require some simulations to deal with the signal transmission between the neurons, and this may need huge amounts of CPU processing power and time.
Neural network methodologies are also questioned because it is possible to create a successful net without understanding how it works. However, it is arguable that an unreadable table that a useful machine could read would still be well worth having (NASA 2013). Indeed, the discriminant capability of a neural network is difficult to express in symbolic form, and neural networks are limited if one wants to test the significance of individual inputs.
Remark 9.5.1 In that case we are somehow already talking about artificial intelli-
gence.
Other limitations reside in the fact that there is no formal method to derive a
network configuration for a given classification task. Although it was shown that
only one hidden layer is enough to approximate any continuous function, the number of hidden units can be arbitrarily large, and the risk of overfitting the network is real, especially if the size of the training sample is insufficient. Researchers exploring learning algorithms for neural networks are uncovering generic principles allowing a successful fitting, learning, analysis and prediction. A new school of thought actually considers that hybrid models (combining neural networks and symbolic
approaches) can even improve neural networks’ outcomes on their own (Bengio
and LeCun 2007; Sun 1994).
However, on the positive side, a neural network allows adaptive adjustments
of the predictive model as new information becomes available. This is the core
property of this methodology especially when the underlying group of distributions
are evolving. Statistical methods do not generally weigh the information and assume
that old and new examples are equally valid, and the entire set is used to construct
a model. However, when a new sample is obtained from a new distribution, keeping
the old information (likely to be obsolete) may bias the outcome and lead to a model
of low accuracy. Therefore, the adaptive feature of a neural network is that past
information is not ignored but receives a lower weight than the latest information
received and fed into the model. To be more effective a rolling window might be
used in practice. The proportion of the old data to be kept depends on considerations
related to stability, homogeneity, adequacy and noise of the sample.
Neural networks have other particularly useful properties; indeed, the non-linear discriminant function represented by the net provides a better approximation of the
sample distribution, especially when the latter is multimodal. Many classification
tasks have been reported to have a non-linear relationship between variables, and as
mentioned previously, neural networks are particularly robust as they do not assume
any probability distribution. Besides, there is no restriction regarding input/output
functions other than these have to be continuous and differentiable.
Research in that field is ongoing. In fact, one of the outcomes led to the application of genetic algorithms (Whitley 1994). Applying genetic algorithms to network design might be quite powerful, as they mechanically retain and combine good configurations in the next generation. The nature of the algorithm allows the search for good configurations while reducing the possibility of ending up with a local optimum.
As presented in this chapter, neural networks can be used for scenario analysis,
for bankruptcy detection and can be easily extended to managerial applications.
Note that the topic is currently highly discussed as it is particularly relevant for the
trendy big data topic.

References

Balcazar, J. (1997). Computational power of neural networks: A Kolmogorov complexity characterization. IEEE Transactions on Information Theory, 43(4), 1175–1183.
Bengio, Y., & LeCun, Y. (2007). Scaling learning algorithms towards AI. In L. Bottou, et al. (Eds.),
Large-scale kernel machines. Cambridge, MA: MIT Press.
Castelletti, A., de Rigo, D., Rizzoli, A. E., Soncini-Sessa, R., & Weber, E. (2005). A selective
improvement technique for fastening neuro-dynamic programming in water resource network
management. IFAC Proceedings Volumes, 38(1), 7–12.
Da, Y., & Xiurun, G. (2005). An improved PSO-based ANN with simulated annealing technique.
In T. Villmann (Ed.), New Aspects in Neurocomputing: 11th European Symposium on Artificial
Neural Networks (Vol. 63, pp. 527–533). Amsterdam: Elsevier.
Davalo, E., & Naim, P. (1991). Neural networks. MacMillan computer science series. London:
Palgrave.
Deng, L., & Yu, D. (2014). Deep learning: Methods and applications. Foundations and Trends in
Signal Processing, 7(3–4), 1–199.
Farley, B. G., & Clark, W. A. (1954). Simulation of self-organizing systems by digital computer.
IRE Transactions on Information Theory 4(4), 76–84.
Ferreira, C. (2006). Designing neural networks using gene expression programming. In A. Abra-
ham, et al. (Eds.), Applied soft computing technologies: The challenge of complexity (pp. 517–
536). New York: Springer.
Forouzanfar, M., Dajani, H. R., Groza, V. Z., Bolic, M., & Rajan, S. (2010). Comparison of feed-
forward neural network training algorithms for oscillometric blood pressure estimation. In 2010
4th international workshop on soft computing applications (SOFA), July 2010 (pp. 119–123).
New York: IEEE.
Ganesan, N. (2010). Application of neural networks in diagnosing cancer disease using demo-
graphic data. International Journal of Computer Applications, 1(26), 76–85.
Hebb, D. (1949). The organization of behavior. New York: Wiley.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., et al. (2012). Deep neural
networks for acoustic modeling in speech recognition: The shared views of four research
groups. IEEE Signal Processing Magazine, 29(6), 82–97.
Knutti, R., Stocker, T. F., Joos, F., & Plattner, G. K. (2003). Probabilistic climate change projections
using neural networks. Climate Dynamics, 21(3–4), 257–272.
Matan, O., Kiang, R. K., Stenard, C. E., Boser, B., Denker, J. S., Henderson, D., et al. (1990).
Handwritten character recognition using neural network architectures. In Proceedings of the
4th USPS advanced technology conference, November 1990 (pp. 1003–1011).
McCulloch, W., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity.
Bulletin of Mathematical Biophysics, 5(4), 115–133.
NASA (2013). NASA neural network project passes milestone. www.nasa.gov.
Rochester, N., Holland, J., Haibt, L., & Duda, W. (1956). Tests on a cell assembly theory of the
action of the brain, using a large digital computer. IRE Transactions on Information Theory,
2(3), 80–93.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organiza-
tion in the brain. Psychological Review, 65(6), 386–408.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1996). Learning representations by backprop-
agating errors. Nature, 323, 533–536.
Sun, R. (1994). A two-level hybrid architecture for common sense reasoning. In R. Sun &
L. Bookman (Eds.), Computational architectures integrating neural and symbolic processes.
Dordrecht: Kluwer Academic Publishers.
Trippi, R. R., & Turban, E. (Eds.) (1992). Neural networks in finance and investing: Using artificial intelligence to improve real-world performance. New York: McGraw-Hill Inc.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral
sciences. Ph.D. thesis, Harvard University.
Whitley, D. (1994). A genetic algorithm tutorial. Statistics and computing, 4(2), 65–85.
Wilson, W. (2012). The machine learning dictionary. www.cse.unsw.edu.au/~billw.
Wu, J., & Chen, E. (2009). A novel nonparametric regression ensemble for rainfall forecasting
using particle swarm optimization technique coupled with artificial neural network. In H. Wang,
et al. (Eds.), 6th International Symposium on Neural Networks. Berlin: Springer.
Chapter 10
Forward-Looking Underlying Information:
Working with Time Series

10.1 Introduction

In order to capture serially related events, banks may need to consider the complete
dependence scheme. This is the reason why this chapter focuses on time series. It is
important to note that the presence of autocorrelation is not compulsory, sometimes
the independence assumption should not be rejected a priori. Indeed, if there is no
statistical evidence to reject the assumption of independence, then this one should
not be rejected for the sake of it. Besides, these dependencies may take various
forms and may be detected on various time steps. We will come back to that point
in the next paragraphs. In this chapter, we assume that serial dependence exists
and we model it using time series processes (McCleary 1980; Hamilton 1994; Box
et al. 2015). In many cases, scenario analysis has to integrate macro-economic factors, and here time series models are particularly useful. The literature on this topic is colossal (the bibliographies of this chapter and the previous one contain some interesting articles). But strategies relying on time series should not be limited to macro-economic factors or stock indexes, for instance. In this chapter, we illustrate the models with applications, but in order not to bias the manager trying to implement the methodologies presented, we do not emphasise the data to which we
applied them, though in this case they were macro-economic data.
Our objective is to capture the risks associated with the loss intensity which
may increase during crises or turmoils, taking into account correlations, embedded
dynamics and large events thanks to adequate distributions fitted on the residuals.
Using time series permits capturing the embedded autocorrelation phenomenon
without losing any of the characteristics captured by traditional methodologies such
as fat tails.
Consequently, a time series is a sequence of data points, typically consisting
in successive measurements made over a period of time. Time series are usually
represented using line charts. A traditional application of time series processes is
forecasting, which in our language can be translated into scenario analysis. Time

series analysis aims at extracting meaningful statistics and other characteristics of the data. Time series forecasting consists in using a model to predict future values
based on past observations and embedded behaviours.
Time series are ordered by construction, as data are collected and sorted with
respect to a relevant date, occurrence, accounting, etc. This makes time series
analysis distinct from cross-sectional studies or regressions (see Chap. 11), in which
there is no natural ordering of the observations in the former, and a classification per type or characteristic rather than a natural ordering in the latter. Time
series analysis also differs from spatial data analysis where to each observation
is associated a geographical location. A stochastic model for a time series can be
implemented to capture the fact that observations close together in time will be
more closely related than observations further apart, but this is not always the case
as discussed a bit further in this chapter. In addition, the natural one-way ordering of
time leads to the fact that each value is expressed as a function of past values. Time
series analysis is applicable to various data type as soon as these are associated with
time periods (continuous data, discrete numeric data, etc.)
Time series analysis usually belongs to one of the two following classes:
frequency-domain and time-domain methods (Zadeh, 1953). The former include
spectral analysis and recently, wavelet analysis; the latter include autocorrelation
and cross-correlation analysis. We will focus on the second type. Additionally,
we may split time series analysis techniques into parametric and non-parametric
methods. Parametric approaches assume that the underlying stationary stochastic
process can be captured using a strategy relying on a small number of parameters.
Here, estimating the parameters of this model is a requirement. Non-parametric
approaches, on the contrary, explicitly estimate the covariance or the spectrum of
the process without assuming that the process has any particular structure. Methods
of time series analysis may also be divided into linear and non-linear, and univariate
and multivariate. In this chapter we will focus in particular on parametric models
and will illustrate univariate approaches. The next chapter may be used to extend
the solutions provided here to multivariate processes.

10.2 Methodology

Practically, to build a model closer to reality, the assumption of independence between the data points may have to be relaxed. Thus, a general representation of the losses (X_t)_t is, \forall t,

X_t = f(X_{t-1}, \ldots) + \varepsilon_t.    (10.2.1)

There exist several models to represent various patterns and behaviours. Variations in the level of a process can be captured using the following approaches or a combination of them. Time series processes can be split into various classes, each of
them having their own variations, for instance, the autoregressive (AR) models, the
integrated (I) models and the moving average (MA) models. These three classes
depend linearly on past data points (Gershenfeld, 1999). Combinations of these lead
to autoregressive moving average (ARMA) and autoregressive integrated moving
average (ARIMA) models. The autoregressive fractionally integrated moving aver-
age (ARFIMA) model combines and enlarges the scope of the previous approaches.
VAR1 strategies are an extension of these classes to deal with vector-valued data (multivariate time series); besides, these might be extended to capture exogenous impacts.
Non-linear strategies might also be of interest as empirical investigations have
shown that using predictions derived from non-linear models, over those from
linear models, might be more appropriate (Rand 1971 and Holland 1992). Among
these non-linear time series models those capturing the evolution of variance over
time (heteroskedasticity) are of particular interest. These models are referred to as
autoregressive conditional heteroskedasticity (ARCH), and the library of variations contains a wide variety of representations such as GARCH, TARCH, EGARCH,
FIGARCH and CGARCH. The changes in variability are related to recent past
values of the observed series.

10.2.1 Theoretical Aspects

Originally, the theory has been built on two sets of conditions, namely stationarity and its generalisation, ergodicity. However, ideas of stationarity must be expanded: strict stationarity and second-order stationarity. Models can be developed
under each of these conditions, but in the latter case the models are usually regarded
as partially specified. Nowadays, many time series models have been developed to
deal with seasonally stationary or non-stationary series.

10.2.1.1 Stationary Process

In mathematics and statistics, a stationary process stricto sensu is a stochastic process whose joint probability distribution does not change when shifted in time.
Consequently, moments (see Chap. 3) such as the mean and variance, if these exist,
do not change over time and do not follow any trends. Practically, raw data are
usually transformed to become stationary.
Mathematically, let \{X_t\} be a stochastic process and let F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau}) represent the c.d.f. of the joint distribution of \{X_t\} at times t_1+\tau, \ldots, t_k+\tau. Then, \{X_t\} is strongly stationary if, for all k, for all \tau, and for all t_1, \ldots, t_k,

F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau}) = F_X(x_{t_1}, \ldots, x_{t_k}).    (10.2.2)

Since \tau does not affect F_X(\cdot), F_X is not a function of time.

1 Vector autoregression.
10.2.1.2 Autocorrelation

Statistically speaking, the autocorrelation of a random process describes the correlation between values of the process at different times. Let X be a process which reiterates in time, and let t represent a specific point in time; then X_t is the realisation of a given run of the process at time t. Suppose that the mean \mu_t and variance \sigma_t^2 exist for all times t; then the autocorrelation between times s and t is defined as

R(s,t) = \frac{E[(X_t - \mu_t)(X_s - \mu_s)]}{\sigma_t \sigma_s},    (10.2.3)

where E is the expected value operator. Note that this expression cannot be evaluated for all time series, as the variance may be zero (e.g. for a constant process), infinite or nonexistent. If the function R is computable, the returned value lies in the range [-1, 1], where 1 indicates a perfect correlation and -1 a perfect anti-correlation.
If X_t is a wide-sense stationary process, then \mu and \sigma^2 are not time-dependent. The autocorrelation only depends on the lag between t and s, i.e., the time-distance between the two values. Therefore the autocorrelation can be expressed as a function of the time-lag \tau = s - t, i.e.,

R(\tau) = \frac{E[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2},    (10.2.4)

an even function as R(\tau) = R(-\tau).
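In practice, R(\tau) is replaced by its sample counterpart; the short Python sketch below estimates it on a simulated AR(1) series (both the series and the parameter value are illustrative assumptions).

import numpy as np

def sample_autocorrelation(x, lag):
    # Empirical counterpart of R(tau) for a (wide-sense) stationary series.
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    return np.mean((x[:-lag] - mu) * (x[lag:] - mu)) / var if lag else 1.0

rng = np.random.default_rng(0)
eps = rng.normal(size=1000)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.7 * x[t - 1] + eps[t]      # AR(1) with phi = 0.7

print([round(sample_autocorrelation(x, k), 3) for k in range(5)])
# roughly 1, 0.7, 0.49, ... for an AR(1) process with phi = 0.7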

10.2.1.3 White Noise

The framework in which we are evolving implies that observed data series are
the combination of a path dependent process (some may say “deterministic”) and
random noise (error) terms. Then an estimation procedure is implemented to param-
eterise the model using observations. The noise (error) values are assumed mutually
uncorrelated with a mean equal to zero and the same probability distribution, i.e.,
the noise is white. Traditionally, a Gaussian white noise is assumed, i.e. the error
term follows a Gaussian distribution, but it is possible to have the noise represented
by other distributions and the process transformed.
If the noise terms underlying different observations are correlated, then the parameter estimates are still unbiased; however, uncertainty measures will be biased. This
is also true if the noise is heteroskedastic, i.e., if its variance varies over time. This
fact may lead to the selection of an alternative time series process.
10.2.1.4 Estimation

There are many ways of estimating the coefficients or parameters, such as the
ordinary least squares procedure or the method of moments (through Yule–Walker
equations).
For example, the AR(p) model is given by the equation

X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t,    (10.2.5)

where \varphi_i, i = 1, \ldots, p, denote the coefficients. As a direct relationship exists between the model coefficients and the covariance function of the process, the parameters can be obtained from the autocorrelation function. This is performed using the Yule–Walker equations.
The Yule–Walker equations (Yule and Walker, 1927) correspond to the following
set:

\gamma_m = \sum_{k=1}^{p} \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2 \delta_{m,0},    (10.2.6)

where m = 0, \ldots, p, leading to p+1 equations. Here \gamma_m is the autocovariance function of X_t, \sigma_\varepsilon is the noise standard deviation and \delta_{m,0} is the Kronecker delta function.
As the last part of an individual equation is non-zero only if m = 0, the set of equations can be solved by representing the equations for m > 0 in matrix form, i.e.,

\begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_p \end{pmatrix} = \begin{pmatrix} \gamma_0 & \gamma_{-1} & \gamma_{-2} & \cdots \\ \gamma_1 & \gamma_0 & \gamma_{-1} & \cdots \\ \gamma_2 & \gamma_1 & \gamma_0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ \gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots \end{pmatrix} \begin{pmatrix} \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \vdots \\ \varphi_p \end{pmatrix}    (10.2.7)

which can be solved for all \{\varphi_m; m = 1, 2, \ldots, p\}. The remaining equation for m = 0 is

\gamma_0 = \sum_{k=1}^{p} \varphi_k \gamma_{-k} + \sigma_\varepsilon^2,    (10.2.8)

which, once \{\varphi_m; m = 1, 2, \ldots, p\} are known, can be solved for \sigma_\varepsilon^2.
Alternatively, the AR parameters are determined by the first p+1 elements \rho(\tau) of the autocorrelation function. The full autocorrelation function can then be derived by recursively calculating

\rho(\tau) = \sum_{k=1}^{p} \varphi_k \rho(k - \tau).    (10.2.9)

The Yule–Walker equations provide several ways of estimating the parameters of an AR(p) model, by replacing the theoretical covariances with estimated values.
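The sketch below illustrates this replacement on a simulated AR(2) series: the sample autocovariances are plugged into the matrix system (10.2.7) and into (10.2.8) to recover the coefficients and the noise variance. The simulated data and the true coefficient values are illustrative assumptions.

import numpy as np

def autocovariance(x, lag):
    # Sample autocovariance gamma_lag.
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    if lag == 0:
        return x.var()
    return np.mean((x[:-lag] - mu) * (x[lag:] - mu))

rng = np.random.default_rng(0)
eps = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]   # AR(2) with phi = (0.5, 0.3)

p = 2
gamma = np.array([autocovariance(x, k) for k in range(p + 1)])
Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])  # matrix of (10.2.7)
phi_hat = np.linalg.solve(Gamma, gamma[1 : p + 1])                           # estimated AR coefficients
sigma2_hat = gamma[0] - phi_hat @ gamma[1 : p + 1]                           # noise variance from (10.2.8)

print(phi_hat, sigma2_hat)   # expected to be close to (0.5, 0.3) and 1.0 for a long series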
Alternative estimation approaches include maximum likelihood estimation.
Indeed, two distinct variations of maximum likelihood methods are available. In the
first, the likelihood function considered corresponds to the conditional distribution
of later values in the series given the initial p values in the series. In the second, the
likelihood function considered corresponds to the unconditional joint distribution
of all the values in the observed series. Significant differences in the results of these
approaches may be observed depending on the length of the series, or if the process is almost non-stationary.

10.2.1.5 Seasonality

As mentioned before, time series data are collected at regular intervals, implying
that some peculiar schemes might be observed multiple times over a long period.
Indeed, some patterns tend to repeat themselves over known, fixed periods of time
within the data set. These might characterise seasonality, seasonal variation, periodic
variation or periodic fluctuations (risk cycle).
Seasonality may be the result of multiple factors and consists in periodic,
repetitive, relatively regular and predictable patterns of a time series. Seasonality can repeat on a weekly, monthly or quarterly basis; these periods of time are structured, while cyclical patterns extend beyond a single year and may not repeat themselves over fixed periods of time. It is necessary for organisations to identify and measure seasonal variations within their risks to support strategic plans and to understand their true exposure rather than a point-in-time exposure. Indeed, if a relationship such as "the volume impacts the exposure" holds (credit card fraud is a good example, as the larger the number of credit cards sold, the larger the exposure), then when volumes tend to increase the risk tends to increase, and seasonality in the volume will mechanically imply larger losses; but this does not necessarily mean that the institution is facing more risks.
Multiple graphical techniques can be used to detect seasonality: (1) a run
sequence plot, (2) a seasonal plot (each season is overlapped), (3) a seasonal
subseries plot, (4) multiple box plots, (5) an autocorrelation plot (ACF), which can help identify seasonality, or (6) a seasonal index measuring the difference between a particular period and its expected value.
A simple run sequence plot is usually a good first step to analyse time series seasonality, although seasonality appears more clearly on the seasonal subseries plot or the box plot. The seasonal subseries plot exhibits the evolution of the seasons over time, contrary to the box plot, but the box plot is more readable for large data sets.
Seasonal, seasonal subseries and box plots rely on the fact that the seasonal periods are known, e.g., for monthly data we have a regular 12-month period in a year. However, if the period is unknown, the autocorrelation plot is probably the best solution. If there is significant seasonality, the autocorrelation plot should show regular spikes (i.e. at the same period every year).
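A minimal sketch of the autocorrelation-based approach is given below, using statsmodels' acf function on a simulated monthly-like series with a 12-period seasonal component; the data and the simple detection rule are illustrative only.

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
t = np.arange(240)                                  # twenty "years" of monthly observations
x = 3.0 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=240)

rho = acf(x, nlags=36)                              # sample autocorrelation function
period = int(np.argmax(rho[2:])) + 2                # lag (>= 2) with the largest autocorrelation
print(period)                                       # expected to be close to 12 for this series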

10.2.1.6 Trends

Dealing with time series, the analysis of the tendencies in the data, relating the measurements to the times at which they occurred, is really important. In particular, it is useful to understand whether measurements exhibiting increasing or decreasing patterns are statistically distinct from random behaviours.2

2 In the latter case, homogeneity problems may have to be dealt with.
Considering a data set for modelling purposes, various functions can be chosen to represent it. Assuming the underlying process is unknown, the simplest function (once again) to fit is an affine function (Y = aX + b), for which the magnitudes are given on the vertical axis, while the time is represented on the horizontal axis.
Once the strategy has been selected, the parameters need to be estimated, usually implementing a least-squares approach, as presented earlier in this book. Applying it to our case, we minimise

\sum_t [(at + b) - y_t]^2,    (10.2.10)

where y_t are the observed data, and a and b are to be estimated. The difference between y_t and at + b provides the residual set. Therefore, y_t = at + b + \varepsilon_t is supposed to be able to represent any set of data (though the error might be huge).
If the errors \varepsilon_t are stationary, then the non-stationary series y_t is referred to as trend stationary. It is usually simpler if the \varepsilon's are identically distributed, but if this is not the case and some points are less certain than others, a weighted least squares methodology can be implemented to obtain more accurate parameters.
In most cases, for a simple time series, the variance of the error term is calculated
empirically by removing the trend from the data to obtain the residuals. Once the
“noise” of the series has been properly captured, the significance of the trend can be
addressed by making the null hypothesis that the trend a is not significantly different
from 0.
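The sketch below fits the affine trend by least squares and reports the p-value of the test of the null hypothesis a = 0, using scipy's linregress on a simulated series; the slope, intercept and noise level are illustrative assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
y = 0.05 * t + 2.0 + rng.normal(scale=1.0, size=100)   # small upward trend plus noise

fit = stats.linregress(t, y)        # least-squares fit of y_t = a*t + b + e_t
print(fit.slope, fit.intercept)     # estimates of a and b
print(fit.pvalue)                   # p-value of the two-sided test of H0: a = 0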
The presented methodology has been the subject of criticisms related to the non-linearity of the time trend, the impact of this non-linearity on the parameters, the possible variation in the linear trend or the spurious relationships, leading to a search for alternative approaches to avoid an inappropriate use in model adjustment. Alternative approaches involve unit root tests and cointegration techniques (Engle and Granger 1987; Cameron 2005).
The augmented Dickey–Fuller (ADF) test (Dickey and Said, 1984) is the
traditional test to detect a unit root in a time series sample. It is a revised version
of the Dickey–Fuller test (Dickey and Fuller, 1979) for more challenging time
series models. The test statistic is a negative number; the lower it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence.
The testing procedure for the ADF test is the same as for the Dickey–Fuller test, but it is applied to the model

\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots + \delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t,    (10.2.11)

where \alpha is a constant, \beta the coefficient on a time trend and p the lag order of the autoregressive process. Remark that setting \alpha = 0 and \beta = 0 corresponds to modelling a random walk, while only setting \beta = 0 leads to modelling a random walk with a drift.
Remark 10.2.1 Note that the order of the lags ($p$) allows capturing high-order
autoregressive processes. The order has to be determined either using the t-value of
the coefficient or using the Akaike criterion (AIC) (Akaike, 1974), the Bayesian
information criterion (BIC) (Schwarz, 1978) or the Hannan–Quinn information
criterion (Hannan and Quinn, 1979).
The null hypothesis $\gamma = 0$ is tested against the alternative $\gamma < 0$. The test statistic
\[
DF_\gamma = \frac{\hat{\gamma}}{SE(\hat{\gamma})} \tag{10.2.12}
\]
is then computed and compared to the relevant critical value for the Dickey–Fuller
test. A test statistic lower than the critical value implies a rejection of the null
hypothesis, i.e., the absence of a unit root.
A widely used alternative is the Kwiatkowski–Phillips–Schmidt–Shin (KPSS)3
test (Kwiatkowski et al., 1992), which tests the null hypothesis that a time series
is stationary around a deterministic trend. The series is expressed as the sum of a deterministic
trend, a random walk and a stationary error, and the test is the Lagrange multiplier
test of the hypothesis that the random walk has zero variance. The founding paper
actually states that by testing both the unit root hypothesis and the stationarity hypothesis
simultaneously, it is possible to distinguish series that appear to be stationary, series
that have a unit root and series for which the data are not sufficiently informative to
be sure whether they are stationary or integrated.

3
The KPSS test is included in many statistical software packages (R, etc.).
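As a brief illustration of these two unit-root diagnostics, the following minimal Python sketch (assuming the numpy and statsmodels libraries are available, and using an artificial trend-stationary series generated purely for the example) runs both the ADF and the KPSS tests described above.

import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(1)
t = np.arange(500)
y = 0.05 * t + rng.normal(size=t.size)      # deterministic trend + stationary noise

# ADF: H0 = unit root; a very negative statistic / small p-value rejects the unit root
adf_stat, adf_pvalue, *_ = adfuller(y, regression="ct")     # constant + trend
print("ADF statistic:", round(adf_stat, 3), "p-value:", round(adf_pvalue, 4))

# KPSS: H0 = (trend-)stationarity; a large statistic / small p-value rejects stationarity
kpss_stat, kpss_pvalue, _, _ = kpss(y, regression="ct", nlags="auto")
print("KPSS statistic:", round(kpss_stat, 3), "p-value:", round(kpss_pvalue, 4))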

10.2.2 The Models

In this section we present various models as well as their theoretical properties. It
is interesting to note that the phenomenon captured here is path dependent, in the
sense that the next occurrence is related to the previous ones.
• Autoregressive model: the notation AR(p) refers to the autoregressive model of
order p. The AR(p) model is written
\[
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \tag{10.2.13}
\]
where $\varphi_1, \ldots, \varphi_p$ are parameters, $c$ is a constant and the random variable $\varepsilon_t$
represents a white noise.
The parameters of the model have to be constrained to ensure the model
remains stationary. AR processes are not stationary if $|\varphi_i| \geq 1$.
• Moving average model: the notation MA(q) refers to the moving average model
of order q:
\[
X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \tag{10.2.14}
\]
where $\theta_1, \ldots, \theta_q$ are the parameters of the model, $\mu$ equals $E[X_t]$ and the
$\varepsilon_t, \varepsilon_{t-1}, \ldots$ are white noise error terms. In this process the next value of $X_t$ builds
on past combined errors.
• ARMA model: the notation ARMA(p, q) refers to the model with p autoregressive
terms and q moving average terms. This model contains the AR(p) and
MA(q) models,
\[
X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \tag{10.2.15}
\]
The ARMA models were popularised by Box and Jenkins (1970).


• In an ARIMA model, the integrated part of the model includes the lag operator
$(1 - B)$ (where $B$ stands for the back-shift operator) raised to an integer power, e.g.,
\[
(1 - B)^2 = 1 - 2B + B^2, \tag{10.2.16}
\]
where
\[
B^2 X_t = X_{t-2}, \tag{10.2.17}
\]
so that
\[
(1 - B)^2 X_t = X_t - 2X_{t-1} + X_{t-2}. \tag{10.2.18}
\]
Both ARFIMA and ARIMA (Palma, 2007) models have the same form,
\[
\left(1 - \sum_{i=1}^{p} \varphi_i B^i\right) (1 - B)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i B^i\right) \varepsilon_t, \tag{10.2.19}
\]
though $d \in \mathbb{N}^{+}$ for the ARIMA while $d \in \mathbb{R}$ for the ARFIMA.
ARFIMA models have the intrinsic capability to capture long-range dependencies,
i.e., the fact that present data points are linked to information captured a
long time ago.
• ARCH(q): $\varepsilon_t$ denotes the error terms, which in our case are the series terms. These
$\varepsilon_t$ are divided into two pieces: a stochastic component $z_t$ and a time-dependent
standard deviation $\sigma_t$,
\[
\varepsilon_t = \sigma_t z_t. \tag{10.2.20}
\]
The random variable $z_t$ is a strong white noise process. The series $\sigma_t^2$ is
formalised as follows:
\[
\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, \tag{10.2.21}
\]
where $\alpha_0 > 0$ and $\alpha_i \geq 0$, $i > 0$.


It is possible to adjust an ARCH(q) model on a data set implementing an
ordinary least squares approach (see previous section). Zaki (2000) designed a
methodology to test for the lag length of ARCH errors relying on the Lagrange
multiplier, proceeding as follows (a code sketch of this procedure is given after this list of models):
1. Fit the autoregressive model AR(q): $y_t = a_0 + a_1 y_{t-1} + \cdots + a_q y_{t-q} + \varepsilon_t = a_0 + \sum_{i=1}^{q} a_i y_{t-i} + \varepsilon_t$.
2. Regress $\hat\varepsilon_t^2$ on an intercept and $q$ lagged values:
\[
\hat\varepsilon_t^2 = \hat\alpha_0 + \sum_{i=1}^{q} \hat\alpha_i \hat\varepsilon_{t-i}^2, \tag{10.2.22}
\]
where $q$ is the ARCH lag length.
3. The null hypothesis corresponds to $\alpha_i = 0$ for all $i = 1, \ldots, q$, i.e., there are no
ARCH components. On the contrary, the alternative hypothesis states that we
are in the presence of an ARCH effect if at least one $\alpha_i$ is significant.
Considering a sample of $n$ residuals and a null hypothesis of no ARCH
errors, the test statistic $NR^2$ follows a $\chi^2$ distribution with $q$ degrees of
freedom, where $N$ represents the number of equations fitting the residuals vs
the lags (i.e. $N = n - q$). If $NR^2 > \chi^2_q$, the null hypothesis is rejected. This
rejection leads to the conclusion that there is an ARCH effect in the ARMA
model. If $NR^2$ is lower than the $\chi^2_q$ table value, the null hypothesis is accepted.
• GARCH: taking the ARCH model above, if an ARMA model characterises
the error variance, a generalised autoregressive conditional heteroskedasticity
(GARCH) model (Bollerslev 1986) is obtained.
In that case, the GARCH(p, q) model (where p is the order of the GARCH
terms $\sigma^2$ and q is the order of the ARCH terms $\varepsilon^2$) is given by
\[
\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2. \tag{10.2.23}
\]
To test for heteroskedasticity in econometric models, the White (1980) test is
usually implemented. However, when dealing with time series data, this means
testing for ARCH errors (as described above) and GARCH errors (below).

The lag length $p$ of a GARCH(p, q) process is established in three steps:
1. Adjust the AR(q) model
\[
y_t = a_0 + a_1 y_{t-1} + \cdots + a_q y_{t-q} + \varepsilon_t = a_0 + \sum_{i=1}^{q} a_i y_{t-i} + \varepsilon_t. \tag{10.2.24}
\]
2. Evaluate the autocorrelations of $\varepsilon^2$ by
\[
\rho(i) = \frac{\sum_{t=i+1}^{T} (\hat\varepsilon_t^2 - \hat\sigma_t^2)(\hat\varepsilon_{t-i}^2 - \hat\sigma_{t-i}^2)}{\sum_{t=1}^{T} (\hat\varepsilon_t^2 - \hat\sigma_t^2)^2}. \tag{10.2.25}
\]
The asymptotic standard deviation of $\rho(i)$ is $\frac{1}{\sqrt{T}}$; autocorrelations larger than this threshold indicate the presence
of GARCH errors. The total number of lags is obtained iteratively using the
Ljung–Box Q-test (Box and Pierce, 1970). The Ljung–Box Q-statistic follows
a $\chi^2$ distribution with $n$ degrees of freedom assuming the squared residuals
$\varepsilon_t^2$ are uncorrelated. It is recommended to consider up to $T/4$ values of $n$. The
null hypothesis of the test considers that there are no ARCH or GARCH errors.
A rejection of the null leads to the conclusion that such errors exist in the
conditional variance.
– NGARCH: Engle and Ng (1991) introduced a non-linear GARCH (NGARCH),
also known as the non-linear asymmetric GARCH(1,1) (NAGARCH):
\[
\sigma_t^2 = \omega + \alpha (\varepsilon_{t-1} - \theta \sigma_{t-1})^2 + \beta \sigma_{t-1}^2, \tag{10.2.26}
\]
where $\alpha, \beta \geq 0$ and $\omega > 0$.

– IGARCH: in the integrated generalised autoregressive conditional
heteroskedasticity model the persistent parameters sum up to one, which brings
a unit root into the GARCH process. The condition for this is
\[
\sum_{i=1}^{p} \beta_i + \sum_{i=1}^{q} \alpha_i = 1. \tag{10.2.27}
\]

– EGARCH (Nelson, 1991): the exponential generalised autoregressive
conditional heteroskedasticity model is another form of the GARCH model. The
EGARCH(p, q) is characterised by:
\[
\log \sigma_t^2 = \omega + \sum_{k=1}^{q} \beta_k \, g(Z_{t-k}) + \sum_{k=1}^{p} \alpha_k \log \sigma_{t-k}^2, \tag{10.2.28}
\]
where $g(Z_t) = \theta Z_t + \lambda (|Z_t| - E(|Z_t|))$, $\sigma_t^2$ is the conditional variance, $\omega$, $\beta$,
$\alpha$, $\theta$ and $\lambda$ are coefficients and $Z_t$ is a representation of the error term which
may take multiple forms. $g(Z_t)$ allows the sign and the magnitude of $Z_t$ to have
different effects on the volatility.
Remark 10.2.2 As $\log \sigma_t^2$ can take negative values, the restrictions on the parameters are limited.
– GARCH-in-mean (Kroner and Lastrapes 1993): in this model a heteroskedasticity
term is added in the mean equation of the GARCH, such that
\[
y_t = \beta x_t + \lambda \sigma_t + \varepsilon_t, \tag{10.2.29}
\]
where $\varepsilon_t$ is still the error term.

– Asymmetric GARCH:
QGARCH (Sentana, 1995): the quadratic GARCH (QGARCH) model is
particularly useful for scenario analysis as it captures asymmetric effects
of positive and negative shocks. In the example of a GARCH(1,1) model,
the residual process is
\[
\varepsilon_t = \sigma_t z_t, \tag{10.2.30}
\]
where $z_t$ is i.i.d. and
\[
\sigma_t^2 = K + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 + \phi \varepsilon_{t-1}. \tag{10.2.31}
\]
GJR-GARCH (Glosten et al., 1993): the Glosten–Jagannathan–Runkle
GARCH version also models asymmetry in the ARCH. As previously,
$\varepsilon_t = \sigma_t z_t$ where $z_t$ is i.i.d., but
\[
\sigma_t^2 = K + \delta \sigma_{t-1}^2 + \alpha \varepsilon_{t-1}^2 + \phi \varepsilon_{t-1}^2 I_{t-1}, \tag{10.2.32}
\]
where $I_{t-1} = 0$ if $\varepsilon_{t-1} \geq 0$, and $I_{t-1} = 1$ if $\varepsilon_{t-1} < 0$.



TGARCH model (Rabemananjara and Zakoian, 1993): the threshold
GARCH uses the conditional standard deviation instead of the conditional
variance:
\[
\sigma_t = K + \delta \sigma_{t-1} + \alpha_1^{+} \varepsilon_{t-1}^{+} + \alpha_1^{-} \varepsilon_{t-1}^{-}, \tag{10.2.33}
\]
where $\varepsilon_{t-1}^{+} = \varepsilon_{t-1}$ if $\varepsilon_{t-1} > 0$, and $\varepsilon_{t-1}^{+} = 0$ if $\varepsilon_{t-1} \leq 0$. Likewise,
$\varepsilon_{t-1}^{-} = \varepsilon_{t-1}$ if $\varepsilon_{t-1} \leq 0$, and $\varepsilon_{t-1}^{-} = 0$ if $\varepsilon_{t-1} > 0$.
– the Gegenbauer process (Gray et al., 1989):
\[
f(X_{t-1,\ldots}) = \sum_{j=1}^{\infty} \psi_j \varepsilon_{t-j}, \tag{10.2.34}
\]
where the $\psi_j$ are the Gegenbauer polynomials, which may be represented as follows:
\[
\psi_j = \sum_{k=0}^{[j/2]} \frac{(-1)^k \, \Gamma(d + j - k)\,(2u)^{j-2k}}{\Gamma(d)\,\Gamma(k+1)\,\Gamma(j - 2k + 1)},
\]
where $\Gamma$ represents the Gamma function, and $d$ and $u$ are real numbers to be estimated,
such that $0 < d < 1/2$ and $|u| < 1$ to ensure stationarity. When
$u = 1$, we obtain the AutoRegressive Fractionally Integrated (ARFI) model
(Guégan, 2003; Palma, 2007), or the Fractionally Integrated (FI(d)) model without
autoregressive terms.
Remark 10.2.3 The fGARCH model (Hentschel, 1995) combines other GARCH
models in its construction, making it potentially useful when we want to test
multiple approaches simultaneously.
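As announced in the ARCH(q) item above, the following minimal Python sketch (assuming only the numpy and scipy libraries, and generating an artificial ARCH(1) series purely for the example) reproduces the Lagrange multiplier steps: fit an AR model by ordinary least squares, regress the squared residuals on their lags and compare $NR^2$ with the $\chi^2_q$ critical value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Artificial ARCH(1) series: sigma_t^2 = 0.2 + 0.6 * eps_{t-1}^2
n = 2000
eps = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.2 + 0.6 * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2) * rng.normal()
y = eps  # here the series itself plays the role of the error terms

def lagged_design(x, q):
    # Build a design matrix [1, x_{t-1}, ..., x_{t-q}] aligned with the target x_t
    X = np.column_stack([np.ones(len(x) - q)] + [x[q - i:len(x) - i] for i in range(1, q + 1)])
    return X, x[q:]

# Step 1: fit an AR(q) model by OLS and keep the residuals
q = 1
X, target = lagged_design(y, q)
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
resid = target - X @ beta

# Step 2: regress the squared residuals on an intercept and q lagged squared residuals
X2, target2 = lagged_design(resid ** 2, q)
alpha, *_ = np.linalg.lstsq(X2, target2, rcond=None)
fitted2 = X2 @ alpha
r2 = 1.0 - np.sum((target2 - fitted2) ** 2) / np.sum((target2 - target2.mean()) ** 2)

# Step 3: compare N * R^2 with the chi-square critical value (q degrees of freedom)
N = len(target2)
lm_stat = N * r2
crit = stats.chi2.ppf(0.95, df=q)
print(f"NR^2 = {lm_stat:.2f}, chi2(0.95, {q}) = {crit:.2f} ->",
      "ARCH effect detected" if lm_stat > crit else "no ARCH effect detected")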

10.3 Application

In this section, we illustrate some of the models presented in the previous section as
well as some of their properties. We start from Fig. 10.1, which represents an autocorrelation
function (ACF); it presents a rapid decay towards zero, characterising
an autoregressive process.
Figure 10.2 exhibits an AR(2) process with two parameters $\varphi_1 = 0.5$ and $\varphi_2 = 0.4$ which ensure the stationarity of the underlying model. In that case, the event
occurring in $X_t$ is related to the two previous occurrences recorded in $X_{t-1}$ and $X_{t-2}$.
In real-life applications, losses generated by identical generating processes usually
lead to that kind of situation. It is also important to note that even if the series is
very volatile, it may still be stationary as long as its moments remain stable
over time.
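To give a flavour of how the path in Fig. 10.2 can be generated, here is a minimal Python sketch assuming only numpy; the parameter values mirror those of the figure, and the burn-in length is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(3)
phi1, phi2, c = 0.5, 0.4, 0.0     # AR(2) parameters, as in Fig. 10.2
n, burn = 1000, 200               # sample size and burn-in to remove the initial transient

x = np.zeros(n + burn)
eps = rng.normal(size=n + burn)   # Gaussian white noise
for t in range(2, n + burn):
    x[t] = c + phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]

x = x[burn:]                       # discard the burn-in
print("empirical mean:", x.mean().round(3), "empirical std:", x.std().round(3))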

1.0
0.8
0.6 ACF Weekly Aggregated Series
ACF
0.4
0.2
0.0
−0.2

0 1 2 3 4 5 6 7
Lag

Fig. 10.1 This figure represents an autocorrelation function (ACF). It presents a rapid
decay towards zero, characterising an autoregressive process

[Figure 10.2 plot: simulated AR(2) path with φ1 = 0.5 and φ2 = 0.4, x plotted against Time.]

Fig. 10.2 This figure exhibits an AR(2) process with two parameters $\varphi_1 = 0.5$ and $\varphi_2 = 0.4$
which ensure the stationarity of the underlying model. In that case, the event occurring in $X_t$ is
related to the two previous occurrences recorded in $X_{t-1}$ and $X_{t-2}$

Figure 10.3 represents an ARIMA process, i.e., a process that contains an
integrated autoregressive model and a MA process. Once again, though the path
seems erratic, the series becomes stationary after differencing.
Figure 10.4 presents the ACF and the PACF of an AR(2) process: the top
quadrant exhibits an ACF plot quickly decreasing to zero, denoting an autoregressive
process, and the bottom quadrant exhibits the partial autocorrelation function (PACF)
of the series, showing the order of the process. Indeed, only the first two lags are
significantly different from zero.
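The ACF and PACF of such a series can be computed numerically as well; the sketch below (assuming numpy and statsmodels are available, and re-simulating an AR(2) path as above) flags which partial autocorrelations exceed an approximate 95% white-noise band.

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(3)
n, burn = 1000, 200
x = np.zeros(n + burn)
eps = rng.normal(size=n + burn)
for t in range(2, n + burn):
    x[t] = 0.5 * x[t - 1] + 0.4 * x[t - 2] + eps[t]   # AR(2) path as in Fig. 10.2
x = x[burn:]

acf_vals = acf(x, nlags=20)
pacf_vals = pacf(x, nlags=20)
conf_band = 1.96 / np.sqrt(len(x))    # approximate 95% band for white noise

significant = [lag for lag in range(1, 21) if abs(pacf_vals[lag]) > conf_band]
print("significant PACF lags:", significant)   # an AR(2) should essentially flag lags 1 and 2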

[Figure 10.3 plot: simulated ARIMA(1, 1, 1) path with φ = 0.5 and θ = 0.5, x plotted against Time.]

Fig. 10.3 This figure illustrates an ARIMA process, i.e., a process that contains an integrated
autoregressive model and an MA process

[Figure 10.4 plot: ACF and PACF of the simulated series (Series: x), plotted against LAG.]

Fig. 10.4 This figure presents the ACF and the PACF of an AR(2) process: the top quadrant
exhibits an ACF plot quickly decreasing to zero, denoting an autoregressive process, and the bottom
quadrant exhibits the partial autocorrelation function (PACF) of the series, showing the order of
the process

The PACF in Fig. 10.5 is representative of some risk data. It indicates
the presence of long memory, i.e., the loss $X_t$ is related to events
which occurred in the past (more than a few lags ago, such as 10, for example). In that
figure, we see that parameters are still significant more than a hundred periods from
the last data point. Note that this series has been tested for seasonality and none has
been found, therefore this possibility has been ruled out.
Figure 10.6 provides the analysis of the residuals, showing their evolution over
time, and demonstrating their stationarity. The residuals are independent according

[Figure 10.5 plot: PACF of the weekly aggregated series, Partial ACF plotted against Lag.]

Fig. 10.5 The PACF represented here exhibits the presence of long memory, i.e., the loss Xt is
related to events which occurred a long time ago

[Figure 10.6 panels: standardized residuals over Time; ACF of residuals; normal Q–Q plot of standardized residuals; p values for the Ljung–Box statistic by lag.]

Fig. 10.6 Following the adjustment of a SARIMA model to macro-economic data (selected for
illustration purposes), this figure provides the analysis of the residuals, showing their evolution over
time and demonstrating their stationarity. The residuals are independent according to the ACF,
the QQ-plot suggests that the residuals are normally distributed, and the Ljung–Box statistic
provides evidence that the data are independent

to the ACF, and the QQ-plot suggests that the residuals are normally distributed.
The Ljung–Box statistic provides evidence that the data are independent.
Time series are particularly interesting because, once it has been established that $X_t$
is related to past incidents and we are interested in a particular scenario,
the scenarios can be analysed by shocking the time series, the parameters or the
distribution representing the residuals.

Indeed, leveraging the strategies presented in previous chapters, such as changing
the distribution of $\varepsilon_t$ from a Gaussian to a more fat-tailed distribution, we would
be able to capture asymmetric and/or more extreme behaviours. (Note that it is
necessary to transform the residuals so that they have a mean equal to zero.)
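As a hedged sketch of this idea (assuming only numpy, and using an illustrative AR(1) dynamic with arbitrary, non-estimated parameters), the snippet below simulates forward paths twice, once with Gaussian innovations and once with centred, rescaled Student-t innovations, so that the impact of the distributional shock on extreme outcomes can be compared.

import numpy as np

rng = np.random.default_rng(4)
phi, c, sigma = 0.7, 0.0, 1.0        # illustrative AR(1) parameters (assumed, not estimated here)
horizon, n_paths, x0 = 52, 10_000, 0.0

def simulate(innovations):
    # Simulate AR(1) forward paths given a matrix of innovations (n_paths x horizon)
    paths = np.full(n_paths, x0, dtype=float)
    maxima = paths.copy()
    for h in range(horizon):
        paths = c + phi * paths + innovations[:, h]
        maxima = np.maximum(maxima, paths)
    return maxima

# Baseline: Gaussian residuals
gauss = rng.normal(scale=sigma, size=(n_paths, horizon))

# Scenario: Student-t residuals (3 degrees of freedom), rescaled and centred to mean zero
nu = 3
student = rng.standard_t(nu, size=(n_paths, horizon))
student = (student - student.mean()) * sigma / student.std()

q = 0.999
print("99.9% quantile of path maxima, Gaussian :", np.quantile(simulate(gauss), q).round(2))
print("99.9% quantile of path maxima, Student-t:", np.quantile(simulate(student), q).round(2))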
Besides, the multiple processes presented in this chapter, such as those capturing
the intrinsic behaviour of the data or of the residuals, allow modelling changes in risk patterns,
i.e., the fact that these evolve over time. As stated in the first chapter, risks, as well as
the scenarios reflecting them, are living organisms. They are in perpetual motion;
therefore, depending on the risk to be modelled, multiple combinations of the
previous models are possible, and these may help capture multiple risk behaviours
simultaneously, and can therefore be a powerful tool to analyse scenarios.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control, 19(6), 716–723.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econo-
metrics, 31(3), 307–327.
Box, G., & Jenkins, G. (1970). Time series analysis: Forecasting and control. San Francisco, CA:
Holden-Day.
Box, G. E. P., & Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-
integrated moving average time series models. Journal of the American Statistical Association,
65, 1509–1526.
Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis:
Forecasting and control. New York: Wiley.
Cameron, S. (2005). Making regression analysis more useful, II. Econometrics. Maidenhead:
McGraw Hill Higher Education.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series
with a unit root. Journal of the American Statistical Association, 74(366), 427–431.
Dickey, D. A., & Said, S. E. (1984). Testing for unit roots in autoregressive-moving average models
of unknown order. Biometrika, 71(366), 599–607.
Engle, R. F., & Granger, C. W. J. (1987). Co-integration and error correction: Representation,
estimation and testing. Econometrica, 55(2), 251–276.
Engle, R. F., & Ng, V. K. (1991). Measuring and testing the impact of news on volatility. Journal
of Finance, 48(5), 1749–1778.
Gershenfeld, N. (1999). The nature of mathematical modeling. New York: Cambridge University
Press.
Glosten, L. R., Jagannathan, D. E., & Runkle, D. E. (1993). On the relation between the expected
value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48(5),
1779–1801.
Gray, H., Zhang, N., & Woodward, W. (1989). On generalized fractional processes. Journal of
Time Series Analysis, 10, 233–257.
Guégan, D. (2003). Les chaos en finance. Approche statistique. Paris: Economica.
Hamilton, J. D. (1994). Time series analysis (Vol. 2). Princeton: Princeton University Press.
Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal
of the Royal Statistical Society, Series B, 41(2), 190–195.
Hentschel, L. (1995). All in the family nesting symmetric and asymmetric GARCH models.
Journal of Financial Economics, 39(1), 71–104.
Holland, J. (1992). Adaptation in natural and artificial systems. Cambridge, MA: MIT.

Kroner, K. F., & Lastrapes, W. D. (1993). The impact of exchange rate volatility on international
trade: reduced form estimates using the GARCH-in-mean model. Journal of International
Money and Finance, 12(3), 298–318.
Kwiatkowski, D., Phillips, P. C., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of
stationarity against the alternative of a unit root: How sure are we that economic time series
have a unit root?. Journal of Econometrics, 54(1–3), 159–178.
McCleary, R., Hay, R. A., Meidinger, E. E., & McDowall, D. (1980). Applied time series analysis
for the social sciences. Beverly Hills, CA: Sage.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Economet-
rica, 59(2), 347–370.
Palma, W. (2007). Long-memory time series: Theory and methods. New York: Wiley.
Rabemananjara, R., & Zakoian, J. M. (1993). Threshold ARCH models and asymmetries in
volatility. Journal of Applied Econometrics, 8(1), 31–49.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the
American Statistical Association, 66(336), 846–850.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Sentana, E. (1995). Quadratic ARCH models. The Review of Economic Studies, 62(4), 639–661.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometrica, 48(4), 817–838.
Yule, U., & Walker, G. (1927). On a method of investigating periodicities in disturbed series, with
special reference to Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society
of London, Series A, 226, 267–298.
Zadeh, L. A. (1953). Theory of filtering. Journal of the Society for Industrial and Applied
Mathematics, 1, 35–51.
Zaki, M. J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge
and Data Engineering, 12(3), 372–390.
Chapter 11
Dependencies and Relationships Between
Variables

In this chapter we address the topic of the capture of dependencies, as these are
intrinsically connected to scenario analysis. Indeed, as implied in the previous
chapters, the materialisation of large losses usually results from multiple issues,
faults or failures occurring simultaneously. As seen, in some approaches, the
magnitude of the correlations and the dependencies are not explicitly evaluated
though they are the core of some strategies such as neural networks or Bayesian
networks. Here, we discuss the concepts of correlation and dependencies explicitly,
i.e., these are measured and specific models or functions are built, in order to capture
them and reflect them in risk measurement.
Statistically speaking, a dependence is a relationship between random variables
or data sets (at least two). The related concept of correlation refers to statistical
relationships embedding dependencies. Correlations are useful as they indicate a
relationship that can be exploited in practice for forecasting purposes, for example.
However, statistical dependence does not necessarily imply the presence of a causal
relationship. Besides, issues related to non-linear behaviours may arise. These will
be developed in the following paragraphs.
Formally, dependencies refer to any situation in which random variables do
not satisfy a mathematical condition of probabilistic independence, which may
seem quite obvious, though this definition implies that the emphasis is made on
independence, therefore if the variables are not independent, these are somehow
dependent. The literature includes several correlation measures and coefficients
(usually denoted $\rho$ or $r$) allowing the evaluation of the degree of these relationships. The
most famous of these is the Pearson (1900) correlation coefficient, which captures linear
relationships between two variables. This measure is usually what practitioners
and risk managers have in mind when the question of correlation is addressed;
the related coefficient takes its values between $-1$ and $1$. Other correlation
coefficients have been developed to address issues related to the Pearson approach,
such as the capture of non-linear relationships and the correlations between more
than two factors simultaneously.


In this chapter, we will present the theoretical foundations of the various concepts
surrounding dependencies, from correlations to copulas and regressions, as well
as the characteristics and properties which may help practitioners analyse risk
scenarios. We will also illustrate them with figures and examples.

11.1 Dependencies, Correlations and Copulas

11.1.1 Correlations Measures

Starting with the theoretical foundations of the methodologies related to dependence
measurement, these will be discussed from the most common to the most advanced.
As mentioned before, the most common is Pearson's correlation
coefficient. It is obtained by dividing the covariance of the two variables by the
product of their respective standard deviations.
Mathematically speaking, let $(X, Y)$ be a couple of random variables with
expected values $\mu_X$ and $\mu_Y$, standard deviations $\sigma_X$ and $\sigma_Y$ and covariance $\sigma_{(X,Y)}$;
then the Pearson correlation coefficient is given by:
\[
\rho_{(X,Y)} = \frac{\sigma_{(X,Y)}}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}, \tag{11.1.1}
\]
where $E$ denotes the expected value and $\sigma_{(X,Y)}$ represents the covariance between $X$
and $Y$.
Obviously the Pearson correlation is only defined if both standard deviations
(i.e. the second moments, see Chap. 3) of the random variables exist, are finite
and non-zero. It is also interesting to note that $\rho_{(X,Y)} = \rho_{(Y,X)}$ and consequently the
value is independent of the order of the variables.
The Pearson correlation ranges from +1¹ in the case of a perfect correlation to
$-1$, representative of a perfect anticorrelation, i.e. when X increases, Y decreases in
the same magnitude (Dowdy et al. 2011). All the other values belonging to that range
indicate various degrees of linear dependence between the variables. When the
coefficient is equal to zero the variables are assumed uncorrelated. $\rho > 0$ implies
that the random variables evolve concomitantly, while $\rho < 0$ implies that the
random variables evolve conversely (Fig. 11.1).
A well-known issue related to Pearson's approach can be stated as follows.
Independent variables imply $\rho = 0$, but the converse is not true, as this approach only
captures linear dependencies. As a result, non-linear dependencies are disregarded
and this may lead to dreadful modelling inaccuracies.
A first alternative to Pearson’s correlation is the Spearman correlation coefficient
(Spearman 1904), which is defined as the Pearson correlation coefficient between

1
The Cauchy–Schwarz inequality (Dragomir 2003) implies that this correlation coefficient cannot
exceed 1 in absolute value.

the ranked variables. For example, considering a data sample containing $n$ data
points, the data points $X_i, Y_i$ are ranked and become $x_i, y_i$, and $\rho$ is calculated as
follows:
\[
\rho_{(X,Y)} = 1 - \frac{6 \sum \delta_i^2}{n(n^2 - 1)}, \tag{11.1.2}
\]
where $\delta_i = x_i - y_i$ characterises the difference between ranks. Ties are assigned
a rank equal to the average of their positions in the ascending order of the values. For
example, let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of observations of the joint
random variables $X$ and $Y$, respectively, such that all $x_i$ and $y_i$ are unique. Any
couple $(x_i, y_i)$ and $(x_j, y_j)$ is considered concordant if $x_i > x_j$ and $y_i > y_j$, or if
$x_i < x_j$ and $y_i < y_j$. Conversely, they are discordant if $x_i > x_j$ and $y_i < y_j$, or if
$x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.
In parallel, Kendall's $\tau$ coefficient (Kendall 1938), another alternative to Pearson's
coefficient, is defined as:
\[
\tau = \frac{(\text{number of concordant couples}) - (\text{number of discordant couples})}{\frac{1}{2}\, n(n-1)}. \tag{11.1.3}
\]

The denominator denotes the number of combinations; as a result, $\tau \in [-1, 1]$. If
the two rankings are perfectly matching, the coefficient equals 1; if they are not
matching whatsoever, the coefficient equals $-1$. If $X$ and $Y$ are independent, then
the coefficient tends to zero.
Another alternative is Goodman and Kruskal's $\gamma$ (Goodman and Kruskal 1954),
also measuring the rank correlation. The quantity $G$ presented below is an estimate
of $\gamma$. This requires $N_s$, the number of concordant pairs, and $N_d$, the number of discordant
couples. Note that “ties” are not considered and are therefore dropped. Then
\[
G = \frac{N_s - N_d}{N_s + N_d}. \tag{11.1.4}
\]
$G$ can be seen as the maximum likelihood estimator for $\gamma$:
\[
\gamma = \frac{P_s - P_d}{P_s + P_d}, \tag{11.1.5}
\]
where $P_s$ and $P_d$ are the probabilities that a random couple of observations will
position itself in the same or opposite order, respectively, when ranked by both
variables.
Critical values for the $\gamma$ statistic are obtained using the Student $t$ distribution, as
follows:
\[
t \approx G \sqrt{\frac{N_s + N_d}{n(1 - G^2)}}, \tag{11.1.6}
\]

where $n$ is the number of observations; note that, in general,
\[
n \neq N_s + N_d. \tag{11.1.7}
\]
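The Pearson, Spearman and Kendall measures are all available in standard libraries; the following minimal Python sketch (assuming numpy and scipy, with an artificial monotonic but non-linear relationship generated for the example) contrasts them. Goodman and Kruskal's $\gamma$ is not part of scipy and is therefore omitted here.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = np.exp(x) + 0.1 * rng.normal(size=500)    # monotonic, non-linear link

pearson, _ = stats.pearsonr(x, y)
spearman, _ = stats.spearmanr(x, y)
kendall, _ = stats.kendalltau(x, y)

# Rank-based measures recover the monotonic dependence better than Pearson does
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
print(f"Kendall:  {kendall:.3f}")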

11.1.2 Regression

While in the first section we were discussing the measurement of correlations, in
this section we discuss another way of capturing dependencies, analysing the
influence of one variable on another, for forecasting or prediction purposes (for instance):
the regressions (Mosteller and Tukey 1977; Chatterjee and Hadi 2015).
Therefore, regression analysis aims at statistically estimating relationships
between variables. Many techniques are available to capture and to analyse
relationships between a dependent variable and one or more independent variables.2
Regression permits analysing how a dependent variable evolves when any of
the independent variables varies, while other independent variables remain fixed.
Besides, regression analysis allows estimating the conditional expectation of the
dependent variable given the independent variables, the quantile, or the conditional
distribution of the dependent variable given the independent variables. The objective
function, i.e., the regression function, is expressed as a function of independent
variables which can in turn be represented by a probability distribution (Figs. 11.1,
11.2, 11.3, and 11.4).

[Figure 11.1 plot: “Front Office Risk Analysis” correlation circles for Bonus, Income, Office Hours, Market Volume, Losses, Adventurous Positions, Number of People on the Desk, Desk Volume, Experience, Economics and Controls.]

Fig. 11.1 This figure shows correlations pair by pair. The circles represent the magnitude of
the correlations. This is equivalent to a correlation matrix, providing a representation of the
Pearson correlations. This figure allows analysing pairwise correlations between various elements
related to a rogue trading event in the front office

2
Sometimes called predictors.

[Figure 11.2 plot: scatterplot of Losses against Controls.]

Fig. 11.2 This figure is a scatterplot representing losses with respect to controls. Here, we have
the expected behaviour, i.e., the level of losses decreases when the level of controls increases

[Figure 11.3 plot: 3D scatterplot of Desk Volume against Office Hours and Controls.]

Fig. 11.3 This figure is similar to Fig. 11.2, i.e., this is a scatterplot, though compared to the
previous figure, this one represents three variables

Regression analysis is particularly interesting for forecasting purposes. Note that


(once again) these strategies belong to the field of machine learning. Regression
analysis allows detecting the independent variables related to the dependent variable
and these relationships may be causal. But this methodology is to be implemented
with caution as it is easy to obtain spurious correlations and interpret them as if they
were real. Besides, we need to bear in mind that correlation does not necessarily
imply causation.
As mentioned previously, many regression techniques exist. These can be split
into two families: the parametric methods which rely on a set of parameters to be
estimated from the data (e.g. the linear regression or the ordinary least squares

[Figure 11.4 plot: scatterplot matrix of Office Hours, Losses, Adventurous Positions and Number of People on the Desk.]

Fig. 11.4 This figure illustrates a scatter plot matrix, plotting pairwise relationships between
components of rogue trading issues

regression), and the non-parametric regressions which rely on a specified set of
functions.
The most useful models are described in the following paragraphs; these may all
be formally expressed as
\[
Y \approx f(X, \beta), \tag{11.1.8}
\]
where $Y$ is the dependent variable, $X$ represents the independent variables and $\beta$
characterises a set of parameters. Besides, the approximation symbol denotes the
presence of an error term. The approximation is usually mathematically formalised
as $E(Y|X) = f(X, \beta)$. To implement the regression analysis, the function $f$ must be
specified, as it characterises the relationship between $Y$ and $X$, which does
not rely on the data. If unknown, $f$ is chosen according to other criteria, such as its
propensity to capture and mimic a desired pattern.
To be applicable, the data must be sufficient, i.e., the number of data points ($n$)
has to be greater than the number of parameters ($k$) to be estimated. If $n < k$, the
model is underdetermined. If $n = k$ and $f$ is linear, the problem is reduced to
solving a set of $n$ equations with $n$ unknown variables, which has a unique solution.3
However, if $f$ is non-linear, the system may not have any solution or, on the contrary,
we may have many solutions. If $n > k$, then we have enough information to robustly
estimate a unique value for $\beta$; the regression model is overdetermined.
Minimising the distance between the measured and the predicted values of
the dependent variable $Y$ (least squares minimisation) with respect to $\beta$ is one
of the most common ways to estimate these parameters. Note that under certain

3
The factors have to be linearly independent.

statistical assumptions, the model uses the surplus of information to provide
statistical information about the unknown parameters $\beta$ and the predicted values
of the dependent variable $Y$, such as confidence intervals.
The main assumptions in simple regression analysis, which are common to all
strategies presented in the following, are:
• The sample is representative of the population.
• The error term is a random variable with a mean equal to zero conditional on the
explanatory variables.
• The independent variables are measured with no error.
• The independent variables are linearly independent.
• The errors are uncorrelated.
• The error terms are homoskedastic (the variance is constant over time).
Now that the main features of regressions have been presented, we will present
some particular cases that might be useful in practice. The first model presented
is the linear regression. This model presents itself in a form where the dependent
variable, $y_i$, is a linear combination of the parameters. For example, in a simple
linear regression to model $n$ data points, there is one independent variable, $x_i$, and
two parameters, $\beta_0$ (the intercept) and $\beta_1$:
• Affine function (Figs. 11.2 and 11.3):
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n. \tag{11.1.9}
\]
Adding a term in $x_i^2$ to the previous equation, we obtain
• Parabolic function:
\[
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \quad i = 1, \ldots, n. \tag{11.1.10}
\]
The expression is still linear in the parameters but is now quadratic in $x_i$. In both cases, $\varepsilon_i$ is an error
term and the subscript $i$ refers to a particular observation. Multiple linear regressions
are built the same way; however, these contain several independent variables or
functions of independent variables.
Fitting the first model to some data, we obtain $\hat\beta_0$ and $\hat\beta_1$, the estimates,
respectively, of $\beta_0$ and $\beta_1$. Equation (11.1.9) becomes
\[
\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i. \tag{11.1.11}
\]

The residuals, represented by $e_i = y_i - \hat{y}_i$, are the difference between the value
of the dependent variable predicted by the model, $\hat{y}_i$, and the true value of the
dependent variable, $y_i$. As mentioned above, the popular method of estimation for
these cases, the ordinary least squares, relies on the minimisation of the squared
residuals, formalised as follows (Kutner et al. 2004):
\[
SSE = \sum_{i=1}^{n} e_i^2. \tag{11.1.12}
\]
A set of linear equations in the parameters is solved to obtain $\hat\beta_0, \hat\beta_1$. For a simple
affine regression, the least squares estimates are given by
\[
\hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}, \tag{11.1.13}
\]
where $\bar{x}$ represents the mean of the $x_i$ values, and $\bar{y}$ the mean of the $y_i$ values. The
estimate of the variance of the error terms is given by the mean square error (MSE):
\[
\hat\sigma_\varepsilon^2 = \frac{SSE}{n - p}, \tag{11.1.14}
\]
where $p$ represents the number of regressors. The denominator is replaced by $(n - p - 1)$ if an intercept is used. The standard errors are given by
\[
\hat\sigma_{\beta_0} = \hat\sigma_\varepsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}, \tag{11.1.15}
\]
\[
\hat\sigma_{\beta_1} = \hat\sigma_\varepsilon \sqrt{\frac{1}{\sum (x_i - \bar{x})^2}}. \tag{11.1.16}
\]
These can be used to create confidence intervals and test the parameters.
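The closed-form expressions (11.1.13)–(11.1.16) translate directly into code; the sketch below (assuming only numpy, with artificial data whose true coefficients are arbitrary choices for the example) computes the estimates and their standard errors.

import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # true beta0 = 2.0, beta1 = 0.5

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)

beta1 = np.sum((x - x_bar) * (y - y_bar)) / sxx      # Eq. (11.1.13)
beta0 = y_bar - beta1 * x_bar

resid = y - (beta0 + beta1 * x)
sse = np.sum(resid ** 2)                             # Eq. (11.1.12)
sigma2 = sse / (n - 2)                               # one regressor plus an intercept

se_beta0 = np.sqrt(sigma2) * np.sqrt(1.0 / n + x_bar ** 2 / sxx)   # Eq. (11.1.15)
se_beta1 = np.sqrt(sigma2) * np.sqrt(1.0 / sxx)                     # Eq. (11.1.16)

print(f"beta0 = {beta0:.3f} (se {se_beta0:.3f}), beta1 = {beta1:.3f} (se {se_beta1:.3f})")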
The previous regression models can be generalised. Indeed, the general multiple
regression model contains $p$ independent variables:
\[
y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \tag{11.1.17}
\]
where $x_{ij}$ is the $i$th observation on the $j$th independent variable. The residuals can be
written as
\[
\hat\varepsilon_i = y_i - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}. \tag{11.1.18}
\]

Another very popular regression widely used in risk management is the logistic
regression, which has a categorical dependent variable (Cox 1958; Freedman
2009). The logistic model is used to estimate the probability of a binary response
(i.e., 0 or 1) based on some predictor(s).
The logistic regression measures the relationship between the categorical depen-
dent variable and some independent variable(s), estimating the probabilities using

the c.d.f. of the logistic distribution. The residuals of this model are logistically
distributed.
The logistic regression is a particular case of the generalised linear model and is
thus analogous to the linear regression presented earlier. However, the underlying
assumptions are different from those of the linear regression. Indeed, the conditional
distribution $y \mid x$ is a Bernoulli distribution rather than a Gaussian distribution,
because the dependent variable is binary, and the predicted values are probabilities
and are therefore restricted to the interval $[0, 1]$.
The logistic regression can be binomial, ordinal or multinomial. In a binomial
logistic regression only two possible outcomes can be observed for a dependent
variable. In a multinomial logistic regression we may have more than two possible
outcomes. In an ordinal logistic regression the dependent variables are ordered.
The logistic regression is traditionally used to predict the odds of obtaining “true”
(1) to the binary question based on the values of the independent variables. The odds
are given by the ratio of the probability of obtaining a positive outcome to the
probability of obtaining “false” (0).
As implied previously, here, most assumptions of the linear regression do not
hold. Indeed, the residuals cannot be normally distributed. Furthermore, linear
regression may lead to predictions making no sense for a binary dependent variable.
To convert a binary variable into a continuous one which may take any real value,
the logistic regression uses the odds of the event happening for different levels of
each independent variable, the ratio of those odds and then takes the logarithm of
that ratio. This function is usually referred to as logit.
The logit function is then fitted to the predictors using linear regression analysis.
The predicted value of the logit is then transformed into predicted odds using
the inverse of the natural logarithm, i.e., the exponential function. Although the
observed dependent variable in a logistic regression is a binary variable, the related
odds are continuous.
The logistic regression can be translated into finding the set of $\beta$ parameters that
best fit:
\[
y = 1 \quad \text{if } \beta_0 + \beta_1 x + \varepsilon > 0, \tag{11.1.19}
\]
\[
y = 0 \quad \text{otherwise}, \tag{11.1.20}
\]
where $\varepsilon$ is an error following the standard logistic distribution.4
The $\beta$ parameters are usually obtained by maximum likelihood (see Chap. 5).
The logistic function is useful and actually widely used in credit risk measurement:
it can take any input value, the output will always be between zero and one,
and can consequently be interpreted as a probability.

4
The associated latent variable is $y' = \beta_0 + \beta_1 x + \varepsilon$. Note that $\varepsilon$ is not observed; consequently, $y'$
is not observed.

Formalising the concept presented before, the logistic function $\sigma(t)$ is defined as
follows:
\[
\sigma(t) = \frac{e^t}{e^t + 1} = \frac{1}{1 + e^{-t}}. \tag{11.1.21}
\]
Let $t$ be a linear function of a single variable $x$ (the extension to multiple variables
is trivial); then $t$ is
\[
t = \beta_0 + \beta_1 x, \tag{11.1.22}
\]
and the logistic function can be rewritten as
\[
F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}. \tag{11.1.23}
\]
$F(x)$ can be regarded as the probability of the dependent variable equaling a
“success”. The inverse of the logistic function, $g$, is the logit (log odds):
\[
g(F(x)) = \ln\left(\frac{F(x)}{1 - F(x)}\right) = \beta_0 + \beta_1 x, \tag{11.1.24}
\]
and equivalently, after exponentiating both sides:
\[
\frac{F(x)}{1 - F(x)} = e^{\beta_0 + \beta_1 x}. \tag{11.1.25}
\]
$g(\cdot)$ is the logit function. Here $g(F(x))$ is equivalent to the linear regression
expression, $\ln$ denotes the natural logarithm, and $F(x)$ is the probability that the
dependent variable equals “true” given a linear combination of the predictors.
$F(x)$ shows that the probability of the dependent variable representing a success is
equal to the value of the logistic function of the linear regression expression. $\beta_0$ is
the intercept from the linear regression equation (the value of the criterion when
the predictor is equal to zero), $\beta_1$ is the regression coefficient and $e$ denotes the
exponential function.
From the above we can conclude that the odds of the dependent variable leading to
a success are given by
\[
\text{odds} = e^{\beta_0 + \beta_1 x}. \tag{11.1.26}
\]
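A minimal working example of this model follows, assuming the numpy and statsmodels libraries are available and generating artificial binary data from known, arbitrary coefficients purely for illustration; exponentiating the fitted slope recovers the odds multiplier described above.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
true_beta0, true_beta1 = -0.5, 1.2
p = 1.0 / (1.0 + np.exp(-(true_beta0 + true_beta1 * x)))   # logistic probabilities
y = rng.binomial(1, p)                                     # binary outcomes

X = sm.add_constant(x)                 # adds the intercept column
fit = sm.Logit(y, X).fit(disp=0)       # maximum likelihood estimation

beta0_hat, beta1_hat = fit.params
print("estimated betas:", round(beta0_hat, 3), round(beta1_hat, 3))
print("odds multiplier per unit of x:", round(np.exp(beta1_hat), 3))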



11.1.3 Copula

While in the first section we measured the dependence and in the second we
captured the impact of one variable on another, in this section we propose building
multivariate functions.
Following Guegan and Hassani (2013), a robust way to measure the dependence
between large data sets is to compute their joint distribution function using copula
functions. Indeed, a copula is a multivariate distribution function linking large
data sets through their standard uniform marginal distributions (Sklar 1959; Bedford
and Cooke 2001; Berg and Aas 2009). The literature often states that the use
of copulas is complicated in high dimensions except when implementing elliptic
structures (Gaussian or Student) (Gourier et al. 2009). However, they fail to capture
asymmetric shocks. For example, using a Student copula with three degrees of
freedom5 to capture a dependence between the largest losses (as implied by the
regulation (EBA 2014)), would also be translated into higher correlations between
the smallest losses. An alternative is found in Archimedean copulas (Joe 1997)
which are interesting as they are able to capture the dependence embedded in
different parts of the marginal distributions. The marginal distributions might be
those presented in Chap. 5. However, as soon as we are interested in measuring
a dependence between more than two sets (Fig. 11.4), the use of this class of
copulas becomes limited as these are usually driven by a single parameter. Therefore
traditional estimation methods may fail to capture the intensity of the “true”
dependence. Therefore, a large number of multivariate Archimedean structures have
been developed, for instance, the fully nested structures, the partially nested copulas
and the hierarchical ones. Nevertheless, all these structures have restrictions on the
parameters and impose only using an Archimedean copula at each node (junction)
making their use limited in practice. Indeed, the parameters have to decrease as the
level of nesting increases.
An intuitive approach proposed by Joe (1997), based on a pair-copula decom-
position, might be implemented (Kurowicka and Cooke 2004; Dissmann et al.
2013). This approach rewrites the n-density function associated with the n-copula,
as a product of conditional marginal and copula densities. All the conditioning
pair densities are built iteratively to get the final one representing the complete
dependence structure. The approach is easy to implement,6 and has no restriction
for the choice of functions and their parameters. Its only limitation is the number
of decompositions we have to consider as the number of vines grows exponentially
with the dimension of the data sample and thus requires the user to select a vine

5
A low number of degrees of freedom implies a higher dependence in the tail of the marginal
distributions.
6
Recent packages have been developed to carry out this approach, for instance the R package
VineCopula (Schepsmeier et al., https://github.com/tnagler/VineCopula) and the R package vines
(Gonzalez-Fernandez et al., https://github.com/yasserglez/vines).

from $n!/2$ possible vines (Antoch and Hanousek 2000; Bedford and Cooke 2002;
Brechmann et al. 2012; Guégan and Maugis 2011).
To be more accurate, the formal representation of copulas is defined in the
following way. Let $X = [X_1, X_2, \ldots, X_n]$ be a vector of random variables, with joint
distribution $F$ and marginal distributions $F_1, F_2, \ldots, F_n$; then Sklar's (1959) theorem
ensures the existence of a function $C$ mapping the individual distributions $F_1, \ldots, F_n$
to the joint one $F$:
\[
F(x) = C(F_1(x_1), F_2(x_2), \ldots, F_n(x_n)),
\]
where $x = (x_1, x_2, \ldots, x_n)$. We call $C$ a copula.
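To make the idea concrete, the following minimal Python sketch (assuming only numpy and scipy) simulates pairs of uniforms from a bivariate Gaussian copula and from a Clayton copula, the latter through the usual conditional-inverse method; the correlation and the Clayton parameter θ are arbitrary choices for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 5000

# Gaussian copula with correlation rho: correlate two normals, then map through the normal CDF
rho = 0.7
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]], size=n)
u_gauss = stats.norm.cdf(z)            # columns are U(0,1) with Gaussian dependence

# Clayton copula with parameter theta > 0 (lower-tail dependence), conditional-inverse sampling
theta = 2.0
u1 = rng.uniform(size=n)
w = rng.uniform(size=n)
u2 = (u1 ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
u_clayton = np.column_stack([u1, u2])

# Kendall's tau gives a quick check of the dependence intensity of each sample
tau_g, _ = stats.kendalltau(u_gauss[:, 0], u_gauss[:, 1])
tau_c, _ = stats.kendalltau(u_clayton[:, 0], u_clayton[:, 1])
print("Kendall's tau, Gaussian copula sample:", round(tau_g, 3))
print("Kendall's tau, Clayton copula sample: ", round(tau_c, 3))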


The Archimedean nested type is the most intuitive way to build $n$-variate
copulas with bivariate copulas, and consists in composing copulas together, yielding
formulas of the following type for $n = 3$:
\[
F(x_1, x_2, x_3) = C_{\theta_1, \theta_2}(F(x_1), F(x_2), F(x_3)) = C_{\theta_1}\bigl(C_{\theta_2}(F(x_1), F(x_2)), F(x_3)\bigr),
\]
where $\theta_i$, $i = 1, 2$, is the parameter of each copula. This decomposition can be done
several times, allowing the construction of copulas of any dimension under specific constraints
(Figs. 11.5 and 11.6).
To present the vine copula method, we use here the density decomposition and
not the distribution function as before. Denoting $f$ the density function associated
with the distribution $F$, the joint $n$-variate density can be obtained as a product
of conditional densities. For $n = 3$, we have the following decomposition:
\[
f(x_1, x_2, x_3) = f(x_1) \cdot f(x_2 \mid x_1) \cdot f(x_3 \mid x_1, x_2),
\]
where
\[
f(x_2 \mid x_1) = c_{1,2}(F(x_1), F(x_2)) \cdot f(x_2),
\]
and $c_{1,2}(F(x_1), F(x_2))$ is the copula density associated with the copula $C$ which links
the two marginal distributions $F(x_1)$ and $F(x_2)$. With the same notations we have
\[
f(x_3 \mid x_1, x_2) = c_{2,3|1}(F(x_2 \mid x_1), F(x_3 \mid x_1)) \cdot f(x_3 \mid x_1) = c_{2,3|1}(F(x_2 \mid x_1), F(x_3 \mid x_1)) \cdot c_{1,3}(F(x_1), F(x_3)) \cdot f(x_3).
\]
Then,
\[
f(x_1, x_2, x_3) = f(x_1) \cdot f(x_2) \cdot f(x_3) \cdot c_{1,2}(F(x_1), F(x_2)) \cdot c_{1,3}(F(x_1), F(x_3)) \cdot c_{2,3|1}(F(x_2 \mid x_1), F(x_3 \mid x_1)). \tag{11.1.27}
\]

That last formula is called the vine decomposition (Fig. 11.7). Many other decompositions
are possible using different permutations. Details can be found in Berg and
Aas (2009), Guégan and Maugis (2010) and Dissmann et al. (2013).
In the applications below, we focus on these vine copulas and in particular the
D-vine, whose density $f(x_1, \ldots, x_n)$ may be written as
\[
\prod_{k=1}^{n} f(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{\theta, i, i+j \mid i+1, \ldots, i+j-1}\bigl(F(x_i \mid x_{i+1}, \ldots, x_{i+j-1}),\, F(x_{i+j} \mid x_{i+1}, \ldots, x_{i+j-1})\bigr). \tag{11.1.28}
\]
Other vines exist, such as the C-vine:
\[
\prod_{k=1}^{n} f(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{\theta, j, j+i \mid 1, \ldots, j-1}\bigl(F(x_j \mid x_1, \ldots, x_{j-1}),\, F(x_{j+i} \mid x_1, \ldots, x_{j-1})\bigr), \tag{11.1.29}
\]

where index j identifies the trees, while i runs over the edges in each tree (Figs. 11.5,
11.6, and 11.7).

[Figure 11.5 structure: $C_{abcd}(C_{abc}(C_{ab}(u_a, u_b), u_c), u_d)$ built over the uniforms $u_a, u_b, u_c, u_d$.]

Fig. 11.5 Fully nested copula illustration

[Figure 11.6 structure: $C_{abcd}(C_{ab}(u_a, u_b), C_{cd}(u_c, u_d))$ built over the uniforms $u_a, u_b, u_c, u_d$.]

Fig. 11.6 Partially nested copula illustration



[Figure 11.7 structure: $C_{abc}(C_{ab}(u_a, u_b), C_{bc}(u_b, u_c))$ built over the uniforms $u_a, u_b, u_c$.]

Fig. 11.7 Three-dimensional D-vine illustration: it represents another kind of structure we could
have, considering a decomposition similar to (11.1.27), based on the CDFs

[Figure: simulated samples from the Gumbel, Galambos, Gaussian and Clayton copulas; each panel plots Y against X on the unit square.]
ll l
l lll
ll
l
ll
l
lll
l
lll
l ll
ll
l
l l l
lll
ll
ll
lll
l l
llll
ll
lll l
l
l
ll
ll
llll
l
ll
l
lll
lll
l ll l ll ll l llll lll l l ll l lll lll l l
l lll ll lll ll l l ll ll l
ll
ll l
ll
l l
ll
llll l
l ll
llll l
lllll ll ll
l
l ll
llll ll l ll l l
llll l lll l lll l
l ll
l ll ll l
l lll lll
lllll
llllll llll l llll
l l l
l
l
ll l ll
ll
l l
ll
l
lllll ll
ll
ll
l l
lll
l
lll l
l ll l
ll
ll
l
ll l l l
l l l l ll l
ll l
l
llll
l l lllll ll ll
ll
l lll l
l l lll lll l ll
lllll
lll l
lll lll ll l
lll
l ll
lll l lll
ll l
lll llll
l l
lllll
ll
l l ll
lll lll ll llllll ll
ll l ll
l llll
llll ll l
lll ll
llllllllll l
ll
l
lll l
ll
l
lllll llll
ll
ll
ll
l
l l l
ll l
llll
l
lllll
ll llll
l ll
ll
l
ll
l l ll ll l l
l l l lll
l lll llll ll ll
l
ll lll l lll lll l ll ll lll lllll llllll l ll lllll l ll llll l
ll l ll llllll l lll l
lll ll ll
l l l
ll llll
ll
l l l
lll
l l
l l l
ll ll l llll
l llll
llll l
ll lll
l
l
ll lll
ll
l ll
l
l ll ll ll
l
l l lll
l l
lll
ll
l
llll
ll l
ll l
lll
l l l
l lll lllll
l lll
l
ll l l
ll
l
l l ll llllll ll l llll l l l l
l l
0.8

ll l l l ll ll l l l ll l l ll llll
l l l lll l l ll ll
l l ll ll l l lllllll l l
l l l
l llll l ll ll l l ll ll l l
ll lll lllll llll lllll l ll
llll l llll ll ll
l ll
lll ll l l l
l
lll
ll
l l
ll
l ll llll llll
l
ll
ll
lll
lll
l l lllll l lll ll l l l ll lll ll ll
l ll l
ll ll
lll lll l
ll
l
l
lll
l l llll
ll ll l
l l
lllll l
lll ll l llll ll
lll l l l l l
llll lllll ll l llll l l ll lllllll l
llll llllll l llll l ll l llllll lll l llll
l llll l llll ll l ll l ll ll
l ll ll
l lll llll
l ll
ll
llll l
ll
lllllll ll l
llllll llll l
llll
ll
ll
lll
ll lll lll
l
l ll ll
lll
l ll
lll ll l
l ll llll l l
l l
l l lll ll l l
llll l
lll lll lll
l llll
l
l
l
l
ll lll l
l l l llllll
l lll
llll
l l ll
ll
ll ll l
lllll l
l
ll
l
l
llllll
l
l ll
l
l l
l
lll l
lll
l
lll
l l l ll l l
ll
ll
l l l l
l l
llllllll
l
lll
llll
l
l
l
llll ll l
l ll l
ll ll lll
lllllll l ll
lll l l
l
ll
lllllll
l ll
lllll
lll
l
l
l
l
ll
llllll
ll
l
llll
lll
l
l
l
ll
ll
l
l
l
llll
l
l
l
l l
l l
ll
ll l
l
l
l
llll
l ll l
l
l
l l
lllllllll
l
l
l
l
l lll
llll
l
l
l
ll
l l
lll
l
ll
l
llllll l
l
l
lll
l
l
l
l
l
l
l
l
l
ll
l l l l l l llll l
l lllll ll llll l
l ll l l l llll l l l ll ll l l l l l
l l l l l l llll l ll lll ll l l lll llll ll lll l
ll l l l l lllll ll llll
l lll l
l ll l
ll llll
l ll lll llll lll
ll
l l l
l lll lll
ll l llll l
l l l l l l ll l l l l llll ll l
llll l l
ll l
ll l
l llll l
l
llllll ll
l
ll
l ll l l l llll
lll
ll
l
l lll
lll
llllll l lll l ll
ll l
lllllll
l
l llll ll ll
ll l l l ll l
l l
l ll lll ll
l llll lll lllll llllll ll
l llll l ll l l lllllll ll ll ll l
ll l l lllll lll l l
ll
llll lll llllllllll l
l
l ll l
l ll
l
llllllll lllll l l
l l l ll llllll
ll lll
l
l l
llllll
l ll
ll llll
lll
ll
l ll l
lll
l
ll lll ll ll lllll
l lllll l
llll
llll ll llll
ll
l
ll ll
l
l l l l
l ll l
l l
l
l lllll ll
l lll l l l
l l l llll
l ll
l
ll l
lllll l
l llllll ll
l
l
ll
l l ll
ll l l
l
l
ll
llllllllll
l
l
l l
lllllll ll l l
ll
ll
lll
ll
ll
lll
ll
ll
l
l
l l l l
l l l ll
l
ll
l lll
llll
l l l
l
lllll ll
ll lll
ll
l l
l l lll l ll l llll l
l
ll
ll l l
l ll l l l
l l l l
lll lll l
l
ll
ll l l ll l
l
lll
ll l
ll
lll
l
ll
l
llll
l l
l
lll
l l
ll
l
l
l
l
l
l
l
l
l
l
lll
l
ll
l
l
l l
l
l
l
ll ll
l
l
lll
ll
l
l
l
ll
llllll lll
l lll
l
lllll l
l l
l
ll
l ll ll ll
l l
l
l
ll
ll
ll
lll
l
llll lllll lll l l
ll ll ll l l l l
l ll l l l l ll l
ll l ll lll l ll ll l ll ll l l
l l ll l llllll lll llll l ll l lll ll l l l l ll llll ll llll l llll l llll ll l l l l l ll l
ll ll ll l
ll l
l l l l lll l l ll l l
l l ll l ll lll l ll l ll l ll
lllll ll
lll l ll lll ll l
l l ll lllllll l ll lll llll
lll l l
l
lllllll l l l l l
l llllll ll l l
lll l
l ll l lll l l
ll
l l l l
lll l
lllll
lll lll
l llll
lll l llll ll l
ll lll l l l l l lllll l
0.6

lll lll l ll l ll ll l l ll ll llllll l ll


l ll llll l l
ll lll ll l ll
l ll l l lll l
l ll ll l
ll ll llllll l l lllllll ll l ll
lll ll
llll lllll l l
l l
l
llllll llllll l lllllllllll lll ll
llll l ll ll l
llll
l l
ll
l l ll l lll ll l
l l l ll ll l ll ll llll
lll
l
ll ll
lll lll
l l l l lll
l
l
ll ll
l ll
ll
l l l
l l
ll
l llll
ll
l l ll l
l l lll
l ll l l l ll ll llll l
ll l ll l
l l l ll
l l lllll l lll
lll l l
llll l
ll
ll ll ll lll l lll lllll ll l lll ll
llll l l ll
l
ll
l
ll ll ll
l ll
l
l l l
llll
l
ll
llll l
lll
ll
l ll
l l ll l l l ll
l lll
l llll ll lll ll
ll ll lll
l lll lllllll lll ll
l ll llll l l
ll
lll l
ll l l
l ll l ll
ll
l
l
l llllll llllll l ll l l l
l ll ll l
ll l
l l l l
l l l lll
l ll l lll l l ll lll
llll ll ll lll l
l
ll lll ll
l l
l l ll ll l
lllllll lll
l ll lll l l l llll ll l
ll ll
l ll
ll
ll
l l lll
ll lllll ll
l
llllll l
lll l lll l lll
l
ll ll lll
lll l lll
llll
l
l lll
lll l ll llllllll l
ll
l l
l
l l llll l ll
l l l
ll l llll
l
lll
lll l
lll ll l l
l l l ll l ll l l ll ll lll
ll ll ll lll l l ll
l l lll lll ll l l l lll l lll ll ll lll
l ll ll lll ll
ll l ll
l ll
ll l l ll lll ll l l
l l l
Y

l ll l l l l l l l
l ll l ll l ll l l lllll l
ll lllll ll
l l
ll llllll l
ll llll llll ll ll l ll
ll ll ll llll l lllllllll l llllll llll ll llll ll lllll l
ll l llllllll l ll l
lll lll lll l ll l l l lll ll ll ll lll
l l ll
ll
l l ll
l lll
l lll lllll
ll ll llll
l ll
l llllll
l
l
l ll
ll l l
lllll lllllll l llll l l
ll lll l ll l l
l l l ll l ll lllll
ll lllll l llllllll ll lll ll l l l ll l lll l l ll ll llll
l l l l l l l l lll l l ll l l
Y

llll ll l l l l ll ll ll lllll lll ll lll ll l l l ll ll l ll l ll


ll lll ll ll l l l ll
ll
l ll
ll l ll l lll ll
ll lll
l l
l ll l
llll l llllll l l lll
ll ll llllll
l lllll
lll ll lll l ll llll l l ll
llll lllll llll lll ll l l l ll ll ll ll l ll lll
l l
ll lll
l lll
lllll lllll ll
l ll lll
l ll
llll
l
l
l
lll
l
llllll l l
lllll
llll l
lllll llll ll ll
ll
ll ll ll l l l
l l lll lll lll
l ll l
ll ll l lll ll l
l l
ll ll l ll lll
l l l ll ll lllll
l l l ll l
ll l
lll lll llll l l l ll ll l ll lll lll lll ll l llll l
l l
l
l
ll ll
lll ll
ll ll
llll l l ll ll
l
lll l
lll llll ll ll l l l ll ll l l l
ll
l lllll
ll l ll lllllll lllll
l l ll lll ll llllll lllll l
llll ll
ll ll l lllll ll lllll lll ll lll
l l ll
lll
lll l l l
ll
l l l l lll
l
ll
lll
ll ll llll
l
l
ll
ll
llll
l
l
lll
ll
llll llll l
lllll
ll
llll
l
lll
lll ll ll
l llllllll l
ll
ll
l
l l
lll
l
ll ll
l lll
l
l
l
lll l
ll lll l l
l llll l ll l
l l l lll ll ll l l ll ll l l l l ll l l l ll ll l
0.4

ll ll lll l l ll ll ll l l l l l l l l l l
l l
l ll
l
l l lllll ll l llllll lll l l
ll l
l lll l ll l lllll
l
ll
ll
llll lll
l lll l l lll llll
ll
ll
llllllll l l
ll
l l lll
l llll
l l llll ll l l ll
ll ll ll ll ll l ll l
l
lll l l
l ll ll ll lll lllll
l lll
l l
lll
lllll ll
l
lll l
l
l
llll
llll
l
l l
l
ll
lll
lll
l
ll
l
l
l
ll
ll
ll
ll
ll
l ll
llllll
l
lllllll
l
ll
l
ll lllll l
lllll ll
l
lll l ll
ll
l
ll
lll ll l l l l l l ll l l
l ll l l
llllllll
l llll l l lll ll l lll
l
l lll
l ll ll lll llll l l l ll l l l ll llll l l lll ll
llll l ll ll l l ll ll ll l l l ll l ll
ll
l
ll
ll
lll
l
lll
l
lllll l
lll
lll
ll ll l
l l
l l l lll ll ll
l
ll
llll l
ll llllll ll lll l l
ll ll ll lll ll l llll ll
l ll
l ll
ll ll ll ll lll l
l ll ll ll lll
l l lllll ll ll
l l l lll ll l ll l ll ll
llll
ll ll l l l ll l llllll
l
l l
l
ll llll
lll
llll
l lll ll
l
llll
lll
llll
lll
l
l lll lll
l
lll lll
l ll ll
l llll ll ll l l l l
l
llll
lllllll
ll llll l
l
ll l l
l l
ll
l llll l ll
ll ll lll l l ll
ll ll
l l l lll l l ll llllll ll llll ll l l l ll l l ll l ll
lll
l
lllll
ll
l llll l l
ll l
l
ll
lllll
ll
lll
ll ll ll
l ll
lll
llll l
l l
l l
llllllll
lll llll l ll l l ll ll l l l l
lll
l llll ll l ll l lll l l l l l lll ll ll ll l
ll lll l
ll ll ll ll
lll
llll llllll l
llll
l
l
l
l l l
l lll l l
ll llll ll l
ll
ll
l
l ll
l
l
ll l
ll
l l ll
l
l
l
lll
ll
l
l
lll llll l
l l
l l llllll
lll
l
ll
ll l
l l
l l
l l
l
ll
l l llll
l
l l l l
ll ll l l l l
l l
l l ll l ll l ll l
l ll
ll
l
lll
ll
ll
ll
l
l
ll
lll
l
llll
l
l
l
l
l l
l
l
lllll
l
ll
lll
l
l
l
l
l
ll
l
l
l
ll
ll
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
ll
ll
l
l l
l
l
l
l
ll
l
ll
lll
ll l
ll
ll l ll
ll
l
l
ll l
l l l l l
ll ll l l l ll llll ll l
l l ll
l l ll l ll ll
ll lll l ll lll l
l lll l
lll lll ll l ll l l l lll
l lll
l ll l lll l
lll l l
lllll l ll lll l l l ll
l ll l
lllll l
ll
llll
ll
l
ll
lllllllll
lll l
l
llll
l
l lllll
l lll
ll llllll l l lll ll l l l l l l l ll
l
ll l
l lll ll l ll ll l
ll llll l l
lll ll ll ll
lll l lll l l ll ll l l l l
lll llll ll l
l l llllll l ll ll lll ll lll l ll lllll l l ll l ll l l ll
l ll
llll
l l
l llll
ll
l
lll
ll
ll
lll
l
ll
l l l
lll lll
l l
ll
l
l
ll ll l l
l lllll
lll llll ll l ll l ll l l l
l
ll l l
l ll l l llllll ll
lllll
l lllll l
ll ll l
l
lll lll l l
llll
l
ll
l
l
l
l
llllll ll l l lll ll
l ll lll l
l
l
ll ll l lll l l ll l lllll ll
l
llll ll
l ll
l ll
l
llll
l
ll
l
l
ll
ll
l
lll
l
l
l
ll
l
l
l
l
ll
ll
l
ll
ll
l
l
l l
l
l
lll ll
ll
ll lll
lll l
ll l
l
l
l
l l l
ll
l
lll l l
l l ll ll l ll l
l llll l l l ll l l lllll ll ll llll l l l ll ll
l l l lll
l
ll l
l l l l l l
ll l
0.2

ll lll l ll l lll ll ll l ll l ll l l l l l ll l l l l llll l


l lll l ll
ll ll l l
l lllll ll
llll l l
lll
l ll
l lll ll ll l ll
l l ll
l lll
lll
l ll ll
ll
ll lll
ll lllll llll l l ll
l lll
l l lll
ll lll ll ll ll
l l
ll l lll lll llll
ll llllll
ll l
l
l
l
lll
l
ll
l
ll
l
ll
ll
lll
ll
ll
l
l lll
ll
ll
l
ll l
llll
l l
ll
l
llll
l
lll l
ll
l l l lll l l l l
lll ll
llll
l
ll ll ll l ll l l
llll
lll
l l
ll lll
l lllllll
l l ll ll ll
lll l l ll lll l
l ll ll
l
ll l
lll ll ll l l l l
l l l l lllll
l
llll
l l
l
ll
l
ll
l
l
ll
ll
l
l
ll
ll
ll
l
ll
ll ll
lll
l
l l
lll
lllll
l l llllll l l
ll l l l
l
lllll
l l
l
l l ll ll
lll
ll l
l ll l lll ll
l
l lllllll
l
llll l lll ll l
lll ll
l
l llll l
l llll l ll l lll
l ll
lll lll
ll l l
l l l l ll l
lll
lll
lll
l
ll
ll
lll
ll
ll
ll
l
l
l
l
ll
l
ll l
ll
ll
l l
ll
ll
ll
ll
lll
l
lll
ll ll
llllll lll l ll ll
l
ll lll
lllll
ll
l
lll
ll
l
ll
lll
llll l
l l ll ll l l
l
l l llll l l
llllllll ll lll lll ll llllll ll lllll
llll l ll
ll l ll l l l ll l ll l ll ll
lll
l l l
ll
ll
l
ll
l
l
lll
l
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
ll
ll
l
l
ll
ll
l
ll
l l
l
l
ll
l
l
ll
l ll
l lll
l
l
l
l
l
ll l l l
l lll lll llll llll
ll
ll ll
lll
ll ll
l
ll ll ll lll lll l l
l lll lllll ll ll ll l lll llll ll l l
lllll llll l l l ll ll ll l l
l l l l lllll
ll
l l
l
l
l
ll
l
ll
l
lll
l
l
l
lll
l
ll
l
l l
lll ll
lll l ll
l l
llll
l ll
ll l l l l
lll
ll
ll
lllll l ll
ll
l lll l l
l l
ll lll lll llll
lll lll l lll l
llll
l l l
ll ll l ll l
lllll
l lll l l l l ll l
ll l l l l
l ll
l
ll
l
ll
ll
lll
l
l
l l
l
ll
l
l l
ll
l
ll
l
l
ll
l
ll
l
ll l
lll
l ll
ll l lll
llll
ll
l
ll
l l l
l
l lllll
l l
ll
lllll
llll
lll
ll
lll
ll
l
l ll lll lll ll lll l
lllll l ll l l lll ll l ll ll l l ll
l l l l ll
l
ll
l
l
ll
ll
l
ll
l
l
ll
ll
l
ll
l
l
ll
l
l
l
lll
llll
lll
l ll
ll l
l
ll
l ll
l
l
ll
llll
l
ll
ll
l
l
l
l
l
l ll l
l
llll ll
lll ll lllll l l
ll l l ll llll lll llll l ll l ll
l lll l
lll ll ll l l l l ll l ll l ll l ll l ll
lll
l
l
ll
l
l
l
ll
l
ll
l
l
l
l
l
ll
l
l
l
ll
l
l
ll
l
ll
l
l
lll l l
llll
llll
ll
ll
ll
ll
l l lll l
l l
lllll ll ll l lll llllllll
l ll
l
l ll
ll
ll ll lll ll l l ll l l l
lll
l
l
l
ll
l
l
l
l
ll
l
l
ll
lll ll
l l l
0.0

l
l
l
ll
ll
ll
ll
l l
l l
ll
ll ll l
llll
ll ll
l lllll ll ll l ll l lll
l l l ll ll l l l ll ll l l ll l
l
ll
l
l
ll
l
l
l
l
ll
l
ll
l

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
X X

Fig. 11.8 This figure represents four types of copulas. Starting from the top left-hand corner,
the Gumbel copula, an Archimedean copula, is upper tail dependent. The top right-hand
corner copula is the Galambos, an extreme value copula. The bottom left-hand corner represents
the Gaussian copula, belonging to the elliptic family; mathematically, this copula is built from the
multivariate Gaussian distribution by applying the inverse of the standard Gaussian distribution
to the margins. The last one represents the Clayton copula, an Archimedean copula which is lower tail dependent

Some usual copulas (Fig. 11.8) are provided in the following (Ali et al. 1978; Joe
1997; Nelsen 2006):

• Gaussian: $C_{\Sigma}(u) = \Phi_{\Sigma}\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\right)$, $\Sigma$ being a correlation matrix.
• Student-t: $C^{t}_{\Sigma,v}(u) = t_{\Sigma,v}\left(t_{v}^{-1}(u_1), \ldots, t_{v}^{-1}(u_d)\right)$, $\Sigma$ being a correlation matrix and $v$ the number of degrees of freedom.
• Ali–Mikhail–Haq: $\dfrac{uv}{1 - \theta(1-u)(1-v)}$, $\theta \in [-1, 1)$.
• Clayton: $\left[\max\left\{u^{-\theta} + v^{-\theta} - 1;\, 0\right\}\right]^{-1/\theta}$, $\theta \in [-1, \infty) \setminus \{0\}$.
• Frank: $-\dfrac{1}{\theta} \log\left[1 + \dfrac{(\exp(-\theta u) - 1)(\exp(-\theta v) - 1)}{\exp(-\theta) - 1}\right]$, $\theta \in \mathbb{R} \setminus \{0\}$.
• Gumbel: $\exp\left[-\left((-\log(u))^{\theta} + (-\log(v))^{\theta}\right)^{1/\theta}\right]$, $\theta \in [1, \infty)$.
• Joe: $1 - \left[(1-u)^{\theta} + (1-v)^{\theta} - (1-u)^{\theta}(1-v)^{\theta}\right]^{1/\theta}$, $\theta \in [1, \infty)$.
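For readers who wish to experiment with these functional forms, the following sketch evaluates a few of the bivariate copulas listed above. It is an illustrative Python snippet assuming numpy and scipy are available; the parameter values used in the example are arbitrary choices rather than calibrated figures.

```python
# Illustrative sketch: evaluating a few bivariate copulas from the list above.
# Assumes numpy/scipy; theta and rho follow the notation of the formulas.
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula(u, v, rho):
    """C_Sigma(u, v) = Phi_Sigma(Phi^{-1}(u), Phi^{-1}(v)) with a 2x2 correlation matrix."""
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return mvn.cdf([norm.ppf(u), norm.ppf(v)])

def clayton_copula(u, v, theta):
    """[max{u^-theta + v^-theta - 1; 0}]^(-1/theta), theta in [-1, inf) excluding 0."""
    return max(u ** -theta + v ** -theta - 1.0, 0.0) ** (-1.0 / theta)

def gumbel_copula(u, v, theta):
    """exp(-[(-log u)^theta + (-log v)^theta]^(1/theta)), theta >= 1."""
    return np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta))

def frank_copula(u, v, theta):
    """-1/theta log(1 + (e^{-theta u} - 1)(e^{-theta v} - 1) / (e^{-theta} - 1)), theta != 0."""
    num = (np.exp(-theta * u) - 1.0) * (np.exp(-theta * v) - 1.0)
    return -np.log(1.0 + num / (np.exp(-theta) - 1.0)) / theta

if __name__ == "__main__":
    u, v = 0.9, 0.95  # arbitrary points on the unit square
    print("Gaussian(rho=0.5):", round(gaussian_copula(u, v, 0.5), 4))
    print("Clayton(theta=2): ", round(clayton_copula(u, v, 2.0), 4))
    print("Gumbel(theta=2):  ", round(gumbel_copula(u, v, 2.0), 4))
    print("Frank(theta=5):   ", round(frank_copula(u, v, 5.0), 4))
```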

11.2 For the Manager

In this section, we discuss points that should be remembered when these
methodologies are implemented. Following the structure of the chapter, we start
with the correlation coefficients, in particular the most commonly used, the Pearson
correlation, which measures the strength of the linear association between two variables.
The first interesting point is that outliers can heavily influence linear correlation
coefficients and may lead to spurious correlations between two quantitative variables.
Besides, Pearson's correlation relates to covariances, i.e., variables moving
together, which does not mean that a real relationship exists.
Furthermore, the correlation coefficient is a numerical way to quantify the relationship
between two variables and always lies between $-1$ and $1$, thus $-1 \leq \rho \leq 1$.
Correlation coefficients closer to $\pm 1$ suggest a stronger relationship between the
variables, whilst values closer to 0 suggest a weaker one. This leads to outcomes that are easy
to interpret.
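To illustrate the sensitivity of the Pearson coefficient to outliers, the short sketch below compares it with the rank-based Spearman coefficient on synthetic, independent data to which a single extreme joint observation is added; the sample size and the outlier value are arbitrary assumptions made purely for illustration.

```python
# Sketch: one extreme observation can inflate the Pearson correlation,
# while the rank-based Spearman coefficient is far less affected.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = rng.normal(size=200)  # independent of x, so the true correlation is ~0

print("without outlier: pearson=%.2f, spearman=%.2f"
      % (pearsonr(x, y)[0], spearmanr(x, y)[0]))

# add a single extreme joint observation
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
print("with one outlier: pearson=%.2f, spearman=%.2f"
      % (pearsonr(x_out, y_out)[0], spearmanr(x_out, y_out)[0]))
```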
It is important to remember that correlation coefficients do not imply causality.
If two variables are strongly correlated, it does not mean that the first is responsible
for the occurrence of the second, or conversely.
Turning now to the performance of regression analysis methods in practice,
this depends on the data-generating process and on the model used to represent
it. As the first component, i.e., the data-generating process, is usually unknown,
the appropriateness of the regression analysis depends on the assumptions made
regarding this process. These assumptions are sometimes verifiable if enough data are available.
Regression models used for prediction often remain useful even when the assumptions are
moderately violated, although they may not perform optimally, but we should
beware of the misleading results potentially engendered in these situations.
Sensitivity analysis, such as varying the initial assumptions, may help
measure the usefulness of the model and its applicability.
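As one possible illustration of such a sensitivity analysis, the sketch below fits a simple linear regression on synthetic data and then refits it leaving one observation out at a time; a wide range of resulting slopes would flag influential points or a fragile fit. The data-generating process and the leave-one-out scheme are assumptions chosen for the example, not a prescribed procedure.

```python
# Sketch of a basic sensitivity check: compare the full-sample regression slope
# with the slopes obtained when each observation is left out in turn.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)  # true intercept 2, slope 0.5

def ols_slope(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)  # ordinary least squares, degree 1
    return slope

full_slope = ols_slope(x, y)
loo_slopes = [ols_slope(np.delete(x, i), np.delete(y, i)) for i in range(len(x))]

print("full-sample slope:         %.3f" % full_slope)
print("leave-one-out slope range: [%.3f, %.3f]" % (min(loo_slopes), max(loo_slopes)))
```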
Now focusing on the use of copulas, it is important to understand that, though
they are powerful tools, they are not a panacea. Some would actually argue that
the application of the Gaussian copula to CDOs acted as a catalyst in the spreading
of the sub-prime crisis, even though attempts were made to address the limitations
of copula functions, such as the lack of dependence dynamics and the poor representation
of extreme events.
Note that Gaussian and Student copulas have another problem: despite being
widely used, they have a symmetric structure, i.e., asymmetric negative
shocks are automatically mirrored on the other side. In other words, if
only large negative events have a tendency to occur simultaneously, the structure
will also consider that large positive events occur simultaneously, which, as
mentioned previously, might not be the case.
The alternatives briefly presented in this chapter are not necessarily easier to use, as their
parametrisation might be complicated.
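The symmetry issue can be made visible numerically. The sketch below samples from a Gaussian copula and from a Clayton copula (the latter via conditional inversion, see Nelsen 2006) and compares the empirical frequencies of joint extremes in the lower and upper tails; the values of rho, theta and the tail threshold q are arbitrary illustrative choices.

```python
# Sketch: the Gaussian copula spreads joint extremes symmetrically across both
# tails, whereas the Clayton copula concentrates them in the lower tail.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, rho, theta, q = 100_000, 0.7, 2.0, 0.01

# Gaussian copula sample: correlated normals pushed through the normal CDF
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
gu, gv = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

# Clayton copula sample via conditional inversion
u = rng.uniform(size=n)
t = rng.uniform(size=n)
v = (u ** -theta * (t ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)

def joint_tail(a, b, q, lower=True):
    """Empirical probability that both components fall in the same q-tail."""
    if lower:
        return np.mean((a < q) & (b < q))
    return np.mean((a > 1.0 - q) & (b > 1.0 - q))

print("Gaussian: lower %.5f, upper %.5f"
      % (joint_tail(gu, gv, q), joint_tail(gu, gv, q, lower=False)))
print("Clayton:  lower %.5f, upper %.5f"
      % (joint_tail(u, v, q), joint_tail(u, v, q, lower=False)))
```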
Further to the brief discussion of the presented methodologies, as they are
related to the analysis of correlations, it may be of interest to briefly
address and illustrate exploratory data analysis methodologies, for instance,
principal component analysis (PCA) and correspondence analysis (CA).
PCA (Jolliffe 2002) is an orthogonal linear transformation of the data: the data are
transferred to a new set of coordinates, ranking the components by variance such
that the component with the largest variance is represented on the first axis, the
component with the second largest variance on the second axis, and so on. On the
other hand, correspondence analysis (Hair 2010; Hirschfeld 1935; Benzécri 1973) is a
multivariate statistical technique similar to principal component analysis, but it applies
to categorical rather than continuous data. Like PCA, it allows representing a set of data
in a two-dimensional graphical form.
In other words, these methodologies break down existing dependencies in large
data sets, essentially grouping together highly correlated variables.
Though some accuracy is lost, the simplification and the dimension reduction
make the outcome usable in practice. These methodologies are illustrated in
Figs. 11.9 and 11.10.
These approaches may be very useful to break down a set of correlated variables
into linearly uncorrelated ones, making them ready for further analysis. This may
help practitioners reduce the number of variables to be analysed, focusing only on
the most important ones while reducing the noise.
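To make the mechanics of PCA concrete, the following sketch centres a small synthetic data set, diagonalises its covariance matrix and projects the observations onto the components carrying the most variance. The synthetic indicators are placeholders and the snippet is a bare-bones illustration, not the procedure used to produce Fig. 11.9.

```python
# Minimal PCA sketch: centre the variables, diagonalise the covariance matrix
# and keep the components with the largest variance.
import numpy as np

rng = np.random.default_rng(7)
# 200 observations of 5 correlated indicators (synthetic stand-ins for risk drivers)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)                 # centre each variable
cov = np.cov(Xc, rowvar=False)          # 5 x 5 covariance matrix
eigval, eigvec = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigval)[::-1]        # reorder by decreasing variance
eigval, eigvec = eigval[order], eigvec[:, order]

explained = eigval / eigval.sum()
scores = Xc @ eigvec[:, :2]             # coordinates on the first two components
print("share of variance explained by the first two components:", np.round(explained[:2], 3))
```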

[Fig. 11.9: PCA biplot (Comp.1 vs. Comp.2) of traders T1 to T32 and variables including Bonus, Controls, Income, Economics, Experience, Office Hours, Desk Volume, Market, Losses, Number of People on the Desk and Adventurous Positions]

Fig. 11.9 This figure represents a PCA providing an analysis of a rogue trading exposure. Each
trader is characterised by a value in each field

[Fig. 11.10: correspondence analysis plot, Dimension 1 (4%) vs. Dimension 2 (4%)]

Fig. 11.10 This figure represents a CA providing an analysis of a rogue trading exposure

References

Ali, M. M., Mikhail, N. N., & Haq, M. S. (1978). A class of bivariate distributions including the
bivariate logistic. Journal of Multivariate Analysis, 8, 405–412.
Antoch, J., & Hanousek, J. (2000). Model selection and simplification using lattices. CERGE-EI
Working Paper Series (164).
Bedford, T., & Cooke, R. M. (2001). Probability density decomposition for conditionally depen-
dent random variables modeled by vines. Annals of Mathematics and Artificial Intelligence, 32,
245–268.
Bedford, T., & Cooke, R. (2002). Vines: A new graphical model for dependent random variables.
The Annals of Statistics, 30(4), 1031–1068.
Benzécri, J.-P. (1973). L’Analyse des Données. Volume II: L’Analyse des Correspondances. Paris:
Dunod.
Berg, D., & Aas, K. (2009). Models for construction of multivariate dependence - a comparison
study. The European Journal of Finance, 15, 639–659.
Brechmann, E. C., Czado, C., & Aas, K. (2012). Truncated regular vines in high dimensions with
application to financial data. Canadian Journal of Statistics, 40(1), 68–85.
Capéraà, P., Fougères, A. L., & Genest, C. (2000). Bivariate distributions with given extreme value
attractor. Journal of Multivariate Analysis, 72, 30–49.
Chatterjee, S., & Hadi, A. S. (2015). Regression analysis by example. New York: Wiley.
Cox, D. R. (1958). The regression analysis of binary sequences (with discussion). Journal of Royal
Statistical Society B, 20, 215–242.
Dissmann, J., Brechmann, E. C., Czado, C., & Kurowicka, D. (2013). Selecting and estimating
regular vine copulae and application to financial returns. Computational Statistics & Data
Analysis, 59, 52–69.
Dowdy, S., Wearden, S., & Chilko, D. (2011). Statistics for research (Vol. 512). New York: Wiley.
Dragomir, S. S. (2003). A survey on Cauchy–Bunyakovsky–Schwarz type discrete inequalities.
JIPAM - Journal of Inequalities in Pure and Applied Mathematics, 4(3), 1–142.
EBA. (2014). Draft regulatory technical standards on assessment methodologies for the advanced
measurement approaches for operational risk under article 312 of regulation (eu), no. 575/2013.
London: European Banking Authority.
Freedman, D. A. (2009). Statistical models: Theory and practice. Cambridge: Cambridge Univer-
sity Press.

Galambos, J. (1978). The asymptotic theory of extreme order statistics. Wiley series in probability
and mathematical statistics. New York: Wiley.
Gonzalez-Fernandez, Y., Soto, M., & Meys, J. https://github.com/yasserglez/vines.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications.
Journal of the American Statistical Association, 49(268), 732–764.
Gourier, E., Farkas, W., & Abbate, D. (2009). Operational risk quantification using extreme value
theory and copulas: from theory to practice. The Journal of Operational Risk, 4, 1–24.
Guégan, D., & Maugis, P.-A. (2010). New prospects on vines. Insurance Markets and Companies:
Analyses and Actuarial Computations, 1, 4–11.
Guégan, D., & Maugis, P.-A. (2011). An econometric study for vine copulas. International Journal
of Economics and Finance, 2(1), 2–14.
Guegan, D., & Hassani, B. K. (2013). Multivariate vars for operational risk capital computation:
a vine structure approach. International Journal of Risk Assessment and Management, 17(2),
148–170.
Hair, J. F. (2010). Multivariate data analysis. Pearson College Division.
Hirschfeld, H. O. (1935). A connection between correlation and contingency. Proceedings of
Cambridge Philosophical Society, 31, 520–524.
Joe, H. (1997). Multivariate models and dependence concepts. Monographs on statistics, applied
probability. London: Chapman and Hall.
Jolliffe, I. (2002). Principal component analysis. New York: Wiley.
Kendall, M. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–89.
Kurowicka, D., & Cooke, R. M. (2004). Distribution-free continuous Bayesian belief nets.
In Fourth international conference on mathematical methods in reliability methodology and
practice. Santa Fe, New Mexico.
Kutner, M. H., Nachtsheim, C. J., & Neter, J. (2004). Applied linear regression models (4th ed.).
Boston: McGraw-Hill/Irwin.
Mendes, B., de Melo, E., & Nelsen, R. (2007). Robust fits for copula models. Communications in
Statistics: Simulation and Computation, 36, 997–1017.
Mosteller, F., & Tukey J. W. (1977). Data analysis and regression: A second course in statistics.
Addison-Wesley series in behavioral science: Quantitative methods. Reading, MA: Addison-
Wesley.
Nelsen, R. B. (2006). An introduction to copulas. Springer series in statistics. Berlin: Springer.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case
of a correlated system of variables is such that it can be reasonably supposed to have arisen
from random sampling. Philosophical Magazine Series 5, 50(302), 157–175.
Schepsmeier, U., Stoeber, J., Christian Brechmann, E., Graeler, B., Nagler, T., Erhardt, T., et al.
https://github.com/tnagler/VineCopula.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publication Institute of
Statistics, 8, 229–231.
Spearman, C. (1904). The proof and measurement of association between two things. American
Journal of Psychology, 15, 72–101.
Index

A C
Activation, 113 Capital analysis and review (CCAR), 22, 26
Adaptive weights, 111 Chimera, 5
Advanced measurement approach (AMA), 21 Cleansing, 26, 32
Agreement, 39–43, 49, 50, 70 Clusters, 16, 30, 31, 34–36, 92, 116
Ancestors, 7, 98, 112 Collaborative, 39, 40, 42, 43, 49, 50
Approximation, 35, 54, 114–115, 120, 146 Computation, 30, 82, 105, 107, 112, 119
Autocorrelation, 123, 124, 126, 136, 137 Concentration risk, 12
Autoregressive conditional heteroskedasticity Concerns, 39–41, 49, 50, 67, 107
(ARCH), 125, 132 Conditional dependencies, 36, 98
Autoregressive fractionally integrated moving Conduct risk, 93
average (ARFIMA) Conjugate prior, 103
autoregressive (AR), 124, 127, 128, Consensus, 39–50, 70, 95
131–133, 135–137 Construction, 9, 65, 69, 86–87, 93, 124, 135
autoregressive integrated moving average Contagion, 6, 14–18, 30, 106, 107
(ARIMA), 125, 131, 132, 136, 137 Control, 3–6, 8–12, 19, 25, 30, 44–47, 69, 78,
autoregressive moving average (ARMA), 81, 88, 91, 92, 97, 107, 117, 145
125, 131 Cooperative, 39
moving average (MA), 125, 131, 136, 137 Copula
Archimedean, 151, 152
Clayton, 154
B Elliptic, 52, 151, 154
Back propagation, 114 Frank, 22, 154
Balanced, 40 Galambos, 151, 154
Bayesian Gaussian, 154, 155
estimation, 59, 60 Gumbel, 154
network, 36, 97–108, 113 Joe, 151, 154
Bayes theorem, 101, 102, 105 student, 151, 154, 155
BCBS 239, 26 Correlation
Big data, 26, 27, 120 Goodman and Krushal, 143
Black Swan, 5 Kendall, 143
Blocking rules, 42 Pearson, 30, 141, 142, 155
Block maxima, 71 Spearman, 142
Boolean, 81, 83, 84, 89, 94 Correspondence analysis, 156
Buy-in, 46, 48, 81 Country risk, 13, 21


Credit risk, 5, 12, 149 Fisher–Tippett–Gnedenko, 71, 72


Cut set, 82, 85, 86 Fitting, 51, 52, 56, 58–62, 64, 65, 67, 71, 117,
119, 133, 147
Forecasting, 18, 32, 123, 141, 145
D Fréchet distribution, 58, 72, 75, 76, 78
Data Fuzzy logic, 94–95
lake, 31
mining, 30–32, 34, 92
science, 30–33, 97 G
Dependence(ies), 29–30, 36, 79, 82, 90, 97, 98, Gates, 82–86, 88
107, 115, 123, 141–157 Gegenbauer, 135
Dependence diagram, 90 Generalised autoregressive conditional
Descendents, 98 heteroskedasticity (GARCH)
Dickey-fuller, 130 EGARCH, 125, 134
Directed acyclic graph (DAG), 36, 98, 104, 113 GARCH-M, 134
Discussion, 9, 23, 40, 43–45, 155 IGARCH, 134
Distribution Generalised method of moments, 60
alpha-stable, 28, 55, 65 Genetic algorithms, 36, 120
elliptic, 52, 151, 154 Goodness of fit, 27, 61–62, 65
extreme value, 29, 52, 56, 58, 71–73, 75 Governance, 9, 12, 26
gamma, 54, 66 Gradient, 114, 117
Gaussian, 54, 55, 62, 103, 149, 154 Gumbel distribution, 72, 75, 76
generalised hyperbolic, 52, 58, 66
generalised Pareto, 28, 52, 55, 70, 71
Laplace, 54, 66
NIG, 54 I
non-parametric, 52, 54, 62, 64 Incident, 2, 4, 5, 9, 12, 14, 15, 25, 30, 52,
student, 52, 66, 143 69–72, 77, 78, 90, 93, 106, 138
Dodd-Frank Act stress testing (DFAST), 22 Inclusive, 40
Inductive logic, 35, 105
Inference, 34, 98, 102, 104, 105, 107
E Infinite mean, 66
Efficiency (efficient), 11, 12, 26, 31, 61, 69, 91, Information, 1, 4, 6, 9, 12, 16, 19, 25–36, 52,
93, 97, 105, 119 57, 58, 66, 69, 70, 75–79, 84, 89, 93,
Estimation, 19, 34, 51, 52, 56, 58–60, 64, 66, 94, 101, 105–107, 111, 115–120,
70, 76, 77, 79, 105, 114, 126–128, 123–139, 146, 147
147, 151 Inherent risk, 6
Evolution, 7, 8, 36, 50, 57, 70, 114, 125, 129, Initialisation, 47
137, 138 Inputs, 6, 7, 21, 25, 30, 33–36, 49, 50, 83, 84,
Expected shortfall (ES), 29, 56, 57 86, 95, 101, 111, 114, 116, 119, 120,
Experts, 3, 6, 8, 21, 25, 27, 39, 44, 65, 69–71, 149
74, 78, 79, 104 Integrated system, 104–106
Extreme value theory, 69–79 Integration, 16, 18, 32, 105, 130
Interactions, 8, 9, 14–17, 19,
32, 89
F Internal capital adequacy assessment process
Facilitation (facilitator), 43–45 (ICAAP), 20
Failure mode and effect analysis (FMEA), 89, Ishikawa diagrams, 90, 93–94
90
Failures, 2, 5, 8, 12, 14, 15, 20, 21, 78, 81–85,
88–92, 98, 99, 106, 107, 116, 141 K
Fault, 2, 34, 81–94, 141 Kwiatkowski–Phillips–Schmidt–Shin (KPSS),
Fault tree analysis (FTA), 81–94 130

L P
Latent variables, 36, 98, 149 Pattern, 16, 30–35, 112, 116, 124, 128, 129,
Learning, 8, 30–34, 36, 104–107, 111–119, 139, 146
145 Perceptron, 112, 115
Least-square, 129 Planning, 2, 3, 18–20, 22, 44, 91
Legal risk, 13, 14 Posterior, 101–103, 105, 115
Learning Principal component analysis, 156
semi-supervised, 33, 34 Prior, 6, 9, 12, 20, 30, 32, 41, 47, 59, 92,
supervised, 33–35, 112 101–103, 107, 115, 118
unsupervised, 33–35, 112 Processing, 25, 26, 31, 32, 35, 36, 94, 107,
Liquidity risk, 13 111, 112, 116, 119
Logic, 23, 25, 31, 35, 81–84, 86, 89, 94–95,
105, 112
Q
Quantiles, 27, 29, 56, 144
M
Market risk, 5, 13, 57
Markov Chain Monte Carlo (MCMC), 105 R
Maxima data set, 75 Rank, 29, 78, 91, 143, 154
Maximum likelihood estimation, 128 Regression
Mean, 7, 28, 49, 51–53, 55, 66, 72, 73, 76–78, linear, 145, 149, 150
86, 98, 102–104, 114, 115, 125, 126, logistic, 118, 148, 149
128, 133, 134, 139, 147, 148, 155 Regulation, 1, 2, 10, 17, 18, 21, 26, 93
Mean square error, 52, 148 Reputational risk, 14
Meta data, 25 Requirements, 7, 20, 22, 26, 57, 66, 78, 81, 101
Military, 1, 2 Residual risk, 6, 19
Minutes, 46 Residuals, 6, 19, 123, 129, 132, 137, 148–149
Model risk, 14 Risk
Moderator, 40 culture, 8–10
Moment, 6, 7, 27–29, 43, 54, 60, 66, 73, 76, data aggregation, 26
78, 82, 114, 125, 127, 135, 142 framework, 2, 9, 11–12, 26, 46, 69, 78, 79
measures, 12, 14, 26, 29, 51, 52, 56–58, 62,
65–67, 69, 71, 76–79, 141, 149
N owner, 44, 81
Nested copula, 151 Root cause analysis, 81, 91–92
Networks Rule of order, 49
Bayesian, 36, 97–108, 113
neural, 35, 111–120, 141
Neural network. See Networks S
Neuron, 35, 111, 115, 119 Seasonality, 69, 128–129, 137
Nodes, 34, 36, 86, 87, 92, 93, 98, 99, 101, 103, Seniority bias, 3, 70
105–107, 111, 113, 116, 151 Shape, 28, 51, 54, 64, 66, 73, 75, 76, 78, 93
Numeric data, 27–30, 124 Signal, 33, 111, 113, 116, 119
Sign-offs, 46, 48–49
Sklar, A., 151, 152
O Softmax activation function, 115
Objective function, 34, 36, 113, 114, 144 Spill-over, 14, 15
Observable quantities, 36, 98 Sponsorship, 46, 47, 92
Odds, 48, 58, 149, 150 Stationarity (stationary process), 7, 125, 128,
Operational risk, 2, 4, 13–15, 17, 19, 21, 44, 130, 135–138
57, 69, 74, 77, 78, 93 Stepwise, 40
Optimisation, 33, 34, 36, 114, 115, 119 Stress testing, 2, 6, 17–20, 22, 23, 58, 65
Origins, 7, 15, 30, 34, 58, 91, 98, 104, 112, Sum of squared error (SSE), 117, 118, 148
118, 125 Supervised neural network, 115

Support vector machines, 35 V


Symbols, 82–84, 86, 102, 119, 146 Validation, 7, 46, 48, 49, 115
Synaptic connection, 113 Value at risk (VaR), 56, 57, 66, 67, 78
Systemic risk, 14, 15 Variance, 7, 28, 51–53, 66, 72, 74, 115,
125–129, 133–135, 147, 148, 156
Vines, 151–154
T Vote, 41, 49
Taxonomy, 5, 12–14, 70, 71, 75, 77, 84, 85
Three lines of defense, 11
Tilting, 51–67 W
Time series, 32, 35, 116, 123–139 Weibull distribution, 58, 72, 75, 76
Training, 9, 33–36, 43, 66, 111, 114–119 White noise, 126, 131, 132
Tree, 34, 82–95, 105, 118, 153 Why-because analysis, 90, 92
Trends, 3, 7, 23, 27, 31, 120, 125, 129–130 Workshop, 3, 39, 41, 43–46, 48, 50, 70, 72, 93,
Trust, 9, 43, 70 95
Typology, 4–6

Y
U Yule–Walker, 127, 128
Unanimous (Unanimity), 39, 41–43
Uni-root, 130, 134
