
Fabio Cuzzolin

The geometry of uncertainty

The geometry of imprecise probabilities

September 19, 2016

Springer
Berlin Heidelberg New York
Barcelona Hong Kong
London Milan Paris
Tokyo
Table of Contents

1 Introduction: Theories of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Mathematical probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Interpretations of probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Does probability exist at all? . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.2 Competing interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.3 Frequentist probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.4 Propensity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.5 Subjective and Bayesian probability . . . . . . . . . . . . . . . . . . . . 7
1.3.6 Bayesian versus frequentist inference . . . . . . . . . . . . . . . . . . . 9
1.4 Beyond probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.1 Something is wrong with probability . . . . . . . . . . . . . . . . . . . . 10
1.4.2 Pure data: beware of the prior . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.3 Pure data: designing the universe? . . . . . . . . . . . . . . . . . . . . . . 11
1.4.4 No data: modelling ignorance . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.5 Set-valued observations: the clocked die . . . . . . . . . . . . . . . . . 12
1.4.6 Propositional data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.7 Scarce data: beware the size of the sample . . . . . . . . . . . . . . . 15
1.4.8 Unusual data: rare events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4.9 Uncertain data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.10 Knightian uncertainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Mathematics (plural) of uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.1 A variety of proposals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.2 Belief functions and random sets . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.3 Belief, evidence and probability . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6 Structure of this Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Part I Theories of uncertainty

2 Shafer’s belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29


2.1 Belief functions as set functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.1 Basic probability assignment . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.2 Plausibility functions or upper probabilities . . . . . . . . . . . . . . 32
2.1.3 Bayesian theory as a limit case . . . . . . . . . . . . . . . . . . . . . . . . . 33


2.2 Dempster’s rule of combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


2.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.2 Weight of conflict . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.3 Conditioning belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.4 Combination vs conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3 Simple and separable support functions . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.1 Heterogeneous and conflicting evidence . . . . . . . . . . . . . . . . . 37
2.3.2 Separable support functions and decomposition . . . . . . . . . . . 38
2.3.3 Internal conflict . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Families of compatible frames of discernment . . . . . . . . . . . . . . . . . . . 40
2.4.1 Refinings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.2 Families of frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.3 Consistent belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.4 Independent frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5 Support functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.1 Vacuous extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6 Impact of the evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6.1 Families of compatible support functions . . . . . . . . . . . . . . . . 46
2.6.2 Discerning the interaction of evidence . . . . . . . . . . . . . . . . . . . 47
2.7 Quasi support functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.7.1 Bayes’ theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.7.2 Incompatible priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.8 Consonant belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3 Understanding belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53


3.1 The multiple semantics of belief functions . . . . . . . . . . . . . . . . . . . . . . 55
3.1.1 Dempster’s multi-valued mappings, compatibility relations . 55
3.1.2 Belief functions as generalised (non-additive) probabilities . 57
3.1.3 Belief functions as inner measures . . . . . . . . . . . . . . . . . . . . . . 58
3.1.4 Belief functions as credal sets . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.1.5 Belief functions as random sets . . . . . . . . . . . . . . . . . . . . . . . . 60
3.1.6 Zadeh’s ‘simple view’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Genesis and debate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2.1 Shafer’s position and early support . . . . . . . . . . . . . . . . . . . . . . 62
3.2.2 Bayesian versus belief reasoning . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.3 Pearl’s criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.4 Issues with multiple interpretations . . . . . . . . . . . . . . . . . . . . . 67
3.2.5 Rebuttals and justifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.6 Agenda(s) for the future . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3 Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3.1 Smets’ Transferable Belief Model . . . . . . . . . . . . . . . . . . . . . . 70
3.3.2 Kohlas and Monney’s theory of hints . . . . . . . . . . . . . . . . . . . . 73
3.3.3 Dezert-Smarandache Theory (DSmT) . . . . . . . . . . . . . . . . . . . 75
3.3.4 Gaussian (linear) belief functions . . . . . . . . . . . . . . . . . . . . . . . 77
3.3.5 Kramosil’s probabilistic interpretation . . . . . . . . . . . . . . . . . . . 78

3.3.6 Hummel and Landy’s statistical view . . . . . . . . . . . . . . . . . . . . 79


3.3.7 Intervals and sets of belief measures . . . . . . . . . . . . . . . . . . . . 81
3.3.8 Other frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4 Reasoning with belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


4.1 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.1.1 From statistical data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.1.2 From qualitative data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.2 Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2.1 Dempster’s rule under fire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2.2 Alternative combination rules . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2.3 Families of combination rules . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2.4 Combination of dependent evidence . . . . . . . . . . . . . . . . . . . . 115
4.2.5 Combination of conflicting evidence . . . . . . . . . . . . . . . . . . . . 116
4.2.6 Combination of (un)reliability of sources of evidence,
discounting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3 Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3.1 Conditional belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3.2 The generalised Bayes theorem . . . . . . . . . . . . . . . . . . . . . . . . 127
4.3.3 Generalising total probability . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.4 Efficient computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.4.1 Approximation schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.4.2 Transformation approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.4.3 Monte-Carlo approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.4.4 Local propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.4.5 Graphical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.5 Decision making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.5.1 Based on expected utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.5.2 Multicriteria decision making . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.5.3 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.6 Continuous formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.6.1 Shafer’s allocations of probabilities . . . . . . . . . . . . . . . . . . . . . 146
4.6.2 Random sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.6.3 Belief functions on random Borel intervals . . . . . . . . . . . . . . . 149
4.6.4 Kramosil’s belief function on infinite spaces . . . . . . . . . . . . . . 151
4.6.5 MV algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.6.6 Generalised evidence theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.7 A toolbox for the working scientist . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.7.1 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.7.2 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.7.3 Ranking aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.7.4 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.7.5 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.7.6 Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.8 Advances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

4.8.1 Matrix representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158


4.8.2 Distances and dissimilarities . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.8.3 Measures of uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.8.4 Algebra and independence of frames . . . . . . . . . . . . . . . . . . . . 163
4.8.5 Multivariate analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.8.6 Canonical decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.8.7 Frequentist formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

5 The bigger picture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167


5.1 Imprecise probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.1.1 Lower probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.1.2 Gambles and behavioural interpretation . . . . . . . . . . . . . . . . . . 171
5.1.3 Lower previsions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.1.4 Events as indicator gambles . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.1.5 Rules of rational behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.1.6 Natural extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.1.7 Belief functions and imprecise probabilities . . . . . . . . . . . . . . 173
5.2 Capacities (A.K.A. fuzzy measures) . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.2.1 Special types of capacities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.3 Probability intervals (two-monotone capacities) . . . . . . . . . . . . . . . . . 176
5.3.1 Probability intervals and belief measures . . . . . . . . . . . . . . . . . 177
5.4 Higher-order probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
5.5 Fuzzy theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.5.1 Possibility theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.5.2 Belief functions on fuzzy sets . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.5.3 Vague sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
5.5.4 Other fuzzy extensions of the theory of evidence . . . . . . . . . . 181
5.6 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
5.6.1 A belief functions logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
5.6.2 Josang’s subjective logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.6.3 Fagin and Halpern’s logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.6.4 Haenni and Lehmann’s Probabilistic Argumentation Systems . . . . . 183
5.6.5 Default logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.6.6 Modal logic interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
5.6.7 Probability of provability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
5.6.8 Other logical frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.7 Rough sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.7.1 Pawlak’s rough sets algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.7.2 Belief functions and rough sets . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.8 Probability boxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.8.1 Probability boxes and belief functions . . . . . . . . . . . . . . . . . . . 188
5.8.2 Approximate computations for random sets . . . . . . . . . . . . . . 189
5.8.3 Generalised probability boxes . . . . . . . . . . . . . . . . . . . . . . . . . . 190
5.9 Spohn’s theory of epistemic beliefs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
5.9.1 Epistemic states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

5.9.2 Disbelief functions and Spohnian belief functions . . . . . . . . . 192


5.9.3 α-conditionalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.10 Zadeh’s Generalized Theory of Uncertainty (GTU) . . . . . . . . . . . . . . 193
5.11 Baoding Liu’s Uncertainty Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
5.12 Other formalisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
5.12.1 Info-gap decision theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
5.12.2 Vovk and Shafer’s game theoretical framework . . . . . . . . . . . 196
5.12.3 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

Part II The geometry of uncertainty

6 The geometry of belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203


6.1 The space of belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.1.1 The simplex of dominating probabilities . . . . . . . . . . . . . . . . . 205
6.1.2 Dominating probabilities and L1 norm . . . . . . . . . . . . . . . . . . 206
6.1.3 Exploiting the Moebius inversion lemma . . . . . . . . . . . . . . . . 207
6.1.4 Convexity of the belief space . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.1.5 Symmetries of the belief space . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.2 Simplicial form of the belief space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
6.2.1 Simplicial structure on a binary frame . . . . . . . . . . . . . . . . . . . 212
6.2.2 Faces of B as classes of belief functions . . . . . . . . . . . . . . . . . 213
6.3 The differential geometry of belief functions . . . . . . . . . . . . . . . . . . . . 213
6.3.1 A case study: the ternary case . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.3.2 Definition of smooth fiber bundles . . . . . . . . . . . . . . . . . . . . . . 216
6.3.3 Points of the Cartesian space as sum functions . . . . . . . . . . . . 217
6.4 Recursive bundle structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.4.1 Recursive bundle structure of the space of sum functions . . . 217
6.4.2 Recursive bundle structure of the belief space . . . . . . . . . . . . 218
6.4.3 Bases and fibers as simplices . . . . . . . . . . . . . . . . . . . . . . . . . . 219
6.5 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

7 Geometry of Dempster’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229


7.1 Dempster’s combination of pseudo belief functions . . . . . . . . . . . . . . 230
7.2 Dempster’s sum of affine combinations . . . . . . . . . . . . . . . . . . . . . . . . 231
7.3 Convex formulation of Dempster’s rule . . . . . . . . . . . . . . . . . . . . . . . . 234
7.4 Commutativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.4.1 Affine region of missing points . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.4.2 Non-combinable points and missing points: a duality . . . . . . 236
7.4.3 The case of unnormalized belief functions . . . . . . . . . . . . . . . 236
7.5 Conditional subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.5.2 The case of unnormalized belief functions . . . . . . . . . . . . . . . 240
7.5.3 Vertices of conditional subspaces . . . . . . . . . . . . . . . . . . . . . . . 240
7.6 Constant mass loci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

7.6.1 Geometry of Dempster’s rule in S2 . . . . . . . . . . . . . . . . . . . . . 242


7.6.2 Affine form of constant mass loci . . . . . . . . . . . . . . . . . . . . . . . 246
7.6.3 Action of Dempster’s rule on constant mass loci . . . . . . . . . . 247
7.7 Geometric orthogonal sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.7.1 Foci of conditional subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.7.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
7.8 Consistency of conditional belief functions . . . . . . . . . . . . . . . . . . . . . 253
7.9 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

8 Three equivalent models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263


8.1 Basic plausibility assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
8.1.1 Example of basic plausibility assignment . . . . . . . . . . . . . . . . 265
8.1.2 Relation between basic probability and plausibility
assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
8.2 Basic commonality assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
8.2.1 Properties of basic commonality assignments . . . . . . . . . . . . . 267
8.3 The geometry of plausibility functions . . . . . . . . . . . . . . . . . . . . . . . . . 267
8.3.1 Plausibility assignment and simplicial coordinates . . . . . . . . . 268
8.3.2 Plausibility space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
8.3.3 Running example: the binary case . . . . . . . . . . . . . . . . . . . . . . 269
8.4 The geometry of commonality functions . . . . . . . . . . . . . . . . . . . . . . . 270
8.4.1 Running example: the binary case . . . . . . . . . . . . . . . . . . . . . . 271
8.5 Equivalence and congruence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
8.5.1 Congruence of belief and plausibility spaces . . . . . . . . . . . . . 273
8.5.2 Running example: the binary case . . . . . . . . . . . . . . . . . . . . . . 273
8.5.3 Congruence of plausibility and commonality spaces . . . . . . . 274
8.5.4 Running example: congruence of Q2 and PL2 . . . . . . . . . . . . 275
8.6 Point-wise rigid transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
8.6.1 Belief and plausibility spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 275
8.6.2 Commonality and plausibility spaces . . . . . . . . . . . . . . . . . . . . 276
8.7 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

9 The geometry of possibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281


9.1 Consonant belief functions as necessity measures . . . . . . . . . . . . . . . . 282
9.2 The consonant subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
9.2.1 Chains of subsets as consonant belief functions . . . . . . . . . . . 284
9.2.2 Ternary case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.2.3 Consonant subspace as simplicial complex . . . . . . . . . . . . . . . 286
9.3 Properties of the consonant subspace . . . . . . . . . . . . . . . . . . . . . . . . . . 287
9.3.1 Congruence of the convex components of CO . . . . . . . . . . . . 288
9.3.2 Decomposition of maximal simplices into right triangles . . . 289
9.4 Consistent belief functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.4.1 Consistent knowledge bases in classical logic . . . . . . . . . . . . . 291
9.4.2 Belief functions as uncertain knowledge bases . . . . . . . . . . . . 291
9.4.3 Consistency in belief logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

9.5 The geometry of consistent belief functions . . . . . . . . . . . . . . . . . . . . . 293


9.5.1 Example: the binary frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
9.5.2 The region of consistent belief functions . . . . . . . . . . . . . . . . . 294
9.5.3 Consistent complex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
9.6 Natural consistent components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
9.7 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

Part III Geometric interplays of uncertainty measures

10 The affine family of probability transforms . . . . . . . . . . . . . . . . . . . . . . 305


10.1 Affine transforms in the binary case . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
10.2 Geometry of the dual line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
10.2.1 Orthogonality of the dual line . . . . . . . . . . . . . . . . . . . . . . . . . . 309
10.2.2 Intersection with the region of Bayesian normalized sum
functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
10.3 The intersection probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.1 Interpretations of the intersection probability . . . . . . . . . . . . . 313
10.3.2 Example of intersection probability . . . . . . . . . . . . . . . . . . . . . 316
10.3.3 Intersection probability and affine combination . . . . . . . . . . . 317
10.3.4 Intersection probability and convex closure . . . . . . . . . . . . . . 319
10.4 Orthogonal projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
10.4.1 Orthogonality condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
10.4.2 Orthogonality flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.4.3 Two mass redistribution processes . . . . . . . . . . . . . . . . . . . . . . 322
10.4.4 Orthogonal projection and affine combination . . . . . . . . . . . . 324
10.4.5 Orthogonal projection and pignistic function . . . . . . . . . . . . . 325
10.5 The case of unnormalized belief functions . . . . . . . . . . . . . . . . . . . . . . 326
10.6 Comparisons within the affine family . . . . . . . . . . . . . . . . . . . . . . . . . . 328

11 The epistemic family of probability transforms . . . . . . . . . . . . . . . . . . . 339


11.1 Rationale of epistemic transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
11.1.1 Semantics within the probability bound interpretation . . . . . . 342
11.1.2 Semantics within Shafer’s interpretation . . . . . . . . . . . . . . . . . 344
11.2 Dual properties of epistemic transforms . . . . . . . . . . . . . . . . . . . . . . . . 345
11.2.1 Relative plausibility, Dempster’s rule, and pseudo belief
functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
11.2.2 A (broken) symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
11.2.3 Dual properties of the relative belief operator . . . . . . . . . . . . . 346
11.2.4 Representation theorem for relative beliefs . . . . . . . . . . . . . . . 349
11.2.5 Two families of Bayesian approximations . . . . . . . . . . . . . . . . 350
11.3 Plausibility transform and convex closure . . . . . . . . . . . . . . . . . . . . . . 351
11.4 Generalizations of the relative belief operator . . . . . . . . . . . . . . . . . . . 351
11.4.1 Zero mass to singletons as a singular case . . . . . . . . . . . . . . . . 352
11.4.2 The family of relative mass probability transformations . . . . 353

11.4.3 Approximating pignistic probability and relative plausibility 354


11.5 Geometry in the space of pseudo belief functions . . . . . . . . . . . . . . . . 358
11.5.1 Plausibility of singletons and relative plausibility . . . . . . . . . . 359
11.5.2 Belief of singletons and relative belief . . . . . . . . . . . . . . . . . . . 359
11.5.3 A three plane geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
11.5.4 A geometry of three angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
11.5.5 Singular case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
11.6 Geometry in the probability simplex . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.6.1 Geometry in the 3-element frame . . . . . . . . . . . . . . . . . . . . . . . 366
11.6.2 Singular case in the 3-element frame . . . . . . . . . . . . . . . . . . . . 369
11.7 Equality conditions for both families of approximations . . . . . . . . . . 371
11.7.1 Equal plausibility distribution in the affine family . . . . . . . . . 371
11.7.2 Equal plausibility distribution as a general condition . . . . . . . 373

12 Consonant approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385


12.1 Geometry of outer consonant approximations in the consonant
simplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
12.1.1 Outer consonant approximations . . . . . . . . . . . . . . . . . . . . . . . 390
12.1.2 Geometry in the binary case . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
12.1.3 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
12.1.4 Weak inclusion and mass re-assignment . . . . . . . . . . . . . . . . . 392
12.1.5 The polytopes OC [b] of outer approximations . . . . . . . . . . . . . 393
12.1.6 Maximal outer approximations . . . . . . . . . . . . . . . . . . . . . . . . . 394
12.1.7 Maximal outer approximations as lower chain measures . . . . 394
12.1.8 Example: outer approximations on the ternary frame . . . . . . . 395
12.2 Geometric consonant approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
12.2.1 Principles of geometric approximation . . . . . . . . . . . . . . . . . . 397
12.3 Consonant approximation in the binary belief space . . . . . . . . . . . . . . 402
12.3.1 Bayesian Lp approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
12.3.2 Consonant Lp approximations . . . . . . . . . . . . . . . . . . . . . . . . . 402
12.3.3 Compatible consonant belief functions . . . . . . . . . . . . . . . . . . 403
12.4 Consonant approximation in the mass space . . . . . . . . . . . . . . . . . . . . 405
12.4.1 Results of Lp consonant approximation in the mass space . . 406
12.4.2 Semantics of partial consonant approximations in M . . . . . . 408
12.4.3 Computability and admissibility of global solutions . . . . . . . 411
12.4.4 Relation with other approximations . . . . . . . . . . . . . . . . . . . . . 412
12.4.5 Ternary example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
12.5 Consonant approximation in the belief space . . . . . . . . . . . . . . . . . . . . 415
12.5.1 L1 approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
12.5.2 (Partial) L2 approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
12.5.3 L∞ approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
12.5.4 Approximations in B as generalized maximal outer
approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
12.5.5 Graphical comparison in a ternary example . . . . . . . . . . . . . . 421
12.5.6 Some conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422

13 Consistent approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443


13.1 The Lp consistent approximation problem . . . . . . . . . . . . . . . . . . . . . . 445
13.2 Consistent approximation in M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
13.2.1 L1 approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
13.2.2 L∞ approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
13.2.3 L2 approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
13.3 Consistent approximation in the belief space . . . . . . . . . . . . . . . . . . . . 449
13.3.1 L1 /L2 approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
13.3.2 L∞ consistent approximation . . . . . . . . . . . . . . . . . . . . . . . . . . 453
13.4 Approximations in the belief versus the mass space . . . . . . . . . . . . . . 454
13.4.1 Comparison on a ternary example . . . . . . . . . . . . . . . . . . . . . . 455

Part IV A geometric approach to uncertainty

14 Geometric inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

15 Geometric conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465


15.1 Conditioning in belief calculus: a concrete scenario . . . . . . . . . . . . . . 467
15.1.1 Model-based data association . . . . . . . . . . . . . . . . . . . . . . . . . . 467
15.1.2 Rigid motion constraints as conditional belief functions . . . . 468
15.2 Geometric conditional belief functions . . . . . . . . . . . . . . . . . . . . . . . . . 470
15.3 Geometric conditional belief functions in M . . . . . . . . . . . . . . . . . . . 470
15.3.1 Conditioning by L1 norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
15.3.2 Conditioning by L2 norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
15.3.3 Conditioning by L∞ norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
15.3.4 A case study: the ternary frame . . . . . . . . . . . . . . . . . . . . . . . . . 473
15.3.5 Features of geometric conditional belief functions in M . . . . 475
15.3.6 Interpretation as general imaging for belief functions . . . . . . 475
15.4 Geometric conditioning in the belief space . . . . . . . . . . . . . . . . . . . . . 476
15.4.1 L2 conditioning in B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
15.4.2 L1 conditioning in B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
15.4.3 L∞ conditioning in B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
15.5 Mass space versus belief space conditioning . . . . . . . . . . . . . . . . . . . . 482
15.5.1 Geometric conditioning: a summary . . . . . . . . . . . . . . . . . . . . 482
15.5.2 Comparison on the ternary example . . . . . . . . . . . . . . . . . . . . . 483
15.6 An outline of future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485

16 Decision making with epistemic transforms . . . . . . . . . . . . . . . . . . . . . . 497


16.1 The credal set of probability intervals . . . . . . . . . . . . . . . . . . . . . . . . . . 500
16.2 Intersection probability and probability intervals . . . . . . . . . . . . . . . . 502
16.3 Credal interpretation of Bayesian transforms: The ternary case . . . . . 503
16.4 Credal geometry of probability transformations . . . . . . . . . . . . . . . . . 506
16.4.1 Focus of a pair of simplices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
16.4.2 Probability transformations as foci . . . . . . . . . . . . . . . . . . . . . . 506

16.4.3 Semantic of foci and a rationality principle . . . . . . . . . . . . . . . 507


16.4.4 Mapping associated with a probability transformation . . . . . . 508
16.4.5 Upper and lower simplices as consistent probabilities . . . . . . 509
16.5 Alternative versions of the Transferable Belief Model . . . . . . . . . . . . 510
16.6 A game/utility theory interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
16.6.1 The cloaked carnival wheel scenario . . . . . . . . . . . . . . . . . . . . 512
16.6.2 A minimax/maximin decision strategy . . . . . . . . . . . . . . . . . . . 513

Part V The future of uncertainty

17 An agenda for the future . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523


17.1 A statistical random set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
17.1.1 Lower and upper likelihoods . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
17.1.2 Generalised logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . 529
17.1.3 Fiducial inference with belief functions . . . . . . . . . . . . . . . . . . 530
17.1.4 The total probability theorem for random sets . . . . . . . . . . . . 530
17.1.5 Limit theorems for random sets . . . . . . . . . . . . . . . . . . . . . . . . 533
17.1.6 Frequentist inference with random sets . . . . . . . . . . . . . . . . . . 534
17.1.7 Random set variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
17.2 Developing the geometric approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
17.2.1 Geometry of other combination rules . . . . . . . . . . . . . . . . . . . . 537
17.2.2 Geometry of other conditioning operators . . . . . . . . . . . . . . . . 537
17.2.3 Geometric inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
17.2.4 Geometry of continuous formulations . . . . . . . . . . . . . . . . . . . 538
17.2.5 A true geometry of uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . 538
17.2.6 Fancier geometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
17.3 Completing the theory of evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
17.3.1 Reasoning with intervals of belief functions . . . . . . . . . . . . . . 540
17.3.2 Graphical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
17.3.3 Random set random forests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
17.4 High-impact applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
17.4.1 Rare events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
17.4.2 Climatic change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
17.4.3 Statistical learning theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549

1 Introduction: Theories of Uncertainty
1.1 Uncertainty
Uncertainty is of paramount importance in artificial intelligence, applied science,
and many other areas of human endevour. Whilst each and every one of us possesses
some intuitive grasp on what uncertainty is, providing a formal definition can be
elusive. Uncertainty can be understood as lack of information about an issue of
interest for a certain agent (e.g., a human decision maker or a machine), a condition
of limited knowledge in which it is impossible to exactly describe the state of the
world or its future trajectories.
According to Dennis Lindley [?], for instance:

“There are some things that you know to be true, and others that you know to be
false; yet, despite this extensive knowledge that you have, there remain many things
whose truth or falsity is not known to you. We say that you are uncertain about them.
You are uncertain, to varying degrees, about everything in the future; much of the
past is hidden from you; and there is a lot of the present about which you do not
have full information. Uncertainty is everywhere and you cannot escape from it”.

What is somewhat less clear, to scientists themselves, is the existence of a hiatus
between two fundamentally distinct forms of uncertainty. The first level consists of
predictable variations, which are normally encoded as probability distributions. For
instance, if one plays a fair roulette they will not, by any means, know the outcome in
advance, but they will nevertheless be able to predict the frequency with which each
outcome manifests itself (1/36), at least in the long run.
The second level is about unpredictable variations, which reflect a more fundamental
uncertainty about the laws themselves which govern the variations. Following on
with our example, suppose the player is presented with ten different doors, each
leading to a room containing a roulette modelled by a different probability distribution.
They will then be uncertain about the very game they are supposed to play. How will
this affect their betting behaviour, for instance?
Uncertainty of the second kind is often called Knightian uncertainty [?], after
Chicago economist Frank Knight, who distinguished ‘risk’ from ‘uncertainty’ as
follows:

“Uncertainty must be taken in a sense radically distinct from the familiar notion
of risk, from which it has never been properly separated.... The essential fact is that
‘risk’ means in some cases a quantity susceptible of measurement, while at other
times it is something distinctly not of this character; and there are far-reaching and
crucial differences in the bearings of the phenomena depending on which of the two
is really present and operating.... It will appear that a measurable uncertainty, or
‘risk’ proper, as we shall use the term, is so far different from an unmeasurable one
that it is not in effect an uncertainty at all.”

In Knight’s terms, ‘risk’ is what people normally call probability or chance, while
the term ‘uncertainty’ is reserved for second-order uncertainty.
Second-order uncertainty has consequences for human behaviour: people are
empirically averse to unpredictable variations (as highlighted by Ellsberg’s paradox
[?]).
This difference between predictable and unpredictable variation is one of the
fundamental issues in the philosophy of probability, and is sometimes referred to
as the distinction between common-cause and special-cause variation. Different interpretations
of probability treat these two aspects of uncertainty in different ways. Economists
John Maynard Keynes [?] and G. L. S. Shackle have also contributed to this debate.

1.2 Mathematical probability


The mainstream mathematical theory of (first order) uncertainty is measure-theoretical
probability, mainly due to Russian mathematician Andrey Kolmogorov [?]. As most
readers will know, in Kolmogorov’s mathematical framework probability is simply an
application of measure theory [?], the theory of assigning numbers to sets. In partic-
ular, Kolmogorov’s probability measures are additive measures, i.e., the real value
assigned to a set of outcomes is the sum of the values assigned to its constituent
elements. The collection Ω of possible outcomes (of a random experiment, or a
decision problem) is called the sample space, or universe of discourse. Any (mea-
surable) subset A of the universe Ω is called an event, and is assigned a real number
between 0 and 1.
A recent study of the origins of Kolmogorov’s work is due to Shafer and Vovk1.
1 http://www.probabilityandfinance.com/articles/04.pdf

Formally [?], let Ω be the sample space, and let 2Ω = {A ⊂ Ω} represent its power
set. A subset F ⊂ 2Ω is called a σ-algebra if it satisfies the following three
properties [?]:


– F is non-empty: there is at least one A ⊂ Ω in F;
– F is closed under complementation: if A is in F, then so is its complement,
Ac = {ω ∈ Ω, ω ∉ A} ∈ F;
– F is closed under countable union: if A1 , A2 , A3 , ... are in F, then so is A =
A1 ∪ A2 ∪ A3 ∪ · · · ,
where ∪ denotes the usual set-theoretical union.
From the above properties, it follows that any σ-algebra F is closed under countable
intersection as well (by De Morgan’s laws).
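For a finite frame these axioms can be verified exhaustively. A minimal sketch in Python (the frame Ω and the collection F below are arbitrary illustrative choices):

from itertools import chain, combinations

def is_sigma_algebra(omega, F):
    """Check the sigma-algebra axioms for a collection F of subsets of a finite omega."""
    F = {frozenset(A) for A in F}
    omega = frozenset(omega)
    if not F:                                    # non-emptiness
        return False
    for A in F:                                  # closure under complementation
        if omega - A not in F:
            return False
    for r in range(2, len(F) + 1):               # closure under union of any subfamily
        for family in combinations(F, r):
            if frozenset(chain(*family)) not in F:
                return False
    return True

omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]        # a valid sigma-algebra
print(is_sigma_algebra(omega, F))                # True
print(is_sigma_algebra(omega, [set(), {1}]))     # False: the complement {2,3,4} is missing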
Definition 1. A probability measure over a σ-field or σ-algebra F ⊂ 2Ω , associ-
ated with a sample space Ω, is a function P : F → [0, 1] such that:
– P (∅) = 0;
– P (Ω) = 1;
– if A ∩ B = ∅, A, B ∈ F then P (A ∪ B) = P (A) + P (B) (additivity).
A simple example of probability measure associated with a spinning wheel is shown
in Figure 1.1.

Fig. 1.1. A spinning wheel is a physical mechanism whose outcomes are associated with a
(discrete) probability measure.
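A minimal computational sketch of such a discrete probability measure (the four sectors and their masses below are arbitrary illustrative values):

# Probability measure on a finite sample space (a hypothetical four-sector wheel).
# P is defined on all subsets of Omega: the power set is trivially a sigma-algebra.
p = {'red': 0.4, 'blue': 0.3, 'green': 0.2, 'yellow': 0.1}   # masses of the individual outcomes

def P(event):
    """Additive measure of an event (a set of outcomes)."""
    return sum(p[outcome] for outcome in event)

assert abs(P(set(p)) - 1.0) < 1e-12              # P(Omega) = 1
assert P(set()) == 0                             # P(empty set) = 0
A, B = {'red'}, {'blue', 'green'}
print(abs(P(A | B) - (P(A) + P(B))) < 1e-12)     # True: additivity for disjoint events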

A sample space Ω together with a σ-algebra F of its subsets and a probability
measure P on F forms a probability space: (Ω, F, P).
Based on the notion of probability space, one can define that of random variable. A
random variable is a variable whose value is subject to random variations, i.e. due
to ‘chance’ (although, as we know, what chance is is subject to debate). Mathemat-
ically, it is a function X from a sample space Ω (endowed with a probability space)
to (usually) the real line R: see Figure 1.2-left for an illustration of the random vari-
able associated with a die.
The function X : Ω → R is subject to a condition of measurability: in rough words,
each interval of values of the real line must have a pre-image which belongs to the
σ-algebra F, and therefore has a probability value. In this way we have a means of
assigning probability values to sets of real numbers.
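A minimal sketch of a random variable on the die example (the payoff mapping X below is an arbitrary illustrative choice):

from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                       # faces of a fair die
P = {w: Fraction(1, 6) for w in omega}           # uniform probability measure

def X(w):
    """A random variable: win 1 unit on an even face, lose 1 unit on an odd face."""
    return 1 if w % 2 == 0 else -1

def prob_X_in(values):
    """P(X in values), computed through the pre-image of the set of real values."""
    preimage = [w for w in omega if X(w) in values]
    return sum(P[w] for w in preimage)

print(prob_X_in({1}))                            # 1/2
print(sum(P[w] * X(w) for w in omega))           # expectation E[X] = 0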

1.3 Interpretations of probability


1.3.1 Does probability exist at all?

When one thinks of classical examples of probability distributions (e.g., a spinning
wheel, a roulette, a rolling die), the suspicion arises that ‘probability’ is simply a fig
leaf for our ignorance and lack of understanding of natural phenomena. Assuming a
view of the physical world that follows the laws of classical Newtonian mechanics,
it is theoretically conceivable that perfect knowledge of the initial conditions of
say, a roulette, and of the impulse applied to it by the croupier would allow the
player to know exactly what number would come out. In other words, with sufficient
information, any phenomenon would be predictable in a completely deterministic
way.
This is a position supported by Einstein himself, who was famously quoted as saying
that “God does not play dice with the universe”. In Doc Smith’s Lensman series, the
ancient race of the Arisians has such mental powers that its members compete with
each other at foreseeing events far away in the future, down to the tiniest detail.
A first objection to this argument is to point out that ‘infinite accuracy’ is an
abstraction, and any actual measurements are bound to be affected by a degree of
imprecision. As soon as initial states are not precisely known, the nonlinear nature
of most phenomena inescapably generates a chaotic behaviour that will prevent any
accurate prediction of future events.
More profoundly, the principles of quantum mechanics seem to suggest that
probability is not just a figment of our mathematical imagination, or a representa-
tion of our ignorance: the workings of the physical world seem to be inherently
probabilistic. However, the question arises of why the finest structure of the
physical world should be described by additive measures, rather than more general ones (or
capacities, see Chapter ??).
Finally, as soon as we introduce the human element into the picture, any hope of
being able to predict the future deterministically disappears. One may say that this
is just another manifestation of our inability to understand the internal workings
of a system as complex as a human mind. Fair enough. Nevertheless, we still need
to be able to make useful predictions about human behaviour, and ‘probability’, in a
wide sense, is a useful means to that end.

1.3.2 Competing interpretations

Even assuming that (some form of mathematical) probability is inherent to the phys-
ical world, people cannot agree on what it is. Quoting Savage [?]:

“It is unanimously agreed that statistics depends somehow on probability. But,
as to what probability is and how it is connected with statistics, there has seldom
been such complete disagreement and breakdown of communication since the Tower
of Babel. Doubtless, much of the disagreement is merely terminological and would
disappear under sufficiently sharp analysis”.

As a result, probability has multiple competing interpretations: (1) as an objective
description of frequencies of events (meaning ‘things that happen’) at a certain
persistent rate, or ‘relative frequency’ – this is the so called frequentist interpreta-
tion, mainly due to Fisher and Pearson; (2) as degree of belief on events (interpreted
as statements/propositions on the state of the world), regardless of any random pro-
cess – the Bayesian or evidential interpretation, first proposed by de Finetti and
Savage; (3) as the propensity of an agent to act (or gamble, or decide) in case the
event happens – the so called behavioural probability [1371].
Note that neither frequentist nor Bayesian probability is in contrast with the
classical mathematical definition of probability due to Kolmogorov: others, how-
ever, do require us to employ different classes of mathematical objects (as we will
see in this Book).

1.3.3 Frequentist probability

In the frequentist interpretation, the (aleatory) probability of an event is its relative
frequency in time. When tossing a fair coin, for instance, frequentists say that the
probability of getting heads is 1/2, not because there are two equally likely out-
comes (due to the structure of the object being tossed) but because repeated series
of large numbers of trials (a random experiment) demonstrate that the empirical fre-
quency converges to the limit 1/2 as the number of trials goes to infinity.
Clearly, it is impossible to actually complete the infinite series of repetitions which
constitutes a random experiment. However, the frequentist interpretation offers
guidance in the design of practical random experiments, using as main tools sta-
tistical hypothesis testing and confidence interval analysis.

Statistical hypothesis testing A statistical hypothesis is a hypothesis (a conjecture
on the state of the world) that is testable on the basis of observing a process that
is modeled via a set of random variables. A data set obtained by sampling is com-
pared against synthetic data from an idealized model. A hypothesis is proposed for
the statistical relationship between the two data sets; this is compared as an alterna-
tive to an idealized null hypothesis that proposes no relationship between two data
sets. The comparison is deemed statistically significant if the relationship between
the data sets would be an unlikely realization of the null hypothesis according to a
threshold probability: the significance level. Statistical hypothesis testing is a form
of confirmatory data analysis, as opposed to exploratory data analysis which does
not rely on pre-specified hypotheses.
The steps to follow in hypothesis testing are:
1. state the null H0 and alternative H1 hypotheses;
2. state the statistical assumptions being made about the sample, e.g. assumptions
about the statistical independence or about the form of the distributions of the
observations;
3. state the relevant test statistic T (i.e., a quantity derived from the sample);
4. derive from the assumptions the distribution of the test statistic under the null
hypothesis;
5. set a significance level (α), i.e., a probability threshold below which the null
hypothesis will be rejected;
6. compute from the observations the observed value tobs of the test statistic T ;
7. calculate the p-value, the probability (under the null hypothesis) of sampling a
test statistic at least as extreme as the observed value;
8. reject the null hypothesis, in favor of the alternative one, if and only if the p-
value is less than the significance level threshold.
In hypothesis testing false positives (i.e., rejecting a valid hypothesis) are called
‘type I’ errors; false negatives (not rejecting a false hypothesis) are called ‘type II’
errors. Note that if the p-value is above α the result of the test is inconclusive: the
evidence is insufficient to support a conclusion.
The notion of p-value is crucial in hypothesis testing. It is the probability, under
the assumption of hypothesis H, of obtaining a result equal to or more extreme than
what was actually observed, namely: P (X ≥ x|H), where x is the observed value.
The reason for not simply considering P (X = x|H) when assessing the null hy-
pothesis is that, for any continuous random variable, that conditional probability is
equal to zero. As a result we need to consider, depending on the situation, a right-
tail event p = P(X ≥ x|H), a left-tail event p = P(X ≤ x|H), or a double-tailed
event, based on the ‘smaller’ of {X ≤ x} and {X ≥ x}.
Note that the p-value is not the probability that the null hypothesis is true or the
probability that the alternative hypothesis is false: frequentist statistics does not and
cannot attach probabilities to hypotheses.
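A minimal sketch of the whole procedure (the sample of 60 heads in 100 tosses and the significance level below are invented for illustration), testing the fairness of a coin with a right-tailed exact binomial p-value:

from math import comb

# H0: the coin is fair (theta = 0.5); H1: it is biased towards heads (theta > 0.5).
n, k, alpha = 100, 60, 0.05                      # tosses, observed heads, significance level

def binom_pmf(j, n, theta):
    """Probability of j successes out of n Bernoulli trials with parameter theta."""
    return comb(n, j) * theta**j * (1 - theta)**(n - j)

# Right-tail p-value: probability, under H0, of a result at least as extreme as k.
p_value = sum(binom_pmf(j, n, 0.5) for j in range(k, n + 1))

print(f"p-value = {p_value:.4f}")                # about 0.028
print("reject H0" if p_value < alpha else "inconclusive")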

Maximum Likelihood Estimation A popular tool for estimating the parameters
of a probability distribution which best fits a given set of observations is Maximum
Likelihood Estimation (MLE). The term likelihood was coined by Ronald Fisher in
1922 [?]. He argued against the use of ‘inverse’ (Bayesian) probability as a basis for
statistical inferences, proposing instead inferences based on likelihood functions.
MLE is based on the likelihood principle: all of the evidence in a sample relevant
to model parameters is contained in the likelihood function. Some widely used sta-
tistical methods, for example many significance tests, are not consistent with the
likelihood principle. The validity of such an assumption is still hotly debated [].
Given a parametric model {f (.|θ), θ ∈ Θ}, a family of probability distributions of
the data given a (possibly vector) parameter θ, the maximum likelihood estimate of
θ is defined as:

θ̂MLE ⊆ { arg max_{θ∈Θ} L(θ ; x1, . . . , xn) },

where the likelihood of the parameter given the observed data x1 , . . . , xn is:

L(θ ; x1 , . . . , xn ) = f (x1 , x2 , . . . , xn | θ).

Maximum-likelihood estimators have no optimal properties for finite samples: how-
ever, they do have good limiting properties:
– consistency: the sequence of MLEs converges in probability, for a sufficiently
large number of observations, to the (actual) value being estimated;
– asymptotic normality: as the sample size increases, the distribution of the MLE
tends to the Gaussian distribution with mean on the true parameter (under a num-
ber of conditions);
– efficiency: MLE achieves the Cramer-Rao lower bound [?] when the sample size
tends to infinity, i.e., no consistent estimator has lower asymptotic mean squared
error than MLE.
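A minimal sketch of maximum likelihood estimation for i.i.d. Bernoulli observations (the simulated sample and the grid search below are illustrative choices; the closed-form maximiser k/n is used as a check):

import math
import random

random.seed(0)
theta_true = 0.7
x = [1 if random.random() < theta_true else 0 for _ in range(200)]   # simulated sample

def log_likelihood(theta, x):
    """log L(theta; x1,...,xn) for i.i.d. Bernoulli observations."""
    return sum(math.log(theta if xi == 1 else 1 - theta) for xi in x)

grid = [i / 1000 for i in range(1, 1000)]                 # avoid theta = 0, 1
theta_mle_numeric = max(grid, key=lambda t: log_likelihood(t, x))
theta_mle_closed = sum(x) / len(x)                        # the analytical maximiser k/n

print(theta_mle_numeric, theta_mle_closed)                # the two estimates agree up to grid resolution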

1.3.4 Propensity

The propensity theory of probability [?, ?, ?, ?, ?, ?, ?, ?, ?] thinks of probability as a
physical propensity or tendency of a physical system to deliver a certain outcome. In
a way, propensity is an attempt to explain why the relative frequencies of a random
experiment turn out to be what they are. The law of large numbers is interpreted
as evidence towards the existence of invariant single-run probabilities, which do
emerge in quantum mechanics, for instance, to which relative frequencies tend at
infinity.
What propensity exactly means remains an open issue. Popper, for instance, has
proposed a theory of propensity which is however plagued by the use of relative
frequencies for its definition [?].

1.3.5 Subjective and Bayesian probability

In epistemic (subjective) probability, probabilities are degrees of belief assigned to
the various events by an individual assessing the state of the world, whereas un-
der frequentist inference, a hypothesis is typically tested without being assigned a
probability.
The most popular theory of subjective probability is perhaps the Bayesian frame-
work [?, 337], due to the English clergyman Thomas Bayes (1702-1761). There, all
degrees of belief are encoded by additive mathematical probabilities. It is a special
case of evidential probability, in which some prior probability is updated to a
posterior probability in the light of new evidence (data). In the Bayesian framework,
Bayes’ rule is sequentially used to compute a posterior distribution when more data
become available, namely whenever we learn that a certain proposition A is true:
P(B|A) = P(B ∩ A) / P(A),          (1.1)
an operator inextricably related to the notion of conditional probability P (B|A)
[841].
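A classical numerical illustration of rule (1.1), with purely hypothetical figures for a diagnostic test:

# Hypothetical diagnostic scenario: B = 'disease present', A = 'positive test'.
P_B = 0.01             # prior probability of the disease
P_A_given_B = 0.95     # test sensitivity, P(A|B)
P_A_given_notB = 0.05  # false positive rate, P(A|not B)

# Law of total probability for P(A), then Bayes' rule (1.1): P(B|A) = P(B ∩ A) / P(A).
P_A = P_A_given_B * P_B + P_A_given_notB * (1 - P_B)
P_B_given_A = (P_A_given_B * P_B) / P_A

print(round(P_B_given_A, 4))    # about 0.161: a positive test is far from conclusive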
Thomas Bayes proved a special case of what is now called Bayes’ theorem (1.1) in
a paper titled “An Essay towards solving a Problem in the Doctrine of Chances”.
Pierre-Simon Laplace (1749-1827) later introduced a general version of the theo-
rem. Jeffreys’ “Theory of Probability” (1939) played an important role in the revival
of the Bayesian view of probability, followed by works by Abraham Wald (1950)
and Leonard J. Savage (1954).
De Finetti produced a justification for the Bayesian framework based on the no-
tion of Dutch book. A Dutch book is made when a clever gambler places a set of
bets that guarantee a profit, no matter what the outcome of the bets. If a bookmaker
follows the rules of the Bayesian calculus, de Finetti argued, a Dutch book cannot
be made. It follows that subjective beliefs must follow the laws of (Kolmogorov’s
additive) probability if they are to be coherent. Indeed, Dutch book arguments leave
open the possibility that non-Bayesian updating rules could avoid Dutch books –
one of the purposes of this Book is to show that this is the case. Justification by
axiomatisation has been tried, but with no great success.
Moreover, evidence casts doubts on the assumption that humans maintain coher-
ent beliefs or behave rationally. Daniel Kahneman3 won a Nobel prize for work, in
collaboration with Amos Tversky, supporting the exact opposite: people consistently
pursue courses of action which are bound to damage them, and often fail to understand
the full consequences of their actions.
For all its faults (as we discuss later), the Bayesian framework is rather intuitive
and easy to use, and capable of providing a number of ‘off the shelf’ tools to make
inferences or compute estimates from time series.
Bayesian inference In Bayesian inference, the prior distribution is the distribution
of the parameter(s) before any data is observed, i.e. p(θ|α), a function of a vec-
tor of hyperparameters α. The likelihood is the distribution of the observed data
X = {x1 , ..., xN } conditional on its parameters, i.e. p(X|θ). The marginal
likelihood (or ‘evidence’) is the distribution of the observed data marginalized over
the parameter(s), namely:

p(X|α) = ∫_{θ} p(X|θ) p(θ|α) dθ.

The posterior distribution is then the distribution of the parameter(s) after taking
into account the observed data, as determined by Bayes’ rule:
3 https://en.wikipedia.org/wiki/Daniel_Kahneman

p(θ|X, α) = p(X|θ) p(θ|α) / p(X|α) ∝ p(X|θ) p(θ|α).   (1.2)
The posterior predictive distribution is the distribution of a new data point x̃,
marginalized over the posterior:
p(x̃|X, α) = ∫_{θ} p(x̃|θ) p(θ|X, α) dθ,

amounting to a distribution over possible new data values. The prior predictive dis-
tribution, instead, is the distribution of a new data point marginalized over the prior:
p(x̃|α) = ∫_{θ} p(x̃|θ) p(θ|α) dθ.

By comparison, prediction in frequentist statistics often involves finding an optimum
point estimate of the parameter(s) (e.g., by maximum likelihood), not accounting
for any uncertainty in the value of the parameter. In opposition, (1.2) provides
as output an entire probability distribution over the parameter space.
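
As an illustration (our own sketch, with invented numbers, using scipy), the conjugate Beta–Bernoulli model makes the quantities above available in closed form: the posterior is again a Beta distribution, and the posterior predictive probability of a new success is its mean.

```python
from scipy.stats import beta

a0, b0 = 2.0, 2.0           # hyperparameters of the Beta prior p(theta | alpha)
data = [1, 0, 1, 1, 0, 1]   # observed Bernoulli outcomes X

k, n = sum(data), len(data)
a_post, b_post = a0 + k, b0 + (n - k)   # posterior p(theta|X, alpha) = Beta(a0+k, b0+n-k)

posterior = beta(a_post, b_post)
print("posterior mean      :", posterior.mean())
# posterior predictive probability of a new success: p(x~=1 | X, alpha) = E[theta | X, alpha]
print("predictive P(x~ = 1):", a_post / (a_post + b_post))
```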

Maximum A Posteriori Maximum-A-Posteriori (MAP) estimation estimates a single
value θ for the parameter as the mode of the posterior distribution (1.2):

θ̂MAP(x) = arg max_{θ} [ p(x|θ) p(θ) / ∫_{ϑ} p(x|ϑ) p(ϑ) dϑ ] = arg max_{θ} p(x|θ) p(θ).

Note that MAP and MLE estimates coincide when the prior p(θ) is uniform. MAP
estimation is not very representative of Bayesian methods, as the latter are charac-
terized by the use of distributions over parameters to draw inferences. Also, unlike
ML estimators, the MAP estimate is not invariant under reparameterization.
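
A tiny numerical aside (ours, with made-up counts), comparing the two point estimates in the conjugate Beta–Bernoulli model used in the previous sketch:

```python
a0, b0 = 2.0, 2.0
k, n = 7, 10                                     # 7 successes in 10 trials (invented)

theta_mle = k / n                                # maximises the likelihood alone: 0.7
theta_map = (a0 + k - 1) / (a0 + b0 + n - 2)     # mode of the Beta(a0+k, b0+n-k) posterior: 2/3
# with a uniform prior (a0 = b0 = 1) the MAP estimate reduces to the MLE
print("MLE:", theta_mle, "MAP:", theta_map)
```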

1.3.6 Bayesian versus frequentist inference

Summarising, in frequentist inference unknown parameters are often, but not al-
ways, treated as having fixed but unknown values that are not capable of being
treated as random variates. Bayesian inference allows, instead, probabilities to be
associated with unknown parameters. The frequentist approach does not depend on
a subjective prior that may vary from one investigator to another. However, Bayesian
inference (e.g. Bayes’ rule) can be used by frequentists4 .

Lindley’s paradox Lindley’s paradox is a counterintuitive situation in statistics in
which the Bayesian and frequentist approaches to a hypothesis testing problem give
different results for certain choices of the prior distribution.
More specifically, Lindley’s paradox5 occurs when:
4 www.stat.ufl.edu/~casella/Talks/BayesRefresher.pdf
5 onlinelibrary.wiley.com/doi/10.1002/0470011815.b2a15076/pdf

– the result x is ‘significant’ by a frequentist test of H0 , indicating sufficient evidence
to reject H0 , say at the 5% level, while at the same time
– the posterior probability of H0 given x is high, indicating strong evidence that
H0 is in better agreement with x than H1 .
This can happen when H0 is very specific, H1 less so, and the prior distribution
does not strongly favor one or the other.
It really is not a paradox, but merely a consequence of the fact that the two
approaches answer fundamentally different questions. The outcome of Bayesian in-
ference is typically a probability distribution on the parameters, given the results
of the experiment. The result of frequentist inference is either: a ‘true or false’ (bi-
nary) conclusion from a significance test, or a conclusion in the form that a given
confidence interval, derived from the sample, covers the true value.
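
The following sketch (ours; the counts are adapted from a standard textbook illustration of the paradox) contrasts the two answers for a coin-tossing experiment, with H0: θ = 1/2, a uniform prior on θ under H1, and equal prior probabilities on the two hypotheses.

```python
from scipy.stats import binom, norm

n, x = 98451, 49581      # counts adapted from a classic illustration of the paradox

# frequentist side: two-sided z-test of H0: theta = 1/2 (normal approximation)
z = (x - n / 2) / (0.25 * n) ** 0.5
p_value = 2 * norm.sf(abs(z))            # roughly 0.02: H0 rejected at the 5% level

# Bayesian side: P(H0) = P(H1) = 1/2, uniform prior on theta under H1,
# so that p(x|H1) = integral of Binom(x; n, t) dt over [0,1] = 1/(n+1)
like_H0 = binom.pmf(x, n, 0.5)
like_H1 = 1.0 / (n + 1)
post_H0 = like_H0 / (like_H0 + like_H1)  # roughly 0.95: strong support for H0

print("p-value   :", p_value)
print("P(H0 | x) :", post_H0)
```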

1.4 Beyond probability


A long series of students have argued that a number of serious issues arise whenever
uncertainty is handled via Kolmogorov’s measure-theoretical probability theory. On
top of that, one can argue that something is wrong with both mainstream approaches
to probability interpretation.
Before we move on to introduce the mathematics of belief functions and other
theories of uncertainty, we think it appropriate to briefly summarise our original take
on these issues.

1.4.1 Something is wrong with probability

Flaws of the frequentist setting The setting of frequentist hypothesis testing is
rather questionable. First of all, the scope is quite narrow: rejecting or not rejecting a
hypothesis (although confidence intervals can also be provided). The criterion ac-
cording to which this decision is made is arbitrary: who decides what an ‘extreme’
realisation is? In other words, who decides what is the right choice for the value of
α? What is the deal with the ‘magic’ numbers 0.05 and 0.01? In fact, the whole ‘tail
event’ idea comes from the fact that, under measure theory, the conditional probabil-
ity (p-value) of a point outcome is zero – clearly, the framework seems to be trying
to patch up what is instead a fundamental problem with the way probability is math-
ematically defined. Last but not least, hypothesis testing cannot cope with pure data,
without making assumptions on the process (experiment) which generated them.

The issues with Bayesian reasoning Bayesian reasoning is also flawed in a num-
ber of ways. It is extremely bad at representing ignorance: Fisher uninformative
priors, the common way of handling ignorance in a Bayesian setting, lead to dif-
ferent results for different reparameterisations of the universe of discourse. Bayes’
rule assumes the new evidence comes in the form of certainty (‘A is true’): in the
real world, this is often not the case. Finally, model selection is trouble-
some in Bayesian statistics: whilst one is forced by the mathematical formalism to
pick a prior distribution, there is no clear-cut criterion on how to pick it. In the Au-
thor’s view, this is the result of a confusion between the original description of a
person’s subjective system of beliefs and the way it is updated, and the ‘objectivist’
view of Bayesian reasoning as a rigorous procedure for updating probabilities when
presented with new information.

1.4.2 Pure data: beware of the prior

Indeed, Bayesian reasoning requires modelling the data and a prior. Human beings
do have ‘priors’, which is just a word for denoting what they have learned (or they
think they have learned) about the world throughout their existence. In particular,
they have well sedimented beliefs about the likelihood of various (if not all) events.
There is no need to ‘pick’ a prior, for prior (accumulated) knowledge is indeed
there. As soon as we idealise this mechanism because, say, we want a machine to reason
in this way, we find ourselves forced to ‘pick’ a prior for an entity (an algorithm)
which does not have any past experience, and has not sedimented any beliefs as a
result. Bayesians content themselves by claiming that all will be fine in the end, as,
asymptotically, the choice of the prior does not matter, as proven by the Bernstein-
von Mises theorem [?].

1.4.3 Pure data: designing the universe?

The frequentist approach, on its side, is inherently unable to describe pure data with-
out having to make additional assumptions on the data-generating process. Never-
theless, in Nature one cannot ‘design’ the process which produces the data: data
come our way, whether we want it or not. In the frequentist terminology, we cannot
set the ‘stopping rules’ in most applications (think of driverless cars, for instance).
Again, the frequentist setting recalls the old image of a scientist ‘analysing’ (from
the Greek terms ‘ana’ and ‘lysis’, breaking up) a specific aspect of the world in their
constrained laboratory.
Even more strikingly, it is well known that the same data can lead to opposite
conclusions when analysed in a frequentist way. The reason is that different random
experiments can lead to the same data, whereas the parametric model employed (the
family of probability distributions which is assumed to produce the data) is linked
to a specific experiment6 . Apparently, however, frequentists are just fine with this.

1.4.4 No data: modelling ignorance

The modelling of ignorance (absence of data) is another weakness of Bayesian reasoning.
The typical solution is to pick a so-called ‘uninformative’ prior distribution,
in particular Jeffreys’ prior, which is proportional to the square root of the determinant
of the Fisher information matrix.
6 http://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading20.pdf

Unfortunately, Jeffreys’ priors can be improper (unnormalised), and most importantly
they violate the strong version of the likelihood principle: when using
Jeffreys’ prior, inferences about a parameter θ depend not just on the probability
of the observed data as a function of θ, but also on the universe Ω of all possible
experimental outcomes, as determined by the experimental design. The reason is
that the Fisher information matrix is computed from an expectation over the chosen
universe of discourse.
In conclusion, uniform priors (and their Jeffreys generalisations) depend on the
parameterisation of the sample space, and can therefore lead to different results on
different spaces (reparameterisations), given exactly the same likelihood functions.
This flaw was already pointed out by Glenn Shafer in his landmark book [1149],
where he noted how the Bayesian formalism cannot handle multiple hypothesis
spaces (families of frames, in Shafer’s terminology, see Section ??) in a consistent
way.
In Bayesian statistics, however, one can prove that the asymptotic distribution of
the posterior mode depends only on the Fisher information and not on the prior: the
so-called Bernstein-von Mises theorem. The only little problem is that the amount
of information supplied by a sample of data must be large enough. The result is also
subject to the caveat [?] that the Bernstein-von Mises theorem does not hold almost
surely if the considered random variable has a countably infinite probability space.
As A. W. F. Edwards put it:

“It is sometimes said, in defence of the Bayesian concept, that the choice of
prior distribution is unimportant in practice, because it hardly influences the pos-
terior distribution at all when there are moderate amounts of data. The less said
about this ‘defence’ the better.”

‘Uninformative’ priors can be dangerous: they can bias the reasoning process so badly that it
can recover only asymptotically7 .
On the other hand, reasoning with belief functions does not require any prior:
belief functions encoding the data are combined as they are, with no need for priors.
Ignorance is naturally represented in belief function theory by the ‘vacuous’ belief
function, assigning mass 1 to the whole hypothesis space.

1.4.5 Set-valued observations: the cloaked die

A die (Figure 1.2) is a simple example of (discrete) random variable. Its probability
space is defined on the sample space Ω = {face 1, face 2, · · · , face 6}, whose ele-
ments are mapped to the real numbers 1, 2, ..., 6, respectively (no need to consider
measurability here).
Now, imagine that faces 1 and 4 are cloaked, and we roll the die. How do we
model this new experiment, mathematically? Actually, the probability space has not
7 http://andrewgelman.com/2013/11/21/hidden-dangers-noninformative-priors/

Fig. 1.2. Left: the random variable associated with a die. Right: the random set (set-valued
random variable) associated with the cloaked die in which faces 1 and 4 are not visible.

changed (as the physical die has not been altered, its faces still have the same prob-
abilities). What has changed is the mapping: since we cannot observe the outcome
when a cloaked face is shown (we assume that only the top face is observable), both
face 1 and face 4 (as elements of Ω) are mapped to the set of possible values {1, 4}.
Mathematically, this is called a random set [?, ?, ?], i.e., a set-valued random vari-
able.
A more realistic scenario is that in which we roll, say, four dice in such a way
that for some of them, their top face is occluded, but some of the side faces are still
visible, providing information on the outcome. For instance, I can see the top faces
of the Red, Green and Purple dice but, say, I cannot see the outcome of the Blue
die. However, I can see two of its side faces (say, faces 1 and 3), therefore the outcome of Blue is
the set {2, 4, 5, 6}.

This is just an example of a very common situation called missing data: for part of
the sample I observe in order to make my inference, data are partly or totally miss-
ing. Missing data appears (or disappears?) everywhere in science and engineering.
In computer vision, for instance, this phenomenon is called ‘occlusion’ and is one
of the main nuisance factors in estimation.
The bottom line is, whenever data are missing, observations are inherently set-
valued. Mathematically, we are not sampling a (scalar) random variable but a set-
valued random variable – a random set. My outcomes are sets? My probability dis-
tribution has to be defined over sets.
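
To fix ideas, here is a toy Python sketch (ours) of the cloaked-die random set: faces 1 and 4 are mapped to the set {1, 4}, every other face to a singleton, and the induced mass assignment over subsets yields lower and upper probabilities for any event.

```python
from collections import defaultdict

p_faces = {f: 1 / 6 for f in range(1, 7)}        # probabilities of the (unaltered) die
gamma = {f: frozenset({1, 4}) if f in (1, 4) else frozenset({f})
         for f in range(1, 7)}                   # multi-valued mapping of the cloaked die

mass = defaultdict(float)                        # induced mass assignment on subsets
for face, subset in gamma.items():
    mass[subset] += p_faces[face]                # m({1,4}) = 1/3, m({f}) = 1/6 otherwise

def lower(event):    # mass of focal sets wholly contained in the event
    return sum(m for A, m in mass.items() if A <= event)

def upper(event):    # mass of focal sets intersecting the event
    return sum(m for A, m in mass.items() if A & event)

print(lower({4}), upper({4}))   # 0.0 and 1/3: outcome 4 is merely plausible, never certain
```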
In opposition, traditional statistical approaches deal with missing data either by:
deletion (discarding any case that has a missing value, which may introduce bias or
affect the representativeness of the results); single imputation (replacing a missing
value with another one, e.g. from a randomly selected similar record in the same
dataset, with the mean of that variable for all other cases, or by using a stochas-
tic regression model); multiple imputation (averaging the outcomes across multiple
imputed data sets using, for instance, stochastic regression). Multiple imputation
involves drawing values of the parameters from a posterior distribution, therefore
simulating both the process generating the data and the uncertainty associated with
the parameters of the probability distribution of the data.
When using random sets, there is no need for imputation or deletion whatsoever.
All observations are set-valued, some of them just happen to be pointwise. Indeed,
when part of the data used to estimate the desired probability distribution is missing,
the resulting constraint is a credal set [837] of the type associated with a belief
function [?].

1.4.6 Propositional data

Just as measurements are naturally set-valued, in various scenarios evidence is directly
supportive of propositions. Consider the following classical example [].
Suppose there is a murder, and three people are under trial for it: Peter, John
and Mary. Our hypothesis space is therefore: Θ = {Peter, John, Mary}. There is a
witness: he testifies that the person he saw was a man. This amounts to supporting
the proposition A = {Peter, John} ⊂ Θ, however: should we take this testimony at
face value? In fact, the witness was tested for drunkenness, and the test reported an 80% chance
that he was sober when he reported the crime. As a result, we should partly support
the (vacuous) hypothesis that any one among Peter, John and Mary could be the
murderer. It seems sensible to assign 80% chance to proposition A, and 20% chance
to proposition Θ.
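
To make the numbers explicit (this is our own aside, anticipating the formal definitions of Chapter 2): assigning mass 0.8 to A = {Peter, John} and 0.2 to Θ yields a degree of belief of 0.8 that the culprit is a man, a degree of belief of 0 in ‘Peter is the culprit’ (no evidence points at him specifically), and an upper probability (plausibility) of 1 − 0.8 = 0.2 that Mary is the culprit.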
This example tells us that, even when the evidence (our data) supports whole
propositions, Kolmogorov’s additive probability theory forces us to specify support
for individual outcomes. This is unreasonable – an artificial constraint due to a math-
ematical model that is not general enough. In the example, we have no elements to
assign this 80% probability to either Peter or John, nor information on how to dis-
tribute it among them. The cause is the additivity constraint probability measures
are subject to.

Kolmogorov’s probability measures, however, are not the only or the most general
type of measure available for sets. Under a minimal requirement of monotonicity,
any measure can potentially be suitable to describe probabilities of events: these
objects are called capacities. We will study capacities in more detail in Chapter ??.
For the moment, it suffices to note that random sets are capacities, those for which
the numbers assigned to events are given by a probability distribution. As capacities
(and random sets in particular), belief functions therefore allow us to assign mass
directly to propositions.

1.4.7 Scarce data: beware the size of the sample

The current debate on the likelihood of biological life in the universe is an extreme
example of inference from very scarce data. How likely is for a planet to give birth
to life forms? Modern analysis of planetary habitability is largely an extrapolation
of conditions on Earth and the characteristics of the Solar System: a weak form of
the old anthropic principle, so to speak.
What people seem to do is model perfectly the (presumed) causes of the emergence
of life on Earth: the planet needs to circle a G-class star, in the right galactic neigh-
borhood, it needs to be in a certain habitable zone around a star, have a large moon
to deflect hazardous impact events ... The question arises: how much can one learn
from a single example? More, how much can one be sure about what they learned
from very few examples?
Another example is provided by the field of machine learning, which is about de-
signing algorithms that can learn from what they observe. The problem is, machine
learning algorithms are typically trained on ridiculously small amounts of data, compared
to the wealth of information truly contained in the real world8 . For instance, action
recognition tools are trained (and tested) over benchmark datasets that contain, at
best, a few tens of thousands of videos – compare that to the billions of videos one
can access on YouTube. How can we make sure they learn the right lesson? Should
they not aim to work with sets of models rather than precise ones?

Constraints on ‘true’ distributions From a statistical point of view, a somewhat
naive objection stresses that, even assuming that the natural description of the vari-
ability of phenomena is a probability distribution, under the law of large numbers
probability distributions are the outcome of an infinite process of evidence accu-
mulation, drawn from an infinite series of samples. In all practical cases, then, the
available evidence may only provide some sort of constraint on the unknown, ‘true’
probability governing the process [1155]. Klir [], among others, has argued that
“...imprecision of probabilities is needed to reflect the amount of information on
which they are based. [This] imprecision should decrease with the amount of [avail-
able] statistical information.”
Unfortunately, those who believe probabilities to be limits of relative frequen-
cies (the frequentists) never really ‘estimate’ a probability from the data: they only
8
thebayesianobserver.wordpress.com.

assume (‘design’) probability distributions for their p-values, and test their hypothe-
ses on them. In opposition, those who do estimate probability distributions from the
data (the Bayesians) do not think of probabilities as infinite accumulations of evi-
dence, but as degrees of belief, and content themselves with being able to model the
likelihood function of the data.
What is true is that both frequentists and Bayesians seem to be happy with solving
their problems ‘asymptotically’: thanks to the limit properties of maximum like-
lihood estimation, and the Bernstein-von Mises theorem’s guarantees on the limit
behaviour of posterior distributions.
Clearly this does not fit at all with novel applications of AI, for instance, in which
machines need to make decisions on the spot to the best of their abilities.
Logistic regression Actually, frequentists do estimate probabilities from scarce
data when they do stochastic regression.
Logistic regression allows us, given a sample Y = {Y1 , ..., Yn }, X = {x1 , ..., xn }
where Yi ∈ {0, 1} is a binary outcome at time i and xi is the corresponding mea-
surement, to learn the parameters of a conditional probability relation between the
two, of the form:
P(Y = 1|x) = 1 / (1 + e^{−(β0 + β1 x)}),   (1.3)
where β0 and β1 are two scalar parameters. Given a new observation x, (1.3) delivers
the probability of a positive outcome Y = 1.
Logistic regression generalises deterministic linear regression, as it is a function of
the linear combination β0 + β1 x. The n trials are assumed independent but not
equally distributed, for πi = P (Yi = 1|xi ) varies with the index i (i.e., the time
instant of collection).
The parameters β0 , β1 of the logistic function are estimated by maximum like-
lihood of the sample, where the likelihood is given by:
L(β|Y ) = ∏_{i=1}^{n} πi^{Yi} (1 − πi)^{1−Yi}.
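
As a hedged illustration (ours, on synthetic data; scikit-learn is assumed to be available), the parameters of (1.3) can be fitted by maximum likelihood as follows.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 1))                          # measurements x_i
p = 1.0 / (1.0 + np.exp(-(-0.5 + 2.0 * x[:, 0])))      # true beta_0 = -0.5, beta_1 = 2
y = rng.binomial(1, p)                                 # binary outcomes Y_i

clf = LogisticRegression(C=1e6).fit(x, y)              # large C ~ plain (unpenalised) MLE
print("estimated beta_0:", clf.intercept_[0])
print("estimated beta_1:", clf.coef_[0, 0])
print("P(Y=1 | x=0.3)  :", clf.predict_proba([[0.3]])[0, 1])
```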

Unfortunately, logistic regression suffers when the number of samples is insufficient
or when there are too few positive outcomes (1s) []. Also, inference by logistic
regression tends to underestimate the probability of a positive outcome (see Section 1.4.8).
Confidence intervals A major tool by which frequentists deal with the size of the
sample is confidence intervals.
Let X be a sample from a probability P (.|θ, φ) where θ is the parameter to
be estimated and φ a nuisance parameter. A confidence interval for the parameter
θ, with confidence level γ, is an interval [u(X), v(X)] determined by the pair of
random variables u(X) and v(X), with the property:
P(u(X) < θ < v(X)|θ, φ) = γ ∀(θ, φ).
For instance, suppose we observe the weight of 25 cups of tea, and we assume it
is normally distributed with mean µ. Since the (normalised) sample mean Z is also
normally distributed, we can ask what values of the mean are such that P(−z ≤ Z ≤ z) = 0.95
(for instance). Since Z = (X̄ − µ)/(σ/√n), this yields an interval for µ, e.g.

P(X̄ − 0.98 ≤ µ ≤ X̄ + 0.98) = 0.95.

Confidence intervals are a form of interval estimate. Their correct interpretation
is about ‘sampling samples’: if we keep extracting new sample sets, 95% (say) of the
time the confidence interval (which will differ for every new sample set) will cover
the true value of the parameter. Alternatively, there is a 95% probability that the
calculated confidence interval from some future experiment encompasses the true
value of the parameter. We cannot say, instead, that a specific confidence interval is
such that it contains the value of the parameter with 95% probability. A Bayesian
version of them exists, called credible intervals [].
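
A short sketch (ours, with invented numbers) of the computation above, for a normal sample of size n = 25 with known σ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma, n = 2.5, 25
x = rng.normal(loc=250.0, scale=sigma, size=n)   # 25 invented weights (grams)

z = norm.ppf(0.975)                              # about 1.96 for a 95% level
half_width = z * sigma / np.sqrt(n)              # with sigma = 2.5 and n = 25, about 0.98
print("95% CI:", (x.mean() - half_width, x.mean() + half_width))
```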

1.4.8 Unusual data: rare events

While scarce data denotes situations in which data are of insufficient quantity, rare
events [902] is a term which indicates cases in which the training data are of in-
sufficient quality, in the sense that they do not reflect well enough the underlying
distribution. An equivalent term, coined by Nassim Nicholas Taleb, is ‘black swan’.
It refers to an unpredictable event (also called ‘tail risk’) which, once occurred, is
(wrongly) rationalised in hindsight as being predictable/describable by the exist-
ing risk models. Basically, Knightian uncertainty is presumed to not exist, typically
with extremely serious consequences. Examples include: financial crises, plagues,
but also unexpected scientific or societal developments. In the most extreme cases,
these events may have never even occurred (this is the case of the question ‘will
your vote be decisive in the next presidential election?’ posed in [?, ?]).
What does constitute a ‘rare’ event? Clearly, we are only interested in them because
they are not so rare, after all. We can say that an event is ‘rare’ when it covers
a region of the hypothesis space which is seldom sampled. Such events rarely
take place when considering a single system, but become a tangible possibility
when very many systems are assembled together (as is the case in the real world).
Given the rarity of samples of extreme behaviours (tsunami, meltdowns), scientists
are forced to infer probability distributions for these systems’ behaviour using in-
formation captured in ‘normal’ times (e.g. while a nuclear power plant is working
just fine). Using these distributions to extrapolate results at the ‘tail’ of the curve via
popular statistical procedures (e.g. logistic regression, Section 1.4.7) may then lead
to sharply underestimating the probability of rare events []. In response, Harvard’s
G. King [?] proposed corrections to logistic regression based on oversampling rare
events (represented by 1s) with respect to normal ones (0s). Other people prefer
to drop generative probabilistic models entirely, in favour of discriminative ones
[?, ?]. Once again, we fail to understand the root cause of the problem, namely that
uncertainty affects our very models of uncertainty.
Rather, we should aim at explicitly modelling second-order (Knightian) uncer-
tainties. The most straightforward way of doing this is to consider sets of probability
distributions as models for the problem. Mathematically, belief functions (and their
random sets generalisation) amount indeed to (convex) sets of probability distribu-
tions – objects which go under the name of credal sets.

1.4.9 Uncertain data

When discussing how different frameworks cope with scarce or unusual data, we
always implicitly assumed that information comes in the form of certainty: e.g., I
measure vector x, so that my conditioning event is A = {x} and I can apply Bayes’
rule to update my state of the world. Indeed this is the way Bayes’ rule is used by
Bayesians to reason (in time) when new evidence becomes available. Frequentists,
on the other hand, use it to condition a parametric distribution on the gathered (cer-
tain) measurements and generate their p-values (recall Section 1.3.3).
This is quite reasonable or even correct in many situations: in science and engineer-
ing measurements, which are assumed to be accurate, flow in as a form of certain
evidence, so that one can apply Bayes’ rule to condition a parametric model given
a time series of measurements x1 , ..., xT to construct likelihood functions (or p-
values, if you are a frequentist).

Fuzzy data In many real world problems, though, the information provided cannot
be put in a similar form. For instance, concepts themselves can be not well defined,
e.g. ‘this object is dark’ or ‘it is somewhat round’: in the literature, this is referred
to as qualitative data. Qualitative data is common in decision making, in which
expert surveys act as sources of evidence, but can hardly be put into the form of
measurements being equal to sharp values.
As we will see in Chapter ??, fuzzy theory [] is able to account for not-well-defined
concepts via the notion of graded membership of a set (e.g. by assigning every
element of the sample space a certain degree of membership in any given set).

Unreliable data Thinking of measurements produced by sensor equipment as ‘certain’
pieces of information is also an idealisation. Sensors are not perfect but come
with a certain degree of reliability. Unreliable sensors can then generate faulty (out-
lier) measurements: can we still treat these data as ‘certain’? They can rather be
assimilated to false statements issued with apparent confidence.
It then seems to be more sensible to attach to any measurements a degree of relia-
bility, based on the past track record of the data generating process producing them.
The question is: can we still update our knowledge state using partly reliable data in
the same way as we do with certain propositions, i.e. by conditioning probabilities
via Bayes rule?

Likelihood data Last but not least, evidence is often directly provided in the
form of whole probability distributions. For instance, ‘experts’ (e.g., medical doc-
tors) tend to express themselves directly in terms of chances of an event happen-
ing (e.g. ‘diagnosis A is most likely given the symptoms, otherwise it is either
A or B’, or ‘there is an 80% chance this is a bacterial infection’). If the doctors
1.4 Beyond probability 19

were frequentists, provided with the same data, they would probably apply logis-
tic regression and come up with the same prediction on the conditional probability
P (disease|symptoms): unfortunately, doctors are not statisticians.
In addition, some sensors also provide as output a PDF on the same sample space:
think of two separate Kalman filters based one on color, the other on motion (optical
flow), providing a Gaussian predictive PDF on the location of a target in an image.

Jeffrey’s rule of conditioning Jeffrey’s rule of conditioning [] is a step forward
from certainty and Bayes’ rule, towards being able to cope with uncertain data,
in particular when the latter comes in the form of another probability distribution.
According to this rule, an initial probability P ‘stands corrected’ by a second probability
P′, defined only on a certain number of events.
Namely, suppose that P is defined on a σ-algebra A, and that there is a new
probability measure P′ on a sub-algebra B of A.
If we impose that the updated probability P″
1. meet the probability values specified by P′ for events in B, and
2. be such that, ∀ B ∈ B, X, Y ⊂ B, X, Y ∈ A:

P″(X)/P″(Y) = P(X)/P(Y) if P(Y) > 0,   P″(X)/P″(Y) = 0 if P(Y) = 0,

then the problem has a unique solution, given by:

P″(A) = Σ_{B∈B} P(A|B) P′(B).   (1.4)

Equation (1.4) is sometimes also called the law of total probability, and obviously
generalises Bayesian conditioning (obtained when P′(B) = 1 for some B).
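
A toy implementation (ours) of rule (1.4) on a finite space, where the new evidence P′ reassigns probability to the two cells of a partition; all numbers are invented.

```python
P = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}                       # initial distribution
partition = [frozenset({1, 2}), frozenset({3, 4})]
P_prime = {partition[0]: 0.3, partition[1]: 0.7}           # new probabilities of the B-events

def jeffrey_update(P, P_prime):
    P_new = {}
    for B, pB_new in P_prime.items():
        pB_old = sum(P[w] for w in B)
        for w in B:
            # P''(w) = P(w|B) P'(B), summed over the partition as in (1.4)
            P_new[w] = (P[w] / pB_old) * pB_new if pB_old > 0 else 0.0
    return P_new

print(jeffrey_update(P, P_prime))   # e.g. P''(3) = (0.2/0.3) * 0.7, about 0.467
```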

Beyond Jeffrey’s rule What if the new probability P′ is defined on the same σ-
algebra A? Jeffrey’s rule cannot be applied. As we discussed, this does happen when
multiple sensors provide predictive PDFs on the same sample space.
Belief functions deal with uncertain evidence by moving away from the concept
of conditioning (e.g., via Bayes’ rule), to that of combining pieces of evidence simul-
taneously supporting multiple propositions to various degrees. While conditioning
is an inherently asymmetric operation, in which the current state of the world and
the new evidence are represented by a probability distribution and a single event,
respectively, combination in belief function reasoning is completely symmetric, as
the current beliefs about the state of the world and the new evidence are both
represented by a belief function.
Belief functions naturally encode uncertain evidence of the kinds discussed
above (vague concepts, unreliable data, likelihoods) as well as they represent tra-
ditional ‘certain’ events. Vague, ‘fuzzy’ concepts are represented in the formalism
by consonant belief functions, in which supported events are nested – unreliable
measurements can be naturally portrayed as ‘discounted’ probabilities (see Section
??).
20 1 Introduction: Theories of Uncertainty

1.4.10 Knightian uncertainties

Second order uncertainty is real, as demonstrated by its effect on human behaviour,
especially when it comes to decision-making. A classical example of how Knightian
uncertainty empirically affects human decision making is provided by Ellsberg’s
paradox [].

Ellsberg’s paradox A decision problem can be formalized by defining:


– a set Ω of states of the world;
– a set X of consequences;
– a set F of acts, where an act is a function f : Ω → X .
Let ≽ be a preference relation on F, such that f ≽ g means that f is at least as
desirable as g. Given f, h ∈ F and E ⊆ Ω, let f Eh denote the act defined by

(f Eh)(ω) = f(ω) if ω ∈ E; h(ω) if ω ∉ E   (1.5)

Savage’s Sure Thing Principle [] states that ∀E, ∀f, g, h, h′:

f Eh ≽ gEh ⇒ f Eh′ ≽ gEh′.

Now, suppose you have an urn containing 30 red balls and 60 further balls, each either black
or yellow. Then, consider the following gambles:
– f1 : you receive 100 euros if you draw a red (R) ball;
– f2 : you receive 100 euros if you draw a black (B) ball;
– f3 : you receive 100 euros if you draw a red or a yellow (Y) ball;
– f4 : you receive 100 euros if you draw a black or a yellow ball.
In this example Ω = {R, B, Y }, fi : Ω → R and X = R (consequences are
measured in terms of monetary returns). The four acts correspond to the mappings
in the following table:
      R     B     Y
f1   100    0     0
f2    0    100    0
f3   100    0    100
f4    0    100   100
Empirically, it is observed that most people strictly prefer f1 to f2 , while strictly
preferring f4 to f3 . Now, pick E = {R, B}. By definition (1.5):

f1 {R, B}0 = f1 , f2 {R, B}0 = f2 , f1 {R, B}100 = f3 , f2 {R, B}100 = f4 .

Since f1 ≽ f2 , i.e., f1 {R, B}0 ≽ f2 {R, B}0, the Sure Thing Principle would im-
ply that f1 {R, B}100 ≽ f2 {R, B}100, i.e., f3 ≽ f4 .
In conclusion, the Sure Thing Principle is empirically violated: this is what consti-
tutes the so-called Ellsberg paradox.
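
The sketch below (ours) makes the ambiguity explicit by computing, for each gamble, the range of expected payoffs over all admissible urn compositions (b black balls, 0 ≤ b ≤ 60).

```python
payoff = {                     # rows of the table above: payoff per colour drawn
    "f1": {"R": 100, "B": 0, "Y": 0},
    "f2": {"R": 0, "B": 100, "Y": 0},
    "f3": {"R": 100, "B": 0, "Y": 100},
    "f4": {"R": 0, "B": 100, "Y": 100},
}

def expected(act, b):          # expected payoff when the urn contains b black balls
    p = {"R": 30 / 90, "B": b / 90, "Y": (60 - b) / 90}
    return sum(payoff[act][c] * p[c] for c in p)

for act in payoff:
    values = [expected(act, b) for b in range(61)]
    print(act, "lower:", round(min(values), 2), "upper:", round(max(values), 2))
# f1 and f4 have constant expectations (100/3 and 200/3); f2 and f3 only have lower
# bounds of 0 and 100/3 -- preferring the unambiguous gamble in each pair reproduces
# the observed pattern f1 over f2 and f4 over f3.
```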

Aversion to ‘uncertainty’ The argument has been widely studied in economics and
decision making9 , and has to do with people’s instinctive aversion to (second-order)
uncertainty. They favour f1 over f2 for the former ensures a guaranteed 1/3 chance
of winning, while the latter is associated with a (balanced) interval of chances be-
tween 0 and 2/3. Although the average probability of success is still 1/3, the lower
bound is 0 - people tend to find that unacceptable.
Investors, for instance, are known to favour certainty over uncertainty. This was
shown, for instance, in their reaction to ‘brexit’, the UK referendum on leaving the
European Union.

“In New York, a recent meeting of S&P Investment Advisory Services’ five-strong
investment committee decided to ignore the portfolio changes that its computer-
driven investment models were advising. Instead, members decided not to make any
big changes ahead of the vote.”10

Does certainty, in this context, mean a certain outcome of their bets? Certainly not.
It means being confident that their models can handle the observed patterns of vari-
ation.
Climatic change An emblematic application in which second-order uncertainty
is paramount is climatic change models. Admittedly, this constitutes an extremely
challenging decision making problem, where decision makers need to decide whether
to invest billions of dollars/euros/pounds on expensive engineering projects to mit-
igate the effects of climate change, knowing the outcomes of their decision will be
known only in twenty to thirty years time.
Rather surprisingly, the mainstream in climatic change is not about explicitly
modelling uncertainty at all: the onus is really on developing ever more complex
dynamical models of the environment and validating their predictions. This is all the
more surprising as it is well known that even deterministic (but nonlinear) models
tend to display chaotic behaviour, which induces uncertainty on predictions of their
future state whenever initial conditions are not known with certainty.
Climatic change, in particular, requires making predictions very far off in the future:
dynamical models are obviously much simplified versions of the world and, as such,
they become more and more inaccurate as time passes.
What are the challenges of modelling statistical uncertainty explicitly, in this
context? First of all, the lack of priors (ouch, Bayesians) for the climate space, whose
points are very long vectors whose components are linked by complex dependen-
cies. Data are also relatively scarce, especially as we go back in time: as we just saw,
scarcity is a source of Knightian uncertainty as it puts constraints on our ability to
estimate probability distributions.
Finally, hypothesis testing cannot really be used, either (too bad, frequentists): this
is clearly not a designed experiment where one can make sensible assumptions on
the underlying data-generating mechanism.
9 http://www.econ.ucla.edu/workingpapers/wp362.pdf
10 http://www.wsj.com/articles/global-investors-wake-up-to-brexit-threat-1466080015

1.5 Mathematics (plural) of uncertainty


Summarising, something is wrong with both mathematical probability and its most
common interpretations. Kolmogorov’s measure-theoretical probability theory is
not quite general enough, for additive probability measures cannot (properly) model
missing or propositional data, nor can they explicitly model second-order uncer-
tainty.
As for its frequentist interpretation, it is utterly incapable of modelling ‘pure’ data
(without ‘designing’ the experiment which generates it). In a way, it cannot even
properly model continuous data (due to the fact that, under measure theoretical
probability, every point of a continuous domain has zero probability), and has to
resort to the invention of ‘tail’ events to assess its own hypotheses. It can model
scarce data only asymptotically.
Bayesian reasoning also has many serious limitations: it just cannot model igno-
rance (absence of data); it cannot model pure data (without artificially introducing
a prior); it cannot model ‘uncertain’ data, i.e. information not in the form of propo-
sitions of the kind ‘A is true’; again, it models scarce data only asymptotically via
the Bernstein-von Mises theorem [?].

1.5.1 A variety of proposals

Similar reflections have led numerous scientists to recognise the need for a coherent
mathematical theory of uncertainty able to tackle all these aspects. Both alternatives
to and extensions of classical probability theory have been proposed, starting from
De Finetti’s pioneering work on subjective probability [478].
Formalisms include possibility-fuzzy set theory [1531, 412], probability intervals
[592], credal sets, monotone capacities [1390], random sets [984] and imprecise
probability theory [1371]. New original foundations of subjective probability in be-
havioral terms [1374] or by means of game theory [1176] have been brought for-
ward.
Also referred to as imprecise probabilities (as most of them comprise classical prob-
abilities as a special case) they form in fact, as we will see in more detail in Chapter
5, an entire hierarchy of encapsulated formalisms.

1.5.2 Belief functions and random sets

The theory of belief functions or ‘theory of evidence’ is one of the most popular
such formalisms for a mathematics of uncertainty.
The notion of belief function originally derives from a series of seminal works
[336, 344, 345] by Arthur Dempster on upper and lower probabilities induced by
multi-valued mappings. Given a probability distribution p on a given domain, and a
one-to-many map x ↦ Γ(x) to another domain, the original probability induces a
probability distribution on the power set of the bottom domain [336], i.e., a ‘random
set’ [925, 984]. The term ‘belief function’ was coined when Glenn Shafer [1149, ?]
adopted these mathematical objects to represent evidence in the framework of subjective
probability, and gave an axiomatic definition of them as non-additive (indeed,
super-additive) probability measures. In a rather controversial interpretation,
rejected by many (including Shafer), belief functions can also be seen as a special
case of credal set: as they determine a lower and an upper bound to the probability
of each event A, they are naturally associated with the convex set of probabilities
which ‘dominate’ them. The main (but not the only) problem with this interpreta-
tion is that it is not compatible with Shafer’s original proposal (‘Dempster’s rule’)
for the combination of BFs generated by different pieces of evidence.
The theory of belief functions is appealing because it addresses all the above
mentioned issues with the handling of uncertainty: it does not assume an infinite
amount of evidence to model imprecision, but uses all the available partial evidence;
it represents ignorance in a natural way, by means of the mass assigned to the whole
decision space or ‘frame’, and deals with the problem of having to represent uncer-
tainty on different but compatible domains; it copes with missing data in the most
natural of ways.
Furthermore, as a straightforward generalization of probability theory, its rationale
is rather neat and does not require us to entirely abandon the notion of event (like Wal-
ley’s imprecise probability theory [1371]). It contains as special cases both fuzzy
set theory and possibility theory.

1.5.3 Belief, evidence and probability

Shafer called his 1976 proposal ‘A mathematical theory of evidence’ [1149], while
the mathematical objects it deals with are called ‘belief functions’. Where do these
names come from, and what interpretation of probability (in its wider acception) do
they entail?
Indeed, belief theory is a theory of epistemic probability: it is about probabilities
as a mathematical representation of knowledge (never mind a human’s knowledge,
or a machine’s). Belief is often defined as the state of mind in which a person thinks
something to be the case, with or without there being empirical evidence in support.
Knowledge is rather more controversial a notion, for it is regarded by some as the
part of belief that is true, while others consider it as that part of belief which is justi-
fied to be true. Epistemology is the branch of philosophy concerned with the theory
of knowledge. Epistemic probability is the study of probability as a representation
of knowledge.
The theory of evidence is also, as the name itself suggests, a theory of eviden-
tial probability: one in which the probabilities representing knowledge are induced
(‘elicited’) by the available evidence. In probabilistic logic [], statements such as
‘hypothesis H is probably true’ are interpreted to mean that the empirical evidence
E supports hypothesis H to a high degree – this degree of support is called the epis-
temic probability of H given E.
As a matter of fact, Pearl [] and others have supported a view of belief functions
as probabilities on the logical causes of a certain proposition (the so-called probability
of provability interpretation), closely related to modal logic []. To be fair, this
connection to evidence has often been overlooked in much of the subsequent work.
In conclusion, the rationale for belief function theory can be summarised as
follows: there exists evidence in the form of probabilities, which supports degrees
of belief on the matter at hand. The space where the (probabilistic) evidence lives
is different from the hypothesis space (where belief measures are defined). The two
spaces are linked by a one-to-many map, yielding a mathematical object known as
random set [].

1.6 Structure of this Book


The Book is articulated into three Parts.
Part I, entitled ‘Theories of uncertainty’, is a rather extensive recapitulation of
the current state of the art in the mathematics of uncertainty, with a focus on belief
theory.
Chapter 2 provides a succinct summary of the basic definitions provided by Shafer.
Chapter 3 digs deeper into understanding what belief functions are, by describing
their multiple semantics, discussing the genesis of the approach and the subsequent
debate, and illustrating the main frameworks proposed by a number of authors which
use belief theory as a basis, while developing it further in sometimes original ways.
Chapter 4 can be thought of as a reference manual for the working scientist. It illus-
trates in detail all the elements of the evidential reasoning chain, delving into all the
aspects including inference, conditioning and combination, efficient computation,
decision making, continuous formulations, what tools are available for estimation
or classification, and finally some interesting advances in the mathematics of belief
functions.
Chapter 5 is designed to give the reader a bigger picture of the whole field of
uncertainty theory, by reviewing all the main formalisms (the most significant of
which are arguably Walley’s imprecise probability, the theory of capacities and
fuzzy/possibility theory), with special attention to their relationship with belief and
random set theory.
Part II (‘The geometry of uncertainty’) is probably the core of the Book, as it
introduces the Author’s geometric approach to uncertainty theory, starting with the
geometry of belief functions.
Chapter 6 studies the geometry of the space of belief functions, or belief space, both
in terms of a simplex (a higher-dimensional triangle) and of its recursive bundle
structure.
Chapter 7 extends the analysis to Dempster’s rule of combination, introducing the
notion of conditional subspace and outlining a simple geometric construction for
Dempster’s sum.
Chapter 8 delves into the combinatorial properties of plausibility and commonality
functions, as equivalent representations of the evidence carried by a belief function.
It shows that the corresponding spaces also behave like simplices, which are con-
gruent to the belief space.

The remaining Chapter 9 starts extending the applicability of the geometric ap-
proach to other uncertainty measures, focussing in particular on possibility measures
(consonant belief functions), and the related notion of consistent belief function.
Part III is concerned with the interplay of uncertainty measures of different
kinds, and the geometry of their relationship.
Chapters 10 and Chapter 11 study the problem of transforming a belief function into
a classical probability measure. In particular, Chapter 10 introduces the affine fam-
ily of probability transformations, which commute with affine combination
in the belief space.
Chapter 11 focusses instead on the epistemic family of transforms, relative belief and
relative plausibility, studies their dual properties with respect to Dempster’s sum,
and describes their geometry on both the probability simplex and the belief space.
Chapter 12 extends the analysis to the consonant approximation problem, the prob-
lem of finding the possibility measure which best approximates a given belief func-
tion. In particular, approximations induced by classical Lp norms are derived, and
compared with classical outer consonant approximations.
Chapter 13 concludes Part III by describing Lp consistent approximations in both
the mass and the belief space.
Part I

Theories of uncertainty
2 Shafer’s belief functions

The theory of evidence [1149] was introduced in the Seventies by Glenn Shafer
as a way of representing epistemic knowledge, starting from a sequence of semi-
nal works ([336], [344], [345]) by Arthur Dempster, Shafer’s advisor [349]. In this
formalism the best representation of chance is a belief function (b.f.) rather than a
classical probability distribution. Belief functions assign probability values to sets
of outcomes, rather than single events: their appeal rests on their ability to naturally
encode evidence in favor of propositions.
The theory embraces the familiar idea of assigning numbers between 0 and 1 to
measure degrees of support but, rather than focusing on how these numbers are de-
termined, it concerns itself with the mechanisms driving the combination of degrees
of belief.
The formalism provides indeed a simple method for merging the evidence carried
by a number of distinct sources (called Dempster’s rule [580]), with no need for
any prior distributions [1423]. In this sense, according to Shafer, it can be seen
as a theory of probable reasoning. The existence of different levels of granularity
in knowledge representation is formalized via the concept of family of compatible
frames.
As we recall in this Chapter, the Bayesian framework (see Chapter 1, Section
1.3.5) is actually contained in the theory of evidence as a special case, since:
1. Bayesian functions form a special class of belief functions, and
2. Bayes’ rule is a special case of Dempster’s rule of combination.
In the following we will neglect most of the emphasis Shafer put on the notion of
‘weight of evidence’, which in our view is not strictly necessary to the comprehen-
sion of what follows.

2.1 Belief functions as set functions


Following Shafer [1149], we call the finite set of possibilities/outcomes the frame1 of
discernment (FOD).

2.1.1 Basic probability assignment

Definition 2. A basic probability assignment (b.p.a.) [42] over a FOD Θ is a set
function [424, 357, 424] m : 2Θ → [0, 1] defined on the collection 2Θ of all subsets
of Θ such that:

m(∅) = 0,   Σ_{A⊂Θ} m(A) = 1.

The quantity m(A) is called the basic probability number or ‘mass’ [797, 796] as-
signed to A, and measures the belief committed exactly to A ∈ 2Θ . The elements of
the power set 2Θ associated with non-zero values of m are called the focal elements
of m and their union is called its core:
Cm ≐ ∪_{A⊆Θ: m(A)≠0} A.   (2.1)

Now suppose that empirical evidence is available so that a basic probability assign-
ment can be introduced over a specific FOD Θ.
Definition 3. The belief function associated with a basic probability assignment
m : 2Θ → [0, 1] is the set function b : 2Θ → [0, 1] defined as:
b(A) = Σ_{B⊆A} m(B).   (2.2)

1
For a note about the intuitionistic origin of this denomination see Rosenthal, Quantales
and their applications [1095].

The domain Θ on which a belief function is defined is usually interpreted as the set
of possible answers to a given problem, exactly one of which is the correct one. For
each subset (‘event’) A ⊂ Θ the quantity b(A) takes on the meaning of degree of
belief that the truth lies in A, and represents the total belief committed to a set of
possible outcomes A by the available evidence m.
Example: the Ming vase. A simple example (from [1149]) can clarify the notion
of degree of belief. We are looking at a vase that is represented as a product of
the Ming dynasty, and we are wondering whether the vase is genuine. If we call
θ1 the possibility that the vase is original, and θ2 the possibility that it is indeed
counterfeited, then
Θ = {θ1 , θ2 }
is the set of possible outcomes, and

{∅, Θ, {θ1 }, {θ2 }}
is the (power) set of all its subsets. A belief function b over Θ will represent the
degree of belief that the vase is genuine as b({θ1 }), and the degree of belief the
vase is a fake as b({θ2 }) (note we refer to the subsets {θ1 } and {θ2 }). Axiom 3
of Definition 25 poses a simple constraint over these degrees of belief, namely:
b({θ1 }) + b({θ2 }) ≤ 1. The belief value of the whole outcome space Θ, therefore,
represents evidence that cannot be committed to any of the two precise answers θ1
and θ2 and is therefore an indication of the level of uncertainty about the problem.
As the Ming vase example illustrates, belief functions readily lend themselves
to the representation of ignorance, in the form of the mass assigned to the whole set
of outcomes (FOD). Indeed, the simplest belief function assigns all the basic prob-
ability to the whole frame Θ and is called vacuous belief function.
Bayesian theory, in comparison, has trouble with the whole idea of encoding igno-
rance, for it cannot distinguish between ‘lack of belief’ in a certain event A (1−b(A)
in our notation) and ‘disbelief’ (the belief in the negated event Ā = Θ \ A). This is
due to the additivity constraint: P (A) + P (Ā) = 1.
The Bayesian way of representing the complete absence of evidence is to assign an
equal degree of belief to every outcome in Θ. As we will see in this Chapter, Section
2.7.2, this generates incompatible results when considering different descriptions of
the same problem at different levels of granularity.

Moebius inversion formula Given a belief function b there exists a unique basic
probability assignment which induces it. The latter can be recovered by means of
the Moebius inversion formula2 :
m(A) = Σ_{B⊆A} (−1)^{|A\B|} b(B).   (2.3)

Expression (2.3) establishes a 1-1 correspondence between the two set functions m
and b [537].
2
See [1297] for an explanation in terms of the theory of monotone functions over partially
ordered sets.

2.1.2 Plausibility functions or upper probabilities

Other expressions of the evidence generating a given belief function b are what can
be called the degree of doubt d(A) ≐ b(Ā) on an event A and, more importantly,
the upper probability of A:

pl(A) ≐ 1 − d(A) = 1 − b(Ā),   (2.4)

as opposed to the lower probability of A, i.e., its belief value b(A). The quantity
pl(A) expresses the ‘plausibility’ of a proposition A or, in other words, the amount
of evidence not against A [260]. Once again the plausibility function pl : 2Θ →
[0, 1] conveys the same information as b, and can be expressed as
pl(A) = Σ_{B∩A≠∅} m(B) ≥ b(A).

Example As an example, suppose a belief function on a frame Θ = {θ1 , θ2 , θ3 } of
cardinality three has two focal elements B1 = {θ1 , θ2 } and B2 = {θ1 } as in Figure
2.1, with b.p.a. m(B1 ) = 1/3, m(B2 ) = 2/3.
Then, for instance, the belief value of A = {θ1 , θ3 } is:
b(A) = Σ_{B⊆{θ1 ,θ3 }} m(B) = m({θ1 }) = 2/3,   (2.5)

while b({θ2 }) = m({θ2 }) = 0 and b({θ1 , θ2 }) = m({θ1 }) + m({θ1 , θ2 }) = 2/3 + 1/3 = 1
(so that the ‘core’ of the considered belief function is C = {θ1 , θ2 }).

Fig. 2.1. An example of (consonant, see Section 2.8) belief function on a frame of discern-
ment Θ = {θ1 , θ2 , θ3 } of cardinality 3, with focal elements B2 = {θ1 } ⊂ B1 = {θ1 , θ2 }.

To appreciate the difference between belief (lower probability) and plausibility
(upper probability) values, let us focus in particular on the event A′ = {θ1 , θ3 }. Its
belief value (2.5) represents the amount of evidence which surely supports {θ1 , θ3 },
and is guaranteed to involve only elements of A′.
On the other side, its plausibility value:
pl({θ1 , θ3 }) = 1 − b({θ1 , θ3 }c ) = Σ_{B∩{θ1 ,θ3 }≠∅} m(B) = m({θ1 }) + m({θ1 , θ2 }) = 1

accounts for the mass that might be assigned to some element of A′, and measures
the evidence not surely against it.
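
The following short Python sketch (ours) reproduces the numbers of this example and checks the Moebius inversion formula (2.3) on the same belief function.

```python
from itertools import combinations

Theta = frozenset({'t1', 't2', 't3'})
m = {frozenset({'t1'}): 2/3, frozenset({'t1', 't2'}): 1/3}    # b.p.a. of the example

def subsets(S):
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def bel(A):   # equation (2.2)
    return sum(v for B, v in m.items() if B <= A)

def pl(A):    # equation (2.4)
    return 1 - bel(Theta - A)

A = frozenset({'t1', 't3'})
print(bel(A), pl(A))          # 2/3 and 1, as computed in the text

# Moebius inversion (2.3): recover the masses from the belief values
m_rec = {X: sum((-1) ** len(X - B) * bel(B) for B in subsets(X)) for X in subsets(Theta)}
print({X: v for X, v in m_rec.items() if abs(v) > 1e-9})      # the original two masses
```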

2.1.3 Bayesian theory as a limit case

Confirming what was said when discussing the superadditivity axiom (3.6), in the theory
of evidence a (finite) probability function is simply a belief function satisfying the
additivity rule for disjoint sets.
Definition 4. A Bayesian belief function b : 2Θ → [0, 1] meets the additivity condi-
tion:
b(A) + b(Ā) = 1
whenever A ⊆ Θ.
Obviously, as it meets the axioms of Definition 25, a Bayesian belief function is
indeed a belief function. It can be proved that [1149]:
Proposition 1. A belief function b : 2Θ → [0, 1] is Bayesian if and only if ∃ p : Θ → [0, 1]
such that Σ_{θ∈Θ} p(θ) = 1 and:

b(A) = Σ_{θ∈A} p(θ)   ∀A ⊆ Θ.

2.2 Dempster’s rule of combination

Belief functions representing distinct bodies of evidence can be combined by means
of Dempster’s rule of combination [423], also called orthogonal sum.

2.2.1 Definition

Definition 5. The orthogonal sum b1 ⊕ b2 : 2Θ → [0, 1] of two belief functions
b1 : 2Θ → [0, 1], b2 : 2Θ → [0, 1] defined on the same FOD Θ is the unique
belief function on Θ whose focal elements are all the possible intersections of focal
elements of b1 and b2 , and whose basic probability assignment is given by:

mb1 ⊕b2 (A) = ( Σ_{i,j: Ai ∩Bj =A} m1 (Ai ) m2 (Bj ) ) / ( 1 − Σ_{i,j: Ai ∩Bj =∅} m1 (Ai ) m2 (Bj ) ),   (2.6)

where mi denotes the b.p.a. of the input belief function bi .



Figure 2.2 pictorially expresses Dempster’s algorithm for computing the basic
probability assignment of the combination b1 ⊕ b2 of two belief functions. Let a unit
square represent the total, unitary probability mass one can assign to subsets of Θ,
and associate horizontal and vertical strips with the focal elements A1 , ..., Ak and
B1 , ..., Bl of b1 and b2 , respectively. If their width is equal to their mass value, then
their area is also equal to their own mass m(Ai ), m(Bj ). The area of the intersection
of the strips related to any two focal elements Ai and Bj is then equal to the product
m(Ai ) · m(Bj ), and is committed to the intersection event Ai ∩ Bj . As more than
one such rectangle can end up being assigned to the same subset A (as different
pairs of focal elements can have the same intersection) we need to sum up all these
contributions, obtaining:
    m_{b1⊕b2}(A) ∝ Σ_{i,j: Ai∩Bj=A} m1(Ai) m2(Bj).

Finally, as some of these intersections may be empty, we need to discard the quantity
    Σ_{i,j: Ai∩Bj=∅} m1(Ai) m2(Bj)

by normalizing the resulting basic probability assignment, obtaining (2.6).


Note that, by Definition 5, not all pairs of belief functions admit an orthogonal
sum – two belief functions are combinable if and only if their cores (2.1) are not
disjoint: C1 ∩ C2 ≠ ∅ or, equivalently, iff there exist a f.e. of b1 and a f.e. of b2 whose
intersection is non-empty.

Fig. 2.2. Graphical representation of Dempster’s rule of combination: the sides of the square
are divided into strips associated with the focal elements Ai and Bj of the belief functions
b1 , b2 to combine.

Proposition 2. [1149] If b1 , b2 : 2Θ → [0, 1] are two belief functions defined on
the same frame Θ, then the following conditions are equivalent:

– their Dempster’s combination b1 ⊕ b2 does not exist;
– their cores (2.1) are disjoint, Cb1 ∩ Cb2 = ∅;
– ∃A ⊂ Θ s.t. b1 (A) = b2 (Ā) = 1.

Fig. 2.3. Example of Dempster’s sum. The belief functions b1 with focal elements A1 , A2
and b2 with f.e.s B1 , B2 (left) are combinable via Dempster’s rule. This yields a new belief
function b1 ⊕ b2 (right) with focal elements X1 and X2 .

Example of Dempster’s combination Consider a frame of discernment Θ =


{θ1 , θ2 , θ3 , θ4 , θ5 }. We can define there a belief function b1 with basic probability
assignment:
m1 ({θ2 }) = 0.7, m1 ({θ2 , θ4 }) = 0.3.
Such a b.f. has then two focal elements A1 = {θ2 } and A2 = {θ2 , θ4 }. As an
example, its belief values on the events {θ4 }, {θ2 , θ5 }, {θ2 , θ3 , θ4 } are respec-
tively b1 ({θ4 }) = m1 ({θ4 }) = 0, b1 ({θ2 , θ5 }) = m1 ({θ2 }) + m1 ({θ5 }) +
m1 ({θ2 , θ5 }) = 0.7+0+0 = 0.7 and b1 ({θ2 , θ3 , θ4 }) = m1 ({θ2 })+m1 ({θ2 , θ4 }) =
0.7 + 0.3 = 1 (so that the core of b1 is {θ2 , θ4 }).
Now, let us introduce another belief function b2 on the same FOD, with b.p.a.:
m2 (B1 ) = m2 ({θ2 , θ3 }) = 0.6, m2 (B2 ) = m2 ({θ4 , θ5 }) = 0.4.
The pair of belief functions are combinable, as their cores C1 = {θ2 , θ4 } and
C2 = {θ2 , θ3 , θ4 , θ5 } are clearly not disjoint.
Dempster’s combination (2.6) yields a new belief function on the same FOD,
with focal elements (Figure 2.3-right) X1 = {θ2 } = A1 ∩ B1 = A2 ∩ B1 and
X2 = {θ4 } = A2 ∩ B2 and b.p.a.:

    m(X1) = [ m1({θ2})·m2({θ2 , θ3}) + m1({θ2 , θ4})·m2({θ2 , θ3}) ] / [ 1 − m1({θ2})·m2({θ4 , θ5}) ]
          = (0.7 · 0.6 + 0.3 · 0.6) / (1 − 0.7 · 0.4) = 5/6,

    m(X2) = m1({θ2 , θ4})·m2({θ4 , θ5}) / [ 1 − m1({θ2})·m2({θ4 , θ5}) ]
          = (0.3 · 0.4) / (1 − 0.7 · 0.4) = 1/6.
Note that the resulting b.f. b1 ⊕ b2 is Bayesian.
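A direct implementation of Definition 5 may help fix ideas. The sketch below is an illustration only (not part of Shafer's original presentation): it represents a b.p.a. as a dictionary mapping frozensets to masses, and reproduces the values m(X1) = 5/6 and m(X2) = 1/6 of the example.

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    """Orthogonal sum of two b.p.a.s, Equation (2.6); fails if the cores are disjoint."""
    raw = defaultdict(float)
    conflict = 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            X = A & B
            if X:
                raw[X] += mA * mB
            else:
                conflict += mA * mB   # mass committed to the empty intersection
    if conflict >= 1.0:
        raise ValueError("belief functions are not combinable (disjoint cores)")
    return {X: v / (1.0 - conflict) for X, v in raw.items()}

m1 = {frozenset({'th2'}): 0.7, frozenset({'th2', 'th4'}): 0.3}
m2 = {frozenset({'th2', 'th3'}): 0.6, frozenset({'th4', 'th5'}): 0.4}

print(dempster_combine(m1, m2))
# {frozenset({'th2'}): 0.8333..., frozenset({'th4'}): 0.1666...}, i.e. 5/6 and 1/6
```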

2.2.2 Weight of conflict

The normalization constant in (2.6) measures the level of conflict between the two
input belief functions, for it represents the amount of evidence they attribute to con-
tradictory (i.e., disjoint) subsets.
Definition 6. We call weight of conflict K(b1 , b2 ) between two belief functions b1
and b2 the logarithm of the normalisation constant in their Dempster’s combination:
    K = log [ 1 / ( 1 − Σ_{i,j: Ai∩Bj=∅} m1(Ai) m2(Bj) ) ].

Dempster’s rule can be trivially generalised to the combination of n belief functions.
It is interesting to note that, in that case, weights of conflict combine additively.

Proposition 3. Suppose b1 , ..., bn+1 are belief functions defined on the same frame
Θ, and assume that b1 ⊕ · · · ⊕ bn+1 exists. Then:

    K(b1 , ..., bn+1 ) = K(b1 , ..., bn ) + K(b1 ⊕ · · · ⊕ bn , bn+1 ).
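As a hedged illustration (reusing the frozenset-dictionary representation of the previous sketch, with the b.p.a.s of the running example), the weight of conflict can be computed directly from the masses committed to disjoint pairs of focal elements:

```python
import math

def conflict_mass(m1, m2):
    """Total mass the two b.p.a.s jointly commit to contradictory (disjoint) focal elements."""
    return sum(a * b for A, a in m1.items() for B, b in m2.items() if not (A & B))

def weight_of_conflict(m1, m2):
    """K(b1, b2) = log [ 1 / (1 - conflict mass) ], as in Definition 6."""
    return math.log(1.0 / (1.0 - conflict_mass(m1, m2)))

m1 = {frozenset({'th2'}): 0.7, frozenset({'th2', 'th4'}): 0.3}
m2 = {frozenset({'th2', 'th3'}): 0.6, frozenset({'th4', 'th5'}): 0.4}

print(conflict_mass(m1, m2))       # 0.28, the mass discarded by normalisation in (2.6)
print(weight_of_conflict(m1, m2))  # log(1/0.72) ≈ 0.3285
```

By Proposition 3, chaining this computation along successive pairwise combinations yields the overall weight of conflict of an n-fold combination.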

2.2.3 Conditioning belief functions

Dempster’s rule describes the way the assimilation of new evidence b′ changes our
beliefs previously encoded by a belief function b, determining new belief values
given by b ⊕ b′(A) for all events A. In this formalism, a new body of evidence is not
constrained to be in the form of a single proposition A known with certainty, as is
the case in Bayesian theory.
Yet, the incorporation of new certainties is permitted as a special case. In fact, this
special kind of evidence is represented by belief functions of the form:

    b′(A) = 1 if B ⊂ A,   b′(A) = 0 otherwise,

where B is the proposition known with certainty. Such a belief function is combinable
with the original b.f. b as long as b(B̄) < 1, and the result has the form:

    b(A|B) ≐ b ⊕ b′(A) = [ b(A ∪ B̄) − b(B̄) ] / [ 1 − b(B̄) ],

or, expressing the result in terms of upper probabilities/plausibilities (2.4):

    pl(A|B) = pl(A ∩ B) / pl(B).    (2.7)
Expression (2.7) strongly reminds us of Bayes’s rule of conditioning (1.1) – Shafer
calls it Dempster’s rule of conditioning.
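The following illustrative sketch (the frozenset-dictionary representation and the numbers are assumptions of the example, not taken from the text) implements conditioning as combination with a ‘categorical’ b.p.a. assigning all the mass to B, and checks the plausibility form (2.7):

```python
def combine(m1, m2):
    """Dempster's orthogonal sum (2.6) of two b.p.a.s (frozenset -> mass dictionaries)."""
    k = sum(a * b for A, a in m1.items() for B, b in m2.items() if not (A & B))
    out = {}
    for A, a in m1.items():
        for B, b in m2.items():
            X = A & B
            if X:
                out[X] = out.get(X, 0.0) + a * b / (1.0 - k)
    return out

def plausibility(m, A):
    """pl(A): total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)

def condition(m, B):
    """Dempster conditioning: combination with the b.p.a. assigning mass 1 to B."""
    return combine(m, {frozenset(B): 1.0})

# an illustrative b.p.a. on Θ = {th1, th2, th3}
m = {frozenset({'th1'}): 0.4,
     frozenset({'th1', 'th2'}): 0.3,
     frozenset({'th2', 'th3'}): 0.3}
B = frozenset({'th2', 'th3'})
A = frozenset({'th3'})

m_cond = condition(m, B)
print(plausibility(m_cond, A))                      # 0.5
print(plausibility(m, A & B) / plausibility(m, B))  # 0.5, confirming (2.7)
```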

2.2.4 Combination vs conditioning

Dempster’s rule (2.6) is clearly symmetric in the role assigned to the two pieces
of evidence b and b′ (due to the commutativity of set-theoretical intersection). In
Bayesian theory, instead, we are constrained to represent new evidence as a true
proposition, and condition a Bayesian prior probability on that proposition. There is
no obvious symmetry but, even more importantly, we are forced to assume that the
consequence of any new piece of evidence is to support a single proposition with
certainty!

2.3 Simple and separable support functions


In the theory of evidence a body of evidence (a belief function) usually supports
more than one proposition (subset) of a frame of discernment. The simplest situ-
ation, however, is that in which the evidence points to a single non-empty subset
A ⊂ Θ.
Assume 0 ≤ σ ≤ 1 is the degree of support for A. Then, the degree of support for a
generic subset B ⊂ Θ of the frame is given by:

    b(B) =  0   if B ⊅ A,
            σ   if B ⊃ A, B ≠ Θ,            (2.8)
            1   if B = Θ.

Definition 7. The belief function b : 2Θ → [0, 1] defined by Equation (2.8) is


called a simple support function focused on A. Its basic probability assignment
is: m(A) = σ, m(Θ) = 1 − σ and m(B) = 0 for every other B.

2.3.1 Heterogeneous and conflicting evidence

We often need to combine evidence pointing towards different subsets, A and B, of


our frame of discernment. When A ∩ B 6= ∅ these two propositions are compatible,
and we say that the associated belief functions represent heterogeneous evidence.
In this case, if σ1 and σ2 are the masses committed respectively to A and B by two
simple support functions b1 and b2 , we have that their Dempster’s combination has
b.p.a.:

m(A∩B) = σ1 σ2 , m(A) = σ1 (1−σ2 ), m(B) = σ2 (1−σ1 ), m(Θ) = (1−σ1 )(1−σ2 ).

Therefore, the belief values of b = b1 ⊕ b2 are as follows:

    b(X) = b1 ⊕ b2 (X) =
        0                        if X ⊅ A ∩ B,
        σ1 σ2                    if X ⊃ A ∩ B, X ⊅ A, X ⊅ B,
        σ1                       if X ⊃ A, X ⊅ B,
        σ2                       if X ⊃ B, X ⊅ A,
        1 − (1 − σ1 )(1 − σ2 )   if X ⊃ A, B, X ≠ Θ,
        1                        if X = Θ.

As our intuition would suggest, the combined evidence supports A ∩ B with degree
σ1 σ2 .
When the two propositions have empty intersection, A ∩ B = ∅, we say instead that
the evidence is conflicting. In this situation each body of evidence erodes the effect
of the other.
The following example is also taken from [1149].

Example: the alibi A criminal defendant has an alibi: a close friend swears that
the defendant was visiting his house at the time of the crime. This friend has a good
reputation: suppose this commits a degree of support of 1/10 to the innocence of the
defendant (I). On the other hand, there is a strong body of circumstantial evidence providing
a degree of support of 9/10 for his guilt (G).
To formalize this case we can build a frame of discernment Θ = {G, I}, so
that the defendant’s friend provides a simple support function focused on {I} with
bI ({I}) = 1/10, while the hard piece of evidence corresponds to another simple
support function bG focused on {G} with bG ({G}) = 9/10.
Their orthogonal sum b = bI ⊕ bG yields then:

b({I}) = 1/91, b({G}) = 81/91.

The effect of the testimony has mildly eroded the force of the circumstantial evi-
dence.
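As a quick numerical check (a hedged sketch reusing the frozenset representation of the earlier examples, here with exact rational arithmetic), the orthogonal sum of the two testimonies can be computed directly:

```python
from fractions import Fraction

def combine(m1, m2):
    """Dempster's orthogonal sum (2.6) of two b.p.a.s."""
    k = sum(a * b for A, a in m1.items() for B, b in m2.items() if not (A & B))
    out = {}
    for A, a in m1.items():
        for B, b in m2.items():
            X = A & B
            if X:
                out[X] = out.get(X, 0) + a * b / (1 - k)
    return out

I, G = frozenset({'I'}), frozenset({'G'})
theta = I | G

# simple support functions: the friend's testimony vs the circumstantial evidence
m_I = {I: Fraction(1, 10), theta: Fraction(9, 10)}
m_G = {G: Fraction(9, 10), theta: Fraction(1, 10)}

print(combine(m_I, m_G))
# masses 1/91, 81/91 and 9/91 for {I}, {G} and Θ respectively,
# so that b({I}) = 1/91 and b({G}) = 81/91 as stated above
```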

2.3.2 Separable support functions and decomposition

In general, belief functions can support more than one proposition at a time.
The next simplest class of b.f.s is that of ‘separable support functions’.

Definition 8. A separable support function b is a belief function that is either simple


or equal to the orthogonal sum of two or more simple support functions, namely:

b = b1 ⊕ · · · ⊕ bn ,

where n ≥ 1 and bi is simple ∀ i = 1, ..., n.



A separable support function b can be decomposed into simple support functions in
different ways. More precisely, given one such decomposition b = b1 ⊕ · · · ⊕ bn
with foci A1 , ..., An and denoting by C the core of b, each of the following:
– b = b1 ⊕ · · · ⊕ bn ⊕ bn+1 , whenever bn+1 is the vacuous belief function on the
same frame;
– b = (b1 ⊕ b2 ) ⊕ · · · ⊕ bn , whenever A1 = A2 ;
– b = b′1 ⊕ · · · ⊕ b′n , whenever b′i is the simple support function focused on A′i ≐
Ai ∩ C such that b′i (A′i ) = bi (Ai ), if Ai ∩ C ≠ ∅ for all i;
is a valid decomposition of b in terms of simple belief functions. On the other hand,

Proposition 4. If b is a non-vacuous, separable support function with core Cb then


there exists a unique collection b1 , ..., bn of non-vacuous simple support functions
which satisfy the following conditions:
1. n ≥ 1;
2. b = b1 if n = 1, and b = b1 ⊕ · · · ⊕ bn if n > 1;
3. Cbi ⊂ Cb ;
4. Cbi 6= Cbj if i 6= j.

This unique decomposition is called the canonical decomposition of b – we will


reconsider it later in the Book.
An intuitive idea of what a separable support function represents is provided by the
following result.

Proposition 5. If b is a separable belief function, and A and B are two of its focal
elements with A ∩ B 6= ∅, then A ∩ B is a focal element of b.

The set of f.e.s of a separable support function is closed under set-theoretical intersection.
Such a b.f. b is coherent in the sense that if it supports two propositions, then
it must support the proposition ‘naturally’ implied by them, i.e., their intersection.
Proposition 5 gives us a simple method to check whether a given belief function is
indeed a separable support function.

2.3.3 Internal conflict

Since a separable support function can support pairs of disjoint subsets, it flags the
existence of what we can call ‘internal’ conflict.
Definition 9. The weight of internal conflict Wb for a separable support function b
is defined as:
– 0 if b is a simple support function;
– inf K(b1 , ..., bn ) for the various possible decompositions of b into simple support
functions b = b1 ⊕ · · · ⊕ bn if b is not simple.
It is easy to see (see [1149] again) that Wb = K(b1 , ..., bn ) where b1 ⊕ · · · ⊕ bn is
the canonical decomposition of b.

2.4 Families of compatible frames of discernment


2.4.1 Refinings

One appealing idea in the theory of evidence is the simple, sensible claim that our
knowledge of any given problem is inherently imperfect and imprecise. As a con-
sequence, new evidence may allow us to make decisions on more detailed decision
spaces (represented by frames of discernment). All these frames need to be ‘com-
patible’ with each other, in a sense that we will make precise in the following.
One frame can certainly be assumed compatible with another if it can be obtained
by introducing new distinctions, i.e., by analyzing or splitting some of its possible
outcomes into finer ones. This idea is embodied by the notion of refining.
Definition 10. Given two frames of discernment Θ and Ω, a map ρ : 2Θ → 2Ω is
said to be a refining if it satisfies the following conditions:
1. ρ({θ}) 6= ∅ ∀θ ∈ Θ;
2. ρ({θ}) ∩ ρ({θ0 }) = ∅ if θ 6= θ0 ;
3. ∪θ∈Θ ρ({θ}) = Ω.
In other words, a refining maps the coarser frame Θ to a disjoint partition of the
finer one Ω (see Figure 2.4).

Fig. 2.4. A refining between two frames of discernment.

The finer frame is called a refinement of the first one, and we call Θ a coarsening
of Ω. Both frames represent sets of admissible answers to a given decision problem
(see Chapter ?? as well) – the finer one is nevertheless a more detailed description,
obtained by splitting each possible answer θ ∈ Θ in the original frame. The image
ρ(A) of a subset A of Θ consists of all the outcomes in Ω that are obtained by
splitting an element of A.
Proposition 6 lists some of the properties of refinings [1149].

Proposition 6. Suppose ρ : 2Θ → 2Ω is a refining. Then


– ρ is a one-to-one mapping;
– ρ(∅) = ∅;
– ρ(Θ) = Ω;
– ρ(A ∪ B) = ρ(A) ∪ ρ(B);
– ρ(Ā) = ρ(A)^c , the complement of ρ(A) in Ω;
– ρ(A ∩ B) = ρ(A) ∩ ρ(B);
– if A, B ⊂ Θ then ρ(A) ⊂ ρ(B) iff A ⊂ B;
– if A, B ⊂ Θ then ρ(A) ∩ ρ(B) = ∅ iff A ∩ B = ∅.
A refining ρ : 2Θ → 2Ω is not, in general, onto; in other words, there are subsets
B ⊂ Ω that are not images of subsets A of Θ. Nevertheless, we can define two
different ways of associating each subset of the more refined frame Ω with a subset
of the coarser one Θ.
Definition 11. The inner reduction associated with a refining ρ : 2Θ → 2Ω is the
map ρ̲ : 2Ω → 2Θ defined as:

    ρ̲(A) = { θ ∈ Θ : ρ({θ}) ⊆ A }.    (2.9)

The outer reduction associated with ρ is the map ρ̄ : 2Ω → 2Θ given by:

    ρ̄(A) = { θ ∈ Θ : ρ({θ}) ∩ A ≠ ∅ }.    (2.10)

Roughly speaking, ρ̲(A) is the largest subset of Θ that implies A ⊂ Ω, while ρ̄(A)
is the smallest subset of Θ that is implied by A. As a matter of fact:

Proposition 7. Suppose ρ : 2Θ → 2Ω is a refining, A ⊂ Ω and B ⊂ Θ. Let ρ̄
and ρ̲ be the related outer and inner reductions. Then ρ(B) ⊂ A iff B ⊂ ρ̲(A), and
A ⊂ ρ(B) iff ρ̄(A) ⊂ B.
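These maps are easy to realise in code. The sketch below is an illustration only, under the assumption that a refining is stored as a dictionary from elements of the coarse frame to disjoint subsets of the fine one; the weather frames and names are invented for the example.

```python
# a refining from Θ = {rain, no_rain} to Ω = {drizzle, downpour, dry}
rho = {
    'rain':    frozenset({'drizzle', 'downpour'}),
    'no_rain': frozenset({'dry'}),
}

def refine(rho, A):
    """ρ(A): union of the images of the elements of A ⊆ Θ."""
    return frozenset().union(*(rho[t] for t in A)) if A else frozenset()

def inner_reduction(rho, B):
    """ρ̲(B): largest subset of Θ whose image is contained in B ⊆ Ω."""
    return frozenset(t for t, img in rho.items() if img <= B)

def outer_reduction(rho, B):
    """ρ̄(B): smallest subset of Θ whose image intersects (hence contains part of) B ⊆ Ω."""
    return frozenset(t for t, img in rho.items() if img & B)

B = frozenset({'drizzle', 'dry'})
print(inner_reduction(rho, B))   # {'no_rain'}: only its image {dry} lies inside B
print(outer_reduction(rho, B))   # {'rain', 'no_rain'}: both images meet B
print(refine(rho, {'rain'}))     # {'drizzle', 'downpour'}
```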

2.4.2 Families of frames

The existence of distinct admissible descriptions at different levels of granularity of


the same phenomenon is encoded in the theory of evidence by the concept of family of
compatible frames (see [1149], pages 121-125), whose building block is the notion
of refining (Definition 10).

Definition 12. A non-empty collection of finite non-empty sets F is a family of


compatible frames of discernment with refinings R, where R is a non-empty col-
lection of refinings between pairs of frames in F, if F and R satisfy the following
requirements:
1. composition of refinings: if ρ1 : 2Θ1 → 2Θ2 and ρ2 : 2Θ2 → 2Θ3 are in R,
then ρ2 ◦ ρ1 : 2Θ1 → 2Θ3 is in R;

2. identity of coarsenings: if ρ1 : 2Θ1 → 2Ω , ρ2 : 2Θ2 → 2Ω are in R and


∀θ1 ∈ Θ1 ∃θ2 ∈ Θ2 such that ρ1 ({θ1 }) = ρ2 ({θ2 }), then Θ1 = Θ2 and
ρ1 = ρ2 ;
3. identity of refinings: if ρ1 : 2Θ → 2Ω and ρ2 : 2Θ → 2Ω are in R, then
ρ1 = ρ2 ;
4. existence of coarsenings: if Ω ∈ F and A1 , ..., An is a disjoint partition of Ω
then there is a coarsening in F corresponding to this partition;
5. existence of refinings: if θ ∈ Θ ∈ F and n ∈ N then there exists a refining
ρ : 2Θ → 2Ω in R and Ω ∈ F such that ρ({θ}) has n elements;
6. existence of common refinements: every pair of elements in F has a common
refinement in F.

Roughly speaking, two frames are compatible if and only if they concern propo-
sitions which can be both expressed in terms of propositions of a common, finer
frame.
By property (6) each collection of compatible frames has many common refine-
ments. One of these is particularly simple.
Theorem 1. If Θ1 , ..., Θn are elements of a family of compatible frames F, then
there exists a unique frame Θ ∈ F such that:
1. there exists a refining ρi : 2Θi → 2Θ for all i = 1, ..., n;
2. ∀θ ∈ Θ ∃ θi ∈ Θi for i = 1, ..., n such that

    {θ} = ρ1 ({θ1 }) ∩ ... ∩ ρn ({θn }).

This unique frame is called the minimal refinement Θ1 ⊗ · · · ⊗ Θn of the collec-


tion Θ1 , ..., Θn , and is the simplest space in which we can compare propositions
pertaining to different compatible frames. Furthermore:
Proposition 8. If Ω is a common refinement of Θ1 , ..., Θn , then Θ1 ⊗ · · · ⊗ Θn is
a coarsening of Ω. Furthermore, Θ1 ⊗ · · · ⊗ Θn is the only common refinement of
Θ1 , ..., Θn that is a coarsening of every other common refinement.

Example: number systems Figure 2.5 illustrates a simple example of compatible


frames. A real number r between 0 and 1 can be expressed, for instance, using ei-
ther binary or base-5 digits. Furthermore, even within a number system of choice
(for example the binary one), the real number can be represented with different de-
grees of approximation, using for instance one or two digits. Each of these quantized
versions of r is associated with an interval of [0, 1] (red rectangles) and can be ex-
pressed in a common frame (their common refinement, Definition 12, property (6)),
for example by selecting a 2-digit decimal approximation.
Refining maps between coarser and finer frames are easily interpreted, and are de-
picted in Figure 2.5.

Fig. 2.5. The different digital representations of the same real number r ∈ [0, 1] constitute a
simple example of family of compatible frames.

2.4.3 Consistent belief functions


If Θ1 and Θ2 are two compatible frames, then two belief functions b1 : 2Θ1 → [0, 1],
b2 : 2Θ2 → [0, 1] can potentially be expressions of the same body of evidence. This
is the case only if b1 and b2 agree on those propositions that are discerned by both
Θ1 and Θ2 , i.e., they represent the same subset of their minimal refinement.
Definition 13. Two belief functions b1 and b2 defined over two compatible frames
Θ1 and Θ2 are said to be consistent if
b1 (A1 ) = b2 (A2 )
whenever
A1 ⊂ Θ1 , A2 ⊂ Θ2 and ρ1 (A1 ) = ρ2 (A2 ), ρi : 2Θi → 2Θ1 ⊗Θ2 ,
where ρi is the refining between Θi and the minimal refinement Θ1 ⊗ Θ2 of Θ1 and
Θ2 .
A special case is that in which the two belief functions are defined on frames
connected by a refining ρ : 2Θ1 → 2Θ2 (i.e., Θ2 is a refinement of Θ1 ). In this case
b1 and b2 are consistent iff:
b1 (A) = b2 (ρ(A)), ∀A ⊆ Θ1 .
The b.f. b1 is called the restriction of b2 to Θ1 , and their mass values are in the
following relation:

    m1 (A) = Σ_{B: ρ̄(B)=A} m2 (B),    (2.11)

where A ⊂ Θ1 , B ⊂ Θ2 and ρ̄(B) ⊂ Θ1 is the outer reduction (2.10) of B.
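Under the same dictionary representation of a refining used in the earlier sketch (an assumption of this illustration, not of the text), the restriction of a belief function to a coarsening can be computed by projecting each focal element through the outer reduction, as in (2.11):

```python
def outer_reduction(rho, B):
    """ρ̄(B): elements of the coarse frame whose image intersects B."""
    return frozenset(t for t, img in rho.items() if img & B)

def restrict(rho, m2):
    """Restriction (2.11): m1(A) collects the mass of all B ⊆ Ω with ρ̄(B) = A."""
    m1 = {}
    for B, mass in m2.items():
        A = outer_reduction(rho, B)
        m1[A] = m1.get(A, 0.0) + mass
    return m1

# refining from Θ1 = {rain, no_rain} to Θ2 = {drizzle, downpour, dry}
rho = {'rain': frozenset({'drizzle', 'downpour'}), 'no_rain': frozenset({'dry'})}

# a b.p.a. on the finer frame Θ2
m2 = {frozenset({'drizzle'}): 0.5,
      frozenset({'downpour', 'dry'}): 0.3,
      frozenset({'drizzle', 'downpour', 'dry'}): 0.2}

print(restrict(rho, m2))
# {frozenset({'rain'}): 0.5, frozenset({'rain', 'no_rain'}): 0.5}
```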



2.4.4 Independent frames

Two compatible frames of discernment are independent if no proposition discerned


by one of them trivially implies a proposition discerned by the other. Obviously we
need to refer to a common frame: by Proposition 8 what common refinement we
choose is immaterial.
Definition 14. Let Θ1 , ..., Θn be compatible frames, and ρi : 2Θi → 2Θ1 ⊗···⊗Θn
the corresponding refinings to their minimal refinement. The frames Θ1 , ..., Θn are
said to be independent if

ρ1 (A1 ) ∩ · · · ∩ ρn (An ) 6= ∅ (2.12)

whenever ∅ ≠ Ai ⊂ Θi for i = 1, ..., n.
Equivalently, condition (2.12) can be expressed as follows:
– if Ai ⊂ Θi for i = 1, ..., n and ρ1 (A1 ) ∩ · · · ∩ ρn−1 (An−1 ) ⊂ ρn (An ) then
An = Θn or one of the first n − 1 subsets Ai is empty.
The notion of independence of frames is illustrated in Figure 2.6.
In particular, it is easy to see that if ∃j ∈ [1, .., n] s.t. Θj is a coarsening of
some other frame Θi , |Θj | > 1, then {Θ1 , ..., Θn } are not independent. Mathe-
matically, families of compatible frames are collections of Boolean subalgebras of
their common refinement [1197], as Equation (2.12) is nothing but the independence
condition for the associated Boolean sub-algebras 3 .
3
The following material comes from [1197].
Definition 15. A Boolean algebra is a non-empty set U provided with three internal oper-
ations
    ∩ : U × U −→ U,   ∪ : U × U −→ U,   ¬ : U −→ U,
    (A, B) ↦ A ∩ B,   (A, B) ↦ A ∪ B,   A ↦ ¬A,
called respectively meet, join and complement, characterized by the following properties:

A ∪ B = B ∪ A, A∩B =B∩A

A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C

(A ∩ B) ∪ B = B, (A ∪ B) ∩ B = B

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

(A ∩ ¬A) ∪ B = B, (A ∪ ¬A) ∩ B = B

As a special case, the collection (2S , ⊂) of all the subsets of a given set S is a Boolean
algebra.

Definition 16. U 0 is a subalgebra of a Boolean algebra U iff whenever A, B ∈ U 0 it


follows that A ∪ B, A ∩ B and ¬A are all in U 0 .
The ‘zero’ of a Boolean algebra U is defined as: 0 = ∩A∈U A.

Fig. 2.6. Independence of frames.

2.5 Support functions


Since Dempster’s rule of combination is applicable only to set functions satisfying
the axioms of belief functions (Definition 25), we are tempted to think that the class
of separable belief functions is sufficiently large to describe the impact of a body of
evidence on any frame of a family of compatible frames. This is, however, not the
case as not all belief functions are separable ones.
Let us consider a body of evidence inducing a separable b.f. b over a certain
frame Θ of a family F: the ‘impact’ of this evidence onto a coarsening Ω of Θ is
naturally described by the restriction b|2Ω of b (Equation 2.11) to Ω.

Definition 18. A belief function b : 2Θ → [0, 1] is a support function if there exists
a refinement Ω of Θ and a separable support function b′ : 2Ω → [0, 1] such that
b = b′|2Θ .

In other words, a support function [743] is the restriction of some separable


support function.
As can be expected, not all support functions are separable support functions. The
following Proposition gives us a simple equivalent condition.
Proposition 9. Suppose b is a belief function, and C its core. The following conditions
are equivalent:
– b is a support function;
– C has a positive basic probability number, m(C) > 0.

Definition 17. A collection {Ut }t∈T of subalgebras of a Boolean algebra U is said to be
independent if
    A1 ∩ · · · ∩ An ≠ 0    (2.13)
whenever 0 ≠ Aj ∈ Utj , tj ≠ tk for j ≠ k.

Compare expressions (2.13) and (2.12).
Since there exist belief functions whose core has mass zero, Proposition 9 tells us
that not all the belief functions are support ones (see Section 2.7).

2.5.1 Vacuous extension

There are occasions in which the impact of a body of evidence on a frame Θ is fully
discerned by one of its coarsenings Ω, i.e., no proposition discerned by Θ receives
greater support than what is implied by propositions discerned by Ω.
Definition 19. A belief function b : 2Θ → [0, 1] on Θ is the vacuous extension of a
second belief function b′ : 2Ω → [0, 1], where Ω is a coarsening of Θ, whenever:

    b(A) = max_{B⊂Ω, ρ(B)⊆A} b′(B)    ∀A ⊆ Θ.

We say that b is ‘carried’ by the coarsening Ω. We will make use of this all important
notion in our treatment of two computer vision problems in Part III, Chapter ?? and
Chapter ??.

2.6 Impact of the evidence

2.6.1 Families of compatible support functions

In his 1976 essay [1149] Glenn Shafer distinguishes between a ‘subjective’ and an
‘evidential’ vocabulary, keeping distinct objects with the same mathematical de-
scription but different philosophical interpretations.
Each body of evidence E supporting a belief function b (see [1149]) simultane-
ously affects the whole family F of compatible frames of discernment the domain
of b belongs to, determining a support function over every element of F. We say
that E determines a family of compatible support functions {s_E^Θ}_{Θ∈F}.
The complexity of this family depends on the following property.

Definition 20. The evidence E affects F sharply if there exists a frame Ω ∈ F that
carries s_E^Θ for every Θ ∈ F that is a refinement of Ω. Such a frame Ω is said to
exhaust the impact of E on F.

Whenever Ω exhausts the impact of E on F, s_E^Ω determines the whole family
{s_E^Θ}_{Θ∈F}, for any support function over any given frame Θ ∈ F is the restriction
to Θ of the vacuous extension (Definition 19) of s_E^Ω to Θ ⊗ Ω.
A typical example in which the evidence affects the family sharply is statistical
evidence, in which case both frames and evidence are highly idealized [1149].

2.6.2 Discerning the interaction of evidence

It is almost a commonplace to affirm that, by selecting particular inferences from a


body of evidence and combining them with particular inferences from another body
of evidence, one can derive almost arbitrary conclusions. In the evidential frame-
work, in particular, it has been noted that Dempster’s rule may produce inaccurate
results when applied to ‘inadequate’ frames of discernment.
Namely, let us consider a frame Θ, its coarsening Ω, and a pair of support func-
tions s1 , s2 on Θ determined by two distinct bodies of evidence. Applying Demp-
ster’s rule directly on Θ yields the following support function on its coarsening Ω:

(s1 ⊕ s2 )|2Ω ,

while its application on the coarser frame Ω, after computing the restrictions of s1
and s2 to it, yields:
(s1 |2Ω ) ⊕ (s2 |2Ω ).
In general, the outcomes of these two combination strategies will be different. Nev-
ertheless, a condition on the refining linking Ω to Θ can be imposed which guaran-
tees their equivalence.

Proposition 10. Assume that s1 and s2 are support functions over a frame Θ, their
Dempster’s combination s1 ⊕ s2 exists, ρ̄ : 2Θ → 2Ω is an outer reduction, and

ρ̄(A ∩ B) = ρ̄(A) ∩ ρ̄(B) (2.14)

holds whenever A is a focal element of s1 and B is a focal element of s2 . Then

(s1 ⊕ s2 )|2Ω = (s1 |2Ω ) ⊕ (s2 |2Ω ).

In this case Ω is said to discern the relevant interaction of s1 and s2 . Of course if s1


and s2 are carried by a coarsening of Θ then this latter frame discerns their relevant
interaction.
The above definition generalizes to entire bodies of evidence.

Definition 21. Suppose F is a family of compatible frames, {s_{E1}^Θ}_{Θ∈F} is the family
of support functions determined by a body of evidence E1 , and {s_{E2}^Θ}_{Θ∈F} is the
family of support functions determined by a second body of evidence E2 .
Then, a particular frame Ω ∈ F is said to discern the relevant interaction of E1 and
E2 if:

    ρ̄(A ∩ B) = ρ̄(A) ∩ ρ̄(B)

whenever Θ is a refinement of Ω, where ρ̄ : 2Θ → 2Ω is the associated outer
reduction, A is a focal element of s_{E1}^Θ and B is a focal element of s_{E2}^Θ.

2.7 Quasi support functions


Not every belief function is a support function. The question remains of how to
characterise in a precise way the class of belief functions which are not support
functions.
Let us consider a finite power set 2Θ . A sequence f1 , f2 , ... of set functions on 2Θ
is said to tend to a limit function f if

    lim_{i→∞} fi (A) = f (A)    ∀A ⊂ Θ.    (2.15)

It can be proved that [1149]:


Proposition 11. If a sequence of belief functions has a limit, then the limit is itself
a belief function.
In other words, the class of belief functions is closed with respect to the limit
operator (2.15). The latter provides us with an insight into the nature of non-support
functions.
Proposition 12. If a belief function b : 2Θ → [0, 1] is not a support function, then
there exists a refinement Ω of Θ and a sequence s1 , s2 , ... of separable support
functions over Ω such that:

    b = ( lim_{i→∞} si ) |2Θ .

Definition 22. We call belief functions of this class quasi-support functions.

It should be noted that

    ( lim_{i→∞} si ) |2Θ = lim_{i→∞} ( si |2Θ ),

so that we can also say that b is a limit of a sequence of support functions.


The following proposition investigates some of the properties of quasi-support
functions.
Proposition 13. Suppose b : 2Θ → [0, 1] is a belief function over Θ, and A ⊂ Θ
a subset of Θ. If b(A) > 0 and b(Ā) > 0, with b(A) + b(Ā) = 1, then b is a
quasi-support function.
It easily follows that Bayesian b.f.s are quasi-support functions, unless they commit
all their probability mass to a single element of the frame.
Proposition 14. A Bayesian belief function b is a support function iff there exists
θ ∈ Θ such that b({θ}) = 1.
Furthermore, it is easy to see that vacuous extensions of Bayesian belief functions
are also quasi-support functions.
As Shafer remarks, people used to thinking of beliefs as chances may be disappointed
to see them relegated to a peripheral role, as beliefs that cannot arise from

actual, finite evidence. On the other hand, statistical inference already teaches us
that chances can be evaluated only after infinitely many repetitions of independent
random experiments.4

2.7.1 Bayes’ theorem

Indeed, as it commits an infinite amount of evidence in favor of each possible ele-


ment of a frame of discernment, a Bayesian belief function tends to obscure much
of the evidence additional belief functions may carry with them.
Definition 23. A function l : Θ → [0, ∞) is said to express the relative plausibilities
of singletons under a support function s : 2Θ → [0, 1] if

l(θ) = c · pls ({θ})

for all θ ∈ Θ, where pls is the plausibility function for s and the constant c does not
depend on θ.

Proposition 15. (Bayes’ theorem) Suppose b0 and s are a Bayesian belief function
and a support function on the same frame Θ, respectively. Suppose l : Θ → [0, ∞)
expresses the relative plausibilities of singletons under s. Suppose also that their
Dempster’s sum b′ = s ⊕ b0 exists. Then b′ is Bayesian, and

    b′({θ}) = K · b0 ({θ}) l(θ)    ∀θ ∈ Θ,

where
    K = [ Σ_{θ∈Θ} b0 ({θ}) l(θ) ]^{−1}.

This implies that the combination of a Bayesian b.f. with a support function requires
nothing more than the latter’s relative plausibilities of singletons.
It is interesting to note that the latter functions behave multiplicatively under com-
bination,
Proposition 16. If s1 , ..., sn are combinable support functions, and li represents
the relative plausibilities of singletons under si for i = 1, ..., n, then l1 · l2 · · · · · ln
expresses the relative plausibilities of singletons under s1 ⊕ · · · ⊕ sn .
providing a simple algorithm to combine any number of support functions with a
Bayesian b.f.
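Proposition 15 translates into a one-line update. The sketch below is illustrative only, with invented numbers: it combines a Bayesian prior with the relative plausibilities of singletons of a support function, exactly as in the statement above.

```python
def bayes_combine(prior, rel_pl):
    """b'({θ}) = K · b0({θ}) · l(θ), with K normalising the products to sum to one."""
    products = {th: prior[th] * rel_pl[th] for th in prior}
    K = 1.0 / sum(products.values())
    return {th: K * v for th, v in products.items()}

prior = {'th1': 0.5, 'th2': 0.3, 'th3': 0.2}    # Bayesian b.f. b0
rel_pl = {'th1': 1.0, 'th2': 0.5, 'th3': 0.1}   # relative plausibilities l(θ) of s

print(bayes_combine(prior, rel_pl))
# {'th1': 0.746..., 'th2': 0.223..., 'th3': 0.029...}
```

By Proposition 16, the relative plausibilities of several support functions can first be multiplied together and then fed to the same routine.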
4
Using the notion of weight of evidence Shafer gives a formal explanation of this intuitive
observation by showing that a Bayesian b.f. indicates an infinite amount of evidence in
favor of each possibility in its core [1149].

2.7.2 Incompatible priors

Having an established convention on how to set a Bayesian prior would be useful,


as it would prevent us from making arbitrary and possibly unsupported choices that
could eventually affect the final result of our inference process. Unfortunately, the
only natural such convention (a uniform prior) is strongly dependent on the frame
of discernment at hand, and is sensitive to both refining and coarsening operators.
More precisely, on a frame Θ with n elements it is natural to represent our
ignorance by adopting an uninformative uniform prior assigning a mass 1/n to every
outcome θ ∈ Θ. However, the same convention applied to a different compatible
frame Ω of the same family may yield a prior that is incompatible with the first one.
As a result, the combination of a given body of evidence with one arbitrary such
prior can yield almost any possible result [1149].

Example: Sirius’ planets A team of scientists wonder whether there is life around
Sirius. Since they do not have any evidence concerning this question, they adopt a
vacuous belief function to represent their ignorance on the frame
Θ = {θ1 , θ2 },
where θ1 , θ2 are the answers “there is life” and “there is no life”. They can also
consider the question in the context of a more refined set of possibilities. For exam-
ple, our scientists may raise the question of whether there even exist planets around
Sirius. In this case the set of possibilities becomes
Ω = {ζ1 , ζ2 , ζ3 },
where ζ1 , ζ2 , ζ3 are respectively the possibility that there is life around Sirius, that
there are planets but no life, and there are no planets at all. Obviously, in an eviden-
tial setup our ignorance still needs to be represented by a vacuous belief function,
which is exactly the vacuous extension of the vacuous b.f. previously defined on Θ.
From a Bayesian point of view, instead, it is difficult to assign consistent degrees
of belief over Ω and Θ both symbolizing the lack of evidence. Indeed, on Θ a
uniform prior yields p({θ1 }) = p({θ2 }) = 1/2, while on Ω the same choice will
yield p′({ζ1 }) = p′({ζ2 }) = p′({ζ3 }) = 1/3. Ω and Θ are obviously compatible
(as the former is a refinement of the latter): restricting p′ to Θ, however, produces the
Bayesian distribution
    p′′({θ1 }) = 1/3, p′′({θ2 }) = 2/3,
which is inconsistent with p!

2.8 Consonant belief functions


To conclude this brief review of evidence theory we wish to recall a class of belief
functions which is, in some sense, opposed to that of quasi-support functions – the class of
consonant belief functions.

Definition 24. A belief function is said to be consonant if its focal elements A1 , ..., Am
are nested: A1 ⊂ A2 ⊂ · · · ⊂ Am .
The following Proposition illustrates some of their properties.
Proposition 17. If b is a belief function with upper probability function pl, then the
following conditions are equivalent:
1. b is consonant;
2. b(A ∩ B) = min(b(A), b(B)) for every A, B ⊂ Θ;
3. pl(A ∪ B) = max(pl(A), pl(B)) for every A, B ⊂ Θ;
4. pl(A) = maxθ∈A pl({θ}) for all non-empty A ⊂ Θ;
5. there exists a positive integer n and a collection of simple support functions
s1 , ..., sn such that b = s1 ⊕ · · · ⊕ sn and the focus of si is contained in the
focus of sj whenever i < j.
Consonant b.f.s represent collections of pieces of evidence all pointing in the
same direction. Moreover,
Proposition 18. Suppose s1 , ..., sn are non-vacuous simple support functions with
foci Cs1 , ..., Csn respectively, and b = s1 ⊕ · · · ⊕ sn is consonant. If Cb denotes the
core of b, then all the sets Csi ∩ Cb , i = 1, ..., n are nested.
By condition (2) of Proposition 17 we have that:

0 = b(∅) = b(A ∩ Ā) = min(b(A), b(Ā)),

i.e., either b(A) = 0 or b(Ā) = 0 for every A ⊂ Θ. Comparing this result to


Proposition 13 explains in part why consonant and quasi-support functions can be
considered as representing diametrically opposed subclasses of belief functions.
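As a brief numerical illustration (a sketch only, run on the consonant belief function of Figure 2.1), conditions (2) and (4) of Proposition 17 can be checked exhaustively on a small frame:

```python
from itertools import combinations

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    return sum(v for B, v in m.items() if B & A)

frame = frozenset({'th1', 'th2', 'th3'})
# the consonant b.f. of Figure 2.1: nested focal elements {th1} ⊂ {th1, th2}
m = {frozenset({'th1'}): 2/3, frozenset({'th1', 'th2'}): 1/3}

def subsets(S):
    return (frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r))

# condition (2): b(A ∩ B) = min(b(A), b(B)) for every pair of events
assert all(abs(belief(m, A & B) - min(belief(m, A), belief(m, B))) < 1e-12
           for A in subsets(frame) for B in subsets(frame))

# condition (4): pl(A) = max over the singletons of A of pl({θ}), for non-empty A
assert all(abs(plausibility(m, A) - max(plausibility(m, frozenset({t})) for t in A)) < 1e-12
           for A in subsets(frame) if A)

print("consonance conditions verified")
```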
3
Understanding belief functions
Chapter 2 summarises the basic notions of the theory of evidence as formulated by
Shafer.
However, belief functions are complex objects which go beyond this summary, both in terms
of their mathematical properties as sophisticated objects, and in terms of their possible semantics.
Multiple, equivalent mathematical definitions of ‘belief function’ can be pro-
vided in terms of lower probabilities induced by multivalued mappings (or com-
patibility relations), superadditive measures, credal sets (convex sets of probability
measures) or random sets.
In addition, belief functions can be given separate and sometimes conflicting
interpretations, in terms of what kind of uncertainty is modelled by them.
In the forty years since its formulation the theory of evidence has evolved considerably,
thanks to the effort of several researchers, and this denomination now
includes several different interpretations of the mathematical concept of (finite)
random set. A number of scientists have proposed their own formulation or ‘frame-
work’ for evidential reasoning, partly in response to early strong criticisms brought
forward by scholars such as Judea Pearl [1024, 1025].
Figure 3.1 gives a pictorial representation of the multiple interpretations of belief
functions, and of the subsequent frameworks derived from Shafer’s initial proposal.

Fig. 3.1. Mathematical formulations of the theory of belief functions are grouped in this
diagram in terms of their similarity as mathematical frameworks, and as epistemic approaches
to uncertainty representation.


Chapter outline

In the first part of the Chapter we provide an overview of the main mathematical
formulations and semantic interpretations of the theory of belief functions (Section
3.1).
We start from Dempster’s original proposal in terms of upper and lower probabilities
induced by multi-valued mappings (3.1.1), also termed ‘compatibility relations’,
and explain the origins of Dempster’s rule of combination in this context. We then
recall belief functions’ alternative axiomatic definition as generalised, non-additive
measures (Section 3.1.2), a mathematical formulation closely related to the notion
of ‘inner measure’ (Section 3.1.3). In a robust statistical perspective, belief functions
can also be interpreted as (a specific class of) convex sets of probability measures
(Section 3.1.4), but the most general mathematical framework with the potential
to extend their definition to continuous domains is arguably that of random sets
(Section 3.1.5).
Other interpretations have been proposed over the years, among which Zadeh’s
‘simple view’ as necessity measures associated with second-order relations seems to
be the most widely cited (Section 3.1.6).
The second part is devoted to the long-standing debate on the epistemic nature
of belief functions and their ‘correct’ interpretation (Section 3.2).
Starting from a brief description of Shafer’s evolving position on the matter and the
early support received by his mathematical theory of evidence (Section 3.2.1), we
focus on the specific scientific debate on the relationship between evidential and
Bayesian approaches to reasoning under uncertainty (Section 3.2.2).
After summarising Judea Pearl’s and others’ criticisms of belief theory (Section
3.2.3), we argue that most misunderstandings are due to confusions between the
various interpretations of these objects (Section 3.2.4), and summarise the main
rebuttals and formal justifications which have been advanced in response (Section
3.2.5).
Finally, in the third part (Section 3.3) we review the manifold frameworks which
have been proposed in the last fifty years based on (at least the most fundamental
notions of) the theory of belief functions, and various generalisations and extensions
brought forward by a number of authors. Starting with the most widely applied ap-
proach, Philippe Smets’ Transferable Belief Model (Section 3.3.1), we recall the
main notions of Kohlas and Monney’s theory of hints (Section 3.3.2) and Dezert-
Smarandache Theory (Section 3.3.3). We give quite some space to Dempster and
Liu’s Gaussian belief functions (Section 3.3.4), Ivan Kramosil’s probabilistic inter-
pretation (Section 3.3.5) and Hummel and Landy’s statistics of experts’ opinions
(Section 3.3.6). We review a number of interval or credal extensions of the notion of
basic probability assignment in Section 3.3.7, including Denoeux’s Imprecise Belief
Structures. Finally, a rather comprehensive review of less well-known approaches
(including Lowrance’s evidential reasoning and Grabisch’s belief functions on lat-
tices) concludes the Chapter.

3.1 The multiple semantics of belief functions


Belief functions were originally introduced by Dempster as a simple application of
probability theory to multiple domains linked by a multi-valued map [342]. Later
Shafer interpreted them as mathematical (set) functions encoding evidence in sup-
port of propositions [1149]. Nevertheless, equivalent alternative interpretations of
these objects can be given in terms of generalised or non-additive probabilities (Sec-
tion 3.1.2), inner measures ([1099, 466], Section 3.1.3), convex sets of probability
measures (credal sets, Section 3.1.4) or random sets ([984, 612], Section 3.1.5).
Zadeh has proposed a ‘simple view’ of the theory of evidence which is discussed in
Section 3.1.6. Finally, Hummel and Landy have provided an interpretation of evi-
dence combination in belief theory in terms of statistics of expert opinions (Section
3.3.6).
Other interesting semantics within logical frameworks are discussed in
Chapter 5.

3.1.1 Dempster’s multi-valued mappings, compatibility relations

The notion of belief function [1153, 1162] originally derives from a series of Demp-
ster’s works on upper and lower probabilities induced by multi-valued mappings,
introduced in [336], [344] and [345].

Multi-valued mappings Indeed, the idea that intervals rather than probability val-
ues should be used to model degrees of belief had been suggested and investigated
by earlier researchers [480, 526, 764, 763, 1280]. Dempster, however, defined upper
and lower probability values in terms of statistics of set-valued functions defined
over a measure space. In [336] he gave examples of the use of upper and lower
probabilities in terms of finite populations with discrete univariate observable char-
acteristics.
Shafer later reformulated Dempster’s work by identifying his upper and lower prob-
abilities with epistemic probabilities or ‘degrees of belief’, i.e., the quantitative as-
sessments of one’s belief in a given fact or proposition. The following sketch of the
nature of belief functions is abstracted from [1163]: another analysis on the relation
between belief functions and upper and lower probabilities is developed in [1263].
Let us consider a problem in which we have probabilities (coming from arbi-
trary sources, for instance subjective judgement or objective measurements) for a
question Q1 and we want to derive degrees of belief for a related question Q2 . For
example, Q1 could be the judgement on the reliability of a witness, and Q2 the
decision about the truth of the reported fact. In general, each question will have a
number of possible answers, only one of them being correct.
Let us call Ω and Θ the sets of possible answers to Q1 and Q2 respectively. So,
given a probability measure P on Ω we want to derive a degree of belief b(A) that
A ⊂ Θ contains the correct response to Q2 (see Figure 3.2).
If we call Γ (ω) the subset of answers to Q2 compatible with ω ∈ Ω, each
element ω tells us that the answer to Q2 is somewhere in A whenever

Fig. 3.2. Compatibility relations and multi-valued mappings. A probability measure P on Ω
induces a belief function b on Θ whose values on the events A of Θ are given by (3.1).

Γ (ω) ⊂ A.

The degree of belief b(A) of an event A ⊂ Θ is then the total probability (in Ω) of
all the answers ω to Q1 that satisfy the above condition, namely:

Bel(A) = P ({ω|Γ (ω) ⊂ A}). (3.1)

Analogously, the degree of plausibility of A ⊂ Θ is defined as:

Pl(A) = P ({ω|Γ (ω) ∩ A ≠ ∅}). (3.2)

The map Γ : Ω → 2Θ (where 2Θ denotes, as usual, the collection of subsets of Θ)


is called a multi-valued mapping from Ω to Θ. Each such mapping Γ, together
with a probability measure P on Ω, induces a belief function on Θ:

    b : 2Θ → [0, 1],   A ⊂ Θ ↦ b(A) ≐ Σ_{ω∈Ω: Γ(ω)⊂A} P(ω).
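As a hedged illustration of this construction (the witness scenario and all its numbers are invented for the purpose of the example), the following sketch builds the belief and plausibility values on Θ induced by a probability on Ω through a multi-valued mapping Γ:

```python
# Ω: states of the source (e.g. whether a witness is reliable), with probabilities
P = {'reliable': 0.8, 'unreliable': 0.2}

# Γ: subsets of Θ = {fact_true, fact_false} compatible with each state of the source
Gamma = {
    'reliable':   frozenset({'fact_true'}),                # a reliable witness implies the fact
    'unreliable': frozenset({'fact_true', 'fact_false'}),  # an unreliable one tells us nothing
}

def belief(P, Gamma, A):
    """b(A) = P({ω : Γ(ω) ⊆ A}), as in (3.1)."""
    return sum(p for w, p in P.items() if Gamma[w] <= A)

def plausibility(P, Gamma, A):
    """pl(A) = P({ω : Γ(ω) ∩ A ≠ ∅}), as in (3.2)."""
    return sum(p for w, p in P.items() if Gamma[w] & A)

A = frozenset({'fact_true'})
print(belief(P, Gamma, A), plausibility(P, Gamma, A))   # 0.8 1.0
```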

Compatibility relations Obviously a multi-valued mapping is equivalent to a rela-


tion, i.e., a subset C of Ω × Θ. The compatibility relation associated with Γ :

C = {(ω, θ)|θ ∈ Γ (ω)} (3.3)

indeed describes the subset of answers θ in Θ compatible with a given ω ∈ Ω.


As Shafer himself admits in [1163], compatibility relations are only a new
name for multivalued mappings. Nevertheless, several authors (among whom Shafer
[1161], Shafer and Srivastava [1174], Lowrance [892] and Yager [1474]) chose this
approach to build the mathematics of belief functions.

Dempster’s combination Consider now two multivalued mappings Γ1 , Γ2 inducing
two belief functions over the same frame Θ, with Ω1 and Ω2 their domains and P1 , P2
the associated probability measures over Ω1 and Ω2 , respectively.
Under the assumption that the items of evidence generating P1 and P2 are inde-
pendent, we wish to find the belief function resulting from the pooling of the two pieces

of evidence.
Formally, we need to find a new probability space (Ω, P ) and a multivalued map
from Ω to Θ. The independence assumption allows us to build the product space
(P1 × P2 , Ω1 × Ω2 ): two outcomes ω1 ∈ Ω1 and ω2 ∈ Ω2 then tell us that the an-
swer to Q2 is somewhere in Γ1 (ω1 ) ∩ Γ2 (ω2 ). When this intersection is empty the
two pieces of evidence are in contradiction. We then need to condition the product
measure P1 × P2 over the set of non-empty intersections

    Γ1 (ω1 ) ∩ Γ2 (ω2 ) ≠ ∅,    (3.4)

obtaining:

    Ω = { (ω1 , ω2 ) ∈ Ω1 × Ω2 : Γ1 (ω1 ) ∩ Γ2 (ω2 ) ≠ ∅ },
    P = (P1 × P2 )|Ω ,    Γ(ω1 , ω2 ) = Γ1 (ω1 ) ∩ Γ2 (ω2 ).    (3.5)

It is easy to see that the new belief function Bel is linked to the pair of belief
functions being combined by Dempster’s rule, as defined in (2.6).
The combination of compatibility relations defined by (3.4) can be called the
‘optimistic’ one, as the beliefs (not the doubts) of different subjects (experts) are
combined. Indeed, this is a hidden assumption of Dempster combination rule (3.5),
as important as the assumption of independence of sources.
An alternative approach supported by Kramosil [768, 775] is based on the dual idea
that doubts are shared instead, and that an outcome θ ∈ Θ is incompatible if it is
considered incompatible by all the experts separately, namely whenever:

    θ ∉ Γ1 (ω1 ) ∪ Γ2 (ω2 ).

This results in what Smets calls the disjunctive rule of combination in his Transferable
Belief Model (Section 4.2.2).

3.1.2 Belief functions as generalised (non-additive) probabilities

Let us go back to Kolmogorov’s classical definition of probability measure (Defini-


tion ??, Chapter ??).
If we relax the third constraint:

if A ∩ B = ∅, A, B ∈ F, then p(A ∪ B) = p(A) + p(B),

to allow the function p to meet additivity only as a lower bound, and restrict our-
selves to finite sets, we obtain what Shafer [1149] called a belief function (see Defi-
nition 3).
Definition 25. Suppose Θ is a finite set, and let 2Θ = {A ⊆ Θ} denote the set of
all subsets of Θ. A belief function (b.f.) on Θ is a function b : 2Θ → [0, 1] from the
power set 2Θ to the real interval [0, 1] such that:
– b(∅) = 0;

– b(Θ) = 1;
– for every positive integer n and for every collection A1 , ..., An ∈ 2Θ we have
  that:

    b(A1 ∪ ... ∪ An ) ≥ Σ_i b(Ai ) − Σ_{i<j} b(Ai ∩ Aj ) + ... + (−1)^{n+1} b(A1 ∩ ... ∩ An ).    (3.6)
It can be proven that [1149]:
Proposition 19. Definitions 25 and 3 are equivalent formulations of the notion of
belief function.
Condition (3.6), called superadditivity, obviously generalizes Kolmogorov’s addi-
tivity (Definition 1). Belief functions can then be seen as generalizations of the
familiar notion of (discrete) probability measure.

3.1.3 Belief functions as inner measures

Belief functions can also be assimilated to inner measures.


Definition 26. Given a probability measure P defined over a σ-field of subsets F
of a finite set X , the inner probability of P is the function P∗ defined by:

P∗ (A) = max{P (B)|B ⊂ A, B ∈ F}, A ⊂ X (3.7)

for each subset A of X , not necessarily in F.


The inner probability value P∗ (A) represents the degree to which the available probability
values of P suggest that we should believe in A.
Now, let us take as the domain X of the inner probability function (Definition 26) the compat-
ibility relation C (3.3) associated with a multi-valued mapping Γ , and choose as
σ-field F on C the collection:

F = {C ∩ (E × Θ), ∀E ⊂ Ω}. (3.8)

Each element of F is the collection of all pairs in C which relate a point of E ⊂ Ω


to a subset of Θ. It is then natural to define a probability measure Q over the σ-field
(3.8) which depends on the original measure P on Ω:

    Q : F → [0, 1],   C ∩ (E × Θ) ↦ P(E).

The inner probability measure associated with Q is then the function on 2^C :

    Q∗ : 2^C → [0, 1],   A ⊂ C ↦ Q∗ (A) = max{ P(E) | E ⊂ Ω, C ∩ (E × Θ) ⊂ A }.

We can then compute the inner probability of the subset C ∩ (Ω × A) of C which
corresponds to a subset A of Θ as:

    Q∗ (C ∩ (Ω × A)) = max{ P(E) | E ⊂ Ω, C ∩ (E × Θ) ⊂ C ∩ (Ω × A) }
                     = max{ P(E) | E ⊂ Ω, ω ∈ E ∧ (ω, θ) ∈ C ⇒ θ ∈ A }
                     = P({ ω | (ω, θ) ∈ C ⇒ θ ∈ A })

which, by definition of compatibility relation, becomes:

    = P({ ω : Γ(ω) ⊂ A }) = b(A),

i.e., the classical definition (3.1) of the belief value of A induced by a multi-valued
mapping Γ. This connection between inner measures and belief functions appeared
in the literature in the second half of the Eighties ([1099, 1098], [466]).

3.1.4 Belief functions as credal sets

The interpretation of belief values as lower bounds to the true, unknown probability
value of an event generates, in turn, an additional angle on the nature of belief
functions [809]. Belief functions admit the following order relation:

    b ≤ b′ ≡ b(A) ≤ b′(A)    ∀A ⊂ Θ,    (3.9)

called weak inclusion. A b.f. b is weakly included in b′ whenever its belief values
are dominated by those of b′ for all the events of Θ.
A probability distribution P in which a belief function b is weakly included
(P (A) ≥ b(A) ∀A) is said to be consistent with b [801]. Each belief function b then
uniquely identifies the set of probabilities consistent with it (of which it is the lower envelope):

    P[b] = { P ∈ P : P(A) ≥ b(A) ∀A ⊆ Θ },    (3.10)

i.e., the set of probability measures whose values dominate those of b on all events
A. Accordingly, the theory of evidence is seen by some authors as a special case of
robust statistics [1138]. This position has been heavily criticised over the years.
Convex sets of probabilities are often called credal sets [838, 1542, 255, 34]. A
number of scholars, as it turns out, have argued in favour of belief representation
in terms of convex sets of probabilities, including Koopman [763], Good [526] and
Smith [1279, 1280].
Of course not all credal sets ‘are’ belief functions. The set (3.10) is a polytope in the
simplex of all probabilities we can define on Θ. Its vertices are all the distributions
pπ induced by a permutation π = {xπ(1) , ..., xπ(|Θ|) } of the singletons of Θ, of
the form [168, 246]:

    pπ [b](xπ(i) ) = Σ_{A ∋ xπ(i) , A ∌ xπ(j) ∀j<i} m(A),    (3.11)

assigning to the singleton element put in position π(i) by the permutation π the mass
of all focal elements containing it, but not containing any element preceding it in
the permutation order [1376].
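Formula (3.11) lends itself to a direct implementation. The sketch below is illustrative only (its enumeration over all permutations is exponential and is meant for small frames; the b.p.a. is invented for the example):

```python
from itertools import permutations

def credal_vertices(m, frame):
    """Vertices (3.11) of the polytope of probabilities consistent with the b.p.a. m."""
    vertices = set()
    for pi in permutations(frame):
        p = dict.fromkeys(frame, 0.0)
        for i, x in enumerate(pi):
            earlier = set(pi[:i])
            # mass of focal elements containing x but none of the elements preceding it
            p[x] = sum(v for A, v in m.items() if x in A and not (A & earlier))
        vertices.add(tuple(round(p[x], 10) for x in frame))
    return vertices

frame = ('th1', 'th2', 'th3')
m = {frozenset({'th1'}): 0.5, frozenset({'th1', 'th2'}): 0.3, frozenset(frame): 0.2}

for v in sorted(credal_vertices(m, frame)):
    print(dict(zip(frame, v)))
# e.g. {'th1': 1.0, 'th2': 0.0, 'th3': 0.0}, {'th1': 0.5, 'th2': 0.5, 'th3': 0.0}, ...
```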
A landmark work by Kyburg [802] establishes a number of results relating belief
updating in belief theory and credal sets (Section 3.1.4).

Proposition 20. Closed convex sets of classical probability functions include Shafer’s
probability mass functions as a special case.

As an example of credal set not associated with a belief function he considers


a compound experiment consisting of either (i) tossing a fair coin twice, or (ii)
drawing a coin from a bag containing 40% two-headed and 60% two-tailed coins and
tossing it twice. The two parts (i) and (ii) are performed in some unknown ratio p, so
that, for example, the probability that the first toss lands heads is p · 1/2 + (1 − p) · 0.4,
0 < p < 1. Let A be the event that the first toss lands heads, and B the event that the
second toss lands tails. The representation by a convex set of probability functions
is straightforward, with lower envelope P satisfying:

    P(A ∪ B) = 0.75 < 0.9 = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.5 − 0.

By Theorem 2.1 of [1149], Bel(A ∪ B) ≥ Bel(A) + Bel(B) − Bel(A ∩ B),


therefore P is not a belief function. It is still possible to compute a mass function,
but the masses assigned to the union of any three atoms must be negative.
Kyburg also shows that the impact of ‘uncertain evidence’ (represented either as
a simple support function, Definition 7, or as a probability ‘shift’ of the kind associ-
ated with Jeffrey’s rule, Section 4.3.3) can be represented by Dempster conditioning
in Shafer’s framework, whereas it is represented in the framework of convex sets of
classical probabilities by classical conditionalization. Finally:

Proposition 21. ([802], Theorem 4) The probability intervals resulting from Dempster-
Shafer updating are included in (and may be properly included in) the intervals that
result from applying Bayesian updating to the associated credal set.

Whether this is a good or a bad thing, he argues, depends on the situation. Exam-
ples in which one of the two operators leads to more ‘appealing’ results are provided.
Paul Black [112] emphasised the importance of Kyburg’s result by looking at
simple examples involving Bernoulli trials, and showing that many convex sets of
probability distributions generate the same belief function.

3.1.5 Belief functions as random sets

Having a multi-valued mapping Γ , a straightforward step is to consider the proba-


bility value P (ω) as attached to the subset Γ (ω) ⊂ Θ: what we obtain is a random
set in Θ, i.e., a probability measure on a collection of subsets (see [535, 534, 923]
for the most complete introductions to the matter). The degree of belief b(A) of an
event A then becomes the total probability that the random set is contained in A.
Random set theory first appeared in the context of stochastic geometry theory,
thanks to the independent works of Kendall [704] and Matheron [923], to be later
developed by Dempster, Nguyen, Molchanov [955] and others into a theory of im-
precise probability.
Roughly speaking, a random set is a set-valued random variable. A useful illustrative
example is provided by a die in which one or more of the faces are covered, so that
we do not know what is beneath them. A cloaked die is a random variable which produces
subsets of possible outcomes: a random set. This approach has been emphasized in
particular by Nguyen ([528], [986, 984]) and Hestir [612], and resumed in [1173].
Consider a multi-valued mapping Γ : Ω → 2Θ . The lower inverse of Γ is
defined as:

    Γ∗ : 2Θ → 2Ω ,   A ↦ Γ∗ (A) ≐ { ω ∈ Ω : Γ(ω) ⊂ A, Γ(ω) ≠ ∅ },    (3.12)

while its upper inverse is

    Γ^∗ : 2Θ → 2Ω ,   A ↦ Γ^∗(A) ≐ { ω ∈ Ω : Γ(ω) ∩ A ≠ ∅ }.    (3.13)

Given two σ-fields (see Chapter 2, Footnote ??) A, B on Ω, Θ respectively, Γ is
said to be strongly measurable iff ∀B ∈ B, Γ^∗(B) ∈ A. The lower probability measure
on B is defined as P∗ (B) ≐ P(Γ∗ (B)) for all B ∈ B. By Equation (3.1) the latter is
nothing but a belief function.
Nguyen proved that, if Γ is strongly measurable, the probability distribution P̂
of the random set [984] coincides with the lower probability measure:

P̂ [I(B)] = P∗ (B) ∀B ∈ B,

where I(B) denotes the interval {C ∈ B, C ⊂ B}.


In the finite case the probability distribution of the random set Γ is precisely the
basic probability assignment (Definition 2) associated with the lower probability or
belief function P∗ .
An extensive analysis of the relations between Smets’ Transferable Belief Model
and the theory of random sets can be found in [1241].

3.1.6 Zadeh’s ‘simple view’

In Zadeh’s ‘simple view’ [1530], Dempster-Shafer theory is viewed in the context


of relational databases, as an instance of inference/retrieval techniques for second-
order relations. In the terminology of relational databases, a first-order relation is
a relation whose elements are atomic rather than set-valued, while a second-order
relation associates entries i with sets Di of possible values (e.g., person i has age in
the range [22, 26]).
Given a query set Q, the possibility of certain entries satisfying the query can
be measured (e.g., whether person i has age in the range [20, 25]). Namely for each
entry: (i) Q is possible if the query set intersects the set of possible values for that
entry (which Zadeh calls ‘possibility distribution’); (ii) Q is certain (necessary) if
Di ⊂ Q; (iii) Q is not possible if the two do not intersect.
We can then answer questions of the kind ‘what fraction of entries meet the
query?’ (e.g. what percentage of employees are between 20 and 25 years of age) in
terms of necessity and possibility measures defined as follows:
    N(Q) = Σ_{D⊂Q} pD ,    Π(Q) = Σ_{D∩Q≠∅} pD ,

where pD is the fraction of entries whose set of possible values is exactly D. As


such, N (Q) and Π(Q) can be computed with the mere histogram (count) of entries
with the same values, rather than the entire table.
Seen in this perspective, belief and plausibility measures in Dempster-Shafer
theory are, respectively, the certainty (or necessity) and possibility of the query set
Q in the context of retrieval from a second-order relation in which the data entries
are possibility distributions.
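A hedged sketch of this retrieval view (the employee table and its values are invented for illustration): each entry carries a set of possible values, and the necessity and possibility of a query set are computed exactly like the belief and plausibility of the corresponding event.

```python
# second-order relation: each entry has a *set* of possible ages
ages = {
    'ann':  frozenset(range(22, 27)),   # age somewhere in [22, 26]
    'bob':  frozenset({30}),            # age known exactly
    'carl': frozenset(range(20, 31)),   # age somewhere in [20, 30]
}

Q = frozenset(range(20, 26))            # query: age in [20, 25]

def necessity(table, Q):
    """N(Q): fraction of entries whose possible values are all inside Q."""
    return sum(1 for D in table.values() if D <= Q) / len(table)

def possibility(table, Q):
    """Π(Q): fraction of entries whose possible values intersect Q."""
    return sum(1 for D in table.values() if D & Q) / len(table)

print(necessity(ages, Q), possibility(ages, Q))   # 0.0 and 0.666...
```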

3.2 Genesis and debate


The axiomatic imprint that Shafer originally gave to his work may seem quite arbitrary at first glance. For example, Dempster's rule is not given a convincing justification in his seminal book [1149], and it is natural to ask whether a different rule of combination could be chosen. This question has been addressed by several authors ([1359], [1536], [1159] among others): most of them have tried to provide axiomatic support for the choice of this mechanism for combining evidence.
Perhaps the right thing to do is to go back to the origins, namely the notion of upper and lower probabilities introduced in [336], [344] and [345]. What Shafer in fact did was to reformulate Dempster's work, by identifying his upper and lower probabilities with epistemic probabilities or degrees of belief, i.e., the quantitative measurement of one's belief in a given fact or proposition.
The following sketch of the nature of belief functions is abstracted from [1163]:
another debate on the relation between b.f.s and upper and lower probabilities is
developed in [1263].
In his seminal monograph [1149], Shafer gave an axiomatic foundation to his theory of probable reasoning, calling Dempster's lower probabilities 'belief functions' for their ability to provide a mathematical description of degrees of belief. The axiomatic setup that Shafer originally gave to his work, however, may seem rather arbitrary at first glance [1188, 717]. This has sparked a debate on the interpretation, rationale and correctness of the theory of evidence, which reached its climax in the early Nineties but has arguably not been settled as of 2016.

3.2.1 Shafer’s position and early support

According to Shafer [1169], belief function theory has even older antecedents than
Bayesian theory, as similar arguments appear in the work of George Hooper (1640-
1723) and James Bernoulli (1654-1705), as emphasised in [1160].
In a 1976 paper [1148], Shafer illustrated his rationale for his mathematical
theory of evidence, claiming that the impact of evidence on a proposition may either
support it to various degrees, or cast doubt on it to various possible degrees.

Shafer’s proposal for a new theory of statistical evidence and epistemic proba-
bility received rather positive attention [476, 1412]. Fine [476] commented in his
review that the fact that probability takes its meaning from and is used to describe
phenomena as diverse as propensities for actual behavior (the behavioural inter-
pretation), propositional attitudes of belief (subjective degrees of belief) and experi-
mental outcomes under prescribed conditions of unlinked repetitions (the frequentist
view), has long been the source of much controversy, resulting in a dualistic con-
ception of probability as being jointly epistemic (oriented towards ‘subjective’ belief
assessment) and aleatory (focussed on the ‘objective’ description of the outcomes
of ’random’ experiments) with most of the present-day emphasis on the latter.
Lowrance and Garvey [894] were early supporters of the use of Shafer’s math-
ematical theory of evidence as a framework for evidential reasoning in expert sys-
tems.
Gordon and Shortliffe [529] argued that the advantage of belief function theory
over other previous approaches is its ability to model the narrowing of the hypothesis
set with the accumulation of evidence in expert reasoning. There, experts rely on
evidence which typically supports whole subsets of hypothesis in the hypothesis
space at hand.
Strat and Lowrance [1307] focused on the difficulty of generating explanations
for the conclusions drawn by evidential reasoning systems based on belief functions,
and presented a methodology for augmenting an evidential-reasoning system with a
versatile explanation facility.
Curley and Golden [219] constructed a very interesting experiment in legal situ-
ations, in which the belief assessor is interested in judging the degree of support or
justification that the evidence affords hypotheses, to determine (a) if subjects could
be trained in the meanings of belief-function responses; and, (b) once trained, how
they use those belief functions in a legal setting. They found that subjects could use
belief functions, identified limits to belief functions’ descriptive representativeness,
and discovered patterns in the way subjects use belief functions which inform our
understanding of their uses of evidence.

Constructive probability and canonical examples An important evolution of Shafer’s


thought is marked by [1153], where he formulates a constructive interpretation of
probability in which probability judgments are made by comparing the situation at
hand to abstract canonical examples, in which the uncertainties are governed by
known chances. These can for instance be games played repeatedly for which the
limit relative frequencies of the possible outcomes are known (e.g. think of design-
ing a random experiment in hypothesis testing, in such a way that the probability
describing the outcomes can be assumed to be known).
Shafer claims that, when one does not have sufficient information to compare the situation at hand with a classical, probability-governed experiment, other kinds of examples may be judged appropriate and used to give rise to a mathematical description of the uncertainty involved in terms of belief functions.
The canonical examples from which belief functions are to be constructed are based
on ‘coded messages’ c1 , ..., cn which form the values of a random process with
prior probabilities p1 , ..., pn [1146]. Each message ci has an associated subset Ai of


outcomes, and carries the message that the true outcome is in Ai . The masses rep-
resenting the current state are simply the probabilities (with respect to this random
process) of receiving a message associated with a subset A. Shafer then offers a set
of measures on belief functions to assist in fitting parameters of the coded message
example to instances of subjective notions of belief.
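A toy version of the coded-message construction (the message probabilities and the subsets they carry are invented for illustration; the last code plays the role of a vacuous, uninformative message):

Theta = frozenset({'a', 'b', 'c'})
codes = [                                        # (prior probability p_i, subset A_i carried by code c_i)
    (0.6, frozenset({'a'})),
    (0.3, frozenset({'a', 'b'})),
    (0.1, Theta),                                # uninformative message: the outcome is somewhere in Theta
]

m = {}
for p_i, A_i in codes:
    m[A_i] = m.get(A_i, 0.0) + p_i               # mass of A = probability of receiving a message for A
print(m)                                         # {'a'}: 0.6, {'a','b'}: 0.3, Theta: 0.1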
Hummel and Landy [217] argued that the ‘constructive probability’ techniques de-
scribed by Shafer [1153] disregard the statistical theoretical foundations from which
the theory was derived, exactly because they are based on fitting problems to scales
of canonical examples. While the coded message interpretation is essentially sta-
tistical and implicit in Dempster's work, they argued, the proposed fitting scheme attempts to apply alternative interpretations to the combination formula, based on subjective similarities.
As a matter of fact, Shafer’s presentation before the Royal Statistical Society [1146]
was heavily criticised by several discussants, who asked for a closer connection to
be made between the canonical examples and the interpretation of belief values. As
is reported in [217]: “Prof. Barnard, for example, states that ‘the connections be-
tween the logical structure of the . . . example and the story of the uncertain codes
is not at all clear’. Prof. Williams desires ‘a deeper justification of the method and a
further treatment of unrelated bodies of evidence’, while Prof. Krantz states simply
that ‘comparison of evidence to a probabilistically coded message seems strained’.
Prof. Fine summarizes the problem by stating that ‘the coded message interpreta-
tion is ignored when actually constructing belief functions, calling into question the
relevance of the canonical scales’ ”.

3.2.2 Bayesian versus belief reasoning

Much analysis has obviously been directed at understanding or highlighting the


fundamental difference between classical Bayesian and belief function reasoning
[1263, ?].
Scozzafava [1133] argued (1993) that if the Bayesian approach is set up taking
into account all the relevant aspects, i.e., if there is no beforehand given structure
on the set of events, and if probability is interpreted as degree of belief, then the
Bayesian approach leads to very general results that fit (in particular) with those
obtained by the belief function model.
In 2000 Wakker [1364] argued that a ‘principle of complete ignorance’ plays
a central role in decisions based on Dempster belief functions, in that whereas the
Bayesian approach requires that uncertainty on the true state of nature be proba-
bilised, belief functions assume complete ignorance, permitting strict adherence to the available data.
In an examination of Shafer's canonical examples, Laskey [] noted that in the canonical examples underlying Shafer-Dempster theory, beliefs over the hypotheses of interest are derived from a probability model for a set of auxiliary hypotheses, via a compatibility relation connecting the auxiliary hypotheses to subsets of the primary hypotheses. A belief function differs from a Bayesian probability model in that one does not condition on those parts of the evidence for which no probabilities are specified. The significance of this difference in conditioning assumptions was illustrated there with two examples giving rise to identical belief functions but to different Bayesian probability distributions.
Lindley [857] brought forward arguments, based on the work of Savage and de Finetti, purporting to show that probability theory is the unique correct description of uncertainty. Other authors have argued that the additivity axiom is too restrictive when one has to deal with uncertainty deriving from partial ignorance [802].
In a paper published in Cognitive Science, Shafer and Tversky [1175] described and compared the semantics and syntax of the Bayesian and belief function languages, and investigated designs for probability judgment afforded by the two languages.
Klopotek and Wierzchoń [] claimed in 2002 that previous attempts to interpret belief functions in terms of probabilities had failed to produce a fully compatible interpretation [?, ?, ?], and proposed in response three models: a 'marginally correct approximation', a 'qualitative' model and a 'quantitative' model.
A further recurring observation concerns what is perhaps the very first counterintuitive aspect of belief functions, namely their definition: a belief value is a simple-minded 'direct' sum of the masses of the focal elements involved, in which the measures of their intersections are ignored. As a consequence, atomic intersections of focal elements (intersections containing no other focal element) have zero belief value, and one may wonder whether belief functions are reasonable and mathematically sound measures. Belief functions can be viewed as a generalisation of random variables, a generalisation which at first sight may seem too naive: in the classical theory the inner probability is the least upper bound of the direct sum of the probabilities of all masses, and belief functions adopt the very same definition, as the least upper bound of the direct sum of the basic probabilities of all focal elements. The two definitions indeed coincide; the difference is that in the classical theory all masses are disjoint, whereas focal elements need not be. The conclusion of this line of analysis is that, in spite of this, belief functions are sound measures.
Measure-theoretic methods have also been used to describe the relationship between Dempster-Shafer (DS) theory and Bayesian (i.e., probability) theory, demonstrating within that framework the relationships among Shafer's belief and plausibility, Dempster's lower and upper probabilities, and inner and outer measures. Dempster's multivalued mapping is an example of a random set, a generalisation of the concept of a random variable, while Dempster's rule of combination is the product measure on the Cartesian product of measure spaces. The independence assumption of Dempster's rule arises from the nature of the problem, in which one has knowledge of the marginal distributions but wants to calculate the joint distribution.

3.2.3 Pearl’s criticism

In ‘Perspectives on the theory and practice of belief functions’ [1165], Shafer reviewed in 1992 the work conducted until then on the interpretation, implementation and mathematical foundations of the theory, placing belief theory within the broader topic of probability, and probability itself within artificial intelligence.
Wasserman [1394], although agreeing with Shafer that there are situations where belief functions are appropriate, raised a number of questions about these objects, motivated by statistical considerations. He argued that the betting paradigm has a status in the foundations of probability of a different nature from that of the canonical examples on which belief function theory is built: in addition to using the betting metaphor to make judgments, we use this analogy at a higher level to judge the theory as a whole, and his thesis was that a similar argument would make belief functions easier to understand.
Wasserman also questioned the separation of belief from frequency, arguing that it
is a virtue of subjective probability that it contains frequency probability as a special
case, by way of de Finetti’s theory of exchangeability, and mentioned the notion of
asymptotics in belief functions.
Judea Pearl [1022, 1024, 1026, 1020] also contributed to the debate in the early
Nineties. He claimed that belief functions have difficulties representing incomplete
knowledge, in particular knowledge expressed in conditional sentences, and that
encoding if-then rules as belief function expressions (a standard practice at the time)
leads to counterintuitive conclusions. As for the belief function updating process,
he found that the latter violates what he called ‘basic patterns of plausibility’ and
the resulting beliefs cannot serve as a basis for rational decisions. As for evidence
pooling, although belief functions offer in his view a rich language for describing
the evidence, the available combination operators cannot exploit this richness and
are challenged by simpler methods based on likelihood functions.
Detailed answers to several of the criticisms raised by Pearl were provided by
Smets in 1992 [1240], within his transferable belief model interpretation.
In the same year Dubois and Prade tried to clarify some aspects of the theory of
belief functions, addressing most of the questions raised by Pearl in [1022, 1026].
They pointed out that their mathematical model can be useful beyond a theory of
evidence, for the purpose of handling imperfect statistical knowledge. They com-
pared Dempster’s rule of conditioning with upper and lower conditional probabil-
ities, concluding that Dempster’s rule is a form of updating, whereas the second
operation expresses ‘focussing’ (see Section 4.3). Finally, they argued that the con-
cept of focusing models the meaning of uncertain statements in a more natural way
than updating.
Wilson [1419] also responded to Pearl’s criticisms. He noted that Pearl criti-
cised belief functions for not obeying the laws of Bayesian belief, whereas these
laws lead to well-known problems in the face of ignorance, and seem unreasonably
restrictive. He argued that it is not reasonable to expect a measure of belief to obey
Pearl’s sandwich principle, whereas the standard representation of ‘if-then’ rules in
Dempster-Shafer theory, criticised by Pearl, is in his view justified and compares
favorably with a conditional probability representation.
Shafer [1167] addressed Pearl's remarks by arguing that the interpretation of belief functions is controversial because the interpretation of probability is controversial. In this work he summarised his constructive interpretation of probability, probability bounds and belief functions, and explained how this interpretation bears on many of the issues raised.
A further rebuttal by Pearl [1027] responded to Shafer’s comments and dis-
cussed the degree to which his earlier conclusions affect the applicability of belief
functions in automated reasoning tasks.
In Provan's reply to Pearl and Shafer (1992) [1052], the author noted that Pearl exposes the theory's deficiencies when dealing with common-sense reasoning in a 'process-independent' manner. Although correct under the assumptions stated, Pearl's argument is weakened, in Provan's view, by questioning whether a process-independent semantics is always necessary or desirable.
As he also pointed out, Shafer claims that multiple uncertainty representations are
necessary, and they should be developed in parallel by defining domains in which
each representation is best suited. In contrast, Pearl implicitly claims that proba-
bility theory alone is necessary, unless the use of another representation (such as
Dempster-Shafer theory) is shown to be clearly advantageous.

3.2.4 Issues with multiple interpretations

The fact that belief functions possess multiple interpretations, all of which are sensible from a certain angle, has ignited, especially in the early Nineties, a debate to which many scholars have contributed. Halpern and Fagin, for instance, underlined
in [588] two different views of belief functions, as generalized probabilities (cfr.
Sections 3.1.2 and 3.1.3) and as mathematical representation of evidence (that we
completely neglected in our brief summary of Chapter 2). Their claim is that many
problems about the use of belief functions can be explained as a consequence of a
confusion of these two interpretations. As an example, they cite comments by Pearl
[1024, 1025] and others that belief theory leads to incorrect or counterintuitive an-
swers in a number of situations.
Philippe Smets was particularly active in this debate. In [1244] he gave an axiomatic justification of the use of belief functions to quantify partial beliefs, while in [1240] he rebuffed Pearl's criticisms [1024] by carefully distinguishing the different epistemic interpretations of the theory of evidence (echoing Halpern et al. in [588]), focusing in particular on his transferable belief model (Section 3.3.1).
In 1992, Halpern and Fagin [589] argued that there are at least two different ways
of understanding belief functions: as generalised probability functions (technically,
as inner measures induced by a probability function), or as a way of representing
evidence which, in turn, can be understood as a mapping from probability functions
to probability functions.
Under the first interpretation, they argue, it makes sense to think of updating a be-
lief function because of its nature of generalised probability. If we think of belief
functions as a mathematical representation of evidence, instead, using combination
rules to merge two belief functions is the natural thing to do. Problems that have
been pointed out with the belief function approach, therefore, can be explained as a
consequence of confounding these two semantics.

A comment on the dual interpretation of belief theory was provided by Lingras


and Wong (1990) [858]. Whereas the ‘compatibility’ interpretation constructs the
belief function for a frame of discernment by using the compatibility relation of
the frame with another frame for which a probability function is known (Section
3.1.1), the ‘probability allocation’ view is a generalisation of Bayesian theory in
which probability mass is assigned to propositions, based on some body of evidence.
The authors argue that the first interpretation is useful when limited information
regarding the relationship between the two frames of discernment is available, while
the second one comes in when the evidence cannot be explicitly expressed in terms
of propositions.
Back in 1991, Smets [1237] considered the many interpretations of Dempster-
Shafer’s theory: within classical probability, as a theory of upper and lower proba-
bilities, Dempster’s model, his own transferable belief model, the evidentiary value
model, the provability or necessity model, studying both the knowledge representa-
tion and the belief update aspects of these frameworks.
Smets (1994) [1246] also tackled the issue of the multiple mathematical models
proposed to describe an agent's degrees of belief, including Bayesian theory, the
upper and lower probabilities model, Dempster’s belief functions, the evidentiary
value model and the probability of modal propositions.
He argued that none of these models is the best, for each has its own domain of
application. By means of examples he highlighted the underlying hypotheses that
lead to the selection of an adequate model for a given problem. His criterion for
the choice of the appropriate model is: if a probability measure exists and can be
identified, the Bayesian model is to be preferred; if a probability measure exists
but its values cannot be precisely estimated, upper and lower probability models
(of which Dempster’s original proposal is a special case) are an adequate choice;
finally, if a probability measure is not known to exist, the Transferable Belief Model should be used, as it formalises an agent's belief without reference to an underlying unknown probability.

3.2.5 Rebuttals and justifications

A number of authors have proposed rebuttals to these criticisms [1416], sometimes


based (as in Smets’ case) on an axiomatic justification that cuts all bridges with the
initial statistical rationale.
A very comprehensive rebuttal of criticisms towards evidential reasoning based
on Dempster-Shafer calculus was brought forward by Ruspini, Lowrance and Strat
in 1992 [1098], addressing theoretical soundness, decision support, evidence com-
bination, complexity, and including a detailed analysis of the so-called ‘paradoxes’
generated by the application of the theory.
They showed that evidential reasoning can be interpreted in terms of classical prob-
ability theory, and belief function theory considered a generalisation of probabilistic
reasoning based on the representation of ignorance by intervals of possible values,
without resorting to nonprobabilistic or subjectivist explanations.
They pointed out a confusion between the (then) current state of development of the
theory (especially in what was then the situation for decision making) and its po-
tential usefulness. They also considered methodological criticisms of the approach,
focusing primarily on the alleged counterintuitive nature of Dempster’s combination
formula, showing that such counterintuitive outcomes are in fact the result of its misapplication.
Philippe Smets [1250] wrote extensively on the axiomatic justification of the use of belief functions [1244]. Essentially, he postulated that degrees of belief are quantified by a function with values in [0, 1] which gives the same degree of belief to subsets that represent the same proposition according to the agent's evidential corpus. The impact of coarsening and refining a frame of discernment is derived, as is the conditioning process. A closure axiom is proposed, asserting that any measure of belief can be derived from other measures of belief defined on less specific frames.
In [8, 10], an axiomatic justification is presented for the claim that quantified beliefs should be represented by belief functions: a mathematical function able to represent quantified beliefs is shown to be a Choquet capacity monotone of order 2, and several additional rationality requirements are proposed in order to show that it must in fact be monotone of infinite order, and thus a belief function. One of these requirements is based on the negation of a belief function, a concept introduced by Dubois and Prade [2].
Shafer himself [4] produced a very extensive, and somewhat frustrated, docu-
ment addressing the various contributions to the discussion on the interpretation of
belief functions. The main argument is that disagreement on the ‘correct’ meaning
of belief functions boils down, fundamentally, to a lack of consensus on how to
interpret probability, as belief functions are built on probability. In response, he il-
lustrated his own constructive interpretation of probability, probability bounds and
belief functions and related it to the views and concerns of Pearl, Smets, Ruspini
and others making use of the main canonical examples for belief functions, namely
the partially reliable witness and its generalization, the randomly coded message.
Neapolitan [981] further elaborated on Shafer’s defense in two ways: (1) by
showing that belief functions, as Shafer intends them to be interpreted, use proba-
bility theory in the same way as the traditional statistical tool, significance testing;
and (2) describing a problem for which the application of belief functions yields a
meaningful solution, while a Bayesian analysis does not.

3.2.6 Agenda(s) for the future


In their foreword to a volume on 'Classic works of the Dempster-Shafer theory of belief functions' [1496], Dempster and Shafer considered that, although flourishing by some measures, belief function theory had still not addressed questions such as deciding whether bodies of evidence are independent, or what to do if they are dependent. They lamented an ongoing confusion and disagreement about how to interpret the theory, and its limited acceptance in the field of mathematical statistics, where it first began.
They proposed an agenda to move the theory forward, composed of the following
three actions:
– a richer understanding of the uses of probability, for they believe the theory is
best regarded as a way of using probability [1153, 341, 1175].
– a richer understanding of statistical modelling [372], going beyond traditional statistical analysis, which begins by specifying probabilities that are supposed known except for certain parameters.
– in-depth examples of sensible Dempster-Shafer analyses of a variety of problems
of real scientific and technological importance.
Interesting examples in which the use of belief functions provides sound and elegant solutions to real-life problems, essentially characterised by 'missing' information, are given by Smets in [1257]. These include: classification problems in which the training set is such that the classes are only partially known; an information retrieval system handling inter-document relationships; the combination of data from sensors competent on partially overlapping frames; and the determination of the number of sources in a multi-sensor environment by studying the inter-sensor conflict.
In an interesting although rather overlooked recent work [443], Shafer highlights
two ways of interpreting numerical degrees of belief in terms of betting: (i) you
can offer to bet at the odds defined by the degrees of belief, or (ii) you can make
the judgement that a strategy for taking advantage of such betting offers will not
multiply the capital it risks by a large factor. Both interpretations, he argues, can be
applied to ordinary probabilities and used to justify updating by conditioning, while
only the second can be applied to belief functions and used to justify Dempster's rule of combination.

3.3 Frameworks
A number of researchers have proposed their own variation on the theme of belief functions, with various degrees of success. Arguably the frameworks with the highest impact are Smets' Transferable Belief Model (TBM) (Section 3.3.1), Kohlas and Monney's Theory of Hints (Section 3.3.2) and the so-called Dezert-Smarandache Theory (DSmT, Section 3.3.3).
Other frameworks which are significant for their mathematical interest or foundational contribution include Dempster and Liu's Gaussian belief functions (Section 3.3.4), Ivan Kramosil's probabilistic analysis (Section 3.3.5), Lowrance and Strat's approach (Section 3.3.8) and Grabisch's lattice-theoretical formulation (Section 3.3.8). A number of scientists have analysed the extension of the theory of evidence to interval or credal belief structures (Section 3.3.7).
We conclude the section by surveying in less detail other proposed frameworks
in Section 3.3.8.

3.3.1 Smets’ Transferable Belief Model

In his seminal work in the early 1990s [1218], Philippe Smets introduced the Transferable Belief Model (TBM) as a framework, based on the mathematics of belief functions, for the quantification of a rational agent's degrees of belief [1251]. The TBM is by far the approach to the theory of evidence which has achieved the widest diffusion and impact. Its philosophy, however, is very different from Dempster's (and Shafer's) original probabilistic semantics for belief functions, as it cuts all ties with the notion of an underlying probability measure and employs belief functions directly to represent an agent's beliefs.
As usual here we have room only for a brief introduction to the principles of
the TBM. Aspects of the framework which concern evidence combination ([1225],
Section ??), probability transformation (Section 4.4.2), conditioning (Section 4.3.2)
and decision making (Section 4.5.1) are discussed in more detail in Chapter 4. In
[1276, 1221] (but also [1255] and [1277]) the reader will find an extensive explana-
tion of the features of the transferable belief model. An interesting criticism of the
TBM in terms of Dutch books is conducted by Snow in [1283].
As far as applications are concerned, the transferable belief model has been
employed to solve a variety of problems, including data fusion [1267], diagnostics
[1252] and reliability issues [1242]. In [418] Dubois et al. used the TBM approach
on an illustrative example: the assessment of the value of a candidate.

Credal and pignistic levels In the TBM, beliefs are represented at two distinct
levels:
1. a credal level (from the Latin word credo, 'I believe'), where the agent's degrees of belief in a phenomenon are maintained as belief functions;
2. a pignistic level (from Latin pignus, ‘betting’) where decisions are made,
through an appropriate probability function, called the pignistic function:
BetP(x) = Σ_{B⊆Θ: x∈B} m(B)/|B|.    (3.14)

The credal level At the credal level, each agent is characterised by an ‘evidential
corpus’, a collection of pieces of evidence they have collected in their past. This ev-
idential corpus has an effect on the frame of discernment associated with a certain
problem (e.g., who is the culprit of a certain murder). As in logic-based approaches
to belief theory (compare Chapter 5, Section 5.6), a frame of discernment Θ is a
collection of possible worlds (‘interpretations’, in the logical language), determined
by the problem at hand, and a (logical) proposition is mapped the subset of possible
worlds in which it holds true.
The basic assumption postulated by the TBM is that the impact of a piece of evi-
dence on an agent’s degrees of belief consists in an allocation of parts of an initial
unitary amount of belief among all the propositions in the frame of discernment.
The mass m(A) is the part of the agent’s belief that supports A, i.e. that the ‘actual
world’ θ ∈ Θ is in A ⊂ Θ and that, due to lack of information, does not support any
strict subproposition of A.
As in Shafer’s work, then, each piece of evidence directly supports a proposition. To
underline the lack of any underlying probabilistic model, Smets uses the terminol-
ogy ‘basic belief assignment’, rather than Shafer’s ‘basic probability assignment’,
to refer to mass assignments. Note that Smets does not claim that every form of
uncertainty or imprecision can be represented by a belief function, but that uncer-


tain evidence induces a belief function in an agent’s belief system [1277]. Belief is
transferred in the model whenever sharp information is available (of the form ‘B is
true’) via Dempster’s (unnormalised) rule of conditioning:
m(A|B) = Σ_{C⊆Θ\B} m(A ∪ C)  if A ⊆ B,   and m(A|B) = 0 otherwise,

justifying the model’s name.
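The following minimal sketch implements this unnormalised conditioning operation (the mass values are illustrative; any mass transferred to the empty set would simply be retained, in line with the open-world stance discussed below):

m = {frozenset({'a'}): 0.3, frozenset({'a', 'b'}): 0.3,
     frozenset({'b', 'c'}): 0.2, frozenset({'a', 'b', 'c'}): 0.2}

def condition(m, B):
    # unnormalised (TBM) conditioning: the mass of each focal element X moves to X ∩ B
    B = frozenset(B)
    cond = {}
    for X, mass in m.items():
        A = X & B
        cond[A] = cond.get(A, 0.0) + mass
    return cond

print(condition(m, {'a', 'b'}))
# {'a'}: 0.3, {'a','b'}: 0.5, {'b'}: 0.2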


For a discussion of evidence combination in the TBM, we refer the reader to
Chapter 4, Section 4.2.2.

The pignistic level At the pignistic level, the pignistic function is axiomatically
derived by Smets in [1232], as the only transformation which meets a number of
rationality requirements:
– the probability value BetP (x) of x only depends on the mass of propositions
containing x;
– BetP (x) is a continuous function of m(B), for each B containing x;
– BetP is invariant to permutations of the elements of the frame Θ;
– the pignistic probability does not depend on propositions not supported by the
belief function which encodes the agent’s beliefs.
Note that, over time, Smets justified the pignistic transform first in terms of the
Principle of Insufficient Reason, then as the only (sic) transform satisfying a linear-
ity constraint. However, as we show in Chapter 10, the pignistic transform is not the
only probability approximation to commute with convex combination.
Decisions are finally made in a utility theory setting, in which Savage’s axioms hold
[1113] (cfr. Section 4.5.1).
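A minimal sketch of the pignistic transformation (3.14) may help here (the frame and mass values below are invented for illustration):

m = {frozenset({'a'}): 0.4,
     frozenset({'a', 'b'}): 0.4,
     frozenset({'a', 'b', 'c'}): 0.2}

def pignistic(m):
    # BetP(x): each focal element B shares its mass equally among its elements
    betp = {}
    for B, mass in m.items():
        for x in B:
            betp[x] = betp.get(x, 0.0) + mass / len(B)
    return betp

print(pignistic(m))     # approximately {'a': 0.667, 'b': 0.267, 'c': 0.067}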
An interesting point Smets makes is that, although beliefs are necessary ingredi-
ents for our decisions, that does not mean that beliefs cannot be entertained without
being manifested in terms of actual behaviour [1281]. In his example, one may have
some beliefs about the status of a traffic light in a particular road in Brussels even
though they are not in Brussels at the moment and they do not intend to make a
decision based on this belief. This is in stark contrast with behavioural interpreta-
tions of probability, of which Walley’s imprecise probability theory is a significant
example (as we will see in Section 5.1).
We will also see in Chapter 4, Section 4.4.2 that the pignistic transform is only
one of many possible probability transforms, each of which comes with a different
rationale.

Unnormalised belief functions Within the TBM, positive basic belief values
can be assigned to the empty set itself, generating unnormalized belief functions
[1239, 1238]. Unnormalised belief functions are indeed a sensible representation
under an ‘open-world’ assumption that the hypothesis set (frame) itself is not known
with certainty (in opposition to the ‘closed-world’ situation in which all alternative
hypotheses are perfectly known).

Unnormalised belief functions, and (pseudo) belief functions with negative mass assignments, will also arise in the geometric approach described in this Book.

TBM vs other interpretations of belief theory As we mentioned, the transfer-


able belief model should not be considered as a generalised probability model - in
the TBM no links are assumed between belief functions and any underlying prob-
ability space (although they may exist). It is rather a normative model for a ‘ratio-
nal’ agent’s subjective beliefs about the external world. In [1237], Smets explictly
stresses this point when comparing his creature to other interpretations of Dempster-
Shafer theory, such as the classical probability model, Dempster’s upper and lower
probability model, Pearl’s probability of provability (basically amounting to a modal
logic interpretation of belief theory, Section 5.6.6) and Ekelof/Gardenfors’ eviden-
tiary value model [7], among others. The final message is a strong criticism of the
careless use of Dempster’s combination when mixed with belief functions’ interpre-
tation as lower bounds to an unknown probability distribution.

3.3.2 Kohlas and Monney’s theory of hints

Kohlas and Monney introduced a notion of belief functions on real numbers in-
spired by Dempster’s multi-valued setting. Indeed, some of the relations introduced
in [1222] and [756, 745] had already been introduced in [344]. This idea has de-
veloped into their mathematical theory of hints [744, 753, 747, 959] (see [745] as
introduction, and the monograph [756] for a detailed exposition). Hints [753] are inherently imprecise and uncertain bodies of information which do not point to precise answers but are used to judge hypotheses, leading to support and plausibility functions similar to those introduced by Shafer. The following introduction to the
theory of hints is abstracted from [374].

Functional models Functional models [756] describe the process by which a datum
x is generated from a parameter θ and some random element ω. The set of possible
values of the data x is denoted by X, whereas the domain of the parameter θ is
denoted by Θ and the domain of the random element ω is denoted by Ω.
The data generation process is specified by a function

f : Θ × Ω → X.

If θ is the correct value of the parameter and the random element ω occurs, then
the data x is uniquely determined by x = f (θ, ω). The function f together with a
probability measure p : Ω → [0, 1] constitute a functional model for a statistical
experiment E.
A functional model induces a parametric family of probability distributions (statis-
tical specifications) on the sample space X, which is usually assumed a priori in
modelling statistical experiments, via:
pθ(x) = Σ_{ω: x=f(θ,ω)} p(ω).    (3.15)

Note that different functional models may induce the same statistical specifications,
i.e., they contain more information than the families of probability distributions (3.15).
Assumption-based reasoning Consider an experiment E represented by a func-
tional model x = f (θ, ω) with given probabilities p(ω) for the random elements.
Suppose that the observed outcome of the experiment is x. What can be inferred
about the value of the unknown parameter θ?
The basic idea of assumption-based reasoning is to assume that a random ele-
ment ω generated the data, and then determine the consequences of this assumption
on the parameter. The observation x induces an event in Ω, namely:
vx ≐ {ω ∈ Ω | ∃θ ∈ Θ : x = f(θ, ω)}.

Since we know that vx ⊂ Ω has happened, in a Bayesian setting we need to condition the initial probabilities p(ω) with respect to vx, obtaining p'(ω) = p(ω)/P(vx), whose probability measure is trivially P'(A) = Σ_{ω∈A} p'(ω).
Note that it is unknown what element ω ∈ vx has actually generated the ob-
servation. Assuming ω was the cause, the possible values for the parameter θ are
obviously restricted to the set:
Tx (ω) = {θ ∈ Θ|x = f (θ, ω)}.
Summarising, an observation x in a functional model (f, p) generates a structure

Hx = (vx, P', Tx, Θ),    (3.16)

which Kohlas and Monney call a hint.
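The construction of the hint Hx can be illustrated with a toy functional model (the parameter space, chance probabilities and data-generating table below are invented for the example):

Theta = ['t1', 't2']                              # parameter space
p = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}             # probabilities of the random elements

def f(theta, omega):                              # hypothetical data-generating function
    table = {('t1', 'w1'): 1, ('t1', 'w2'): 0, ('t1', 'w3'): 1,
             ('t2', 'w1'): 0, ('t2', 'w2'): 1, ('t2', 'w3'): 1}
    return table[(theta, omega)]

x = 1                                             # observed datum

v_x = {w for w in p if any(f(t, w) == x for t in Theta)}      # assumptions that can explain x
Z = sum(p[w] for w in v_x)
P1 = {w: p[w] / Z for w in v_x}                               # conditioned probabilities p'(w)
T_x = {w: {t for t in Theta if f(t, w) == x} for w in v_x}    # parameters compatible with x under w

def support(H):                                   # degree of support of a hypothesis H (subset of Theta)
    return sum(P1[w] for w in v_x if T_x[w] <= H)

def plausibility(H):
    return sum(P1[w] for w in v_x if T_x[w] & H)

print(support({'t1'}), plausibility({'t1'}))      # 0.5 0.7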
A theory of hints In general, if Θ denotes the set of possible answers to a question
of interest, then a hint on Θ is a quadruple of the form H = (Ω, P, Γ, Θ), where Ω
is a set of assumptions, P is a probability measure on Ω reflecting the probability
of the different assumptions, and Γ is a mapping between the assumptions and the
power set of Θ, Γ : Ω → 2^Θ. If the assumption ω is correct, then the answer is certainly within Γ(ω).
Note that the mathematical setting of hints is identical to Dempster’s multivalued
mapping framework described in Section 3.1.1. Degrees of support and plausibility
can then be computed as in (3.1), (3.2).
What is (arguably) different is the interpretation of degrees of support/plausibility.
While Dempster interpreted them as lower and upper bounds to the amount of prob-
ability assigned to A, Kohlas and Monney do not assume that there exists an un-
known true probability of A, but rather adopt Pearl’s probability of provability in-
terpretation [1023] (see Section 5.6.7), related to the classical AI paradigm called
‘truth-maintenance systems’ [811]. These systems contain a symbolic mechanism
for identifying the set of assumptions needed to create a proof of a hypothesis A,
so that when probabilities are assigned to the assumptions support and plausibility
functions can be obtained. In the theory of hints, these assumptions form an ar-
gument for the hypothesis A, and their probability is the weight assigned to each
argument.

Functional models and hints Consider again a functional model f (ω, θ) and an
observed datum x. Given a hint (3.16), any hypothesis H ⊆ Θ regarding the correct
value of the parameter can then be evaluated with respect to it.
The arguments for the validity of H are the chance elements in the set ux(H) = {ω ∈ vx : Tx(ω) ⊆ H}, with degree of support P'(ux(H)); those compatible with H form the set vx(H) = {ω ∈ vx : Tx(ω) ∩ H ≠ ∅}, with degree of plausibility P'(vx(H)).
Conversely, the concept of a hint can also be used to represent the functional model itself, as well as the observed data. Therefore, not only can the result of the inference be expressed
in terms of hints, but also the experiment that is used to make the inference can be
expressed with hints.
A functional model (??) can be represented by the hint:

Hf = (Ω, P, Γf , X × Θ)

where
Γf (ω) = {(x, θ) ∈ X × Θ|x = f (θ, ω)},
while an observation x can be represented by the hint

Ox = ({v}, P, Γ, X × Θ)

where v is the assumption stating that x has been observed, which is true with prob-
ability P ({v}) = 1, and Γ (v) = {x} × Θ.
This equation is justified by the fact that no restriction can be imposed on Θ when
the observed value x is the only piece of information that is being considered. The
hints Hf and Ox represent two pieces of information that can be put together in
order to determine the information on the parameter that can be derived from the
model and the data, resulting in the combined hint Hf ⊕ Ox .
By marginalising to Θ we obtain the desired information on the value of θ: it is easy
to show that the result is (3.16).
By extension, the hint derived by a series of n repeated experiments Ei under
functional models fi , i = 1, ..., n, whose outcomes are x1 , ..., xn can be written as:

Hx1 ,...,xn = Hx1 ⊕ · · · ⊕ Hxn .

Probabilistic assumption-based reasoning is then a natural way to conduct statisti-


cal inference [374].

3.3.3 Dezert-Smarandache Theory (DSmT)

The basis of DSmT [394, 395, 397, 845] is the rejection of the principle of the excluded middle, since for many problems (especially in sensor fusion)
the nature of hypotheses themselves (i.e., the elements of the frame of discernment)
is known only vaguely. As a result a ‘precise’ description of the set of possible
outcomes is difficult to obtain, so that the exclusive elements θ cannot be properly
identified or separated.

Hyperpowersets, free and hybrid models A cornerstone of DSmT is the notion


of hyperpowerset [396], or ‘Dedekind lattice’.
Definition 27. [397] Let Θ = {θ1 , ..., θn } be a frame. The hyperpowerset DΘ is
defined as the set of all composite propositions built from elements of Θ using the ∪
and ∩ operators such that:
1. ∅, θ1 , · · · , θn ∈ DΘ ;
2. if A, B ∈ DΘ then A ∪ B and A ∩ B belong to DΘ ;
3. no other elements belong to DΘ , except those constructed using rules 1. and 2.
In the above definition there seems to be an implicit assumption that the elements of the set belong to a Boolean algebra. Indeed, as the authors state, 'The generation of hyper-power set ... is closely related with the famous Dedekind's problem ... on enumerating the set of isotone Boolean functions'. An upper bound on the cardinality of DΘ is obviously 2^{2^{|Θ|}}.
For example, when Θ = {θ1 , θ2 , θ3 } has cardinality 3 the hyperpowerset has as
elements:

∅, θ1, ..., θ3, θ1 ∪ θ2, ..., θ2 ∪ θ3, ..., θi ∩ θj, ..., θ1 ∪ θ2 ∪ θ3,

and therefore has cardinality 19. In general, the cardinality of the hyperpowerset follows the sequence of Dedekind numbers¹ 1, 2, 5, 19, 167, 7580, ... Note that the classical complement is not contemplated, as DSmT rejects the law of the excluded middle.
A hyperpowerset is also called a free model in DSmT. If the problem at hand
allows some constraints to be enforced on the hypotheses which form the frame
(e.g., the non-existence or the disjointness of some elements of DΘ) we obtain a so-called hybrid model. The most restrictive hybrid model is the usual frame of
discernment of Shafer’s formulation, in which all hypotheses are disjoint.

Generalised belief functions


Definition 28. Let Θ be a frame. A generalised basic belief assignment is a map m : DΘ → [0, 1] such that m(∅) = 0 and Σ_{A∈DΘ} m(A) = 1.

The generalised belief and plausibility functions are then, trivially:

Bel(A) = Σ_{B⊆A, B∈DΘ} m(B),    Pl(A) = Σ_{B∩A≠∅, B∈DΘ} m(B).

For all A ∈ DΘ it still holds that Bel(A) ≤ Pl(A); however, in the free model (the whole hyperpowerset) we have Pl(A) = 1 for all A ∈ DΘ, A ≠ ∅.
¹ Sloane, N.J.A., The On-line Encyclopedia of Integer Sequences, 2003 (Sequence No. A014466), http://www.research.att.com/njas/sequences/.

Rules of combination Under the free DSmT model, the rule of combination becomes:

m(C) = Σ_{A,B∈DΘ, A∩B=C} m1(A) m2(B)    ∀C ∈ DΘ,

where this time A and B are elements of the hyperpowerset (i.e., conjunctions and
disjunctions of elements of the initial frame). Just like the TBM’s disjunctive rule,
this rule of combination is commutative and associative.
Obviously, the formalism is potentially very computationally expensive. The au-
thors, however, note that in most practical applications bodies of evidence allocate a basic belief assignment only to a few elements of the hyperpowerset.
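As a sketch of how the classic DSm rule can be implemented, hyperpowerset elements can be represented in a canonical disjunctive form, i.e., as an antichain of 'conjuncts', each conjunct being a set of atom indices (the two-element frame, the element names and the mass values below are illustrative):

from itertools import product

# each hyperpowerset element is a frozenset of conjuncts (frozensets of atom indices)
T1      = frozenset({frozenset({1})})                    # theta1
T2      = frozenset({frozenset({2})})                    # theta2
T1capT2 = frozenset({frozenset({1, 2})})                 # theta1 ∩ theta2
T1cupT2 = frozenset({frozenset({1}), frozenset({2})})    # theta1 ∪ theta2

def reduce_antichain(conjs):
    # absorption: drop any conjunct strictly containing another one
    return frozenset(c for c in conjs if not any(d < c for d in conjs))

def meet(X, Y):
    # lattice intersection: distribute ∩ over ∪, then absorb
    return reduce_antichain(frozenset(a | b for a, b in product(X, Y)))

def dsm_combine(m1, m2):
    # classic DSm rule on the free model: m(C) = sum of m1(A) m2(B) over A ∩ B = C
    m = {}
    for A, B in product(m1, m2):
        C = meet(A, B)
        m[C] = m.get(C, 0.0) + m1[A] * m2[B]
    return m

m1 = {T1: 0.6, T1cupT2: 0.4}
m2 = {T2: 0.5, T1cupT2: 0.5}
combined = dsm_combine(m1, m2)
print(combined[T1capT2], combined[T1], combined[T2], combined[T1cupT2])   # 0.3 0.3 0.2 0.2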
In the case of a hybrid model, things become rather more complicated. The rule of combination becomes:

m(A) = φ(A) [ S1(A) + S2(A) + S3(A) ],

where φ(A) is the 'characteristic non-emptiness function' of a set A, equal to 1 for all sets A which have not been forced to be empty under the constraints of the model and to 0 otherwise, and:

S1(A) = Σ_{X1,...,Xk ∈ DΘ, (X1∩...∩Xk)=A} Π_{i=1}^{k} mi(Xi)

(S2 and S3 have similar expressions, which depend on the list of sets forced to be
empty by the constraints of the problem).
A generalization of the classic combination rules to DSm hyper-power sets was
also proposed by Daniel in [312]. Daniel has also contributed to the development of
DSmT in [314].
Interestingly, an extension of the theory of evidence to non-exclusive elementary propositions was separately proposed by Horiuchi in 1996 [622].

3.3.4 Gaussian (linear) belief functions

The notion of Gaussian belief function [864] was proposed by A. Dempster [339,
1166] and formalized by L. Liu in 1996 [868].
Technically, a Gaussian belief function is a Gaussian distribution over the members of the parallel partition of a hyperplane (Figure 3.3). The idea is to encode each proposition (event) as a linear equation, so that all parallel sub-hyperplanes of a given hyperplane are possible focal elements and a Gaussian belief function is a Gaussian distribution over these sub-hyperplanes. As focal elements (hyperplanes) cannot intersect, the framework is less general than standard belief theory, where the focal elements normally have non-empty intersections. However, it is more general than Shafer's original finite formulation, as the focal elements form a continuous domain.
By adapting Dempster’s rule to the continuous case, Liu also derived a rule
of combination and proved its equivalence to Dempster’s geometrical description
[339]. In [869], Liu proposed a join-tree computation scheme for expert systems using Gaussian belief functions, after proving that their rule of combination satisfies the axioms of Shenoy and Shafer [1188].

Fig. 3.3. Graphical representation of the concept of a Gaussian belief function (from [868]).
The framework was applied to portfolio evaluation in [870].

3.3.5 Kramosil’s probabilistic interpretation

The theory of evidence can be developed in an axiomatic way quite independent of


probability theory (Section 3.1.2), by way of axioms which come from a number
of intuitive requirements a sensible uncertainty calculus should meet. On the other
side, we learned that belief theory can be seen as a sophisticated application of
probability theory in the random set context (Section 3.1.1 and 3.1.5).
Starting from this point of view, Ivan Kramosil [768, 789, 781, 778, 773, 783,
785] published a number of papers in which he exploited measure theory to expand
the theory of belief functions beyond its classical scope. The topics of his investiga-
tion vary from Boolean and non-standard valued belief functions [788, 783], with
application to expert systems [773], to the extension of belief functions to countable
sets [787] or the introduction of a strong law of large numbers for random sets [791].
Unfortunately, Kramosil’s work does not seem to have received sufficient recog-
nition. A complete analysis of Kramosil’s random-set approach is obviously beyond
the scope of this Chapter. An extensive review of Kramosil’s work can be found in
a series of technical reports by the Academy of Sciences of the Czech Republic
[775, 777, 776]. Here we just briefly mention a few interesting contributions.

Belief functions induced by partial compatibility relations For instance, belief


functions induced by partial generalized compatibility relations are discussed in
[777], Chapter 8. Given a standard (‘total’) compatibility relation C ⊂ Ω × Θ,
its total extension is the relation C ∗ ⊂ P(Ω) × P(Θ) such that (X, Y ) ∈ C ∗ iff
there exist x ∈ X, y ∈ Y such that (x, y) ∈ C. A partial generalized compatibility relation is then a relation defined on a proper subset of P(Ω) × P(Θ) which can be extended to a total one.

Signed belief functions A note is due on the notion of a signed belief function [774], in which the domain of classical belief functions is replaced by a measurable space equipped with a signed measure, i.e., a σ-additive set function which can also take values outside the unit interval, including negative and infinite ones. An assertion
analogous to the Jordan decomposition theorem for signed measures is stated and
proved [785], according to which each signed belief function, when restricted to its
finite values, can be defined by a linear combination of two classical probabilistic
belief functions, supposing that the basic set is finite.
A probabilistic analysis of Dempster’s rule is developed [789], and an extension of
Dempster’s rule to signed belief functions is formulated [784].

3.3.6 Hummel and Landy’s statistical view

In [217], Hummel and Landy provided an interpretation of evidence combination


in belief theory in terms of statistics of expert opinions. This approach is closely
related to Kyburg’s [802] explanation for belief theory within a lower probability
framework, in which beliefs are viewed as extrema of opinions of experts.
In this interpretation, evidence combination relates to statistics of experts in a
space of ‘expert opinions’ who combine information in a Bayesian fashion. As a
result, Dempster’s rule of combination, rather than an extension of Bayes’ rule for
combining probabilities, reduces to the Bayesian updating of Boolean assertions,
while tracking multiple opinions. A more general formulation is suggested in which
opinions are allowed to be probabilistic, as opposed to the Boolean opinions that are
implicit in the Dempster formula.

Relation with Shafer’s ‘coded messages’ Hummel and Landy’s space of Boolean
opinions of experts (Section 3.3.6) is equivalent to Shafer’s coded-message formula-
tion [217]. Moreover, the combination of coded messages, in which a pair of codes
ci , cj is chosen independently with probability pi pj , and the combination of ele-
ments in the space of Boolean opinions coincide.
The authors’ point, in introducing the spaces of experts, is that the requisite of (con-
ditional) independence includes not only the choice of messages, but also an as-
sumption that the message is formed by the intersection of the subsets designated
by the constituent messages.

Probabilistic opinions of experts Formally, consider a set of experts E, where each


expert ω ∈ E is attributed a certain weight µ(ω), and maintains a set of possible out-
comes (called 'labels' in [217]) Γ(ω) for a question with universe (set of possible answers) Λ.

We assume, additionally, that each expert has a probabilistic opinion pω on Λ, rep-


resenting expert ω’s assessment of the probability of occurrence of the labels, which
satisfies for all ω:

pω(λ) ≥ 0 ∀λ ∈ Λ,    pω(λ) > 0 iff λ ∈ Γ(ω),

and

either Σ_{λ∈Λ} pω(λ) = 1 or pω(λ) = 0 ∀λ,    (3.17)

with the last constraint describing the case in which expert ω has no opinion on the
matter.
This setting generalises Dempster-Shafer theory, in which probabilistic opinions
are used only in terms of a test for zero. The indicator functions

xω(λ) = 1 if pω(λ) > 0,   and   xω(λ) = 0 if pω(λ) = 0,    (3.18)

are called the Boolean opinions of the experts.


If we regard the space of experts E as a sample space, then each xω (λ) can be
regarded as a sample of a random (Boolean) variable x(λ). In a similar way, the
pω (λ)s can be seen as samples of a random variable p(λ). The state of the system
will then be defined by statistics on the set of random variables {x(λ)}λ∈Λ .
Statistics are computed using the weights µ(ω) of the individual experts, via the counting measure µ(F) = Σ_{ω∈F} µ(ω).

Space of probabilistic opinions of experts


Definition 29. Let K = {kλ} be a set of positive constants indexed over the label
set Λ. The space of probabilistic opinions of experts (N , K, ⊗) is defined by:
 

N = {(E, µ, P) : µ a measure on E, P = {pω}ω∈E},

where the probabilistic opinions pω meet the constraint (3.17), and the following
binary operation is defined:

(E, µ, P ) = (E1 , µ1 , P1 ) ⊗ (E2 , µ2 , P2 ) (3.19)

such that
E = E1 × E 2 , µ({(ω1 , ω2 )}) = µ1 ({ω1 }) · µ2 ({ω2 })
and
p(ω1,ω2)(λ) = pω1(λ) pω2(λ) kλ^{-1} / Σ_{λ'} pω1(λ') pω2(λ') k_{λ'}^{-1}

whenever the denominator is nonzero, and p(ω1 ,ω2 ) (λ) = 0 otherwise.



The combination operation (3.19) can be interpreted as a Bayesian combination,


where kλ is the prior probability of λ, and expresses the generation of a consensus
between the two sets of experts E1 and E2 , obtained by pairing one expert from E1
with one expert from E2 .
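A minimal sketch of the combination (3.19) of two spaces of probabilistic expert opinions (the labels, weights, priors kλ and opinions below are all invented for the example):

labels = ['l1', 'l2', 'l3']
k = {'l1': 1 / 3, 'l2': 1 / 3, 'l3': 1 / 3}                  # prior probabilities k_lambda

E1 = {'e1': (1.0, {'l1': 0.7, 'l2': 0.3, 'l3': 0.0})}        # expert: (weight, probabilistic opinion)
E2 = {'f1': (0.6, {'l1': 0.5, 'l2': 0.5, 'l3': 0.0}),
      'f2': (0.4, {'l1': 0.0, 'l2': 0.0, 'l3': 0.0})}        # f2 expresses no opinion

def combine(E1, E2, k):
    # pair experts; each pair fuses its opinions in a Bayesian fashion with priors k
    E = {}
    for w1, (mu1, p1) in E1.items():
        for w2, (mu2, p2) in E2.items():
            num = {l: p1[l] * p2[l] / k[l] for l in labels}
            Z = sum(num.values())
            opinion = {l: v / Z for l, v in num.items()} if Z > 0 else dict(num)
            E[(w1, w2)] = (mu1 * mu2, opinion)
    return E

for pair, (weight, opinion) in combine(E1, E2, k).items():
    print(pair, weight, opinion)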
The authors [217] proved that this space maps homomorphically onto a belief
space. Similarly, a space of Boolean opinions of experts can be formed by replacing
the probabilistic opinions pω with the indicator functions (3.18) in Definition 29.

Belief measures from statistics of experts Let then X be the set of Boolean opin-
ions of experts. We can define:

m̃(A) = µ({ω ∈ E | xω = xA}) / µ(E) = µ({ω ∈ E | Γ({ω}) = A}) / µ(E),

where xA is the indicator function of the event A ⊂ Λ. If we view the experts as


endowed with the prior probabilities µ(ω)/µ(E), and say that

ProbE(event) = µ({ω ∈ E | the event is true for ω}) / µ(E),

we get that m̃(A) = ProbE(xω = xA).


Under this interpretation, the belief on a set A is the joint probability:
Bel(A) = Σ_{B⊆A} m(B) = ProbE'(x(λ) = 0 for all λ ∉ A),

where E' is the set of experts expressing an opinion.


For further details on this framework, please consult [217].

3.3.7 Intervals and sets of belief measures

Denoeux’s imprecise belief structures Imprecise belief structures (IBSs) [376,


369] are sets of belief structures whose masses on the focal elements Fi satisfy interval-valued constraints, m = {m : ai ≤ m(Fi) ≤ bi}, and express imprecision in the
belief of a rational agent within the Transferable Belief Model.
Note that, however, since
m(Fi) ≤ min( bi, 1 − Σ_{j≠i} aj ),

the intervals [ai , bi ] specifying an IBS are not unique. Upper and lower bounds to m
determine interval ranges for belief and plausibility functions, and also for pignistic
probabilities.
Combination of IBSs can be defined either as:
m' = {m = m1 ~ m2 | m1 ∈ m1, m2 ∈ m2},    (3.20)

or as the IBS m = m1 ~ m2 with bounds:

m^−(A) = min_{(m1,m2)∈m1×m2} (m1 ~ m2)(A),    m^+(A) = max_{(m1,m2)∈m1×m2} (m1 ~ m2)(A),    (3.21)

where ~ denotes any combination operator for individual mass functions. Clearly, m' ⊂ m.
Definition (3.21) results in a quadratic programming problem - an iterative al-
gorithm is proposed in [376], Section 4.1.2.

Yager’s interval valued focal weights Yager [1491] also considers a similar situa-
tion in which the masses of the focal elements lie in some known interval, allowing
us to model more realistically situations in which the basic probability assignments
cannot be precisely identified.
As he points out, this amounts to uncertainty of a possibilistic type about the actual belief structure. Measures of plausibility and belief, and possible rules of combination for interval-valued belief structures, are introduced.

Interval Dempster-Shafer approaches A slightly different formal setting based


on Yager’s fuzzy connectivity operators is proposed in [815].
An (interval) basic probability assignment is defined as a function M : 2^Θ → [0, 1] × [0, 1], mapping each subset to a pair of lower and upper mass values, so that a belief measure can be defined as:

Bel(A) = Σ_{B⊆A} M(B),

where, however, Σ denotes the interval summation:

[a, b] + [c, d] ≐ [u(a, c), u(b, d)],    u(a, b) = min[1, ‖[a, b]'‖_{Lp}],

where p ∈ [0, ∞] and ‖·‖_{Lp} denotes the classical Lp norm. Two interval-valued b.p.a.s can be combined via (cfr. conjunctive combination):

M(C) = Σ_{A∩B=C} M(A) ⋆ M(B),

where ⋆ represents an 'interval multiplication' operator, defined as:

[a, b] ⋆ [c, d] ≐ [i(a, c), i(b, d)],    i(a, b) = 1 − min[1, ‖1 − [a, b]'‖_{Lp}].

Interval summation and multiplication follow Yager’s popular proposals on fuzzy


connectivity operators [1475]. Normalisation in this setting is not necessary and
can be ignored, while this interval approach reduces to classical belief theory when
p = 1 and point intervals are considered.
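The operators above are easily prototyped. The following Python sketch is ours and purely illustrative (not code from [815]); it assumes the second components of the interval sum and product pair lower with lower and upper with upper, and uses numpy for the Lp norm:

    import numpy as np

    def u(a, b, p=2):
        # 'interval addition' component: min(1, ||(a, b)||_p)
        return min(1.0, np.linalg.norm([a, b], ord=p))

    def i(a, b, p=2):
        # 'interval multiplication' component: 1 - min(1, ||(1-a, 1-b)||_p)
        return 1.0 - min(1.0, np.linalg.norm([1.0 - a, 1.0 - b], ord=p))

    def interval_sum(x, y, p=2):
        (a, b), (c, d) = x, y
        return (u(a, c, p), u(b, d, p))

    def interval_product(x, y, p=2):
        (a, b), (c, d) = x, y
        return (i(a, c, p), i(b, d, p))

    def interval_belief(M, A, p=2):
        """M: dict frozenset -> (lower, upper) interval mass; A: frozenset."""
        bel = (0.0, 0.0)
        for B, mB in M.items():
            if B <= A:
                bel = interval_sum(bel, mB, p)
        return bel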

Generalized basic probability assignments Augustin [43] further extended the


idea of imprecise belief structures by considering sets of basic probability assign-
ments.
Definition 30. [43] Let (Ω, P(Ω)) be a finite measurable space, and denote by
Q(Ω, P(Ω)) the set of all basic probability assignments on (Ω, P(Ω)). Every
nonempty, closed subset S ⊆ Q(Ω, P(Ω)) is called a generalized basic probability
assignment on (Ω, P(Ω)).
He proves that:2
Proposition 22. ([43], Generalised belief accumulation) For every generalized basic probability assignment S, the set function L : P(Ω) → [0, 1] with:

L(A) := min_{m∈S} Σ_{∅≠B⊆A} m(B),   ∀A ∈ P(Ω),

is well defined, and is a lower probability.
² Terminology has been changed to that of this book.


Generalised belief accumulation offers an alternative to the combination of individ-
ual belief functions: the more the assignments differ from each other, the wider the
intervals of the resulting lower probability (called by some authors such as Weich-
selberger ‘F-probability’, [1400, 1401]).
The opposite can be proven as well: every convex set of probability measures can
be obtained by generalised belief accumulation [43] (although the correspondence is
not one-to-one). Indeed, given a credal set, one just needs to take the convex closure
of the Bayesian belief functions associated with the vertex probability distributions.
Sets of basic probability assignments constitute an appealing constructive ap-
proach to imprecise probability, which allows for a very flexible modelling of un-
certain knowledge.
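As an illustration of generalised belief accumulation, here is a minimal Python sketch (our own; the two mass assignments are hypothetical) which computes the lower probability of Proposition 22 when the generalised bpa S is given as a finite list of ordinary mass functions:

    # Sketch: generalised belief accumulation (Proposition 22), assuming the
    # generalised bpa S is a finite list of ordinary mass functions.
    def lower_probability(S, A):
        """S: list of dicts frozenset -> mass; A: frozenset (the event)."""
        def bel(m, A):
            return sum(v for B, v in m.items() if B and B <= A)
        return min(bel(m, A) for m in S)

    # Two hypothetical assignments on Omega = {'x', 'y'}
    m1 = {frozenset({'x'}): 0.7, frozenset({'x', 'y'}): 0.3}
    m2 = {frozenset({'x'}): 0.4, frozenset({'y'}): 0.2, frozenset({'x', 'y'}): 0.4}
    print(lower_probability([m1, m2], frozenset({'x'})))   # 0.4

The more the individual assignments disagree, the lower (and hence the less committed) the resulting lower probability, in line with the remark above.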

3.3.8 Other frameworks

Many other frameworks have been proposed - here we list them, roughly in the
order of their impact in terms of citations (as of June 2016) [882, 1066, 1444]. For
some of them not many details are available, including Zarley’s evidential reasoning
system [1543], Fister and Mitchell’s ‘entropy based belief body compression’ [481],
Peterson’s ‘Local Dempster Shafer Theory’ [1036], Mahler’s customisation of belief
theory via a priori evidence [907].
For all the others, we provide in the following a brief description.

Lowrance and Strat’s framework Lowrance and Garvey’s early evidential rea-
soning framework [503, 896] uses Shafer's belief functions in their original form for encoding evidence.
Their original contribution in [503] was a set of inference rules for computing belief/plausibility intervals of dependent propositions from the mass assigned to the focal elements. Their framework for evidential reasoning systems [896], instead,
focusses more on the issue of specifying a set of distinct frames of discernment,
each of which defines a set of possible world situations, and their interrelationships,
and establishing paths for the bodies of evidence to move through distinct frames
by means of evidential operations, eventually converging on spaces where the target
questions can be answered.
As such, their work is closely related to the algebraic analysis of families of compatible frames conducted by the author of this book [222, 227, ?], and to his belief modelling regression approach to pose estimation in computer vision [288, 280, ?, ?].
Paths between frames are established through a compatibility relation, a subset ΘA,B ⊂ ΘA × ΘB of the Cartesian product of two related frames (their common refinement, in Shafer's terminology). A compatibility mapping, taking statements Ak in ΘA to statements of ΘB, can then be defined as:

CA→B(Ak) = {bj ∈ ΘB | (ai, bj) ∈ ΘA,B, ai ∈ Ak}.

Interestingly, in dynamic environments compatibility relations can be used to reason over time, in which case a compatibility relation represents a set of possible state transitions (cfr. [?], Chapter 7).
Given evidence encoded as a belief function on ΘA, we can obtain a projected belief function on ΘB via:

mB(Bj) = Σ_{Ai : CA→B(Ai)=Bj} mA(Ai).
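A minimal Python sketch of this projection step follows (our own illustration, not code from [503, 896]; the relation, frames and masses are hypothetical):

    # Sketch: projecting a mass function through a compatibility relation
    # Theta_AB, given as a set of pairs (a, b).
    def compatibility_map(Ak, Theta_AB):
        """Image of a statement Ak (set of elements of Theta_A) in Theta_B."""
        return frozenset(b for (a, b) in Theta_AB if a in Ak)

    def project_mass(mA, Theta_AB):
        """mA: dict frozenset (subset of Theta_A) -> mass."""
        mB = {}
        for Ai, v in mA.items():
            Bj = compatibility_map(Ai, Theta_AB)
            mB[Bj] = mB.get(Bj, 0.0) + v
        return mB

    # Hypothetical relation between Theta_A = {a1, a2} and Theta_B = {b1, b2, b3}
    Theta_AB = {('a1', 'b1'), ('a1', 'b2'), ('a2', 'b3')}
    mA = {frozenset({'a1'}): 0.6, frozenset({'a1', 'a2'}): 0.4}
    print(project_mass(mA, Theta_AB))
    # -> mass 0.6 on {b1, b2}, 0.4 on {b1, b2, b3}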

The framework was implemented using Grasper II³, a programming-language extension of LISP which uses graphs, encoding families (or, in the authors' terminology, 'galleries') of frames, as primitive data types.
³ http://www.softwarepreservation.org/projects/LISP/massachusetts/Lowrance-Grasper1.0.pdf

Grabisch's belief functions on lattices This approach, proposed by Michel Grabisch [540], extends belief functions from the Boolean algebra (a lattice) of subsets to any lattice. This can be useful, for instance, in cases in which some events are not meaningful, in non-classical logics, or when coalitions in multi-agent games are considered.
A lattice is a partially ordered set for which an inf (greatest lower bound) and a sup (least upper bound) exist for each pair of elements. A Boolean algebra is a special case of a lattice. A lattice is 'De Morgan' when it admits a negation. A capacity is a function v on L such that: (1) v(⋀L) = 0, (2) v(⋁L) = 1, and (3) x ≤ y implies v(x) ≤ v(y), where ⋀L and ⋁L denote the bottom and top elements of L (see Chapter 5, Definition 61). The Moebius transform m of any function f on a lattice (L, ≤), implicitly defined by f(x) = Σ_{y≤x} m(y), can also be introduced.
A belief function can then be defined as a function Bel on a lattice L s.t. Bel(⋀L) = 0, Bel(⋁L) = 1, and its Moebius transform is non-negative. Dempster's combination simply becomes:

m1 ⊕ m2(x) = Σ_{y1∧y2=x} m1(y1) m2(y2).

Commonality, possibility and necessity measures with the usual properties can also
be defined.
Interestingly, any capacity is a belief function iff L is linear (in other words, a
total order) [65]. The approach has been very recently further extended by C. Zhou
[1553, 1554].
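To make the lattice formulation concrete, here is a short Python sketch (ours, purely illustrative): a finite lattice is described by an order predicate and a meet operation, and belief evaluation and combination are performed exactly as above. The Boolean case is recovered with set inclusion and intersection.

    # Sketch: belief functions on a finite lattice, specified by a 'leq'
    # partial order and a 'meet' operation (unnormalised combination).
    def belief_on_lattice(m, x, leq):
        """Bel(x) = sum of m(y) over y <= x."""
        return sum(v for y, v in m.items() if leq(y, x))

    def combine_on_lattice(m1, m2, meet):
        """m1 (+) m2 (x) = sum over y1 ^ y2 = x of m1(y1) m2(y2)."""
        m = {}
        for y1, v1 in m1.items():
            for y2, v2 in m2.items():
                x = meet(y1, y2)
                m[x] = m.get(x, 0.0) + v1 * v2
        return m

    # The Boolean lattice of subsets is recovered with:
    leq = lambda y, x: y <= x        # set inclusion
    meet = lambda y1, y2: y1 & y2    # set intersection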

Kohlas' uncertain information in graded semilattices In [947, 750], uncertain information is studied within the framework of information algebras, algebraic structures capturing the idea that pieces of information refer to precise questions, and that they can be combined and focussed onto other questions. Relational algebra is a prototype of such information algebras, but various kinds of logical systems induce information algebras as well. It is often the case that a piece of information is known to be valid under certain assumptions, while it is not altogether sure that these assumptions really hold: varying the assumptions leads to different information. Given such an uncertain body of information, assumption-based reasoning permits us to deduce certain conclusions, or to prove certain hypotheses, under some assumptions. This kind of assumption-based inference can be carried further if the varying likelihood of the different assumptions is described by a probability measure on the assumptions: it then becomes possible to measure the degree of support of a hypothesis by the probability that the assumptions supporting it hold. Prototype probabilistic argumentation systems of this kind, based on propositional logic, have also been described. This way of modelling uncertain information leads to a theory which generalizes the well-known Dempster-Shafer theory.

Context model In [509] a context model of vagueness and uncertainty is developed


by Gebhardt and Kruse to provide a formal environment for the comparison and se-
mantic foundation of different uncertainty theories, focussing in particular on Bayes
and belief theory.

‘Improved’ evidence theory In [468] Fan and Zuo (2006) ‘improve’ standard ev-
idence theory by introducing a fuzzy membership function, an importance index,
and a conflict factor to address the issues of scarce and conflicting evidence, and
propose new decision rules.

Josang’s subjective evidential reasoning Josang [690] describes a framework


based on belief theory for combining and assessing subjective evidence from dif-
ferent sources. The author introduces a new rule called the ‘consensus operator’,
based on statistical inference, originally developed in [674].
Josang's framework makes use of an alternative representation of uncertain probabilities by probability density functions (ppdf) over a (probability) variable of interest (a second-order probability distribution, Section 5.4), obtained by generalising the beta family:

f(p | α, β) = Γ(α + β)/(Γ(α)Γ(β)) p^(α−1) (1 − p)^(β−1),

where α, β are the parameters specifying the density function and Γ denotes the gamma function, to frames of discernment of arbitrary atomicity, in a three-dimensional representation with parameters r, s and a.
A mapping between this three-dimensional representation and belief functions is then applied as follows:

Bel(A) = r / (r + s + 2),   Dis(A) = Bel(Ac) = s / (r + s + 2).

After this mapping, Dempster's and consensual combination can be compared [674].
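The mapping above is straightforward to compute; the sketch below (ours) reads r and s as the amounts of evidence supporting and refuting A, which is an assumption on our part about Josang's evidence parameters:

    # Sketch: mapping Josang-style evidence parameters (r, s) to belief,
    # disbelief and the residual uncommitted mass about an event A.
    def opinion_from_evidence(r, s):
        bel = r / (r + s + 2.0)        # Bel(A)
        dis = s / (r + s + 2.0)        # Dis(A) = Bel(A^c)
        unc = 2.0 / (r + s + 2.0)      # 1 - Bel(A) - Dis(A)
        return bel, dis, unc

    print(opinion_from_evidence(8, 2))   # (0.666..., 0.166..., 0.166...)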

Qualitative Dempster-Shafer theory Parsons [1014] introduces the idea of using


the theory of evidence with qualitative values, when numerical values are not avail-
able. To cope with this lack of numbers, the author uses qualitative, semiqualitative,
and linguistic values, and applies a form of order of magnitude reasoning.

Conditional and evidential multi-valued mappings In [1518], Yen extends the


original multivalued mapping of Dempster’s formulation (Section 3.1.1) to a prob-
abilistic setting which uses conditional probabilities to express uncertainty in the
mapping itself between the set Ω where the probabilistic evidence lives and the
frame of discernment Θ where focal elements reside.
Going back to Figure ??, if the mapping Γ is known with certainty to associate
ω ∈ Ω with A ⊂ Θ, then P (A|ω) = 1, whereas P (Ac |ω) = 0. We can say that the
deterministic multi-valued setting by Dempster is associated with binary conditional
probabilities on the mapping. We can then define:
Definition 31. A probabilistic multi-valued mapping from a space Ω to a space Θ is a function:

Γ* : Ω → 2^(2^Θ × [0,1])

in which the image of an element ω of Ω is a collection of subset-probability pairs of the form:

Γ*(ω) = {(Aω1, P(Aω1|ω)), · · · , (Aωm, P(Aωm|ω))},

subject to the following conditions:
1. Aωj ≠ ∅, j = 1, ..., m;
2. Aωi ∩ Aωj = ∅ whenever i ≠ j;
3. P(Aωj|ω) > 0, j = 1, ..., m;
4. Σj P(Aωj|ω) = 1.

Each Aωj is called a granule, and the collection {Aωj , j} is the granule set associ-
ated with ω. In rough words, each focal element of standard belief theory is broken
down into a union of disjoint granules, each with an attached (conditional) probabil-
ity. The mass of each focal element A ⊂ Θ in this extended multi-valued framework
can then be expressed as:

m(A) = Σ_{ω∈Ω} P(A|ω) P(ω),

i.e., by multiplying the conditional probabilities expressing the mapping by the prior probabilities of the elements of Ω.
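This computation is easy to sketch in Python (an illustration of ours; the prior, granules and conditional probabilities below are hypothetical):

    # Sketch: mass induced by a probabilistic multi-valued mapping (Definition 31),
    # with Gamma* given as dict omega -> list of (granule, P(granule | omega)).
    def mass_from_probabilistic_mapping(prior, gamma_star):
        """prior: dict omega -> P(omega); granules are frozensets of Theta."""
        m = {}
        for omega, pairs in gamma_star.items():
            for A, p_cond in pairs:
                m[A] = m.get(A, 0.0) + p_cond * prior[omega]
        return m

    # Hypothetical example: Omega = {o1, o2}, Theta = {t1, t2, t3}
    prior = {'o1': 0.6, 'o2': 0.4}
    gamma_star = {'o1': [(frozenset({'t1'}), 0.8), (frozenset({'t2', 't3'}), 0.2)],
                  'o2': [(frozenset({'t2', 't3'}), 1.0)]}
    print(mass_from_probabilistic_mapping(prior, gamma_star))
    # -> m({t1}) = 0.48, m({t2, t3}) = 0.52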
In this context, Dempster's rule is used to combine belief updates rather than absolute beliefs, obtaining results consistent with Bayes' theorem. The combined belief
intervals form probability bounds under two conditional independence assumptions
that are weaker than those of PROSPECTOR and MYCIN.
A further generalisation of Yen’s probabilistic multi-valued mapping is pre-
sented in [880] in which uncertain relations ω → A between elements ω ∈ Ω
and subsets A of Θ replace the disjoint partitions {Aωi , i} of Θ (see Definition 31,
item 2.) considered by Yen, and mass functions on these uncertain relations gener-
alise conditional probabilities.
Interestingly, evidential mappings that use mass functions to express the uncertain
relationships have been separately introduced by Guan and Bell [556].

Connectionist evidential reasoning In connectionist evidential reasoning [66] a


multilayer perceptron neural network is implemented to calculate, for each source
of information, posterior probabilities for all classes. Then, a scheme is developed for transferring the estimated posterior probabilities into a set of masses along with
the corresponding focal elements. Finally, a network realization of Dempster-Shafer
evidential reasoning (cfr. Section 4.4.4 in Chapter 4) is designed and extended to a
DSET-based neural network to manipulate the evidence structures.

Hybrid approach based on assumptions In [301] a hybrid reasoning scheme


that combines symbolic and numerical methods for uncertainty management is pre-
sented. The hybrid is based on symbolic techniques adapted from assumption-based
truth maintenance systems (ATMS), combined with Dempster-Shafer theory, as ex-
tended in Baldwin’s Support Logic Programming system [47].
The hybridization is achieved by viewing an ATMS as a symbolic algebra system for
uncertainty calculations. This technique has several major advantages over conven-
tional methods for performing inference with numerical certainty estimates in addi-
tion to the ability to dynamically determine hypothesis spaces, including improved
management of dependent and partially independent evidence, faster run-time eval-
uation of propositional certainties, and the ability to query the certainty value of a
proposition from multiple perspectives.

Belief with Minimum Commitment In [631] a new approach for reasoning with
belief functions, fundamentally unrelated to probabilities and consistent with Shafer

and Tversky's canonical examples, is proposed. Basically, the idea is to treat all the available partial information, in the form of marginal or conditional beliefs, as constraints which the overall belief function needs to satisfy. The principle of minimal commitment then prescribes adopting the least committed such belief function (in the usual weak inclusion order (3.9)).

A theory of mass assignments Baldwin [51, 55] proposed a theory of mass assignments for evidential reasoning; these mathematically correspond to Shafer's basic probability assignments, but are treated using a different rule of combination, based on an assignment algorithm subject to constraints derived from operations
research. An algebra for mass assignments is given, and a conditioning process is
proposed which generalizes Bayesian updating to the case of updating prior mass
assignments with uncertain evidence expressed as another mass assignment.

Evidence theory of exponential possibility distributions Tanaka (1993) [1325]


studied a form of evidence theory which uses exponential possibility distributions.
A rule of combination is given similar to Dempster’s rule. Ignorance and fuzziness
of evidence are measured by a normality factor and the area of a possibility dis-
tribution, respectively. Marginal and conditional possibilities are discussed, and the
posterior possibility is derived from the prior possibility in the same form as Bayes’
formula.

A Set-Theoretic Framework Lu and Stephanou (1984) [899] proposed a set-


theoretic framework based on belief theory for uncertain knowledge processing in
which: (i) first the user enters input observations with an attached degree of cer-
tainty; (2) each piece of evidence receiving non-zero certainty activates a mapping
to an output space (a multivalued mapping) in which its certainty is multiplied by
that of the mapping, and is thus propagated to a proposition in the output space; (3)
the consensus among all the propositions that have non-zero certainties is computed
by Dempster’s rule, and a degree of support is associated with each conclusion.
Interestingly, the inverse of the rule of combination, which the authors call ‘rule
of decomposition’, is derived for separable support belief functions (Section 2.3).

Generalisation to arbitrary Boolean algebras A few authors [550, 562] have


explored extensions of belief theory to general Boolean algebras, rather than power
sets, including spaces of propositions and infinite frames.
Indeed, such a generalisation is trivially achieved by replacing the set-theoretical intersection ∩ and union ∪ with the meet and join operators of an arbitrary Boolean algebra ⟨X, ∨, ∧, 1, 0⟩ [550]. Guan and Bell [550], in particular, produced generalisations of a number of the main results of [1149] in this context. In [562], Guth considers how mass assignments on Boolean algebras can propagate through a system of Boolean equations, as a basis for rule-based expert systems and fault trees. The au-
thor also examines rules in the context of a probabilistic logic, where a given rule
itself may be true with some probability in the interval [0,1]. The Dempster-Shafer
mass assignment formalism is shown to be a suitable methodology for calculating
probability assignments throughout the system.

Qualitative Dempster-Shafer theory Parsons and Mamdani [1013] have introduced the idea of using belief theory with qualitative, linguistic and relative values. Practically, all basic probability values are assumed to be either 0 or +, where the latter denotes any unknown value in (0, 1]. The method is further extended to use linguistic quantifiers such as 'Little' or 'Much' instead of numerical values.

A theory of confidence structures Recently, Balch (2012) [1035] has introduced


a theory of new mathematical objects called confidence structures. A confidence
structure represents inferential uncertainty in an unknown parameter by defining
a belief function whose output is commensurate with Neyman-Pearson confidence.
Confidence structures on a group of input variables can be propagated through a
function to obtain a valid confidence structure on the output of that function. The
theory of confidence structures is created by enhancing the extant theory of confi-
dence distributions with the mathematical generality of Dempster-Shafer evidence
theory. Mathematical proofs grounded in random set theory demonstrate the opera-
tive properties of confidence structures.

Non-monotonic compatibility relations Compatibility relations (Section 3.1.1)


play a central role in the theory of evidence, as they provide knowledge about values
of variable given information about a second variable. A compatibility relation is
called monotonic if an increase in information about the primary variable cannot
result in a loss of information about the secondary variable.
Yager [1471] has investigated an extension of the notion of compatibility rela-
tion which allows for non-monotonic relations. A belief function Bel1 is said to be
more spacific than another one Bel2 whenever

[Bel1 (A), P l1 (A)] ⊆ [Bel2 (A), P l2 (A)]

for all events A ⊆ Θ.


Definition 32. A type I compatibility relation C on Ω × Θ is such that:
(i) for each ω ∈ Ω there exists at least one θ ∈ Θ such that C(θ, ω) = 1;
(ii) for each θ ∈ Θ there exists at least one ω ∈ Ω such that C(θ, ω) = 1.
Yager showed that the usual (type I) compatibility relations are always monotonic,
i.e., if C is a type I compatibility relation between Ω and Θ and Bel1 ⊂ Bel2 are
two belief functions on Ω, then Bel1∗ ⊂ Bel2∗ , where Beli∗ is the belief function
induced on Θ by the compatibility relation C.
Type II compatibility relations are then introduced as follows. Let X = 2Ω \{∅}.
Definition 33. A type II compatibility relation C on X × Θ is such that:
(i) for each X ∈ X there exists at least one θ ∈ Θ such that C(X, θ) = 1.
It is shown in [1471] that a special class of type II relations, called 'irregular' relations, is needed to represent non-monotonic relations between variables.

Relation-based evidential reasoning In [30] the authors argue that the difficult and
ill-understood task of estimating numerical degrees of belief for the propositions to
be used in evidential reasoning (an issue referred to as ‘inference’, see Chapter 4,
Section 4.1) can be avoided by replacing estimations of absolute values with more
defensible assignments of relations. They claim that it is difficult to justify decisions based on numerical degrees of belief: this leads them to a framework based on representing arguments such as 'evidence e supports alternative set A' and their relative strengths, as in 'e1 supports A1 better than e2 supports A2'.
The authors prove that belief functions (in a precise sense) are equivalent to a special
case of the proposed method, in which all arguments are based on only one piece of
evidence.

Plausible reasoning In [556], Guan and Bell describe the mathematical foundations of a knowledge representation and evidence combination framework, and relate it to the theory of evidential reasoning as developed by Dempster and Shafer. Their representation, called pl-functions, together with a simple multiplicative combination rule, is shown to be equivalent to a sub-class of the family of mass functions described by Shafer, with Dempster's rule as the combination function. The simpler combination rule, however, has a complexity which is linear in the number of elements of the frame of discernment. A method which allows this representation to be automatically generated from statistical data is also discussed.

Belief functions based on probabilistic multivalued random variables A probabilistic multivalued random variable (PMRV) [854] generalises the concept of a random variable, i.e., a mapping from a sample space (endowed with a probability measure) to a target one.
Definition 34. A probabilistic multivalued random variable from Ω to Θ is a function µ : Ω × Θ → [0, 1] such that, for all ω ∈ Ω:

Σ_{θ∈Θ} µ(ω, θ) = 1.

If µ is a PMRV, we can define the inverse mapping

µ⁻¹(θ) = {ω ∈ Ω | µ(ω, θ) ≠ 0}.   (3.22)

If (Ω, p) is a probability space, a PMRV µ induces a probability pΘ on Θ as follows:

pΘ(θ) = Σ_{ω∈µ⁻¹(θ)} p(ω) µ(ω, θ).   (3.23)

Under this interpretation, belief and plausibility measures are, respectively, the lower and upper estimates of the probability on the sample space.
Suppose a PMRV exists from Ω to Θ. If only the induced probability measure (3.23) and the inverse mapping (3.22) are known, the PMRV induces a basic probability assignment on Ω as follows:

m(A) = Σ_{θ: µ⁻¹(θ)=A} pΘ(θ) = Σ_{θ: µ⁻¹(θ)=A} Σ_{ω∈A} p(ω) µ(ω, θ).

As the authors note, in Shafer’s definition the probability distribution on Ω is ar-


bitrary and unrelated to the compatibility relation, whereas in theirs the probability
distribution on Θ is induced by the PMRV, which also defines the multivalued map-
ping µ−1 .
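The two steps above (inducing pΘ and then the mass assignment on Ω) can be sketched as follows; this is our own illustration, with µ encoded as a nested dictionary and hypothetical numbers:

    # Sketch: probability and mass assignment induced by a probabilistic
    # multivalued random variable mu(omega, theta) (Definition 34).
    def induced_probability(p, mu, Theta):
        return {t: sum(p[w] * mu[w].get(t, 0.0) for w in p) for t in Theta}

    def inverse_mapping(mu, theta):
        return frozenset(w for w in mu if mu[w].get(theta, 0.0) != 0.0)

    def induced_mass(p, mu, Theta):
        pTheta = induced_probability(p, mu, Theta)
        m = {}
        for theta in Theta:
            A = inverse_mapping(mu, theta)      # focal element mu^-1(theta)
            m[A] = m.get(A, 0.0) + pTheta[theta]
        return m

    # Hypothetical example
    p = {'w1': 0.5, 'w2': 0.5}
    mu = {'w1': {'t1': 0.5, 't2': 0.5}, 'w2': {'t2': 1.0}}
    print(induced_mass(p, mu, ('t1', 't2')))
    # -> m({w1}) = 0.25, m({w1, w2}) = 0.75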
A significant difference emerges when deriving Dempster's combination rule. Consider two PMRVs µ1, µ2 from Ω to Θ1, Θ2, respectively. Since µ1⁻¹(θ1) ∩ µ2⁻¹(θ2) = ∅ if and only if µ1⁻¹(θ1) = ∅ and µ2⁻¹(θ2) = ∅, it follows that the independence assumption behind Dempster's rule implies:

Σ_{A∩B=∅} m1(A) m2(B) = 0,

and no normalization is necessary, because conflict does not arise (under the assumption of independence).

Nonnumeric belief structures An interesting, albeit dated work by Wong, Wang


and Yao [1430] defines nonnumeric belief as the lower envelope of a family of inci-
dence mappings, which can be thought of as nonnumeric counterparts of probability
functions. Likewise, nonnumeric conditional belief is defined as the lower envelope
of a family of conditional incidence mappings. Such definitions are consistent with
the corresponding definitions for belief functions and come in closed-form expres-
sions.
Consider a situation in which the set of possible worlds is described by W , and
an incidence mapping exists i : 2Θ → 2W such that if w ∈ i(A) then A is true,
while it is false otherwise. The mapping associates each proposition A with the set
of worlds (interpretations) in which it is true.
However, when the evidence is not sufficient to specify i completely, it may be
possible to specify lower and upper bounds

F(A) ⊆ i(A) ⊆ F̄(A)

to the true incidence sets. A set of lower bounds is called an interval structure [1430]
if it meets the following axioms:
1. F (∅) = ∅;
2. F (Θ) = W ;
3. F (A ∩ B) = F (A) ∩ F (B);
4. F (A ∪ B) ⊇ F (A) ∪ F (B).
By observing the close relationships between the above qualitative axioms and the
quantitative axioms of belief functions, we may regard the lower bound of an in-
terval structure as non-numeric belief and the upper bound as the corresponding
nonnumeric plausibility.

Self-conditional probabilities and probabilistic interpretations of belief functions In [203], Cooke presents an interpretation of belief functions within a purely probabilistic framework, namely as normalised self-conditional expected probabilities, and studies their mathematical properties. The self-conditional interpretation
considers surplus belief in an event emerging from a future observation, conditional
on the event occurring. Dempster’s original interpretation, in contrast, involves par-
tial knowledge of a belief state.
4 Reasoning with belief functions

Several generalizations to continuous frames of discernment have been attempted,
even though none of them is yet recognized as the definitive answer to the limitations of
Shafer’s original formulation.

In this chapter, we aim to give a flavor of the current state of development of the theory of evidence: the theoretical advances achieved, together with the algorithmic schemes (based mainly on propagation networks) proposed to cope with the computational complexity of the rule of combination. The most popular evidential approaches to decision making and inference are also reviewed, and a brief hint at the attempts to formulate a generalized theory valid for continuous sets of possibilities is given.

Chapter outline

Each section of this chapter deals with one of the fundamental elements of reasoning with belief functions: the inference problem (Section 4.1); the mathematics of evidence combination (Section 4.2); the notion of a conditional belief function (Section 4.3); the approaches proposed to limit the computational complexity of an approach based on power sets (Section 4.4), in particular those based on local propagation on graphical models (Section 4.4.4); and making decisions under uncertainty with belief functions (Section 4.5).
Section 4.7 illustrates the set of tools currently available, based on belief theory,
which allow working scientists to address classification, estimation and regression
problems, often in connection with machine learning.


The last part of the chapter is devoted to more advanced topics, such as the for-
mulation of belief functions on arbitrary domains (including the real line, Section
4.6), and the various mathematical facets of belief functions as complex mathemat-
ical objects (Section 4.8).

4.1 Inference
Inference is the first step in any estimation/decision problem. In this context, by
inference we mean constructing a belief function from the available evidence. Now,
belief functions can be constructed from both statistical data (quantitative inference)
and experts’ preferences (qualitative inference).
The question of how to transform a set of available data (typically in the form
of a series of trials) into a belief function (the 'inference problem') is crucial to allow practical statistical inference with belief functions. The data can be of a different nature: statistical [Seidenfeld78], logical, expressed in terms of mere preferences, or
subjective. The problem has been studied by scholars of the caliber of Shafer, Sei-
denfeld, Walley, and others, who delivered an array of approaches to the problem.
Unfortunately, different approaches to the inference problem produce different be-
lief functions from the same statistical data.
A very general exposition by Chateauneuf and Vergnaud, providing some foundations for a belief revision process in which both the initial knowledge and the new evidence are belief functions, can be found in [170]. We give here a brief survey of
the main proposals on this topic.

4.1.1 From statistical data

Concerning inference from statistical data, the two dominant approaches are Demp-
ster’s method based on an auxiliary variable, and Wasserman and Shafer’s one based
on the likelihood function. The problem can be posed as follows.
Consider a statistical model

{f(x; θ), x ∈ X, θ ∈ Θ},

where X is the sample space and Θ is a parameter space. Having observed x, how
do we quantify the uncertainty about the parameter θ, without specifying a prior
probability distribution?

Likelihood-based approach Given a parametric model of the data as a function of


a number of parameters, we want to identify (or compute the support for) the pa-
rameter values which better describe the available data. Shafer’s initial proposal for
a likelihood-based support function [1149] was supported by Seidenfeld [Seiden-
feld78], but led him to criticise Dempster’s rule as an appropriate way of combining
different pieces of statistical evidence.
Consider the following requirements:

– Likelihood principle: the desired belief function BelΘ (·; x) should be based
only on the likelihood function L(θ; x) = f (x; θ);
– Compatibility with Bayesian inference: when a Bayesian prior P0 is available,
combining it with BelΘ (·, x) using Dempster’s rule should yield the Bayesian
posterior:
BelΘ (·; x) ⊕ P0 = P (·|x);
– Principle of minimal commitment: among all the belief functions satisfying
the previous two requirements, BelΘ (·; x) should be the least committed (see
Section ??).
These constraints lead us to uniquely identify BelΘ(·; x) as the consonant belief function with contour function (plausibility of the singletons) equal to the normalised likelihood:

pl(θ; x) = L(θ; x) / sup_{θ′∈Θ} L(θ′; x).
Its plausibility function is:

PlΘ(A; x) = sup_{θ∈A} pl(θ; x) = sup_{θ∈A} L(θ; x) / sup_{θ∈Θ} L(θ; x),   ∀A ⊆ Θ,

while the corresponding random set is (Ω, B(Ω), µ, Γx), with Ω = [0, 1], µ = U([0, 1]) and

Γx(ω) = {θ ∈ Θ | pl(θ; x) ≥ ω}.

Example: Bernoulli sample Let X = (X1, . . . , Xn) consist of independent Bernoulli observations, and let θ ∈ Θ = [0, 1] be the probability of success. We get:

pl(θ; x) = θ^y (1 − θ)^(n−y) / ( θ̂^y (1 − θ̂)^(n−y) ),

where y = Σ_{i=1}^n xi and θ̂ is the maximum likelihood estimate (MLE) of θ. As an example, for n = 20 and y = 10 we get the consonant belief function of Figure ??.
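Since the resulting belief function is consonant, plausibilities of events are suprema of the contour function and beliefs follow by duality. The sketch below is a numerical illustration of ours (a simple grid search, not code associated with the cited works):

    import numpy as np

    # Sketch: likelihood-based consonant belief function for a Bernoulli sample
    # (n trials, y successes), evaluated numerically on a parameter grid.
    def contour(theta, n, y):
        theta_hat = y / n                                   # MLE
        num = theta**y * (1.0 - theta)**(n - y)
        den = theta_hat**y * (1.0 - theta_hat)**(n - y)
        return num / den

    def pl_interval(a, b, n, y, grid=10001):
        """Pl([a,b]; x) = sup of the contour function over [a,b] (grid search)."""
        thetas = np.linspace(a, b, grid)
        return contour(thetas, n, y).max()

    def bel_interval(a, b, n, y, grid=10001):
        """Bel([a,b]; x) = 1 - Pl of the complement, for a consonant belief function."""
        outside = max(pl_interval(0.0, a, n, y, grid), pl_interval(b, 1.0, n, y, grid))
        return 1.0 - outside

    n, y = 20, 10
    print(pl_interval(0.4, 0.6, n, y), bel_interval(0.4, 0.6, n, y))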
Wasserman [] showed that the likelihood-based belief function can indeed be
used to handle partial prior information, and related it to robust Bayesian inference.
MORE FROM WASSERMAN

Dempster’s auxiliary variable Suppose that the sampling model X ∼ f (x; θ) can
be represented by an “a-equation” of the form

X = a(θ, U ),

where U ∈ U is an (unobserved) auxiliary variable with known probability distri-


bution µ independent of θ. This representation is quite common in the context of
sampling and data generation. For instance, in order to generate a continuous ran-
dom variable X with cumulative distribution function (CDF) Fθ , one might draw U
from U([0, 1]) and set
X = F_θ^{−1}(U).

Fig. 4.1. Plausibility function (left) and cumulative distribution function (right) generated in the Bernoulli sample's example (courtesy Thierry Denoeux).
The equation X = a(θ, U) defines a multi-valued mapping (or, equivalently, a "compatibility relation") as follows:

Γ : U → Γ(U) = {(X, θ) ∈ X × Θ | X = a(θ, U)}.

Under the usual measurability conditions (see Section 3.1.5), the probability space (U, B(U), µ) and the multi-valued mapping Γ induce a belief function BelΘ×X on X × Θ. Conditioning (by Dempster's rule) BelΘ×X on θ then yields the desired sampling distribution f(·; θ) on X, while conditioning it on X = x gives a belief function BelΘ(·; x) on Θ.

Example: Bernoulli sample Let X = (X1, . . . , Xn) consist of independent Bernoulli observations, and let θ ∈ Θ = [0, 1] be the probability of success. Consider the sampling model:

Xi = 1 if Ui ≤ θ,   Xi = 0 otherwise,

where U = (U1, . . . , Un) has pivotal measure µ = U([0, 1]^n). Having observed the number of successes y = Σ_{i=1}^n xi, the belief function BelΘ(·; x) is induced by the random closed interval (see also Section 4.6.3)

[U_(y), U_(y+1)],

where U_(i) denotes the i-th order statistic from U1, . . . , Un. Quantities such as BelΘ([a, b]; x) or PlΘ([a, b]; x) can then be readily calculated.
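For instance, such quantities can be approximated by Monte Carlo simulation of the random interval; the following Python sketch is our own illustration (with the conventions U_(0) = 0 and U_(n+1) = 1):

    import numpy as np

    # Sketch: Monte Carlo approximation of Bel and Pl on Theta = [0,1] induced by
    # the random interval [U_(y), U_(y+1)] in Dempster's Bernoulli model.
    def bel_pl_interval(a, b, n, y, n_samples=100000, seed=0):
        rng = np.random.default_rng(seed)
        U = np.sort(rng.uniform(size=(n_samples, n)), axis=1)
        U = np.hstack([np.zeros((n_samples, 1)), U, np.ones((n_samples, 1))])
        lo, hi = U[:, y], U[:, y + 1]          # order statistics U_(y), U_(y+1)
        bel = np.mean((lo >= a) & (hi <= b))   # focal interval contained in [a,b]
        pl = np.mean((lo <= b) & (hi >= a))    # focal interval intersects [a,b]
        return bel, pl

    print(bel_pl_interval(0.4, 0.6, n=20, y=10))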
Dempster’s model has several nice features: it allows us to quantify the uncer-
tainty on Θ after observing the data, without having to specify a prior distribution
on Θ. In addition, whenever a Bayesian prior P0 is available, combining it with
BelΘ (·; x) using Dempster’s rule yields the Bayesian posterior: BelΘ (·; x) ⊕ P0 =

P (·|x). However, it often leads to cumbersome or even intractable calculations ex-


cept for very simple models, which imposes the use of Monte-Carlo simulations
(consult Section 4.4.3). More fundamentally, the analysis depends on the a-equation
X = a(θ, U ) and the auxiliary variable U , which are not observable and not
uniquely determined for a given statistical model {f (·; θ), θ ∈ Θ}.

Shafer’s later proposals Later [Shafer82] Shafer illustrated three different ways
of doing statistical inference in the belief framework, according to the nature of the
available evidence. He stressed how the strength of belief calculus is really about
allowing inference under partial knowledge or ignorance, when simple parametric
models are not available.
PROPOSALS FROM SHAFER82

Walley In the late Eighties ([1370]), Walley characterized the classes of belief and
commonality functions for which statistical independent observations can be com-
bined by Dempster’s rule, and those for which Dempster’s rule is consistent with
Bayes’ rule.

Other statistical approaches Van den Acker ([350]) designed a method to rep-
resent statistical inference as belief functions, designed for application in an audit
context.
An original paper of Hummel and Landy ([635]) has given a new interpretation
of Dempster’s rule of combination as statistics of opinions of experts, combining
information in a Bayesian fashion.
Liu et al. ([863]) described an algorithm for inducting implication networks
from empirical data samples. The validity of the the method was tested by means of
several Monte-Carlo simulations. The values in the implication networks were pre-
dicted by applying the belief updating scheme and then compared to Pearl’s stochas-
tic simulation method, showing that the evidential-based inference has a much lower
computational cost.

4.1.2 From qualitative data

A number of works have addressed inference from preferences as well. Among them are Wong and Lingras' perceptron idea, the so-called Qualitative Discrimination Process, and Ben Yaghlane's constrained optimisation framework.
Wong and Lingras [1427] have proposed a method for generating belief functions from a body of qualitative preference relations between propositions. Preferences are not needed for all pairs of propositions. Expert opinions are expressed through two binary relations: preference ≻ and indifference ∼. The goal is to build a belief function Bel such that A ≻ B iff Bel(A) > Bel(B), and A ∼ B iff Bel(A) = Bel(B). They have proved that such a belief function exists if ≻ is a "weak order" and ∼ is an equivalence relation. Their algorithm can be summarised as follows:

1. consider all the propositions that appear in the preference relations as potential focal elements;
2. elimination step: if A ∼ B for some B ⊂ A, then A is not a focal element;
3. a perceptron algorithm is used to generate the mass m, by solving the system of remaining equalities and inequalities.
A drawback of this approach is that it arbitrarily selects one solution over the many
admissible ones. Also, it does not address possible inconsistencies in the given body
of expert preferences.
To address these issues, Ben Yaghlane et al. have proposed a constrained optimisation approach which uses preference and indifference relations as in Wong and Lingras' method, obeying the same axioms, but converts them into a constrained optimisation problem. The objective is to maximise the entropy/uncertainty of the belief function to be generated (in order to select the least informative one), under constraints derived from the input preferences/indifferences, in the following way:

A ≻ B ↔ Bel(A) − Bel(B) ≥ ε,   A ∼ B ↔ |Bel(A) − Bel(B)| ≤ ε.

Here ε is a constant specified by the expert. Various uncertainty measures can be plugged into the framework (see Section 4.8.3). Ben Yaghlane et al. propose various mono- and multi-objective optimisation problems based on this principle.
In XXX's Qualitative Discrimination Process, instead, the expert first assigns propositions to a Broad category bucket, then to a corresponding Intermediate bucket, and finally to a corresponding Narrow category bucket. Then, such a qualitative
scoring table is used to identify and remove non-focal propositions by determin-
ing if the expert is indifferent regarding any propositions and their subsets in the
same or lower Narrow category bucket (as in Wong and Lingras’ elimination step).
Next (“imprecise pairwise comparisons”) the expert is required to provide numeric
intervals to express his beliefs on the relative truthfulness of the propositions, the
consistency of the above information is checked, and a mass interval is provided
for every focal element. Finally, the expert re-examines the results, and restarts the
process if they think this is appropriate.
Bryson et al. ([987], [134]) present an approach to the generation of quantitative belief functions which includes linguistic quantifiers, to avoid the premature use of numeric measures.

4.2 Combination

The question of how to update or revise the state of belief represented by a belief
function when new evidence becomes available is also crucial in the theory of evi-
dence. In Bayesian reasoning, this role is performed by Bayes’ rule. In the theory of
belief functions, after an initial proposal by Arthur Dempster, several other aggre-
gation operators have been proposed, leaving the matter still far from settled.

4.2.1 Dempster’s rule under fire

Dempster’s rule [522] is not really given a convincing justification in Shafer’s sem-
inal book [1149], leaving the reader wondering whether a different rule of combi-
nation could be chosen instead [1170, 1530, 464, 300, 1315, 1229]. This question
has been posed by several authors (e.g. [1359], [1536], [1159] and [1418], among others), most of whom tried to provide axiomatic support for the choice of this mechanism for combining evidence. Smets, for instance, tried [1266] to formalise the concept of the distinct evidence that is combined by Dempster's rule.
Early on, Seidenfeld [1141] objected to the rule of combination in the context
of statistical evidence, suggesting it was inferior to conditionalization.

Zadeh’s counterexample Most famously, Zadeh [1532] formulated an annoying ex-


ample for which Dempster’s rule seemed to produce counter-intuitive results. Since
then, many authors have used Zadeh's example either to criticize Dempster-Shafer
theory as a whole, or as a motivation for constructing alternative combination rules.
In the literature, Zadeh’s example appears in different but essentially equivalent ver-
sions of disagreeing experts: we report here Haenni’s version [571]. Suppose a doc-
tor uses Θ = {M, C, T } to reason about the possible condition of a patient (where
M stands for meningitis, C for concussion and T for tumor). The doctor consults
two other experts E1 and E2 who provide him with the following answers:
E1 : “I am 99% sure it’s meningitis, but there is a small chance of 1% that it is
concussion”.
E2 : “I am 99% sure it’s a tumor, but there is a small chance of 1% that it is
concussion”.
These two statements can be encoded by the following mass functions:
 
 0.99 A = {M }  0.99 A = {T }
m1 (A) = 0.01 A = {C} m2 (A) = 0.01 A = {C} (4.1)
0 otherwise 0 otherwise,
 

whose (unnormalised) Dempster’s combination is:



 0.9999 A = {∅}
m(A) = 0.0001 A = {C}
0 otherwise.

As the two masses are highly conflicting, normalisation yields the categorical be-
lief function focussed on C – a strong statement that it is definitively concussion,
although both experts had left it as only a fringe possibility.
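The numbers in Zadeh's example are easily reproduced. The Python sketch below (our own illustration; masses are encoded as dictionaries of frozensets) implements Dempster's rule with normalisation and applies it to (4.1):

    # Sketch: Dempster's rule of combination on Zadeh's example.
    def dempster(m1, m2):
        raw, conflict = {}, 0.0
        for A, v1 in m1.items():
            for B, v2 in m2.items():
                C = A & B
                if C:
                    raw[C] = raw.get(C, 0.0) + v1 * v2
                else:
                    conflict += v1 * v2
        return {C: v / (1.0 - conflict) for C, v in raw.items()}, conflict

    m1 = {frozenset({'M'}): 0.99, frozenset({'C'}): 0.01}
    m2 = {frozenset({'T'}): 0.99, frozenset({'C'}): 0.01}
    combined, k = dempster(m1, m2)
    print(combined, k)   # all mass on {'C'}; conflict k = 0.9999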
Zadeh’s dilemma was discussed in 1983 by Yager [1478], where he suggested a
solution based on the inclusion of a ‘hedging’ element.
In [571] Haenni showed, however, that the counter-intuition in Zadeh’s example is
not a problem with Dempster’s rule, but a problem with Zadeh’s own model, which
the author claimed does not correspond to reality.
First of all, the mass functions in (4.1) are Bayesian (i.e., probability measures): thus, Bayesian reasoning leads to the very same conclusions, and the example would then lead us to reject Bayes' rule as well. Secondly, diseases are never exclusive, so that it may be argued that Zadeh's choice of frame of discernment is misleading, and the root of the apparent 'paradox'.
Finally, experts are never fully reliable. In the example, they disagree so much that any person would conclude that one of them is simply wrong. This can be addressed
by introducing two product frames

Θ1 = {R1 , U1 } × Θ, Θ2 = {R2 , U2 } × Θ

and by discounting the reliability of each expert prior to combining their views
(i.e., by assigning a certain p(Ri )). The result of such a combination, followed by a
marginalisation on the original frame Θ, adequately follows intuition.
A number of other authors have reasoned on the discounting techniques, as we will
see in Section ??.

Dubois and Prade’s analysis In 1985, Dubois and Prade [421] had already pointed
out that, when analyzing the behavior of Dempster’s rule of combination, assessing
a zero value or a very small value may lead to very different results. Theirs was also a criticism of the idea of having 'certain' evidence ('highly improbable is not impossible'), which led them to something similar to discounting.
In a 1986 note [422] the same authors proved the unicity of Dempster’s rule under a
certain independence assumption, while stressing the existence of alternative rules
corresponding to different assumptions or different types of combination. Eventu-
ally (1988) Dubois and Prade came to the conclusion that the justification for the
pooling of evidence by Dempster’s rule was problematic [440]. As a response, they
proposed a new combination rule based on the minimum specificity principle (Section 4.2.2).

Lemmer’s counterexample Lemmer’s counterexample to the use of Dempster’s rule


was proposed in [829], and can be described as follows.
Imagine balls in an urn which have a single 'true' label. The set Θ of these labels (or rather, the set of propositions expressing that a ball has a particular label from the set of these labels) functions as the frame of discernment. Belief functions are
formed empirically on the basis of evidence acquired from observation processes
which are called ‘sensors’. These sensors attribute to the balls labels which are sub-
sets of Θ. The labelling of each sensor is assumed to be accurate in the sense that
the frame label of a particular ball is consistent with the attributed label. Each sensor
s gives rise to a bpa ms in which ms (A), A ⊂ Θ is the fraction of balls labelled
A by sensor s. Then, due to the assumed accurateness of the sensors, Bel(A) is
the minimum fraction of balls with frame label θ ∈ A, and P l(A) the maximum
fraction of balls which could have as frame label an element of A. Lemmer's example shows that Dempster's combination of belief functions which derive from accurate labelling processes does not necessarily yield a belief function which assigns 'accurate' probability ranges to each proposition. However, Voorbraak [1360] argued that, viewed in this way, Dempster-Shafer theory is a generalization of Bayesian probability theory, whereas Lemmer's sample-space interpretation is a generalization of the interpretation of classical probability theory.

Voorbraak’s reexamination In [1360], Voorbraak also analysed randomly coded


messages, Shafer’s canonical examples for Dempster-Shafer theory, in order to clar-
ify the requirements for using Dempster’s rule. His conclusions were that the range
of applicability of Dempster-Shafer theory was rather limited, and that in addition
these requirements did not guarantee the validity of the rule, calling for some ad-
ditional conditions. Nevertheless, the analysis was conducted under Shafer’s con-
structive probability.
interpretation. He provided his own ‘counterintuitive’ result on Dempster’s rule’s
behavior. Let Θ = {a, b, c} and m, m0 two mass assignments such that m({a}) =
m({b, c}) = m0 ({a, b}) = m0 ({c}) = 0.5. Therefore:
1
m ⊕ m0 ({a}) = m ⊕ m0 ({b}) = m ⊕ m0 ({c}) = ,
3
a result which is deemed counterintuitive for the evidence given to {a} or {c} is pre-
cisely assigned to it by at least one of the two belief functions, while that assigned
to {b} was never precisely given to it in the first place.
To the Author of this book, this sounds like a confused argument in favour of taking
into account the entire structure of the filter of focal elements with a given intersec-
tion, when assigning the mass of the combined belief function.

Axiomatic justifications Frank Klawonn and Erhard Schwecke (1992) [716] pre-
sented a set of axioms that uniquely determine Dempster’s rule, and which reflect
the intuitive idea of partially moveable evidence masses.
A nice paper by Nic Wilson (1993) [1421] took a similar axiomatic approach to
the combination of belief functions. The following requirements are formulated1 :
Definition 35. A combination rule π : s ↦ π^s : Ω → [0, 1], mapping a collection s of random sets to a probability distribution on Ω, is said to respect contradictions if, for any finite collection of combinable random sets s = {(Ωi, Pi, Γi), i ∈ I} and ω ∈ ×_{i∈I} Ωi, Γ(ω) := ∩_{i∈I} Γi(ωi) = ∅ implies π^s(ω) = 0.
If Γ(ω) = ∅ then ω cannot be true, since that would imply that ∅ is true, and ∅ represents the contradictory proposition. Therefore any sensible combination rule must respect contradictions.
Definition 36. A combination rule π is said to respect zero probabilities if, for any combinable multiple-source structure s and ω ∈ Ω^s, P_i^s(ω) = 0 for some i ∈ ψ^s implies π^s(ω) = 0.
¹ Once again, the author's original statements are translated into the more standard terminology used in this book.

If Pis (ω) = 0 for some i then ω is considered impossible (since frames are finite).
Therefore, since ω is the conjunction of the propositions ωi , ω should clearly have
zero probability.
A benefit of this approach is that it makes the independence or irrelevance as-
sumptions explicit.

Absorptive behaviour The issue is still open to debate, and some interesting points on Dempster's rule's behaviour have been made.
As recently as 2012, Dezert, Tchamova et al. [393] challenged the validity of Dempster-Shafer theory by using an example derived from Zadeh's classical 'paradox' to show that Dempster's rule produces counter-intuitive results. This time, the two doctors generate the following mass assignments over Θ = {M, C, T}:

m1(A) = a for A = {M}, 1 − a for A = {M, C}, 0 otherwise;
m2(A) = b1 for A = {M, C}, b2 for A = Θ, 1 − b1 − b2 for A = {T}.   (4.2)

Assuming equal reliability of the two doctors, Dempster's combination yields m1 ⊕ m2 = m1, i.e., Doctor 2's diagnosis is completely absorbed by that of Doctor 1 (it 'does not matter at all', in the authors' terminology) [1329].
The interesting feature of this example is that ‘paradoxical’ behaviour is not a con-
sequence of conflict (as in other counterexamples), but of the fact that, in Demp-
ster’s combination, every source of evidence has a ‘veto’ power over the hypothe-
ses it does not believe to be possible – in other words, evidence is combined only
for ‘compatible’ events/propositions (cfr. Chapter 3, Section 3.1.1). Mathematically,
this translates into an ‘absorptive’ behaviour whose theoretical extent is to be better
analysed in the future.

Criticisms by other authors We conclude this section by briefly summarising other


contributions to the debate on Dempster’s rule.
In 2012, Josang and Pope [676] analysed Dempster’s rule from a statistical and
frequentist perspective and proved, with the help of simple examples on colored
balls, that Dempster's rule in fact represents a method for the serial combination of
stochastic constraints, rather than a method for cumulative fusion of belief functions
under the assumption that subjective beliefs are an extension of frequentist beliefs.
Pei Wang [1386] argued that considering probability functions as special cases
of belief functions, while using Dempster’s rule for combining belief functions leads
to an inconsistency. As a result he rejected some fundamental postulates of the the-
ory, and introduced a new approach for uncertainty management that shares many
intuitive ideas with D-S theory, while avoiding this problem.
Smets [1275] claimed that the reason which led some authors to reject Dempster-
Shafer theory is an inappropriate use of Dempster’s rule of combination. The paper
discusses the roots of this mismanagement, two types of defaults, and the correct
solution for both types within the transferable belief model interpretation.
Following a similar line of reasoning, Liu and Hong (2000) [878] argued that Demp-
ster’s original idea on evidence combination is, in fact, richer than what has been

formulated in the rule. They concluded that, by strictly following what Dempster has
suggested, there should be no counterintuitive results when combining evidence.
Bhattacharya (2000) [95] analysed the ‘non-hierarchical’ aggregation of belief func-
tions, showing that the values of certain functions defined on a family of belief struc-
tures decrease when the latter are combined by Dempster’s rule. Similar results hold
when an arbitrary belief structure is prioritised while computing the combination.
Furthermore, the length of the belief-plausibility interval decreases during a non-
hierarchical aggregation of belief structures.
A method for dispelling the 'absurdities' (more properly, paradoxes) of Dempster-Shafer's rule of combination was proposed in [847], based on making all the experts take their decision on the same focussed collection.
In [601] it was demonstrated by Hau et al that Dempster’s rule of combination is
not robust when combining highly conflicting belief functions. It was also shown
that Shafer’s (1983) discounted belief functions also suffer from this lack of robust-
ness with respect to small perturbations in the discount factor. A modified version
of Dempster’s rule was proposed to remedy this difficulty.
In [584] a concrete example of the use of the Dempster rule presented by Weichsel-
berger and Pohlmann was discussed, showing how their approach has to be modified
to yield an intuitively adequate result.
In [1520] the authors describe a model in which masses are represented as condi-
tional granular distributions. By comparing it with Zadeh’s relational model, they
show how Zadeh’s conjecture on combinability does not affect the applicability of
Dempster’s rule.

4.2.2 Alternative combination rules


Yager’s proposals In [1485], Yager (1987) highlighted some concerns with Demp-
ster’s rule of combination inherent in the normalization due to conflict, and intro-
duced both a practical recipe for using Dempster’s rule and an alternative combina-
tion rule. The latter is based on the view that conflict is generated by non-reliable
information sources. In response, the conflicting mass (m(∅)) is re-assigned to the
whole frame of discernment Θ:

mY(A) = m∩(A) for ∅ ≠ A ⊊ Θ,   mY(Θ) = m∩(Θ) + m(∅).   (4.3)

In [1482] Yager discussed the rule of inference called ‘entailment principle’, and
extended it to situations in which the knowledge is a type of combination of pos-
sibilistic and probabilistic information which he called Dempster-Shafer granules.
He discussed the conjunction of these D-S granules and showed that Dempster’s
rule of combination is a special application of conjunction followed by a particular
implementation of the entailment principle.
Dubois and Prade's minimum specificity rule The combination operator proposed by Dubois and Prade² [440] comes from applying the minimum specificity principle to the cases in which the focal elements B, C of the two input belief functions do not intersect, which results in assigning their product mass to B ∪ C. As a result:

mD(A) = m∩(A) + Σ_{B∪C=A, B∩C=∅} m1(B) m2(C).   (4.4)

² We follow here the same notation as in [819].

Obviously the resulting belief function dominates that generated by Yager’s rule.
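Both rules only differ from (unnormalised) Dempster combination in where the conflicting products are reassigned. The following Python sketch is our own illustration of (4.3) and (4.4), with masses encoded as dictionaries of frozensets:

    # Sketch: Yager's rule (4.3) and Dubois and Prade's rule (4.4); the
    # conflicting mass goes to Theta and to the unions B u C, respectively.
    def yager(m1, m2, Theta):
        m, conflict = {}, 0.0
        for B, v1 in m1.items():
            for C, v2 in m2.items():
                I = B & C
                if I:
                    m[I] = m.get(I, 0.0) + v1 * v2
                else:
                    conflict += v1 * v2
        m[Theta] = m.get(Theta, 0.0) + conflict
        return m

    def dubois_prade(m1, m2):
        m = {}
        for B, v1 in m1.items():
            for C, v2 in m2.items():
                target = (B & C) or (B | C)   # union when the intersection is empty
                m[target] = m.get(target, 0.0) + v1 * v2
        return m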
Smets' combination rules in the TBM Just like Dempster does, Smets also assumes that all the sources to be combined are reliable – conflict is only the result of an incorrectly defined frame of discernment.
Rather than normalising (as in Dempster's rule) or re-assigning the conflicting mass m(∅) to other non-empty subsets (as in Yager's and Dubois' proposals), his conjunctive rule leaves the conflicting mass with the empty set:

m(A) = m∩(A) for ∅ ≠ A ⊆ Θ,   m(∅) = m∩(∅),   (4.5)
and thus is applicable to unnormalised belief functions. As Lefevre et al note [819],
a similar idea is also present in [1485], in which a new hypothesis is instead intro-
duced in the existing frame.
This amounts to an open world assumption in which the current frame of discern-
ment only approximately describes the set of possible outcomes (hypotheses).
In [918] a mixed conjunctive and disjunctive rule together with a generalization
of conflict repartition rules are presented.
In [719], the fundamental updating process in the transferable belief model is related to the concept of specialization, and is described by a specialization matrix. There, the degree of belief in the truth of a proposition is a degree of justified support, and the Principle of Minimal Commitment implies that one should never give more support to the truth of a proposition than justified. The authors show that Dempster's rule of conditioning corresponds essentially to the least committed specialization, and that Dempster's rule of combination results essentially from commutativity requirements. The concept of generalization, dual to that of specialization, is also described.
Denoeux’s cautious and bold rules
Cautious rule Another major alternative to Dempster's rule is the so-called cautious rule of combination [377, 359], based on Smets' canonical decomposition of non-dogmatic (i.e., such that m(Θ) ≠ 0) belief functions into (generalised) simple belief functions, namely:

m = ∩_{A⊊Θ} m_A^{w(A)},   (4.6)

where m_A^w denotes the simple pseudo belief function³ such that:

m_A^w(A) = 1 − w,   m_A^w(Θ) = w,   m_A^w(B) = 0 ∀B ∈ 2^Θ \ {A, Θ},

and the weights w(A) satisfy w(A) ∈ [0, +∞) for all A ⊊ Θ.
³ This is denoted by A^{w(A)} in the author's original papers.

Definition 37. Let m1 and m2 be two non-dogmatic basic probability assignments. Their combination using the cautious (conjunctive) rule is denoted by m1 ∧ m2, and is defined as the mass assignment with the following weight function:

w1∧2(A) = w1(A) ∧ w2(A),   A ∈ 2^Θ \ {Θ}.   (4.7)
Denoeux proves ([377, 359], Proposition 1) that the cautious combination of two belief functions is the w-least committed belief function in the intersection Sw(m1) ∩ Sw(m2), where Sx(m) is the set of belief functions x-less committed than m (and x ∈ {pl, q, s, d, w} denotes the plausibility/belief, commonality, specialisation, Dempsterian and weight-based orderings, respectively).
Notably, the cautious operator is commutative, associative and idempotent – this latter property makes it suitable for combining belief functions induced by reliable, but possibly overlapping, bodies of evidence.
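A brute-force Python sketch of the cautious rule follows. It is our own illustration (practical only for very small frames): the conjunctive weights are obtained from the commonalities via the standard canonical-decomposition formula, the weight functions are combined by minimum as in (4.7), and the result is rebuilt by conjunctively combining the corresponding simple components.

    import math
    from itertools import combinations

    def powerset(Theta):
        s = list(Theta)
        return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

    def commonality(m, A):
        return sum(v for B, v in m.items() if A <= B)

    def weights(m, Theta):
        """Canonical-decomposition weights w(A), A != Theta, of a non-dogmatic bpa."""
        w = {}
        for A in powerset(Theta):
            if A == frozenset(Theta):
                continue
            lw = 0.0
            for B in powerset(Theta):
                if A <= B:
                    lw += (-1) ** (len(B) - len(A) + 1) * math.log(commonality(m, B))
            w[A] = math.exp(lw)
        return w

    def conjunctive(m1, m2):
        m = {}
        for B, v1 in m1.items():
            for C, v2 in m2.items():
                m[B & C] = m.get(B & C, 0.0) + v1 * v2
        return m

    def cautious(m1, m2, Theta):
        w1, w2 = weights(m1, Theta), weights(m2, Theta)
        m = {frozenset(Theta): 1.0}                      # vacuous mass function
        for A in w1:
            w = min(w1[A], w2[A])
            simple = {A: 1.0 - w, frozenset(Theta): w}   # generalised simple bba A^w
            m = conjunctive(m, simple)
        return {A: v for A, v in m.items() if abs(v) > 1e-12}

    Theta = ('a', 'b')
    m1 = {frozenset('a'): 0.6, frozenset(Theta): 0.4}
    m2 = {frozenset('b'): 0.5, frozenset(Theta): 0.5}
    print(cautious(m1, m2, Theta))
    # unnormalised result: 0.3 on the empty set, 0.3 on {a}, 0.2 on {b}, 0.2 on Theta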
A cautious conjunctive rule which differs from Denoeux's is supported by Destercke et al. [384] (2007). The authors argue that, when the information sources are not independent, one must find a cautious merging rule that adds a minimal amount of information to the inputs, following the principle of minimal commitment. The resulting cautious merging rule is based on maximising the expected cardinality of the resulting belief function.
Bold rule A dual operator, the bold disjunctive rule, is also introduced after noticing that, if m is an unnormalised b.p.a., its complement m̄ is non-dogmatic and can thus be (canonically) decomposed as m̄ = ∩_{A⊊Θ} m_A^{w̄(A)}. Let us introduce the notation v(A) = w̄(Ā). We can then prove that:
Proposition 23. ([377], Proposition 10) Any unnormalised belief function can be uniquely decomposed as the following ∪ combination:

m = ∪_{A≠∅} m_{A,v(A)},   (4.8)

where m_{A,v(A)} is the unnormalised belief function assigning mass v(A) to ∅, and 1 − v(A) to A.
Denoeux calls (4.8) the canonical disjunctive decomposition of m. Let then G_x(m) be the set of basic probability assignments x-more committed than m. The bold combination corresponds to the v-most committed element in the intersection G_v(m_1) ∩ G_v(m_2).
Definition 38. Let m_1 and m_2 be two unnormalised ('subnormal', in Denoeux's terminology) basic probability assignments. The v-most committed element in G_v(m_1) ∩ G_v(m_2) exists and is unique, and is defined by the following disjunctive weight function:

v_{1∨2}(A) = v_1(A) ∧ v_2(A),    A ∈ 2^Θ \ {∅}.

Their bold combination is defined as:

m_1 ∨ m_2 = ∪_{A≠∅} m_{A, v_1(A)∧v_2(A)}.                                   (4.9)

The fact that the bold disjunctive rule is only applicable to unnormalised belief
functions is a severe restriction, as admitted by the author.
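Since both (4.8) and (4.9) are built on the disjunctive combination ∪, a short sketch of that operator may help (same dictionary-of-frozensets convention as above; an illustration of ours, not Denoeux's own code):

```python
from itertools import product

def disjunctive_combination(m1, m2):
    """Disjunctive rule: mass flows to the *union* of focal elements, so no
    conflict can arise (the result never commits beyond either source)."""
    result = {}
    for (A, v1), (B, v2) in product(m1.items(), m2.items()):
        C = A | B
        result[C] = result.get(C, 0.0) + v1 * v2
    return result
```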
A new combination rule called the cautious-adaptive rule is brought forward in
[470], based on generalized discounting defined for separable basic belief assign-
ments (bbas), to be applied to the source correlation derived from the cautious rule.
The cautious-adaptive rule varies between the conjunctive rule and the cautious one,
depending on the discounting level.
Pichon and Denoeux (2008) [1038] pointed out that the cautious and unnormal-
ized Dempster’s rules can be seen as the least committed members of families of
combination rules based on triangular norms and uninorms, respectively.

Consensus operator Josang (2002) [689] introduced a consensus operator, and


showed how it can be applied to dogmatic conflicting opinions, i.e., when the de-
gree of conflict is very high, overcoming the shortcomings of Dempster’s and other
existing rules.
Let the relative atomicity of A ⊂ Θ with respect to B ⊂ Θ be defined as

a(A/B) = |A ∩ B| / |B|,

a measure of how much of B is overlapped by A. Josang represents an agent's degrees of belief as a tuple

o = ( b(A) = Bel(A), d(A) = 1 − Pl(A), u(A) = Pl(A) − Bel(A), a(A) = a(A/Θ) ).

Definition 39. Let

o_1 = (b_1(A), d_1(A), u_1(A), a_1(A)),    o_2 = (b_2(A), d_2(A), u_2(A), a_2(A))

be opinions held by two agents about the same proposition/event A. The consensus combination o_1 ⊕ o_2 is defined as:

o_1 ⊕ o_2 =
( (b_1 u_2 + b_2 u_1)/κ, (d_1 u_2 + d_2 u_1)/κ, (u_1 u_2)/κ, (a_1 u_2 + a_2 u_1 − (a_1 + a_2) u_1 u_2)/(u_1 + u_2 − 2 u_1 u_2) )   if κ ≠ 0,
( (γ b_1 + b_2)/(γ + 1), (γ d_1 + d_2)/(γ + 1), 0, (γ a_1 + a_2)/(γ + 1) )   if κ = 0,
                                                                             (4.10)
where κ = u_1 + u_2 − u_1 u_2 and γ = u_2/u_1.
The consensus operator is derived from the posterior combination of beta distributions, and is proven to be commutative and associative, besides satisfying

b_{o_1⊕o_2} + d_{o_1⊕o_2} + u_{o_1⊕o_2} = 1.

Clearly, (4.10) combines lower and upper probabilities, rather than belief functions
per se.
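A minimal sketch of (4.10) follows (an illustration of ours, not Josang's code). Note that when both opinions are dogmatic (u_1 = u_2 = 0) the relative dogmatism γ is a limiting quantity that has to be supplied externally.

```python
def consensus(o1, o2, gamma=None):
    """Consensus of two opinions o = (b, d, u, a), following (4.10).
    `gamma` is the relative dogmatism (the limit of u2/u1), needed only
    when both opinions are dogmatic (u1 = u2 = 0)."""
    b1, d1, u1, a1 = o1
    b2, d2, u2, a2 = o2
    kappa = u1 + u2 - u1 * u2
    if kappa != 0:
        b = (b1 * u2 + b2 * u1) / kappa
        d = (d1 * u2 + d2 * u1) / kappa
        u = (u1 * u2) / kappa
        den = u1 + u2 - 2 * u1 * u2
        # when both opinions are vacuous (u1 = u2 = 1) the atomicity formula
        # is 0/0; we use the limiting value (a1 + a2) / 2 as a convention
        a = (a1 + a2) / 2 if den == 0 else \
            (a1 * u2 + a2 * u1 - (a1 + a2) * u1 * u2) / den
        return (b, d, u, a)
    # both opinions dogmatic: weighted average driven by the relative dogmatism
    b = (gamma * b1 + b2) / (gamma + 1)
    d = (gamma * d1 + d2) / (gamma + 1)
    a = (gamma * a1 + a2) / (gamma + 1)
    return (b, d, 0.0, a)

# Example: two moderately uncertain, conflicting opinions on the same event.
print(consensus((0.8, 0.1, 0.1, 0.5), (0.2, 0.6, 0.2, 0.5)))
```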
A ‘cumulative’ rule and an ‘averaging’ rule of belief fusion are presented by
Josang et al in [691] (2010). They represent generalisations of the subjective logic
consensus operator for independent and dependent opinions respectively, and are
applicable to the combination of general basic probability (belief) assignments.
The authors argue that these rules can be directly derived from classical statistical
theory, and produce results in line with human intuition. In particular, the cumula-
tive rule is equivalent to a posteriori updating of Dirichlet distributions, while the
averaging rule is equivalent to averaging the evidence provided by Dirichlet distri-
butions. Both are based on a bijective mapping between Dirichlet distributions and
belief functions, described in [691].

Averaging and distance-based methods A number of proposals are based on some


way of computing the ‘mean’ of the input mass functions. An averaging method is
indeed proposed by Murphy [974], in which it is suggested that if all the pieces of
evidence are available at the same time, one can average their masses, and calculate
the combined masses by combining the average values multiple times. Yong and
co-authors (2004, 2005) [1523, 848, 352] proposed a modified average method to
combine belief functions, based on a measure of evidence distance, in which the
weight (importance) of each body of evidence is taken into account.
Namely, the degree of credibility Crd(mi ) of the i-th body of evidence is com-
puted as

Crd(m_i) = Sup(m_i) / Σ_j Sup(m_j),    where Sup(m_i) ≐ Σ_{j≠i} (1 − d(m_i, m_j)).

The latter can be used to compute a weighted average of the input masses as:

m̃ = Σ_i Crd(m_i) · m_i.

As in Murphy’s approach, one can then use Dempster’s rule to combine the resulting
weighted average n times, when n is the number of input masses.
Albeit rather empirical, these methods try to address the crucial issue with
Dempster’s combination (already pointed out in Section 4.2.1), namely that each
piece of evidence has ‘veto’ powers on the possible consensus outcomes. If any
of them gets it wrong, the combined belief function will never give support to the
‘correct’ hypothesis.

Other proposals A number of other combination rules have been brought forward over the years [172, 563, 1527, 76].
In [673] Josang and Daniel discussed and compared various strategies for deal-
ing with ‘dogmatic’ beliefs, including Lefevre’s weighting operator (cfr. Section
4.2.3), Josang’s own consensus operator (Section 4.2.2) and Daniel’s MinC ap-
proach.
In [1388], existing approaches to combination were reviewed and critically anal-
ysed by Wang et al (2007). In the authors’ view they either ignore the normaliza-
tion or separate it from the combination process, leading to irrational or suboptimal
interval-valued belief structures. In response, a new ‘logically correct’ approach was
developed, where combination and normalization are optimised together rather than
separately.
In [1479] some alternative methods to Dempster’s rule for combining evidence
were given based on interpreting plausibility and belief as a special case of the
compatibility of a linguistically quantified statement with a data base consisting of
an expert’s fragmented opinion as to the location of a special element.
Yamada (2008) [1501] proposes a new combination model called ‘combination
by compromise’ as a consensus generator.
The focus of [1472] is to provide a procedure for aggregating 'prioritized' belief structures, i.e., ... An alternative to the normalization step used in Dempster's rule is suggested, inspired by nonmonotonic logics. The authors show how this procedure allows one to make inferences in inheritance networks where the knowledge is in the form of a belief structure.
A generalized evidence combination formula relaxing the requirement of evi-
dence independence is presented by Wu (1996) [1442].
The combination process on non-exhaustive frames of discernment was anal-
ysed by Janez and Appriou in [666]. In previous work (in French) [665], the authors had already presented methods based on a technique called 'deconditioning' which allows the combination of such sources. Additional methods based on the same framework were proposed in [].
In [1507] the concept of a Weighted Belief Distribution (WBD) is proposed and extended to a WBD with Reliability (WBDR), in order to characterise evidence as a complement to the notion of Belief Distribution (BD) introduced in Dempster-Shafer theory. The implementation of the orthogonal sum operation on WBDs and WBDRs leads to the establishment of the new ER rule. It is proven that Dempster's rule is a special case of the ER rule when each piece of evidence is fully reliable.
Baldwin [50] described an iterative procedure which generalises Bayes’ method
of updating an a priori assignment over the power set of the frame of discernment
using uncertain evidence.
A new combination rule, named ‘absorptive’ method, was proposed in [1320]
which exploits conflict information ....
Destercke and Dubois [383, 380] note that, when dependencies between sources
are ill-known, it is sensible to require idempotence from a belief function combina-
tion rule, as this property captures the possible redundancy of dependent sources. In
[380], they study the feasibility of extending the idempotent fusion rule of possibil-
ity theory (the ‘minimum’) to belief functions. However they reach the conclusion
that, unless we accept the idea that the result of the fusion process can be a family
of belief functions, such an extension is not always possible.
In [156], Campos (2003) presents an extension of belief theory that allows the combination of highly conflicting pieces of evidence, avoiding the tendency to reward low-probability but common possible outcomes of otherwise disjoint hypotheses.
The work in [463] focuses on possible modifications of combination rules and evidence sources in the case of highly conflicting evidence. The paper proposes to extract the intrinsic characteristics of the existing evidence sources by using evidence distance theory.
Florea et al. [485] presented the class of Adaptive Combination Rules (ACRs) and a new efficient Proportional Conflict Redistribution (PCR) rule. Both rules allow one to deal with highly conflicting sources. The authors discuss some simulation results obtained with both rules for Zadeh's problem, concluding ...
Some authors propose to adjust the input b.p.a.s prior to applying Dempster’s
rule, as in the discounting idea. In [856], for instance, b.p.a.s are pre-treated using a 'disturbance of ignorance' technique before applying Dempster's rule.
Murphy [975] focuses on the combination of evidence over time. The author
argues that Dempster’s rule of combination is not appropriate for this domain, and
derives an alternative rule of combination which adapts the belief updating process
based on a contextual weighting parameter. The latter is a function of the expected
landmark permanence, change in sensor discriminability, expected potential for dy-
namic occlusions, and tracking error. These influences are fused using fuzzy rules.
The method of [185] first calculates the local decision on the basis of the measurement, on which a support matrix is based. The eigenvector of the support matrix is then extracted and taken as the reliability vector of the system. The paper improves D-S evidential reasoning by giving each piece of evidence a weight equal to its reliability.
Campos and de Souza [157] propose a new rule of combination of bodies of ev-
idence that embodies in the numeric results the unknown belief and conflict among
the evidence, naturally modeling the epistemic reasoning.
In [798] it is argued that the concept of specialization generalises Dempster's rule. The argument is founded on the fact that modelling uncertain phenomena always entails a simplifying coarsening, which arises from renouncing a description of the depth that a perfect image would require.
A modification of the Dempster-Shafer theory is employed in [1545] to formu-
late a computationally feasible approach to evidence accumulation. Properties of ev-
idence combining operators are formulated axiomatically and employed to demon-
strate the merits of the new approach. A finitary, parametric form of the evidence
accumulator is proposed which makes it possible to set the rates at which evidence
is accumulated toward certainty and contradiction states.
In [636] Hummel et al. build on their previous framework on the statistics of experts' opinions (Section ??) by relaxing the assumption that the bodies of experts to be combined have independent information, and give a model for parameterising the degree of dependence between bodies of information.

To clarify the theoretical foundation of the Dempster combination rule and to provide a direction as to how to solve these problems, the Dempster combination rule is first formulated in [1558] on the basis of random set theory. Then, under this framework, all possible combination rules are presented, and combination rules based on correlated sensor confidence degrees (evidence supports) are proposed. The optimal Bayes combination rule is given at the end.
In [1467], based on an analysis of existing modified combination algorithms, a new combination method is proposed: first a similarity matrix is calculated, then a credit vector is derived from it, and finally the evidence is averaged using the normalised credit vector and combined n−1 times by Dempster's rule.
In [1529], when several pieces of evidence are combined, their mutual support degree is calculated according to the evidence distance. The eigenvector associated with the maximal eigenvalue of the evidence support degree matrix is taken as the weight vector; evidence discount coefficients are then derived from it and used to modify every piece of evidence, before combination by Dempster's rule.
Ma et al. [904, 656] note that the combination rules proposed so far in the theory of evidence, especially Dempster's rule, are symmetric: they rely on the basic assumption that the pieces of evidence being combined are on a par, i.e., play the same role. In the case of revision, instead, the idea is to let the prior knowledge of an agent be altered by some input information; the change problem is thus intrinsically asymmetric. Assuming the input information is reliable, it should be retained, whilst the prior information should be changed minimally to that effect. To deal with this issue, the authors define a notion of revision for the theory of evidence in such a way as to bring together probabilistic and logical views. Several previously proposed revision rules are reviewed, and one of them is advocated as better corresponding to the idea of revision. It is extended to cope with inconsistency between prior and input information, and reduces to Dempster's rule of combination, just as revision in the sense of Alchourron, Gardenfors and Makinson (AGM) reduces to expansion, when the input is strongly consistent with the prior belief function. Properties of this revision rule are also investigated, and it is shown to generalise Jeffrey's rule of updating, Dempster's rule of conditioning and a form of AGM revision.
In [1342], after introducing an interpretation of the mass function, the authors show that, given two b.p.a.s, Dempster's rule of combination does not build a coherent b.p.a. with respect to that interpretation. They then give a new combination function that overcomes this problem, and study some of its properties.
In [1323] a new rule of combination is also proposed; the efficiency and validity of the approach are demonstrated with numerical examples and by comparison with other existing methods.
This article [1341] presents an alternative combination method that is capable
of handling inconsistent evidence and relates evidence focusing to the amount of
information resident in pieces of evidence. The method is capable of combining
belief functions.
Dempster-Shafer (DS) theory exhibits counter-intuitive behaviour when evidence is highly conflicting. A new approach to the combination of weighted belief functions is proposed in [1528] to solve the problem: when many pieces of evidence are to be combined, the amount of conflict between them is first evaluated using both an evidence distance and the conflicting belief, and every piece of evidence is given a weight coefficient according to its amount of conflict with the others. Two different methods are separately used to modify the belief function of each piece of evidence based on its weight coefficient. Finally, the modified functions are combined by Dempster's rule.
The D-S evidence combination method may be useless when the conflict among the pieces of evidence is rather large, or even complete. Yager presented some modified combination methods, but these have deficiencies. The paper [1055] is concerned with a new evidence combination method which introduces weight factors, based on the importance of the pieces of evidence, and re-allots the conflicting probability. This method improves the rationality and reliability of the evidence combination, and better results can be obtained.
In [303] a new approach to the combination of belief functions, a combination 'per elements', is introduced, with particular regard to the combination and distribution of contradictory belief masses. Several different instances of this method are compared, and the minC and maxC combinations are suggested as alternatives to Dempster's rule of combination of belief functions.
The thesis [1385] puts forward an improved combination rule for Dempster-Shafer evidence theory based on the reliability of the evidence and the correlation between pieces of evidence. Differently from standard D-S theory, two key parameters are introduced: the reliability of each piece of evidence, and the correlation coefficient between pieces of evidence. By weighting the evidence based on its reliability, the negative effect of unreliable evidence is reduced; by decreasing the probability evaluation of certainty and increasing that of uncertainty, the effect of correlated evidence on the fusion result is reduced as well, leading to better fusion results.
In [1492] an approach to the aggregation of non-independent belief structures is suggested which makes use of a weighted aggregation of the belief structures, where the weights are related to the degree of dependence. It is shown that this aggregation is non-commutative: the fused value depends on the order in which the pieces of evidence are processed. The paper then considers the problem of how best to sequence the evidence, investigating the use of the information content of the fused value as a criterion for selecting the appropriate way to sequence the belief structures.
Guan [557] presented a one-step method for combining evidence from different evidential sources, based on Yen's extension of D-S theory via conditional compatibility relations, and proved that it gives the same results as Yen's.
In [307] the principal ideas of the minC combination are recalled, and the mathematical structure of generalised frames of discernment is analysed and formalised. A generalised scheme for the computation of the minC combination is presented, the redistribution of conflicting belief masses among non-conflicting focal elements is reviewed, and final general formulae for the computation of the minC combination are given, followed by some worked examples and a brief comparison of the minC combination with other combination rules.

In [304] an analysis of various belief function combination operations with respect to their commutativity with the coarsening/refinement of the frame of discernment is presented, a new combination operation which commutes with coarsening/refinement is sought, and alternative pignistic transformations are suggested.
The article [1351] addresses the performance of Dempster-Shafer (DS) theory when it is slightly modified to prevent it from becoming too certain of its decision upon accumulation of supporting evidence. Since this is done by requiring that the ignorance never become too small, this variant of DS theory can be referred to as 'thresholded DS'. In doing so, one ensures that DS can respond quickly to a consistent change in the evidence that it fuses. Only realistic data are fused, where realism is discussed in terms of data certainty and data accuracy, thereby avoiding Zadeh's paradox.
The paper [] generalises Dempster-Shafer (D-S) evidence-theoretical inference, modifying the Dempster combination rule to cope with dependent and conflicting conditions. The generalised D-S theory of evidence is applied to the fusion of vision information about unstructured road networks in ALV. Some problems concerning the implementation of the fusion are discussed, and meaningful results are obtained.
In [780] it is shown that, for a particular but large enough class of probability measures, an analogue of the Dempster combination rule, preserving its extensional character but using some nonstandard and Boolean-like structures over the unit interval of real numbers, can be obtained without the assumption of statistical independence of the input empirical data charged with uncertainty.
The work [961] is a self-contained presentation of a method for combining several belief functions on a common frame which differs from a mere application of Dempster's rule; all the necessary results and their proofs are presented in the paper. It begins with a review and explanation of concepts related to the notion of a non-normalised mass function, or 'gem-function', introduced by P. Smets under the name of basic belief assignment [1,6]. The link with Dempster's rule of combination is then established, and several results related to the notion of a Dempster specialisation matrix are proved for the first time [2].
In [876] the nature of combination is further explored, with the following main results: (1) the condition for combination in Dempster's original combination framework is stricter than that required by Dempster's combination rule in Dempster-Shafer theory; (2) some counterintuitive results of using Dempster's combination rule reported in the literature are caused by overlooking (or ignoring) the different independence conditions required by Dempster's original combination framework and Dempster's combination rule; (3) in Dempster's combination rule, combinations are performed at the target information level. This rule itself does not provide a c...

4.2.3 Families of combination rules

Lefevre’s combination rules parameterised by weighting factors The behaviour


of Dempster’s combination operator for the management of the conflict between var-
ious information sources was also criticised by Lefevre et al in [819]. As conflict in-
creases with the number of information sources, a strategy for re-assigning the con-
flicting mass (they claim) is essential. The family of combination rules they propose
distributes the conflicting mass to each proposition A in a set of subsets P = {A}, according to a weighting factor w(A, m), where m = {m_1, . . . , m_J} denotes the set of input b.p.a.s:

m(A) = m_∩(A) + m_c(A),                                                    (4.11)

where

m_c(A) = w(A, m) · m_∩(∅)  if A ∈ P,        m_c(A) = 0  otherwise

(note that Lefevre et al. mistakenly wrote A ⊆ P in their definition (17), rather than A ∈ P, since P is a collection of subsets), under the constraint that the weights are normalised: Σ_{A∈P} w(A, m) = 1.
This (weighted) family subsumes Smets’ and Yager’s rules when P = {∅} and
P = {Θ}, respectively. We get Dempster's rule when P = 2^Θ \ {∅}, with weights

w(A, m) = m_∩(A) / (1 − m_∩(∅))    ∀A ∈ 2^Θ \ {∅}.
Dubois and Prade’s operator can also be obtained by appropriately computing the
weight factors.
In addition, the authors proposed in [819] to learn the most appropriate weights for
a specific problem.
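The family (4.11) lends itself to a compact implementation. The sketch below is an illustration of ours (masses as dictionaries of frozensets): it applies the conjunctive rule and then redistributes the conflicting mass according to a user-supplied, normalised weight assignment over P.

```python
from itertools import product

def conjunctive(m1, m2):
    """Unnormalised conjunctive combination (conflict kept on the empty set)."""
    out = {}
    for (A, v1), (B, v2) in product(m1.items(), m2.items()):
        out[A & B] = out.get(A & B, 0.0) + v1 * v2
    return out

def lefevre_combination(m1, m2, weights):
    """Eq. (4.11): conjunctive combination, then redistribution of the conflicting
    mass over the propositions in P according to `weights`, a dictionary
    {proposition: weight} whose values sum to one."""
    m_conj = conjunctive(m1, m2)
    conflict = m_conj.pop(frozenset(), 0.0)
    for A, w in weights.items():
        m_conj[A] = m_conj.get(A, 0.0) + w * conflict
    return m_conj

# Special cases, for a frame Theta:
# - Smets' rule:     weights = {frozenset(): 1.0}        (conflict stays on the empty set)
# - Yager's rule:    weights = {frozenset(Theta): 1.0}   (conflict moved to the whole frame)
# - Dempster's rule: weights proportional to m_conj(A) over the non-empty subsets,
#   i.e. w(A) = m_conj(A) / (1 - m_conj(empty set)).
```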
Other authors have proposed similar conflict redistribution strategies.
A similar idea is indeed presented in [842], where the conflicting mass is distributed to every proposition according to its average degree of support. In [351] a global conflict is first calculated as the weighted average of the local conflicts; a validity coefficient is then defined to quantify the effect of conflicting evidence on the result of the combination.
Han et al [593] proposed in 2008 a modified combination rule which is based on
Ambiguity Measure (AM), a recently proposed uncertainty measure for belief func-
tions. Weight factors based on the AM of the bodies of evidence are used to reallo-
cate conflicting mass assignments.
Haenni (2002) [570] criticised Lefevre’s proposal of a parametrised combination
rule.
Lefevre et al further replied to Haenni in [822], from the point of view of the
Transferable Belief Model, as opposed to the probabilistic argumentation systems
(PAS), proposed by Haenni.

Denoeux’s families induced by t-norms and conorms Cautious and bold rules
are shown [377] to be particular members of infinite families of conjunctive and
disjunctive combination rules, based on triangular norms and conorms.
We recall that a t-norm is a commutative and associative binary operator ⊤ on the unit interval satisfying the monotonicity property

y ≤ z ⇒ x ⊤ y ≤ x ⊤ z,    ∀x, y, z ∈ [0, 1],

and the boundary condition x ⊤ 1 = x for all x ∈ [0, 1]. A t-conorm ⊥ meets the same three basic properties (commutativity, associativity, monotonicity), and differs only by the boundary condition x ⊥ 0 = x. T-norms and t-conorms are usually interpreted, respectively, as generalized conjunction and disjunction operators in fuzzy logic. Denoeux notes that the conjunctive combination is such that:

w^c_{1∩2}(A) = w^c_1(A) · w^c_2(A),                                        (4.12)

where w^c_i(A) = 1 ∧ w_i(A). New rules for combining non-dogmatic belief functions can then be defined by replacing the minimum ∧ in (4.12) by a positive t-norm (a t-conorm can replace the combination of the diffidence components of the weights [377]):

m_1 ⊛_{⊤,⊥} m_2 = ∩_{A⊂Θ} m_A^{w_1(A) ∗_{⊤,⊥} w_2(A)},                      (4.13)

where ∗_{⊤,⊥} is the following operator on (0, +∞):

x ∗_{⊤,⊥} y = x ⊤ y  if x ∨ y ≤ 1;    x ∧ y  if x ∨ y > 1 and x ∧ y ≤ 1;    (1/x ⊥ 1/y)^{-1}  otherwise,

where ⊤ is a positive t-norm and ⊥ a t-conorm, for all x, y > 0.
The cautious rule corresponds to ⊛_{∧,∨}.
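The operator ∗_{⊤,⊥} can be written down directly; the following sketch (ours) uses Python's min and max as the default t-norm and t-conorm, in which case it reduces to the weight combination used by the cautious rule.

```python
def star(x, y, tnorm=min, tconorm=max):
    """Combine two positive conjunctive weights as in (4.13): t-norm when both
    weights are <= 1, minimum when they straddle 1, and a reciprocal t-conorm
    when both exceed 1. With tnorm=min, tconorm=max this is the cautious rule's
    weight combination."""
    if max(x, y) <= 1:
        return tnorm(x, y)
    if min(x, y) <= 1:          # one weight <= 1, the other > 1
        return min(x, y)
    return 1.0 / tconorm(1.0 / x, 1.0 / y)
```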

Quasi-associative operators In [1486] quasi-associative operators are proposed


for representing a class of operators used to combine various pieces of evidence,
including averaging operators and Dempster-Shafer combining operators.

Alpha-junctions In [1249], the operators representing associative, commutative and non-interactive combinations within belief function theory are derived through axiomatic arguments. The derived operators generalise the conjunction, disjunction and exclusive disjunction cases, and are characterised by a single parameter. The original idea was that these three rules might be special cases of a more general combination scheme, which led to what are called the α-junction rules; these new rules could be extended to the combination of weighted sets, although the presentation is restricted to the domain covered by the TBM, i.e., to belief functions.
The α-junctions are thus the associative, commutative and linear operators for belief functions with a neutral element, a family of rules which includes as particular cases the unnormalised Dempster's rule and the disjunctive rule. The α-junctions, however, suffered from two main limitations: they did not have an interpretation in the general case, and computing a combination by an α-junction was difficult. In [1039] an interpretation for these rules is proposed: it is shown that the α-junctions correspond to a particular form of knowledge about the truthfulness of the sources providing the belief functions to be combined. Simple means of computing a combination by an α-junction are also laid bare.

Others Denneberg [354] studies three conditioning rules for updating non-additive measures. Two of these update rules, the Bayesian and the Dempster-Shafer ones, are extreme cases of a family of update rules [518]. In [1524] the authors introduce a family of update rules more general than that of Gilboa and Schmeidler, and show how to embed the general and Dempster-Shafer update formulae in another family of update rules.
In [1001], distances between fusion operators are measured using a class of random belief functions. Via similarity analysis, the structure of this family is extracted for two and three information sources. The conjunctive operator, quick and associative but very isolated on a large discernment space, and the arithmetic mean are identified as outliers, while the hybrid method and six proportional conflict-redistributing (PCR) rules form a continuum; the hybrid method is shown to be central to the family of fusion methods.
Instead of introducing yet another rule, the authors of [721] propose to use existing ones as part of a hierarchical and conditional combination scheme. The sources are represented by mass functions, which are analysed and labelled with respect to unreliability and imprecision. This conditional step divides the problem into specific sub-problems; in each of them the number of constraints is reduced and an appropriate rule is selected and applied. Two functions are thus obtained and analysed, allowing another rule to be chosen for a second (and final) fusion level. This approach provides a fast and robust way of combining disrupted sources using contextual information brought by a particle filter.
In [150], on the basis of random set theory, a unified formulation of combination rules is presented which can describe most classical combination rules besides Dempster's rule, and which attempts to provide original ideas for constructing more applicable and effective combination rules. Finally, by means of this formulation, a new combination rule is constructed to overcome a class of counterintuitive phenomena pointed out by Lotfi Zadeh.

4.2.4 Combination of dependent evidence

Su (2016) [1310] proposes an improved method for combining dependent bodies


of evidence which takes the significance of the common information sources into
consideration. The method is based on the significance weighting operation and a
‘decombination’ operation. A numerical example is illustrated to show the use and
effectiveness of the proposed method.
Cattaneo [957] considers the problem of combining belief functions obtained from not necessarily independent sources of information. He introduces two combination rules for the situation in which no assumption is made about the dependence of the information sources; these two rules are based on cautious combinations of plausibility and commonality functions, respectively. The paper studies the properties of these rules and their connection with Dempster's rules of conditioning and combination and with the minimum rule of possibility theory.

4.2.5 Combination of conflicting evidence

As we learned from the debate on Dempster’s rule (Section ??), most criticisms of
the original combination operator focus on its behaviour in situations in which the
various pieces of evidence are highly conflicting.
Several solutions have been proposed: the TBM solution, in which masses are not renormalised and the conflict is stored in the mass given to the empty set; Yager's solution [], in which the conflict is transferred to the whole frame; and Dubois and Prade's solution [], in which the masses resulting from pairs of conflicting focal elements are transferred to the union of these subsets.
The jungle of combination rules that have been proposed as a result of the conflict problem was discussed by Smets (2007) [1260], who examined the nature of the combinations (conjunctive versus disjunctive, revision versus updating, static versus dynamic data fusion), argued in favour of normalisation, examined the possible origins of the conflicts, determined whether a combination is justified and analysed many of the proposed solutions.
The most relevant work on the issue of conflict is probably [974]. There, Murphy (2000) presented the problem of failing to balance multiple pieces of evidence, illustrated the proposed solutions and described their limitations. Of the proposed methods, averaging best solves the normalisation problems, but it offers neither convergence toward certainty nor a probabilistic basis; to achieve convergence, that work suggests incorporating average belief into the combination rule.
Liu (2006) [874] provides a formal definition of when two basic belief assignments are in conflict, using both a quantitative measure of the mass of the combined belief assigned to the empty set before normalisation and the distance between the betting commitments of the two beliefs. She argued that only when both measures are high is it safe to say that the evidence is in conflict.
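For illustration, Liu's two quantities can be computed as in the following sketch (ours, under the usual dictionary-of-frozensets representation); the 'distance between betting commitments' is taken here as the maximal difference of pignistic probabilities over the subsets of the frame.

```python
from itertools import chain, combinations, product

def betp(m, theta):
    """Pignistic probability of a normalised BBA (no mass on the empty set)."""
    p = {x: 0.0 for x in theta}
    for A, v in m.items():
        for x in A:
            p[x] += v / len(A)
    return p

def liu_conflict(m1, m2, theta):
    """Liu's two-dimensional conflict measure: (mass assigned to the empty set
    by the conjunctive combination, maximal difference of betting commitments).
    High values of *both* components signal genuine conflict."""
    empty_mass = sum(v1 * v2 for (A, v1), (B, v2)
                     in product(m1.items(), m2.items()) if not (A & B))
    p1, p2 = betp(m1, theta), betp(m2, theta)
    subsets = chain.from_iterable(combinations(theta, r)
                                  for r in range(1, len(theta) + 1))
    dif_betp = max(abs(sum(p1[x] for x in A) - sum(p2[x] for x in A))
                   for A in subsets)
    return empty_mass, dif_betp
```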
In [736] a new definition of consistency is introduced and applied to the theory
of evidence.
The authors of [914] propose some alternative measures of conflict based on the distance between belief functions. These measures of conflict are further used for an a posteriori estimation of the relative reliability of the sources of information, which does not need any training or prior knowledge.
In [821] combination operators which allow an arbitrary redistribution of the
conflicting mass on the propositions are proposed.
In [315, 305] alternative ways of distributing the contradiction among the non-empty subsets of the frame of discernment are studied. The papers employ a new approach to understanding contradictions, and introduce an original notion of potential contradiction. A method for the associative combination of generalised belief functions, the minC combination, and its derivation are presented as part of the new approach.
In [958] the idea is that each piece of evidence is discounted in proportion to
the degree that it contributes to the conflict. Discounting is performed in a sequence
of incremental steps, with conflict updated at each step, until the overall conflict is
brought down exactly to a predefined acceptable level.

Uncertainty in the Dempster-Shafer framework can be considered to consist of two components, one arising from randomness and the other from the lack of specificity in the evidence; one facet of uncertainty due to randomness is the conflict in the evidence. In [512] the need for a new measure of conflict is established and a fresh approach is followed to achieve it. The conflict between propositions is viewed as being proportional to the 'distance' between them, which is expected to obey the laws of a metric. With this motivation, a set of axioms that such a metric distance should satisfy in order to quantify the conflict between two propositions is formulated, and a unique expression for the conflict between propositions is derived from them. The average of the conflict between propositions gives a measure of the total conflict in the body of evidence; various properties of this conflict measure are then proved.
A new generalised proportional conflict redistribution rule is presented and discussed in [917]. The Dezert-Smarandache extension of Dempster-Shafer theory has relaunched the study of combination rules, especially for the management of conflict. The authors study different combination rules and compare them in terms of the resulting decisions, both on a didactic example and on generated data: indeed, in real applications a reliable decision is needed and it is the final result that matters. The chapter shows that a fine proportional conflict redistribution rule is to be preferred for combination in belief function theory.
In [1548] the authors first review many improved combination rules and analyse the reasons for the drawbacks of Dempster's rule mentioned above. In addition, a new kind of method is proposed to solve the problem of high conflict, and two concrete algorithms are introduced, with and without a prior.
A review and comparison of the currently principal reasoning methods is presented in [561], where the corresponding theoretical frameworks based on the different methods are constructed and their merits, drawbacks and mutual relationships are analysed.
To overcome the shortcomings of the conventional measure criterion of evidence conflict and to solve the invalidation problem of the Dempster-Shafer combination rule under high conflict, a novel measure criterion of evidence conflict is defined in [?] through the pignistic transformation. Based on this definition, a new improved D-S algorithm is proposed there. Following the decision principle that the minority is subordinate to the majority, the proposed algorithm first preprocesses the evidence by introducing weight coefficients which represent the importance of each piece of evidence, and then combines the preprocessed evidence using Dempster's rule.
The authors of [1447] introduce the idea of grey relational analysis (GRA), and propose a new conflict reassignment approach for belief functions, used as a preprocessing method to automatically identify and reassign the conflicts between belief functions before combination. The proposed approach can automatically evaluate the reliability of the information sources and single out the unreliable information.
In [1116] it is demonstrated that it is possible to manage intelligence in constant time, as a pre-process to information fusion, through a series of processes dealing with issues such as clustering reports, ranking reports with respect to importance, extracting prototypes from clusters and immediately classifying newly arriving intelligence reports. These methods are used when intelligence reports arrive which concern different events that should be handled independently, and it is not known a priori to which event each report is related. Clustering is run as a back-end process to partition the intelligence into subsets representing the events while, in parallel, a fast classification runs as a front-end process in order to put the newly arriving intelligence into its correct information fusion process.
The contribution [316] deals with conflicts of belief functions. Internal conflicts of belief functions and conflicts between belief functions are described and analysed, and differences between belief functions are distinguished from conflicts between them. Three new approaches to conflict are presented (combinational, plausibility-based and comparative), and compared with Liu's interpretation of conflict.
In [1446] a method was developed for dealing with seriously conflicting evidence, for cases in which the Dempster-Shafer combination result cannot identify the actual conditions. The method exploits the advantages of Dempster-Shafer evidence theory and an additive strategy: the conflicting evidence is first verified, and the additive strategy is then used to modify the properties of the conflicting evidence.
In [915] it is noted that the mass assigned to the empty set by the conjunctive combination rule is generally considered as conflict, but is not really such. Some existing measures of conflict are recalled, and counter-intuitive examples for them are shown; a conflict measure based on a set of expected properties is therefore defined, built from the distance-based conflict measure weighted by a degree of inclusion introduced in that paper.
As noted in [379], the problem of measuring the conflict between two bodies of evidence represented by belief functions has recently attracted renewed interest, and in most related work Dempster's rule plays a central role. In that paper the notion of conflict is studied from a different perspective: the authors start by examining consistency and conflict on sets, extract from this setting the basic properties that measures of consistency and conflict should have, and then extend this basic scheme to belief functions in different ways. In particular, no a priori assumption about the (in)dependence of the sources is made, and such assumptions are only considered as possible additional information.
Recently, two new approaches to measuring the conflict among belief functions were proposed in [JGB01, Liu06]: the former provides a distance-based method to quantify how close a pair of beliefs is, while the latter deploys a pair of values to reveal the degree of conflict of two belief functions. In possibility theory, on the other hand, this is done by measuring the degree of inconsistency of the merged information; however, this measure is not sufficient when pairs of uncertain pieces of information have the same degree of inconsistency, and at present there are no other alternatives able to further differentiate them, except an initiative based on coherence intervals ([HL05a, HL05b]). In [872] it is investigated how the two new approaches developed in DS theory can be used to measure the conflict among possibilistic uncertain information.
The authors of [678] first bring out the limitations of the combination rule introduced by Zhang [16]. Subsequently, they focus their study on two other rules: the first was proposed by Dubois and Prade [2, 3] and is known as the disjunctive rule of combination (incidentally, this rule also appeared in Hau and Kashyap's work [5]); the other combination rule is due to Yager [13]. Even though these rules are robust, it is shown that in some cases they treat the pieces of evidence asymmetrically and give counterintuitive results. A combination rule which does not have these drawbacks is then proposed.
[497] presents an improved D-S algorithm, which verifies and modifies the con-
flicting evidence.
The conjunctive combination has interesting properties, such as commutativity and associativity; however, it is characterised by having the empty set, also called the conflict, as an absorbing element. Thus, when a significant number of conjunctive combinations are applied, the mass assigned to the conflict tends to 1, which makes it impossible to distinguish between a genuine problem arising during the fusion and the effect of the absorbing power of the empty set.
The objective of [824] is then to define a formalism preserving the initial role of the conflict as an alarm signal announcing that there is some kind of disagreement between the sources. More precisely, it allows some conflict to be preserved after the fusion, by keeping only the part of the conflict which reflects the opposition between the belief functions. The approach is based on dissimilarity measures and on a normalisation process between belief functions.
In [382] the authors propose to revisit conflict from a different perspective: they do not make a priori assumptions about dependencies, and start from the definition of conflicting sets, studying its possible extensions to the framework of belief functions.
Current research shows that it is very important to define new conflict coefficients to determine the degree of conflict between two or more pieces of evidence. Evidential sources of information are considered in [633], and the definition of a conflict measure function (CMF) is proposed for selecting useful CMFs in subsequent fusion steps when sources are available at each instant: firstly, the definition and theorems of the CMF are put forward; secondly, some typical CMFs are extended and new CMFs are proposed.
The paper [388] compares the expressions obtained from the analysis of a problem involving conflicting evidence when using Dempster's rule of combination and when using conditional probabilities. Several results are obtained showing if and when the two methodologies produce the same results; the role played by the normalising constant is shown to be tied to the prior probability of the hypothesis if equality is to occur, which forces further relationships between the conditional probabilities and the prior. Ways of incorporating prior information into the belief function framework are explored and the results analysed. Finally, a new method for combining conflicting evidence in a belief function framework is proposed, which produces results more closely resembling the probabilistic ones.
Conflict management is a major problem, especially in the fusion of many information sources. In [510] existing combination rules are analysed and compared, and a new approach to managing conflict, called Local Conflict Management, is proposed, which overcomes the shortcomings of Dempster's rule and other rules.
In [1126] an internal conflict of a belief function is defined and derived. The belief function in question is decomposed into a set of generalised simple support functions (GSSFs); removing the single GSSF supporting the empty set, the remaining GSSFs are taken as the base of the belief function, and combining all the GSSFs of the base set yields, by definition, a base belief function. The conflict arising in Dempster's rule when combining the base set is then defined as the internal conflict of the belief function. Previously, the conflict of Dempster's rule had been used as a distance measure only between consonant belief functions, on a conceptual level modelling the disagreement between two sources; using the internal conflict of a belief function, this can be extended to non-consonant belief functions as well.
Non-conflicting and conflicting parts of belief functions are introduced in [310]. The unique decomposition of a belief function defined on a two-element frame of discernment into a non-conflicting and an indecisive conflicting belief function is presented; several basic statements about the algebra of belief functions on a general finite frame of discernment are introduced, and the unique non-conflicting part of a b.f. on an n-element frame of discernment is presented there.
In [823] a formalism allowing the preservation of the part of the conflict which reflects the opposition between sources is introduced.
In [670] evidence conflict and belief convergence are investigated based on an analysis of the degree of coherence between two sources of evidence. Moreover, a stochastic interpretation of basic probability assignments (BPAs) is illustrated, and a few methods for dealing with evidence conflict are analysed and compared. A new paradox combination algorithm, based on an absolute difference factor between two pieces of evidence and a relative difference factor between two pieces of evidence for a specific hypothesis, is then proposed, taking into account local attributions to the local conflict.
[1069] Schematic conflict occurs when evidence is interpreted in different ways
(for example, by different people, who have learned to approach the given ev-
idence with different schemata). Such conflicts are resolved either by weighting
some schemata more heavily than others, or by finding common-ground inferences
for several schemata, or by a combination of these two processes. Belief func-
tions, interpreted as representations of evidence strength, provide a natural model
for weighting schemata, and can be utilized in several distinct ways to compute
common-ground inferences. In two examples, different computations seem to be re-
quired for reasonable common-ground inference. In the first, competing scientific
theories produce distinct, logically independent inferences based on the same data.
In this example, the simple product of the competing belief functions is a plausible
evaluation of common ground. In the second example (sensitivity analysis), the con-
flict is among alternative statistical assumptions. Here, a product of belief functions


will not do, but the upper envelope of normalized likelihood functions provides a
reasonable definition of common ground. Different inference contexts thus seem
to require different methods of conflict resolution. A class of such methods is de-
scribed, and one characteristic property of this class is proved.
The study in [309] also deals with conflicts of belief functions: internal conflicts of belief functions and conflicts between belief functions are described and analysed, differences between belief functions are distinguished from conflicts between them, and three different approaches to conflict (combinational, plausibility-based and comparative) are presented and compared with Liu's interpretation of conflict.

Schubert’s work In a series of papers ([1117],[1119],[1120],[1122]) J. Schubert


established within the framework of the ToE a criterion function called the metaconflict function. With this criterion, he is able to partition into subsets a set of several
pieces of evidence with propositions that are weakly specified, in the sense that it
may be uncertain to which event a proposition is referring. Finally, each subset in
the partition represents a separate event.
For example, suppose there are several submarines and a number of intelligence reports each referring to one of them: we want to analyse the reports referring to different submarines separately. The conflict between the propositions of two intelligence reports is used as the probability that the two reports relate to distinct targets.
In the general case, the metaconflict function comes from the plausibility that
the partitioning is correct when viewing the conflict in Dempster’s rule as meta-
evidence.
In [1125] the idea is that each piece of evidence is discounted in proportion to
the degree that it contributes to the conflict. Discounting is performed in a sequence
of incremental steps, with conflict updated at each step, until the overall conflict is
brought down exactly to a predefined acceptable level.

4.2.6 Combination of (un)reliability of sources of evidence, discounting

In [458] a method is developed for evaluating the reliability of a sensor when considered alone, based on finding the discounting factor minimising the distance between the pignistic probabilities computed from the discounted beliefs and the actual values of the data. A method is then developed for assessing the reliability of several sensors that are supposed to work jointly and whose readings are aggregated: the discounting factors are computed by minimising the distance between the pignistic probabilities computed from the combined discounted belief functions and the actual values of the data.
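As an illustration of this kind of procedure, the following sketch (ours; the grid search is a simple stand-in for the actual optimisation used in [458]) discounts a mass function à la Shafer and selects the discount rate whose pignistic probability best matches the observed outcomes.

```python
def discount(m, alpha, theta):
    """Shafer's discounting: scale masses by (1 - alpha), move the rest to Theta."""
    full = frozenset(theta)
    out = {A: (1 - alpha) * v for A, v in m.items() if A != full}
    out[full] = (1 - alpha) * m.get(full, 0.0) + alpha
    return out

def betp(m, theta):
    """Pignistic probability of a normalised BBA."""
    p = {x: 0.0 for x in theta}
    for A, v in m.items():
        for x in A:
            p[x] += v / len(A)
    return p

def estimate_discount_rate(m, truth, theta, steps=1000):
    """Grid search for the discount rate minimising the squared distance between
    the pignistic probability of the discounted BBA and the observed outcome
    `truth` (a {outcome: 0/1} dictionary)."""
    best_alpha, best_err = 0.0, float('inf')
    for i in range(steps + 1):
        alpha = i / steps
        p = betp(discount(m, alpha, theta), theta)
        err = sum((p[x] - truth[x]) ** 2 for x in theta)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha
```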
In [939] an extension of the discounting operation is proposed, allowing one to use more detailed information regarding the reliability of the source in different contexts, i.e., conditionally on different hypotheses regarding the variable of interest. This results in a contextual discounting operation parameterised by a vector of discount rates.
Z. Liu et al (2011) [498] analyzed the problem of weighing different sources
of evidence, proposing a distance and a conflict coefficient based on the pignistic
transform to characterize the dissimilarity. A new estimation method of weighting
factors which uses the proposed dissimilarity measure is presented.
When the independence of the sources is questionable, [?] suggests using the least specific combination minimising the conflict, among those allowed by a simple generalisation of Dempster's rule. This increases the monotonicity of the reasoning and helps manage situations of dependence.
In [916] a general approach to information correction and fusion for belief functions is proposed, for the case in which not only may the information items be irrelevant, but the sources may lie as well. A new correction scheme is introduced which takes into account uncertain metaknowledge about the sources' relevance and truthfulness, and which generalises Shafer's discounting operation. It is then shown how to reinterpret all the connectives of Boolean logic in terms of assumptions about the behaviour of the sources with respect to relevance and truthfulness, leading to a generalisation of the unnormalised Dempster's rule to all Boolean connectives which takes into account the uncertainty pertaining to those assumptions. Eventually, this approach is further extended to an even more general setting, where the assumptions about source behaviour need not be restricted to relevance and truthfulness. The commutativity of the correction and fusion processes, when the behaviours of the sources are independent, is also established.
The paper [1208] presents a new approach to combining sources of evidence with different 'importances', as distinguished from their 'reliabilities', which are treated using classical Shafer discounting, arguing that in multicriteria decision contexts these notions should be clearly distinguished.
[577] proposes a general model of partially reliable sources which includes sev-
eral previously known results as special cases.
Different operations can be used in the theory of belief functions to correct the
information provided by a source, given meta-knowledge about that source. Exam-
ples of such operations are discounting, de-discounting, extended discounting and
contextual discounting. In [937] new interpretations of these schemes are proposed,
and two families of belief function correction mechanisms are introduced and jus-
tified. The first family generalizes previous non-contextual discounting operations,
whereas the second generalizes the contextual discounting.
The author of [720] points out that the key problems related to discounting are which sources must be discounted, and to what degree; a method is proposed that jointly tackles these two issues by computing the rates using a 'dissent' measure.
In [938] an extension of the discounting operation in the TBM is proposed,
allowing to make use of more detailed information regarding the reliability of the
source in different contexts, a context being defined as a subset of the frame of
discernment.

The point is made in [1041] that different pieces of evidence may have different reliability: by weighting evidence according to its reliability, the effect of unreliable evidence is reduced.
[936] presents an objective way of assessing the reliability of a sensor, or of an expert expressing its opinion by means of a belief function. Using contextual discounting, labelled data and an error function, the authors generalise an approach proposed by Elouedi, Mellouli and Smets (2004).
[1556] extends a conventional discounting scheme commonly used with the
Dempster-Shafer evidential reasoning to deal with conflict.
In [935] a new interpretation of the de-discounting operation, introduced by Denoeux and Smets as the inverse of the discounting operation, is presented. A more general form of reinforcement process, as well as a parameterised family of transformations encompassing all previous schemes, are also introduced.
In [332] the authors propose to estimate discounting factors from the conflict
arising between sources, and from past knowledge about the qualities of these
sources. Under the assumption that conflict is generated by defective sources, an
algorithm is proposed for detecting them and mitigating the problem.
[555] discusses two of the basic operations on evidential functions, the discounting operation and the well-known orthogonal sum. It is shown that the discounting operation is not commutative with the orthogonal sum, and expressions for the two operations applied to the various evidential functions are derived.
In [913] it is argued that, in a belief function application, the definition of the basic belief assignments and the tasks of reducing the number of focal elements, discounting, combination and decision must be thought of at the same time, since these tasks can be seen as a single general process of belief transfer. A second aspect of the paper is the introduction of reliability directly into the combination rule, rather than beforehand: whereas, in general, the discounting process uses a discounting factor which is a reliability factor of the sources, the authors propose to include in the combination rule an estimate of reliability based on a local conflict estimation.
The paper [490] investigates the conjunctive combination of belief functions from dependent sources based on the cautious conjunctive rule (CCR). The weight functions in the canonical decomposition of a belief function are divided into two parts, namely positive and negative weight functions, whose characteristics are described. The positive and negative weight functions of two belief functions are used to construct a new partial ordering between belief functions, which determines the committed relationship between them; this differs from the ordering generated by the weight-based partial ordering used in the CCR when one or both belief functions are not unnormalised separable. A new rule for combining belief functions from dependent sources is then developed using the constructed partial ordering.

4.3 Conditioning
We will first review the most significant contributions to the definition of conditional
belief functions; then, we will review the work done on the generalisation of the law
of total probability (also called “Jeffrey's rule”) to belief functions; finally, we will summarise the basics of Smets' Generalised Bayes Theorem (GBT).

4.3.1 Conditional belief functions

A number of different approaches to conditioning belief functions have been pro-


posed [169, 464, 661, 516, 353, 1526, 651]. Here we review some of the most sig-
nificant proposals: the original Dempster’s conditioning, Fagin and Halpern’s lower
and upper envelopes of conditional probabilities, Suppes’ geometric conditioning,
Smets’ unnormalized conditioning, and Spies’ sets of equivalent events under multi-
valued mappings.

Dempster’s conditioning Recall that in Dempster’s approach, conditional belief


functions with respect to an arbitrary event A are obtained by simply combining the
original b.f. with a “categorical” (in Smets’ terminology) or “logical” belief function
focussed on A, by means of Dempster’s rule of combination: b(.|A) = b⊕bA (Figure
??).

Fig. 4.2. Dempster’s conditioning: an example.

Lower and upper conditional envelopes Fagin and Halpern [464] proposed an
approach based on the credal (robust Bayesian) interpretation of belief functions, as
lower envelopes of a family of probability distributions:

Bel(A) = inf P (A)


P ∈P[Bel]

They defined the conditional belief function associated with Bel as the lower enve-
lope (that is, the infimum) of the family of conditional probability functions P (A|B),
where P is consistent with Bel:
. .
Bel(A|B) = inf P (A|B), P l(A|B) = sup P (A|B).
P ∈P[Bel] P ∈P[Bel]
4.3 Conditioning 125

It is straightforward to see that this definition reduces to that of conditional prob-


ability whenever the initial b.f. is Bayesian, i.e., its credal set consists of a single
probability measure.
This notion has been considered by other authors too, e.g. Dempster [1967] and
Walley [1981], and is quite related to the concept of inner measure (see Section
3.1.3). Kyburg [802] had also analyzed the links between Dempster’s conditioning
of belief functions and Bayesian conditioning of closed, convex sets of probabilities
(of which belief functions are a special case). He arrived at the conclusion that
the probability intervals generated by Dempster updating were included in those
generated by Bayesian updating.
Indeed, the following closed form expressions for conditional belief and plausi-
bility in the credal approach can be proven:
Bel(A∩B) P l(A∩B)
Bel(A|B) = Bel(A∩B+P l(Ā∩B)
, P l(A|B) = P l(A∩B)+Bel(Ā∩B)

Note that lower/upper envelopes of arbitrary sets of probabilities are not in gen-
eral belief functions, but these actually are, as Fagin and Halpern have proven. A
direct comparison shows that they are quite different from the results of Dempster’s
conditioning:
Bel(A∪B̄) P l(A∩B)
Bel⊕ (A|B) = 1−Bel(B̄)
, P l⊕ (A|B) = P l(B) .

In fact, they provide a more conservative estimate, as the associated probability


interval is included in that resulting from Dempster’s conditioning:

Bel(A|B) ≤ Bel⊕ (A|B) ≤ P l⊕ (A|B) ≤ P l(A|B).

Fagin and Halpern argued that Dempster’s conditioning behaves unreasonably in


the context of their “three prisoners” example.

Geometric versus unnormalised conditioning One way of dealing with the


Bayesian criticism of Dempster’s rule is to abandon all notions of multivalued map-
ping to define belief directly in terms of basis belief assignments, as in Smets’ trans-
ferable belief model [1225].
In [1243], Smets pointed out the distinction between revision and focussing in
the conditional process, following previous work by Dubois and Prade. In probabil-
ity theory, although they are conceptually different operations, both are expressed
by Bayes’ rule: as a consequence this distinction is not given as much attention
as in other theories of uncertainties. Indeed, in belief theory these principles lead
to different conditioning rules. The application of revision and focussing to belief
theory has been explored by Smets in his Transferable Belief Model (TBM) - thus
no random set generating Bel, nor any underlying convex sets of probabilities are
assumed to exist.
In focussing no new information is introduced, as we merely focus on a specific
subset of the original set. When applied to belief functions, this yield Suppes and
Zanotti’s geometric conditioning [1319]:
126 4 Reasoning with belief functions

Bel(A ∩ B)
BelG (A|B) = .
Bel(B)

This was proved by Smets’ using the “probability of provability” interpretation of


belief functions (Section ??). We can note that geometric conditioning is somewhat
dual to Dempster’s conditioning, as it replaces probability with belief in Bayes’
rule - remember that Dempster’s rule dually replaces probability with plausibility in
Bayes’ rule:
P l(A∩B) Bel(A∩B)
P l⊕ (A|B) = P l(B) ↔ BelG (A|B) = Bel(B) .

As oppose to focussing, in belief revision a state of belief is modified to take into


account a new piece of information. This results in Smets’ unnormalized conditional
belief function BelU (.|B) with mass assignment7 :
 X
 m(A ∪ X) if A ⊆ B
mU (.|B) = X⊆B c
0 elsewhere

Remember that in the TBM belief functions which assign mass to ∅ can exist, under
the “open world” assumption). In terms of plausibilities the rule goes P lU (A|B) =
P l(A ∩ B) - in the TBM the mass m(A) is transferred by conditioning on B to
A ∩ B. BelU (.|B) is also the minimal commitment specialization of Bel such that
P l(B c |B) = 0 [718].

Conditional events as equivalence classes On the other side, Spies [1288] es-
tablished a link between conditional events and discrete random sets. Conditional
events were defined as sets of equivalent events under the conditioning relation. By
applying to them a multivalued mapping (see Section ??) he gave a new definition
of conditional belief function. This yields an intriguing approach to conditioning,
which lies within the random set interpretation. Finally, an updating rule (that is
equivalent to the law of total probability is all beliefs are probabilities) was intro-
duced.
Namely, let (C, F, P ) and Γ : C → 2Ω the source probability space and ran-
dom set, respectively. The null sets for P (.|A) are the collection of events with
conditional probability 0: N (P (.|A)) = {B ∈ A : P (B|A) = 0}. Let 4 be the
symmetric difference A4B = (A ∩ B̄) ∪ (Ā ∩ B) of two sets A, B. We can prove
that:
Lemma 1. If ∃Z ∈ A s.t. B, C ∈ Z4N (PA ) then P (B|A) = P (C|A).
In rough words, two events have the same conditional probability if they both are the
symmetric difference between a same event and some null set. We can then define
conditional events as the following equivalence classes:
7
Author’s notation.
4.3 Conditioning 127

Definition 40. A conditional event [B|A] with A, B ⊆ Ω is a set of events with the
same conditional probability P (B|A):

[B|A] = B4N (PA )

It can also be proven that [B|A] = {C : A ∩ B ⊆ C ⊆ Ā ∪ B}. We can now


define a conditional multivalued mapping for B ⊆ Ω as: ΓB (c) = [Γ (c)|B], where
Γ : C → 2Ω . In other words, if A = Γ (c), ΓB maps c to [A|B]. As a consequence,
to all elements of each conditioning event (an equivalence class) must be assigned
equal belief/plausibility, and a conditional belief function is a “second-order” be-
lief function with values on collections of focal elements (the conditional events
themselves).
Definition 41. Given a belief function Bel, the conditional b.f. given B ⊆ Ω is:
1 X
Bel([C|B]) = P ({c : ΓB (c) = [C|B]}) = m(A).
K
A∈[C|B]

It is important to realise that, in this definition, a conditional b.f. is not a belief


function on the sub-algebra {Y = C ∩ B, C ⊆ Ω}. It can be proven that Spies’
conditional belief functions are closed under Dempster’s rule of combination, and
therefore, once again, coherent with the random set interpretation of the theory.
Slobodova also conducted some early studies on the issue of conditioning. In
particular, a multi-valued extension of conditional b.f.s was introduced [1201], and
its properties examined. In [1202] she described how conditional belief functions
(defined as in Spies’ approach) fit in the framework of valuation-based systems.

Other work Klopotek and Wierzchon [733] provided a frequency-based interpre-


tation for conditional belief functions. More recently, Tang and Zheng [1327] also
discussed the issue of conditioning in a multi-dimensional space. Quite recently,
Lehrer [827] proposed a geometric approach to determine the conditional expec-
tation of non-additive probabilities. Such conditional expectation was then applied
for updating, whenever information became available, and to introduce a notion of
independence.

4.3.2 The generalised Bayes theorem

Philippe Smets tackled in a significant work of his the problem of generalising


Bayes’ theorem to belief calculus.
Consider a conditional probability P (x|θi ) over observations x ∈ X, and an a-
priori probability P0 over a set of hidden variables θi ∈ Θ (for instance, as in Smets’
work on medical diagnosis, x is a symptom and θi a disease). After observing x, the
probability distribution on Θ is updated to the posterior via Bayes’s theorem:

P (x|θi )P0 (θi )


P (θi |x) = P ∀θj ∈ Θ
j P (x|θj )P0 (θj )
128 4 Reasoning with belief functions

The Generalised Bayes Theorem (GBT) is indeed a generalisation of Bayes’ the-


orem for conditional belief functions, when the a-priori b.f. on Θ is vacuous, and
Dempster’s normalised/unnormalised conditioning is assumed.

Cognitive independence Consider a belief function over the product space X × Y


of two variables. The latter are said to be cognitively independent if

plX×Y (x ∩ y) = plX (x)plY (y) ∀x ⊆ X, y ⊆ Y.

Cognitive independence generalises stochastic independence, and can be esily ex-


tended to conditional independence, as:

plX×Y (x ∩ y|θi ) = plX (x|θi )plY (y|θi ) ∀x, y, θi .

This implies that the ratio of plausibility/belief on X does not depend on Y :

plX (x1 |y) plX (x1 ) BelX (x1 |y) BelX (x1 )
= , = .
plX (x2 |y) plX (x2 ) BelX (x2 |y) BelX (x2 )
Generalised Likelihood Principle The Likelihood Principle requires the likeli-
hood of an hypothesis given the data to be equal to the conditional probability of the
data given the hypothesis. Namely:

l(θi |x) = p(x|θi )

and, for unions of singleton hypotheses:


n o
l(θ = {θ1 , ..., θk }|x) = max l(θi |x) : θi ∈ θ .

Shafer’s proposed somethings similar in his likelihood-based approach to statis-


tical inference (see Section 4.1.1): pl(θ|x) = maxθi ∈θ pl(θi |x)0 . This, however,
was rejected by Smets, for not satisfying the requirement that, if two pieces of ev-
idence are conditionally independent, BelΘ (.|x, y) is the conjunctive combination
of BelΘ (.|x) and BelΘ (.|y).
In opposition, he proposed the following Generalised Likelihood Principle:
1. plΘ (θ|x) = plX (x|θ);
2. for all x, θ the plausibility of data pl(x|θ) given a compound hypothesis θ =
{θ1 , ..., θm } is a function of the following variables only:

{pl(x|θi ), pl(x̄|θi ) : θi ∈ θ}.

Note that the form of the function is not assumed (not necessarily the max, as in the
original likelihood principle). Both pl(x|θi ) and pl(x̄|θi ) are necessary, according
to Smets, to account for the non-addivitivity of belief functions.
The GLP is justified by the two following requirements: (1) pl(x|θ) remains the
same on the coarsening of X formed by just x and x̄, and (2) the plausibilities of
the θj 6∈ θ are irrelevant for the computation of pl(x|θ).
4.3 Conditioning 129

Generalised Bayes Theorem Under conditional cognitive independence and item


(2) of the Generalised Likelihood Principle, BelX (.|θ), θ ⊂ Θ is generated from
the {BelX (.|θi ), θi ∈ Θ} by disjunctive combination (??):
Y Y
P lX (x|θ) = 1 − (1 − P lX (x|θi )), BelX (x|θ) = BelX (x|θi ).
θi ∈θ θi ∈θ

Then, condition (1) of the GLP plΘ (θ|x) = plX (x|θ) directly yields Smets’ Gener-
alised Bayes Theorem:
1 Y 
P lΘ (θ|x) = 1− (1 − plX (x|θi ))
K
θi ∈θ
1Y Y 
BelΘ (θ|x) = BelX (x̄|θi ) − BelX (x̄|θi ) ,
K
θi ∈θ̄ θi ∈Θ

Q
where K = 1 − θi ∈Θ (1 − plX (x|θi )). Formulas for unnormalised belief functions
are also provided.

4.3.3 Generalising total probability

A number of researchers have been working on the generalisation to belief functions


of a fundamental result of probability theory: the law of total probability or ‘Jeffrey’s
rule’.

Problem statement Suppose P is defined on a σ-algebra A, and that a new prob-


ability measure P 0 on a sub-algebra B of A. We seek an updated probability P 00
which:
– meets the probability values specified by P 0 for events in the sub-algebra B;
– is such that ∀ B ∈ B, X, Y ⊂ B, X, Y ∈ A
(
P (X)
P 00 (X) P (Y ) if P (Y ) > 0
00
=
P (Y ) 0 if P (Y ) = 0.

It can be proven that there is a unique solution to the above problem, given by
Jeffrey’s rule: X
P 00 (A) = P (A|B)P 0 (B).
B∈B

The most compelling interpretation of such a scenario if that the initial probability
measure stands corrected by the second one on a number of events (but not all).
Therefore, the law of total probability generalises standard conditioning, which is
just the special case in which P 0 (B) = 1 for some B and the sub-algebra B reduced
to a single event B.
130 4 Reasoning with belief functions

Spies’ solution Spies has proven the existence of a solution to the generalisation
of Jeffrey’s rule to belief functions, within his conditioning frameworks (Section
4.3.1). The problem generalises as follows.
Let Π = {B1 , ..., Bn } a disjoint partition of Ω, and:
– m1 , ..., mn are the mass assignments of a collection of conditional belief function
Bel1 , ..., Beln on B1 , ..., Bn respectively;
– mB is the mass of an unconditional belief function BelB on the coarsening asso-
ciated with the partition Π.
Then:
Proposition 24. The belief function Beltot : 2Ω → [0, 1] with
X
Beltot (A) = (mB ⊕ ⊕ni mBi ) (C)
C⊆A

is a marginal belief function on Ω, such that Beltot (.|Bi ) = Beli ∀i and the
marginalisation of Beltot to the partition Π coincides with BelB . Furthermore,
if all the belief functions involved are probabilities Beltot reduces to the result of
Jeffrey’s rule of total probability.
The bottom line of Proposition 24 is that by combining the a-priori with all the
conditionals we get an admissible marginal which generalised total probability.
Whether this is this the only admissible solution to the problem will be discussed
in Section ??.

Smets’ generalisations of Jeffrey’s rule Philippe Smets also proposed generalisa-


tions of Jeffrey’s rule based on geometric and Dempster’s conditioning, respectively.
Let B(A) the smallest element of B containing A (the upper approximation of
A in rough set theory, see Section ??), and let B(A) the set of As which share the
same B(A). We wish an overall BF Bel00 such that:
– Bel00 (B) = Bel0 (B) for all B ∈ B;
– be such that ∀ B ∈ B, X, Y ⊂ B, X, Y ∈ A
(
Bel(X|B)
Bel0 (X) Bel(Y |B) if Bel(Y |B) > 0
00
= (4.14)
Bel (Y ) 0 if Bel(Y |B) = 0

where Bel(.|B) is defined by either geometric conditioning or Dempster’s condi-


tioning.
In the former case we obtain Smets’ Jeffrey-geometric rule of conditioning,
which reads as, remembering that BelG (X|B) Bel(X)
BelG (y|B) = Bel(Y ) :

m(A) X
mJG (A) = P m0 (B(A)) ∀A ∈ A s.t. m(X) 6= 0;
X∈B(A) m(X) X∈B(A)
4.4 Efficient computation 131

mJG (A) = 0 if A is such that X∈B(A) m(X) = 0. Whenever m0 (B) = 1 for a


P
single B (Bel’ is “categorical” or “logical”), this reduces to geometric conditioning.
When plugging Dempster’s conditioning into (4.14) we get Jeffrey-Dempster’s
rule of conditioning:

m(A|B(A)) X
mJD (A) = P m0 (B(A)) ∀A ∈ A s.t. m(X|B(A)) 6= 0,
X∈B(A) m(X|B(A)) X∈B(A)

0 otherwise. Again, whenever m0 (B) = 1 for a single B this reduces to Dempster’s


conditioning.

4.4 Efficient computation


Belief functions are complex objects. Working with them in a naive way is computa-
tionally expensive, preventing their application to problems in which computational
complexity is crucial. All aspects of the decision/estimation process are affected, in
particular the evidence combination stage, but also decision making. The issue was
recognized since the early days of the theory of evidence. While some authors (e.g.
Voorbraak) proposed to tackle it by transforming belief functions into suitable prob-
ability measure prior to their aggregation, others worked on efficient approximate
implementation of the rule of combination, for instance by means of sophisticated
MonteCarlo algorithms [Moral96,Wilson91], while Shafer and Shenoy [Shenoy87]
designed a message-passing framework precursor to modern Bayesian networks.
A number of papers focussed on the practical implementation of Dempster’s
rule [300, 1453, 826], given the exponential complexity of its naive implementa-
tion (see [998] for a proof of the NP-completeness of the orthogonal sum). Shafer
and Logan [1171], for instance, proposed an algorithm for the implementation of
Dempster’s rule in the case of hierarchical evidence. Voorbraak [1358] too proposes
a computationally efficient approximation of the rule of combination by restricting
the admissible relations between the involved functions.
Orponen (1990) [999] proved that while the simple belief, plausibility, and com-
monality values Bel(A), P l(A), and Q(A) can be computed in polynomial time,
the problems of computing the combinations (Bel1 ⊕ · · · ⊕ Beln (A), (P l1 ⊕ · · · ⊕
P ln )(A), and (Q1 ⊕ · · · ⊕ Qn )(A) are #P-complete.
Interestingly, Srivastava [1293] developed in 2005 an alternative form of Demp-
ster’s rule of combination for binary variables, which provides a closed form ex-
pression for efficient computation.

Evidential reasoning using neural networks [1377] A method for using a neural
network to model the learning of evidential reasoning is presented. In the proposed
method, the belief function associated with a piece of evidence is represented as a
probability density function which can be in a continuous or discrete form. The neu-
rons are arranged as a roof-structured network which accepts the quantized belief
functions as inputs. The mutual dependency between two pieces of evidence is used
132 4 Reasoning with belief functions

as another input to the network. This framework can resolve the conflicts resulting
from either the mutual dependency among many pieces of evidence or the struc-
tural dependency due to the evidence combination order. Belief conjunction based
on the proposed method is presented, followed by an example demonstrating the
advantages of this method.

4.4.1 Approximation schemes

Probability and possibility transforms reduce the number of focal elements to store
to O(N ) by re-distributing the mass assignment of a belief function to size-1 sub-
sets or chains of subsets, respectively. An alternative approach to efficiency can be
sought by re-distributing all the mass to subsets of size up to k, obtaining a k-additive
belief function.
Some approaches to probability transformation explicitly aimed at reducing the
complexity of belief calculus. Tessem [1335], for instance, incorporated only the
highest-valued focal elements in his mklx approximation. A similar approach in-
spired the summarization technique formulated by Lowrance et al. [896]. SOME
DETAIL

4.4.2 Transformation approaches

One approach to efficient belief calculus that has been explored since the late Eight-
ies consists indeed on approximating belief functions by means of appropriate prob-
ability measures prior to combining them for making decisions. This is known as
the probabilistic transformation problem [e.g., Cobb03]. A number of distinct trans-
formations can and have been introduced, starting from Voorbraak’s plausibility
transform and Smets’ pignistic transform [Smets05], to more recent proposals by
Daniel, Sudano and others []. Different approximations appear to be aimed at dif-
ferent goals, besides that of reducing computational complexity.
In [69], Mathias Bauer reviews a number of approximation algorithms and de-
scribes and empirical study of the appropriateness of these procedures in decision-
making situations.

Probability transformation The relation between belief and probability in the the-
ory of evidence has been an important subject of study. Given a frame of discern-
ment Θ, let us denote by B the set of all belief functions on Θ, and by P the set of
all probability measures on Θ.
According to [311], we call a probability transform of belief functions an operator
pt : B → P, b 7→ pt[b] mapping belief measures onto probability measures, such
that b(x) ≤ pt[b](x) ≤ plb (x) = 1 − b({x}c ). Note that such definition requires
the probability which results from the transform to be compatible with the upper
and lower bounds the original b.f. b enforces on the singletons only, and not on all
the focal sets as in Equation (3.10). This is a minimal, sensible constraint which
does not require probability transforms to adhere to the upper-lower probability se-
mantics of belief functions. As a matter of fact, important such transforms are not
4.4 Efficient computation 133

compatible with such semantics.


A number of papers have been published on the issue of probability transform
[1402, 771, 69, 1498, 358, 363, 574]. Many of these proposals seek efficient im-
plementations of the rule of combination (see Section ??).
In Smets’ “Transferable Belief Model” [1231, 1276] all notions of multivalued
mapping are abandoned to define belief directly in terms of basis belief assignments
(“credal” level). Decisions are made by resorting to the pignistic probability
X mb (A)
BetP [b](x) = , (4.15)
|A|
A⊇{x}

generated by what he called the pignistic transform: BetP : B → P, b 7→ BetP [b].


Justified by means of a linearity axiom, the pignistic probability is the result of a
redistribution process in which the mass of each focal element A is re-assigned to
all its elements x ∈ A on an equal basis, and is perfectly compatible with the upper-
lower probability semantics of belief functions, as it is the center of mass of the
polytope (3.10) of consistent probabilities [168].
Originally developed by Voorbraak [1358] as a probabilistic approximation in-
tended to limit the computational cost of operating with belief functions in the
Dempster-Shafer framework, the plausibility transform [197] has later been sup-
ported by Cobb and Shenoy in virtue of its commutativity properties with respect to
Dempster’s sum. Even though initially defined in terms of commonality values, the
˜ : B → P, b 7→ pl[b]
plausibility transform pl ˜ maps each belief function b onto the
˜ ˜
probability distribution pl[b] = plb obtained by normalizing the plausibility values
plb (x)1 of the element of Θ:

˜ (x) = P plb (x)


pl . (4.16)
b
y∈Θ plb (y)

We call the output pl ˜ (4.16) of the plausibility transform relative plausibility of


b
singletons (r.pl.s.). Voorbraak proved that his (in our terminology) relative plausi-
bility of singletons pl˜ is a perfect representative of b when combined with other
b
probabilities p ∈ P through Dempster’s rule ⊕:
˜ ⊕ p = b ⊕ p ∀p ∈ P.
pl (4.17)
b

Dually, a relative belief transform b̃ : B → P, b 7→ b̃[b] mapping each belief


function to the corresponding relative belief of singletons (r.b.s.) b̃[b] = b̃ [242, 248,
573, 311]
b(x)
b̃(x) = P (4.18)
y∈Θ b(y)
can be defined. The notion of relative belief transform (under the name of normal-
ized belief of singletons) has first been proposed by Daniel [311]. Some initial anal-
yses of the relative belief transform and its close relationship with the (relative)
plausibility transform have been presented in [242, 248].
1
With a harmless abuse of notation we denote the values of b.f.s and pl.f.s on a singleton x
by mb (x), plb (x) instead of mb ({x}), plb ({x}).
134 4 Reasoning with belief functions

More recently, other proposals have been brought forward by Dezert et al. [391],
Burger [?] and Sudano [1314], based on redistribution processes similar to that of
the pignistic transform. More recently, two new Bayesian approximations of belief
functions have been derived by the author from purely geometric considerations
[267] in the context of the geometric approach to the ToE [244], in which belief and
probability measures are represented as points of a Cartesian space.

Possibility transformation Another way of reducing the computational complexity


of reasoning with belief functions consists on mapping the latter to the class of
“consonant” belief functions [Dubois90b], i.e., BFs whose focal elements are nested
(see Section ??). As consonant b.f.s only have N focal elements, where N is the
cardinality of the sample space, operating with them can be seen as an efficient way
of reasoning with belief functions.
Several partial orderings between belief functions have been introduced [1481,
410], in connection with the so-called “least commitment principle”. The latter plays
a similar role in the ToE as the principle of maximum entropy does in Bayesian
theory. It postulates that, given a set of b.p.a.s compatible with a set of constraints,
the most appropriate is the least informative (according to one of those orderings).
In particular, b.f.s admit the following order relation

b ≤ b0 ≡ ∀A ⊆ Θ b(A) ≤ b0 (A), (4.19)

called weak inclusion. It is then possible to introduce the notion of outer consonant
approximations [?] of a belief function b, i.e. those co.b.f.s such that ∀A ⊆ Θ
co(A) ≤ b(A) (or equivalently ∀A ⊆ Θ plco (A) ≥ plb (A)). In other words we seek
co.b.f.s which are less informative than b in the sense specified above.
Outer consonant approximations and their geometry are studied in detail in Chapter
??.
A completely different consonant transformation is proposed within Smets’
Transferable Belief Model [Dubois, Aregui].
Definition 42. The isopignistic” approximation of a belief function Bel : 2Ω →
[0, 1] is the unique consonant belief function whose pignistic probability BetP :
Ω → [0, 1] coincides with that of Bel .
Its contour function or plausibility of singletons is:
X n o
pliso (x) = min BetP (x), BetP (x0 ) ,
x0 ∈Θ

so that its mass assignment becomes:

miso (Ai ) = i · (BetP (xi ) − BetP (xi+1 )),

where {xi } = Ai \ Ai−1 . The isopignistic approximation is used by Milan in his


effort to decompose belief functions into a conflicting and a non-conflicting part.
4.4 Efficient computation 135

4.4.3 Monte-Carlo approaches


To overcome the issues with the exponential complexity of Dempster’s combination
Wilson has proposed a Monte-Carlo algorithm which “simulates” the random set
interpretation of belief functions: Bel(A) = P (Γ (c) ⊆ A|Γ (c) 6= ∅).
Suppose that we seek Bel = Bel1 ⊕ ... ⊕ Belm on Ω, where the various pieces
of evidence are induced by probability distributions Pi on Ci via multi-valued map-
pings Γi : Ci → 2Ω . We can then formulate the following simple Monte-Carlo
approach (Algorithm 4.4.3)

Algorithm 1 A simple Monte-Carlo algorithm for Dempster’s combination


for a large number of trials n = 1 : N do
randomly pick c ∈ C such that Γ (c) 6= ∅
for i = 1 : m do
randomly pick an element ci of Ci with probability Pi (ci )
end for
let c = (c1 , ..., cm )
if Γ (c) = ∅ then
restart trial
end if
if Γ (c) ⊆ A then
trial succeeds, T = 1
end if
end for

in which codes are randomly sampled from the “source” probability space, and
the number of times their image implies A ⊆ Ω is counted to provide an estimator
for the desired combination.
The proportion of trials which succeed converges to Bel(A): E[T̄ ] = Bel(A),
1
V ar[T̄ ] ≤ 4N . We say that the algorithm has accuracy k if 3σ[T̄ ] ≤ k. Picking
c ∈ C involves m random numers so it takes A · m, A constant. Testing if xj ∈ Γ (c)
takes less then Bm, constant B. Therefore, the expected time of the algorithm is:
N
m · (A + B|Ω|)
1−κ
where κ is Shafer’s conflict measure (??). The expected time to achieve accuracy k
9
turns out to be 4(1−κ)κ 2 m · (A + C|Ω|) for constant C, better in the case of simple

support functions.
In conclusion, unless κ is close to 1 (highly conflicting evidence), Dempster’s com-
bination is feasible for large values of m (the number of belief functions to combine)
and large cardinality of the hypothesis space Ω.
An improved version of the algorithm was proposed by Wilson and Moral for the
case in which trials are not independent but form a Markov chain (Markov-Chain-
Monte-Carlo). This is based on a non-deterministic operator OP ERAT IONi
which changes at most the i-th coordinate c0 (i) of a code c0 to y, with chance Pi (y):
136 4 Reasoning with belief functions

P r(OP ERAT IONi (c0 ) = c) ∝ Pi (c(i)) if c(i) = c0 (i), 0 otherwise.


The MCMC algorithm illustrated below returns a value BELN (c0 ) which is the
proportion of time in which Γ (cc ) ⊆ X.

Algorithm 2 A Markov-Chain-Monte-Carlo algorithm for Dempster’s combination


cc = c0
S=0
for n = 1 : N do
for i = 1 : m do
cc = OP ERAT IONi (cc )
if Γ (cc ) ⊆ X then
S =S+1
end if
end for
end for
return NSm

The following result can be proven.


Theorem 2. If C is connected (i.e., any c, c0 are linked by a chain of OP ERAT IONi )
then given , δ there exist K 0 , N 0 s.t. for all K ≥ K 0 and N ≥ N 0 and c0 :
P r(|BELN
K (c0 )| < ) ≥ 1 − δ

where BELN
K (c0 ) is the output of Algorithm 4.4.3
A further step based on importance sampling, in which we pick samples c1 , ..., cN
according to an “easy to handle” probability distribution P ∗ , was later proposed.
Assign to each sample a weight wi = PP∗(c) ∗
(c) . If P (c) > 0 implies P (c) > 0
P
i wi
then the average Γ (c N)⊆X is an unbiased estimator of Bel(X). Obviously we
want to try to use P ∗P
as close as possible to the real one. In [] strategies are proposed
to compute P (C) = c P (c).
Resconi et al. achieved a speed-up of the Monte-Carlo method by using a physi-
cal model of the belief measure as defined in the ToE. Conversely, in [786] Kramosil
adapted the Monte-Carlo estimation method to belief functions.

4.4.4 Local propagation


The complexity of Dempster’s rule of computation is inherently exponential, due
to the necessity of considering all the possible subsets of a frame. In fact, Orponen
([998]) proved that the problem of computing the orthogonal sum of a finite set
of belief functions is N P-complete. In response, a number of local computation
schemes have been proposed to tackle this issue. Here we will review some of the
most relevant proposals, including: Barnett’s computational scheme; Gordon and
Shortliffe’s hierarchical evidence organised in diagnostic trees; Shafer and Logan’s
hierarchical evidence approach; the Shafer-Shenoy architecture.
4.4 Efficient computation 137

Barnett’s computational scheme In Barnett’s scheme computations are linear in


the size of Ω if all the belief functions to combine are simple support functions
focused on singletons or their complements. Recall that a simple support function
focussed on A ⊆ Ω is a b.f. whose focal elements only include A and Ω.
Now, assume that we have a belief function Belω with as focal elements only
{ω, ω̄, Ω} for all singletons ω ∈ Ω, and we want to combine all {Belω , ω ∈ Ω}.
The approximation scheme uses the fact that the plausibility of the
P combined belief
function is a function of the input b.f.s’ commonalities Q(A) = B⊇A m(B):
X Y
P l(A) = (−1)|B|+1 Qω (B).
B⊆A,B6=∅ ω∈Ω

After a few passages we get that


!
X Belω (ω) Y Belω (ω̄)
P l(A) = K 1+ − .
1 − Belω (ω) 1 − Belω (ω)
ω∈A ω∈A

A similar result holds when the belief function to combine are dychotomic on ele-
ments of a coarsening of Ω.
The computation of a specific plausibility value P l(A) is therefore linear in the
size of Ω (as only elements of A and not its subsets are involved). However, the
number of events A themselves is still exponential - this is address by later authors.
Gordon and Shortliffe’s diagnostic trees Gordon and Shortliffe are interested
in computing degrees of belief only for events forming a hierarchy (diagnostic tree,
see Figure 4.3). This is motivated by the fact that in some applications certain events
are not relevant, e.g. certain classes of diseases in medical diagnosis. Their scheme

Fig. 4.3. An example of Gordon and Shortliffe’s diagnostic tree, from [].

combines simple support functions focused on or against the nodes of the tree, and
138 4 Reasoning with belief functions

produces good approximations unless evidence is highly conflicting (see Monte-


Carlo methods). However, intersection of complements may generate focal elements
not associated with nodes in the tree.
The approximated algorithm can be summarised as follows:
1. first we combine all simple functions focussing on the node events (by Demp-
ster’s rule);
2. then, we successively (working down the tree) combine those focused on the
complements of the nodes;
3. whilst we do that, we replace each intersection of focal elements with the small-
est node in the tree that contains it.
Obviously the result depends on the order followed to implement the series of com-
binations of phase 2. There is no guarantee about the quality of the resulting ap-
proximation, and no degrees of belief are assigned to complements of node events.
As a consequence, in particular, we cannot even compute the plausibilities of node
events.

Shafer and Logan’s hierarchical evidence In response, Shafer and Logan pro-
posed an exact implementation of linear complexity for the combinition of hierar-
chical evidence of a more general type. Indeed, although evidence in their scheme
is still focussed on nodes of a tree, it produces degrees of belief for a wider collec-
tion of hypotheses, including the plausibility values of the node events. The scheme
operates on local families of hypotheses, formed by a node and its children.
Namely, suppose again that we have a dichotomous belief function for every

non-terminal node A. Let ϑ be the set of non-terminal nodes; let BelA = ⊕{BelB :

B < A} (the vacuos b.f. when A is a terminal node); let BelA = ⊕{BelB : B 6<
A, B 6= A} (vacuos if A = Θ is the root node), and define:
L ↓ U ↑
BelA = BelA ⊕ BelA , BelA = BelA ⊕ BelA .

The goal of the scheme is to compute BelT = ⊕{BelA : A ∈ ϑ} (note that this is
↓ ↑
equal to BelA ⊕ BelA ⊕ BelA for any node A).

Algorithm 3 Hierarchical evidence - Stage 1 (up the tree)


L L
let {A} ∪ SA a family for which BelB (B), BelB (B̄) are available for every B ∈ SA
↓ L
compute BelA (A) = ⊕{BelB : B ∈ SA }(A)

same for BelA (Ā)
L ↓
compute BelA (A) = (BelA ⊕ BelA )(A)
L
same for BelA (Ā)

Barnett’s technique can be applied to (1) to further improve efficiency. The al-
gorithm starts from terminal nodes and works its way up the tree until we get:
L L
BelA (A), BelA (Ā) ∀A ∈ ϑ.
4.4 Efficient computation 139

Note that there is no need to apply Stage 1 to the root node Ω.


L L
Now, let {A} ∪ SA a family for which BelB (B), BelB (B̄) for every B ∈ SA
U U
but also BelA (A), BelA (Ā) are available.

Algorithm 4 Hierarchical evidence - Stage 2 (down the tree)


for each B ∈ SA do  
↓ U L
compute BelB (B) = BelA ⊕ {BelC : C ∈ SA , C 6= B} (B)

same for BelB (B̄)
U ↑
compute BelB (B) = (BelB ⊕ BelB )(B)
U
same for BelB (B̄)
end for

In stage 2 we start from the family whose parent is Ω andwe work our way down
the tree until we obtain:
↑ ↑
BelA (A), BelA (Ā) ∀A ∈ ϑ.

There is no need to apply Stage 2 to terminal nodes.


Finally, in Stage 3 (computing total beliefs) for each node A we compute:

BelT (A) = (BelA L
⊕ BelA )(A),

similarly for BelT (Ā). Throughout the algorithm, at each node A of the tree twelve
belief values need to be stored.

Shafer-Shenoy architecture on qualitative Markov trees In their 1987’s work


([1173]), Shafer, Shenoy and Mellouli faced the issue of avoiding the computational
complexity of the rule of combination, by posing the problem in the lattice of parti-
tions of a fixed overall frame of discernment. Different questions where represented
as different partitions of this frame, and their relations are represented by relations
of qualitative conditional independence or dependence among the partitions. They
showed that efficient implementation of Dempster’s rule is possible if the ques-
tions are arranged in a qualitative Markov tree, which generalises both diagnostic
trees and causal trees (Pearl), by propagating belief functions through the tree. Their
propagation scheme extends Pearl’s belief propagation idea to belief functions.

Qualitative Conditional Independence


Definition 43. A collection of partitions Ψ1 , ..., Ψn of a frame are qualitatively con-
ditionally independent (QCI) given the partition Ψ if

P ∩ P1 ∩ ... ∩ Pn 6= ∅

whenever P ∈ Ψ , Pi ∈ Ψi and P ∩ Pi 6= ∅ for all i.


140 4 Reasoning with belief functions

For example: {θ1 } × {θ2 } × Θ3 and Θ1 × {θ2 } × {θ3 } are QCI on Θ1 × Θ2 × Θ3


given Θ1 × {θ2 } × Θ3 for all θi ∈ Θi . This definition of independence does not
involve probability, but only logical independence - nevertheless, stochastic condi-
tional independence does imply QCI.
One can prove that if two BFs Bel1 and Bel2 are carried by partitions Ψ1 , Ψ2
which are QCI given Ψ then:

(Bel1 ⊕ Bel2 )Ψ = (Bel1 )Ψ ⊕ (Bel2 )Ψ ,

i.e., ...

Qualitative Markov trees Given a tree, deleting a node and all incident edges yields
a forest. Let us denote the collection of nodes of the j-th subtree by αm (j).
Definition 44. A qualitative Markov tree (QMT) is a tree of partitions such that for
every node i the minimal refinements of partitions in αm (j) for j = 1, ..., k are QCI
given Ψi .
A Bayesian causal tree becomes a qualitative Markov tree whenever we asso-
ciate each node B with the partition ΨB associated with the random variable vB .
A QMT remains such if we insert between parent and child a node associated with
their common refinement. Qualitative Markov trees can also be constructed from
diagnostic trees (see ?? for an example extracted from []) - the same interpolation
property holds in this case as well.

Propagation Assume now that each belief function to combine is carried by a par-
tition (node) in a qualitative Markov tree. The bottom line of Shenoy and Shafer’s
propagation scheme is to replace Dempster’s combination over the whole frame Ω
with multiple implementations over the partitions associated with the nodes of a
QMT. In a message-passing style, a “processor” located at each node Ψi combines
belief functions using Ψi as a frame and projects b.f.s to its neighbours.
The operations performed by each processor node can be summarised as follows
(see Figure 4.4.4).
1. it sends Beli to its neighbours;
2. whenever the node receives a new input, it computes

(BelT )Ψi ← (⊕{(Belx )Ψi : x ∈ N (i)} ⊕ Beli )Ψi

3. it computes

Beli,y ← (⊕{(Belx )Ψi : x ∈ N (i) \ {y}} ⊕ Beli )Ψy

for each of its neighbours y ∈ N (i), and sends the result to y.


Note that inputting new b.f.s in the tree can take place asynchronously - the final
result at each local processor is the coarsening to that partition of the combination
of all the inputted belief functions: (⊕j∈J Belj )Ψi .
The total time to reach equilibrium is proportional to the tree’s diameter.
4.4 Efficient computation 141

Fig. 4.4. Left: a qualitative Markov tree constructed from a diagnostic tree. Right: graphical
representation of a local processor’s operations in the Shenoy-Shafer architecture.

Fast division architecture Markov trees and clique trees are the alternative rep-
resentations of valuation networks and belief networks that are used by local com-
putation techniques for efficient reasoning ([1452]). Bissig, Kohlas and Lehmann
propose an architecture called Fast-Division architecture ([103]) for Dempster’s
rule computation, that has the advantage, with respect to the Shenoy-Shafer and the
Lauritzen-Spiegelhalter architectures, of guaranteeing the intermediate results to be
belief functions. Each of them has a Markov tree as the underlying computational
structure.
Cano’s directed acyclic networks When the evidence is ordered in a complete
direct acyclic graph it is possible to formulate algorithms with lower computational
complexity ([85]).
Shafer and Shenoy’s valuation networks
Ordered valuation algebras Haenni (2003) [576] brought forward a generic ap-
proach of approximating inference based on the concept of valuation algebras. Con-
venient resource-bounded anytime algorithms are presented, in which the maximal
computation time is determined by the user.

4.4.5 Graphical models


Local propagation models later developed into graphical models for reasoning with
conditional belief functions, including: uncertainty propagation in directed acyclic
142 4 Reasoning with belief functions

networks (Cano et al); evidential networks with conditional belief functions (Xu
and Smets); a graphical representation of valuation-based systems (VBS), called
valuation networks (Shenoy); and Ben Yaghlane and Mellouli’s Directed Evidential
Networks.
Evidential networks with conditional belief functions In [1456], Xu and Smets
used conditional belief functions (a la Dempster) to represent relations between
variables in evidential networks, and presented a propagation algorithm for such
networks. ENCs contain a directed acyclic graph with conditional beliefs defined
in a different manner from conditional probabilities in Bayesian networks (BNs), as
edges represent the existence of a conditional belief function, while no form of inde-
pendence assumed. Also, ENC were initially defined only for binary (conditional)
relationships.
Directed Evidential Networks Ben Yaghlane and Mellouli later generalised ENCs
to any number of nodes, proposing their Directed Evidential Network (DEVNs).
These are directed acyclic graphs (DAGs) in which directed arcs describe the con-
ditional dependence relations expressed by conditional BFs for each node given its
parents. New observations introduced in the network are represented by belief func-
tions allocated to some nodes.
Given n BFs Bel1 , ..., Beln over X1 , ..., Xn , the goal if to compute the marginal
on Xi of their joint belief function. DENs use the generalised Bayesian theorem
(GBT, see Section 4.3.2) to compute the posterior Bel(x|y) given the conditional
Bel(y|x). The marginal is computed for each node by combining all the messages
received from its neighbors and its own prior belief:
BelX = Bel0X ⊕ BelY →X , BelY →X (x) = y⊆ΘY m0 (y)Bel(x|y)
P

where Bel(x|y) is given by GBT, in another application of the message-passing


idea to belief functions.
Ben Yaghlane et al also proposed a simplified scheme for simply directed net-
works, and an extension to DEVNs by first transforming them to binary join trees.

4.5 Decision making


Decision making in the presence of partial evidence and subjective assessment is
maybe the original rationale for the development of the theory of evidence. Con-
sequently, decision making with belief functions has been studied throughout the
last three decades, originating a number of different approaches to the problem. A
fairly recent discussion on the meaning of belief functions in the context of decision
making can be found in [1259].
A decision problem can be formalized by defining: a set Ω of possible states of
the world; a set X of consequences; and a set F of acts, where an act is a function
f : Ω → X mapping a world state to a consequence.
In [1365] P. Wakker shows the central role that the so-called ‘principle of com-
plete ignorance’ plays in the evidential approach to decision problems.
4.5 Decision making 143

4.5.1 Based on expected utilities

Let < be a preference relation on F, such that f < g means that f is at least as
desirable as g. Savage (1954) showed that < verifies a number of sensible rationality
requirements iff there exists a probability measure P on Ω and a utility function
u : X → R such that:

∀f, g ∈ F, f < g ⇔ EP (u ◦ f ) ≥ EP (u ◦ g)

where EP denotes the expectation w.r.t. P . Also, P and u are unique up to a posi-
tive affine transformation. Does such a result imply that basing decisions on belief
functions is irrational?
The answer is no, and indeed several authors have proposed decision making
frameworks under belief function uncertainty based on (generalisations of) utility
theory.

Strat’s decision framework Perhaps the first one who noted the lack in Shafer’s
theory of belief functions of a formal procedure for making decision was Strat in
[1305]. He proposed a simple assumption that disambiguates decision problems rep-
resented as b.f.s, maintaining the separation between evidence carrying information
about the decision problem, and assumptions that has to be made to disambiguate
the choices. He also showed how to generalize the methodology for decision analy-
sis employed in probabilistic reasoning to the use of belief functions, allowing their
use within the framework of decision trees.
Strat’s decision apparatus is based on computing intervals of expected values,
and assumes that the decision frame Ω is itself a set of scalar values (e.g. dollar
values, see Figure 4.5). In other words, it does not distinguish between utilities and
elements of Ω (returns), so that an interval of expected values can be computed:
E(Ω) = [E∗ (Ω), E ∗ (Ω)], where
. X . X
E∗ (Ω) = inf(A)m(A), E ∗ (Ω) = sup(A)m(A).
A⊆Ω A⊆Ω

He argues that this is not good enough to make a decision - for instance, should we
pay a 6$ ticket when the expected interval is [5$, 8$]?
In response, Strat identifies the probability ρ that the value assigned to the hidden
sector is the one the player would choose (1 − ρ is the probability that the sector is
chosen by the carnival hawker). Then [1305]:
Proposition 25. The expected value of the mass function of the wheel is E(Ω) =
E∗ (Ω) + ρ(E ∗ (Ω) − E∗ (Ω)).
To decide whether to play the game we only need to assess ρ. Basically, this amounts
to a specific probability transform (like the pignistic one) - Lesh, 1986 had also
proposed a similar approach.
Schubert [1121] subsequently studied the influence of the ρ parameter in Strat’s
decision apparatus.
144 4 Reasoning with belief functions

Fig. 4.5. Strat’s cloaked carnival wheel.

Decision making in the TBM In the TBM, decision making is done by maximising
the expected utility of actions based on the pignistic transform. The set of possible
actions F and the set Ω of possible outcomes are distinct, and the utility function
is defined on F × Ω. In [] Smets proved the necessity of the pignistic transform by
maximizing the expected utility:
X
E[u] = u(f, ω)P ign(ω)
ω∈Ω

Elouedi, Smets et al. ([460], [459]) adapted the decision tree technique to the
presence of uncertainty about the class value, that is represented by a belief function.
A decision system based on the Transferable Belief Model was developed [1462]
and applied to a waste disposal problem by Xu et al.
Both Xu and Yang [1506] propose a decision calculus in the framework of val-
uation based systems [1459] and show that decision problems can be solved using
local computations.
A classical example of how Knightian uncertainty empirically affects human
decision making is provided by Ellsberg’s paradox [].

Gilboa’s Choquet integral Gilboa (1987) proposed a modification of Savage’s ax-


ioms with, in particular, a weaker form of Axiom 2. As a consequence, a preference
relation < meets these weaker requirements iff there exists a (non necessarily addi-
tive) measure µ and a utility function u : X → R such that :

∀f, g ∈ F, f < g ⇔ Cµ (u ◦ f ) ≥ Cµ (u ◦ g),

where Cµ is the Choquet integral, defined for X : Ω → R as


4.5 Decision making 145
Z +∞ Z 0
Cµ (X) = µ(X(ω) ≥ t)dt + [µ(X(ω) ≥ t) − 1]dt. (4.20)
0 −∞

Given a belief function Bel on Ω and a utility function u, this theorem supports
making decisions based on the Choquet integral of u with respect to Bel or P l.

Upper and lower expected utilities For finite Ω, it can be shown that:
X
CBel (u ◦ f ) = m(B) min u(f (ω)),
ω∈B
B⊆Ω
X
CP l (u ◦ f ) = m(B) max u(f (ω)).
ω∈B
B⊆Ω

Let P(Bel) as usual be the set of probability measures P compatible with Bel, i.e.,
such that Bel ≤ P . Then, it follows that:

CBel (u ◦ f ) = min EP (u ◦ f ) = E(u ◦ f ),


P ∈P(Bel)
CP l (u ◦ f ) = max EP (u ◦ f ) = E(u ◦ f ).
P ∈P(Bel)

Decision criteria For each act f we have two expected utilities E(f ) and E(f ).
How do we make a decision? Various decision criteria can be formulated, based on
interval dominance:
1. f < g iff E(u ◦ f ) ≥ E(u ◦ g) (conservative strategy);
2. f < g iff E(u ◦ f ) ≥ E(u ◦ g) (pessimistic strategy);
3. f < g iff E(u ◦ f ) ≥ E(u ◦ g) (optimistic strategy);
4. f < g iff

αE(u ◦ f ) + (1 − α)E(u ◦ f ) ≥ αE(u ◦ g) + (1 − α)E(u ◦ g)

for some α ∈ [0, 1] called a pessimism index (Hurwicz criterion).


The conservative strategy only yields a partial preorder: f and g are not comparable
if E(u ◦ f ) < E(u ◦ g) and E(u ◦ g) < E(u ◦ f ).
Going back to the Ellesberg’s paradox, the evidence naturally translates into a
belief function, with m({R}) = 1/3 and m({B, Y }) = 2/3. We can then compute
lower and upper expected utilities for each action:
R B Y E(u ◦ f ) E(u ◦ f )
f1 100 0 0 u(100)/3 u(100)/3
f2 0 100 0 u(0) u(200)/3
f3 100 0 100 u(100)/3 u(100)
f4 0 100 100 u(200)/3 u(200)/3
The observed behavior (f1 < f2 and f4 < f3 ) is then explained by the pessimistic
strategy.

Other utility-based decision rules


146 4 Reasoning with belief functions

Expected utility interval decision rule In [10] a non-ad hoc decision rule based on
the expected utility interval is proposed. The authors study the effect of redistribut-
ing the confidence levels after getting rid of propositions to reduce computational
complexity. The eliminated confidence levels can in particular be assigned to igno-
rance, or uniformly added to the remaining propositions and to ignorance.

4.5.2 Multicriteria decision making

A work of Beynon et al. [92] explores the potentiality of the theory of evidence as
an alternative approach to multicriteria decision modeling.

4.5.3 Other approaches

A number of decision rules not based on the application of utility theory to the
result of a probability transform have also been proposed, for instance by Troffaes.
Most of those proposals are based on order relations between uncertainty measures
[Denoeux], in particular the least commitment principle, the analogous of maximum
entropy in belief function theory. DETAILS FROM TROFFAES

4.6 Continuous formulations

Since the late Seventies, the need of a general formulation of the theory of evi-
dence to continuous domains has been recognized. Indeed, the original formulation
of the theory of evidence summarized in Chapter 2 was inherently linked to finite
frames of discernment. Numerous proposals have been brought forward since in or-
der to extend the notion of belief function to infinite hypothesis sets. Among them,
Shafer’s allocations of probability [], Nguyen’s random sets [], Strat and Smets’ ran-
dom closed intervals [], XXX’s generalised evidence theory [] and Kroupa’s belief
functions on MV algebras [].

4.6.1 Shafer’s allocations of probabilities

The first attempt (1979) is due to Shafer himself, and goes under the name of al-
locations of probabilities ([1152]). Shafer proved that every belief function can be
represented as an allocation of probability, i.e. a ∩-homomorphism into a positive
and completely additive probability algebra, deduced from the integral represen-
tation due to Choquet. For every belief function Bel defined on a class of events
E ⊆ 2Ω there exists a complete Boolean algebra M, a positive measure µ and an
allocation of probability ρ between E and M such that Bel = µ ◦ ρ.
Two regularity conditions for a belief function over an infinite domain are con-
sidered: continuity and condensability .
Canonical continuous extensions of belief functions defined on “multiplicative
subclasses” E to an arbitrary power set can then be introduced by allocation of
4.6 Continuous formulations 147

probability. Canonical extensions satisfy Shafer’s notion of belief function definable


on infinitely many compatible frames, and show significant resemblance with the
notions of inner measure and extension of capacities.

Continuity and condensability


Definition 45. A collection E ⊂ 2Θ is a multiplicative subclass of 2Θ if A ∩ B ∈ E
for all A, B ∈ E.
A function Bel : E → [0, 1] such that Bel(∅) = 0, Bel(Θ) = 1 and Bel is
monotone of order ∞ is a belief function.
Equally, an upper probability (plausibility) function is alternating of order ∞ (i.e.,
≥ is exchanged with ≤).
Definition 46. A belief function on 2Θ is continuous if

Bel(∩i Ai ) = lim Bel(Ai )


i→∞

for every decreasing sequence of sets Ai s. A b.f. on a multiplicative subclass E is


continuous if it can be extended to a continuous one on 2Θ .
According to Shafer, continuity arises from partial beliefs on ‘objective’ probabili-
ties.
Definition 47. A belief function on 2Θ is condensable if

Bel(∩A) = inf Bel(A)


A∈A

for every downward net8 A in 2Θ . A b.f. on a multiplicative subclass E is condens-


able if it can be extended to a condensable one on 2Θ .
Condensability is related to Dempster’s rule, as the property is required whenever
we have an infinite number of belief functions to combine.

Choquet’s integral representation Choquet’s integral representation implies that


every belief function can be represented by allocation of probability.
Definition 48. A function r : E → F is a ∩-homomorphism if it preserves ∩.
Choquet’s theorem links belief functions to probability spaces via ∩-homomorphisms.
Theorem 3. For every belief function Bel on a multiplicative subclass E of 2Θ ,
there exist a set X and an algebra F of its subsets, a finitely additive probability
measure µ on F, and a ∩-homomorphism r : E → F such that Bel = µ ◦ r.
If we replace the measure space (X , F, µ) with a probability algebra (a complete
Boolean algebra M with a completely additive prob measure µ) we get Shafer’s
allocation of probability.
8
A downward net is such that given two elements there is always an element subset of their
intersection.
148 4 Reasoning with belief functions

Theorem 4. For every belief function Bel on a multiplicative subclass E of 2Θ ,


there exists an allocation of probability ρ : E → M such that Bel = µ ◦ ρ.
Non-zero elements of M can then be thought of as focal elements.
This approach was later reviewed by Jurg Kohlas ([747]), who conducted an
algebraic study of argumentation systems[749, 748] as methods for defining numer-
ical degrees of support of hypotheses, by means of allocation of probability.
Canonical extension
Theorem 5. A belief function on a multiplicative subclass E can always be extended
to a belief function on 2Θ by canonical extension:
. Xn o
Bel(A) = sup (−1)|I|+1 Bel(∩i∈I Ai )|∅ =
6 I ⊂ {1, ..., n}
n≥1,A1 ,...,An ∈E

Indeed there are many such extensions, of which Bel is the minimal one.
The proof is based on the existence of an allocation for the desired extension.
Note the similarity with the superadditivity axiom - the notion is also related to
that of inner measure (Section 3.1.3), which provides approximate belief values for
subsets outside the initial sigma-algebra.
What about evidence combination? The condensability property ensures that
the Boolean algebra M represents intersection properly for arbitrary (not just finite)
collections B of subsets:
^
ρ(∩B) = ρ(B) ∀B ⊂ 2Ω ,
B∈B

allowing us to imagine Dempster’s combinations of infinitely many belief functions.

4.6.2 Random sets


Possibly the most elegant formalism in which to formulate a continuous version of
the theory of belief functions is the theory of random sets [Matheron75], i.e., prob-
ability measures over power sets, of which traditional belief functions are indeed a
special case [Nguyen78]. The notion of condensability has been studied by Nguyen
for upper probabilities generated by random sets too [Nguyen 1978].
A serious obstacle, however, is the formulation of aggregation operators for ran-
dom sets. Shafer’s allocations of probability and Nguyen’s random set interpreta-
tions did not much mention combination rules at the time; 30 years have passed
without concrete steps towards such a goal.
However, for finite random sets (i.e. with a finite number of focal elements),
under independence of variables Dempster’s rule can still be applied, namely:
n o
(F, m) = Ai1 ,...,id = ×dj=1 Aij , mi1 ,...,id = mi1 · · · · · mid .

For dependent sources Fetz and Oberguggenberger have proposed an “unknown


interaction” model, while for infinite random sets Alvarez (see Section 5.8) an in-
triguing Monte-Carlo sampling method.
4.6 Continuous formulations 149

4.6.3 Belief functions on random Borel intervals

Almost at the same time, Strat [Strat84] and Smets had the idea of making the prob-
lem tractable via the standard methods of calculus by allowing only focal elements
which are closed intervals of the real line.

Fig. 4.6. Strat’s representation of belief functions on intervals - finite case (from [?]). Left:
frame of discernment for unit-length sub-intervals of [0, 4]. Right: how to compute belief and
plausibility values for a sub-interval [a, b] (Strat’s notation).

Strat's initial idea is very simple. Take a real interval I and split it into N unit-length bins. Define as frame of discernment the set of possible intervals with such extreme points: [0, 1), [0, 2), [1, 4], etcetera. A belief function there has therefore ∼ N²/2 possible focal elements, so that its mass function lives on a discrete triangle (see Figure 4.6-left), and one can compute belief and plausibility values simply by integration (right).
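For illustration, the following Python sketch (ours, not part of the original treatment; the toy mass assignment is hypothetical and intervals are treated as closed) computes belief and plausibility of a sub-interval [a, b] from a mass function defined on Strat's discrete triangle:

```python
import numpy as np

def bel_pl_on_triangle(m, a, b):
    """Belief and plausibility of the sub-interval [a, b] for a mass function
    defined on the discrete triangle of intervals [x, y], 0 <= x <= y <= N.
    m is an (N+1) x (N+1) array, upper triangle only."""
    N = m.shape[0] - 1
    intervals = [(x, y) for x in range(N + 1) for y in range(x, N + 1)]
    bel = sum(m[x, y] for (x, y) in intervals if a <= x and y <= b)  # [x,y] inside [a,b]
    pl = sum(m[x, y] for (x, y) in intervals if x <= b and y >= a)   # [x,y] meets [a,b]
    return bel, pl

# toy example on [0, 4]: mass 0.6 on [1, 2] and 0.4 on the whole interval [0, 4]
m = np.zeros((5, 5))
m[1, 2], m[0, 4] = 0.6, 0.4
print(bel_pl_on_triangle(m, 1, 3))   # (0.6, 1.0)
```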
This idea trivially generalises to arbitrary sub-intervals of I, as in Figure 4.7.

Fig. 4.7. Strat’s representation in the case of arbitrary sub-intervals [a, b].

Belief and plausibility measures assume in this case an integral form:


Bel([a, b]) = \int_a^b \int_x^b m(x, y)\, dy\, dx, \qquad Pl([a, b]) = \int_0^b \int_{\max(a, x)}^N m(x, y)\, dy\, dx,

while Dempster’s rule generalises as:

Bel_1 \oplus Bel_2([a, b]) = \frac{1}{K} \int_0^a \int_b^N \Big[ m_1(x, b) m_2(a, y) + m_2(x, b) m_1(a, y) + m_1(a, b) m_2(x, y) + m_2(a, b) m_1(x, y) \Big] dy\, dx.

A very similar approach was followed by Smets, who defines a continuous pignistic PDF as:
Bet(a) \doteq \lim_{\epsilon \to 0} \int_0^a \int_{a+\epsilon}^1 \frac{m(x, y)}{y - x}\, dy\, dx,

and can easily be extended to the real line, by considering belief functions defined on the Borel σ-algebra of subsets of R generated by the collection I of closed intervals. The theory also provides a way of building a continuous belief function from a pignistic density, by applying the least commitment principle and assuming a unimodal pignistic PDF, namely:
Bel(s) = -(s - \bar{s}) \frac{dBet(s)}{ds},
where \bar{s} \neq s is the other point at which Bet(\bar{s}) = Bet(s). For example, a normally distributed pignistic function Bet(x) = N(x; µ, σ) generates a continuous belief function of the form Bel(y) = \frac{2y}{\sqrt{2\pi}} e^{-y^2}, where y = (x - µ)/σ.

Fig. 4.8. Notion of random closed interval.

Formally, let (U, V) be a two-dimensional random variable from (C, A, P) to (R^2, B(R^2)) such that P(U ≤ V) = 1 and Γ(c) = [U(c), V(c)] ⊆ R (see Figure 4.8). This setting defines a random closed interval, which induces a belief function on (R, B(R)) defined by:
Bel(A) = P([U, V] \subseteq A), \quad \forall A \in \mathcal{B}(\mathbb{R}).
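In practice, when the joint distribution of (U, V) is only accessible through sampling, the belief and plausibility of an interval [a, b] can be estimated by simple Monte Carlo simulation. The sketch below is ours and purely illustrative (the sampling model for (U, V) is a hypothetical choice, not taken from the text):

```python
import numpy as np

def bel_pl_interval(sample_uv, a, b, n=100_000, seed=0):
    """Monte Carlo estimates of Bel([a,b]) = P([U,V] subset of [a,b]) and
    Pl([a,b]) = P([U,V] intersects [a,b]) for a random closed interval [U,V]."""
    rng = np.random.default_rng(seed)
    u, v = sample_uv(rng, n)                     # arrays with u <= v
    bel = np.mean((u >= a) & (v <= b))
    pl = np.mean((u <= b) & (v >= a))
    return bel, pl

# hypothetical random interval: U ~ N(0,1), V = U + |E| with E ~ N(0,1)
def sample_uv(rng, n):
    u = rng.normal(size=n)
    return u, u + np.abs(rng.normal(size=n))

print(bel_pl_interval(sample_uv, -1.0, 2.0))
```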

Special cases of random closed interval include, for instance:



– a fuzzy set on the real line induces a mapping to a collection of nested intervals, parameterised by the level c (Figure 4.9-left);
– a p-box, i.e., a pair of upper and lower bounds on a cumulative distribution function (see Section 5.8), also induces a family of intervals (Figure 4.9-right).

Fig. 4.9. Examples of random closed intervals. Left: consonant random interval induced by a fuzzy set. Right: random interval induced by a p-box.

The approach based on Borel sets of the real line has proven more fertile than more general approaches such as random sets or allocations of probability. Generalizations of combination and conditioning rules follow quite naturally [Smets]; inference with predictive belief functions on real numbers has been proposed [Denoeux]; and the calculation of the pignistic probability for continuous b.f.s is straightforward, allowing TBM-style decision making with continuous BFs.
An interesting open problem within the Borel formulation of continuous b.f.s is therefore the generalisation of other probability transforms to the continuous case. The extension of the author's geometric approach to random closed intervals has recently been initiated by Kroupa et al. [Kroupa10].

4.6.4 Kramosil’s belief function on infinite spaces

Within the compatibility relation interpretation (Section ??), Ivan Kramosil has developed a theory of belief functions on infinite spaces.

4.6.5 MV algebras

A new, interesting approach studies belief functions in a more general setting than that of Boolean algebras of events, inspired by the generalisation of classical probability towards "many-valued" events, such as those resulting from formulas in Lukasiewicz infinite-valued logic.

Definition of MV algebra Indeed, an algebra of such many-valued events is called an MV algebra, upon which upper/lower probabilities and possibility measures can be defined.

Definition 49. An MV algebra is an algebra ⟨M, ⊕, ¬, 0⟩ with a binary operation ⊕, a unary operation ¬ and a constant 0, such that ⟨M, ⊕, 0⟩ is an abelian monoid and the following equations hold true for every f, g ∈ M:
¬¬f = f, \quad f ⊕ ¬0 = ¬0, \quad ¬(¬f ⊕ g) ⊕ g = ¬(¬g ⊕ f) ⊕ f.

Building on these base operators, one can also define:

1 = ¬0, \quad f ⊙ g = ¬(¬f ⊕ ¬g), \quad f ≤ g if ¬f ⊕ g = 1.

If, in addition, we introduce inf and sup operators as follows:
f ∨ g = ¬(¬f ⊕ g) ⊕ g, \quad f ∧ g = ¬(¬f ∨ ¬g),
we make ⟨M, ∨, ∧, 0, 1⟩ a distributive lattice.


For example, the so-called standard MV algebra is the real interval [0, 1] equipped with
f ⊕ g = \min(1, f + g), \quad ¬f = 1 - f, \quad f ⊙ g = \max(0, f + g - 1).
In this case ⊙ and ⊕ are known as the Lukasiewicz t-norm and t-conorm, respectively.
Boolean algebras are also a special case, in which ⊕, ⊙ and ¬ are union, intersection and complement. As we will see, a totally monotone function Bel : M → [0, 1] can be defined on an MV algebra, by replacing ∪ with ∨ and ⊂ with ≤.
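As a small illustration (ours, not from the original text), the standard MV algebra operations and the derived lattice operations can be coded directly from the definitions above; on [0, 1] the derived join and meet reduce, as expected, to max and min (up to floating-point error):

```python
def oplus(f, g):   # Lukasiewicz t-conorm
    return min(1.0, f + g)

def neg(f):        # involutive negation
    return 1.0 - f

def odot(f, g):    # Lukasiewicz t-norm
    return max(0.0, f + g - 1.0)

def join(f, g):    # f v g = neg(neg(f) + g) + g
    return oplus(neg(oplus(neg(f), g)), g)

def meet(f, g):    # f ^ g = neg(neg(f) v neg(g))
    return neg(join(neg(f), neg(g)))

print(join(0.3, 0.7), meet(0.3, 0.7))   # ~0.7 (= max) and ~0.3 (= min)
```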

States as generalisations of finite probabilities Semisimple algebras are those MV algebras which are isomorphic to algebras of continuous [0, 1]-valued functions on some compact Hausdorff space - these can be viewed as many-valued counterparts of algebras of sets, and are closely related to the imprecise-probabilistic notion of gamble (see Definition 56).

Definition 50. A state is a mapping s : M → [0, 1] such that s(1) = 1 and s(f ⊕ g) = s(f) + s(g) whenever f ⊙ g = 0.

Clearly, states are generalisations of finitely additive probability measures. In addition, states on semisimple MV algebras are integrals with respect to a Borel probability measure µ on the Hausdorff space: for all f ∈ M,
s(f) = \int f \, d\mu.

Belief functions on MV algebras Now, consider the MV algebra [0, 1]^{P(X)} of all functions P(X) → [0, 1], where X is finite. Let ρ : [0, 1]^X → [0, 1]^{P(X)} be defined as:
ρ(f)(B) = \min\{f(x), x \in B\} if B ≠ ∅; \quad ρ(f)(B) = 1 otherwise.

If f = 1A (the indicator function of event A) then ρ(1A )(B) = 1 iff B ⊆ A, and


we can rewrite Bel(A) = m(ρ(1A )), where m is defined on collections of events.

Definition 51. Bel : [0, 1]^X → [0, 1] is a belief function on [0, 1]^X if there is a state s on the MV algebra [0, 1]^{P(X)} such that s(1_∅) = 0 and Bel(f) = s(ρ(f)), for every f ∈ [0, 1]^X. The state s is called a state assignment.

Belief functions so defined take values on continuous functions of X (of which events are a special case). State assignments correspond to the probability measures on C in the classical random set interpretation (Figure 4.10). Such belief functions admit an integral representation in terms of the Choquet integral - the whole approach is strongly linked with belief functions on fuzzy sets (Section ??).

Fig. 4.10. Relationships between classical belief functions on P(X) and belief functions on
[0, 1]X (from [?]).

All standard properties of classical b.f.s are met (e.g. superadditivity). In addi-
tion, the set of belief functions on [0, 1]X is a simplex whose extreme points corre-
spond to the generalisation of categorical b.f.s (see Chapter ??).

4.6.6 Generalised evidence theory

4.7 A toolbox for the working scientist


Thierry Denoeux and Lalla Zouhal ([365]) proposed a k-nearest neighbor classifier
based on the D-S theory, where each neighbor of a sample is considered as an item
of evidence supporting hypotheses about the class of membership of the sample
itself. The evidence of the k nearest neighbors is then pooled as usual by means of
Dempster’s rule. The problem of tuning the parameter of the classification rule is
solved by minimizing an error function ([1560]).

Le-Hegarat, Bloch et al. apply the theory to unsupervised classification in a multisource remote sensing environment ([605]), since it allows one to consider unions of classes. Masses and focal elements are chosen in an unsupervised way by comparing monosource classification results.
The salient aspect of [99], instead, is the definition of an empirical learning
strategy for the automatic generation of Dempster-Shafer classification rules from a
set of training data.
Fixsen et al. describe a modified rule of combination with foundations in the
theory of random sets and prove the relationship between this “modified Dempster-
Shafer” ([484]) approach and Smets’ pignistic probabilities. The MDS is applied
to build a classification algorithm which uses an information-theoretic technique to
limit the complexity.
Several works have been written on the application of the theory of evidence to
neural network classifiers (see for instance [371]). In ([887]) Loonis et al. compare
the multi-classifier neural network fusion scheme with the straightforward applica-
tion of Dempster’s rule in a pattern recognition context.
An original work has been conducted by Johan Schubert, who deeply studied
the clustering problem ([1123], [1129], [1130], [1127]), in which 2n − 1 pieces of
evidence are clustered into n clusters by minimizing a metaconflict function. He
found neural structures more effective and much faster than optimization methods
for larger problems.
Since they are both suitable for solving classification problems, neural networks and belief functions are sometimes integrated to yield more robust systems ([1379], [954]).
On the other hand, Giacinto et al. compare neural networks and belief-based approaches for pattern recognition in the context of earthquake risk evaluation ([514]).
Resting on his work on clustering of nonspecific evidence, Schubert developed a
classification method [1128] based on the comparison with prototypes representing
clusters, instead of making a full clustering of all the evidence. The resulting com-
putational complexity is O(M · N ), where M is the maximum number of subsets
and N the number of prototypes chosen for each subset.

4.7.1 Classification

In classification problems, the population is assumed to be partitioned into c groups or classes. Let then Ω = {ω_1, . . . , ω_c} denote the set of classes. Each instance of the problem is described by a feature vector x ∈ R^p and a class label y ∈ Ω. Given a training set L = {(x_1, y_1), . . . , (x_n, y_n)}, the goal is to predict the class of a new instance described by x.
Past work on classification with belief functions has mainly followed two approaches:
1. ensemble classification: the outputs of a number of standard classifiers are converted into belief functions and combined using Dempster's rule or an alternative rule (e.g., [?]);

2. evidence-theoretic classifiers developed to directly provide belief functions as outputs, in particular:
– the Generalised Bayes theorem (Section ??), which extends the classical Bayesian classifier when class densities and priors are ill-known [Denœux & Smets];
– distance-based approaches such as the evidential k-NN rule [Denœux] and the evidential neural network classifier [Denœux].

Fig. 4.11. Left: classification is about finding out the class label of a test point “?” given the
information provided by a training set whose elements are labelled as belonging to specific
classes. Middle: principle of the k-nearest neighbour (k-NN) classifier. Right: evidential k-
NN classifier.

Evidential K-NN Let Nk (x) ⊂ L denote the set of the k nearest neighbors of
x in L, based on some appropriate distance measure d. Each xi ∈ Nk (x) can be
considered as a piece of evidence regarding the class of x represented by a mass
function mi on Ω:

mi ({yi }) = ϕ (di ) , mi (Ω) = 1 − ϕ (di ) .

The strength of this evidence decreases with the distance d_i between x and x_i: ϕ is a decreasing function such that \lim_{d \to +\infty} ϕ(d) = 0. Evidence is then pooled as m = \bigoplus_{x_i \in N_k(x)} m_i. The function ϕ can be fixed heuristically or selected from a family {ϕ_θ | θ ∈ Θ} using, e.g., cross-validation. Finally, the class with the highest plausibility is selected.
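A minimal Python sketch of the evidential k-NN rule follows (ours; the exponential form φ(d) = α exp(−γ d²) and the parameter values are illustrative assumptions, and Dempster combination is performed in unnormalised form since normalisation does not affect the argmax):

```python
import numpy as np

def evidential_knn(x, X_train, y_train, n_classes, k=5, gamma=1.0, alpha=0.95):
    """Evidential k-NN sketch: each neighbour induces a simple mass function
    on Omega = {0, ..., n_classes - 1}; the k masses are Dempster-combined and
    the class with the highest (unnormalised) plausibility is returned."""
    d = np.linalg.norm(X_train - x, axis=1)
    neighbours = np.argsort(d)[:k]
    m = np.zeros(n_classes)          # mass on each singleton class
    m_omega = 1.0                    # mass on the whole frame Omega
    for i in neighbours:
        phi = alpha * np.exp(-gamma * d[i] ** 2)
        mi = np.zeros(n_classes)
        mi[y_train[i]] = phi
        mi_omega = 1.0 - phi
        # Dempster combination restricted to focal elements {class} and Omega
        m, m_omega = m * mi + m * mi_omega + m_omega * mi, m_omega * mi_omega
    pl = m + m_omega                 # singleton plausibilities, up to normalisation
    return int(np.argmax(pl))
```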

Evidential k-NN rule for partially supervised data In some applications, training
instances are labeled by experts or indirect methods, without the use of ground truth.
As the class labels of the training data are themselves uncertain, we have a partially supervised learning problem. The training set can be formalised as:

L = {(xi , mi ), i = 1, . . . , n},

where xi is the attribute vector for instance i, and mi is a mass function representing
uncertain expert knowledge about the class yi of instance i.
Special cases are:
– mi ({ωk }) = 1 for all i (supervised learning);
– mi (Ω) = 1 for all i (unsupervised learning).
The evidential k-NN rule can easily be adapted to handle such uncertain learning data (Figure 4.11-right). Each mass function m_i is first 'discounted' (see Section ??) by a rate depending on the distance d_i:
m'_i(A) = ϕ(d_i)\, m_i(A), \ \forall A \subset \Omega; \quad m'_i(\Omega) = 1 - \sum_{A \subset \Omega} m'_i(A).
The k discounted mass functions m'_i are then combined: m = \bigoplus_{x_i \in N_k(x)} m'_i.

4.7.2 Clustering

4.7.3 Ranking aggregation

Consider a set of alternatives O = {o_1, o_2, ..., o_n} and an unknown linear order (a transitive, antisymmetric and complete relation) ≻ on O. Typically, this linear order corresponds to preferences held by an agent or a group of agents - thus, o_i ≻ o_j is interpreted as "alternative o_i is preferred to alternative o_j" (compare Section ??).
Suppose also that a source of information (an elicitation procedure, a classifier) provides us with n(n − 1)/2 pairwise comparisons, affected by uncertainty. The problem is to derive the most plausible linear order from this uncertain (and possibly conflicting) information.

Example: Tritchler & Lockwood, 1991 Consider four scenarios O = {A, B, C, D}


describing ethical dilemmas in health care. Suppose two experts gave their prefer-
ence for all six possible scenario pairs with confidence degrees described in Figure
4.12.

Fig. 4.12. Pairwise preferences in the example from Tritchler & Lockwood.

Assuming the existence of a unique consensus linear ordering L∗ and seeing the
expert assessments as sources of information, what can we say about L∗ ?

Formalisation In this problem the frame of discernment is the set \mathcal{L} of linear orders over O. Each pairwise comparison (o_i, o_j) yields a pairwise mass function m^{Θ_{ij}} on a coarsening Θ_{ij} = {o_i ≻ o_j, o_j ≻ o_i} with:
m^{Θ_{ij}}(o_i ≻ o_j) = α_{ij}, \quad m^{Θ_{ij}}(o_j ≻ o_i) = β_{ij}, \quad m^{Θ_{ij}}(Θ_{ij}) = 1 − α_{ij} − β_{ij}.
The mass assignment m^{Θ_{ij}} may come from a single expert (e.g., an evidential classifier) or from the combination of the evaluations of several experts.
Let L_{ij} = {L ∈ \mathcal{L} | (o_i, o_j) ∈ L}. Vacuously extending m^{Θ_{ij}} to \mathcal{L} yields
m^{Θ_{ij}↑\mathcal{L}}(L_{ij}) = α_{ij}, \quad m^{Θ_{ij}↑\mathcal{L}}(\overline{L_{ij}}) = β_{ij}, \quad m^{Θ_{ij}↑\mathcal{L}}(\mathcal{L}) = 1 − α_{ij} − β_{ij}.
Subsequently combining the pairwise mass functions using Dempster's rule produces:
m^{\mathcal{L}} = \bigoplus_{i<j} m^{Θ_{ij}↑\mathcal{L}}.
The plausibility of the combined mass function m^{\mathcal{L}} is:
pl(L) = \frac{1}{1-\kappa} \prod_{i<j} (1 - β_{ij})^{\ell_{ij}} (1 - α_{ij})^{1-\ell_{ij}},

where \ell_{ij} = 1 if (o_i, o_j) ∈ L, and 0 otherwise (an algorithm for computing the degree of conflict κ has been given in [Tritchler & Lockwood, 1991]).
The logarithm of pl(L) can be maximised by solving the following binary integer programming problem:
\max_{\ell_{ij} \in \{0,1\}} \sum_{i<j} \ell_{ij} \ln\Big(\frac{1 - β_{ij}}{1 - α_{ij}}\Big)
subject to:
\ell_{ij} + \ell_{jk} - 1 \leq \ell_{ik}, \quad ∀ i < j < k \quad (1)
\ell_{ik} \leq \ell_{ij} + \ell_{jk}, \quad ∀ i < j < k \quad (2)
Constraint (1) ensures that \ell_{ij} = 1 and \ell_{jk} = 1 ⇒ \ell_{ik} = 1, while (2) ensures that \ell_{ij} = 0 and \ell_{jk} = 0 ⇒ \ell_{ik} = 0.
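For small n the optimisation can even be carried out by brute force over all linear orders (permutations), since the factor 1/(1 − κ) does not depend on L and can be ignored. The sketch below is ours, with hypothetical confidence values; for realistic problem sizes the binary integer program above should be solved instead:

```python
import itertools, math

def most_plausible_order(alpha, beta):
    """Brute-force search for the linear order maximising
    sum_{i<j} l_ij * ln((1 - beta_ij)/(1 - alpha_ij)), where l_ij = 1 iff
    o_i precedes o_j; alpha, beta are dicts indexed by pairs (i, j), i < j."""
    n = 1 + max(j for _, j in alpha)
    best, best_score = None, -math.inf
    for perm in itertools.permutations(range(n)):
        rank = {o: r for r, o in enumerate(perm)}      # smaller rank = preferred
        score = sum(math.log((1 - beta[i, j]) / (1 - alpha[i, j]))
                    for (i, j) in alpha if rank[i] < rank[j])
        if score > best_score:
            best, best_score = perm, score
    return best

# hypothetical pairwise confidences for three alternatives o_0, o_1, o_2
alpha = {(0, 1): 0.8, (0, 2): 0.6, (1, 2): 0.7}
beta = {(0, 1): 0.1, (0, 2): 0.3, (1, 2): 0.2}
print(most_plausible_order(alpha, beta))   # (0, 1, 2)
```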
In conclusion, belief calculus allows us to model uncertainty in paired compar-
isons. The most plausible linear order can be computed efficiently using a binary
linear programming approach. This approach has been applied to label ranking,
in which the task is to learn a “ranker” that maps p-dimensional feature vectors x
describing an agent to a linear order over a finite set of alternatives, describing the
agent’s preferences [Denœux and Masson, 2012]. As described in Section X, the
method can easily be extended to the elicitation of belief functions from preference
relations [Denœux and Masson. AOR 195(1):135-161, 2012].

4.7.4 Regression
In [956], classical linear regression models are analysed according to the ideas and principles of the Dempster-Shafer theory of evidence. Assumption-based reasoning plays a central role in the analysis, and the theory of hints is used

to represent the results. Regression models are considered as functional models on which a natural assumption-based analysis is performed. The result of the inference on the parameter is expressed as a Gaussian hint, from which degrees of support and plausibility of hypotheses can be computed, and a simple example illustrates the theory. The approach is also compared with classical least squares estimation on generalised linear regression models, and the basic ideas for its application to the Kalman filter are explained there.

4.7.5 Estimation

4.7.6 Optimisation

Solution approaches to belief linear programming (BLP) are proposed in [1072]. A BLP problem is an uncertain linear program in which uncertainty is expressed by belief functions, whose theory provides an uncertainty measure that takes into account ignorance about the occurrence of the single states of nature. This is the case in many decision situations, such as medical diagnosis, mechanical design optimisation and investigation problems. The authors extend stochastic programming approaches, namely the chance-constrained approach and the recourse approach, to obtain a certainty-equivalent program, and present a generic solution strategy for the resulting certainty equivalent.

4.8 Advances
Belief functions are rather complex mathematical objects - thus, they possess links
with a number of fields of (applied) mathematics, on one side, and lead to interesting
generalisations of standard results of classical probability (e.g. Bayes’ theorem, total
probability), on the other.
Indeed many new results have recently been achieved, proving that the discipline is alive and evolving towards maturity. It is useful to briefly mention some remarkable results concerning the major open problems of the field, in order to better appreciate the place of the work developed in Part II.
The work of Roesmer ([1089]) deserves a note for its original connection between nonstandard analysis and the theory of evidence.

4.8.1 Matrix representation

Given an ordering of the subsets of Ω mass, belief, and plausibility functions can be
represented as vectors, which we can denote by m, bel and pl. Various operations
with belief functions can then be expressed via linear algebra operators acting on
vectors and matrices.
We can define the negation \bar{m} of a mass vector m as \bar{m}(A) = m(\bar{A}). Smets has shown that \bar{m} = Jm, where J is the matrix with 1s on its anti-diagonal and 0s elsewhere. Given a mass vector m, the vector bel of belief values turns out to be bel = BfrM m, where BfrM is the transformation matrix such that BfrM(A, B) = 1 iff B ⊆ A, and 0 otherwise.
Notably, such transformation matrices can be built recursively, as
BfrM_{i+1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes BfrM_i,

where ⊗ denotes the Kronecker product of matrices.
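For illustration (ours; subsets are indexed by their binary expansion, the newly added element corresponding to the most significant bit), the recursion can be implemented directly with the Kronecker product:

```python
import numpy as np

def belief_transform_matrix(n):
    """Build BfrM for a frame of n elements via the Kronecker-product recursion;
    BfrM(A, B) = 1 iff B is a subset of A, with bitmask subset indexing."""
    B = np.array([[1.0]])
    for _ in range(n):
        B = np.kron(np.array([[1.0, 0.0], [1.0, 1.0]]), B)
    return B

# mass-to-belief conversion on a frame of two elements ({}, {a}, {b}, {a,b})
m = np.array([0.0, 0.3, 0.2, 0.5])
print(belief_transform_matrix(2) @ m)   # approximately [0, 0.3, 0.2, 1]
```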


Other transformation matrices, representing various Moebius inversions, can be defined:
MfrB(A, B) = (-1)^{|A|-|B|} if B ⊆ A, 0 otherwise;
QfrM(A, B) = 1 if A ⊆ B, 0 otherwise;
MfrQ(A, B) = (-1)^{|B|-|A|} if A ⊆ B, 0 otherwise.

These turn out to obey the following relations:

M f rB = Bf rM −1 , Qf rM = JBf rM J, M f rQ = JBf rM −1 J.

The vectors associated with normalised BFs and plausibilities can be computed as:
Bel = b − b(∅)1, pl = 1 − Jb.

Fast Moebius Transform An interesting application of matrix calculus is the computation of the Fast Moebius Transform (FMT) [], proposed by Kennes and Smets to efficiently compute the various Moebius transforms involved in belief calculus (e.g. from Bel to m). The FMT consists of a series of recursive calculations, illustrated for the case of a frame Ω = {a, b, c} in Figure 4.13.

Fig. 4.13. Detail of the FMT when Ω = {a, b, c}. The symbols a, ab, etcetera, denote m(a),
m(a, b) and so on.

The related series of computations can also be expressed in matrix form as


Bf rM = M3 · M2 · M1 , where the three matrices are recalled in Figure 4.14.

Fig. 4.14. The three matrices M1, M2 and M3 whose product implements the FMT for Ω = {a, b, c}.
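The same computation can be implemented without building any matrix at all, as an in-place transform over bitmask-indexed mass vectors; the sketch below (ours) uses n·2^n additions instead of a 2^n × 2^n matrix product:

```python
def fmt_mass_to_belief(m):
    """Fast Moebius Transform sketch: turn a mass vector indexed by subsets
    (bitmask convention, len(m) == 2**n) into the corresponding belief vector."""
    b = list(m)
    n = len(b).bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for A in range(len(b)):
            if A & bit:              # add the accumulated value of A \ {x_i}
                b[A] += b[A ^ bit]
    return b

m = [0.0, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.5]   # Omega = {a, b, c}
print(fmt_mass_to_belief(m))                   # Bel(Omega) = 1.0
```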

4.8.2 Distances and dissimilarities

A number of norms have been introduced for belief functions, with the goal of quantifying how much two bodies of evidence differ. For example, people have proposed generalizations to belief functions of the classical Kullback-Leibler divergence D_{KL}(P|Q) = \int_{-\infty}^{+\infty} p(x) \log\big(\frac{p(x)}{q(x)}\big) dx of two probability distributions P, Q, measures based on information theory such as fidelity, or entropy-based norms [Jousselme IJAR'11]. Many others have been proposed [?, ?, ?, ?]. Any exhaustive analysis would be a huge task, although Jousselme et al. have managed to compile a very nice survey on the topic [].
Figure 4.15, extracted from [], summarises the main families of distances and
dissimilarities that have been proposed in the last twenty years or so.

Fig. 4.15. Some significant dissimilarity measures among belief functions proposed in the
last fifty years (from []).

Experimental tests on randomly generated BFs led to the emergence of four families: metric (i.e. proper distance functions), pseudo-metric (dissimilarities), non-structural (those which do not account for the structure of the focal elements), and non-metric (Figure 4.16).

Jousselme's distance The most popular and most cited measure of dissimilarity was proposed by Jousselme et al. [] as a "measure of performance" of algorithms (e.g. object identification) in which successive evidence combination leads to convergence to the "true" solution.
It is based on the vector representation m of mass functions, and reads as:

Fig. 4.16. Empirical testing led Jousselme et al [] to the detection of four separate families of
dissimilarity measures.

d_J(m_1, m_2) \doteq \sqrt{\frac{1}{2} (m_1 - m_2)^T D (m_1 - m_2)},
where D(A, B) = \frac{|A \cap B|}{|A \cup B|} for all A, B ∈ 2^Θ. Jousselme's distance so defined: (1) is positive definite, and thus defines a metric distance; (2) takes into account the similarity among subsets (focal elements); (3) is such that D(A, B) < D(A, C) whenever C is "closer" (more similar, in the Jaccard sense) to A than B is.
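A direct implementation of Jousselme's distance is straightforward; the sketch below is ours, with subsets indexed as bitmasks and D(∅, ∅) conventionally set to 1 (immaterial for normalised mass functions):

```python
import numpy as np

def jousselme_distance(m1, m2):
    """Jousselme distance between two mass vectors indexed by subsets (bitmasks)."""
    N = len(m1)
    D = np.zeros((N, N))
    for A in range(N):
        for B in range(N):
            union = bin(A | B).count('1')
            D[A, B] = 1.0 if union == 0 else bin(A & B).count('1') / union
    diff = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    return float(np.sqrt(0.5 * diff @ D @ diff))

# Omega = {a, b}: a Bayesian mass function versus the vacuous one
print(jousselme_distance([0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.0, 1.0]))
```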

Other proposals Among others, it is worth mentioning the following proposals.


– Perry and Stephanou's distance:
d_{PS}(m_1, m_2) = |F_1 \cup F_2| \Big(1 - \frac{|F_1 \cap F_2|}{|F_1 \cup F_2|}\Big) + (m_1 \oplus m_2 - m_1)^T (m_1 \oplus m_2 - m_2),
where F_1 and F_2 denote the lists of focal elements of m_1 and m_2;

– Blackman and Popoli's attribute distance:
d_{BP}(m_1, m_2) = -2 \log\Big(\frac{1 - \kappa(m_1, m_2)}{1 - \max_i \{\kappa(m_i, m_i)\}}\Big) + (m_1 + m_2)^T g_A - m_1^T G m_2,
where g_A is the vector with elements g_A(A) = \frac{|A|-1}{|\Theta|-1}, and
G(A, B) = \frac{(|A|-1)(|B|-1)}{(|\Theta|-1)^2};

– L_p measures, used by Cuzzolin for geometric conditioning [] and consonant/consistent approximation []:
d_{L_p}(m_1, m_2) = \Big( \sum_{A \subseteq \Theta} |Bel_1(A) - Bel_2(A)|^p \Big)^{1/p}

(note that the L1 distance was earlier introduced by Klir and Harmanec []);

– Fixsen and Mahler's Bayesian Percent Attribute Miss (BPAM), induced by the inner product
m_1^T P m_2,
where P(A, B) = \frac{p(A \cap B)}{p(A) p(B)} and p is an a-priori probability on Θ;
– Zouhal and Denoeux's inner product of pignistic functions [];
– Dempster's conflict κ and Ristic's closely related "additive global dissimilarity measure" -\log(1 - \kappa);
– "fidelity", or the Bhattacharyya coefficient \sqrt{m_1}^T W \sqrt{m_2};
– the family of information-based distances:

dU (m1 , m2 ) = |U (m1 ) − U (m2 )|,

where U is any uncertainty measure for belief functions.

4.8.3 Measures of uncertainty

Various measures of uncertainty have been proposed for belief functions - consult for instance the 1990s survey by Nikhil Pal [].
Some of them are directly inspired by Shannon's entropy of probability measures, H(p) = -\sum_x p(x) \log p(x). Yager's entropy measure [] is a direct generalisation of Shannon's entropy in which probabilities are replaced by plausibilities:
E(m) = -\sum_{A \in \mathcal{F}} m(A) \log Pl(A).

It is 0 for consonant or consistent belief functions (A_i ∩ A_j ≠ ∅ for all pairs of focal elements), while it is maximal for disjoint focal elements with equal mass (and for Bayesian b.f.s in particular). Hohle's measure of confusion:
C(m) = -\sum_{A \in \mathcal{F}} m(A) \log Bel(A)

is the dual measure in which belief measures replace probabilities in classical en-
tropy.
A different class of measures is designed to capture the specificity of belief measures, such as:
N(m) = \sum_{A \in \mathcal{F}} \frac{m(A)}{|A|}.

This measures the dispersion of the pieces of evidence generating a belief function,
and is rather clearly related to the pignistic function.
Klir has proposed a different non-specificity measure (later extended by Dubois and Prade):
I(m) = \sum_{A \in \mathcal{F}} m(A) \log |A|.
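Both E(m) and I(m) can be computed directly from a bitmask-indexed mass vector; the sketch below is ours and uses base-2 logarithms:

```python
import math

def yager_entropy(m):
    """E(m) = -sum_A m(A) log2 Pl(A) for a bitmask-indexed mass vector."""
    N = len(m)
    pl = [sum(m[B] for B in range(N) if A & B) for A in range(N)]
    return -sum(m[A] * math.log2(pl[A]) for A in range(1, N) if m[A] > 0)

def nonspecificity(m):
    """Klir / Dubois-Prade non-specificity I(m) = sum_A m(A) log2 |A|."""
    return sum(m[A] * math.log2(bin(A).count('1'))
               for A in range(1, len(m)) if m[A] > 0)

m = [0.0, 0.3, 0.2, 0.5]            # Omega = {a, b}
print(yager_entropy(m), nonspecificity(m))
```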

Composite measures such as Lamata and Moral's E(m) + I(m) try to capture both entropy and specificity. E(m), however, was criticised by Klir and Ramer because it expresses conflict as A ∩ B = ∅ rather than B ⊄ A. C(m), instead, was criticised because it does not measure to what extent two focal elements disagree (i.e., the size of A ∩ B).
Klir and Ramer then proposed a global uncertainty measure defined as D(m) + I(m), where
D(m) = -\sum_{A \in \mathcal{F}} m(A) \log \Big[ \sum_{B \in \mathcal{F}} m(B) \frac{|A \cap B|}{|B|} \Big].

Pal [] later argued that none of them is really satisfactory: none of the composite measures has a unique maximum; there is no sound rationale for simply adding conflict and non-specificity measures together to get a 'total' one; and, finally, some are computationally very expensive.
In opposition, Harmanec's Aggregated Uncertainty (AU) is defined as the maximal Shannon entropy over all consistent probabilities, naturally within the credal set interpretation of b.f.s. As the author proved [], it is the minimal measure meeting a set of rationality requirements including symmetry, continuity, expansibility, subadditivity, additivity, monotonicity and normalisation. AU itself was once again criticised by Klir and Smith for being insensitive to arguably significant changes in evidence, and was replaced by a linear combination of AU and the nonspecificity I(m). The latter is obviously still characterised by a high computational complexity: in response, Jousselme et al. (2006) brought forward their Ambiguity Measure (AM), defined as the classical entropy of the pignistic function.

4.8.4 Algebra and independence of frames


Definition 52. A collection of compatible frames Θ_1, ..., Θ_n is said to be independent [1149] (IF) if:
ρ_1(A_1) ∩ ··· ∩ ρ_n(A_n) ≠ ∅, \quad ∀ ∅ ≠ A_i ⊆ Θ_i.
The notion comes from that of independence of Boolean sub-algebras. One can
prove that independence of sources is indeed equivalent to independence of frames
[?]:
Proposition 26. A set of compatible frames Θ1 , ..., Θn are independent iff all the
possible collections of belief functions Bel1 , ..., Beln on Θ1 , ..., Θn are combinable
on their minimal refinement Θ1 ⊗ · · · ⊗ Θn .
Now, IF and classical independence of vector subspaces display a striking sim-
ilarity:
ρ_1(A_1) ∩ ··· ∩ ρ_n(A_n) ≠ ∅ ∀A_i ⊆ Θ_i, \qquad Θ_1 ⊗ ··· ⊗ Θ_n = Θ_1 × ··· × Θ_n;
v_1 + ··· + v_n ≠ 0 ∀v_i ∈ V_i, \qquad span\{V_1, ..., V_n\} = V_1 × ··· × V_n.

Indeed, one can prove that compatible frames and vector subspaces share the alge-
braic structure of semi-modular lattice.
In a family of frames we can define the following order relation:

Θ1 ≤ Θ2 ⇔ ∃ρ : Θ2 → 2Θ1 refining, (4.21)

together with its dual ≤∗. Both (F, ≤) and (F, ≤∗) are lattices.

Definition 53. A lattice L is upper semi-modular if, for each pair x, y of elements of L, x ≻ x ∧ y⁹ implies x ∨ y ≻ y.
A lattice L is lower semi-modular if, for each pair x, y of elements of L, x ∨ y ≻ y implies x ≻ x ∧ y.

Theorem 6. A family of compatible frames endowed with the order relation (4.21), (F, ≤), is an upper semi-modular lattice. Dually, (F, ≤∗) is a lower semi-modular lattice.

Now, abstract independence can be defined on collections {l_1, ..., l_n} of non-zero elements of any semi-modular lattice with initial element 0 as follows:
I_1: l_j \not\leq \bigvee_{i \neq j} l_i \ ∀j = 1, ..., n; \quad I_2: l_j \wedge \bigvee_{i<j} l_i = 0 \ ∀j = 2, ..., n; \quad I_3: h\Big(\bigvee_i l_i\Big) = \sum_i h(l_i),
where h denotes the height function of the lattice.

In particular, for sets of compatible frames Θ_1, ..., Θ_n these relations read as:
Θ_1, ..., Θ_n \ I_1^* \; ⇔ \; Θ_j ⊕ \bigotimes_{i \neq j} Θ_i ≠ Θ_j \quad ∀ j = 1, ..., n;
Θ_1, ..., Θ_n \ I_2^* \; ⇔ \; Θ_j ⊕ \bigotimes_{i=1}^{j-1} Θ_i = 0_{\mathcal{F}} \quad ∀ j = 2, ..., n;
Θ_1, ..., Θ_n \ I_3^* \; ⇔ \; \Big|\bigotimes_{i=1}^n Θ_i\Big| - 1 = \sum_{i=1}^n (|Θ_i| - 1).

Notably, relation I_3^* is equivalent to saying that the dimension of the probability polytope for the minimal refinement is the sum of the dimensions of the polytopes associated with the individual frames.
The relationship between these lattice-theoretic forms of independence and independence of frames is summarised in Figure 4.17.
In the upper semimodular case IF is mutually exclusive with all lattice-theoretic
relations I1 , I2 , I3 . In the lower semimodular case IF is a stronger condition than
both I1∗ and I2∗ . IF is mutually exclusive with the third independence relation (a
form of matroidal [] independence).
9 x 'covers' y (x ≻ y) if x ≥ y and there is no intermediate element between them in the lattice.

Fig. 4.17. Lattice-theoretical independence and independence of frames.

This analysis hints at the possibility that independence of sources may be ex-
plained algebraically. Although families of frames and projective geometries share
the same kind of lattice structure, independence of sources is not a form of lattice-
theoretic independence, nor a form of matroidal independence, but they are related
in a rather complex way.
A possible algebraic solution to the conflict problem (see Section 4.2.5) by means of a 'generalised Gram-Schmidt' procedure can also be outlined.
Starting from a set of belief functions Bel_i : 2^{Θ_i} → [0, 1] defined over Θ_1, ..., Θ_n, we seek a new collection of independent frames of the same family:
Θ_1, ..., Θ_n ∈ \mathcal{F} \; \longrightarrow \; Θ'_1, ..., Θ'_m ∈ \mathcal{F},
with m ≠ n in general, and the same minimal refinement:
Θ_1 ⊗ ··· ⊗ Θ_n = Θ'_1 ⊗ ··· ⊗ Θ'_m.
Then, we project the n original b.f.s Bel_1, ..., Bel_n onto the new set of frames, obtaining a set of surely combinable belief functions Bel'_1, ..., Bel'_m equivalent (in some meaningful sense) to the initial collection of bodies of evidence.

4.8.5 Multivariate analysis

4.8.6 Canonical decomposition

The question of how to define an inverse operation to the Dempster combination rule for basic probability assignments and belief functions possesses a natural motivation and an intuitive interpretation. If Dempster's rule reflects a modification of one's system of degrees of belief when the subject in question becomes familiar with the degrees of belief of another subject and accepts the arguments on which these degrees are based, the inverse operation would enable one to erase the impact of this modification, and to return to one's original degrees of belief, supposing that the reliability of the second subject is put into doubt.
Within the algebraic framework this inversion problem was solved by Ph. Smets
in [1265].

Smets showed that every non-dogmatic belief function can be uniquely decomposed, via Dempster's rule, into (generalised) simple belief functions, which provides the desired inverse ('removal') operator.
Kramosil proposed a solution to the inversion problem within the measure-
theoretic approach ([790]).

4.8.7 Frequentist formulation

To our knowledge, only Walley has tried, in an interesting even if not very recent paper ([1375]), to formulate a frequentist theory of upper and lower probability, considering models for independent repetitions of experiments described by interval probabilities and suggesting generalisations of the usual concepts of independence and asymptotic behaviour. More recent contributions in this direction include Fine's work 'Towards a frequentist interpretation of sets of measures' (ISIPTA 2001) and a recent paper by Dempster.
5
The bigger picture
As we have seen in the Introduction, several different mathematical theories of un-
certainty compete to be adopted by practitioners of all fields of applied science
[1366, 1512, 1176, 592, 905, 502].
The (informal) consensus is that there is no such thing as the best mathematical description of uncertainty (compare [683], [466], [1190], [726, 727] and [367], to cite a few) – the choice of the most suitable methodology should depend on the actual problem at hand.
problem at hand. Whenever a probability measure can be estimated, most authors
suggest the use of a classical Bayesian approach. If probability values cannot be
reliably estimated, upper and lower probabilities should instead be preferred.
Scholars have rather extensively discussed and compared the various approaches
to uncertainty theory. Notably George Klir [728] has surveyed in 2004 various theo-
ries of imprecise probabilities, proposing a number of unifying principles. Philippe
Smets [1246, 1254], on his part, has contributed to the analysis of the difference
between imprecision and uncertainty [712], and compared the applicability of vari-
ous models of uncertainty. More recently, Destercke et al. [381] have explored uni-
fying principles for uncertainty representations while discussing their ‘generalised
p-boxes’ proposal.
More specifically, a number of papers have explored theoretical and empirical [603]
comparisons between belief functions theory and other mathematical models of un-
certainty [813, 1493, 908, 608, 1073, 759, 428], especially in the sensor fusion con-
text [114, 165, 128, 618, 1337, 136].
Finally, a number of attempts have been made to unify most approaches to uncer-
tainty theory into a single coherent framework [728, 1371], most notably by Walley
(resulting in his theory of imprecise probability, [1371, 1367, 1374]), Klir (gener-


alised information theory, [728]) and Zadeh (Generalized Theory of Uncertainty,


[1540]).
Our understanding of what belief functions are would be significantly limited,
if we did not extend our overview to embrace the rich tapestry of methodologies
which go under the name of ‘theories of uncertainty’.
In this Chapter, therefore, we will briefly survey the other approaches to uncertainty
theory in a rather exhaustive way, focussing in particular on their relationship (if
any) with the theory of evidence.

Fig. 5.1. The major approaches to uncertainty theories surveyed in this Chapter are arranged
into a hierarchy, in which less general frameworks are at the bottom and more general ones
at the top. A link between them indicates that the top formalism comprises the bottom one as
a special case.

Figure 5.1 arranges all these theories into a hierarchy, according to their gener-
ality. Similar diagrams, for a smaller subset of methodologies, appear in [381] and
[728], among others.

Chapter outline

We start by summarising the basic notions of Peter Walley's theory of imprecise probabilities, arguably the broadest attempt yet at a general theory of uncertainty, and its behavioural rationale (Section 5.1). The other most general framework for uncertainty description is arguably the theory of capacities, also called (for historical reasons) 'fuzzy measures' – these are introduced in Section 5.2. A special case of capacities, but a very general one, is that of two-monotone capacities or probability intervals (Section 5.3), which include belief functions (∞-monotone capacities) as a subclass.
In Section 5.4 higher-order probabilities (and second-order ones, in particular) are
introduced and their relation with belief functions is discussed.
Fuzzy theory (Section 5.5) includes, in a wide sense, both possibility theory
(Section 5.5.1) and the various extensions of belief functions with value on fuzzy
sets that have been proposed.
Logic (Section 5.6) is also much intertwined with belief theory, a fact we already
hinted at in the previous Chapter. In particular modal logic interpretations (Section
5.6.6), which include Pearl’s probability of probability semantics (Section 5.6.7),
exhibit very strong links with Dempster-Shafer theory.
Other major uncertainty theories with significant links to belief theory are
Pawlak’s rough sets (Section 5.7) and the notion of probability box, or ‘p-box’ (Sec-
tion 5.8). Destercke et al.’s generalised p-boxes are also discussed (Section 5.8.3).
A theory of epistemic beliefs quite resembling Shafer’s theory of belief func-
tions is Spohn’s, outlined in Section 5.9. Section 5.10 is devoted to Zadeh’s Gen-
eralised Uncertainty Theory framework, in which various uncertainty theories are
unified in terms of generalised constraints acting on ‘granules’.
In Section 5.11 Baoding Liu's Uncertainty Theory formalism is reviewed.
A comprehensive survey of other mathematical formalisms for uncertainty de-
scription, including info-gap theory and Vovk and Shafer’s game theoretical defini-
tion of probability concludes the Chapter.

5.1 Imprecise probability

‘Imprecise probability’ is a term that, in a broader sense, refers to the approaches


which make use of collections of probability measures, rather than single distribu-
tions, to model a problem. This can be seen as a variant of robust statistical ap-
proaches.
In a narrower sense, it denotes the framework brought forward by Walley [1374, 1366] to unify all these approaches in a coherent setting. The latter is perhaps the most extensive effort towards a general theory of imprecise probabilities, whose generality is comparable with that of the theory based on arbitrary closed and convex sets of probability distributions, and is formalised in terms of lower and upper previsions [211].

5.1.1 Lower probabilities

A lower probability [477] P is a function from 2Θ , the power set of Θ, to the unit
interval [0, 1]. With any lower probability P is associated a dual upper probabil-
ity function P , defined for any A ⊆ Θ as P (A) = 1 − P (Ac ), where Ac is the
complement of A. With any lower probability P we can associate a (closed convex)
set
\mathcal{P}(P) = \{p : p(A) \geq P(A), \ \forall A \subseteq \Theta\} \quad (5.1)
of probability distributions p which dominate P. Such a polytope, or convex set of probability distributions, is usually called a credal set [838].
When the lower probability is a belief measure, P = b, we simply get the credal set
of probabilities consistent with b (3.10). As pointed out by Walley [1371], not all
convex sets of probabilities can be described by merely focusing on events.

Definition 54. A lower probability P on Θ is called 'consistent' ('avoids sure loss' in Walley's terminology [1371]) if \mathcal{P}(P) ≠ ∅, or equivalently:
\sup_{x \in \Theta} \sum_{i=1}^n \xi_{E_i}(x) \geq \sum_{i=1}^n P(E_i), \quad (5.2)
whenever n is a nonnegative integer, E_1, ..., E_n ∈ F are events of the sigma-algebra F on which P is defined, and ξ_{E_i} is the characteristic function of the event E_i on Θ.

Definition 55. A lower probability P is called 'tight' ('coherent' in Walley's terminology) if:
\inf_{p \in \mathcal{P}(P)} p(A) = P(A)
or, equivalently:
\sup_{x \in \Theta} \Big[ \sum_{i=1}^n \xi_{E_i}(x) - m \cdot \xi_{E_0}(x) \Big] \geq \sum_{i=1}^n P(E_i) - m \cdot P(E_0), \quad (5.3)
whenever n, m are nonnegative integers, E_0, E_1, ..., E_n ∈ F, and ξ_{E_i} is again the characteristic function of the event E_i.

Consistency means that the lower bound constraints P (A) can indeed be sat-
isfied by some probability measure, while tightness indicates that P is the lower
envelope on subsets of P(P ). Any coherent lower probability is monotone and su-
peradditive.

5.1.2 Gambles and behavioural interpretation

The concepts of avoiding sure loss and coherence are also applicable to any func-
tional defined on a class of bounded functions on Θ (gambles). According to this
point of view, a lower probability is a functional defined on the class of all charac-
teristic (indicator) functions of sets.
The behavioural rationale for general imprecise probability theory derives from equating 'belief' with 'inclination to act'. An agent believes in a certain outcome to the extent that it is willing to accept a gamble on that outcome. A gamble is a decision which generates different utilities in different states (outcomes) of the world. The following outline is abstracted from [?].
Definition 56. Let Ω be the set of possible outcomes ω. A gamble is a bounded real-valued function on Ω: X : Ω → R, ω 7→ X(ω).
Clearly the notion of gamble is very close to that of utility (see Section ??). Note that gambles are not constrained to be normalised or non-negative. Whether one is willing to accept a gamble depends on one's beliefs about the outcome.
Let us denote an agent’s set of desirable gambles by D ⊆ L(Ω), where L(Ω)
is the set of all bounded real valued functions on Ω. Since whether a gamble is
desirable depends on the agent’s belief on the outcome, D can be used as a model
of the agent’s uncertainty about the problem.
Definition 57. A set D of desirable gambles is coherent iff:
1. 0 (the constant gamble X(ω) = 0 for all ω) 6∈ D;
2. if X > 0 (i.e., X(ω) > 0 for all ω) then X ∈ D;
3. if X, Y ∈ D, then X + Y ∈ D;
4. if X ∈ D and λ > 0 then λX ∈ D.
As a consequence, if X ∈ D and Y > X then Y ∈ D. In other words, a coherent set of desirable gambles is a convex cone (it is closed under addition and under multiplication by positive scalars).

5.1.3 Lower previsions

Now, suppose the agent buys a gamble X for a price µ. This yields a new gamble
X − µ.
Definition 58. The lower prevision P (X) of a gamble X:
.
P (X) = sup{µ : X − µ ∈ D}
is the supremum acceptable price for buying X.
In the same way, selling a gamble X for a price µ yields a new gamble µ − X.
Definition 59. The upper prevision P (X) of a gamble X:
.
P (X) = inf{µ : µ − X ∈ D}
is the supremum acceptable price for selling X.

By definition, \overline{P}(X) = -\underline{P}(-X). When the lower and upper previsions coincide, their common value P(X) = \underline{P}(X) = \overline{P}(X) is called the (precise) prevision of X, or a 'fair price' in de Finetti's sense [?].
A graphical interpretation of lower, upper and precise previsions is given in
Figure 5.2. Specifying a precise prevision for X amounts to being able, for any
real price p, to decide whether we want to buy or sell gamble X. When only lower

Fig. 5.2. Interpretation of lower, upper and precise previsions in term of acceptability of
gambles (transactions).

and upper previsions can be defined, for any price in the interval [\underline{P}(X), \overline{P}(X)] we remain undecided.

5.1.4 Events as indicator gambles

As events A ⊆ Ω are nothing but special indicator gambles of the form
I_A(ω) = 1 if ω ∈ A, \quad I_A(ω) = 0 if ω ∉ A,
the lower/upper probability of an event can trivially be defined as the lower/upper prevision of the corresponding indicator gamble: \underline{P}(A) = \underline{P}(I_A), \overline{P}(A) = \overline{P}(I_A).
The indicator gamble expresses the fact that we (the agent) are rewarded what-
ever the outcome in A (in an equal way), and not rewarded if the outcome is outside
A. The corresponding lower and upper previsions (probabilities) measure then the
evidence for and against the event A.

5.1.5 Rules of rational behaviour

Lower and upper previsions represent commitments to act in certain ways under certain circumstances. Rational rules of behaviour impose that:
– the agent does not specify betting rates such that they lose utility whatever the outcome ('avoiding sure loss') - when the gamble X_k is an indicator function ξ_{E_i}, this is expressed by Definition 54;
– the agent is fully aware of the consequences of its betting rates (‘coherence’) -
when the gamble Xk is an indicator function ξEi , this is expressed by Definition
55;

If the first condition is not met, there exists a positive combination of gambles, each individually desirable to the agent, which is not desirable to them. One consequence of avoiding sure loss is that \underline{P}(A) \leq \overline{P}(A). A consequence of coherence is that lower probabilities are superadditive: \underline{P}(A) + \underline{P}(B) \leq \underline{P}(A \cup B) whenever A \cap B = \emptyset.
A precise prevision P is coherent iff: (i) P (λX + µY ) = λP (X) + µP (Y ); (ii)
if X > 0 then P (X) ≥ 0; (iii) P (Ω) = 1, and coincides with de Finetti’s notion of
coherent prevision.
Special cases of coherent lower/upper previsions include probability measures,
de Finetti previsions, 2-monotone capacities, Choquet capacities, possibility/necessity
measures, belief/plausibility measures, random sets but also probability boxes,
(lower and upper envelopes of) credal sets, and robust Bayesian models.

5.1.6 Natural extension


The natural extension operator addresses the problem of extending a coherent lower
prevision defined on a collection of gambles to a lower prevision on all gambles,
under the constraints of the extension being coherent and conservative (least com-
mittal). Given a set D of gambles the agent has judged desirable, the natural exten-
sion E of D is the smallest coherent set of desirable gambles that includes D, i.e.
the smallest extension of D to a convex cone of gambles that contains all positive
gambles but not the zero gamble.
For the special case of lower probabilities, it is defined as follows.
Definition 60. Let P be a lower probability on Θ that avoids sure loss, and let L be the set of all bounded functions on Θ. The functional E defined on L as:
E(f) = \sup \Big\{ \sum_{i=1}^n \lambda_i P(E_i) + c \ \Big| \ f \geq \sum_{i=1}^n \lambda_i \xi_{E_i} + c, \ n \geq 0, \ E_i \subseteq \Theta, \ \lambda_i \geq 0, \ c \in (-\infty, +\infty) \Big\}, \quad \forall f \in L \quad (5.4)
is called the natural extension of P.
When P is a classical (‘precise’) probability the natural extension agrees with the
expectation. Also, E(ξA ) = P (A) for all A iff P is coherent.
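For a finite frame and finitely many events with specified lower bounds, the supremum in (5.4) is a linear program in the variables λ_i and c. The sketch below is ours (the frame and the bounds in the example are hypothetical) and assumes P avoids sure loss, so that the program is bounded:

```python
import numpy as np
from scipy.optimize import linprog

def natural_extension(theta, lower, f):
    """E(f): maximise sum_i lambda_i * P(E_i) + c subject to
    sum_i lambda_i * 1_{E_i}(x) + c <= f(x) for all x, lambda_i >= 0, c free.
    'lower' maps events (frozensets) to their lower probability bounds."""
    events = list(lower)
    k = len(events)
    cost = -np.array([lower[E] for E in events] + [1.0])   # linprog minimises
    A_ub = np.array([[1.0 if x in E else 0.0 for E in events] + [1.0]
                     for x in theta])
    b_ub = np.array([f[x] for x in theta])
    bounds = [(0, None)] * k + [(None, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return -res.fun

theta = ['a', 'b', 'c']
lower = {frozenset({'a', 'b'}): 0.5, frozenset({'c'}): 0.2}
print(natural_extension(theta, lower, {'a': 1.0, 'b': 0.0, 'c': 0.0}))   # 0.0
```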

5.1.7 Belief functions and imprecise probabilities


Belief functions as coherent lower probabilities Wang and Klir [?] have shown that on a frame of just two elements coherence and complete monotonicity reduce to superadditivity: hence, on binary frames any coherent lower probability is a belief measure. Already on frames of three elements, however, there exist 2-monotone (coherent) lower probabilities which are not 3-monotone (belief measures).
Belief functions are indeed a special type of coherent lower probabilities, which in
turn can be seen as a special class of lower previsions (consult [1371], Section 5.13).
Walley has proved that coherent lower probabilities are closed under convex combi-
nation: belief functions’ relationship with convexity will be discussed in Part II.

Natural extension and Choquet integrals of belief measures Both the Choquet
integral (4.20) with respect to monotone set functions (such as belief functions) and
the natural extensions of lower probabilities are generalizations of the Lebesgue
integral with respect to σ-additive measures. Wang and Klir [1390] investigated the
relations between Choquet integrals, natural extension and belief measures, showing
that the Choquet integral with respect to a belief measure is always greater than or
equal to the corresponding natural extension.
More precisely, the Choquet integral \int f \, dP for all f ∈ L is a nonlinear functional on L, and (X, L, \int f \, dP) is a lower prevision [1371]. It can be proven that the latter is coherent when P is a belief measure, and:
Proposition 27. E(f) \leq \int f \, dP for any f ∈ L whenever P is a belief measure.
Conceptual autonomy of belief functions Baroni and Vicig [62] claim that the
answers to the questions .. tend to exclude the existence of intuitively appreciable
relationships between belief functions and coherent lower probabilities, confirming
the conceptual autonomy of belief functions with respect to imprecise probability.

5.2 Capacities (A.K.A. fuzzy measures)


The theory of capacities or fuzzy measure theory [1389, 1316, 539] is a general-
isation of classical measure theory upon which classical mathematical probability
is constructed, rather than a generalisation of probability theory itself. It considers
generalized measures in which the additivity property (see Definition 1) is replaced
by the weaker property of monotonicity.
The central concept of fuzzy measure or capacity [?] was introduced by Choquet
in 1953 and independently defined by Sugeno in 1974 [?] in the context of fuzzy
integrals. A number of uncertainty measures can be seen as special cases of fuzzy
measures, including belief functions, possibilities and probability measures [806].
Definition 61. Given a domain Θ and a non-empty family F of subsets of Θ, a
monotone measure (often called monotone capacity or fuzzy measure) µ on hΘ, Fi
is a function µ : F → [0, 1] which meets the following conditions:
1. µ(∅) = 0;
2. if A ⊆ B then µ(A) ≤ µ(B), for every A, B ∈ F (‘monotonicity’);
3. for any increasing sequence A_1 ⊆ A_2 ⊆ ··· of subsets in F,
if \bigcup_{i=1}^{\infty} A_i \in F, then \lim_{i \to \infty} \mu(A_i) = \mu\Big(\bigcup_{i=1}^{\infty} A_i\Big)
('continuity from below');
4. for any decreasing sequence A_1 ⊇ A_2 ⊇ ··· of subsets in F,
if \bigcap_{i=1}^{\infty} A_i \in F and \mu(A_1) < \infty, then \lim_{i \to \infty} \mu(A_i) = \mu\Big(\bigcap_{i=1}^{\infty} A_i\Big)
('continuity from above').



When Θ is finite the last two requirements are trivially satisfied and can be disre-
garded. Monotone decreasing measures can be obtained by replacing ≤ with ≥ in
condition 2.

5.2.1 Special types of capacities

Capacities and belief functions


Definition 62. A capacity µ is said to be of order k if it satisfies the inequalities:
\mu\Big(\bigcup_{j=1}^k A_j\Big) \geq \sum_{\emptyset \neq K \subseteq \{1, ..., k\}} (-1)^{|K|+1} \mu\Big(\bigcap_{j \in K} A_j\Big) \quad (5.5)
for all collections of k subsets A_1, ..., A_k of Θ.


Clearly, if k 0 > k the resulting theory is less general than a theory of capacities of
order k (i.e., it contemplates fewer measures). The less general such theory is that
of infinitely monotone capacities [1216].
Proposition 28. Belief functions are infinitely monotone capacities.
We just need to compare (5.5) with the superadditivity property of belief functions
(3.6). The Moebius transform of a capacity µ can be computed as:
m(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} \mu(B),

just as for belief functions (2.3). For infinitely monotone capacities, as we know, the
Moebius inverse (the basic probability assignment) is non-negative.
Klir et al. published an excellent discussion [731] on the relations between belief
and possibility theory [325, 814], and examined different methods for constructing
fuzzy measures in the context of expert systems.
The product of capacities representing belief functions is studied in [609]. The re-
sult ([609], Equation (12)) is nothing but the unnormalised Dempster combination
(or, equivalently, a disjunctive combination in which mass zero is assigned to the
empty set), and is proved to satisfy a linearity property (commutativity with convex
combination).
In [1489], Yager analysed a class of fuzzy measures generated by a belief measure,
seen as providing partial information about an underlying fuzzy measure. An entire
class of such fuzzy measures exists - the notion of entropy of a fuzzy measure is
used to select significant representatives from this class.

Sugeno λ-measures Sugeno λ-measures, gλ , introduced by Sugeno [1316], are


special regular monotone measures that satisfy the requirement

gλ (A ∪ B) = gλ (A) + gλ (B) + λgλ (A)gλ (B) (5.6)



for any given pair of disjoint sets A, B ∈ 2Θ , where λ ∈ (−1, ∞) is a parameter by


which individual measures in this class are distinguished. It is well known [1389]
that each λ-measure is uniquely determined by values gλ (x), x ∈ Θ, subject to the
condition that at least two of these values are nonzero.The parameter λ can then be
uniquely recovered from them as follows:
Y
1+λ= [1 + λgλ (x)]. (5.7)
x

Given gλ (x) for all x ∈ Θ and λ, the values gλ (A) of the λ-measure for all subsets
A ∈ 2Θ are then determined by (5.6).
The following three cases must be distinguished:
1. if \sum_x g_\lambda(x) < 1, then g_\lambda is a lower probability and, thus, a superadditive measure; λ is determined by the root of Equation (5.7) in the interval (0, ∞), which is unique;
2. if \sum_x g_\lambda(x) = 1, then g_\lambda is a probability measure, λ = 0, and it is the only root of Equation (5.7);
3. if \sum_x g_\lambda(x) > 1, then g_\lambda is an upper probability and, hence, a subadditive measure; λ is determined by the root of (5.7) in the interval (−1, 0), which is unique.
Finally, as shown in [1389], lower and upper probabilities based on λ-measures are special belief and plausibility measures, respectively.
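Numerically, λ can be recovered from the singleton values by locating the appropriate nonzero root of (5.7), after which g_λ(A) follows by repeatedly applying (5.6). The sketch below is ours and uses simple root bracketing:

```python
import numpy as np
from scipy.optimize import brentq

def sugeno_lambda(g_singletons):
    """Solve (5.7) for lambda and return (lambda, g_A), where g_A(A) evaluates
    the lambda-measure on a subset A given as a list of element indices."""
    g = np.asarray(g_singletons, dtype=float)
    eq = lambda lam: np.prod(1.0 + lam * g) - (1.0 + lam)   # nonzero root wanted
    s = g.sum()
    if abs(s - 1.0) < 1e-12:
        lam = 0.0                                # probability measure
    elif s < 1.0:                                # superadditive: root in (0, inf)
        hi = 1.0
        while eq(hi) <= 0.0:
            hi *= 2.0
        lam = brentq(eq, 1e-9, hi)
    else:                                        # subadditive: root in (-1, 0)
        lam = brentq(eq, -1.0 + 1e-9, -1e-9)
    def g_A(A):
        val = 0.0 if not A else g[A[0]]
        for i in A[1:]:
            val = val + g[i] + lam * val * g[i]  # iterate (5.6)
        return val
    return lam, g_A

lam, g_A = sugeno_lambda([0.2, 0.3, 0.1])
print(lam, g_A([0, 1, 2]))   # g_lambda(Theta) = 1 by construction
```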

Interval-valued probability distributions Systems of probability intervals (Sec-


tion 5.3) are also a special case of monotone capacities.

Proposition 29. [320] The lower and upper probability measures associated with
a feasible (‘reachable’) set of probability intervals are Choquet capacities of order
2, namely:

l(A ∪ B) + l(A ∩ B) ≥ l(A) + l(B), \quad u(A ∪ B) + u(A ∩ B) ≤ u(A) + u(B), \quad ∀A, B ⊆ Θ. \quad (5.8)

5.3 Probability intervals (two-monotone capacities)


Dealing with general lower probabilities defined on 2Θ can be difficult when Θ
is large: it may then be interesting for practical applications to focus on simpler
models.
A set of probability intervals or interval probability system [803, 1333, 320] is a
system of constraints on the probability values of a probability distribution p : Θ →
[0, 1] on a finite domain Θ of the form:
\mathcal{P}(l, u) \doteq \{p : l(x) \leq p(x) \leq u(x), \ \forall x \in \Theta\}. \quad (5.9)

Probability intervals [218, 1429, 1326] were introduced as a tool for uncertain rea-
soning in [320, 964], where combination and marginalization of intervals were stud-
ied in detail. The authors also studied the specific constraints such intervals ought
to satisfy in order to be consistent and tight.
As pointed out for instance in [642], probability intervals typically arise through
measurement errors. As a matter of fact, measurements can be inherently of interval
nature (due to the finite resolution of the instruments). In that case the probability
interval of interest is the class of probability measures consistent with the measured
interval.
A set of constraints of the form (5.9) also determines a credal set: credal sets
generated by probability intervals are a sub-class of all credal sets generated by
lower and upper probabilities [1263]. Their vertices can be computed as in [320], p.
174.
A set of probability intervals may be such that some combinations of values
taken from the intervals do not correspond to any probability distribution function,
indicating that the intervals are unnecessarily broad.
Definition 63. A set of probability intervals is called feasible if and only if, for each x ∈ Θ and every value v(x) ∈ [l(x), u(x)], there exists a probability distribution function p : Θ → [0, 1] for which p(x) = v(x).
If P(l, u) is not feasible, it can be converted to a set of feasible intervals via:
l'(x) = \max\Big(l(x), 1 - \sum_{y \neq x} u(y)\Big), \quad u'(x) = \min\Big(u(x), 1 - \sum_{y \neq x} l(y)\Big).

In a similar way, given a set of bounds P(l, u), we can obtain lower and upper probability values on any subset A ⊆ Θ by using the following simple formulas:
P(A) = \max\Big(\sum_{x \in A} l(x), 1 - \sum_{x \notin A} u(x)\Big), \quad \overline{P}(A) = \min\Big(\sum_{x \in A} u(x), 1 - \sum_{x \notin A} l(x)\Big). \quad (5.10)
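A direct implementation of the tightening step and of the bounds (5.10) is immediate; the sketch below is ours and the interval values are hypothetical:

```python
def feasible_intervals(l, u):
    """Tighten a set of probability intervals to a feasible ('reachable') one."""
    tl, tu = sum(l.values()), sum(u.values())
    l2 = {x: max(l[x], 1.0 - (tu - u[x])) for x in l}
    u2 = {x: min(u[x], 1.0 - (tl - l[x])) for x in u}
    return l2, u2

def lower_upper(l, u, A):
    """Lower and upper probability of a subset A induced by the intervals, eq. (5.10)."""
    low = max(sum(l[x] for x in A), 1.0 - sum(u[x] for x in l if x not in A))
    up = min(sum(u[x] for x in A), 1.0 - sum(l[x] for x in l if x not in A))
    return low, up

l = {'a': 0.1, 'b': 0.2, 'c': 0.3}
u = {'a': 0.4, 'b': 0.5, 'c': 0.6}
print(lower_upper(*feasible_intervals(l, u), {'a', 'b'}))   # (0.4, 0.7)
```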
A generalised Bayesian inference framework based on interval probabilities is pro-
posed in [1].
Belief functions are also associated with a set of lower and upper probability
constraints of the form (5.9): they therefore correspond to a special class of interval
probability systems, associated with credal sets of a specific form.

5.3.1 Probability intervals and belief measures


Besides describing at length the way two compatible sets of probability intervals
can be combined via disjunction or conjunction, and marginalisation and condition-
ing operators for them, [320] delves into the relationship between probability inter-
vals and belief/plausibility measures, and considers the problem of approximating a
belief function with a probability interval.
Given a pair (b, pl) of belief/plausibility functions on Θ, we wish to find the set
of probability intervals P ∗ (l∗ , u∗ ) such that:

– (b, pl) ⊂ P ∗ (l∗ , u∗ ) (as credal sets);


– for every P(l, u) such that (b, pl) ⊂ P(l, u), it is also P ∗ (l∗ , u∗ ) ⊂ P(l, u),
i.e., P ∗ (l∗ , u∗ ) is the smallest set of probability intervals containing (b, pl).

Proposition 30. ([828], [320] Proposition 13) For all x ∈ Θ

l∗ (x) = b(x), u∗ (x) = pl(x), (5.11)

i.e., the minimal probability interval containing a pair of belief/plausibility functions is the one whose lower bound is the belief value of the singletons, and whose upper bound is their plausibility.

The opposite problem of finding, given an arbitrary set of probability intervals, a be-
lief function such that (5.11) is met can only be solved whenever ([320], Proposition
14):
X X X X
l(x) ≤ 1, l(y) + u(x) ≤ 1 ∀x ∈ Θ, l(x) + u(x) ≥ 2.
x y6=x x x

In that case several pairs (b, pl) exist which satisfy (5.11): Lemmer and Kyburg
[828] have proposed an algorithm for selecting one. When proper and reachable
sets are considered, the first two conditions are trivially met.
The opposite question, namely approximating an arbitrary probability interval with a pair of belief/plausibility functions, is also considered in [320] - it turns out that such approximations only have focal elements of size less than or equal to 2 (Proposition 16).

5.4 Higher-order probabilities


Metaprobability and Dempster-Shafer A work by Fung (1985) [494] discusses
second-order probability (which the author calls ‘metaprobability’ theory) as a way
to provide soft or hard constraints on beliefs in much the same manner as the
Dempster-Shafer theory provides constraints on probability masses on subsets of
the state space. As Fung points out, second-order probabilities lack practical moti-
vations for their use, while ‘methodological issues are concerned mainly with con-
trolling the combinatorics of metaprobability state spaces.’
Metaprobabilistic updating of beliefs is still based on Bayes' rule, namely:
$$p_2(p \mid D, Pr) \propto p_2(D \mid p, Pr) \cdot p_2(p \mid Pr),$$
where D is the data (evidence) and Pr a prior on the space of (first-order) probability distributions p.
Bounding probability is different from the approach of second-order or two-dimensional probability (e.g., Hoffman and Hammonds 1994; Cullen and Frey 1999), in which uncertainty about probabilities is itself modelled with probability.
Hoffman, F. O., Hammonds, J. S. (1994). Propagation of uncertainty in risk assessments: The need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability. Risk Analysis 14(5):707-712.
Cullen, A. C., Frey, H. C. (1999). Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. Plenum Press: New York.

5.5 Fuzzy theory


The concept of fuzzy set was introduced by Lotfi A. Zadeh [?] and Dieter Klaua [?]
in 1965 as an extension of the classical notion of set.
While in classical set theory an element either belongs or does not belong to the
set, fuzzy set theory allows a more gradual assessment of the membership of ele-
ments in a set. The degree of membership is described by a ‘membership function’,
a function from the domain of the set to the real unit interval [0, 1].
Zadeh later introduced possibility theory as an extension of fuzzy set theory, in or-
der to provide a graded semantics to natural language statements 1 . The theory was
later further developed thanks to the contribution of Didier Dubois [411] and Henri
Prade. Indeed, possibility measures are also the basis of a mathematical theory of
partial belief.
Authors like Heilpern [606], Yager [1488, 1483], Palacharla [1009], Romer
[1088], Kreinovich [795] and others [1037, 527, 469, 853, 486, 1143] also stud-
ied the connection between fuzzy and Dempster-Shafer theory. A very technical
work by Goodman [527] explored the mathematical relationship between fuzzy and
random sets, showing that the membership function of any fuzzy subset of a space
is the common ‘one-point coverage function’ of an equivalence class of (in general,
infinitely many) random subsets of that space.

5.5.1 Possibility theory

Definition 64. A possibility measure on a domain Θ is a function Pos : 2^Θ → [0, 1] such that Pos(∅) = 0, Pos(Θ) = 1 and:
$$Pos\Big( \bigcup_i A_i \Big) = \sup_i Pos(A_i)$$
for every family of subsets {A_i | A_i ∈ 2^Θ, i ∈ I}, where I is an arbitrary index set.

Each possibility measure is uniquely characterized by a membership function or possibility distribution π : Θ → [0, 1], defined as π(x) ≐ Pos({x}), via the formula:
$$Pos(A) = \sup_{x \in A} \pi(x).$$

1 http://www.scholarpedia.org/article/Possibility_theory

The dual quantity $Nec(A) = 1 - Pos(A^c)$ is called a necessity measure.


Many studies have pointed out that necessity measures coincide in the theory of
evidence with the class of consonant belief functions.
Indeed, let us call plausibility assignment $\bar{pl}_b$ [679] the restriction of the plausibility function (8.8) to singletons, $\bar{pl}_b(x) = pl_b(\{x\})$.
By Condition 4 of Proposition 17 it follows that:
Proposition 31. The plausibility function $pl_b$ associated with a belief function b on a domain Θ is a possibility measure iff b is consonant, in which case the membership function coincides with the plausibility assignment: $\pi = \bar{pl}_b$. Equivalently, a b.f. b is a necessity measure iff b is consonant.
Possibility theory (in the finite case) is then embedded in the ToE. The points of contact between the evidential formalism, in its transferable belief model implementation, and possibility theory are briefly investigated in [1233].
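As a numerical illustration of Proposition 31, the following minimal Python sketch (with an assumed nested mass assignment on a three-element frame; all names are hypothetical) verifies that the plausibility of singletons of a consonant belief function behaves as a possibility distribution:

```python
from itertools import chain, combinations

def plausibility(A, masses):
    # Pl(A): total mass of the focal elements intersecting A
    return sum(m for F, m in masses.items() if set(F) & set(A))

# Nested focal elements {a} in {a,b} in {a,b,c}: a consonant belief function
masses = {frozenset('a'): 0.5, frozenset('ab'): 0.3, frozenset('abc'): 0.2}

pi = {x: plausibility({x}, masses) for x in 'abc'}   # plausibility assignment

theta = set('abc')
subsets = chain.from_iterable(combinations(theta, r) for r in range(1, 4))
for A in subsets:
    # Pl(A) = sup_{x in A} pi(x): pl acts as a possibility measure
    assert abs(plausibility(A, masses) - max(pi[x] for x in A)) < 1e-12
print(pi)   # {'a': 1.0, 'b': 0.5, 'c': 0.2}
```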

5.5.2 Belief functions on fuzzy sets

In addition to possibility measures being equivalent to consonant belief functions,


belief functions can be generalised to assume values on fuzzy sets, rather than tradi-
tional ‘crisp’ ones [1216, 1519, 96]. Following Zadehs work, Ishizuka et al. [650],
Ogawa and Fu [991], Yager [1476], and recently Biacino [96] have extended the
Dempster-Shafer theory to fuzzy sets by defining a measure of inclusion.
A belief measure can be defined on fuzzy sets as follows:
$$Bel(X) = \sum_{A \in \mathcal{M}} I(A \subseteq X)\, m(A), \tag{5.12}$$
where $\mathcal{M}$ is the collection of all fuzzy subsets of Ω, A and X are two such fuzzy subsets, m is a mass function defined this time on the collection of fuzzy subsets of Ω (rather than on the power set of crisp subsets), and I(A ⊆ X) is a measure of how much the fuzzy set A is included in the fuzzy set X.
Indeed, defining the notion of inclusion for fuzzy sets is not trivial - various
measures of inclusion can and have been proposed.
Just as a fuzzy set is completely determined by its membership function, different measures of inclusion between fuzzy sets are associated with a function I : X × Y → [0, 1], from which one can get [1438]:
$$I(A, B) = \bigwedge_{x \in \Theta} I\big( A(x), B(x) \big).$$
Among the most popular we can cite Lukasiewicz's inclusion, I(x, y) = min{1, 1 − x + y}, proposed by Ishizuka [650], and the Kleene-Dienes inclusion, I(x, y) = max{1 − x, y}, supported by Yager [1476].
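The following minimal Python sketch (made-up membership values and masses) computes the fuzzy belief measure (5.12) using the implication-based inclusion above with the Lukasiewicz implication I(x, y) = min(1, 1 − x + y):

```python
def lukasiewicz(a, b):
    return min(1.0, 1.0 - a + b)

def inclusion(A, B, imp=lukasiewicz):
    # Degree to which fuzzy set A is included in fuzzy set B
    return min(imp(A[x], B[x]) for x in A)

def fuzzy_belief(X, masses, imp=lukasiewicz):
    # Bel(X) = sum_A I(A included in X) * m(A), A ranging over fuzzy focal elements
    return sum(inclusion(A, X, imp) * m for A, m in masses)

# Fuzzy focal elements on Theta = {a, b}, given as membership dictionaries
A1 = {'a': 1.0, 'b': 0.2}
A2 = {'a': 0.6, 'b': 0.9}
masses = [(A1, 0.7), (A2, 0.3)]

X = {'a': 0.8, 'b': 0.5}
print(fuzzy_belief(X, masses))   # 0.74
```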
Although these extensions to belief theory all arrive at frameworks within which
both probabilistic and vague information can be handled, they are all restricted to

finite frames of discernment. Moreover, it is unclear whether or not the belief and
plausibility functions so obtained satisfy subadditivity (respectively, superadditiv-
ity) (3.6) in the fuzzy environment. Biacino [96] studied fuzzy belief functions in-
duced by an infinitely monotone inclusion and proved that they are indeed lower
probabilities.
In response, Wu et al [1438] have recently developed a theory of fuzzy belief func-
tions on infinite spaces.

5.5.3 Vague sets

In vague set theory [504] a vague set A is characterised by truth-membership func-


tion tA (x) and a false-membership function fA (x) for all elements x ∈ U of a
universe of discourse, further generalising classical fuzzy set theory as tA (x) is a
lower bound on the grade of membership of x generated by the evidence in favour
of x, and fA (x) is a lower bound on the negation of x derived from the evidence
against x. In general:
tA (x) + fA (x) ≤ 1,
where the gap 1 − (tA (x) + fA (x)) represents our ignorance about the element x
and is nil for traditional fuzzy sets. Basically, a vague set is a pair of lower and upper
bounds on the membership function of a fuzzy set.
In [843] the authors show that belief theory is a special case of vague set theory [504], as the belief value of a proposition and the grade of membership of an element in a vague set share a similar form. Whenever the elements of a vague set correspond to subsets of a total set (frame of discernment), x ↔ A ⊂ Θ, and the membership grades are redefined according to Dempster-Shafer theory as t_A(x) = Bel(A) and f_A(x) = 1 − Pl(A), vague set theory and belief theory coincide.
In the author’s view the analysis conducted in [843] is simplistic and possibly
flawed.

5.5.4 Other fuzzy extensions of the theory of evidence

Numerous fuzzy extensions of belief theory have indeed been proposed [326, 439,
1508, 649]. Constraints on belief functions imposed by fuzzy random variables have
been studied in [1092, 767]. Fuzzy evidence theory was used for decision making
in [1513].

Intuitionistic fuzzy sets Atanassov’s intuitionistic fuzzy sets [40], an approach


mathematically equivalent (but formulated earlier) to vague sets (see the above Sec-
tion 5.5.3) can also be interpreted in the framework of belief theory, so that all math-
ematical operations on intuitionistic fuzzy values can be represented as operations
on belief intervals, allowing us to use Dempster's rule of combination to aggregate
intuitionistic fuzzy values for decision making [451].

Lucas’ fuzzy-valued measure In [161] Lucas and Araabi proposed their own gen-
eralization of the Dempster-Shafer theory [1517] to a fuzzy valued measure.

Fuzzy Conditioned Dempster-Shafer (FCDS) Mahler [909] formulated his ‘Fuzzy


Conditioned Dempster-Shafer (FCDS)’ theory, as a probability-based calculus for
dealing with possibly imprecise and vague evidence. The theory uses a finite-level
Zadeh fuzzy logic and a Dempster-like combination operator which is ‘conditioned’
to reflect the influence of any a priori knowledge which can be modeled by a belief
measure on finite-level fuzzy sets. The author shows that FCDS is grounded in the
theory of random fuzzy sets, and is a generalization of Bayesian theory to the case
in which both evidence and a priori knowledge are imprecise and vague.

Yager’s work In [1495], Ronald Yager [1481, 1488] and D. Filev proposed a
combined fuzzy-evidential framework for fuzzy modeling. In another work [1487],
Yager investigated the issue of normalization (i.e., the assignment of non-zero val-
ues to empty sets as a consequence of the combination of evidence) in the fuzzy
Dempster-Shafer theory of evidence, proposing in response a technique called
‘smooth normalization’.

5.6 Logic
Many generalizations of classical logic in which propositions are assigned proba-
bility values [?] rather than truth values (0 or 1) have been proposed in the past2 .
As belief functions naturally generalize probability measures, it is quite natural to
define non-classical logic frameworks in which propositions are assigned belief val-
ues, rather than probability values.
This approach has been brought forward in particular by Ruspini [1100, 1099],
Saffiotti [1104], Josang [?], Haenni [572], and others.

5.6.1 A belief functions logic

In propositional logic, propositions or formulas are either true or false, i.e., their
truth value is either 0 or 1 [922]. Formally, an interpretation or model of a for-
mula φ is a valuation function mapping φ to the truth value ‘true’ (1). Each formula
can therefore be associated with the set of interpretations or models (or ‘hyper-
interpretations’ [1108]) under which its truth value is 1. If we define a frame of
discernment formed by all possible interpretations, each formula φ is associated
with the subset A(φ) of this frame which collects all its interpretations.
If the available evidence allows us to define a belief function (or 'bf-interpretation' [1108]) on this frame of possible interpretations, each formula φ, through the associated subset A(φ) ⊆ Θ, is then naturally assigned a degree of belief b(A(φ)) between 0 and 1 [1104, 572], measuring the total amount of evidence supporting the proposition 'φ is true'.
Alessandro Saffiotti, in particular, built in 1992 a hybrid logic attaching belief
values to the classical first-order logic, which he called belief functions logic (BFL)
2 https://en.wikipedia.org/wiki/Probabilistic_logic

[1108], giving new perspectives on the role of Dempster’s rule. Many formal prop-
erties of first-order logic directly generalise to BFL.
Formally, BFL works with formulas of the form F : [a, b], where F is a sentence of
a first-order language, and 0 ≤ a ≤ b ≤ 1. Roughly speaking, a is the degree of
belief that F is true, (1 − b) the degree of belief that F is false.
Definition 65. A belief function b is a bf-model of a belief function (bf-) formula
F : [a, b] iff b(F ) ≥ a and b(F ) ≤ b.

Definition 66. A bf-formula Φ ‘bf-entails’ another bf-formula Ψ iff every bf-model


of Φ is also a bf-model of Ψ .

A number of properties of bf-entailment were proved in [1108]. Given a set of bf-


formulas, a ‘D-model’ for the set can be obtained by Dempster-combining the mod-
els of the individual formulas. Saffiotti proved that if the set of formulas is coherent
(in a first-order logic sense), then it is D-consistent, i.e., the combined model assigns
zero mass to the empty set (in which case it is the least informative bf-model for the
set of formulas).
The issue has been studied by other authors as well. In [81] and [80], Benferhat
et al., for instance, defined a semantics based on ε-belief assignments, where values
committed to focal elements are either close to 0 or close to 1. Andersen and Hooker
[31] proved probabilistic logic and Dempster-Shafer theory to be instances of a
certain type of linear programming model, with exponentially many variables.

5.6.2 Josang’s subjective logic

[674, 688, 692]

5.6.3 Fagin and Halpern’s logic

[465]

5.6.4 Haenni and Lehmann’s Probabilistic Argumentation Systems

[33, 578, 825, 572]

5.6.5 Default logic

[1422, 1420]

5.6.6 Modal logic interpretation

It is worth mentioning the work of Resconi, Harmanec et al. [1081, 595, 598, 1083,
596], who proposed the semantics of propositional modal logic as a unifying frame-
work for various uncertainty theories, such as fuzzy set, possibility and evidential
theory, and established an interpretation of belief measures on infinite sets. This
work is closely related to that of Ruspini [1100, 1099], which is based on a form
of epistemic logic. Harmanec et al., however, use a more general system of modal
logic and also address the completeness of the interpretation. Ruspini’s approach,
instead, is a generalization of the method proposed by Carnap [?] for the develop-
ment of logical foundations of probability theory.
Modal logic is a type of formal logic, primarily developed in the 1960s, that extends classical propositional logic to include operators expressing modality3. Modalities are formalised via modal operators. In particular, modalities of truth include possibility ('It is possible that p', ◊p) and necessity ('It is necessary that p', □p). These notions are often expressed using the idea of possible worlds: necessary propositions are those which are true in all possible worlds, whereas possible propositions are those which are true in at least one possible world.
Formally, the language of modal logic consists of a set of atomic propositions, the logical connectives ¬, ∨, ∧, →, ↔, and the modal operators of possibility ◊ and necessity □. Sentences or propositions of the language are of the following form:
1. atomic propositions;
2. if p and q are propositions, so are ¬p, p ∧ q, p ∨ q, p → q, p ↔ q, ◊p, and □p.
A standard model of modal logic is a triplet M = hW, R, V i, where W denotes
a set of possible worlds, R is a binary relation on W called accessibility relation
(e.g. world v is accessible from world w when wRv), and V is the value assignment
function V (w, p) ∈ {T, F }, whose output is the truth value of proposition p in world
w. The accessibility relation expresses the fact that some things may be possible
in one world and impossible from the standpoint of another. Different restrictions
on the accessibility relation yield different classes of standard models. A standard
model M is called a T-model if R is reflexive.
The notation $\|p\|_M$ denotes the truth set of a proposition p (what we called above a 'hyper-interpretation'), i.e., the set of all worlds in which p is true:
$$\|p\|_M = \big\{ w \mid w \in W, V(w, p) = T \big\}. \tag{5.13}$$

In [595, 596], a modal logic interpretation of Dempster-Shafer theory on finite


universes or decision spaces Θ was proposed in terms of finite models, i.e. models
with a finite set of worlds.
Consider propositions eA of the form: ‘a given incompletely characterized element
θ is classified in set A’, where θ ∈ Θ, A ∈ 2Θ .
3 https://en.wikipedia.org/wiki/Modal_logic
Proposition 32. [?] A finite T-model M = ⟨W, R, V⟩ that satisfies the Singleton Valuation Assumption (SVA: one and only one proposition e_{θ} is true in each world) induces a plausibility measure $pl_M$ and a belief measure $b_M$ on 2^Θ, defined by:
$$b_M(A) = \frac{|\|\Box e_A\|_M|}{|W|}, \qquad pl_M(A) = \frac{|\|\Diamond e_A\|_M|}{|W|}. \tag{5.14}$$
Proposition 32 states that the belief value of A is the proportion of worlds in which the proposition 'θ belongs to A' is considered necessary, while its plausibility is the proportion of worlds in which this is considered possible.
The SVA amounts to saying that each world in model M gives its own unique answer
to the classification question. Furthermore:

Proposition 33. [595] A finite T-model M = ⟨W, R, V⟩ that satisfies SVA induces a basic probability assignment $m_M$ on 2^Θ, defined by:
$$m_M(A) = \frac{|\|E_A\|_M|}{|W|},$$
where
$$E_A = \Box e_A \wedge \Big( \bigwedge_{B \subset A} \neg(\Box e_B) \Big).$$

Proposition 34. [595] The modal logic interpretation of basic probability assign-
ments introduced in Proposition 33 is complete, i.e. for every rational-valued basic
probability assignment m on 2Θ , there exists a finite T-model M satisfying SVA such
that mM = m.
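The toy Python sketch below gives one possible simplified reading of Propositions 32 and 33: each world is identified directly with the (non-empty) set of answers accessible from it, so that under SVA belief, plausibility and mass reduce to counting worlds as in (5.14); the model and all names are hypothetical:

```python
from itertools import combinations

THETA = {'x', 'y', 'z'}
# Each world is summarised by the subset of Theta accessible from it
WORLDS = [frozenset({'x'}), frozenset({'x', 'y'}), frozenset({'x', 'y'}),
          frozenset(THETA)]

def bel(A):   # proportion of worlds where 'theta is in A' is necessary
    return sum(1 for w in WORLDS if w <= set(A)) / len(WORLDS)

def pl(A):    # proportion of worlds where 'theta is in A' is possible
    return sum(1 for w in WORLDS if w & set(A)) / len(WORLDS)

def mass(A):  # proportion of worlds whose accessible set is exactly A
    return sum(1 for w in WORLDS if w == frozenset(A)) / len(WORLDS)

subsets = [s for r in range(1, 4) for s in combinations(THETA, r)]
assert abs(sum(mass(A) for A in subsets) - 1.0) < 1e-12
print(bel({'x', 'y'}), pl({'z'}))   # 0.75 0.25
```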

In order to develop a modal logic interpretation on arbitrary universes, [596]


adds to a model M a probability measure on the set of possible worlds.
In a series of papers by Tsiporkova et al [1345, 1344, 1347], the modal logic inter-
pretation of Harmanec et al. is further developed by building multivalued mappings
inducing belief measures. A modal logic interpretation of Dempster’s rule is also
proposed.
Other publications on the modal logic interpretation include [523, 56, 1084,
599].

5.6.7 Probability of provability

The Dempster-Shafer theory of evidence can be conceived as a theory of probabil-


ity of provability, as it can be developed on the basis of assumption-based reasoning
[1023, 1236, 89]. The interpretation was first brought forward by Pearl [1023], al-
though Ruspini [1100] had studied a similar problem in terms of the probability of
knowing.
Conceptually, as Smets notes, the probability of provability approach is not different
from the original framework by Dempster [336], but better explains the origin of the
rule of conditioning.

Within this approach, Besnard and Kohlas [89] model reasoning by consequence
relations in the sense of Tarski, showing that it is possible to construct evidence the-
ory on top of the very general logics defined by these consequence relations. Support
functions can be derived which are, as usual, set functions, monotone of infinite or-
der. Furthermore, plausibility functions can also be defined. However, as negation
need not be defined in these general logics, the usual duality relations between sup-
port and plausibility functions of Dempster-Shafer theory do not hold in general.

5.6.8 Other logical frameworks

Many other logic-based frameworks have been proposed [1097, 49, 1100,
1555, 1101, 80, 1054, 328] [548, 1011, 624, 590, 586, 581] [87, 1534, 32, 639, 591]
[569, 979, 970, 1557, 1110, 8] [93, 875, 78, 614, 180, 1410] [187, 190, 189].

Incidence calculus Incidence calculus [138] is a probabilistic logic for dealing


with uncertainty in intelligent systems. Incidences are assigned to formulae: they
are the logic conditions under which the formula is true. Probabilities are assigned
to incidences, and the probability of a formula is computed from the sets of inci-
dences assigned to it. In [873] Liu, Bundy et al. propose a method for discovering
incidences that can be used to calculate mass functions for belief functions.

5.7 Rough sets


First described by Polish computer scientist Zdzisław I. Pawlak, rough sets [?] are a
very popular mathematical description of uncertainty, which is strongly linked to the
idea of partition of the universe of hypotheses. They provide a formal approximation
of a crisp set (i.e., traditional set) in terms of a pair of sets which give a lower and
an upper approximation of the original set.

5.7.1 Pawlak’s rough sets algebras

Let Θ be a finite universe, and R ⊆ Θ × Θ be an equivalence relation which parti-


tions it into a family of disjoint subsets Θ/R, called elementary sets. We can then
call definable or ‘measurable’ sets σ(Θ/R) the unions of one or more elementary
sets, plus the empty set ∅.
The lower approximation $\underline{apr}(A)$ is built from the elementary sets (equivalence classes) which are contained in A. Dually, the upper approximation $\overline{apr}(A)$ is built from the elementary sets which have non-empty intersection with A.

5.7.2 Belief functions and rough sets

The relationship between belief functions and rough set algebras was studied by Yao
and Lingras [1512]. Indeed, some very highly cited papers focus on this topic [].

In a Pawlak rough set algebra, the qualities of the lower and upper approximations of a subset A ⊆ Θ are defined as:
$$\underline{q}(A) \doteq \frac{|\underline{apr}(A)|}{|\Theta|}, \qquad \overline{q}(A) \doteq \frac{|\overline{apr}(A)|}{|\Theta|}.$$
Clearly, the qualities of the lower/upper approximations measure the fraction of the universe Θ covered by the corresponding approximation. This recalls Harmanec's modal logic interpretation of belief functions (5.14). Indeed, it can be proven that Pawlak's rough set algebra corresponds to the modal logic S5 [1509], in which the lower and upper approximation operators correspond to the necessity and possibility operators. Furthermore:
Proposition 35. The quality of the lower approximation $\underline{q}$ is a belief function, with basic probability assignment $m(E) = \frac{|E|}{|\Theta|}$ for all E ∈ Θ/R, and 0 otherwise.
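A minimal Python sketch (with a hypothetical partition of a six-element universe) of the approximations, their qualities and the mass assignment of Proposition 35:

```python
THETA = {1, 2, 3, 4, 5, 6}
PARTITION = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]  # Theta/R

def lower_approx(A):
    # union of the elementary sets contained in A
    return set().union(*([E for E in PARTITION if E <= A] or [set()]))

def upper_approx(A):
    # union of the elementary sets intersecting A
    return set().union(*([E for E in PARTITION if E & A] or [set()]))

def q_lower(A):
    return len(lower_approx(A)) / len(THETA)

def q_upper(A):
    return len(upper_approx(A)) / len(THETA)

mass = {E: len(E) / len(THETA) for E in PARTITION}       # Proposition 35

A = {1, 2, 4}
# the quality of the lower approximation equals the belief value of A under 'mass'
assert abs(q_lower(A) - sum(m for E, m in mass.items() if E <= A)) < 1e-12
print(lower_approx(A), upper_approx(A), q_lower(A), q_upper(A))
```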
One issue with this interpretation is that belief and plausibility values obviously need to be rational numbers. Therefore, given an arbitrary belief function Bel, it may not be possible to build a rough set algebra such that $\underline{q}(A) = Bel(A)$.
The following result establishes a sufficient condition under which this is possible.
Proposition 36. Suppose Bel is a belief function on Θ with mass m such that:
1. the set of focal elements of Bel is a partition of Θ;
2. $m(A) = \frac{|A|}{|\Theta|}$ for every focal element A of Bel.
Then there exists a rough set algebra such that $\underline{q}(A) = Bel(A)$.
A more general condition can be established via inner and outer measures (Chapter 4, Section 3.1.3).
Given a σ-algebra F of subsets of Θ, one can construct a rough set algebra such that F = σ(Θ/R). Suppose P is a probability on F; then it can be extended to 2^Θ using inner and outer measures as follows:
$$P_*(A) = \sup\Big\{ P(X) \,\Big|\, X \in \sigma(\Theta/R), X \subseteq A \Big\} = P(\underline{apr}(A)),$$
$$P^*(A) = \inf\Big\{ P(X) \,\Big|\, X \in \sigma(\Theta/R), X \supseteq A \Big\} = P(\overline{apr}(A)).$$
Pawlak calls these the 'rough probabilities' of A; in fact, they are a pair of belief and plausibility functions!
Note that the set of focal elements Θ/R is a partition of the universe (frame) Θ. Therefore, Pawlak rough set algebras can only interpret belief functions whose focal elements form a partition of the frame of discernment. Nevertheless, further generalisations via serial rough algebras and interval algebras can be achieved [1512].

5.8 Probability boxes


Probability boxes [472, 1494] arise from the need in reliability analysis to assess the probability of failure of a system, expressed as:
$$P_X(F) = \int_F f(x)\, dx,$$

where f (x) is a probability density function (PDF) of the variables x representing


materials and structure, and F is the failure region of values in which the structure
is unsafe. Unfortunately, the available information is usually insufficient to define
accurately the sought joint PDF f . Random sets (Section 3.1.5) and imprecise prob-
ability theories can then be useful to model this uncertainty.
Recall that, given a probability measure P on the real line R, its Cumulative Distribution Function (CDF) is the non-decreasing mapping from R to [0, 1], denoted by $F_P$, such that for any r ∈ R, $F_P(r) = P((-\infty, r])$.
It is thus quite natural to describe the uncertainty on f as a pair of lower and up-
per bounds to the associated CDF, representing the epistemic uncertainty about the
random variable.

Definition 67. A probability box or p-box [1131, 472] $\langle \underline{F}, \overline{F} \rangle$ is a class of Cumulative Distribution Functions (CDFs):
$$\langle \underline{F}, \overline{F} \rangle = \Big\{ F \ \text{CDF} \,\Big|\, \underline{F} \le F \le \overline{F} \Big\},$$
delimited by the lower and upper CDF bounds $\underline{F}$ and $\overline{F}$.

5.8.1 Probability boxes and belief functions

P-boxes and random sets/belief functions are very closely related. Indeed, every pair of belief/plausibility functions Bel, Pl defined on the real line R (a random set) generates a unique p-box, whose CDFs are all those consistent with the evidence generating the belief function:
$$\underline{F}(x) = Bel((-\infty, x]), \qquad \overline{F}(x) = Pl((-\infty, x]). \tag{5.15}$$
Conversely, every probability box generates an entire equivalence class of random intervals consistent with it [681]. A p-box can be discretised to obtain from it a random set which approximates it, but this discretisation is not unique [1131, 587]. For instance [681], given a probability box $\langle \underline{F}, \overline{F} \rangle$, a random set on the real line can be obtained which has as focal elements the following infinite collection of intervals of R:
$$\mathcal{F} = \Big\{ \gamma = [\overline{F}^{-1}(\alpha), \underline{F}^{-1}(\alpha)], \ \forall \alpha \in [0, 1] \Big\}, \tag{5.16}$$
where
$$\overline{F}^{-1}(\alpha) \doteq \inf\{ x : \overline{F}(x) \ge \alpha \}, \qquad \underline{F}^{-1}(\alpha) \doteq \inf\{ x : \underline{F}(x) \ge \alpha \}$$
are the 'quasi-inverses' of the upper and lower CDFs $\overline{F}$ and $\underline{F}$, respectively.

5.8.2 Approximate computations for random sets

In an infinite random set, belief and plausibility values are computed via the following integrals:
$$Bel(A) = \int_{\omega \in \Omega} I[\Gamma(\omega) \subset A]\, dP(\omega), \qquad Pl(A) = \int_{\omega \in \Omega} I[\Gamma(\omega) \cap A \neq \emptyset]\, dP(\omega), \tag{5.17}$$
where Γ : Ω → 2^Θ is the multi-valued mapping generating the random set (see Chapter 3, Section 3.1.5). This is not trivial at all; however, we can use the p-box representation (5.15) of infinite random sets, with set of focal elements (5.16), to compute approximations of such integrals [25]. The idea is to index each of the focal elements by a number α ∈ [0, 1].
Consider then the unique p-box (5.15) associated with the random set Bel. If
there exists a cumulative distribution function Fα for α over [0, 1] we can draw
values of α at random from it, obtaining sample focal elements of the underlying
random set (Figure 5.3). We can then compute the belief and plausibility integrals
(5.17) by adding the mass of the sample intervals.

Fig. 5.3. A p-box amounts to a multi-valued mapping associating values α ∈ [0, 1] with
closed intervals γ of R, i.e., focal elements of the underlying random set [25].

Using this sampling representation we can also approximately compute the Dempster combination of d input random sets.
Each selection of one focal element from each random set is denoted by the vector α of the corresponding indices α_i: α = [α_1, ..., α_d] ∈ (0, 1]^d. Suppose a copula C (i.e., a probability distribution whose marginals are uniform) is defined on the unit hypercube where α lives. We can then use it to compute the desired integrals as follows:
$$P_\Gamma(G) = \int_{\alpha \in G} dC(\alpha).$$
The joint focal element can be represented either by the hypercube $\gamma = \times_{i=1}^d \gamma_i \subseteq X$ (Figure 5.4, left) or by the point α = [α_1, ..., α_d] ∈ (0, 1]^d (Figure 5.4, right).

Fig. 5.4. X representation (left) and α representation (right) of the focal elements sampled
from a p-box [25].

If all the input random sets are independent, these integrals decompose into a series of d nested integrals (see [25], Equation (36)).
Alvarez [25] has proposed the following Monte Carlo approach to their calculation. For j = 1, ..., n:
1. randomly extract a sample $\alpha^j = [\alpha_1^j, ..., \alpha_d^j]$ from the copula C;
2. form the corresponding focal element $A_j = \times_{i=1,...,d}\, \gamma_i^j$, where $\gamma_i^j$ is the focal element of the i-th random set indexed by $\alpha_i^j$;
3. assign to it mass $m(A_j) = \frac{1}{n}$.
It can be proven that such an approximation converges as n → +∞ almost surely
to the actual random set.
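The following Python sketch illustrates the sampling scheme for a single p-box, under assumed logistic CDF bounds and a uniform α (the trivial one-dimensional copula); the bisection-based quasi-inverse and all names are illustrative, not Alvarez's actual implementation:

```python
import math
import random

def quasi_inverse(F, alpha, lo=-10.0, hi=10.0, tol=1e-6):
    # inf{x : F(x) >= alpha}, by bisection (F non-decreasing)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if F(mid) >= alpha else (mid, hi)
    return hi

# Hypothetical p-box: two shifted logistic CDFs (the upper bound sits to the left)
F_upper = lambda x: 1.0 / (1.0 + math.exp(-(x + 1.0)))
F_lower = lambda x: 1.0 / (1.0 + math.exp(-(x - 1.0)))

def bel_pl(a, b, n=5000, seed=0):
    """Monte Carlo estimates of Bel([a, b]) and Pl([a, b])."""
    rng = random.Random(seed)
    bel = pl = 0.0
    for _ in range(n):
        alpha = rng.random()                      # sample alpha from the copula
        left = quasi_inverse(F_upper, alpha)      # focal interval gamma =
        right = quasi_inverse(F_lower, alpha)     #   [F_upper^-1, F_lower^-1]
        bel += (a <= left and right <= b) / n     # gamma contained in [a, b]
        pl += (left <= b and right >= a) / n      # gamma intersects [a, b]
    return bel, pl

print(bel_pl(-2.0, 2.0))
```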

5.8.3 Generalised probability boxes

According to Destercke et al. [381], probability boxes are not adequate to compute the probability that some output remains close to a reference value ρ, which corresponds to computing upper and lower estimates of the probability of events of the form |x̃ − ρ| < ε. In response, they developed generalisations of p-boxes to arbitrary (finite) spaces, which can address this type of query.
Note that any two cumulative distribution functions F, F′ modelling a p-box are comonotonic, i.e., for any x, y ∈ X we have that F(x) < F(y) implies F′(x) < F′(y).
Definition 68. A generalized p-box $\langle \underline{F}, \overline{F} \rangle$ on X is a pair of comonotonic mappings $\underline{F} : X \to [0, 1]$, $\overline{F} : X \to [0, 1]$ such that $\underline{F}(x) \le \overline{F}(x)$ for all x ∈ X, and there exists at least one element x ∈ X such that $\underline{F}(x) = \overline{F}(x) = 1$.
Basically, there exists a permutation of the elements of X such that the bounds of a generalised p-box become CDFs defining a 'traditional' p-box. A generalised p-box is associated with a collection of nested sets4 $A_y = \{x \in X : \underline{F}(x) \le \underline{F}(y), \overline{F}(x) \le \overline{F}(y)\}$, which are naturally associated with a possibility distribution.
4
Author’s notation.
5.9 Spohn’s theory of epistemic beliefs 191

While generalising p-boxes, these objects are a special case of random sets (∞-
monotone capacities) and thus a special case of probability intervals (Figure 5.5).

Fig. 5.5. Generalised p-boxes in the (partial) hierarchy of uncertainty measures [381].

5.9 Spohn’s theory of epistemic beliefs


In Spohn’s theory of epistemic beliefs, under an epistemic state for a variable X
some propositions are believed to be true (or ‘believed’), while some others are
believed to be false (or ‘disbelieved’), and the remainder are neither believed nor
disbelieved.

5.9.1 Epistemic states

A number of conditions are required to guarantee logical consistency. Let ΘX be


the set of possible values of X.

Definition 69. An epistemic state is said to be consistent if the following five axioms
are satisfied:
1. for any propositions A, exactly one of the following conditions holds: (i) A is
believed; (ii) A is disbelieved; (iii) A is neither believed nor disbelieved;
2. ΘX is (always) believed;
3. A is believed if and only if Ac is disbelieved;
4. if A is believed and B ⊇ A, then B is believed;

5. if A and B are believed, then A ∩ B is believed.


Let B denote the set of all subsets of ΘX that are believed under a given epistemic
state. Then:
Proposition 37. The epistemic state is consistent if and only if there exists a unique
nonempty subset C of ΘX such that B = {A ⊆ ΘX : A ⊇ C}.
We can note the similarity with the definition of consistent belief functions (Section 9.4), whose focal elements are also constrained to share a common non-empty subset.

5.9.2 Disbelief functions and Spohnian belief functions


The basic representation of an epistemic state in Spohn’s theory is called an ordinal
conditional function in [1290], p. 115, and a natural conditional function in [1291],
p. 316. Shenoy calls this function a disbelief function.
Formally, let g denote a finite set of variables, and Θg the joint frame (set of
possible values) for the variables in the collection g.
Definition 70. (from [1291], p. 316) A disbelief function for g is a function δ : 2^{Θ_g} → $\mathbb{N}^+$ such that:
1. δ(θ) ∈ N for all θ ∈ Θ_g;
2. there exists θ ∈ Θ_g such that δ(θ) = 0;
3. for any A ⊂ Θ_g, A ≠ ∅,
$$\delta(A) = \min\big\{ \delta(\theta), \ \theta \in A \big\};$$
4. δ(∅) = +∞.
Note that, just like a possibility measure, a disbelief function is completely deter-
mined by its values on the singletons of the frame Θg .
A proposition A is believed in the epistemic state represented by δ iff A ⊇ C,
where C = {θ : δ(θ) = 0} (or, equivalently, iff δ(Ac ) > 0). A is disbelieved
whenever δ(A) > 0; and the proposition is neither believed nor disbelieved iff
δ(A) = δ(Ac ) = 0.
The quantity δ(Ac ) can thus be interpreted as the degree of belief of A. As a conse-
quence, a disbelief function models degrees of disbelief for disbelieved propositions
directly, whereas it models degrees of belief for believed propositions only indi-
rectly. Spohnian belief functions [] can model both beliefs and disbeliefs directly.
Definition 71. A Spohnian belief function for g is a function β : 2^{Θ_g} → $\mathbb{Z}^+$ such that:
$$\beta(A) = \begin{cases} -\delta(A) & \delta(A) > 0, \\ \delta(A^c) & \delta(A) = 0, \end{cases}$$
for all A ⊆ Θ_g, where δ is some disbelief function for g.
for all A ⊆ Θg , where δ is some disbelief function for g.
A disbelief function can be uniquely recovered from a given (Spohnian) belief func-
tion - the latter enjoys a number of desirable properties [1184].
5.10 Zadeh’s Generalized Theory of Uncertainty (GTU) 193

5.9.3 α-conditionalisation

Spohn proposed the following rule for modifying a disbelief function in light of new
information.

Definition 72. ([1290], p. 117) Suppose δ is a disbelief function for g representing


our initial epistemic state. Suppose we learn something about contingent proposi-
tion A (or Ac ) that consequently leads us to believe A to degree α (or, equivalently,
disbelieve Ac to degree α), where α ∈ N.
The resulting epistemic state, called the (A, α)-conditionalization of δ and denoted by the disbelief function $\delta_{A,\alpha}$, is defined as:
$$\delta_{A,\alpha}(\theta) = \begin{cases} \delta(\theta) - \delta(A) & \theta \in A, \\ \delta(\theta) + \alpha - \delta(A^c) & \theta \notin A, \end{cases}$$

for all θ ∈ Θg .
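A small Python sketch (with made-up disbelief degrees on a three-element frame) of a disbelief function, the Spohnian belief function of Definition 71 and the (A, α)-conditionalization of Definition 72:

```python
THETA = {'a', 'b', 'c'}
delta = {'a': 0, 'b': 1, 'c': 3}          # disbelief degrees of the singletons

def disbelief(A, d):
    return min(d[t] for t in A) if A else float('inf')

def spohn_belief(A, d):
    dA = disbelief(A, d)
    return -dA if dA > 0 else disbelief(THETA - set(A), d)

def conditionalise(d, A, alpha):
    """(A, alpha)-conditionalization of Definition 72."""
    dA, dAc = disbelief(A, d), disbelief(THETA - set(A), d)
    return {t: (d[t] - dA) if t in A else (d[t] + alpha - dAc) for t in d}

print(spohn_belief({'a', 'b'}, delta))        # 3: {a, b} is believed to degree 3
print(conditionalise(delta, {'b', 'c'}, 2))   # {'a': 2, 'b': 0, 'c': 2}
```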

As stated by Shenoy [1184], Spohn’s theory of epistemic beliefs shares the essential
abstract features of probability theory and of Dempster-Shafer theory, in particular:
(1) a functional representation of knowledge (or beliefs), (2) a rule of marginaliza-
tion, and (3) a rule of combination.
Shenoy [1184] goes on to show that disbelief functions can also be propagated via
local computations as shown in Chapter 4, Section 4.4.4 for belief functions.

5.10 Zadeh’s Generalized Theory of Uncertainty (GTU)


In Zadeh’s Generalized or perception-based [1537] Theory of Uncertainty (GTU)
[1540] the assumption that information is statistical in nature is replaced by a much
more general thesis that information is a generalized constraint (as opposed to stan-
dard constraints, of the form X ∈ C), with statistical uncertainty being a special,
albeit important, case. A generalized constraint has the form GC(X) : X isr R, where r ∈ {blank, probabilistic, veristic, random set, fuzzy graph, etc.} is a label which determines the type of constraint, and R is a constraining relation of that type (it could be a probability distribution, a random set, etc.).
This adds the capability to operate on perception-based information, for in-
stance: ‘Usually Robert returns from work at about 6 p.m.’, or ‘It is very unlikely
that there will be a significant increase in the price of oil in the near future’. General-
ized constraints serve to define imprecise probabilities, utilities and other constructs,
and generalized constraint propagation is employed as a mechanism for reasoning
with imprecise probabilities as well as for computation with perception-based infor-
mation.
Given the way it is defined, a generalised constraint is obviously a generalisa-
tion of the notion of belief function as well, albeit a rather nomenclative one. Zadeh
claims that, in this interpretation, the Dempster-Shafer theory of evidence is a the-
ory of a mixture of probabilistic and possibilistic constraints [1540], while GTU

Fig. 5.6. Quantisation (left) versus granularisation (right) of variable ‘Age’ (from [1538]).

embraces all possible mixtures and therefore accommodates most theories of uncertainty.
Secondly, bivalence is abandoned throughout GTU, and the foundation of GTU
is shifted from bivalent logic to fuzzy logic. As a consequence, in GTU everything is
or is allowed to be a matter of degree or, equivalently, fuzzy. All variables are, or are
allowed to be ‘granular’, with a granule being ‘a clump of values ... which are drawn
together by indistinguishability, equivalence, similarity, proximity or functionality’
(see Figure 5.6).
Thirdly, one of the principal objectives of GTU is the capability to operate on
information described in natural language. As a result, a generalized constraint lan-
guage (GCL) is defined as the set of all generalized constraints together with the
rules governing syntax, semantics and generation. Examples of elements of GCL
are: (X is small) is likely; ((X, Y ) isp A) ∧ (X is B), where ‘isp’ denotes a proba-
bilistic constraint, ‘is’ denotes a possibilistic constraint, and ∧ denotes conjunction.
Finally, in GTU computation/deduction is treated as an instance of question answering. Given a system of propositions described in a natural language p, and a query q, likewise expressed in a natural language, GTU performs generalized constraint propagation governed by deduction rules that, in Zadeh's words, are 'drawn from the Computation/Deduction module. The Computation/Deduction module comprises a collection of agent-controlled modules and submodules, each of which contains protoformal deduction rules drawn from various fields and various modalities of generalized constraints'.
In the author’s view, generality is achieved by GTU in a rather nomenclative
way, which explains the complexity and lack of naturalness of the formalism.
5.11 Baoding Liu’s Uncertainty Theory 195

5.11 Baoding Liu’s Uncertainty Theory


Liu’s Uncertainty Theory [860, 2] is based on the notion of uncertain measure,
defined as a function M on the σ-algebra F of events over a non-empty set Θ5
which obeys the following axioms:
1. M(Θ) = 1 (normality);
2. M(A1 ) ≤ M(A2 ) whenever A1 ⊂ A2 (monotonicity);
3. M(A) + M(Ac ) = 1 (‘self-duality’);
4. for every countable sequence of events {A_i}, we have (countable subadditivity):
$$M\Big( \bigcup_{i=1}^{\infty} A_i \Big) \le \sum_{i=1}^{\infty} M(A_i).$$

Liu’s uncertain measures are supposed to formalise subjective degrees of belief,


rather than empirical frequencies, and his theory is therefore an approach to subjec-
tive probability. Self-duality (axiom 3.) is justified by the author as consistent with
the law of excluded middle.
Clearly, uncertain measures are monotone capacities (see Definition 61). Just as
clearly, probability measures do satisfy these axioms. However, Liu claims, proba-
bility theory is not a special case of his formalism since probabilities do not satisfy
the product axiom:
$$M\Big( \prod_{k=1}^{\infty} A_k \Big) = \bigwedge_{k=1}^{\infty} M_k(A_k)$$

for all Cartesian products of events from individual uncertain spaces (Θk , Fk , Mk ).
However, the product axiom was introduced by Liu only in 2009 [2] (much after his
introduction of uncertainty theory in 2002 [3]). Also, the extension of uncertain measures
to any subset of a product algebra is rather cumbersome and unjustified (see Equa-
tion (1.10) in [860], or Figure 5.7, extracted from Figure 1.1 in [860]). More generally, the justification the author provides for his choice of axioms is somewhat
lacking.
Based on such measures a straightforward generalisation of random variables
can then be defined (‘uncertain variables’), as measurable (in the usual sense) func-
tions from an uncertainty space (Θ, F, M) to the set of real numbers.
A set ξ_1, ..., ξ_n of uncertain variables is said to be independent if:
$$M\Big( \bigcap_{i=1}^{n} \{\xi_i \in B_i\} \Big) = \min_i M(\{\xi_i \in B_i\}).$$

Just like a random variable may be characterized by a probability density function,


and a fuzzy variable is described by a membership function, uncertain variables
are characterised by identification functions (λ, ρ), which do not exist, however, for
5
The original notation has been changed to fit that adopted in this book.

Fig. 5.7. Extension of rectangle to product algebras in Liu’s uncertain theory (from [860]).
The uncertain measure of Λ (the disk) is the size of the inscribed rectangle Λ1 × Λ2 if the
latter is greater than 0.5. Otherwise, if the inscribed rectangle of Λc is greater than 0.5, then
M(Λc ) is just its inscribed rectangle and M(Λ) = 1 − M(Λc ). If no inscribed rectangle
exists for either Λ or Λc which is greater than 0.5, then we set M(Λ) = 0.5.

every uncertain variable and are subject to rather complex and seemingly arbitrary axioms (see [2], Definition 4), e.g.:
$$\sup_{x \in B} \lambda(x) + \int_B \rho(x)\, dx \ge 0.5 \quad \text{and/or} \quad \sup_{x \in B^c} \lambda(x) + \int_{B^c} \rho(x)\, dx \ge 0.5, \tag{5.18}$$
where λ and ρ are nonnegative functions on the real line, and B is any Borel set of real numbers. Equation (5.18) echoes the extension method of Figure 5.7. While
an uncertain entropy and an uncertain calculus are built by Liu on this basis, the
general lack of rigour and convincing justification for a number of elements of the
theory leaves the author of this book quite unimpressed with this work. No mention
of belief functions or other well-established alternative representations of subjective
probabilities is made in [860].

5.12 Other formalisms


5.12.1 Info-gap decision theory

[77]
http://www.sciencedirect.com/science/article/pii/S0301479707000448

5.12.2 Vovk and Shafer’s game theoretical framework

5.12.3 Others

Endorsements In 1983, Cohen and Grinberg proposed a ‘theory of endorsements’


for reasoning about uncertainty [201, 200], resting on a representation of states of
certainty called endorsements. The authors claimed that 'numerical representations of certainty hide the reasoning that produces them and thus limit one's reasoning about uncertainty', and that, while easy to propagate, numbers have unclear meanings. Numerical approaches to reasoning under uncertainty, in their view, are restricted because the set of numbers is not a sufficiently rich representation to support considerable heuristic knowledge about uncertainty and evidence. The authors' main claim is that there is more to evidence than 'how much' it is believed: other aspects need to be taken into account, including context. Namely, justifications need to be provided in support of evidence. They argue that different kinds of evidence should be distinguished by an explicit record of what makes them different, not by numbers between 0 and 1.

Fril-fuzzy Baldwin et al [54] presented a theory of uncertainty, consistent with


and combining the theories of probability and fuzzy sets. The theory extends the
logic programming form of knowledge representation to include uncertainties such
as probabilistic knowledge and fuzzy incompleteness. Applications to knowledge
engineering including expert and decision-support systems, evidential and case-
based reasoning, fuzzy control and databases were illustrated.

Granular computing Y.Y. Yao [1511] surveyed granular computing (GrC), in-
tended as a set of theories and techniques which make use of granules, i.e., groups
or clusters of concepts. In [1511] the author discussed basic issues of GrC, focussing
in particular on the construction of and computation with granules. A set-theoretic
model of granular computing was proposed, based on the notion of power algebras.

Laskey’s assumptions In [812], Laskey demonstrated a formal equivalence be-


tween belief theory and assumption-based truth maintenance (ATMS), so that any
Dempster-Shafer inference network can be represented as a set of ATMS justifica-
tions with probabilities attached to assumptions. A proposition’s belief is equal to
the probability of its label conditioned on label consistency. In [812] an algorithm
is given for computing these beliefs. When the ATMS is used to manage beliefs,
non-independencies between nodes are automatically and correctly accounted for.
The approach, the author claims, unifies symbolic and numeric approaches to un-
certainty management.

Shastri’s Evidential Reasoning in Semantic Networks In his PhD thesis, Shastri


[1180] argues that the best way to cope with partial and incomplete information is
to adopt an evidential form of reasoning, wherein inference does not involve estab-
lishing the truth of a proposition but rather finding the most likely hypothesis from
among a set of alternatives. In order for inference to take place in real time, we must
provide a computational account of how this may be performed in an acceptable
time frame. Inheritance and categorization within a conceptual hierarchy are iden-
tified as two operations that humans perform very fast, and which lie at the core

of intelligent behavior and are precursors to more complex reasoning. These con-
siderations lead to an evidential framework for representing conceptual knowledge,
wherein the principle of maximum entropy is applied to deal with uncertainty and
incompleteness. It is demonstrated that the proposed framework offers a uniform
treatment of inheritance and categorization, and can be encoded as an interpreter-
free, connectionist network.
In [1181] the author proposes an evidence combination rule which is incremen-
tal, commutative and associative, and hence shares most of the attractive features of Dempster's rule, while being 'demonstrably better' (in the author's words) than Dempster's rule in the context considered there.

Evidential confirmation theory Grosof [547] considered the issue of aggregat-


ing measures of confirmatory and disconfirmatory evidence for a common set of
propositions. He showed that a revised MYCIN Certainty Factor [604] (an ad-hoc
method for managing uncertainty then widely used in rule-based expert systems)
and the PROSPECTOR [445] methods are special cases of Dempster-Shafer theory.
The paper also shows that by using a non-linear but invertible transformation, we
can interpret a special case of Dempster’s rule in terms of conditional independence.
This unified approach resolves the ‘take-them-or-leave-them’ problem with priors:
MYCIN had to leave them out, while PROSPECTOR had to have them in.

Groen’s extension of Bayesian theory Groen and Mosleh (2004) [545] have pro-
posed an extension of Bayesian theory based on a view of inference according to
which observations are used to rule out possible valuations of the variables. The
extension is different from probabilistic approaches such as Jeffrey’s rule (see Sec-
tion 4.3.3), in which certainty in a single proposition A is replaced by a probability
on a disjoint partition of the universe, and Cheeseman’s rule of distributed mean-
ing [179], while non-probabilistic analogues are found in evidence and possibility
theory.

Inferential models In [919], Martin and Liu presented a new framework for proba-
bilistic statistical inference without priors, alternative to Fisher’s fiducial inference,
belief function theory and Bayesian inference with default priors, based on infer-
ential models (IMs). The framework provides data-driven probabilistic measures of
uncertainty about an unknown parameter, and does so with an automatic long-run
frequency calibration. The approach identifies an unobservable auxiliary variable,
associated with observable data and unknown parameter, and predicts it using a ran-
dom set before conditioning on data.

Padovitz’s unifying model In 2006 Padovitz et al. [1005] proposed a novel ap-
proach for representing and reasoning about context in the presence of uncer-
tainty, based on multi-attribute utility theory as the means to integrate heuristics
about the relative importance, inaccuracy and characteristics of sensory informa-
tion. The authors qualitatively and quantitatively compare their reasoning approach
with Dempster-Shafer’s sensor data fusion.

Similarity-based reasoning (SBR) Similarity-Based Reasoning (SBR) is based


on the principle that ‘similar causes bring about similar effects’. In [615] the author
proposed a probabilistic framework for SBR, based on a ‘similarity profile’ which
provides a probabilistic characterization of the similarity relation between observed
cases (instances). She further develops an inference scheme in which instance-based
evidence is represented in the form of belief functions, casting the combination of
evidence derived from individual cases as an information fusion problem.

Neighborhoods systems A neighborhood system mathematically formalises the


notion of ‘negligible quantity’. Neighborhood systems span topology (topological
neighborhood systems), rough sets (S5-neighborhood systems) and binary relations
(basic neighborhood systems). In [855] the authors study real valued functions based
on neighborhood systems, showing how this covers many important uncertainty
quantities such as belief functions, measures, and probability distributions.

Preference relations In [1428] Wong, Lingras and Yao argue that preference re-
lations can provide a more realistic model of random phenomena than quantitative
probability or belief functions. In order to use preference relations for reasoning un-
der uncertainty, it is necessary to perform sequential and parallel combinations of
propagated information in a qualitative inference network, which are discussed in
[1428].

Comparative belief structures Comparative belief is a generalization of com-


parative probability. In [1432] the authors provide an axiomatic system for belief
relations, and show that within this system there are belief functions which almost
agree with comparative beliefs.
Part II

The geometry of uncertainty


6
The geometry of belief functions
When one tries to apply the theory of evidence to classical computer vision prob-
lems, a number of important issues arise. Object tracking [288], for instance, con-
sists in estimating at each time instant the current configuration or ‘pose’ of a mov-
ing object from a sequence of images of the latter. Image features can be represented
as belief functions and combined to produce an estimate q̂(t) ∈ Q̃ of the object’s
pose, where Q̃ is a finite approximation of the pose space Q of the object collected
in a training stage (compare ??, Chapter 8).
Deriving a pointwise estimate from the belief function emerging from the com-
bination is desirable to provide an expected pose estimate - this can be done, for
example, by finding the ‘best’ probabilistic approximation of the current belief es-
timate and computing the corresponding expected pose. This requires a notion of
‘distance’ between belief functions, or between a belief function and a probability
distribution.
In data association [296], a problem described in detail in ??, Chapter 7, the
correspondence between moving points appearing in consecutive images of a se-
quence is sought. Whenever these points belong to an articulated body whose topo-
logical model is known, the rigid motion constraint acting on each link of the body
can be used to obtain the desired correspondence. Since the latter can only be ex-
pressed in a conditional way, the notion of combining conditional belief functions
in a filtering-like process emerges. Conditional belief functions can again be de-
fined in a geometric fashion, as those objects which minimise an appropriate dis-
tance [399, 1191, 713, 685] between the original belief function and the ‘condi-
tional simplex’ associated with the conditioning event A (an approach developed in
[256, ?, 261]).


From a more general point of view, the notion of representing uncertainty mea-
sures such as belief functions [1380] and probability distributions as points of a
certain space [110, 111, 279, 265, 960] can be appealing, as it provides a picture in
which different forms of uncertainty descriptions are unified in a single geometric
framework. Distances can there be measured, approximations sought, and decom-
positions easily calculated.
It is worth mentioning the work of P. Black, who devoted his doctoral thesis to the
study of the geometry of belief functions and other monotone capacities [110]. An
abstract of his results can be found in [111], where he uses shapes of geometric
loci to give a direct visualization of the distinct classes of monotone capacities. In
particular a number of results about lengths of edges of convex sets representing
monotone capacities are given, together with their ‘size’ meant as the sum of those
lengths.
Black’s work amounts therefore to a geometric analysis of belief functions as
special types of credal sets in the probability simplex.
By contrast, in this chapter we introduce a geometric approach to the theory of
evidence, in which belief measures and the corresponding basic probability assign-
ments are represented by points in a (convex) belief space, immersed in a Cartesian
space.

Chapter Outline
A central role is played by the notion of belief space B, introduced in Section 6.1, as
the space of all the belief functions one can define on a given frame of discernment.
In Section 6.2 we characterize the relation between the focal elements of a belief
function and the convex closure operator in the belief space. In particular, we show
that every belief function can be uniquely decomposed as a convex combination
of ‘basis’ or ‘categorical’ belief functions, giving B the form of a simplex, i.e., the
convex closure of a set of affinely independent points.
In Section ??, instead, the Moebius inversion lemma (2.3) is exploited to investigate
the symmetries of the belief space. With the aid of some combinatorial results, a
recursive bundle structure of B is proved and an interpretation of its components
(bases and fibers) in terms of important classes of belief functions is provided.

6.1 The space of belief functions


Consider a frame of discernment Θ and introduce in the Euclidean space $\mathbb{R}^{|2^\Theta|}$ an orthonormal reference frame $\{x_A\}_{A \in 2^\Theta}$. Each vector $v \in \mathbb{R}^{|2^\Theta|}$ can then be expressed in terms of this basis of vectors as:
$$v = \sum_{A \subseteq \Theta} v_A x_A = [v_A, A \subseteq \Theta]'.$$
For instance, if the frame of discernment has cardinality three, Θ = {x, y, z}, each such vector has the form:
$$v = \big[ v_{\{x\}}, v_{\{y\}}, v_{\{z\}}, v_{\{x,y\}}, v_{\{x,z\}}, v_{\{y,z\}}, v_\Theta \big]'.$$

As each belief function b : 2Θ → [0, 1] is completely specified by its belief values


b(A) on all the subsets of Θ, any such vector v is potentially a belief function, its
component vA measuring the belief value of A: vA = b(A) ∀A ⊆ Θ.
Definition 73. The belief space associated with Θ is the set $\mathcal{B}_\Theta$ of vectors v of $\mathbb{R}^{|2^\Theta|}$ such that there exists a belief function b : 2^Θ → [0, 1] whose belief values correspond to the components of v, for an appropriate ordering of the subsets of Θ.
In the following we will drop the dependency on the underlying frame Θ, and denote
the belief space by B.

6.1.1 The simplex of dominating probabilities


To have a first idea of the shape of the belief space, it is useful to start by understanding the geometric properties of Bayesian belief functions.
Lemma 2. Whenever p : 2^Θ → [0, 1] is a Bayesian belief function defined on a frame Θ, and B is an arbitrary subset of Θ, we have that:
$$\sum_{A \subseteq B} p(A) = 2^{|B|-1} p(B).$$
Proof. The sum can be rewritten as $\sum_{\theta \in B} k_\theta\, p(\theta)$, where $k_\theta$ is the number of subsets A of B containing θ. But $k_\theta = 2^{|B|-1}$ for each singleton, so that:
$$\sum_{A \subseteq B} p(A) = 2^{|B|-1} \sum_{\theta \in B} p(\theta) = 2^{|B|-1} p(B).$$

As a consequence, all Bayesian belief functions are constrained to belong to a


well-determined region of the belief space.
Corollary 1. The set P of all the Bayesian belief functions which can be defined on a frame of discernment Θ is a subset of the following (|Θ| − 1)-dimensional region
$$\mathcal{L} = \Big\{ b : 2^\Theta \to [0, 1] \in \mathcal{B} \ \text{ s.t. } \sum_{A \subseteq \Theta} b(A) = 2^{|\Theta|-1} \Big\} \tag{6.1}$$
of the belief space B, which we call the limit simplex1.


Theorem 7. Given a frame of discernment Θ, the corresponding belief space B is a subset of the region of $\mathbb{R}^{|2^\Theta|}$ 'dominated' by the limit simplex L:
$$\sum_{A \subseteq \Theta} b(A) \le 2^{|\Theta|-1},$$
where the equality holds iff b is Bayesian.


1
It can be proved that L is indeed a simplex, i.e., the convex closure of a number of affinely independent points (http://www.cis.upenn.edu/~cis610/geombchap2.pdf).
Proof. The sum $\sum_{A \subseteq \Theta} b(A)$ can be written as
$$\sum_{A \subseteq \Theta} b(A) = \sum_{i=1}^{f} a_i \cdot m(A_i),$$
where f is the number of focal elements of b and $a_i$ is the number of subsets of Θ which include the i-th focal element $A_i$, namely $a_i = |\{B \subseteq \Theta \ \text{s.t.} \ B \supseteq A_i\}|$. Obviously, $a_i = 2^{|\Theta \setminus A_i|} \le 2^{|\Theta|-1}$, and the equality holds iff $|A_i| = 1$. Therefore:
$$\sum_{A \subseteq \Theta} b(A) = \sum_{i=1}^{f} m(A_i)\, 2^{|\Theta \setminus A_i|} \le 2^{|\Theta|-1} \sum_{i=1}^{f} m(A_i) = 2^{|\Theta|-1} \cdot 1 = 2^{|\Theta|-1},$$
where the equality holds iff $|A_i| = 1$ for every focal element of b, i.e., b is Bayesian.
It is important to point out that P does not, in general, fill the limit simplex L. Similarly, the belief space does not necessarily coincide with the entire region bounded by L.

6.1.2 Dominating probabilities and L1 norm

Another hint about the structure of B comes from a particular property of Bayesian belief functions with respect to the classical L1 distance in the Cartesian space $\mathbb{R}^{|2^\Theta|}$.
Let $\mathcal{C}_b$ denote the core of a belief function b, and introduce the following order relation:
$$b \ge b' \ \Leftrightarrow \ b(A) \ge b'(A) \quad \forall A \subseteq \Theta. \tag{6.2}$$
Lemma 3. If b ≥ b′, then $\mathcal{C}_b \subseteq \mathcal{C}_{b'}$.
Proof. Trivially, since b(A) ≥ b′(A) for every A ⊆ Θ, this holds for $\mathcal{C}_{b'}$ too, so that $b(\mathcal{C}_{b'}) = 1$. But then $\mathcal{C}_b \subseteq \mathcal{C}_{b'}$.
Theorem 8. If b : 2^Θ → [0, 1] is a belief function defined on a frame Θ, then
$$\|b - p\|_{L_1} = \sum_{A \subseteq \Theta} |b(A) - p(A)| = \text{const}$$
for every Bayesian belief function p : 2^Θ → [0, 1] dominating b according to the order relation (6.2).
Proof. Lemma 3 guarantees that $\mathcal{C}_p \subseteq \mathcal{C}_b$, so that p(A) − b(A) = 1 − 1 = 0 for every A ⊇ $\mathcal{C}_b$. On the other hand, if A ∩ $\mathcal{C}_b$ = ∅ then p(A) − b(A) = 0 − 0 = 0. We are left with the sets which amount to the union of a non-empty proper subset of $\mathcal{C}_b$ and an arbitrary subset of Θ \ $\mathcal{C}_b$. Given such a subset A ⊂ $\mathcal{C}_b$, there exist $2^{|\Theta \setminus \mathcal{C}_b|}$ subsets of the above type which contain A. Therefore:
$$\sum_{A \subseteq \Theta} |b(A) - p(A)| = 2^{|\Theta \setminus \mathcal{C}_b|} \Big( \sum_{A \subseteq \mathcal{C}_b} p(A) - \sum_{A \subseteq \mathcal{C}_b} b(A) \Big).$$
Finally, by Lemma 2 the latter is equal to:
$$f(b) \doteq 2^{|\Theta \setminus \mathcal{C}_b|} \Big( 2^{|\mathcal{C}_b|-1} - 1 - \sum_{A \subset \mathcal{C}_b} b(A) \Big). \tag{6.3}$$

The L1 distance (7.20) between a belief function and any Bayesian belief func-
tion p dominating it is not a function of p, and depends only on b. A probability
distribution satisfying the hypothesis of Theorem 8 is said to be consistent with b
[801]. Ha et al. [567] proved that the set P[b] of probability measures consistent
with a given belief function b can be expressed (in the probability simplex P) as the
sum of the probability simplexes associated with its focal elements Ai , i = 1, ..., k,
weighted by the corresponding masses:
k
X
P[b] = m(Ai )conv(Ai )
i=1

where conv(Ai ) is the convex closure of the probabilities {pθ : θ ∈ Ai } assigning


mass 1 to a single element θ of Ai . The analytical form of the set P[b] of consistent
probabilities has been further studied in [284].

6.1.3 Exploiting the Moebius inversion lemma

These preliminary results suggest that the belief space may have the form of a sim-
plex. To proceed in our analysis we need to resort to the axioms of basic probability
assignments (Definition 2).
Given a belief function b, the corresponding basic probability assignment can be
found by applying the Moebius inversion lemma (2.3), which we recall here:
X
m(A) = (−1)|A\B| b(B). (6.4)
B⊆A

Θ|
We can exploit it to determine whether a point b ∈ R|2 corresponds indeed to a
belief function, by simply computing the related b.p.a. and checking whether the
resulting m meets the axioms b.p.a.s
P must obey.
The normalization constraint A⊆Θ m(A) = 1 trivially translates into B ⊆ {b :
b(Θ) = 1}. The positivity condition is more interesting, for it implies an inequality
which echoes the third axiom of belief functions (cf. Definition 25 or [1149], page
5):
X X
b(A) − b(B) + · · · + (−1)|A\B| b(B) + · · ·
B⊆A,|B|=|A|−1 |B|=k
|A|−1
X (6.5)
· · · + (−1) b({θ}) ≥ 0 ∀A ⊆ Θ.
θ∈Θ
208 6 The geometry of belief functions

Example: ternary frame Let us see how these constraints act on the belief space
in the case of a ternary frame Θ = {θ1 , θ2 , θ3 }. After introducing the notation

x = b({θ1 }), y = b({θ2 }), z = b({θ3 }),


u = b({θ1 , θ2 }), v = s({θ1 , θ3 }), w = b({θ2 , θ3 })

the positivity constraint (6.5) can be rewritten as




 x ≥ 0, u ≥ (x + y)




 y ≥ 0, v ≥ (x + z)


B: (6.6)
z ≥ 0, w ≥ (y + z)








1 − (u + v + w) + (x + y + z) ≥ 0.

Note that b(Θ) is not needed as a coordinate, for it can be recovered by normal-
ization. By combining the last equation in (6.6) with the others, it follows that the
belief space B is the set of points [x, y, z, u, v, w]0 of R6 such that:

0 ≤ x + y + z ≤ 1, 0 ≤ u + v + w ≤ 2.
.
After defining k = x + y + z, it necessary follows that points of B ougth to meet:

u ≥ (x + y), v ≥ (x + z), w ≥ (y + z), 2k ≤ u + v + w ≤ 1 + k.

6.1.4 Convexity of the belief space

Now, all the positivity constraints of Equation (6.5) (which determine the shape of
the belief space B) are of the form:
X X
xi ≥ xj
i∈G1 j∈G2

where G1 and G2 are two disjoint sets of coordinates, as the above example and
Equation (6.6) confirm. It immediately follows that:

Theorem 9. The belief space B is convex.

Proof. Let us consider two points of the belief space b0 , b1 ∈ B (two belief func-
tions) and prove that all the points bα of the segment b0 + α(b1 − b0 ), 0 ≤ α ≤ 1,
belong to B. Since b0 , b1 belong to B:
X X X X
x0i ≥ x0j , x1i ≥ x1j
i∈G1 j∈G2 i∈G1 j∈G2

|Θ|
where x0i , x1i are the i-th coordinates in R2 of b0 , b1 , respectively. Hence, for
every point bα with coordinates xα i we have that:
6.1 The space of belief functions 209
X X X X

i = [x0i + α(x1i − x0i )] = x0i + α (x1i − x0i )
i∈G1 i∈G1 X X i∈G1 i∈G
X1 X
= (1 − α) x0i + α x1i ≥ (1 − α) x0j + α x1j
X i∈G1 i∈G1 X j∈G2 j∈G2
= [x0j + α(x1j − x0j )] = xα
j,
j∈G2 j∈G2

hence bα meets the same constraints. Therefore, B is convex.

Belief functions and coherent lower probabilities It is well-known that belief


functions are a special type of coherent lower probabilities (see Chapter 5, Sec-
tion 5.1.1), which in turn can be seen as a sub-class of lower previsions (consult
[1371], Section 5.13). Walley proved that coherent lower probabilities are closed
under convex combination — this implies that convex combinations of belief func-
tions (completely monotone lower probabilities) are still coherent.
Theorem 9 is a stronger result, stating that they are also completely monotone.

6.1.5 Symmetries of the belief space

In the ternary example 6.1.3, the system of equations (6.6) exhibits a natural symme-
try which reflects the intuitive partition of the variables in two sets, each associated
with subsets of Θ of the same cardinality, respectively {x, y, z} ∼ |A| = 1 and
{u, v, w} ∼ |A| = 2.
It is easy to see that the symmetry group of B (i.e., the group of transformations
which leave the belief space unchanged) is the permutation group S3 , acting onto
{x, y, z} × {u, v, w} via the correspondence:

x ↔ w, y ↔ v, z ↔ u.

This observation can be extended to the general case of a finite n-dimensional frame
Θ = {θ1 , · · · , θn }. Let us adopt here for sake of simplicity the following notation:
.
xi xj ...xk = b({θi , θj , ..., θk }).

The symmetry of the belief space in the general case is described by the following
logic expression:
_ n−1
^ ^
xi xi1 · · · xik−1 ↔ xj xi1 · · · xik−1 ,
1≤i,j≤n k=1
{i1 , ..., ik−1 } ⊂ {1, ..., n} \ {i, j}
WV
where ( ) denotes the logical or (and), while ↔ indicates the permutation of pairs
of coordinates.
To see this, let us rewrite the Moebius constraints using the above notation:
k−1
X X
xi1 · · · xik ≥ (−1)k−l+1 xj1 · · · xjl .
l=1 {j1 ,...,jl }⊂{i1 ,...,ik }
210 6 The geometry of belief functions

Focussing on the right side of the equation, it is clear that only a permutation be-
tween coordinates associated with subsets of the same cardinality may leave the
inequality inalterate.
Given the triangular form of the system of inequalities (the first group concerning
variables of size 1, the second one variables of size 1 and 2, and so on), permuta-
tions of size-k variables are bound to be induced by permutations of variables of
smaller size. Hence, the symmetries of B are determined by permutations of single-
tons. Each such swap xi ↔ xj determines in turn a number of permutations of the
coordinates related to subsets containing θi and θj .
The resulting symmetry Vk induced by xi ↔ xj for the k-th group of constraints
is then: ∀{i1 , ..., ik−1 } ⊂ {1, ..., n} \ {i, j}
(xi ↔ xj ) ∧ · · · ∧ (xi xi1 · · · xik−1 ↔ xj xi1 · · · xik−1 ).
Since Vk is obviously implied by Vk+1 , and Vn is always trivial (as a simple check
confirms), the overall symmetry induced by a permutation of singletons is deter-
mined by Vn−1 , and by considering all the possible permutations xi ↔ xj we have
as desired.
In other words, the symmetries of B are determined by the action of the per-
mutation group Sn on the collection of cardinality-1 variables, and the action of Sn
naturally induced on higher-size variables by set-theoretical membership:
s ∈ Sn : Pk (Θ) → Pk (Θ)
(6.7)
xi1 · · · xik 7→ sxi1 · · · sxik ,
where Pk (Θ) is the collection of the size-k subsets of Θ.
It is not difficult to recognize in (6.7) the symmetry properties of a simplex, i.e.,
the convex closure of a collection v0 , v1 , ..., vk of k + 1 of affinely independent2
points (vertices) of Rm .

6.2 Simplicial form of the belief space


Indeeed, B is a simplex, with as vertices the special belief functions which assign
unitary mass to a single subset of the frame of discernment.
Let us call categorical belief function focused on A ⊆ Θ, and denote it by bA , the
unique belief function with b.p.a. mbA (A) = 1, mbA (B) = 0 for all B 6= A.
Theorem 10. Every belief function3 b ∈ B can be uniquely expressed as a convex
combination of all the categorical belief functions:
X
b= m(A)bA , (6.8)
∅6=A(Θ

2
The points v0 , v1 , ..., vk are said to be affinely independent iff v1 − v0 , ..., vk − v0 are
linearly independent.
3
Here and in the rest of the Chapter we will denote both a belief function and the vector of
RN −2 representing it by b. This should not lead to confusion.
6.2 Simplicial form of the belief space 211

with coefficients given by the basic probability assignment m.


Proof. Every belief function b in B is represented by the vector:
X 0 X 0
6 B ( Θ ∈ RN −2 ,

b= m(B), ∅ = 6 A(Θ = m(A) δ(B), ∅ =
B⊆A ∅6=A(Θ
.
where N = |2Θ | and δ(B) = 1 iff B ⊇ A. As the vector [δ(B), B ⊆ Θ]0 is the
vector of belief values associated with the categorical b.f. bA , we have the thesis.
This ‘convex decomposition’ property can be easily generalized in the following
way.
Theorem 11. The set of all the belief functions with focal elements in a given col-
(
lection X ⊂ 22 Θ) is closed and convex in B, namely:

b : Eb ⊂ X = Cl({bA : A ∈ X }),
where Cl denotes the convex closure of a set of points of a Cartesian space:
 X 
Cl(b1 , ..., bk ) = b ∈ B : b = α1 b1 + · · · + αk bk , αi = 1, αi ≥ 0 ∀i .
i
(6.9)
Proof. By definition:
  X 0 

b : Eb ⊂ X = b:b= m(B), ∅ =
6 A ( Θ , Eb ⊂ X .
B⊆A,B∈Eb

But
 X 0 X X
b= m(B), ∅ =
6 A(Θ = m(B)bB = m(B)bB
B⊆A,B∈Eb B∈Eb B∈X

after extending m to the elements B ∈ X \ Eb , by enforcing


P m(B) = 0 for those
elements. Since m is a basic probability assignment, B∈X m(B) = 1 and the
thesis follows.
As a direct consequence,
Corollary 2. The belief space B is the convex closure of all the categorical belief
function, namely:
B = Cl(bA , ∀∅ =6 A ⊆ Θ). (6.10)
As it is easy to see that the vectors {bA , ∅ =
6 A ( Θ} associated with all categorical
belief functions (except the vacuous one) are linearly independent, the vectors {bA −
bΘ = bA , ∅ = 6 A ( Θ} (since bΘ = 0 is the origin of RN −2 ) are also linearly
independent, i.e., the vertices {bA , ∅ 6= A ⊆ Θ} of the belief space (6.10) are
affinely independent. Hence:
Corollary 3. B is a simplex.
212 6 The geometry of belief functions

Fig. 6.1. The belief space B2 for a binary frame is a triangle in R2 whose vertices are the
categorical belief functions bx , by , bΘ focused on {x}, {y} and Θ, respectively.

6.2.1 Simplicial structure on a binary frame


As an example let us consider a frame of discernment containing only two elements,
Θ2 = {x, y}. In this very simple case each belief function b : 2Θ2 → [0, 1] is
completely determined by its belief values b(x), b(y), as b(Θ) = 1 and b(∅) = 0 ∀b.
We can therefore collect them in a vector of RN −2 = R2 (since N = 22 = 4):
[b(x) = m(x), b(y) = m(y)]0 ∈ R2 . (6.11)
Since m(x) ≥ 0, m(y) ≥ 0, and m(x) + m(y) ≤ 1 we can easily infer that the set
B2 of all the possible belief functions on Θ2 can be depicted as the triangle in the
Cartesian plane of Figure 6.1, whose vertices are the points:
bΘ = [0, 0]0 , bx = [1, 0]0 , by = [0, 1]0
(compare Equation (6.10)). These correspond (through Equation (6.11)) to the ‘vac-
uous’ belief function bΘ (mbΘ (Θ) = 1), the categorical Bayesian b.f. bx with
mbx (x) = 1, and the categorilca Bayesian b.f. by with mby (y) = 1, respectively.
Bayesian belief functions on Θ2 obey the constraint m(x) + m(y) = 1, and are
therefore located on the segment P2 joining bx = [1, 0]0 and by = [0, 1]0 . Clearly
the L1 distance between b and any Bayesian b.f. dominating it is constant and equal
to 1 − m(x) − m(y) (see Theorem 8).
The limit simplex (11.4) is the region of set functions such that:
b(∅) + b(x) + b(y) + b(x, y) = 1 + b(x) + b(y) = 2,
i.e. b(x) + b(y) = 1. Clearly P2 is a proper4 subset of the limit simplex (recall
Section 6.1.1).
4
indeed the region of normalized sum functions (Section ??) ς which
The limit simplex isP
meet the constraint x∈Θ mς (x) = 1
6.3 The differential geometry of belief functions 213

6.2.2 Faces of B as classes of belief functions

Obviously a Bayesian belief function (a finite probability) is a b.f. with focal ele-
ments in the collection of singletons: Cb = {{x1 }, ..., {xn }}. Immediately by The-
orem 11
Corollary 4. The region of the belief space corresponding to probability functions
is the part of its border determined by all simple probabilities, i.e. the simplex2
P = Cl(bx , x ∈ Θ).
P is then an (n − 1)-dimensional face of B (whose dimension is instead N − 2 =
2n − 2 as it has 2n − 1 vertices).
Some one-dimensional faces of the belief space have also an intuitive meaning
in terms of belief. Consider the segments Cl(bΘ , bA ) joining the vacuous belief
function bΘ (mbΘ (Θ) = 1,mbΘ (B) = 0 ∀B 6= Θ) with the basis b.f. bA (??). Points
of Cl(bΘ , bA ) can be written as a convex combination as b = αbA +(1−α)bΘ . Since
convex combinations are b.p.a.s in B, such a belief function b has b.p.a. mb (A) = α,
mb (Θ) = 1 − α i.e. b is a simple support function
S focused on A (Chapter 2). The
union of these segments for all events A: S = A⊂Θ Cl(bΘ , bA ), is the region of
simple support belief functions on Θ. In the binary case (Figure ??-right) simple
support functions focused on {x} lie on the horizontal segment Cl(bΘ , bx ), while
simple support b.f. focused on {y} form the vertical segment Cl(bΘ , by ).

6.3 The differential geometry of belief functions


We learned that belief functions can be identified with vectors of a sufficiently large
Cartesian space (RN , where N = 2|Θ| − 2 and |Θ| is the cardinality of the frame
on which the belief functions are defined). More precisely, the set of vectors B of
RN which do correspond to belief functions or “belief space” is a simplex, whose
vertices are the categorical belief functions assigning mass 1 to a single event.
As we show in this second part of the Chapter, we can also think of the mass
m(A) given to each event A as recursively assigned to subsets of increasing size.
Geometrically, this translates as a recursive decomposition of the space of belief
functions, which can be formally described through the differential-geometric no-
tion of fiber bundle [441].
A fiber bundle is a generalization of the familiar idea of Cartesian product, in which
each point of the (total) space analyzed can be smoothly projected onto a base space,
defining a number of fibers of points which project onto the same element of the
base. In our case, as we will see in the following, B can be decomposed n = |Θ|
times into bases and fibers which are themselves simplices and possess natural inter-
pretations in terms of degrees of belief. Each level i = 1, .., n of this decomposition
reflects nothing but the assignment of basic probabilities to size i events.
2
With a harmless abuse of notation we denote the basis belief function associated with a
singleton x by bx instead of b{x} . Accordingly we will write mb (x) instead of mb ({x}).
214 6 The geometry of belief functions

After giving an informal presentation of the way the b.p.a. mechanism induces
a recursive decomposition of B we will analyze the simple case study of a ternary
frame (6.3.1) to get an intuition on how to prove our conjecture on the bundle struc-
ture of the belief space in the general case, and give the formal definition of smooth
fiber bundle (6.3.2). After noticing that points of RN −2 outside the belief space can
be also seen as (normalized) sum functions (Section 6.3.3), we will proceed to prove
the recursive bundle structure of the space of all sum functions (Section 6.4). As B
is immersed in this Cartesian space it inherits a “pseudo” bundle structure (6.4.2) in
which bases and fibers are no more vector spaces but simplices in their own right
(Section 6.4.3), and possess meanings in terms of i-additive belief functions.

6.3.1 A case study: the ternary case

Let us then first consider the structure of the belief space for a frame of cardinality
n = 3: Θ = {x, y, z}, according to the principle of assigning mass recursively to
subsets of increasing size. In this case each BF b is represented by the vector:

b = [b(x), b(y), b(z), b({x, y}), b({x, z}), b({y, z})]0 ∈ R6 .

If the mass not assigned to singletons 1 − mb (x) − mb (y) − mb (z) is attributed to


A = Θ, we have that b({x, y}) = mb ({x, y}) + mb (x) + mb (y) = mb (x) + mb (y)
and so on for all size-2 events, so that b belongs to the three-dimensional region:
n
D = b : 0 ≤ b(x) + b(y) + b(z) ≤ 1, b({x, y}) = b(x) + b(y),
o (6.12)
b({x, z}) = b(x) + b(z), b({y, z}) = b(y) + b(z) .

It is easy to realize that any arbitrary belief function b ∈ B3 on Θ3 can be mapped


onto a point π[b] of D
h i0
π[b] = b(x), b(y), b(z), b(x) + b(y), b(x) + b(z), b(y) + b(z) ∈ D (6.13)

through a projection map π : B → D. Let us call D the “base” of B3 . Such base


admits as coordinate chart the basic probabilities of the singletons, as each point
d ∈ D can be written as: d = [mb (x), mb (y), mb (z)]0 .
Given a point d ∈ D on the base, we might want to understand what belief func-
.
tions b ∈ B3 have d as projection: F(d) = {b : π[b] = d}. In virtue of the constraint
acting on b.p.a.s, such belief functions have to meet the following constraints:

 mb ({x, y}) ≥ 0 ≡ b({x, y}) ≥ b(x) + b(y) ≡ b({x, y}) ≥ mb (x) + mb (y)

 mb ({x, z}) ≥ 0 ≡ b({x, z}) ≥ b(x) + b(z) ≡ b({x, z}) ≥ mb (x) + mb (z)



 mb ({y, z}) ≥ 0 ≡ b({y, z}) ≥ b(y) + b(z) ≡ b({y, z}) ≥ mb (y) + mb (z)


mb (Θ) ≥ 0 ≡
b(Θ) + b(x) + b(y) + b(z) ≥ b({x, y}) + b({x, z}) + b({y, z})




≡ b({x, y}) + b({x, z}) + b({y, z}) ≤ 1 + b(x) + b(y) + b(z)




≡ b({x, y}) + b({x, z}) + b({y, z}) ≤ 1 + mb (x) + mb (y) + mb (z),

6.3 The differential geometry of belief functions 215

where ≡ denotes equivalence.


Each BF d ∈ D on the base is associated with a whole “fiber” F(d) of belief
functions projecting onto d (as they have the same b.p.a. on sigletons):
n
F(d) = b ∈ B3:b(x) = mb (x), b(y) = mb (y), b(z) = mb (z),
b({x, y}) ≥ mb (x) + mb (y), o
b({x, z}) ≥ mb (x) + mb (z), b({y, z}) ≥ mb (y) + mb (z) .
(6.14)
Belief functions on F(d) can be parameterized by the three coordinates mb ({x, y}),
mb ({x, z}) and mb ({y, z}), the basic probabilities of events of size greater than 1
(see Figure 6.2-right).
Given the nature of b.p.a.s as simplicial coordinates (??) we can infer that the
base (6.12) of the belief space B3 is in fact the three dimensional simplex D =
Cl(bx , by , bz , bΘ ), as all the mass is distributed among {x}, {y}, {z} and Θ in all
possible ways (see Figure 6.2-left). The base D is the simplex of all quasi Bayesian
or discounted belief functions on Θ3 , i.e., the belief functions for which mb (A) 6= 0
iff |A| = 1, n [687]. For each point d of the base the corresponding fiber (6.14) is
also a simplex of dimension 3 (Figure 6.2-right).

Fig. 6.2. Bundle structure of the belief space in the case of ternary frames Θ3 = {x, y, z}.

Summarizing, we have learned that (at least in the ternary case):


– the belief space can be decomposed into a base, i.e., the set of BFs assigning mass
zero to events A of size 1 < |A| < n,
.
n o
D = b ∈ B : mb (A) = 0, ∀ A : 1 < |A| < n , (6.15)

and a number of fibers F(d) passing each through a point d of the base;
– points on the base are parameterized by the masses assigned to singletons d =
[mb (A), |A| = 1]0 , while points on the fibers have as coordinates the mass values
assigned to higher size events, [mb (A), 1 < |A| < n]0 ;
216 6 The geometry of belief functions

– both base and fibers are simplices.


As we will see in the following, the same sort of decomposition applies recursively
to general belief spaces, for an increasing size of the events we assign mass to.
We first need to introduce the relevant mathematical tool.

6.3.2 Definition of smooth fiber bundles

Fiber bundles [441] generalize of the notion of Cartesian product.


Definition 74. A smooth fiber bundle ξ is a composed object {E, B, π, F, G, U},
where
1. E is an s + r-dimensional differentiable manifold called total space;
2. B is an r-dimensional differentiable manifold called base space;
3. F is an s-dimensional differentiable manifold called fiber;
4. π : E → B is a smooth application of full rank r in each point of B, called
projection;
5. G is the structure group;
6. the atlas U = {(Uα , φα )} defines a bundle structure; namely
– the base B admits a covering with open sets Uα such that
.
– Eα = π −1 (Uα ) is equipped with smooth direct product coordinates

φα : π −1 (Uα ) → Uα × F
(6.16)
e 7→ (φ0α (e), φ00α (e))

satisfying two conditions:


– the coordinate component with values in the base space is compatible with
the projection map:
π ◦ φ−1
α (x, f ) = x (6.17)
or equivalently φ0α (e) = π(e);
– the coordinate component with values on the fiber can be transformed,
jumping from a coordinate chart into another, by means of elements of the
structure group. Formally the applications
.
λαβ = φβ φ−1
α : Uαβ × F → Uαβ × F
(x, f ) 7→ (x, T αβ (x)f )

called gluing functions are implemented by means of transformations


T αβ (x) : F → F defined by applications from a domain Uαβ to the
structure group
T αβ : Uαβ → G
satisfying the following conditions

T αβ = (T βα )−1 , T αβ T βγ T γα = 1. (6.18)
6.4 Recursive bundle structure 217

Intuitively, the base space is covered by a number of open neighborhoods {Uα },


which induce a similar covering {Eα = π −1 (Uα )} on the total space E. Points e
of each neighborhood Eα of the total space admit coordinates separable into two
parts: the first one φ0 (e) = π(e) is the projection of e onto the base B, while the
second part is its coordinate on the fiber F . Fiber coordinates are such that in the
intersection of two different charts Eα ∩ Eβ they can be transformed into each other
by means of the action of a group G.
In the following, however, all the involved manifolds will be linear spaces, so
that each of them can be covered by a single chart. This makes the bundle structure
trivial, i.e., the identity transformation. The reader can then safely ignore the gluing
conditions on φ00α .

6.3.3 Points of the Cartesian space as sum functions

As the belief space does not exhaust the whole RN −2 it is natural to wonder whether
arbitrary points of RN −2 , possibly ‘outside’ B, have any meaningful interpretation
in this framework [265]. In fact, each vector v = [vA , ∅ ( A ⊆ Θ]0 ∈ RN −1
can be thought of as a set function ς : 2Θ \ ∅ → R s.t. ς(A) = vA . By applying
P functions ς we obtain another set function
the Möbius transformation (2.3) to such
mς : 2Θ \ ∅ → R such that ς(A) = B⊆A mς (B). In other words each vector ς of
RN −1 can be thought of as a sum function. However, contrarily to basic probability
assignments, the Möbius inverses mς of generic sum functions ς ∈ RN −1 are not
guaranteed to meet the non-negativity constraint: mς (A) 6≥ 0 ∀A ⊆ Θ.
Now, the section {v ∈ RN −1 : vΘ = 1} of RN −1 corresponds to the constraint
ς(Θ) = 1. Therefore,Pall the points of this section are sum functions meeting the
normalization axiom A⊂Θ mς (A) = 1 or normalized sum functions (n.s.f.s). Nor-
malized sum functions are the natural extensions of belief functions in our geometric
framework.

6.4 Recursive bundle structure


6.4.1 Recursive bundle structure of the space of sum functions

We can now reinterpret our analysis of the ternary case by means of the formal def-
inition of smooth fiber bundle. The belief space B3 can be in fact equipped with
a base (6.12), and a projection (6.13) from the total space R6 to the base, which
generates fibers of the form (6.14). However, the original definition of fiber bundle
requires the involved spaces to be manifolds, while the ternary case suggests we
have here to deal with simplices.
We can notice though how the idea of recursively assigning mass to subsets of in-
creasing size does not necessarily require the mass itself to be positive. In other
words, this procedure can be in fact applied to normalized sum functions, yielding a
classical fiber bundle structure for the space S = RN −2 of all NSFs on Θ, in which
218 6 The geometry of belief functions

all the involved bases and fibers are linear spaces. We will see in the following what
happens when considering proper belief functions.
Theorem 12. The space S = RN −2 of all the sum functions ς with domain on a
finite frame Θ of cardinality |Θ| = n has a recursive fiber bundle structure, i.e.,
there exists a sequence of smooth fiber bundles
n o
(i−1) (i) (i)
ξi = FS , DS , FS , πi , i = 1, ..., n − 1

(0) (i−1) (i)


where FS = S = RN −2 , the total space FS , the base space DS and the
(i)
fiber FS of the i-th bundle level are linear subspaces of RN −2 of dimensions
Pn−1 n n Pn−1 n

k=i k , i , k=i+1 k respectively.
(i−1) (i)
Both FS and DS admit a global coordinate chart. As

(i−1)
X n n o
dim FS = = A ⊂ Θ : i ≤ |A| < n ,

k
k=i,...,n−1

(i−1)
each point ς i−1 of FS can be written as
h i0
ς i−1 = ς i−1 (A), A ⊂ Θ, i ≤ |A| < n

and the smooth direct product coordinates (6.16) at the i-th bundle level are
n o n o
φ0 (ς i−1 ) = ς i−1 (A), |A| = i , φ00 (ς i−1 ) = ς i−1 (A), i < |A| < n .

The projection map πi of the i-th bundle level is a full-rank differentiable application
(i−1) (i)
πi : F S → DS
ς i−1 7→ πi [ς i−1 ]

whose expression in this coordinate chart is

πi [ς i−1 ] = [ς i−1 (A), |A| = i]0 . (6.19)

Bases and fibers are simply geometric counterparts of the mass assignment
mechanism. Having assigned a certain amount of mass to subsets of size smaller
than i, the fraction of mass attributed to size-i subsets determines a point on a linear
(i) (i)
space: DS . For each point of DS the remaining mass can “float” among the higher
(i)
size subsets, describing again a vector space FS .

6.4.2 Recursive bundle structure of the belief space

As we have seen in the ternary example of Section 6.3.1, as the belief space is a
simplex immersed in S = RN −2 , the fibers of RN −2 do intersect the space of belief
6.4 Recursive bundle structure 219

functions too. B then inherits some sort of bundle structure from the Cartesian space
in which it is immersed. The belief space can also be recursively decomposed into
fibers associated with events A of the same size. As one can easily conjecture, the
intersections of the fibers of RN −2 with the simplex B are themselves simplices:
bases and fibers in the case of the belief space are therefore polytopes instead of
linear spaces. Due to the decomposition of RN −2 into basis and fibers, we can ap-
ply the non-negativity and normalization constraints which distinguish belief func-
tions from NSFs separately at each level, eliminating at each step the fibers passing
through points of the base that do not meet these conditions.
We first need a simple combinatorial result.
i−1   X
i−(m+1) n − (m + 1)
X X
Lemma 4. b(A) ≤ 1 + (−1) · b(B), and
m=1
i−m
|A|=i |B|=m
P P
the upper bound is reached when |A|=i mb (A) = 1 − |A|<i mb (A).
The bottom line of Lemma 4 is P that, given a mass assignment for events of
size 1, ..., i − 1 the upper bound for |A|=i b(A) is obtained by assigning all the
remaining mass to the collection of size i subsets.
Theorem 13. The belief space B ⊂ S = RN −2 inherits by intersection with the
recursive bundle structure of S a “convex”-bundle decomposition. Each i-th level
“fiber” can be expressed as
n o
(i−1) 1
FB (d , ..., di−1 ) = b ∈ B : Vi ∧ · · · ∧ Vn−1 (d1 , ..., di−1 ) , (6.20)

where Vi (d1 , ..., di−1 ) denotes the system of constraints



m b (A) ≥ 0 ∀A ⊆ Θ : |A| = i,
Vi (d1 , ..., di−1 ) :
X X
mb (A) ≤ 1 − mb (A) (6.21)

|A|=i |A|<i

and depends on the mass assigned to lower size subsets dm = [mb (A), |A| = m]0 ,
(i)
m = 1, ..., i − 1. The corresponding “base” DB (d1 , ..., di−1 ) is expressed in terms
of basic probability assignments as the collection of BFs b ∈ F (i−1) (d1 , ..., di−1 )
such that 

 mb (A) = 0, ∀A : i < |A| < n
 m (A) ≥ 0, ∀A : |A| = i
b
X X (6.22)

 m b (A) ≤ 1 − m b (A).

|A|=i |A|<i

6.4.3 Bases and fibers as simplices

Simplicial and bundle structure coexist in the space of belief functions, both of them
consequences of the interpretation of belief functions as sum functions, and of the
basic probability assignment machinery. It is then natural to conjecture that bases
220 6 The geometry of belief functions

and fibers of the recursive bundle decomposition of B must also be simplices of


some sort, as suggested by the ternary example (Section 6.3.1). Let us work recur-
sively, and suppose we have already assigned a mass k < 1 to the subsets of size
smaller than i:
mb (A) = const = mA , |A| < i. (6.23)
All the admissible BFs constrained by this mass assignment are then forced to live
in the following (i − 1)-th level fiber of B:
(i−1)
FB (d1 , ..., di−1 ), dj = [mA , |A| = j]0

which, as the proof of Theorem 13 suggests, is a function of the mass assigned to


lower-size events.
(i−1) 1
We have seen that such a fiber FB (d , ..., di−1 ) admits a pseudo-bundle
(i) 1 i−1
structure whose pseudo-base P space is DB (d , · · · , d ) given by Equation (6.22).
Let us denote by k = |A|<i mA the total mass already assigned to lower size
events, and call
.
n o
(i−1) 1
X
P (i) (d1 , ..., di−1 ) = b ∈ FB (d , ..., di−1 ) : mb (A) = 1 − k
|A|=i
.
n o
(i) 1 i−1 (i−1) 1 i−1
O (d , ..., d ) = b ∈ FB (d , ..., d ) : mb (Θ) = 1 − k

(i−1)
the collections of belief functions on the fiber FB (d1 , ..., di−1 ) assigning all the
remaining basic probability 1 − k to subsets of size i or to Θ, respectively.
As the simplicial coordinates of a BF in B are given by its basic probability as-
(i−1) 1
signment (??), each belief function b ∈ FB (d , ..., di−1 ) on such a fiber can be
written as:
X X X
b= mb (A)bA = mA bA + mb (A)bA
A⊆Θ |A|<i |A|≥i
k X 1−k X
= mA bA + mb (A)bA
k 1−k
|A|<i |A|≥i
k X 1−k X
= P m A bA + P mb (A)bA .
|A|<i mA |A|<i |A|≥i mb (A) |A|≥i

We can therefore define two new belief functions b0 , b0 associated with any b ∈
(i−1) 1
FB (d , ..., di−1 ), with basic probability assignments
. mA
mb0 (A) = P |A| < i, mb0 (A) = 0 |A| ≥ i;
mB|B|<i
. mb (A)
mb0 (A) = P |A| ≥ i, mb0 (A) = 0 |A| < i
|B|≥i mb (B)

respectively, and decompose b as follows:


6.4 Recursive bundle structure 221
X X
b=k mb0 (A)bA + (1 − k) mb0 (A)bA = kb0 + (1 − k)b0
|A|<i |A|≥i

where b0 ∈ Cl(bA : |A| < i), b0 ∈ Cl(bA : |A| ≥ i). As


P
|A|<i mA
X X X
mb0 (A) = mb0 (A) = P = 1, mb0 (A) = 1
A⊆Θ |A|<i |A|<i mA A⊆Θ

both b0 and b0 are indeed admissible BFs, b0 assigning non-zero mass to subsets of
size smaller than i only, b0 assigning mass to subsets of size i or higher.
(i−1) 1
However, b0 is the same for all the BFs on the fiber FB (d , ..., di−1 ), as
it is determined by the mass assignment (8.5). The other component b0 is instead
free to vary in Cl(bA : |A| ≥ i). Hence, we get the following convex expres-
(i−1)
sions for FB , P (i) and O(i) (neglecting for sake of simplicity the dependence on
1 i−1
d , ..., d or, equivalently, on b0 ):
n o
(i−1)
FB = b = kb0 + (1 − k)b0 , b0 ∈ Cl(bA , |A| ≥ i) = kb0 + (1 − k)Cl(bA , |A| ≥ i),
P (i) = kb0 + (1 − k)Cl(bA : |A| = i),
O(i) = kb0 + (1 − k)bΘ .
(6.24)
(i)
By definition the i-th base DB is the collection of BFs such that

mb (A) = 0 i < |A| < n, mb (A) = const = mA |A| < i,


(i)
so that points on DB are free to distribute the remaining mass to Θ or size i events
only. Therefore we obtain the following convex expression for the i-th level base
(i)
space DB of B:
(i)
DB = kb0 + (1 − k)Cl(bA : |A| = i or A = Θ)
= Cl(kb0 + (1 − k)bA : |A| = i or A = Θ)
= Cl(kb0 + (1 − k)bΘ , kb0 + (1 − k)bA : |A| = i) = Cl(O(i) , P (i) ).

In the ternary case of Section 6.3.1 we get:

O(1) = bΘ , P (1) = P = Cl(bx , by , bz ), D(1) = Cl(O(1) , P (1) ).

The elements of the bundle decomposition possess a natural meaning in terms of


belief values. In particular, P (1) = P is the set of all the Bayesian belief functions,
while D(1) is the collection of all the discounted probabilities [1149], i.e., belief
functions of the form (1 − )p + bΘ , with 0 ≤  ≤ 1 and p ∈ P.
On the other hand, BFs assigning mass to events of cardinality smaller than a
certain size i are called in the literature i-additive belief functions ([948]). It is clear
that the set P (i) (6.24) is nothing but the collection of all i-additive BFs. The i-th
level base of B can then be interpreted as the region of all “discounted” i-additive
belief functions.
222 6 The geometry of belief functions

6.5 Open questions

Appendix: proofs
Proof of Theorem 12

Proof. the bottom line of the proof is that the mass associated with a sum function
can be recursively assigned to subsets of increasing size. We prove Theorem 12 by
induction.
First level of the bundle structure. As we mentioned above, each normalized
sum function ς ∈ RN −2 is uniquely associated with a mass function mς through the
inversion lemma. To define a base space of the first level, we set to zero the mass of
all events of size 1 < |A| < n. This determines a linear space DS ⊂ S = RN −2
defined by the system of linear equations
.
n X o
DS = ς ∈ RN −2 : mς (A) = (−1)|A−B| ς(B) = 0, 1 < |A| < n
B⊂A

of dimension dim DS = n = |Θ| (as there are n unconstrained variables corre-


sponding to the singletons). As DS is linear, it admits a global coordinate chart.
Each point d ∈ D is parameterized by the mass values the corresponding sum func-
tion ς assigns to singletons:

d = [mς (A) = ς(A), |A| = 1]0 .

The second step is to precise a projection map between the total space S = RN −2
and the base DS . The Moebius inversion lemma (2.3) induces indeed a projection
map from S to DS
π : S = RN −2 → DS ⊂ RN −2
ς 7→ π[ς]
mapping each NSF ς ∈ RN −2 to a point p[ς] of the base space D:

π[ς](A) = [ς(A), |A| = 1]0 . (6.25)

Finally, to define a bundle structure we need to describe the fibers of the total space
S = RN −2 , i.e., the vector subspaces of RN −2 which project onto a given point
d ∈ D of the base.
Each point d ∈ D is of course associated with the linear space of all the NSFs
ς ∈ RN −2 whose projection p[ς] on D is d:
.
n o
FS (d) = ς ∈ S : π[ς] = d ∈ D .

It is easy to see that as d varies on the base space D, the linear spaces we obtain are
.
all diffeomorphic to F = RN −2−n .
According to Definition 74 this defines a bundle structure, since:
6.5 Open questions 223
.
– E = S = RN −2 is a smooth manifold, in particular a linear space;
.
– B = DS , the base space, is a smooth (linear) manifold;
– F = FS , the fiber, is a smooth manifold, again a linear space.
Finally, the projection π : S = RN −2 → DS is differentiable (as it is a linear
function of the coordinates ς(A) of ς) and has full rank n in every point ς ∈ RN −2 .
This is easy to see when representing π as a matrix (for as ς is a vector, a linear
function of ς can always be thought of as a matrix)

π[ς] = Πς,

where  
1 0 ··· 0 0 ··· 0
0 1 0 0 0 ··· 0  = [In |0n×(N −2−n) ]
Π=
 ··· ··· 
0 ··· 0 1 0 ··· 0
according to Equation (6.25), and the rows of Π are obviously linearly independent.
As mentioned above the bundle structure (Definition 74, (6)) is trivial, since DS
is linear and can be covered by a single coordinate system (6.16). The direct product
coordinates are
φ : S = RN −2 → DS × FS
ς 7→ (π[ς], f [ς])
where the coordinates of ς on the fiber FS are the mass values it assigns to higher
size events:
f [ς] = [mς (A), 1 < |A| < n]0 .
Bundle structure of level i.
By induction, let us suppose that S admits a recursive bundle structure for all
sizes from 1 to i − 1 characterized according to the hypotheses, and prove that
(i−1)
FS can in turn be decomposed in the same way into a linear base space and a
(i−1)
collection of diffeomorphic fibers. By inductive hypothesis FS has dimension
Pi−1 n i−1 (i−1) 3
N − 2 − k=1 k and each point ς ∈ FS has coordinates

ς i−1 = [ς i−1 (A), i ≤ |A| < n]0 .

We can then apply the constraint ς i−1 (A) = 0, i < |A| < n which identifies the
linear variety
(i) .
n o
(i−1)
DS = ς i−1 ∈ FS : ς i−1 (A) = 0, i < |A| < n (6.26)

(i−1)
, of dimension ni (the number of size-i subsets of Θ).

embedded in FS
(i−1)
The projection map (6.19) induces in FS fibers of the form
3
The quantity ς i−1 (A) is in fact the mass mς (A) the original NSF ς attaches to A, but this
is irrelevant for the purpose of the decomposition.
224 6 The geometry of belief functions
(i) . (i−1)
FS = {ς i−1 ∈ FS : πi [ς i−1 ] = const}

which are also linear manifolds, and induce in turn a trivial bundle structure in
(i−1)
FS
(i−1) (i) (i)
φ : FS → DS × F S
i−1 0 i−1 00 i−1
ς 7→ (φ (ς ), φ (ς ))
with φ0 (ς i−1 ) = πi [ς i−1 ] = [ς i−1 (A), |A| = i]0 .
Again, the map (6.19) is differentiable and has full rank, for its ni rows are inde-


pendent.
(n)
The decomposition ends when dim FS = 0, and all fibers reduce to points of
S.

Proof of Lemma 4

Proof. since n−m



i−m is the number of subsets of size i containing a fixed set
B, |B| = m in a frame with n elements, we can write:

X X X Xi X n − m
b(A) = mb (B) = mb (B)
m=1 |B|=m
i−m
|A|=i |A|=i B⊆A
i−1 X  
X X n−m
= mb (B) + mb (B) (6.27)
m=1
i−m
|B|=i |B|=m
i−1 X  
X X n−m
≤ 1− mb (B) + mb (B),
m=1
i−m
|B|<i |B|=m

X X
as mb (B) = 1 − mb (B) by normalization. By Möbius inversion (2.3):
|B|=i |B|<i

X X X
mb (A) = (−1)|A−B| b(B)
|A|<i |A|<i B⊆A
i−1 m   X (6.28)
X X
m−l n−l
= (−1) b(B)
m−l
|A|=m=1 |B|=l=1 |B|=l

n−l

for, again, m−l is the number of subsets of size m containing a fixed set B, |B| =
l in a frame with n elements. The role of the indexes m and l can be exchanged,
obtaining:
i−1 i−1  X i−1  
X X X n−l
mb (B) = b(B) · (−1)m−l . (6.29)
m−l
|B|=l=1 |B|=l=1 |B|=l m=l

Now, a well known combinatorial identity ([?], volume 3, Equation (1.9)) states that,
for i − (l + 1) ≥ 1:
6.5 Open questions 225
i−1    
X
m−l n−l i−(l+1) n − (l + 1)
(−1) = (−1) . (6.30)
m−l i − (l + 1)
m=l

By applying (6.30) to the last equality, (6.28) becomes:


i−1  X  
X n − (l + 1)
b(B) · (−1)i−(l+1) . (6.31)
i − (l + 1)
|B|=l=1 |B|=l

Similarly, by (6.29) we have:


i−1 X   i−1 X i−1   
X n−m X X n−l n−m
mb (B) = b(B) · (−1)m−l
m=1 |B|=m
i−m m−l i−m
l=1 |B|=l m=l
i−1 X i−1   
X X
m−l i−l n−l
= b(B) · (−1) ,
m−l i−l
l=1 |B|=l m=l

     
n−l n−m i−l n−l
as it is easy to verify that = .
m−l i−m m−l i−l
By applying (6.30) again to the last equality we get:
i−1 X   i−1 X  
X n−m X
i−(l+1) n − l
mb (B) = (−1) . (6.32)
m=1
i−m i−l
|B|=m l=1 |B|=l

By replacing (6.29) and (6.32) in (6.27) we get the thesis.

Proof of Theorem 13

Proof. to understand the effect on B ⊂ S of the bundle decomposition of the space


of normalized sum functions S = RN −2 in which it is immersedP we need to con-
sider the effect of the non-negativity mς ≥ 0 and normalization A mς (A) = 1
conditions, for they constrain the admissible values of the coordinates of points of
S.
We can appreciate how these constraints are separable into groups that apply to
subsets of the same size. The set of conditions

Pb (A) ≥ 0,
m ∀A ⊆ Θ
A⊆Θ m b (A) = 1

can in fact be decomposed as V1 ∧ · · · ∧ Vn−1 , where the system of constraints Vi


is given by Equation (6.21). The bottom inequalityP in (6.21) implies that, given a
mass assignment for events of size 1, ..., i − 1 ( |A|<i mb (A)) the upper bound for
P
|A|=i b(A) isPobtained by assigning all the remaining mass to the collection of
P
size i subsets: |A|=i mb (A) = 1 − |A|<i mb (A) (Lemma 4). Let us see their
effect on the bundle structure of S.
226 6 The geometry of belief functions

Level 1. By definition B = {ς ∈ S : V1 ∧ · · · ∧ Vn−1 }. As the coordinates of


the points of S are decomposed into coordinates on the base [mς (A), |A| = 1]0 and
coordinates on the fiber [mς (A), 1 < |A| < n]0 it is easy to see that V1

mb (A) ≥ 0, |A| = 1
P (6.33)
|A|=1 m b (A) ≤ 1

(1)
acts only on the base DS , yielding a new set
n o
(1)
X
DB = b ∈ B : mb (A) = 0 1 < |A| < n, mb (A) ≥ 0 |A| = 1, mb (A) ≤ 1
|A|=1

of the form of Equation (6.22) for i = 1.


(1)
This in turn selects the fibers of S passing through DB , and discards the others.
As a matter of fact there cannot be admissible belief functions within fibers passing
(1)
through points outside this region, since all points of a fiber FS share the same
1 1
level 1 coordinates d : when the basis point d does not meet the inequalities (6.33),
none of them can.
(1)
Therefore the remaining constraints V2 ∧ · · · ∧ Vn−1 act on the fibers FS of
(1)
S passing through DB . However, Equation (6.21) shows that those higher size
(1)
constraints V2 , ..., Vn−1 in fact depend on the point d1 = [mb (A), |A| = 1]0 ∈ DB
(1)
on the base space. Each admissible fiber FS ∼ RN −2−n of S is then subject to a
different system of constraints V2 ∧ · · · ∧ Vn−1 (d1 ), yielding the corresponding first
level fiber of B (see Equation (6.20)):
n o
(1) (1)
FB (d1 ) = b ∈ FS (d1 ) : V2 ∧ · · · ∧ Vn−1 (d1 ) .

Level i. Let us now suppose that, by induction, we have a family of con-


straints Vi ∧ · · · ∧ Vn−1 (d1 , ..., di−1 ) of the form of Equation (6.21), acting on
(i−1) 1 (i−1) 1
FS (d , ..., di−1 ). The points of FS (d , ..., di−1 ) have coordinates which can
(i)
be decomposed into coordinates d = [mς (A), |A| = i]0 on the base DS and coor-
i
(i)
dinates on the fiber FS . Again, the set of constraints Vi acts on coordinates associ-
(i) (i)
ated with size-i events only, i.e., it acts on DS and not on FS .
Furthermore, constraints of type (6.21) for k > i become trivial when acting
(i)
on DS . In fact, inequalities of the form mb (A) ≥ 0, |A| > i are satisfied by
(i)
DS by definition, since it imposes mς (A) = 0, |A| > i. On the other side, all
inequalities corresponding to the second row of Equation (6.21) for k > i reduce
to the corresponding inequality for size-i subsets. Instead of displaying a long com-
binatorial
P proof, we can just recall the meaning of Lemma 4: the upper bound for
|A|=i b(A) is obtained by assigning maximal mass to the collection of size i sub-
(i)
sets. But then, points in DS correspond to a zero-assignment for higher size events
mς (A) = 0, i < |A| < n and all those upper bounds are automatically satisfied.
6.5 Open questions 227
(i)
We then get the i-th level base for B: DB (d1 , ..., di−1 ) is the set of BFs
(i−1) 1
b ∈ FS (d , ..., di−1 ) such that conditions (6.22) are satisfied. The remain-
(i)
ing constraints Vi+1 ∧ ... ∧ Vn−1 (d1 , ..., di ) act on the fibers FS of S passing
(i) 1
through points di of i−1
n DB (d , ..., d ), yielding a collectionoof level-i fibers for B:
(i) (i)
FB (d1 , ..., di ) = b ∈ FS : Vi+1 ∧ ... ∧ Vn−1 (d1 , ..., di ) .
Geometry of Dempster’s rule
7
As we have seen in Chapter 6, belief functions can be seen as points of a simplex
B called the ‘belief space’. It is therefore natural to wonder whether the orthogonal
sum operator (2.6), a mapping from a pair of prior belief functions to a posterior
belief function on the same frame, can also be interpreted as a geometric operator
in B. The answer is positive, and in this Chapter we will indeed understand the
property of Dempster’s rule in this geometric setting. As we point out at the end,
such an analysis can be obviously extended to the other combination rules proposed
in the last fifty years (cf. Chapter ??, Section 4.2).
The key observation which allows us to conduct our analysis is that the objects
we obtain by relaxing the constraint of mass assignment being non-negative, which
we call normalized sum functions or pseudo belief functions (we met them in Section
6.3.3 as the points of embedding Cartesian space RN which lie outside the belief
space B), admit a straightforward extension of Dempster’s rule, originally defined
for proper belief functions (Section 7.1). This leads to the analysis of the behaviour
of the orthogonal sum when applied to whole affine subspaces (Section 7.2) and
convex combinations of (pseudo) belief functions. In particular this allows us to
derive a ‘convex decomposition’ of Dempster’s rule of combination in terms of
Bayes’ rule of conditioning (Section 7.3), and prove that under specific conditions
orthogonal sum and affine closure commute (Section 7.4).
In Section 7.5 we exploit the commutativity property to introduce the notion of
conditional subspace hbi generated by an arbitrary belief function b, i.e., the set of
all combinations of b with any other combinable b.f. Conditional subspaces describe
“global” behavior of the rule of combination, can be interpreted as the set of possible
“futures” of a given belief function (interpreted as our uncertain knowledge state).
Geometrically, they have once again the form of convex sets.

229
230 7 Geometry of Dempster’s rule

The second part of the Chapter, instead, is dedicated to the analysis of the “point-
wise” behavior of Dempster’s rule. We first discuss (Section 7.6) a toy problem, the
geometry of ⊕ in the binary belief space B2 , to gather useful intuition about the
general case. We observe that Dempster’s rule exhibits a rather elegant behavior
when applied to collections of belief functions assigning the same mass k to a fixed
subset A (constant mass loci), which turn out to be affine subspaces of normalized
sum functions. As a consequence, their images under the mapping b ⊕ (.) can be
derivedby applying the commutativity results of Section 7.4.
Perhaps the most striking result of our geometric analysis of Dempster’s rule states
that for each subset A the resulting mapped affine spaces have a common inter-
section for all k ∈ [0, 1], a geometric entity which is therefore characteristic of the
belief function b being combined. We call the latter the A-th focus of the conditional
subspace hbi. In Section 7.7 we formally prove the existence and study the geometry
of such foci. This eventually leads us to an interesting algorithm for the geometric
construction of the orthogonal sum of two belief functions.
The material presented in this Chapter is a realaboration of results first published
in [265]. All proofs have been collected in an Appendix at the end of the Chapter.

7.1 Dempster’s combination of pseudo belief functions


As mentioned above, we start by observing that Dempster’s rule can be easily ex-
tended to normalized sum functions. This is necessary since, as we will see later on,
the geometry of the orthogonal sum can only be appropriately described in terms of
whole affine spaces which do not fit the confines of the belief space, a simplex.
Theorem 14. The application of Dempster’s rule as defined as in Equation (2.6) to
a pair of normalized sum functions ς1 , ς2 : 2Θ → R yields another normalized sum
function, which we denote by ς1 ⊕ ς2 .
Just like we do for proper belief functions, we say that two normalized sum functions
ς1 , ς2 are not combinable if the denominator of Equation (2.6) is nil:
. X
∆(ς1 , ς2 ) = mς1 (A)mς2 (B) = 0, (7.1)
A⊆Θ,B⊆Θ:A∩B6=∅

where mς1 and mς1 denote the Moebius transforms of the two n.s.f.s ς1 , ς2 , respec-
tively.
Note that in the case of normalised sum functions the normalization factor
∆(ς1 , ς2 ) can be zero even in the presence of non-empty intersections between focal
elements of ς1 , ς2 . This becomes clear as soon as we rewrite it in the form:
X X
∆(ς1 , ς2 ) = mς1 (A)mς2 (B),
C6=∅ A,B⊆Θ:A∩B=C

since there can exist non-zero products mς1 (A) · mς2 (B) whose overall sum is zero
(being mς1 (A), mς2 (B) arbitrary real numbers).
7.2 Dempster’s sum of affine combinations 231

Example. A simple example can be useful to grasp this point more easily. Con-
sider a sum function ς1 with focal elements A1 , A2 , A3 and masses m1 (A1 ) =
1, m1 (A2 ) = −1, m1 (A3 ) = 1 such that A2 ⊆ A1 , as in Figure 7.1. If we com-
bine ς1 with a new n.s.f. ς2 with a single focal element B: m2 (B) = 1 (which,
.
incidentally, is a belief function), we can see that even if A1 ∩ B = D 6= ∅ and
A2 ∩ B = D 6= ∅ the denominator of Equation (2.6) becomes 1 · (−1) + 1 · 1 = 0
and the two functions turn out to be not combinable.

Fig. 7.1. Example of a pair of non combinable normalised sum functions whose focal ele-
ments have nevertheless non-empty intersections.

7.2 Dempster’s sum of affine combinations


The extension of Dempster’s rule to normalized sum functions (Theorem 14) allows
us to work with the entire Cartesian space RN , rather than with the belief space.
In Chapter 6 we have seen that any belief function can be seen as a convex closure
of categorical b.f.s. A similar relation exists between Pnormalized sum Pfunctions and
affine closures. In fact, any affine combination ς = i αi ςi with i αi = 1 of a
collection of normalized sum functions {ς1 , ..., ςn } is still a n.s.f., since
X X X X
mς (A) = (−1)|A−B| αi ςi (B) =
A6=∅ A6=∅ B⊂A i
X X X X X X
|A−B|
= αi (−1) ςi (B) = αi mςi (A) = αi = 1.
i A6=∅ B⊂A i A6=∅ i

We can then proceed to show how Dempster’s rule applies to affine combinations
of pseudo belief functions, and to convex closures of (proper) belief functions in
particular. We first consider the issue of combinability.
232 7 Geometry of Dempster’s rule

Lemma 5. Consider a collection


P Pnormalized sum functions {ς1 , ..., ςn }, and their
of
affine combination: ς = i αi ςi , i αi = 1. P
A normalized sum function ς is combinable with i αi ςi iff
X
αi ∆i 6= 0,
i
P
where ∆i = A∩B6=∅ mς (A)mςi (B).

Proof. By definition (Equation (7.1)) two n.s.f.s ς and τ are combinable iff
X
mς (A)mτ (B) 6= 0.
A∩B6=∅
P P
If τ = i αi ςi is an affine combination, its Moebius transform is mτ = i αi mςi
and the combinability condition becomes, as desired:
X X  X X X
mς (A) αi mςi (B) = αi mς (A)mςi (B) = αi ∆i 6= 0.
A∩B6=∅ i i A∩B6=∅ i
P
A couple of remarks. If ∆i = 0 for all i then P i αi ∆i = 0, so that if ς is not
combinable with any ςi then the combination ς ⊕ i αi ςi does not exists, in accor-
dance with our intuition. On the other hand, even if all the n.s.f.s ςi are combinable
withPς there is always a choice of the coefficients αi of the affine combination such
that i αi ∆i = 0, so that ς is still not combinable with the affine combination.
This remains true when considering affine combinations of belief functions (for
which ∆i > 0 ∀i).

Lemma 6. Consider a collection of normalized P sum functions


P {ς, ς1 , ..., ςn }. For
any set of real numbers α1 , ..., αn such
P that i αi = 1, i αi ∆ i 6= 0, ς is com-
binable with the affine combination i αi ςi and the mass assignment (Moebius
transform) of their orthogonal sum is given by:
X αi Ni (C)
mς⊕Pi αi ςi (C) = P ,
i j αj ∆j

where
. X
Ni (C) = mςi (B)mς (A)
B∩A=C

is the numerator of mς⊕ςi (C).

Under specific conditions, the orthogonal sum of an affine combination of


(pseudo) belief functions can be expressed as an affine combination of the partial
combinations. Such conditions are specified in the following theorem.
7.2 Dempster’s sum of affine combinations 233

Theorem
P 15. Consider
P a collection {ς, ς1 , ..., ςn } of normalized sum functions such
that P i αi = 1, i α i ∆ i 6= 0, i.e., the n.s.f. ς is combinable with the affine combi-
nation i αi ςi .
If ςi is combinable with ς for each i = 1, ..., n (and in particular, when all the
normalised sum P functions involved {ς, ς1 , ..., ςn } = {b, b1 , · · · , bn } are belief func-
tions), then ς ⊕ i αi ςi is still an affine combination of the partial sums ς ⊕ ςi :
X X
ς⊕ αi ςi = βi (ς ⊕ ςi ), (7.2)
i i

with coefficients given by:


αi ∆i
βi = Pn . (7.3)
j=1 αj ∆j

Proof. By definition mς⊕ςi (C) = N∆ i (C)


i
. If ς is combinable with ςi ∀i then ∆i 6= 0
∀i so that Ni (C) = 0 iff ∆i · mς⊕ςi (C) = 0 and we can write by Lemma 6:
P P
i αi Ni (C) αi ∆i mς⊕ςi (C) X
mς⊕ i αi ςi (C) =
P P = i P = βi mς⊕ςi (C).
j αj ∆j j αj ∆j i
(7.4)
Moebius transform (2.3) immediately yields Equation (7.2).
If b and bi are both belief functions, ∆i = 0 implies Ni (C) = 0 for all C ⊂ Θ. We
can then still write Ni (C) = ∆i · mς⊕ςi (C), so that Equation (7.4) still holds.

When considering convex combinations of proper belief functions only, the


combinability condition of Lemma 5 simplifies as follows:
P
Lemma 7. If i αi = 1 and αi > 0 for all i then
X
∃b⊕ αi bi ⇔ ∃i : ∃b ⊕ bi ,
i

i.e., b is combinable with the affine combination if and only if it is combinable with
at least one of the belief functions bi ,
P
This is due to the fact that if αi ∆i > 0 then i αi ∆i > 0.
Theorem 15 then specializes in the following way.

Corollary 5. Consider a collection of belief functions {b, b1 , · · · , bn } such that


P exists at least one b.f. bj combinable with b.
there
If i αi = 1, αi > 0 for all i = 1, ..., n then
X X
b⊕ αi bi = β i b ⊕ bi ,
i i

where βi is again defined by Equation (7.3).


234 7 Geometry of Dempster’s rule

7.3 Convex formulation of Dempster’s rule


An immediate consequence of the properties of Dempster’s rule with respect to
affine (and therefore convex) combination is an interesting convex decomposition
of the orthogonal sum itself.

Theorem 16. The orthogonal sum b ⊕ b0 of two belief functions can be expressed
as a convex combination of the results b ⊕ bA of Bayes’ conditioning b with respect
to all the focal elements of b0 , namely:
X mb0 (A)plb (A)
b ⊕ b0 = P b ⊕ bA , (7.5)
B∈Eb0 mb (B)plb (B)
0
A∈Eb0

where Eb denotes as usual the collection of focal elements of a b.f. b.

Proof. We know from Chapter 6 that any belief function b0 ∈ B can be written as
a convex sum of the categorical b.f.s bA (Equation (6.8)). We can therefore apply
Corollary 5 to Equation (6.8), obtaining:
X X mb0 (A)∆A
b ⊕ b0 = b ⊕ mb0 (A)bA = µ(A)b ⊕ bA , µ(A) = P .
B∈Eb0 mb (B)∆B
0
A∈Eb0 A∈Eb0

Here ∆A is the normalization factor for b ⊕ bA , i.e.


X X
∆A = mb (B) = 1 − mb (B) = 1 − b(Ac ) = plb (B),
B:B∩A6=∅ B⊂Ac

and by plugging it into the above expression we have (7.5), as desired.

We can simplify Equation (7.5) after realizing that some of the partial combina-
tions b ⊕ bA may in fact coincide. Since b ⊕ bA = b ⊕ bB iff A ∩ Cb = B ∩ Cb we
can write:
X
mb0 (B)plb (B)
X B∩C =A, B∈E 0
b ⊕ b0 =
b b
b ⊕ bA X . (7.6)
0 0
A=A ∩Cb , A ∈Eb0 m b 0 (B)plb (B)

B∈Eb0

It is well known that Dempster sums involving categorical belief functions b ⊕ bA


can be thought of as applications of Bayes’ rule to the original belief function b,
.
when conditioning with respect to an event A: b ⊕ bA = b|A. Hence, Theorem 16
highlights a convex decomposition of Dempster’s rule of combination in terms of
Bayes’ rule of conditioning:
X mσ (A)plb (A)
b⊕σ = P · b|A.
A∈Eσ B∈Eσ mσ (B)plb (B)
7.4 Commutativity 235

7.4 Commutativity
In Chapter 6 we have seen that the basic probability assignment mechanism is rep-
resented in the belief space framework by the convex closure operator. Theorem 15
in fact treats in full generality affine combinations of points, for they prove to be
more significant in the perspective of a geometric description of the rule of combi-
nation. The next natural step, therefore, is to analyse Dempster’s combinations of
affine closures, i.e., sets of affine combinations of points.
Let us denote by v(ς1 , ..., ςn ) the affine subspace generated by a collection of
normalized sum functions {ς1 , ..., ςn }:
( n
)
. X X
v(ς1 , ..., ςn ) = ς : ς = αi ςi , αi = 1 .
i=1 i

Theorem 17. Consider a collection of pseudo belief functions {ς, ς1 , ..., ςn } defined
on the same frame of discernment. If ςi is combinable with ς (∆i 6= 0) for all i then:

v(ς ⊕ v(ς1 , ..., ςn )) = v(ς ⊕ ς1 , ..., ς ⊕ ςn ).

More precisely,

v(ς ⊕ ς1 , ..., ς ⊕ ςn ) = ς ⊕ v(ς1 , ..., ςn ) ∪ M(ς, ς1 , ..., ςn ),

where M(ς, ς1 , ..., ςn ) is the following affine subspace:


 
∆j ∆n
v ς ⊕ ςj − ς ⊕ ςi ∀j : ∆j 6= ∆n , ∀i : ∆i = ∆n . (7.7)

∆j − ∆n ∆j − ∆ n

If {ς, ς1 , · · · , ςn } = {b, b1 , ..., bn } are all belief functions, then

v(b ⊕ v(b1 , ..., bn )) = v(b ⊕ bi1 , ..., b ⊕ bim )

where {bi1 , · · · , bim }, m ≤ n, are all the belief functions combinable with b in the
collection {b1 , ..., bn }.

7.4.1 Affine region of missing points

Theorem 17 states that Dempster’s rule maps affine spaces to affine spaces, but for
a lower dimensional subspace. From its proof (see Chapter Appendix), the affine
coordinates {αi } of a point τ ∈ v(ς1 , ..., ςn ) correspond to the affine coordinates
{βi } of the sum ς ⊕ τ ∈ v(ς ⊕ ς1 , ..., ς ⊕ ςn ) through the following equation:
βi 1
αi = P . (7.8)
∆i j βj /∆j

Hence, the values of the affine coordinates βi of v(ς ⊕ ς1 , ..., ς ⊕ ςn ) which are not
associated with affine coordinates of v(ς1 , ..., ςn ) are given by Equation (7.24):
236 7 Geometry of Dempster’s rule
X βi
= 0. (7.9)
i
∆i

If the map ς ⊕ (.) is injective then the points of the subspace M(ς, ς1 , ..., ςn ) asso-
ciated with the affine coordinates βi meeting (7.9) are not images through ς ⊕ (.) of
any points of v(ς1 , ..., ςn ): we call them missing points.
However, if the map in not injective, points in the original affine space with ad-
missible coordinates can be mapped onto M(ς, ς1 , ..., ςn ). In other words, missing
coordinates do not necessarily determine missing points. THIS PART TO CLAR-
IFY, EXAMPLE?
If we restrict our attention to convex combinations only (αi ≥ 0 ∀i) of belief
functions (∆i ≥ 0), Theorem 17 implies that
Corollary 6. Cl and ⊕ commute, i.e. if b is combinable with bi ∀i = 1, ..., n, then

b ⊕ Cl(b1 , ..., bn ) = Cl(b ⊕ b1 , · · · , b ⊕ bn ).

7.4.2 Non-combinable points and missing points: a duality

Even when all the pseudo belief functions ςi of Theorem 17 are combinable with
ς, the affine space v(ς1 , ..., ςn ) generated by them includes an affine subspace of
non-combinable functions, namely those meeting the following constraint:
\[
\sum_i \alpha_i \Delta_i = 0, \tag{7.10}
\]

where ∆i is the degree of conflict between ς and ςi .


There exists a sort of duality between these non-combinable points and the miss-
ing points in the image subspace v(ς ⊕ ς1 , ..., ς ⊕ ςn ).
In fact, by Equation (7.8), M(ς, ς1, ..., ςn) turns out to be the image of the point at infinity of v(ς1, ..., ςn) via ς ⊕ (.), since Σi βi/∆i = 0 implies αi → ∞.
On the other hand, the non-combinable points in v(ς1 , ..., ςn ) meet Equation (7.10),
so that Equation (7.9) yields βj → ∞. Non-combinable points are hence mapped to
the infinite point of v(ς ⊕ ς1 , ..., ς ⊕ ςn ).
This geometric duality is graphically represented in Figure 7.2.

7.4.3 The case of unnormalized belief functions

The results of this Section greatly simplify when we consider unnormalized belief
functions (u.b.f.s), i.e., belief functions assigning non-zero mass to the empty set
too. Unnormalized belief functions are obtained by relaxing the constraint m(∅) = 0
in Definition 2. The meaning of the basic probability value of ∅ has been studied by
Smets [1239], as a measure of the internal conflict present in a b.p.a. m. It is easy to
see that, for Dempster’s sum of two belief functions b1 and b2 , we get mb1 ⊕b2 (∅) =
1 − ∆(b1 , b2 ) with ∆(b1 , b2 ) as above.
Fig. 7.2. The dual role of non-combinable and missing points in Theorem 17, and their
relation with the infinite points of the associated affine spaces.

Clearly, Dempster’s rule can be naturally modified to cope with such functions.
Equation (2.6) simplifies in the following way: if mb1 , mb2 are the b.p.a.s of two
unnormalized b.f.s, their Demspter’s combination becomes:
X
mb1 ⊕b2 (C) = mb1 (Ai )mb2 (Bj ). (7.11)
i,j:Ai ∩Bj =C

This new operator is known as the unnormalized rule of conditioning, and was introduced by Smets within his Transferable Belief Model [1218] (cf. Section 3.3.1).
Obviously enough, unnormalized belief functions are always combinable through
(7.11). If we still denote by ⊕ the unnormalized conditioning operator, given a col-
lection of u.b.f.s b̃, b̃1 , ..., b̃n , we get that
\[
\begin{aligned}
m_{\tilde{b} \oplus \sum_i \alpha_i \tilde{b}_i}(C) &= \sum_{B \cap A = C} m_{\sum_i \alpha_i \tilde{b}_i}(B)\, m_{\tilde{b}}(A) = \sum_{B \cap A = C} \Big[ \sum_i \alpha_i m_{\tilde{b}_i}(B) \Big] m_{\tilde{b}}(A) \\
&= \sum_i \alpha_i \sum_{B \cap A = C} m_{\tilde{b}_i}(B)\, m_{\tilde{b}}(A) = \sum_i \alpha_i\, m_{\tilde{b} \oplus \tilde{b}_i}(C) = m_{\sum_i \alpha_i \tilde{b} \oplus \tilde{b}_i}(C).
\end{aligned}
\]
Therefore, Corollary 5 transforms as follows.
Proposition 38. If b̃, b̃1 , ..., b̃n are unnormalized belief functions defined on the
same frame of discernment, then:
\[
\tilde{b} \oplus \sum_i \alpha_i \tilde{b}_i = \sum_i \alpha_i\, \tilde{b} \oplus \tilde{b}_i,
\]
whenever $\sum_i \alpha_i = 1$, $\alpha_i \geq 0$ $\forall i$.
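A minimal Python sketch (with helper names and example masses assumed by us) of why this holds: the unnormalized rule is bilinear in the two mass assignments, and therefore distributes over convex combinations.

def conjunctive(m1, m2):
    # Unnormalized combination: mass is allowed to flow to the empty set.
    out = {}
    for A, v1 in m1.items():
        for B, v2 in m2.items():
            out[A & B] = out.get(A & B, 0.0) + v1 * v2
    return out

def convex(ms, alphas):
    # Convex combination of mass assignments.
    out = {}
    for m, a in zip(ms, alphas):
        for A, v in m.items():
            out[A] = out.get(A, 0.0) + a * v
    return out

T = frozenset('xy')
m  = {frozenset('x'): 0.6, T: 0.4}
m1 = {frozenset('y'): 0.7, T: 0.3}
m2 = {frozenset('x'): 0.2, frozenset('y'): 0.5, T: 0.3}
alphas = [0.25, 0.75]

lhs = conjunctive(m, convex([m1, m2], alphas))
rhs = convex([conjunctive(m, m1), conjunctive(m, m2)], alphas)
for A in set(lhs) | set(rhs):
    assert abs(lhs.get(A, 0.0) - rhs.get(A, 0.0)) < 1e-12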

Clearly Theorem 16 also simplifies, for the coefficients of a convex combination are
preserved under (7.11). Namely
\[
\tilde{b}' = \sum_{A \in \mathcal{E}_{\tilde{b}'}} m_{\tilde{b}'}(A)\, b_A \quad \Rightarrow \quad \tilde{b} \oplus \tilde{b}' = \sum_{A \in \mathcal{E}_{\tilde{b}'}} m_{\tilde{b}'}(A)\, \tilde{b} \oplus b_A = \sum_{A \in \mathcal{E}_{\tilde{b}'}} m_{\tilde{b}'}(A)\, \tilde{b}|A.
\]

The commutativity results of Section 7.4 remain valid too. Indeed, Proposition 38
implies that:
\[
\begin{aligned}
\tilde{b} \oplus Cl(\tilde{b}_1, \ldots, \tilde{b}_n) &= \tilde{b} \oplus \Big\{ \sum_i \alpha_i \tilde{b}_i : \sum_i \alpha_i = 1, \alpha_i \geq 0 \Big\} = \Big\{ \tilde{b} \oplus \sum_i \alpha_i \tilde{b}_i : \sum_i \alpha_i = 1, \alpha_i \geq 0 \Big\} \\
&= \Big\{ \sum_i \alpha_i\, \tilde{b} \oplus \tilde{b}_i : \sum_i \alpha_i = 1, \alpha_i \geq 0 \Big\} = Cl(\tilde{b} \oplus \tilde{b}_1, \ldots, \tilde{b} \oplus \tilde{b}_n)
\end{aligned}
\]

for any collection of u.b.f.s {b̃, b̃1 , · · · , b̃n }, since their combinability is always guar-
anteed.

7.5 Conditional subspaces


7.5.1 Definition

The commutativity results we proved in Section 7.2 are rather powerful, as they
specify how the rule of combination works when applied to entire regions of the
Cartesian space, and in particular to affine closures of normalized sum functions.
Since the belief space itself is a convex region of RN , it is easy to realize that these
results can help us draw a picture of the “global” behavior of ⊕ within our geometric
approach to the theory of evidence.
Definition 75. Given a belief function b ∈ B we call conditional subspace hbi the
set of all Dempster’s combinations of b with any other combinable belief function
on the same frame, namely:
\[
\langle b \rangle \doteq \big\{ b \oplus b',\ b' \in \mathcal{B} \text{ s.t. } \exists\, b \oplus b' \big\}. \tag{7.12}
\]
Roughly speaking, hbi is the set of possible “futures” of b under the assumption that
new evidence is combined with b via Dempster’s rule.
Since not all belief functions are combinable with a given b, we need to un-
derstand the geometric structure of such combinable b.f.s. Let us call compatible
subspace C(b) associated with a belief function b the collection of all the b.f.s with
focal elements included in the core of b:
\[
\mathcal{C}(b) \doteq \big\{ b' : \mathcal{C}_{b'} \subset \mathcal{C}_b \big\}.
\]

The conditional subspace associated with b is nothing but the result of combining b
with its compatible subspace.

Fig. 7.3. Conditional and compatible subspaces for a belief function b in the binary belief space B2. The coordinate axes measure the belief values of {x} and {y}, respectively. The vertices of hbi are b, bx and by, since b ⊕ bx = bx ∀b ≠ by, and b ⊕ by = by ∀b ≠ bx.

Theorem 18. hbi = b ⊕ C(b) = Cl{b ⊕ bA , A ⊆ Cb }.

Proof. Let us denote by Eb' = {Ai} and Eb = {Bj} the focal elements of the two belief functions b' and b, respectively, defined on the same frame, with b' combinable with b. Obviously Bj ∩ Ai = (Bj ∩ Cb) ∩ Ai = Bj ∩ (Ai ∩ Cb). Therefore, once we define a new b.f. b'' with focal elements {Aj, j = 1, ..., m} ≐ {Ai ∩ Cb, i = 1, ..., n} and basic probability assignment
\[
m_{b''}(A_j) = \sum_{i\,:\,A_i \cap \mathcal{C}_b = A_j} m_{b'}(A_i),
\]
240 7 Geometry of Dempster’s rule

we have that b ⊕ b0 = b ⊕ b00 . In other words, any point of hbi is a point of b ⊕ C(b).
The reverse implication is trivial.
Finally, Theorem 11 ensures that C(b) = Cl(bA , A ⊆ Cb ), so that Corollary 6
eventually yields the desired expression for hbi (being bA combinable with b for all
A ⊂ Cb ).

Figure 7.3 illustrates the form of the conditional subspaces in the belief space
related to the simplest, binary frame.
The original belief function b is always a vertex of its own conditional subspace hbi,
as the result of the combination of itself with the b.f. focussed on its core: b⊕bCb = b.
In addition the conditional subspace is a subset of the compatible one, hbi ⊆ C(b),
since if b00 = b ⊕ b0 for some b0 ∈ C(b) then Cb00 ⊆ Cb , i.e., b00 is combinable with b
as well.

7.5.2 The case of unnormalized belief functions

The notion of conditional subspace is directly applicable to unnormalized belief


functions as well. More precisely, we can write:
\[
\langle \tilde{b} \rangle = \big\{ \tilde{b} \oplus \tilde{b}',\ \tilde{b}' \in \tilde{\mathcal{B}} \big\},
\]

since all u.b.f.s are combinable with any arbitrary u.b.f. b̃. The idea of compatible
subspace retains its validity, though, as the empty set is a subset of the core of any
u.b.f. Note that in this case, however, if Cb̃ ∩ Cb̃0 = ∅ then the combination b̃ ⊕ b˜0
reduces to the single point b∅ .
The proof of Theorem 18 still works for u.b.f.s too, so that we can write

hb̃i = b̃ ⊕ C(b̃)

where C(b̃) = Cl(bA : ∅ ⊆ A ⊆ Cb̃ ) (A = ∅ this time included).

7.5.3 Vertices of conditional subspaces

The vertices of a conditional subspace possess an interesting structure. Indeed, Equation (2.6) implies:
\[
b \oplus b_A(B) = \frac{\displaystyle\sum_{E \in \mathcal{E}_b\,:\,\emptyset \neq E \cap A \subseteq B} m_b(E)}{1 - \displaystyle\sum_{E \in \mathcal{E}_b\,:\,E \cap A = \emptyset} m_b(E)} =
\]
(since ∅ ≠ E ∩ A ⊆ B implies E ∩ (A ∩ B) ≠ ∅ and E ∩ (A \ B) = ∅, see Figure 7.4)
\[
= \frac{b((A \setminus B)^c) - b(A^c)}{pl_b(A)} = \frac{pl_b(A) - pl_b(A \setminus B)}{pl_b(A)}.
\]
Therefore:
\[
b \oplus b_A = \frac{1}{pl_b(A)} \sum_{B \subset \Theta} v_B \big( pl_b(A) - pl_b(A \setminus B) \big), \tag{7.13}
\]
having denoted as usual by vB the B-th axis of the orthonormal reference frame in R^{2^{|Θ|}−2} with respect to which we measure belief coordinates. Notice that plb(A) ≠ 0 for every A ⊆ Cb.
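As an aside (not in the original text), Equation (7.13) can be checked numerically: the belief values of b ⊕ bA are recovered from the plausibilities of b alone. The Python sketch below, with helper names and an arbitrary ternary example assumed by us, verifies that (b ⊕ bA)(B) = (plb(A) − plb(A \ B))/plb(A).

from itertools import combinations

THETA = frozenset('xyz')

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

def bel(m, A):
    return sum(v for B, v in m.items() if B and B <= A)

def condition(m, A):
    # Dempster conditioning b|A = b ⊕ b_A (assumes pl_b(A) > 0).
    out = {}
    for B, v in m.items():
        if B & A:
            out[B & A] = out.get(B & A, 0.0) + v
    n = sum(out.values())
    return {C: v / n for C, v in out.items()}

m_b = {frozenset('x'): 0.2, frozenset('xy'): 0.3, frozenset('yz'): 0.1, THETA: 0.4}
A = frozenset('xy')
m_cond = condition(m_b, A)

for B in subsets(THETA):
    lhs = bel(m_cond, B)
    rhs = (pl(m_b, A) - pl(m_b, A - B)) / pl(m_b, A)
    assert abs(lhs - rhs) < 1e-12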

Fig. 7.4. One of the sets E involved in the computation of the vertices of hbi.

Equation (7.13) suggests an interesting result.


Theorem 19.
\[
b = \sum_{\emptyset \subsetneq A \subsetneq \mathcal{C}_b} (-1)^{|\mathcal{C}_b \setminus A|+1}\, pl_b(A)\; b \oplus b_A \;+\; (-1)^{|\mathcal{C}_b|-1}\, m_b(\mathcal{C}_b)\, \mathbf{1}, \tag{7.14}
\]
where 1 denotes the vector of R^{2^{|Θ|}−2} whose components are all equal to 1.
Any belief function b can thus be decomposed as an affine combination of its own Dempster's combinations with the categorical belief functions that agree with it (up to a constant mb(Cb) which measures the uncertainty of the model). The coefficients of this decomposition are nothing but the plausibilities of the events A ⊊ Cb given the evidence represented by b.
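The decomposition (7.14) is easy to verify numerically. The Python sketch below (helper names and the example mass assignment are assumptions of ours, not the book's) checks it on a ternary frame with core Cb = Θ, comparing the belief vector of b with the alternating, plausibility-weighted sum of its conditionings b ⊕ bA over ∅ ⊊ A ⊊ Cb plus the constant term.

from itertools import combinations

THETA = frozenset('xyz')

def subsets(s, proper=False, nonempty=False):
    s = list(s)
    out = [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]
    if nonempty:
        out = [A for A in out if A]
    if proper:
        out = [A for A in out if A != frozenset(s)]
    return out

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

def bel(m, A):
    return sum(v for B, v in m.items() if B and B <= A)

def vec(m):
    # Belief vector over the events ∅ ⊊ A ⊊ Theta (2^|Theta| - 2 components).
    return [bel(m, A) for A in subsets(THETA, proper=True, nonempty=True)]

def condition(m, A):
    out = {}
    for B, v in m.items():
        if B & A:
            out[B & A] = out.get(B & A, 0.0) + v
    n = sum(out.values())
    return {C: v / n for C, v in out.items()}

m_b = {frozenset('x'): 0.2, frozenset('xy'): 0.3, THETA: 0.5}   # core = Theta
lhs = vec(m_b)
rhs = [(-1) ** (len(THETA) - 1) * m_b[THETA]] * len(lhs)        # constant term
for A in subsets(THETA, proper=True, nonempty=True):            # ∅ ⊊ A ⊊ C_b
    w = (-1) ** (len(THETA - A) + 1) * pl(m_b, A)
    rhs = [r + w * c for r, c in zip(rhs, vec(condition(m_b, A)))]
assert max(abs(l - r) for l, r in zip(lhs, rhs)) < 1e-12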

7.6 Constant mass loci


As we mentioned before, Theorem 18 depicts, in a sense, the global action of Dempster's rule in the belief space. Theorem 16, instead, expresses its pointwise behavior
242 7 Geometry of Dempster’s rule

in the language of affine geometry. It would be interesting to give a geometric inter-


pretation to Equation (7.5), too. The commutativity results we proved in Section 7.4
could clearly be very useful, but we still need an intuition to lead us in this effort.
In the remainder of the Chapter we will find this source of intuition in the discussion
of the pointwise geometry of Dempster’s rule in the simplest, binary frame and make
inferences about the general case.
We will realize that Dempster’s rule exhibits a very elegant behavior when applied
to sets of belief functions that assign the same mass k to a fixed subset A, or constant
mass loci. Such loci turn out to be convex sets, while they assume the form of affine
subspaces when we extend the analysis to normalized sum functions. This allows us
to apply the commutativity results of Theorem 17 to compute their images through
the map b ⊕ (.).
The amazing fact is that, for any subset A, the resulting affine spaces have a
common intersection for all k ∈ [0, 1], which is therefore characteristic of b. We
call it A-th focus of the conditional subspace. In the latter sections we will prove
the existence and study the geometry of these foci. This will in turn lead us to an
interesting geometric construction for the orthogonal sum of two belief functions.

7.6.1 Geometry of Dempster’s rule in S2

We already know that when Θ2 = {x, y} the belief space B2 is 2-dimensional.


Hence, given two belief functions b = [mb (x), mb (y)]0 and b0 = [k, l]0 on Θ2 , it is
simple to derive the coordinates of their orthogonal sum as:
\[
\begin{aligned}
m_{b \oplus b'}(\{x\}) &= 1 - \frac{(1 - m_b(x))(1 - k)}{1 - m_b(x)\, l - m_b(y)\, k}; \\
m_{b \oplus b'}(\{y\}) &= 1 - \frac{(1 - m_b(y))(1 - l)}{1 - m_b(x)\, l - m_b(y)\, k}.
\end{aligned} \tag{7.15}
\]
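The closed form (7.15) can be double-checked against the general rule; the short Python sketch below does so for one arbitrary pair of binary mass assignments (the variable names are ours).

def dempster_binary(mx1, my1, mx2, my2):
    # General Dempster combination on Theta_2 = {x, y}, returned as (m(x), m(y)).
    mT1, mT2 = 1 - mx1 - my1, 1 - mx2 - my2
    conflict = mx1 * my2 + my1 * mx2
    mx = (mx1 * mx2 + mx1 * mT2 + mT1 * mx2) / (1 - conflict)
    my = (my1 * my2 + my1 * mT2 + mT1 * my2) / (1 - conflict)
    return mx, my

def closed_form(mx, my, k, l):
    # Coordinates of b ⊕ b' as in Equation (7.15).
    den = 1 - mx * l - my * k
    return 1 - (1 - mx) * (1 - k) / den, 1 - (1 - my) * (1 - l) / den

assert all(abs(u - v) < 1e-12
           for u, v in zip(dempster_binary(0.3, 0.2, 0.5, 0.1),
                           closed_form(0.3, 0.2, 0.5, 0.1)))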
At first glance this expression does not suggest any particular geometrical intuition. Let us then keep the first operand b fixed, and analyze the behavior of b ⊕ b' as a function of the two variables k, l.
We need to distinguish two cases: mb(Θ2) ≠ 0 or mb(Θ2) = 0, since in the first case hbi = Cl(b, b ⊕ bx, b ⊕ by) = Cl(b, bx, by), while if b = p is Bayesian hpi = Cl(p, p ⊕ bx, p ⊕ by) = Cl(p, bx, by) = Cl(bx, by).
If mb (Θ2 ) 6= 0, when keeping k constant in Equation (7.15), the resulting com-
bination describes a line segment in the belief space. If we instead allow b0 to be
a normalized sum function and apply the extended Dempster’s rule, we appreciate
that the locus of all the combinations turns out to be the line containing the same
segment, but for a single point with coordinates
\[
F_x(b) = \left( 1,\ -\frac{m_b(\Theta_2)}{m_b(x)} \right), \tag{7.16}
\]

which incidentally is the limit of b ⊕ b0 for l → ±∞ (we omit the details). This is
true for every k ∈ [0, 1], as shown in Figure 7.5.
Fig. 7.5. The x-focus Fx of a conditional subspace hbi in the binary belief space, for mb(Θ) ≠ 0. The white circle placed at Fx indicates that the latter is a missing point for each of the lines representing images of constant mass loci.

Simple manipulations of Equation (7.15) can help us to realize that all the col-
lections of Dempster’s sums b ⊕ b0 (where b0 is a n.s.f.) with k = const have a
common intersection at the point (7.16) located outside the belief space. This is true
in the same way for the sets {b ⊕ b0 : l = const}, which lie each on a distinct line
passing through a twin point:
\[
F_y(b) = \left( -\frac{m_b(\Theta)}{m_b(y)},\ 1 \right).
\]
We call Fx (b), Fy (b) the foci of the conditional subspace hbi.
Note that Fx (b) can be located by intersecting the two lines for k = 0 and
k = 1. It is also worth noticing that the shape of these loci agrees with the prediction of Theorem 17 for b1 = kbx, b2 = kbx + (1 − k)by. Indeed, according to Equation (7.7) the missing points are supposed to have coordinates:
244 7 Geometry of Dempster’s rule
∆1 ∆2   1 − k + k(1 − m (y))
b
v b ⊕ b1 − b ⊕ b2 = v · b ⊕ kbx +
∆1 − ∆2 ∆1 − ∆2 mb (x)(1 − k)
(mb (x) − 1)(1 − k) − k(1 − mb (y)) 
+ b ⊕ [kbx + (1 − k)by ] ,
mb (x)(1 − k)

which coincide with those of Fx (b) (as it is easy to check). It is quite interesting
to note that the intersection takes place exactly where the images of the lines {k =
const} do not exist.
If mb(Θ2) = 0 the situation is slightly different. The combination locus turns out to be v(bx, by) \ {bx} for every k ∈ [0, 1) (note that in this case Fx(b) = (1, −mb(Θ)/mb(x)) = (1, 0) = bx). If k = 1, instead, Equations (7.15) yield
\[
b \oplus [1 \cdot b_x + l \cdot b_y] = \begin{cases} b_x & l \neq 1, \\ \nexists & l = 1. \end{cases}
\]
Incidentally, in this case the missing coordinate l = 1 (see Section 7.4) does not correspond to an actual missing point. The situation is represented in Figure 7.6.

Fig. 7.6. The x-focus of a conditional subspace in the binary belief space for mb (Θ2 ) = 0
(b ∈ P). For each value of k in [0, 1) the image of the locus k = const through the map
b⊕(.) coincides with the line spanned by P, with missing point bx . The value of the parameter
l of this line is shown for some relevant points. For k = 1 the locus reduces to the point bx
for all values of l.
It is interesting to note that, when mb (Θ2 ) 6= 0, for all b0 = [k, l]0 ∈ B2 the sum
b ⊕ b0 is uniquely determined by the intersection of the following lines:
\[
l_x \doteq b \oplus \{ b'' : m_{b''}(x) = k \}, \qquad l_y \doteq b \oplus \{ b'' : m_{b''}(y) = l \}.
\]

These lines are in turn determined by the related focus plus an additional point. We
can for instance choose their intersections with the probabilistic subspace P:
\[
p_x \doteq l_x \cap \mathcal{P}, \qquad p_y \doteq l_y \cap \mathcal{P},
\]

for px = b ⊕ p0x and py = b ⊕ p0y where p0x , p0y are the unique probabilities with
m(x) = k, m(y) = l respectively.

Fig. 7.7. Graphical construction of Dempster’s orthogonal sum in B2 .

This suggests a geometrical construction for the orthogonal sum of a pair of be-
lief functions b, b0 in B2 :

Algorithm.
1. compute the foci Fx (b), Fy (b) of the conditional subspace hbi;
2. project b0 onto P along the orthogonal directions, obtaining p0x and p0y ;
3. combine b with p0x and p0y to get px and py ;
4. draw the lines px Fx (b) and py Fy (b): their intersection is the desired orthogonal
sum b ⊕ b0 .
The construction is illustrated in Figure 7.7.
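A Python sketch of the four steps (with helper names and an example pair of belief functions assumed by us) confirms that the construction reproduces Dempster's rule in B2 whenever mb(Θ2) ≠ 0 and mb(x), mb(y) > 0.

def dempster2(m1, m2):
    (x1, y1), (x2, y2) = m1, m2
    t1, t2 = 1 - x1 - y1, 1 - x2 - y2
    c = x1 * y2 + y1 * x2
    return ((x1 * x2 + x1 * t2 + t1 * x2) / (1 - c),
            (y1 * y2 + y1 * t2 + t1 * y2) / (1 - c))

def line_intersection(p1, p2, q1, q2):
    # Intersection of the line through p1, p2 with the line through q1, q2.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a, b = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

b, b2 = (0.3, 0.2), (0.4, 0.3)                        # (m(x), m(y)) of b and b'
mT = 1 - b[0] - b[1]                                  # m_b(Theta), nonzero here
Fx, Fy = (1.0, -mT / b[0]), (-mT / b[1], 1.0)         # step 1: foci of <b>
px_p, py_p = (b2[0], 1 - b2[0]), (1 - b2[1], b2[1])   # step 2: projections on P
px, py = dempster2(b, px_p), dempster2(b, py_p)       # step 3
geo = line_intersection(px, Fx, py, Fy)               # step 4
assert max(abs(g - r) for g, r in zip(geo, dempster2(b, b2))) < 1e-9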
246 7 Geometry of Dempster’s rule

By the way, the notion of focus makes sense even for A = Θ2. For instance, in the case mb(Θ2) ≠ 0 a few passages of algebra yield the following coordinates for FΘ2(b):
\[
F_{\Theta_2}(b) = \left( \frac{1 - m_b(y)}{m_b(x) - m_b(y)},\ \frac{1 - m_b(x)}{m_b(y) - m_b(x)} \right), \tag{7.17}
\]
which turns out to belong to v(P). The point FΘ2(b), however, plays no role in the geometric construction of Dempster's rule, and we will not mention it in the following.
It is instead important to realize that the algorithm does not work when mb (Θ2 ) =
0, since in this case the intersection of the lines

px Fx (b) = px bx = v(P ), py Fy (b) = py by = v(P )

is clearly not unique!

7.6.2 Affine form of constant mass loci

Our study of the binary belief space suggests how to proceed in the general case.
We first need to precise the shape of constant mass loci, and the way Dempster’s
rule acts on them. After proving the existence of the intersections of their images
and understanding their geometry we will finally be able to formulate a geometric
construction for the orthogonal sum in a generic belief space. Let us introduce the
following notations
k . k .
HA = {b : mb (A) = k}, k ∈ [0, 1]; HA = {ς : mς (A) = k}, k ∈ R

for constant mass loci related to belief functions and normalized sum functions, re-
k
spectively. Their dimension is of course dim(B)−1, that for B2 becomes dim(HA )=
4 − 2 − 1 = 1 and any constant mass locus is a line.
Theorem 20.
\[
\mathcal{H}^k_A = v\big( k b_A + \gamma_B b_B : B \subseteq \Theta, B \neq A \big), \qquad k \in \mathbb{R},
\]
where γB is an arbitrary non-zero real number for any B ⊆ Θ, B ≠ A, while
\[
H^k_A = Cl\big( k b_A + (1-k) b_B : B \subseteq \Theta, B \neq A \big), \qquad k \in [0,1].
\]

Proof. By Equation (??), mb(A) = k iff
\[
b = k b_A + \sum_{B \neq A} \alpha_B b_B, \qquad \sum_{B \neq A} \alpha_B + k = 1, \tag{7.18}
\]
with αB ≥ 0 ∀B ≠ A. Trivially,
\[
b = k b_A + (1-k) \sum_{B \neq A} \alpha'_B b_B, \qquad \sum_{B \neq A} \alpha'_B = 1, \quad \alpha'_B \geq 0\ \forall B \neq A,
\]
so that
\[
b \in k b_A + (1-k)\, Cl(b_B, B \neq A) = Cl\big( k b_A + (1-k) b_B, B \neq A \big).
\]
Equation (7.18) characterizes \(\mathcal{H}^k_A\) too. If, on the other side, ς ∈ v(k bA + γB bB : B ⊂ Θ, B ≠ A), then ς can be written as:
\[
\varsigma = \sum_{B \neq A} \beta_B (k b_A + \gamma_B b_B) = \sum_{B \neq A} \beta_B\, k b_A + \sum_{B \neq A} \beta_B \gamma_B b_B = k b_A + \sum_{B \neq A} \beta_B \gamma_B b_B + \Big( 1 - k - \sum_{B \neq A} \beta_B \gamma_B \Big) b_\Theta,
\]
since bΘ = 0 and Σ_{B≠A} βB = 1. The last expression becomes
\[
k b_A + \sum_{B \neq A, \Theta} \beta_B \gamma_B b_B + \Big( 1 - k - \sum_{B \neq A, \Theta} \beta_B \gamma_B \Big) b_\Theta.
\]
After denoting αB ≐ βB γB for B ≠ A, Θ and αΘ ≐ 1 − k − Σ_{B≠A,Θ} βB γB, we find Equation (7.18) again.

7.6.3 Action of Dempster’s rule on constant mass loci

Having expressed constant mass loci as affine closures, we can exploit the commutativity property to compute their images through Dempster's rule. Since hbi = b ⊕ C(b), our intuition suggests that we should only consider constant mass loci related to subsets of Cb. In fact, given a belief function b' with basic probability assignment mb', it is clear that, by definition:
\[
b' = \bigcap_{A \subseteq \Theta} H^{\,m_{b'}(A)}_A,
\]
for there cannot be several distinct normalized sum functions with the same mass assignment. From Theorem 18 we know that b ⊕ b' = b ⊕ b'', where b'' = b' ⊕ bCb is a new belief function with basic probability assignment \(m_{b''}(A) = \sum_{B : B \cap \mathcal{C}_b = A} m_{b'}(B)\).
After introducing the notation
\[
\begin{aligned}
H^k_A(b) &\doteq \big\{ b' \in \mathcal{B} : b' \in \mathcal{C}(b),\ m_{b'}(A) = k \big\} = H^k_A \cap \mathcal{C}(b), \\
\mathcal{H}^k_A(b) &\doteq \big\{ \varsigma \in \mathbb{R}^N : \varsigma \in v(\mathcal{C}(b)),\ m_\varsigma(A) = k \big\} = \mathcal{H}^k_A \cap v(\mathcal{C}(b)),
\end{aligned}
\]
with A ⊆ Cb, we can write
\[
b'' = \bigcap_{A \subseteq \mathcal{C}_b} H^{\,m_{b''}(A)}_A(b),
\]
since b'' ∈ H^{m_{b''}(A)}_A(b) for all A ⊆ Cb by definition, and the intersection is unique in v(C(b)). As a natural consequence of Theorem 20, we are then only interested in computing loci of the type
\[
b \oplus H^k_A(b) = b \oplus Cl\big( k b_A + (1-k) b_B : B \subseteq \mathcal{C}_b, B \neq A \big).
\]


To get a more comprehensive view of the problem, let us consider the expression
\[
b \oplus H^k_A = b \oplus Cl\big( k b_A + (1-k) b_B : B \subseteq \Theta, B \neq A \big),
\]
and assume Cb ≠ Θ. Theorem 15 allows us to write, whenever ∆^k_B ≐ k plb(A) + (1 − k) plb(B) ≠ 0,
\[
b \oplus \big[ k b_A + (1-k) b_B \big] = \frac{k\, pl_b(A)}{\Delta^k_B}\, b \oplus b_A + \frac{(1-k)\, pl_b(B)}{\Delta^k_B}\, b \oplus b_B = b \oplus \big[ k b_{A \cap \mathcal{C}_b} + (1-k) b_{B \cap \mathcal{C}_b} \big], \tag{7.19}
\]
for b ⊕ bX = b ⊕ bX∩Cb and plb(X) = plb(X ∩ Cb). When A ∩ Cb = ∅, Theorem 17 yields (since kbA + (1 − k)bB is combinable with b iff B ∩ Cb ≠ ∅)
\[
b \oplus H^k_A = Cl\big( b \oplus [k b_A + (1-k) b_B] : B \cap \mathcal{C}_b \neq \emptyset, B \neq A \big) = Cl\big( b \oplus [k b_A + (1-k) b_B] : B \subseteq \mathcal{C}_b \big) = Cl\big( b \oplus b_B : B \subseteq \mathcal{C}_b \big) = \langle b \rangle
\]
by Equation (7.19), since B ⊆ Cb implies B ≠ A. The set trivially coincides with the whole conditional subspace.
the whole conditional subspace.
If, on the other hand, A ∩ Cb ≠ ∅, the image of H^k_A becomes
\[
b \oplus H^k_A = Cl\big( b \oplus b_A,\ b \oplus [k b_A + (1-k) b_B] : B \subseteq \mathcal{C}_b, B \neq A \big)
\]
regardless of whether A ⊂ Cb or A ⊄ Cb. Indeed, if A ⊆ Cb then
\[
b \oplus H^k_A = Cl\big( b \oplus [k b_A + (1-k) b_B] : B \subseteq \mathcal{C}_b \big),
\]
since B = A ∪ Cb^c ≠ A is associated with the point
\[
b \oplus [k b_A + (1-k) b_{A \cup \mathcal{C}_b^c}] = b \oplus [k b_A + (1-k) b_A] = b \oplus b_A
\]
by Equation (7.19). If instead A ⊄ Cb, the set B = A ∩ Cb ≠ A corresponds to the point
\[
b \oplus [k b_A + (1-k) b_{A \cap \mathcal{C}_b}] = b \oplus [k b_{A \cap \mathcal{C}_b} + (1-k) b_{A \cap \mathcal{C}_b}] = b \oplus b_{A \cap \mathcal{C}_b} = b \oplus b_A.
\]
In conclusion, b ⊕ H^k_A has the undesirable property of assuming different shapes according to whether Cb = Θ or Cb ≠ Θ. Moreover, in the latter case the notion of focus vanishes and a geometric construction of Dempster's rule makes no sense (just recall the 2D example of Section 7.6.1).
In the following we will then operate on images of restrictions of constant mass loci to the compatible subspace C(b), as suggested by our geometric intuition. Again, since H^k_A(b) = Cl(k bA + (1 − k) bB : B ⊂ Cb, B ≠ A), Corollary 6 yields
\[
b \oplus H^k_A(b) = Cl\big( b \oplus [k b_A + (1-k) b_B] : B \subseteq \mathcal{C}_b, B \neq A \big),
\]
where b ⊕ [kbA + (1 − k)bB] is again given by Equation (7.19).


When k = 0, in particular,
\[
b \oplus H^0_A(b) = Cl\big( b,\ b \oplus b_B : B \neq A \big),
\]
i.e., we get the antipodal face of hbi with respect to the event A (consult [222]).

7.7 Geometric orthogonal sum


7.7.1 Foci of conditional subspaces

We have previously conjectured that the two-dimensional example of Section 7.6.1 does not represent an anomaly, and that all the affine subspaces v(b ⊕ \(\mathcal{H}^k_A(b)\)) (images of the constant mass loci for normalized sum functions) related to the same subset A ⊆ Cb always have a common intersection, which is then characteristic of the conditional subspace hbi itself. Accordingly:
Definition 76. We call A-th focus of the conditional subspace hbi, A ⊆ Cb, the linear variety
\[
F_A(b) \doteq \bigcap_{k \in [0,1]} v\big( b \oplus \mathcal{H}^k_A(b) \big).
\]

As we mentioned before, we have no interest in the focus FCb(b). Hence, in the following discussion we will assume A ≠ Cb.
The toy problem of Section 7.6.1 also suggests that an analysis exclusively based on belief functions could lead to wrong conclusions. In such a case, in fact, since v(b ⊕ H^1_A(b)) reduces to the single point b ⊕ bA, we would only be able to compute the intersection
\[
\bigcap_{k \in [0,1)} v\big( b \oplus H^k_A(b) \big),
\]
which is in general different from the actual focus FA(b). Consider for instance the case mb(Θ2) = 0 in the binary belief space.
Our conjecture about the existence of the foci is indeed supported by a rigorous
analysis based on the affine methods we introduced in Section 7.2 and the results of
Section 7.6.3. We first note that when k ∈ [0, 1) Theorem 20 yields (since we can choose γB = 1 − k for all B ≠ A):
\[
b \oplus \mathcal{H}^k_A(b) = b \oplus v\big( k b_A + (1-k) b_B,\ B \subset \mathcal{C}_b,\ B \neq A \big).
\]
As the generators of \(\mathcal{H}^k_A(b)\) are all combinable with b, we can apply Theorem 17 and get:
\[
v\big( b \oplus H^k_A(b) \big) = v\big( b \oplus [k b_A + (1-k) b_B] : B \subseteq \mathcal{C}_b, B \neq A \big) = v\big( b \oplus \mathcal{H}^k_A(b) \big). \tag{7.20}
\]
Let us then take a first step towards a proof of existence of FA (b).
250 7 Geometry of Dempster’s rule

k
Theorem 21. For all A ⊆ Cb the family of affine spaces {v(b ⊕ HA (b)) : 0 ≤ k <
0 . \ k
1} has a non-empty common intersection FA (b) = v(b ⊕ HA (b)), and
k∈[0,1)

0
FA (b) ⊃ v(ςB |B ⊆ Cb , B 6= A),

where
1 plb (B)
ςB = b+ b ⊕ bB . (7.21)
1 − plb (B) plb (B) − 1
The proof of Theorem 21 can be easily modified to cope with the case A = Cb. System (7.29) is still valid, so we just need to modify the last part of the proof by replacing A = Cb with another arbitrary subset C ⊊ Cb. This yields a family of generators for F^0_{Cb}(b), whose shape
\[
\frac{pl_b(C)}{pl_b(C) - pl_b(B)}\, b \oplus b_C + \frac{pl_b(B)}{pl_b(B) - pl_b(C)}\, b \oplus b_B
\]
turns out to be slightly different from that of Equation (7.21). Clearly, when applied to the binary belief space this formula yields expression (7.17).
Now, the binary case suggests that the focus should be uniquely determined by the intersection of just two subspaces, one associated with some k ∈ [0, 1), the other being v(b ⊕ \(\mathcal{H}^1_A(b)\)). We can simplify the maths by choosing k = 0: the result is particularly attractive.
Theorem 22. \(v(b \oplus \mathcal{H}^0_A(b)) \cap v(b \oplus \mathcal{H}^1_A(b)) = v(\varsigma_B \,|\, B \subseteq \mathcal{C}_b, B \neq A)\).
Again, the proof can be modified to include the case A = Cb. An immediate consequence of Theorems 21 and 22 is that:
\[
v\big( b \oplus \mathcal{H}^0_A(b) \big) \cap v\big( b \oplus \mathcal{H}^1_A(b) \big) \subset \bigcap_{k \in [0,1)} v\big( b \oplus \mathcal{H}^k_A(b) \big),
\]
so that in turn:
\[
\begin{aligned}
F_A(b) = \bigcap_{k \in [0,1]} v\big( b \oplus \mathcal{H}^k_A(b) \big) &= v\big( b \oplus \mathcal{H}^1_A(b) \big) \cap \bigcap_{k \in [0,1)} v\big( b \oplus \mathcal{H}^k_A(b) \big) \\
&= v\big( b \oplus \mathcal{H}^0_A(b) \big) \cap v\big( b \oplus \mathcal{H}^1_A(b) \big) \cap \bigcap_{k \in [0,1)} v\big( b \oplus \mathcal{H}^k_A(b) \big) \\
&= v\big( b \oplus \mathcal{H}^0_A(b) \big) \cap v\big( b \oplus \mathcal{H}^1_A(b) \big).
\end{aligned}
\]
In other words,
Corollary 7. Given a belief function b, the A-th focus of its conditional subspace
hbi is the affine subspace

\[
F_A(b) = v\big( \varsigma_B \,|\, B \subseteq \mathcal{C}_b, B \neq A \big) \tag{7.22}
\]

generated by the collection of points (7.21). It is natural to call them focal points of
the conditional subspace hbi.
Note that the coefficient of b in Equation (7.21) is non-negative, while the coefficient of b ⊕ bB is non-positive. Hence, focal points cannot be internal points of the belief space, i.e., they are not admissible belief functions. Nevertheless, they possess a very intuitive meaning in terms of mass assignment, namely:
\[
\varsigma_B = \lim_{k \to +\infty} b \oplus (1-k) b_B.
\]
Indeed,
\[
\begin{aligned}
\lim_{k \to +\infty} b \oplus (1-k) b_B &= \lim_{k \to +\infty} b \oplus \big[ k b_\Theta + (1-k) b_B \big] = \lim_{k \to +\infty} \left( \frac{k\, b}{k + (1-k)\, pl_b(B)} + \frac{(1-k)\, pl_b(B)\, b \oplus b_B}{k + (1-k)\, pl_b(B)} \right) \\
&= \frac{1}{1 - pl_b(B)}\, b - \frac{pl_b(B)}{1 - pl_b(B)}\, b \oplus b_B = \varsigma_B.
\end{aligned}
\]
The B-th focal point can then be obtained as the limit of the combination of b with the simple belief function having B as its only non-trivial focal element, when the mass of B tends towards −∞.
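This limit interpretation is also easy to test numerically. In the Python sketch below (helper names and example masses are our own), the focal point ςB of Equation (7.21) is compared with the extended combination b ⊕ [k bΘ + (1 − k) bB], computed through the affine formula above, for a large value of k.

from itertools import combinations

THETA = frozenset('xyz')
EVENTS = [frozenset(c) for r in range(1, len(THETA))
          for c in combinations(THETA, r)]          # events ∅ ⊊ C ⊊ Theta

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

def bel_vec(m):
    return [sum(v for B, v in m.items() if B and B <= C) for C in EVENTS]

def cond_vec(m, A):
    out = {}
    for B, v in m.items():
        if B & A:
            out[B & A] = out.get(B & A, 0.0) + v
    n = sum(out.values())
    return bel_vec({C: v / n for C, v in out.items()})

m_b = {frozenset('x'): 0.2, frozenset('xy'): 0.3, THETA: 0.5}
B = frozenset('yz')
plB = pl(m_b, B)
b_vec, bB_vec = bel_vec(m_b), cond_vec(m_b, B)

# Focal point (7.21).
sigma = [u / (1 - plB) + plB / (plB - 1) * v for u, v in zip(b_vec, bB_vec)]

def mix(k):
    # b ⊕ [k b_Theta + (1-k) b_B], combined via the affine (Theorem 15) formula.
    den = k + (1 - k) * plB
    return [(k * u + (1 - k) * plB * v) / den for u, v in zip(b_vec, bB_vec)]

assert max(abs(s - a) for s, a in zip(sigma, mix(1e6))) < 1e-4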
An even more interesting relationship connects the focal points of hbi to the missing points of v(b ⊕ \(\mathcal{H}^k_A(b)\)).
Theorem 23. FA(b) coincides with the missing-point subspace of each locus v(b ⊕ \(\mathcal{H}^k_A(b)\)), ∀ k ∈ [0, 1].
Proof. The collection of missing points of v(b ⊕ \(\mathcal{H}^k_A(b)\)) is determined by the limits lim_{γB→∞} b ⊕ [k bA + γB bB] for every B ⊂ Cb, B ≠ A. But then:
\[
\lim_{\gamma_B \to \infty} b \oplus [k b_A + \gamma_B b_B] = \lim_{\gamma_B \to \infty} \left[ \frac{k\, pl_b(A)\, b \oplus b_A + \gamma_B\, pl_b(B)\, b \oplus b_B + (1 - k - \gamma_B)\, b}{k\, pl_b(A) + \gamma_B\, pl_b(B) + 1 - k - \gamma_B} \right] = \varsigma_B \quad \text{for every } B \neq A.
\]

This again confirms what we have seen in the binary case, where the focus Fx (b)
turned out to be located in correspondence to the missing points of the images v(b ⊕
[kbx + by ], b ⊕ [kbx + bΘ ]) (see Figures 7.5, 7.6).

7.7.2 Algorithm

We are finally ready to formulate a geometric algorithm for Dempster’s combination


of belief functions in the general belief space. Let us then consider the orthogonal
sum b ⊕ b0 = b ⊕ b00 of a pair b, b0 of belief functions, where b00 is the projection of b0
onto C(b). As we have proved in Section 7.6.3, the second b.f. is uniquely identified
as the intersection:
\[
b'' = \bigcap_{A \subset \mathcal{C}_b} H^{\,m_{b''}(A)}_A(b).
\]
252 7 Geometry of Dempster’s rule

Now,
m (A)
\
b ⊕ b00 ∈ b ⊕ HA b00 (b)
A⊂Cb
n o
= b000 ∈ C(b) ∀A ⊂ Cb ∃b0A : mb0A (A) = mb00 (A) : b000 = b ⊕ b0A .

If, in addition, the map b ⊕ (.) is injective (i.e., if dim(hbi) = dim(C(b))), such
intersection is unique as there can be only one such b00 = b0A for all A. In other
words, ⊕ and ∩ commute and we can write:
\[
\begin{aligned}
b \oplus b'' &= \bigcap_{A \subset \mathcal{C}_b} b \oplus H^{\,m_{b''}(A)}_A(b) = \bigcap_{A \subset \mathcal{C}_b} v\big( b \oplus H^{\,m_{b''}(A)}_A(b) \big) \\
&= \bigcap_{A \subset \mathcal{C}_b} v\big( b \oplus [m_{b''}(A)\, b_A + (1 - m_{b''}(A))\, b_B] : B \subset \mathcal{C}_b, B \neq A \big)
\end{aligned}
\]
by Equation (7.20).
At this point the geometric algorithm for the orthogonal sum b ⊕ b0 is easily
outlined. We just need one last result.

Theorem 24.
\[
v\big( b \oplus \mathcal{H}^k_A(b) \big) = v\big( F_A(b),\ b \oplus k b_A \big).
\]

The proof is valid for k < 1 since when k = 1 the combination is trivial, but can be
easily modified to cope with unitary masses.

Algorithm.
1. First, all the foci {FA (b), A ⊆ Cb } of the subspace hbi conditioned by the first
belief function b are computed by calculating the corresponding focal points
(7.21);
2. then, an additional point b ⊕ m_{b''}(A) bA for each A ⊆ Cb is detected, selecting the subspace
\[
v\big( b \oplus \mathcal{H}^{\,m_{b''}(A)}_A(b) \big) = v\big( b \oplus [m_{b''}(A)\, b_A + (1 - m_{b''}(A))\, b_B] : B \subset \mathcal{C}_b, B \neq A \big);
\]
3. all these subspaces are intersected, eventually yielding the desired combination
b ⊕ b0 = b ⊕ b00 .
It is interesting to note that the focal points ςB have to be computed just once, as trivial functions of the upper probabilities plb(B) ∀B ⊆ Cb. In fact, each focus is nothing more than a particular selection of 2^{|Cb|} − 3 focal points out of a collection of 2^{|Cb|} − 2. Different groups of points are selected for each focus, with no need for further calculations.
Without discussing the computational complexity of the algorithm, we just point out that the computation of ςB involves just Bayes' conditioning (as in b|B = b ⊕ bB) rather than the more general Dempster's sum. There is hence no need for any multiplication of probability assignments.
7.8 Consistency of conditional belief functions


From our previous discussion, it is clear that each point x of the belief space has multiple interpretations in terms of belief functions. In other words, it can be thought of as the geometric representative of infinitely many conditional belief functions,
\[
x = b|\varsigma, \qquad \forall\, \varsigma : x \in \langle \varsigma \rangle.
\]
Along this line we can generalize the notion of consistent probabilities, defining a probability measure p to be consistent with a conditional b.f. x = b|ς when it dominates x according to the oblique coordinates associated with the conditional subspace hςi.
The set of probabilities consistent with a conditional b.f. b|ς then becomes:
\[
\tilde{\mathcal{P}}(b) \doteq \big\{ p : \tilde{p}(A) \geq \tilde{b}(A)\ \forall A \big\}.
\]

Theorem 25. P̃ commutes with Dempster’s rule:

P̃(b ⊕ b0 ) = b ⊕ P̃(b0 ).

7.9 Open questions


The study of the geometric properties of Dempster’s rule of combination in the
framework of the belief space has led us in this Chapter to first extend Dempster’s
rule to normalized sum functions, and later prove that orthogonal sum and affine
closure commute in the space of normalized sum functions. This also allowed us
to unveiled an interesting convex decomposition of the orthogonal sum in terms of
Bayes’ conditioning.
We investigated the geometry of conditional subspaces, proving their convex shape.
The pointwise behavior of the rule of combination has also been studied, founding
it on the notion of constant mass locus. The commutativity results have been used
to prove the existence of the foci of conditional subspace, that eventually led us to
propose a geometric construction of the orthogonal sum of two belief functions.

Appendix: proofs

Proof of Theorem 14

Let mς1, mς2 be the Moebius transforms of ς1, ς2, respectively. The application of Equation (2.6) to ς1, ς2 yields a mass that satisfies the normalization constraint. Indeed,
\[
\sum_{C \neq \emptyset} m_{\varsigma_1 \oplus \varsigma_2}(C) = \sum_{C \neq \emptyset} \frac{\displaystyle\sum_{A \cap B = C} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B)}{1 - \displaystyle\sum_{A \cap B = \emptyset} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B)} = \frac{\displaystyle\sum_{C \neq \emptyset}\sum_{A \cap B = C} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B)}{1 - \displaystyle\sum_{A \cap B = \emptyset} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B)} = \frac{\displaystyle\sum_{A \cap B \neq \emptyset} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B)}{1 - \displaystyle\sum_{A \cap B = \emptyset} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B)}.
\]
Since \(\sum_{A \subseteq \Theta} m_{\varsigma_1}(A) = 1\) and \(\sum_{A \subseteq \Theta} m_{\varsigma_2}(A) = 1\) (for ς1, ς2 are n.s.f.s) we have that \(\sum_{A \subseteq \Theta, B \subseteq \Theta} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B) = 1\). Therefore:
\[
\sum_{A,B\,:\,A \cap B \neq \emptyset} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B) = 1 - \sum_{A,B\,:\,A \cap B = \emptyset} m_{\varsigma_1}(A)\, m_{\varsigma_2}(B).
\]

Proof of Lemma 6
P
By Lemma 5 i αi ςi is combinable with ς. Hence, remembering that
X X
mPi αi ςi (B) = (−1)|B\X| αi ςi (X) =
X⊂B
X X i X
= αi (−1)|B\X| ςi (X) = αi mςi (B)
i X⊂B i

Dempster’s rule yields mς⊕Pi αi ςi (C) =


X X X
mPi αi ςi (B)mς (A) mς (A) αi mςi (B)
B∩A=C i
= X = B∩A=C
X X =
1− mPi αi ςi (B)mς (A) 1− mς (A) αi mςi (B)
B∩A=∅
X X B∩A=∅
X X i
αi mςi (B)mς (A) αi mςi (B)mς (A)
i B∩A=C i
=X X X = X h B∩A=C
X i
αi − αi mςi (B)mς (A) αi 1 − mςi (B)mς (A)
i i B∩A=∅ i B∩A=∅
P αi Ni (C) P
= i
P , since i αi = 1.
j αj ∆j

Proof of Theorem 17

By definition, ς ⊕ v(ς1 , ..., ςn ) =


nX X o n X X X o
=ς⊕ αi ςi , αi = 1 = ς ⊕ αi ςi : αi = 1, ∃ ς ⊕ αi ςi
n Xi i
X X i o i i
= ς⊕ α i ςi : αi = 1, αi ∆i 6= 0 .
i i i

by Lemma 5. If {ς1 , ..., ςn } are all combinable n.s.f.s or belief functions, Theorem
15 applies and we can write
nX αi ∆i X X o
ς ⊕ v(ς1 , ..., ςn ) = βi · ς ⊕ ς i , β i = P : αi = 1, αi ∆i 6= 0 .
i j αj ∆j i i

Since ∆i = 0 implies βi = 0 we have that:


n X αi ∆i X X o
ς ⊕v(ς1 , ..., ςn ) = P ·ς ⊕ςi : αi = 1, αi ∆i 6= 0 ,
j:∆j 6=0 αj ∆j
i:∆i 6=0 i i:∆i 6=0
P
and being i βi = 1 all these points belong to v(ς ⊕ ςi , i : ∆i 6= 0).
For the vice-versa to be true we have to prove that, given a set of non-zero nor-
malizationPfactors {∆i1 , ..., ∆im }, for any collection of real numbers {β1 , ..., βm }
m
such that j=1 βj = 1 there exists another collection {α1 , ..., αn } with n ≥ m and
Pn Pm
i=1 αi = 1, j=1 αij ∆ij 6= 0 such that

αi ∆i
βj = P m j j ∀j = 1, ..., m. (7.23)
j=1 αij ∆ij

This means that


m
βj X βj
αij = · αij ∆ij ∝
∆ij j=1 ∆ij
.
for all j = 1, ..., m. Hence if we just take
P αij = βj /∆ij for j = 1, ..., m then
system (7.23) is satisfied, being βj = βj / k βk = βj , and
m
X m
X
αij ∆ij = βj = 1 6= 0
j=1 j=1

i.e. the combinability condition is also met.


To satisfy the normalization constraint, we just need to choose the other n − m
coefficients in such a way that
X X m
X
αi = 1 − αi = 1 − αij
i:∆i =0 i:∆i 6=0 j=1

which is always possible when n > m since they do not play any role in the other
two constraints. If instead n = m (i.e. when considering only combinable func-
tions), we have no choice but to normalize the coefficients αij = αj , obtaining in
conclusion
αj βj
αj0 = Pn = Pn βi , j = 1, ..., n.
α
i=1 i ∆j i=1 ( ∆i )
n
X βi
However, this is clearly impossible iff = 0, which is equivalent to:
i=1
∆ i
256 7 Geometry of Dempster’s rule
n−1 n
X ∆i − ∆n X
βi = 1, βi = 1, (7.24)
i=1
∆i i=1

which further reduces to:


n
X ∆j − ∆ n X
βj = 1, βi = 1.
∆j
j:∆j 6=∆n i=1

A set of basis solutions of system (7.24) is therefore given by


∆j X ∆j
βj = , βi = 0 i 6= j : ∆i 6= ∆n , βi = 1 −
∆j − ∆n ∆j − ∆n
i:∆i =∆n

for every j : ∆j 6= ∆n . Each basis solution corresponds to an affine subspace of


v(ς ⊕ ς1 , ..., ς ⊕ ςn ), namely:
   X X 
∆j ∆j
ς ⊕ ςj + 1 − β̂i ς ⊕ ςi β̂i = 1
∆ j − ∆n ∆j − ∆ n
i:∆i =∆n i:∆i =∆n

for every j : ∆j 6= ∆n . The above subspace can be also expressed as:

∆j ∆n
ς ⊕ ςj − v(ς ⊕ ςi |i : ∆i = ∆n )
∆j − ∆n ∆j − ∆n
 ∆ ∆n 
j
=v ς ⊕ ςj − ς ⊕ ςi i : ∆i = ∆n .

∆j − ∆n ∆ j − ∆n

Since the general solution of system (7.24) is an arbitrary affine combination of the basis ones, the region of the points in v(ς ⊕ ς1, ..., ς ⊕ ςn) that correspond to "forbidden" coordinates {βi} s.t. Σi βi/∆i = 0 is finally:
\[
v\!\left( \frac{\Delta_j}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_j - \frac{\Delta_n}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_i \;\middle|\; \forall j : \Delta_j \neq \Delta_n,\ \forall i : \Delta_i = \Delta_n \right).
\]

Proof of Theorem 19

Let us consider the following expression:


X X X
(−1)|A| plb (A)b ⊕ bA = (−1)|A| XB [plb (A) − plb (A \ B)].
∅(A⊂Cb ∅(A⊂Cb B⊂Θ
(7.25)
Since
X X
XB [plb (∅) − plb (∅ \ B)] = XB [plb (∅) − plb (∅)] = 0,
B⊂Θ B⊂Θ

expression (7.25) becomes


X X X X
= (−1)|A| plb (A) XB − (−1)|A| XB plb (A \ B)
A⊂Cb B⊂Θ X A⊂Cb B⊂Θ
X X (7.26)
=1 (−1)|A| plb (A) − XB (−1)|A| plb (A \ B),
A⊂Cb B⊂Θ A⊂Cb

where 1 is the 2n − 2-dimensional vector whose entries are all equal to 1. Also
X X X X
(−1)|A| plb (A) = (−1)|A| (1−b(Ac )) = (−1)|A| − (−1)|A| b(Ac ),
A⊂Cb A⊂Cb A⊂Cb A⊂Cb
(7.27)
where
|Cb |  
X
|A|
X
k |Cs |−k Cb
(−1) = (−1) 1 =0
k
A⊂Cb |A|=k=0

for the Newton expression of the power (−1 + 1)|Cb | . Concerning the second adden-
dum of Equation (7.27), since
X X X
b(Ac ) = mb (B) = mb (B) = mb (B) = b(Cb \ A)
B⊂Ac ,B⊂Θ B⊂Ac ,B⊂Cb B⊂Cb \A

(for mb (B) = 0 for B 6⊂ Cb ), we have


X X X
− (−1)|A| b(Ac ) = − (−1)|A| b(Cb \ A) = − (−1)|Cb \B| b(B) =
A⊂Cb A⊂Cb B⊂Cb
.
= −mb (Cb ) by the Moebius inversion formula (2.3), having called B = Cb \ A.
As for the second part of Equation (7.26), we can note that for every X ⊂ B any
set of the form A = C + X, with C ⊂ Cb \ B yields the same difference

A \ B = C + X \ B = C.
.
Hence, if we fix C = A \ B and let X vary we get, for all B ⊂ Θ
X X X
(−1)|A| plb (A \ B) = plb (C) (−1)|C+X|
A⊂Cb C⊂Cb \B X⊂B∩Cb
|B∩Cb |  
X X |B ∩ Cb |
= plb (C) (−1)|C|+|X|
|X|
C⊂Cb \B |X|=k=0
|B∩Cb |  
X X |B ∩ Cb |
= (−1)|C| plb (C) (−1)k =0
k
C⊂Cb \B k=0

by Newton’s binomial again. In conclusion, Expression (7.25) becomes


X
(−1)|A| plb (A)b ⊕ bA = −1mb (Cb ),
A⊂Cb ,A6=∅

whose immediate consequence is Equation (7.14).


258 7 Geometry of Dempster’s rule

Proof of Theorem 21

By Equation (7.20) if k 6= 1
X X
k
v(b ⊕ HA (b)) = αB b ⊕ [kbA + (1 − k)bB ], αB = 1.
B⊂Cb ,B6=A B⊂Cb ,B6=A
(7.28)
.
Therefore Corollary 5 yields, after defining ∆kB = kplb (A) + (1 − k)plb (B),
X h kpl (A)
b (1 − k)plb (B) i
= αB b ⊕ b A + b ⊕ b B =
B⊂Cb ,B6=A
∆kB ∆kB
X αB kplb (A) X αB plb (B)(1 − k)
= b ⊕ bA k
+ b ⊕ bB
B⊂Cb ,B6=A
∆B B⊂Cb ,B6=A
∆kB
X αB kplb (A) X αB plb (B)(1 − k)
= b ⊕ bA k
+ b ⊕ bB .
B⊂C ,B6=A
∆B B⊂C ,B6=A
∆kB
b b

As the vectors {b ⊕ bB , B ⊂ Cb } are generators of hbi, we just have to find the


scalars αB such that their coefficients do not depend on k. After introducing the
0
notation αB = αB ∆kB we can write the following system of conditions:
 X
0

 kplb (A) αB ∈R

B⊂Cb ,B6=A


0
(1 −X k)plb (B)αB ∈R ∀ B ⊂ Cb , B 6= A
 0 k



 αB ∆B = 1.
B⊂Cb ,B6=A

0 xB
From the second and third equations we have that αB = 1−k ∀B ⊂ Cb , B 6= A, for
some xB ∈ R. By replacing this expression in the normalization constraint we get:
X
xB [kplb (A) + (1 − k)plb (B)]
B⊂Cb ,B6=A X X
= kplb (A) xB + (1 − k) plb (B) = 1 − k,
B⊂Cb ,B6=A B⊂Cb ,B6=A

so that the real vector [xB , B ⊂ Cb , B 6= A]0 has to be a solution of the system
 X

 xB plb (B) = 1;

B⊂CX
b ,B6=A



 xB = 0,
B⊂Cb ,B6=A

where the last equality ensures that


X
0 kplb (A) X
kplb (A) αB = xB = 0 ∈ R.
1−k
B B
After subtracting the above equations we get the system


 X

 xB (plb (B) − 1) = 1;

B⊆CXb ,B6=A
(7.29)


 xB = 0,
B⊂Cb ,B6=A

since plb (Cb ) − 1 = 0. When A 6= Cb then xCb appears in the second equation only.
The admissible solutions of the system are then all the affine combinations of the
following basis solutions
1 1
xB̄ = , xCb = −xB̄ = , xB = 0 ∀B ⊆ Cb , B 6= A, B̄.
plb (B̄) − 1 1 − plb (B̄)

Each basis solution generates the following values of the coefficients {αB }:

xB ∆k xCb ∆kCb
αB̄ = 1−k ,
B
αCb = , αB = 0 ∀B ⊆ Cb , B 6= A, B̄,
1−k
in turn associated via Equation (7.28) to the point:
α αCb  α αC
b ⊕ bA kplb (A) B̄ + + b ⊕ bB̄ (1 − k) B̄ + b(1 − k) kb
∆kB̄ ∆kCb ∆kB̄ ∆Cb
kplb (A)
= b ⊕ bA (xB̄ + xCb ) + b ⊕ bB̄ xB̄ plb (B̄) + bxCb
1−k
plb (B̄)b ⊕ bB̄ b
= + .
plb (B̄) − 1 1 − plb (B̄)
T
The affine subspace generated by all these points then belongs to k∈[0,1) v(b ⊕
k 0
HA (b)), even if it is not guaranteed to exhaust the whole FA (b).

Proof of Theorem 22
1
We first need to compute the explicit form of v(b ⊕ HA (b)). After recalling that
1
HA (b) = v(bA + γB bB : B ⊂ Cb , B 6= A)

we can notice that for any B there exists a value of γB such that the point bA +γB bB
is combinable with b, i.e., ∆B = plb (A) + γB plb (B) − γB 6= 0. Since bA , bB and
bΘ are belief functions, Theorem 15 applies and we get:

plb (A)b ⊕ bA + γB plb (B)b ⊕ bB − γB b


b ⊕ [bA + γB bB − γB bΘ ] = ,
1 · plb (A) + γB plb (B) − γB
plb (A)
so that it suffices to ensure that γB 6= 1−plb (B) . Let us then choose a suitable value
plb (A)
to simplify this expression, for instance γB = − 1−pl b (B)
(note that plb (A) 6= 0 for
A ⊆ Cb ) . We get:
260 7 Geometry of Dempster’s rule
 
1 plb (B) 1
b ⊕ bA + b ⊕ bB +b .
2 plb (B) − 1 1 − plb (B)
For B = Cb , instead, ⊕[bA + γB bB − γB bΘ ] = b ⊕ bA , so that we can write
1
v(b ⊕ HA (b)) =
   
1 plb (B) 1
= v b ⊕ bA , b ⊕ bA + b ⊕ bB +b : B ⊆ Cb , B 6= A .
2 plb (B) − 1 1 − plb (B)
On the other side:
0 0
v(b ⊕ HA (b)) = v(b ⊕ HA (b)) = v(b ⊕ bB : B ⊂ Cb , B 6= A).

A sum function ς belonging to both subspaces must then meet the following pair of
constraints: X X
ς= αB b ⊕ bB , αB = 1,
B⊂Cb ,B6=A B
0
(ς ∈ v(b ⊕ HA (b))); and

1 X h plb (B) b i
ς = βCb b ⊕ bA + βB b ⊕ bA + b ⊕ bB +
2 plb (B) − 1 1 − plb (B)
B⊆Cb ,B6=A
 X βB 
= b ⊕ bA βCb + +
2
B⊆Cb ,B6=A
b X βB X βB plb (B)
+ + b ⊕ bB
2 1 − plb (B) 2(plb (B) − 1)
B⊆Cb ,B6=A B⊆Cb ,B6=A

1
P
(ς ∈ b ⊕ HA (b)), where B⊂Cb ,B6=A βB = 1. By comparison we get:

1
 X

 βCb + βB = 0

 2
B⊆C ,B6 =A

 b

 1 plb (B)
αB = βB , B ⊆ Cb , B 6= A (7.30)
 2 pl b (B) − 1
1 βB

 X

 αCb = 2 .


1 − plb (B)

B⊆Cb ,B6=A

2αB (plb (B)−1)


Therefore, as βB = plb (B) for B 6= A, Cb , the last equation becomes:
X 1 X αB
αCb + αB =0≡ = 0.
plb (B) plb (B)
B⊆Cb ,B6=A B⊂Cb ,B6=A

0 1
In conclusion, the points of the intersection v(b ⊕ HA (b)) ∩ v(b ⊕ HA (b)) are
0
associated with affine coordinates {αB } of v(b ⊕ HA (b)) satisfying the constraints:
X X 1
αB = 1, αB = 0.
plb (B)
B⊂Cb ,B6=A B⊂Cb ,B6=A
To recover the actual shape of this subspace we just need to take their difference, to
obtain:
X plb (B) − 1 X
αB = 1, αCb = 1 − αB . (7.31)
plb (B)
B⊆Cb ,B6=A B⊆Cb ,B6=A

Note that the first constraint of Equation (7.30) can be written as


X
βB = 2
B⊆Cb ,B6=A

and is automatically satisfied by the last system’s solutions. If we choose for any
B̄ ⊆ Cb , B̄ 6= A the following basis solution of system (7.31):

plb (B̄)
αB̄ =




 plb (B̄) − 1
αB = 0 B ⊆ Cb , B 6= A, B̄

 1
 αCb =


1 − plb (B̄)
0 1
we get a set of generators ςB for v(b ⊕ HA (b)) ∩ v(b ⊕ HA (b)), with ςB given by
Equation (7.21).

Proof of Theorem 24

It suffices to show that each point


kplb (A)
b ⊕ [kbA + (1 − k)bB̄ ] = b ⊕ bA
kplb (A) + (1 − k)plb (B̄)
(1 − k)plb (B̄)
+b ⊕ bB̄ : B̄ ⊂ Cb , B̄ 6= A
kplb (A) + (1 − k)plb (B̄)
(7.32)
can be generated by the collection {b ⊕ kbA , ςB : B ⊆ Cb , B 6= A}. We have:
 
X kplb (A)b ⊕ bA (1 − k)b
βb ⊕ kbA + βB ς B = β + +
kplb (A) + (1 − k) kplb (A) + (1 − k)
B⊆Cb ,B6=A 
X 1 plb (B)
+ βB b+ b ⊕ bB =
1 − plb (B) plb (B) − 1
B⊆Cb ,B6=A
kplb (A)  (1 − k)
=β b ⊕ bA + b β+
kplb (A) + (1 − k) kplb (A) + (1 − k)
X 1  X plb (B)
+ βB + βB b ⊕ bB .
1 − plb (B) plb (B) − 1
B⊆Cb ,B6=A B⊆Cb ,B6=A

If we choose βB = 0 for B 6= B̄ we get

βkplb (A)b ⊕ bA  (1 − k)β βB̄  β plb (B̄)


+b + + B̄ b ⊕ bB̄ ,
kplb (A) + (1 − k) kplb (A) + (1 − k) 1 − plb (B̄) plb (B̄) − 1
262 7 Geometry of Dempster’s rule

so that, when

kplb (A) + (1 − k) (1 − k)(plb (B̄) − 1)


β= , βB̄ =
kplb (A) + (1 − k)plb (B) kplb (A) + (1 − k)plb (B̄)

(β + βB̄ = 1), the coefficient of b vanishes and we get the point (7.32).
8
Three equivalent models
Plausibility
\[
pl_b : 2^\Theta \to [0,1], \qquad pl_b(A) = 1 - b(A^c) = \sum_{B \cap A \neq \emptyset} m_b(B),
\]
and commonality functions
\[
Q_b : 2^\Theta \to [0,1], \qquad Q_b(A) = \sum_{B \supseteq A} m_b(B)
\]
are both equivalent representations of the evidence carried by a belief function. It is


therefore natural to wonder whether they share with belief functions the combinato-
rial form of sum function on the power set 2Θ .
In this Chapter we show that we can indeed represent the same evidence in terms
of a basic plausibility (commonality) assignment on the power set, and compute the
related plausibility (commonality) set function by integrating the basic assignment
over similar intervals. Proving that both plausibility and commonality functions
share with belief functions the structure of sum function amounts to introducing
two alternative combinatorial formulations of the theory of evidence.
Besides providing the overall mathematical structure of the theory of evidence with
a rather elegant symmetry, the notions of basic plausibility and commonality assign-
ments turn out to be useful in problems involving the combination of plausibility or
commonality functions. This is the case for the problem of transforming a belief
function into a probability distribution [896, 1231, 1358, 1335, 69, 197, 267], or
when computing the canonical decomposition of support functions [1265, 779].
We will see this in more detail in Part III.


The geometric approach to the theory of evidence introduced in Chapter 6 [244]


can naturally be extended to these alternative combinatorial models. Just as belief
functions can be seen as points of a simplex whose simplicial coordinates are pro-
vided by their Moebius inverse (the basic probability assignment), plausibility and
commonality functions possess a similar simplicial geometry, with their own Moe-
bius inverses playing again the role of simplicial coordinates.
The equivalence of the associated formulations of the ToE is geometrically mirrored
by the congruence of their simplices. In particular, the relation between upper and
lower probabilities (so important in subjective probability) can be geometrically ex-
pressed as a simple rigid transformation.

Chapter outline

First we introduce the notions of basic plausibility (Section 8.1) and commonal-
ity (8.2) assignments as the Moebius transforms of plausibility and commonality
functions, respectively.
Later we show that the geometric approach to uncertainty can be extended to
plausibility (Section 8.3) and commonality (Section 8.4) functions, in such a way
that the simplicial structure of the related spaces can be recovered as a function of
their Moebius transforms. We then show (Section 8.5) that the equivalence of the
proposed alternative formulations of the ToE is reflected by the congruence of the
corresponding simplices in the geometric framework. The point-wise geometry of
the triplet (b, plb , Qb ) in terms of the rigid transformation mapping them onto each
other, as a geometric nexus between the proposed models, is discussed in Section
8.6.
We summarise and comment these results in Section 8.7.

8.1 Basic plausibility assignment


Belief functions encode evidence by cumulating basic probabilities on intervals of events {B ⊆ A}, yielding a collection of belief values b(A) = Σ_{B⊆A} m(B).
Let us define the Moebius inverse µb : 2^Θ → R of a plausibility function plb:
\[
\mu_b(A) \doteq \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, pl_b(B), \tag{8.1}
\]
so that
\[
pl_b(A) = \sum_{B \subseteq A} \mu_b(B). \tag{8.2}
\]

It is natural to call the function µb : 2Θ → R defined by expression (8.1) the basic plausibility assignment (b.pl.a.). Plausibility functions (pl.f.s) are then sum functions on 2Θ of the form (8.2), whose Moebius inverse is the b.pl.a. (8.1). Basic probabilities and plausibilities are
obviously related.
Theorem 26. Given a belief function b with basic probability assignment mb , the
corresponding basic plausibility assignment can be expressed in terms of mb as
follows:  X
 (−1)|A|+1 mb (C) A 6= ∅
µb (A) = C⊇A (8.3)
0 A = ∅.

As b.p.a.s do, basic plausibility assignments meet the normalization constraint.


In other words, pl.f.s are normalized sum functions [19]:
\[
\sum_{A \subseteq \Theta} \mu_b(A) = -\sum_{\emptyset \subsetneq A \subseteq \Theta} (-1)^{|A|} \sum_{C \supseteq A} m_b(C) = -\sum_{C \subseteq \Theta} m_b(C) \sum_{\emptyset \subsetneq A \subseteq C} (-1)^{|A|} = 1,
\]
since
\[
-\sum_{\emptyset \subsetneq A \subseteq C} (-1)^{|A|} = -\big( 0 - (-1)^0 \big) = 1
\]
by Newton's binomial theorem:
\[
\sum_{k=0}^n \binom{n}{k} p^k q^{n-k} = (p + q)^n. \tag{8.4}
\]

However, unlike its counterpart mb , µb is not guaranteed to be non-negative.

8.1.1 Example of basic plausibility assignment

Let us consider as an example a belief function b on a binary frame Θ2 = {x, y} with b.p.a.:
\[
m_b(x) = \frac{1}{3}, \qquad m_b(\Theta) = \frac{2}{3}.
\]
Using Equation (8.3) we can compute its basic plausibility assignment as follows:
\[
\begin{aligned}
\mu_b(x) &= (-1)^{|\{x\}|+1} \sum_{C \supseteq \{x\}} m_b(C) = (-1)^2 \big( m_b(x) + m_b(\Theta) \big) = 1, \\
\mu_b(y) &= (-1)^{|\{y\}|+1} \sum_{C \supseteq \{y\}} m_b(C) = (-1)^2\, m_b(\Theta) = 2/3, \\
\mu_b(\Theta) &= (-1)^{|\Theta|+1} \sum_{C \supseteq \Theta} m_b(C) = (-1)\, m_b(\Theta) = -2/3 < 0.
\end{aligned}
\]

This confirms that b.pl.a.s meet the normalization constraint but not the non-
negativity one.
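The computation above is easily automated. The Python sketch below (all helper names are assumptions of ours) recovers the b.pl.a. of this example as the Moebius inverse (8.1) of plb, checks it against the closed form (8.3), and verifies the normalization property.

from itertools import combinations

THETA = frozenset('xy')

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

def mobius(f):
    # Moebius inverse of a set function f defined on every subset of THETA.
    return {A: sum((-1) ** len(A - B) * f[B] for B in subsets(THETA) if B <= A)
            for A in subsets(THETA)}

m_b = {frozenset('x'): 1 / 3, THETA: 2 / 3}
mu = mobius({A: pl(m_b, A) for A in subsets(THETA)})

for A in subsets(THETA):
    if A:   # closed form (8.3)
        closed = (-1) ** (len(A) + 1) * sum(v for C, v in m_b.items() if C >= A)
        assert abs(mu[A] - closed) < 1e-12
assert abs(sum(mu.values()) - 1.0) < 1e-12              # normalization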

8.1.2 Relation between basic probability and plausibility assignments

Basic probability and plausibility assignments are linked by a rather elegant relation.

Theorem 27. Given a belief function b : 2Θ → [0, 1], for each element x ∈ Θ of the frame of discernment the sum of the basic plausibility assignments of all the events containing x equals its basic probability assignment:
\[
\sum_{A \supseteq \{x\}} \mu_b(A) = m_b(x). \tag{8.5}
\]
Proof.
\[
\sum_{A \supseteq \{x\}} \mu_b(A) = \sum_{A \supseteq \{x\}} (-1)^{|A|+1} \sum_{B \supseteq A} m_b(B) = -\sum_{B \supseteq \{x\}} m_b(B) \sum_{\{x\} \subseteq A \subseteq B} (-1)^{|A|},
\]
where, by Newton's binomial \(\sum_{k=0}^n \binom{n}{k} 1^{n-k}(-1)^k = 0\),
\[
\sum_{\{x\} \subseteq A \subseteq B} (-1)^{|A|} = \begin{cases} 0 & B \neq \{x\}, \\ -1 & B = \{x\}, \end{cases}
\]
so that the sum reduces to mb(x).
8.2 Basic commonality assignment


It is straightforward to prove that commonality functions are also sum functions
and sport some interesting similarities with plausibility functions. Let us define the
Moebius inverse qb : 2Θ → R, B 7→ qb (B) of a commonality function Qb as:
\[
q_b(B) \doteq \sum_{\emptyset \subseteq A \subseteq B} (-1)^{|B \setminus A|}\, Q_b(A). \tag{8.6}
\]
It is natural to call the quantity (8.6) the basic commonality assignment (or b.comm.a.) associated with a belief function b. To arrive at its explicit form we just need to replace the definition of Qb(A) into (8.6). We obtain:
\[
\begin{aligned}
q_b(B) &= \sum_{\emptyset \subseteq A \subseteq B} (-1)^{|B \setminus A|} \Big( \sum_{C \supseteq A} m_b(C) \Big) = \sum_{\emptyset \subsetneq A \subseteq B} (-1)^{|B \setminus A|} \Big( \sum_{C \supseteq A} m_b(C) \Big) + (-1)^{|B| - |\emptyset|} \sum_{C \supseteq \emptyset} m_b(C) \\
&= \sum_{B \cap C \neq \emptyset} m_b(C) \sum_{\emptyset \subsetneq A \subseteq B \cap C} (-1)^{|B \setminus A|} + (-1)^{|B|}.
\end{aligned}
\]
But now, since B \ A = (B \ C) + (B ∩ C \ A), we have that:
\[
\sum_{\emptyset \subsetneq A \subseteq B \cap C} (-1)^{|B \setminus A|} = (-1)^{|B \setminus C|} \sum_{\emptyset \subsetneq A \subseteq B \cap C} (-1)^{|B \cap C| - |A|} = (-1)^{|B \setminus C|} \Big[ (1-1)^{|B \cap C|} - (-1)^{|B \cap C| - |\emptyset|} \Big] = (-1)^{|B|+1}.
\]
Therefore, the b.comm.a. qb(B) can be expressed as:
\[
q_b(B) = (-1)^{|B|+1} \sum_{B \cap C \neq \emptyset} m_b(C) + (-1)^{|B|} = (-1)^{|B|} \Big( 1 - \sum_{B \cap C \neq \emptyset} m_b(C) \Big) = (-1)^{|B|} \big( 1 - pl_b(B) \big) = (-1)^{|B|}\, b(B^c) \tag{8.7}
\]
(note that qb(∅) = (−1)^{|∅|} b(Θ) = 1).

8.2.1 Properties of basic commonality assignments

Basic commonality assignments do not meet the normalization axiom, as
\[
\sum_{\emptyset \subseteq B \subseteq \Theta} q_b(B) = Q_b(\Theta) = m_b(\Theta).
\]
In other words, whereas belief functions are normalized sum functions (n.s.f.s) with non-negative Moebius inverse, and plausibility functions are normalized sum functions, commonality functions are combinatorially unnormalized sum functions. Going back to the example of Section 8.1.1, the b.comm.a. associated with mb(x) = 1/3, mb(Θ) = 2/3 is (by Equation (8.7))
\[
\begin{array}{ll}
q_b(\emptyset) = (-1)^{|\emptyset|}\, b(\Theta) = 1, & q_b(x) = (-1)^{|\{x\}|}\, b(y) = -m_b(y) = 0, \\
q_b(\Theta) = (-1)^{|\Theta|}\, b(\emptyset) = 0, & q_b(y) = (-1)^{|\{y\}|}\, b(x) = -m_b(x) = -1/3,
\end{array}
\]
so that
\[
\sum_{\emptyset \subseteq B \subseteq \Theta} q_b(B) = 1 - 1/3 = 2/3 = m_b(\Theta) = Q_b(\Theta).
\]
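A companion Python sketch (again with assumed helper names) verifies on the same example both the closed form (8.7) of the basic commonality assignment and the fact that it sums to Qb(Θ) = mb(Θ) rather than to 1.

from itertools import combinations

THETA = frozenset('xy')

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def Q(m, A):
    # Commonality: total mass of the supersets of A.
    return sum(v for B, v in m.items() if B >= A)

def bel(m, A):
    return sum(v for B, v in m.items() if B and B <= A)

m_b = {frozenset('x'): 1 / 3, THETA: 2 / 3}

# Moebius inverse (8.6) of Q_b ...
q = {B: sum((-1) ** len(B - A) * Q(m_b, A) for A in subsets(THETA) if A <= B)
     for B in subsets(THETA)}

# ... matches the closed form (8.7), q_b(B) = (-1)^{|B|} b(B^c),
for B in subsets(THETA):
    assert abs(q[B] - (-1) ** len(B) * bel(m_b, THETA - B)) < 1e-12

# and sums to Q_b(Theta) = m_b(Theta), not to 1.
assert abs(sum(q.values()) - m_b[THETA]) < 1e-12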

8.3 The geometry of plausibility functions


We have seen that the theory of evidence can be given alternative formulations in
terms of plausibility and commonality assignments. As a consequence, plausibil-
ity and commonality functions (just like belief functions [244, 267, 225]) can be
given simple but elegant geometric descriptions in terms of generalized triangles or
simplices.
In the practical use of the theory of evidence, people often consider unnormalized belief functions (u.b.f.s) [1238], i.e., belief functions b(A) = Σ_{∅≠B⊆A} mb(B) admitting non-zero support mb(∅) ≠ 0 for the empty set ∅. The latter is an indicator of the amount of internal conflict of the evidence carried
by a belief function, or of the possibility that the current frame of discernment does
not exhaust all the possible outcomes of the problem.
Unnormalized belief functions are naturally associated with vectors with N = 2|Θ|
coordinates, as b(∅) cannot be neglected anymore. We can then extend the set of
categorical belief functions as follows

\[
\{ b_A \in \mathbb{R}^N,\ \emptyset \subseteq A \subseteq \Theta \},
\]
this time including a new vector b∅ ≐ [1 0 · · · 0]′. Note also that in this case bΘ = [0 · · · 0 1]′. The space of unnormalized b.f.s is again a simplex in R^N, namely B^U = Cl(bA, ∅ ⊆ A ⊆ Θ).
Indeed, as is the case for belief functions, plausibility functions are completely specified by their N − 2 plausibility values {plb(A), ∅ ⊊ A ⊊ Θ} and can also be represented as vectors of R^{N−2}. We can therefore associate a pair of belief b and plausibility plb functions with the following vectors, which we still denote by b and plb:
\[
b = \sum_{\emptyset \subsetneq A \subsetneq \Theta} b(A)\, x_A, \qquad pl_b = \sum_{\emptyset \subsetneq A \subsetneq \Theta} pl_b(A)\, x_A, \tag{8.8}
\]

where {xA : ∅ ( A ( Θ} is, as usual, a reference frame in the Cartesian space


RN −2 (see Chapter 6, Section 6.1).

8.3.1 Plausibility assignment and simplicial coordinates


Now, as the categorical belief functions {bA : ∅ ( A ( Θ} also form a set of
independent vectors in RN −2 , the collections {xA } and {bA } represent two distinct
coordinate frames in the same Cartesian space. To understand where a plausibility
vector is located in the categorical reference frame {bA , ∅ ( A ( Θ} we need to
compute the coordinate change between such frames.
Lemma 8. The coordinate change between the two coordinate frames {xA : ∅ (
A ( Θ} and {bA : ∅ ( A ( Θ} is given by
\[
x_A = \sum_{B \supseteq A} (-1)^{|B \setminus A|}\, b_B. \tag{8.9}
\]

We can use Lemma 8 to find the coordinates of a plausibility function in the


categorical reference frame, by putting the corresponding vector plb (8.8) in the
form of Equation (6.8).
By replacing expression (8.9) for xA into Equation (8.8) we get:
\[
pl_b = \sum_{\emptyset \subsetneq A \subsetneq \Theta} pl_b(A)\, x_A = \sum_{\emptyset \subsetneq A \subsetneq \Theta} pl_b(A) \sum_{B \supseteq A} (-1)^{|B \setminus A|}\, b_B = \sum_{\emptyset \subsetneq B \subsetneq \Theta} b_B \sum_{A \subseteq B} (-1)^{|B \setminus A|}\, pl_b(A) = \sum_{\emptyset \subsetneq A \subsetneq \Theta} \mu_b(A)\, b_A, \tag{8.10}
\]

where we used the definition (8.1) of basic plausibility assignment and we inverted
the role of A and B for sake of homogeneity of the notation.
Incidentally, as bΘ = [0, · · · , 0]0 = 0 is the origin of RN −2 , we can also write:
\[
pl_b = \sum_{\emptyset \subsetneq A \subseteq \Theta} \mu_b(A)\, b_A
\]

(including Θ). Analogously to what happens in Equation (6.8), the coordinates of


plb in the categorical reference frame are given by the values of its Moebius inverse,
the basic plausibility assignment.
8.3.2 Plausibility space

Let us call plausibility space the region PL of RN −2 whose points correspond to


admissible plausibility functions.

Theorem 28. The plausibility space PL is a simplex PL = Cl(plA , ∅ ( A ⊆ Θ)


whose vertices can be expressed in terms of the categorical belief functions (the
vertices of the belief space) as:
\[
pl_A = -\sum_{\emptyset \subsetneq B \subseteq A} (-1)^{|B|}\, b_B. \tag{8.11}
\]
Note that
\[
pl_x = -(-1)^{|\{x\}|}\, b_x = b_x \qquad \forall x \in \Theta,
\]
so that: B ∩ PL ⊃ P.
The vertices of the plausibility space have a natural interpretation.
Theorem 29. The vertex plA of the plausibility space is the plausibility vector as-
sociated with the categorical belief function bA : plA = plbA .
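A short Python sketch (helper names are ours) confirms Theorems 28-29 numerically on a ternary frame: the signed sum of categorical belief functions in (8.11) reproduces, component by component, the plausibility vector of bA.

from itertools import combinations

THETA = frozenset('xyz')
EVENTS = [frozenset(c) for r in range(1, len(THETA))
          for c in combinations(THETA, r)]          # events ∅ ⊊ C ⊊ Theta

def b_cat(A):
    # Categorical belief function b_A as a vector: b_A(C) = 1 iff A ⊆ C.
    return [1.0 if A <= C else 0.0 for C in EVENTS]

def pl_cat(A):
    # Plausibility vector of b_A: pl(C) = 1 iff A ∩ C ≠ ∅.
    return [1.0 if A & C else 0.0 for C in EVENTS]

def nonempty_subsets(A):
    A = list(A)
    return [frozenset(c) for r in range(1, len(A) + 1) for c in combinations(A, r)]

for A in nonempty_subsets(THETA):
    rhs = [0.0] * len(EVENTS)
    for B in nonempty_subsets(A):                    # Equation (8.11)
        for i, v in enumerate(b_cat(B)):
            rhs[i] -= (-1) ** len(B) * v
    assert all(abs(r - p) < 1e-12 for r, p in zip(rhs, pl_cat(A)))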
When considering the case of unnormalized belief functions, whose role is so
important in the Transferable Belief Model, it is easy to see that Theorems 26 and
29 fully retain their validity. In the case of Theorem 28, however, as in general
mb (∅) 6= 0, we need to modify Equation (8.18) by adding a term related to the
empty set. This yields:
\[
pl_b = \sum_{\emptyset \subsetneq C \subseteq \Theta} m_b(C)\, pl_C + m_b(\emptyset)\, pl_\emptyset,
\]

where plC , C 6= ∅ is still given by Equation (8.11), and pl∅ = 0 is the origin of
RN . Note that even in the case of unnormalized belief functions (Equation (8.11))
the empty set is not considered, for µ(∅) = 0.

8.3.3 Running example: the binary case

Figure 8.1 shows the geometry of belief and plausibility spaces in the familiar case
study of a binary frame Θ2 = {x, y}, where belief and plausibility vectors are points
of a plane R2 with coordinates

b = [b(x) = mb (x), b(y) = mb (y)]0


plb = [plb (x) = 1 − mb (y), plb (y) = 1 − mb (x)]0 ,

respectively. They form two simplices (in this special case, two triangles)

B = Cl(bΘ = [0, 0]0 = 0, bx , by ),


PL = Cl(plΘ = [1, 1]0 = 1, plx = bx , ply = by )
Fig. 8.1. Geometry of belief and plausibility spaces in the binary case. Belief B and plau-
sibility PL spaces are congruent and lie in symmetric locations with respect to the axis of
symmetry formed by the probability simplex P.

which are symmetric with respect to the probability simplex P (in this case a seg-
ment) and congruent, so that they can be moved onto each other by means of a rigid
transformation. In this simple case such transformation is just a reflection through
the Bayesian segment P.
From Figure 8.1 it is clear that each pair of belief/plausibility functions (b, plb) determines a line a(b, plb) which is orthogonal to P, on which they lie in symmetric positions on the two sides of the Bayesian segment.

8.4 The geometry of commonality functions


In the case of commonality functions, as
\[
Q_b(\emptyset) = \sum_{A \supseteq \emptyset} m_b(A) = \sum_{A \subseteq \Theta} m_b(A) = 1, \qquad Q_b(\Theta) = \sum_{A \supseteq \Theta} m_b(A) = m_b(\Theta),
\]
each comm.f. Qb needs 2^{|Θ|} = N coordinates to be represented. The geometric counterpart of a commonality function is therefore the following vector of R^N:
\[
Q_b = \sum_{\emptyset \subseteq A \subseteq \Theta} Q_b(A)\, x_A,
\]

where {xA : ∅ ⊆ A ⊆ Θ} is the extended reference frame introduced in the case of


unnormalized belief functions (A = Θ, ∅ this time included).
Just as before we can use Lemma 8 to change the reference frame and get the
coordinates of Qb with respect to the base {bA , ∅ ⊆ A ⊆ Θ} formed by all the
categorical u.b.f.s. We get:
\[
Q_b = \sum_{\emptyset \subseteq A \subseteq \Theta} Q_b(A) \sum_{B \supseteq A} (-1)^{|B \setminus A|}\, b_B = \sum_{\emptyset \subseteq B \subseteq \Theta} b_B \sum_{A \subseteq B} (-1)^{|B \setminus A|}\, Q_b(A) = \sum_{\emptyset \subseteq B \subseteq \Theta} q_b(B)\, b_B,
\]

where qb is the basic commonality assignment (8.6).


Once again, we can use the explicit form (8.7) of a basic commonality assign-
ment to recover the shape of the space Q ⊂ RN of all the commonality functions.
We obtain:
\[
Q_b = \sum_{\emptyset \subseteq B \subseteq \Theta} (-1)^{|B|}\, b_B \Big( \sum_{\emptyset \subseteq A \subseteq B^c} m_b(A) \Big) = \sum_{\emptyset \subseteq A \subseteq \Theta} m_b(A) \sum_{\emptyset \subseteq B \subseteq A^c} (-1)^{|B|}\, b_B = \sum_{\emptyset \subseteq A \subseteq \Theta} m_b(A)\, Q_A,
\]
where
\[
Q_A \doteq \sum_{\emptyset \subseteq B \subseteq A^c} (-1)^{|B|}\, b_B \tag{8.12}
\]

is the A-th vertex of the commonality space. The latter is hence given by:

Q = Cl(QA , ∅ ⊆ A ⊆ Θ).

Again, QA is the commonality function associated with the categorical belief func-
tion bA, i.e.:
\[
Q_{b_A} = \sum_{\emptyset \subseteq B \subseteq \Theta} q_{b_A}(B)\, b_B.
\]

Indeed qbA (B) = (−1)|B| if B c ⊇ A (i.e., B ⊆ Ac ), while qbA (B) = 0 otherwise,


so that the two quantities coincide:
\[
Q_{b_A} = \sum_{\emptyset \subseteq B \subseteq A^c} (-1)^{|B|}\, b_B = Q_A.
\]

8.4.1 Running example: the binary case

In the binary case the commonality space Q2 needs N = 22 = 4 coordinates to


be represented. Each commonality vector Qb = [Qb (∅), Qb (x), Qb (y), Qb (Θ)]0 is
such that:
Fig. 8.2. Commonality space in the binary case.

\[
Q_b(\emptyset) = 1, \qquad Q_b(x) = \sum_{A \supseteq \{x\}} m_b(A) = pl_b(x), \qquad Q_b(\Theta) = m_b(\Theta), \qquad Q_b(y) = \sum_{A \supseteq \{y\}} m_b(A) = pl_b(y).
\]

The commonality space Q2 can then be drawn (if we neglect the coordinate Qb (∅)
which is constant ∀b) as in Figure 8.2.
The vertices of Q_2 are, according to Equation (8.12):

Q_∅ = Σ_{∅ ⊆ B ⊆ Θ} (−1)^{|B|} b_B = b_∅ + b_Θ − b_x − b_y
    = [1, 1, 1, 1]' + [0, 0, 0, 1]' − [0, 1, 0, 1]' − [0, 0, 1, 1]' = [1, 0, 0, 0]' = Q_{b_∅},

Q_x = Σ_{∅ ⊆ B ⊆ {y}} (−1)^{|B|} b_B = b_∅ − b_y = [1, 1, 1, 1]' − [0, 0, 1, 1]' = [1, 1, 0, 0]' = Q_{b_x},

Q_y = Σ_{∅ ⊆ B ⊆ {x}} (−1)^{|B|} b_B = b_∅ − b_x = [1, 1, 1, 1]' − [0, 1, 0, 1]' = [1, 0, 1, 0]' = Q_{b_y}.
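As a quick numerical cross-check of the identities above, the following short Python sketch (with hypothetical mass values; illustrative code only) computes the commonality values on the binary frame directly from a mass assignment and verifies that Q_b(∅) = 1, Q_b(x) = pl_b(x), Q_b(y) = pl_b(y) and Q_b(Θ) = m_b(Θ).

theta = frozenset({'x', 'y'})
m = {frozenset({'x'}): 0.3, frozenset({'y'}): 0.2, theta: 0.5}   # illustrative masses

def Q(A):   # commonality: total mass of the supersets of A
    return sum(v for B, v in m.items() if A <= B)

def pl(A):  # plausibility: total mass of the sets intersecting A
    return sum(v for B, v in m.items() if A & B)

assert abs(Q(frozenset()) - 1.0) < 1e-12
assert abs(Q(frozenset({'x'})) - pl(frozenset({'x'}))) < 1e-12
assert abs(Q(frozenset({'y'})) - pl(frozenset({'y'}))) < 1e-12
assert abs(Q(theta) - m[theta]) < 1e-12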

8.5 Equivalence and congruence


Summarising, both plausibility and commonality functions can be thought of as sum functions on the partially ordered set 2^Θ (although, whereas belief and plausibility functions are normalized sum functions, commonality functions are not). This in turn allows us to describe them as points of simplices B, PL and Q in a Cartesian space.
In fact, it turns out that the equivalence of these alternative models of the ToE is geometrically mirrored by the congruence of the associated simplices.

8.5.1 Congruence of belief and plausibility spaces

We have seen that in the case of a binary frame of discernment, B and PL are con-
gruent, i.e. they can be superposed by means of a rigid transformation (see Section
8.3.3). Indeed the congruence of belief, plausibility and commonality spaces is a
general property.
Theorem 30. The corresponding 1-dimensional faces Cl(b_A, b_B), Cl(pl_A, pl_B) of the belief and plausibility spaces are congruent, namely

||pl_B − pl_A||_p = ||b_A − b_B||_p,

where ||·||_p denotes the classical L_p norm ||v||_p := ( Σ_{i=1}^N |v_i|^p )^{1/p}, p = 1, 2, ..., +∞.

Proof. This is a direct consequence of the definition of a plausibility function. Let us denote by C, D two generic subsets of Θ. As pl_A(C) = 1 − b_A(C^c) we have that b_A(C^c) = 1 − pl_A(C), which in turn implies:

b_A(C^c) − b_B(C^c) = 1 − pl_A(C) − 1 + pl_B(C) = pl_B(C) − pl_A(C).

Therefore, for all p:

Σ_{C ⊆ Θ} |pl_B(C) − pl_A(C)|^p = Σ_{C ⊆ Θ} |b_A(C^c) − b_B(C^c)|^p = Σ_{D ⊆ Θ} |b_A(D) − b_B(D)|^p.

Notice that the proof of Theorem 30 holds no matter whether the pair (∅, ∅c =
Θ) is considered or not (i.e., it is immaterial whether classical or unnormalised
belief functions are considered).
A straightforward consequence is the following.

Corollary 8. B and PL are congruent; B^U and PL^U are congruent.

This follows since their corresponding 1-dimensional faces have the same length, by the generalization of a well-known theorem of Euclid's which states that triangles whose sides have the same length are congruent. It is worth noticing that, although this holds for simplices (generalized triangles), the same is not true for polytopes in general, i.e., convex closures of a number of vertices greater than n + 1, where n is the dimension of the Cartesian space in which they are defined (think, for instance, of a square and a rhombus, both with sides of length 1).
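Theorem 30 lends itself to a direct numerical verification. The sketch below (illustrative code only, on a ternary frame) builds the categorical belief and plausibility vectors over the coordinates {x_C : C ⊆ Θ} and checks that corresponding 1-dimensional faces have the same L_1 and L_2 lengths.

from itertools import combinations

theta = ('x', 'y', 'z')
subsets = [frozenset(c) for r in range(len(theta) + 1) for c in combinations(theta, r)]
events  = [A for A in subsets if A]                  # non-empty subsets A, B

def b_vec(A):   return [1.0 if A <= C else 0.0 for C in subsets]   # categorical b_A
def pl_vec(A):  return [1.0 if A & C else 0.0 for C in subsets]    # categorical pl_A

def norm(v, p): return sum(abs(x) ** p for x in v) ** (1.0 / p)

for A, B in combinations(events, 2):
    for p in (1, 2):
        d_b  = [u - w for u, w in zip(b_vec(A), b_vec(B))]
        d_pl = [u - w for u, w in zip(pl_vec(B), pl_vec(A))]
        assert abs(norm(d_b, p) - norm(d_pl, p)) < 1e-12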

8.5.2 Running example: the binary case

In the case of unnormalized belief functions, the belief, plausibility and commonality spaces all have N = 2^{|Θ|} vertices and dimension N − 1:

B^U = Cl(b_A, ∅ ⊆ A ⊆ Θ),    PL^U = Cl(pl_A, ∅ ⊆ A ⊆ Θ),    Q^U = Cl(Q_A, ∅ ⊆ A ⊆ Θ).

For a frame Θ2 = {x, y} of cardinality 2 they form three-dimensional simplices


embedded in a four-dimensional Cartesian space:
B = Cl(b∅ = [1, 1, 1, 1]0 , bx = [0, 1, 0, 1]0 , by = [0, 0, 1, 1]0 , bΘ = [0, 0, 0, 1]0 );
PL = Cl(pl∅ = [0, 0, 0, 0]0 , plx = [0, 1, 0, 1]0 , ply = [0, 0, 1, 1]0 , plΘ = [0, 1, 1, 1]0 );
Q = Cl(Q∅ = [1, 0, 0, 0]0 , Qx = [1, 1, 0, 0]0 , Qy = [1, 0, 1, 0]0 , QΘ = [1, 1, 1, 1]0 ).
(8.13)
We know from Section 8.3.3 that PL_2 and B_2 are congruent. By Equation (8.13) it follows that:

||b_∅ − b_x||_2 = ||[1, 0, 1, 0]'||_2 = √2 = ||[0, 1, 0, 1]'||_2 = ||pl_x − pl_∅||_2,
||b_y − b_Θ||_2 = ||[0, 0, 1, 0]'||_2 = 1 = ||[0, 1, 0, 0]'||_2 = ||pl_Θ − pl_y||_2,

et cetera; and since B_2^U and PL_2^U are simplices, they are also congruent.

8.5.3 Congruence of plausibility and commonality spaces

A similar result holds for plausibility and commonality spaces.


We first need to point out the relationship between the vertices of the plausibility and commonality spaces in the unnormalized case. As

pl_A = − Σ_{∅ ⊊ B ⊆ A} (−1)^{|B|} b_B,

we have:

Q_A = Σ_{∅ ⊆ B ⊆ A^c} (−1)^{|B|} b_B = Σ_{∅ ⊊ B ⊆ A^c} (−1)^{|B|} b_B + b_∅ = −pl_{A^c} + b_∅.        (8.14)

Theorem 31. The 1-dimensional faces Cl(Q_B, Q_A) and Cl(pl_{B^c}, pl_{A^c}) of the commonality and the plausibility space, respectively, are congruent. Namely:

||Q_B − Q_A||_p = ||pl_{B^c} − pl_{A^c}||_p.

Proof. Since Q_A = b_∅ − pl_{A^c}, we have

Q_A − Q_B = b_∅ − pl_{A^c} − b_∅ + pl_{B^c} = pl_{B^c} − pl_{A^c},

and the two faces are trivially congruent.

Therefore the map between the vertices of PL^U and Q^U

Q_A ↦ pl_{A^c}        (8.15)

maps 1-dimensional faces of the commonality space to congruent faces of the plausibility space, Cl(Q_A, Q_B) ↦ Cl(pl_{A^c}, pl_{B^c}), so that the two simplices are congruent.
Note that (8.15) acts as a one-to-one correspondence between unnormalized categorical commonality and plausibility functions (as the complement of ∅ is Θ, so that Q_Θ ↦ pl_∅). Therefore we can only claim that:

Corollary 9. Q^U and PL^U are congruent.

Of course, by virtue of Corollary 8, we also have that:

Corollary 10. Q^U and B^U are congruent.

8.5.4 Running example: congruence of Q2 and PL2

Let us get back to the binary example Θ_2 = {x, y}. It is easy to see from Figures 8.1 and 8.2 that PL_2 and Q_2 are not congruent in the case of normalized belief functions, as Q_2 is an equilateral triangle with sides of length √2, while PL_2 has two sides of length 1.
In the unnormalized case, instead, recalling Equation (8.13) we have:

Q_Θ − Q_∅ = [0, 1, 1, 1]',    pl_Θ − pl_∅ = [0, 1, 1, 1]',
Q_x − Q_y = [0, 1, −1, 0]',    pl_x − pl_y = [0, 1, −1, 0]',        (8.16)
Q_x − Q_Θ = [0, 0, −1, −1]',    pl_∅ − pl_y = [0, 0, −1, −1]',

et cetera, confirming that Q_2^U and PL_2^U are indeed congruent.
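Equation (8.14) and the congruence map (8.15) can also be checked directly on the coordinates of Equation (8.13). The following sketch (illustrative code; coordinates ordered as (∅, {x}, {y}, Θ)) verifies that Q_A = b_∅ − pl_{A^c} and that Q_A − Q_B = pl_{B^c} − pl_{A^c} for every pair of vertices.

from itertools import combinations

b  = {'0': (1, 1, 1, 1), 'x': (0, 1, 0, 1), 'y': (0, 0, 1, 1), 'T': (0, 0, 0, 1)}
pl = {'0': (0, 0, 0, 0), 'x': (0, 1, 0, 1), 'y': (0, 0, 1, 1), 'T': (0, 1, 1, 1)}
Q  = {'0': (1, 0, 0, 0), 'x': (1, 1, 0, 0), 'y': (1, 0, 1, 0), 'T': (1, 1, 1, 1)}
comp = {'0': 'T', 'x': 'y', 'y': 'x', 'T': '0'}    # set-theoretic complement

def sub(u, v):  return tuple(a - c for a, c in zip(u, v))

for A in Q:                                        # Equation (8.14)
    assert Q[A] == sub(b['0'], pl[comp[A]])
for A, B in combinations(Q, 2):                    # Theorem 31
    assert sub(Q[A], Q[B]) == sub(pl[comp[B]], pl[comp[A]])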

8.6 Point-wise rigid transformation


Belief, plausibility and commonality functions form simplices which can be moved
onto each other by means of a rigid transformation, as a reflection of the equivalence
of the associated models.
Let us also analyse the geometric behavior of single functions, i.e., of each triplet
of related non-additive measures (b, plb , Qb ). In the binary case (Section 8.3.3) the
point-wise geometry of a plausibility vector can be described in terms of a reflection
with respect to the probability simplex P. In the general case, as the simplices B^U, PL^U and Q^U are all congruent, there must exist a Euclidean transformation τ ∈ E(N) mapping each simplex onto one of the others.

8.6.1 Belief and plausibility spaces

In the case of the belief and plausibility spaces (in the standard, normalized case) the rigid transformation is obviously encoded by Equation (3.13): pl_b(A) = 1 − b(A^c). Since pl_b = Σ_{∅ ⊊ A ⊆ Θ} pl_b(A) x_A, Equation (3.13) implies the following relation:

pl_b = 1 − b^c,

where b^c is the unique belief function whose belief values are the same as b's on the complement of each event A: b^c(A) = b(A^c).
As in the normalized case 1 = pl_Θ and 0 = b_Θ, the above relation reads as:

pl_b = 1 − b^c = 0 + 1 − b^c = b_Θ + pl_Θ − b^c.

As a consequence, the segments Cl(b_Θ, pl_Θ) and Cl(b^c, pl_b) have the same center of mass, for

(pl_b + b^c)/2 = (b_Θ + pl_Θ)/2.
In other words:
Theorem 32. The plausibility vector plb associated with a belief function b is the
reflection in RN −2 through the segment Cl(bΘ , plΘ ) = Cl(0, 1) of the “comple-
ment” belief function bc .
Geometrically, bc is obtained from b by means of another reflection (by swap-
ping the coordinates associated with the reference axes xA and xAc ), so that the
desired rigid transformation is completely determined.
Figure 8.3 illustrates the nature of the transformation, and its instantiation in the
binary case for normalized belief functions.
In the case of unnormalized belief functions (b∅ = 1, pl∅ = 0) we have

plb = pl∅ + b∅ − bc ,

i.e., plb is the reflection of bc through the segment Cl(b∅ , pl∅ ) = Cl(0, 1).
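Theorem 32 can be visualised with a short computation in the binary frame. The sketch below (hypothetical masses; illustrative code only) builds b^c by swapping the coordinates of b and checks that pl_b is its reflection through the midpoint of Cl(0, 1).

m_x, m_y = 0.6, 0.1
b      = (m_x, m_y)                    # [b(x), b(y)]
b_comp = (b[1], b[0])                  # b^c: swap the axes associated with {x} and {y}
one    = (1.0, 1.0)                    # pl_Theta = 1
zero   = (0.0, 0.0)                    # b_Theta = 0

pl_b = tuple(o - c for o, c in zip(one, b_comp))           # pl_b = 1 - b^c
assert all(abs(p - e) < 1e-12 for p, e in zip(pl_b, (1 - m_y, 1 - m_x)))

# Cl(b^c, pl_b) and Cl(0, 1) share the same center of mass
assert all(abs((c + p) / 2.0 - (z + o) / 2.0) < 1e-12
           for c, p, z, o in zip(b_comp, pl_b, zero, one))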

Fig. 8.3. The pointwise rigid transformation mapping b onto plb in the normalized case. In
the binary case the middle point of the segment Cl(0, 1) is the mean probability P.

8.6.2 Commonality and plausibility spaces

The form of the desired point-wise transformation is also quite simple in the case of
the pair (PLU , QU ). We can indeed use Equation (8.14), getting:
Q_b = Σ_{∅ ⊆ A ⊆ Θ} m_b(A) Q_A = Σ_{∅ ⊆ A ⊆ Θ} m_b(A) (b_∅ − pl_{A^c}) = b_∅ − Σ_{∅ ⊆ A ⊆ Θ} m_b(A) pl_{A^c} = b_∅ − pl_{b^{m^c}},

where b^{m^c} is the unique belief function whose b.p.a. is m_{b^{m^c}}(A) = m_b(A^c). But then, since pl_∅ = 0 = [0, · · · , 0]' for unnormalized belief functions (remember the binary example), we can rewrite the above equation as:

Q_b = pl_∅ + b_∅ − pl_{b^{m^c}}.
In conclusion:

Theorem 33. The commonality vector associated with a belief function b is the reflection in R^N through the segment Cl(pl_∅, b_∅) = Cl(0, 1) of the plausibility vector pl_{b^{m^c}} associated with the belief function b^{m^c}.

In this case, however, b^{m^c} is obtained from b by swapping the coordinates with respect to the base {b_A, ∅ ⊆ A ⊆ Θ}. A pictorial representation for the binary case (similar to Figure 8.3) is more difficult here, as R^4 is involved.
It is natural to stress the analogy between the two rigid transformations

τ_{B^U PL^U} : B^U → PL^U,    τ_{PL^U Q^U} : PL^U → Q^U,

mapping an unnormalized belief function onto the corresponding plausibility function, and an unnormalized plausibility function onto the corresponding commonality function, respectively:

τ_{B^U PL^U} :   b   --[b(A) ↦ b(A^c)]-->   b^c   --[refl. through Cl(0,1)]-->   pl_b,
τ_{PL^U Q^U} :   pl_b   --[m_b(A) ↦ m_b(A^c)]-->   b^{m^c}   --[refl. through Cl(0,1)]-->   Q_b.

They are both the composition of two reflections: a swap of the axes of the coordinate frame {x_A, A ⊂ Θ} ({b_A, A ⊂ Θ}) induced by set-theoretic complement, plus a reflection with respect to the center of the segment Cl(0, 1).
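The second transformation can be verified numerically as well. The following sketch (illustrative masses on the binary frame; coordinates ordered as (∅, {x}, {y}, Θ)) computes Q_b directly and via Q_b = pl_∅ + b_∅ − pl_{b^{m^c}}, confirming Theorem 33 in this small example.

SETS  = [frozenset(), frozenset('x'), frozenset('y'), frozenset('xy')]
theta = frozenset('xy')
m = {frozenset('x'): 0.5, frozenset('y'): 0.3, theta: 0.2}     # illustrative b.p.a.

def Q(mass):   return [sum(v for B, v in mass.items() if A <= B) for A in SETS]
def pl(mass):  return [sum(v for B, v in mass.items() if A & B) for A in SETS]

m_comp = {theta - A: v for A, v in m.items()}                  # m^c(A) = m(A^c)
b_empty, pl_empty = [1.0] * 4, [0.0] * 4                       # vectors b_emptyset, pl_emptyset

lhs = Q(m)
rhs = [pe + be - p for pe, be, p in zip(pl_empty, b_empty, pl(m_comp))]
assert all(abs(u - w) < 1e-12 for u, w in zip(lhs, rhs))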

8.7 Comments
Although they are equivalent mathematical representations of the same evidence, belief, plausibility and commonality functions form a hierarchy of sum functions whose Moebius transforms meet progressively fewer constraints: both the normalization and the positivity constraint in the case of basic probability assignments, just the normalization constraint for basic plausibility assignments, and neither of them in the case of basic commonality assignments. This is summarised in Table 8.1.
Nevertheless, all these functions possess a similar simplicial geometry, which is reflected in the congruence of the associated spaces and in the point-wise geometry of the triplet (b, pl_b, Q_b).
We will see in Part ?? that the alternative models introduced here, and in particular the notion of basic plausibility assignment, can be put to good use in the probability transformation problem.

Quantity                 Moebius transform    Combinatorial properties
belief function          b.p.a.               non-negative n.s.f.
plausibility function    b.pl.a.              n.s.f.
commonality function     b.comm.a.            sum function

Table 8.1. Combinatorial properties of the Moebius transforms of belief, plausibility and commonality functions.

Appendix: proofs
Proof of Theorem 26

The definition (3.13) of the plausibility function yields:

µ_b(A) = Σ_{B ⊆ A} (−1)^{|A\B|} pl_b(B) = Σ_{B ⊆ A} (−1)^{|A\B|} (1 − b(B^c))
       = Σ_{B ⊆ A} (−1)^{|A\B|} − Σ_{B ⊆ A} (−1)^{|A\B|} b(B^c)
       = 0 − Σ_{B ⊆ A} (−1)^{|A\B|} b(B^c) = − Σ_{B ⊆ A} (−1)^{|A\B|} b(B^c),

since by Newton's binomial theorem Σ_{B ⊆ A} (−1)^{|A\B|} = 0 if A ≠ ∅, and (−1)^{|A|} otherwise.
If B ⊆ A then B^c ⊇ A^c, so that the above expression becomes:

µ_b(A) = − Σ_{B ⊆ A} (−1)^{|A\B|} ( Σ_{C ⊆ B^c} m_b(C) )
       = − Σ_{C ⊆ Θ} m_b(C) ( Σ_{B: B ⊆ A, B^c ⊇ C} (−1)^{|A\B|} )        (8.17)
       = − Σ_{C ⊆ Θ} m_b(C) ( Σ_{B ⊆ A ∩ C^c} (−1)^{|A\B|} ),

for B^c ⊇ C, B ⊆ A is equivalent to B ⊆ C^c, B ⊆ A, i.e., to B ⊆ (A ∩ C^c).
Let us now analyse the following function of C:

f(C) := Σ_{B ⊆ A ∩ C^c} (−1)^{|A\B|}.

If A ∩ C^c = ∅ then B = ∅ and the sum equals f(C) = (−1)^{|A|}. If A ∩ C^c ≠ ∅, instead, we can write D := C^c ∩ A and obtain
f(C) = Σ_{B ⊆ D} (−1)^{|A\B|} = Σ_{B ⊆ D} (−1)^{|A\D|+|D\B|},

since B ⊆ D ⊆ A and |A| − |B| = |A| − |D| + |D| − |B|. But then

f(C) = (−1)^{|A|−|D|} Σ_{B ⊆ D} (−1)^{|D|−|B|} = 0,

given that Σ_{B ⊆ D} (−1)^{|D|−|B|} = 0 by Newton's binomial formula again.
In conclusion, f(C) = 0 if C^c ∩ A ≠ ∅, and f(C) = (−1)^{|A|} if C^c ∩ A = ∅. We can then rewrite (8.17) as:

− Σ_{C ⊆ Θ} m_b(C) f(C) = − Σ_{C: C^c ∩ A ≠ ∅} m_b(C) · 0 − Σ_{C: C^c ∩ A = ∅} m_b(C) · (−1)^{|A|}
                         = (−1)^{|A|+1} Σ_{C: C^c ∩ A = ∅} m_b(C) = (−1)^{|A|+1} Σ_{C ⊇ A} m_b(C).

Proof of Lemma 8
We first need to recall that a categorical belief function can be expressed as

b_A = Σ_{C ⊇ A} x_C.

The entry of the vector b_A associated with the event ∅ ⊊ B ⊊ Θ is by definition:

b_A(B) = 1 if B ⊇ A,    b_A(B) = 0 if B ⊉ A.

As x_C(B) = 1 iff B = C, and 0 otherwise, the corresponding entry of the vector Σ_{C ⊇ A} x_C is also equal to 1 if B ⊇ A, and to 0 if B ⊉ A.
Therefore, if (8.9) is true we have that:

b_A = Σ_{C ⊇ A} x_C = Σ_{C ⊇ A} Σ_{B ⊇ C} b_B (−1)^{|B\C|} = Σ_{B ⊇ A} b_B ( Σ_{A ⊆ C ⊆ B} (−1)^{|B\C|} ).

Let us then consider the factor

Σ_{A ⊆ C ⊆ B} (−1)^{|B\C|}.

When A = B, C = A = B and the coefficient becomes 1. On the other hand, when B ≠ A we have that:

Σ_{A ⊆ C ⊆ B} (−1)^{|B\C|} = Σ_{D ⊆ B\A} (−1)^{|D|} = 0

by Newton's binomial formula ( Σ_{k=0}^n 1^{n−k} (−1)^k = [1 + (−1)]^n = 0 ). Hence b_A = b_A.

Proof of Theorem 28

We just need to rewrite expression (8.10) as a convex combination of points. We get (by Equation (8.3)):

pl_b = Σ_{∅ ⊊ A ⊆ Θ} µ_b(A) b_A = Σ_{∅ ⊊ A ⊆ Θ} ( (−1)^{|A|+1} Σ_{C ⊇ A} m_b(C) ) b_A
     = Σ_{∅ ⊊ A ⊆ Θ} (−1)^{|A|+1} b_A Σ_{C ⊇ A} m_b(C)        (8.18)
     = Σ_{∅ ⊊ C ⊆ Θ} m_b(C) ( Σ_{∅ ⊊ A ⊆ C} (−1)^{|A|+1} b_A ) = Σ_{∅ ⊊ C ⊆ Θ} m_b(C) pl_C.

The latter is indeed a convex combination, since basic probability assignments are non-negative (with m_b(∅) = 0) and sum to one. It follows that:

PL = { pl_b, b ∈ B }
   = { Σ_{∅ ⊊ C ⊆ Θ} m_b(C) pl_C : Σ_C m_b(C) = 1, m_b(C) ≥ 0 ∀C ⊆ Θ }
   = Cl(pl_A, ∅ ⊊ A ⊆ Θ),

after swapping C with A to keep the notation consistent.

Proof of Theorem 29

Expression (8.11) is equivalent to:

pl_A(C) = − Σ_{∅ ⊊ B ⊆ A} (−1)^{|B|} b_B(C)    ∀C ⊆ Θ.

But since b_B(C) = 1 if C ⊇ B, and 0 otherwise, we have that:

pl_A(C) = − Σ_{B ⊆ A, B ⊆ C, B ≠ ∅} (−1)^{|B|} = − Σ_{∅ ⊊ B ⊆ A ∩ C} (−1)^{|B|}.

Now, if A ∩ C = ∅ then there are no addends in the above sum, which vanishes. Otherwise, by Newton's binomial formula (8.4), we have:

pl_A(C) = − { [1 + (−1)]^{|A ∩ C|} − (−1)^0 } = 1.

On the other side, by the definition of a plausibility function:

pl_{b_A}(C) = Σ_{B ∩ C ≠ ∅} m_{b_A}(B) = 1 if A ∩ C ≠ ∅, and 0 if A ∩ C = ∅,

and the two quantities coincide.


9
The geometry of possibility

Among the manifold approaches to uncertainty theory, possibility theory [411] (see Chapter 5, Section 5.5.1) has a number of attractive features.
As we have learned, the theory of evidence, at least in the case of finite domains,
includes both possibility theory and the theory of fuzzy sets as special cases.
In particular, it is well known that necessity measures, i.e., measures of the form Nec(A) = 1 − Pos(A^c), A ⊆ Θ, where Pos is a possibility measure, have as counterparts in the theory of evidence consonant b.f.s [?, 682, 414, 61], i.e., belief functions whose focal elements are nested (Proposition 17). For a wider study of the relationship between belief functions, fuzzy sets and possibility measures we refer the reader to Chapter 5. The study of the geometry of consonant belief functions amounts therefore to an investigation of the geometry of possibility theory.
In this Chapter we then move forward to analyse the convex geometry of con-
sonant belief functions (co.b.f.s), as a step towards a unified geometric picture of a
wider class of uncertainty measures.
In the first part we show that consonant b.f.s are in correspondence with chains of
subsets of their domain, and are hence located in a collection of convex regions of
the belief space which has the form of a simplicial complex, i.e. a structured collec-
tion of simplices. This approach, on the one hand, provides a useful visualization tool which can be used to stimulate conjectures about the properties of the entities of interest. On the other hand, it generates new problems and allows us to look at known problems from a different perspective.


Chapter outline

We first recall in Section 9.1 the relationship between consonant belief functions and
necessity measures. We then move on to study the geometry of the space of conso-
nant belief functions, or consonant subspace CO (Section 9.2). After observing the
correspondence between co.b.f.s and maximal chains of events, we look for useful
insights by studying the case of ternary frames, which leads us to prove that the con-
sonant subspace has the form of a simplicial complex [441], a structured collection
of simplices. In Section 9.3 we investigate the convex geometry of the components
of CO in more detail, proving that they are all congruent to each other, and can be
decomposed into faces which are right triangles.
In the second half of the Chapter we introduce the notion of consistent belief
function as the natural generalization in the context of belief theory of consistent
knowledge bases in classical logic (Section 9.4). In Section 9.5, following the intu-
ition provided by the simple case of binary frames, we prove that the set of consistent
b.f.s (just like consonant b.f.s do) form a simplicial complex in the space of all belief
functions, and that the maximal simplices of such a complex are all congruent with
each other. Finally, in Section 9.6 we show that each belief function can be decom-
posed into a number of consistent components living the consistent complex, rather
closely related to the pignistic transformation [1424, 1276].
To improve the readability of the paper several major proofs are collected in an
Appendix.

9.1 Consonant belief functions as necessity measures


Recall that a belief function is said to be “consonant” if its focal elements are nested.
The following conditions are equivalent [1149]:
1. b is consonant;
2. b(A ∩ B) = min(b(A), b(B)) for every A, B ⊂ Θ;
3. plb (A ∪ B) = max(plb (A), plb (B)) for every A, B ⊂ Θ;
4. plb (A) = maxθ∈A plb ({θ}) for all non-empty A ⊂ Θ;
5. there exists a positive integer n and simple support functions b1 , ..., bn such that
b = b1 ⊕ · · · ⊕ bn and the focus of bi is contained in the focus of bj whenever
i < j.
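Conditions 1 and 4 above are easy to check computationally. The following minimal Python sketch (with hypothetical masses on a ternary frame; not part of the formal treatment) verifies that a nested mass assignment satisfies pl_b(A) = max_{θ∈A} pl_b({θ}) for every non-empty event.

from itertools import combinations

theta = ('x', 'y', 'z')
m = {frozenset('x'): 0.5, frozenset('xy'): 0.3, frozenset('xyz'): 0.2}  # nested focal elements

def pl(A):  return sum(v for B, v in m.items() if A & B)

# condition 1: the focal elements can be totally ordered by inclusion
focal = sorted(m, key=len)
assert all(a <= b for a, b in zip(focal, focal[1:]))

# condition 4: plausibility of any non-empty event equals the largest singleton plausibility
events = [frozenset(c) for r in range(1, 4) for c in combinations(theta, r)]
for A in events:
    assert abs(pl(A) - max(pl(frozenset({t})) for t in A)) < 1e-12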
Consonant belief functions represent bodies of evidence all pointing in the same direction. However, their constituent pieces of evidence do not need to be completely nested for the belief function resulting from their aggregation to be consonant, as the next proposition states.

Proposition 39. Suppose b1 , ..., bn are non-vacuous simple support functions with
foci Cb1 , ..., Cbn respectively, and b = b1 ⊕ · · · ⊕ bn is consonant. If C denotes the
core of b, then the sets Cbi ∩ C are nested.

By condition 2 above it follows that:

0 = b(∅) = b(A ∩ Ā) = min(b(A), b(Ā)),

i.e., either b(A) = 0 or b(Ā) = 0 for every A ⊆ Θ. This result and Proposition
13 explain why we said in Chapter 2 that consonant and quasi-support functions
represent opposite sides of the class of belief functions.
As we recalled in Chapter 5, Section 5.5.1:

Definition 77. A possibility measure on a domain Θ is a function Pos : 2^Θ → [0, 1] such that Pos(∅) = 0, Pos(Θ) = 1 and

Pos( ∪_i A_i ) = sup_i Pos(A_i)

for any family {A_i | A_i ∈ 2^Θ, i ∈ I}, where I is an arbitrary index set.

Each possibility measure is uniquely characterized by a membership function π : Θ → [0, 1], π(x) := Pos({x}), via the formula Pos(A) = sup_{x ∈ A} π(x).
The restriction of the plausibility function to singletons, pl̄_b(x) = pl_b({x}), is called the contour function, or sometimes the plausibility assignment pl̄_b [679]. From condition 4 it follows immediately that the plausibility function pl_b associated with a belief function b on a domain Θ is a possibility measure iff b is consonant, with the plausibility assignment playing the role of the membership function: π = pl̄_b.
b
Possibility theory (at least in the finite case) is then embedded in the ToE. Therefore, studying the geometry of consonant belief functions amounts to studying the geometry of possibility.

9.2 The consonant subspace

To gather intuition on the geometry of consonant belief functions, we will start as


usual from the familiar running example of belief functions defined on a binary
frame Θ2 = {x, y}, in continuation with the example of Chapter 6, Section 6.2.1
(see Figure 9.1).
We know that the region P2 of all Bayesian b.f.s on Θ2 is in this case the diago-
nal line segment Cl(b_x, b_y). On the other hand, simple support functions focused on {x} lie on the horizontal segment Cl(b_Θ, b_x), while simple support b.f.s focused on {y} form the vertical segment Cl(b_Θ, b_y).
On Θ2 = {x, y} consonant belief functions can have as chain of focal elements
one between {{x}, Θ2 } and {{y}, Θ2 }. As a consequence, all co.b.f.s on Θ2 are
simple support functions, and their region CO2 is the union of two segments

CO2 = S2 = COx ∪ COy = Cl(bΘ , bx ) ∪ Cl(bΘ , by ).



Fig. 9.1. The belief space B for a binary frame is a triangle in R2 whose vertices are the basis
belief functions focused on {x}, {y} and Θ, (bx , by , bΘ ) respectively. The probability region
is the segment Cl(bx , by ), while consonant and consistent belief functions are constrained to
belong to the union of the two segments CS x = COx = Cl(bΘ , bx ) and CS y = COy =
Cl(bΘ , by ).

9.2.1 Chains of subsets as consonant belief functions


In general terms, while arbitrary belief functions do not admit restrictions on their
list of focal elements, consonant b.f.s are characterized by the fact that their focal
elements can be rearranged into a totally ordered set by set inclusion.
The power set 2Θ of a frame of discernment is a partially ordered set with respect
to the set-theoretic inclusion: ⊆ meets the three properties of reflexivity (whenever
A ⊆ Θ, A ⊆ A), antisymmetry (A ⊆ B and B ⊆ A implies A = B), and
transitivity (A ⊆ B and B ⊆ C implies A ⊆ C). A “chain” of a poset is a collection
of pairwise comparable elements (a totally ordered set).
All the possible lists of focal elements associated with consonant belief functions
correspond therefore to all the possible chains of subsets A1 ⊆ ... ⊆ Am in the
partially ordered set (2Θ , ⊆).
Now, Theorem 11 implies that the b.f.s whose focal elements belong to a chain
C = {A1 , · · · , Am } form in the belief space the simplex Cl(bA1 , ..., bAm ) (re-
member that the vectors {bA , A ⊂ Θ} representing categorical belief functions are
affinely independent in the embedding Cartesian space RN ).
Let us denote by n := |Θ| the cardinality of the frame of discernment Θ. Since each chain in (2^Θ, ⊆) is a subset of a maximal one (a chain including subsets of every size from 1 to n), the region of co.b.f.s turns out to be the union of a collection of simplices, each of them associated with a maximal chain C:

CO = ∪_{C = {A_1 ⊂ ··· ⊂ A_n}} Cl(b_{A_1}, · · · , b_{A_n}).

The number of such maximal simplices in CO is equal to the number of maximal chains in (2^Θ, ⊆), i.e.,

∏_{k=1}^{n} C(k, 1) = n!,

since, given a set of size k, we can build a new set containing it by just choosing one of the remaining elements. Since the length of a maximal chain is |Θ| = n and the vectors {b_A, A ⊂ Θ} are affinely independent, the dimension of the vector spaces generated by these convex components is the same and equal to dim Cl(b_{A_1}, ..., b_{A_n}) = n − 1.
Each categorical belief function bA obviously belongs to several distinct com-
ponents. In particular, if |A| = k the total number of maximal chains containing A
is (n − k)!k! – indeed, in the power set of A the number of maximal chains is k!,
while to form a chain from A to Θ we just need to add an element of Ac = Θ \ A
(whose size is n − k) at each step. The integer (n − k)!k! is then also the number of
maximal simplices of CO containing bA .
In particular, each vertex bx of the probability simplex P (for which |{x}| = k = 1)
belongs to a sheaf of (n − 1)! convex components of the consonant subspace. An
obvious remark is that CO is connected: each maximal convex component is ob-
viously connected, and each pair of such components has at least bΘ as common
intersection.
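The counting argument above can be confirmed by brute force on a small frame; the sketch below (illustrative code) enumerates the maximal chains of (2^Θ, ⊆) as the permutations of Θ and checks that they number n!.

from itertools import permutations
from math import factorial

theta = ('x', 'y', 'z', 'w')
# a maximal chain A_1 c ... c A_n adds one element of Theta at a time,
# so it corresponds to a unique permutation of Theta
chains = {tuple(frozenset(p[:k]) for k in range(1, len(p) + 1)) for p in permutations(theta)}
assert len(chains) == factorial(len(theta))     # 4! = 24 maximal chains / simplices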

9.2.2 Ternary case

Let us consider, as a more significant illustrative example, the case of a frame of


size 3: Θ = {x, y, z}.
Belief functions b ∈ B3 can be written as 6-dimensional vectors:
b = [b(x), b(y), b(z), b({x, y}), b({x, z}), b({y, z})]0 .
All the maximal chains of 2Θ are listed below:
{x} ⊂ {x, z} ⊂ Θ, {y} ⊂ {x, y} ⊂ Θ, {z} ⊂ {y, z} ⊂ Θ
{x} ⊂ {x, y} ⊂ Θ, {y} ⊂ {y, z} ⊂ Θ, {z} ⊂ {x, z} ⊂ Θ.
Each element of Θ is then associated with two chains, and the total number of
maximal convex components, whose dimension is |Θ| − 1 = 2, is 3! = 6:
Cl(bx , b{x,z} , bΘ ), Cl(by , b{x,y} , bΘ ), Cl(bz , b{y,z} , bΘ ),
Cl(bx , b{x,y} , bΘ ), Cl(by , b{y,z} , bΘ ), Cl(bz , b{x,z} , bΘ ).
Each 2-dimensional maximal simplex (for instance Cl(bx , b{x,z} , bΘ )) has an inter-
section of dimension |Θ| − 2 = 1 (in the example, Cl(b{x,z} , bΘ )) with a single
other maximal component (Cl(bz , b{x,z} , bΘ )), associated with a different element
of Θ.
In conclusion the geometry of the ternary frame can be represented as in Fig-
ure 9.2, where the belief space B3 = Cl(bx , by , bz , b{x,y} , b{x,z} , b{y,z} , bΘ ) is 6-
dimensional, its probabilistic face is a 2-dimensional simplex P3 = Cl(bx , by , bz ),

and the consonant subspace CO3 , also part of the boundary of B3 , is given by the
union of the maximal simplices listed above.

Fig. 9.2. The simplicial complex CO3 of all the consonant belief functions for a ternary
frame Θ3 . The complex is composed by n! = 3! = 6 maximal simplicial components of
dimension n − 1 = 2, each vertex of P3 being shared by (n − 1)! = 2! = 2 of them. The
region is connected, and is part of the boundary ∂B3 of the belief space B3 .

9.2.3 Consonant subspace as simplicial complex

These properties of CO can be summarized by means of another concept of convex


geometry, which generalizes that of simplex [441].
Definition 78. A simplicial complex is a collection Σ of simplices of arbitrary di-
mensions possessing the following properties:
1. if a simplex belongs to Σ, then all its faces of any dimension belong to Σ;
2. the intersection of two simplices in the complex is a face of both.
Let us consider, for instance, the case of two triangles (2-dimensional simplices)
in R2 . Roughly speaking, condition 2. asks for the intersection of the two triangles
not to contain points of their interiors (Figure 9.3, left). The intersection cannot just
be any subset of their borders either (middle), but has to be a face (right, in this
case a single vertex). Note that if two simplices intersect in a face τ , they obviously
intersect in every face of τ .

Fig. 9.3. Intersection of simplices in a complex. Only the right-hand pair of triangles meets
condition 2. of the definition of simplicial complex (Definition 78).

Theorem 34. The consonant subspace CO is a simplicial complex included in the


belief space B.
Proof. Property 1. of Definition 78 is trivially satisfied. As a matter of fact, if a
simplex Cl(bA1 , ..., bAn ) corresponds to a chain A1 ⊆ ... ⊆ An in the poset (2Θ , ⊆
), each face of this simplex is in correspondence with a subchain in 2Θ , and therefore
to a simplex of consonant belief functions. As for property 2., let us consider the
intersection of two arbitrary simplices in the complex:

Cl(bA1 , ..., bAn1 ) ∩ Cl(bB1 , ..., bBn2 )

associated with chains A = {A1 , ..., An1 } and B = {B1 , ..., Bn2 }, respectively.
As the vectors {bA , ∅ ( A ( Θ} are linearly independent in RN −2 , no linear
combination of the vectors bBi ’s can yield an element of span(bA1 , ..., bAn1 ), unless
some of those vectors coincide. The desired intersection is therefore:

Cl(bCi1 , ..., bCik ) (9.1)

where
{Cij , j = 1, ..., k} = C = A ∩ B,
with k < n1 , n2 . But then C is a subchain of both A and B, so that (9.1) is a face of
both Cl(bA1 , ..., bAn1 ) and Cl(bB1 , ..., bBn2 ). 
As Figure 9.2 shows, the probability simplex P and the maximal simplices of
CO have the same dimension, and are both part of the boundary ∂B of the belief
space.

9.3 Properties of the consonant subspace


More can be said about the geometry of the consonant subspace, in particular about the features of its constituent maximal simplices.

9.3.1 Congruence of the convex components of CO


Indeed, as the binary case study suggests, all maximal simplices of the conso-
nant complex are congruent, i.e., they can be mapped onto each other by means
of rigid transformations. In the binary case, for instance, the two components
COx = Cl(bΘ , bx ) and COy = Cl(bΘ , by ) are segments of the same (Euclidean)
length, namely:
kCOx k = kbx − bΘ k = kbx k = k[1, 0]0 k = 1 = kby − bΘ k = kCOy k
(see Figure 9.1 again).
We can get an intuition of how to prove that this is true in the general case by studying the more significant ternary case. From Section 9.2.2, the 1-dimensional faces of the maximal simplices have the following lengths:

Cl(b_x, b_{x,y}, b_Θ):
  Cl(b_x, b_Θ):      ||b_x − b_Θ|| = ||b_x|| = ||[1 0 0 1 1 0]'|| = √3,
  Cl(b_{x,y}, b_Θ):  ||b_{x,y} − b_Θ|| = ||b_{x,y}|| = ||[0 0 0 1 0 0]'|| = 1,
  Cl(b_x, b_{x,y}):  ||b_x − b_{x,y}|| = ||[1 0 0 0 1 0]'|| = √2;

Cl(b_x, b_{x,z}, b_Θ):
  Cl(b_x, b_Θ):      ||b_x − b_Θ|| = ||b_x|| = ||[1 0 0 1 1 0]'|| = √3,
  Cl(b_{x,z}, b_Θ):  ||b_{x,z} − b_Θ|| = ||b_{x,z}|| = ||[0 0 0 0 1 0]'|| = 1,
  Cl(b_x, b_{x,z}):  ||b_x − b_{x,z}|| = ||[1 0 0 1 0 0]'|| = √2;

Cl(b_y, b_{x,y}, b_Θ):
  Cl(b_y, b_Θ):      ||b_y − b_Θ|| = ||b_y|| = ||[0 1 0 1 0 1]'|| = √3,
  Cl(b_{x,y}, b_Θ):  ||b_{x,y} − b_Θ|| = ||b_{x,y}|| = ||[0 0 0 1 0 0]'|| = 1,
  Cl(b_y, b_{x,y}):  ||b_y − b_{x,y}|| = ||[0 1 0 0 0 1]'|| = √2;

Cl(b_z, b_{x,z}, b_Θ):
  Cl(b_z, b_Θ):      ||b_z − b_Θ|| = ||b_z|| = ||[0 0 1 0 1 1]'|| = √3,
  Cl(b_{x,z}, b_Θ):  ||b_{x,z} − b_Θ|| = ||b_{x,z}|| = ||[0 0 0 0 1 0]'|| = 1,
  Cl(b_z, b_{x,z}):  ||b_z − b_{x,z}|| = ||[0 0 1 0 0 1]'|| = √2.

It is clear that the 1-dimensional faces of each pair of maximal simplices can be put into a 1-1 correspondence, based on their having the same norm.
For instance, for the pair of triangles
Cl(bx , b{x,y} , bΘ ), Cl(bz , b{x,z} , bΘ ),
the desired correspondence is:
Cl(bx , bΘ ) ↔ Cl(bz , bΘ ), Cl(b{x,y} , bΘ ) ↔ Cl(b{x,z} , bΘ )
Cl(bx , b{x,y} ) ↔ Cl(bz , b{x,z} )
for such pairs of segments have the same norm.
This can be proven in the general case as well.

Theorem 35. All the maximal simplices of the consonant subspace are congruent.

Proof. To get a proof for the general case we need to find a 1-1 map between the 1-dimensional sides of any two maximal simplices. Let A = {A_1 ⊂ · · · ⊂ A_i ⊂ · · · ⊂ A_n = Θ}, B = {B_1 ⊂ · · · ⊂ B_i ⊂ · · · ⊂ B_n = Θ} be the associated maximal chains.
The trick consists in associating pairs of events with the same cardinality:

Cl(b_{A_i}, b_{A_j}) ↔ Cl(b_{B_i}, b_{B_j}),    |A_i| = |B_i| = i, |A_j| = |B_j| = j > i.

Indeed, the categorical b.f. b_{A_i} is such that b_{A_i}(B) = 1 when B ⊇ A_i, and b_{A_i}(B) = 0 otherwise. On the other hand b_{A_j}(B) = 1 when B ⊇ A_j ⊃ A_i, and b_{A_j}(B) = 0 otherwise, since A_j ⊃ A_i by hypothesis. Hence

|(b_{A_i} − b_{A_j})(B)| = 1 ⇔ B ⊇ A_i, B ⊉ A_j,

so that

||b_{A_i} − b_{A_j}||_2 = √( |{B ⊆ Θ : B ⊇ A_i, B ⊉ A_j}| ) = √( 2^{n−i} − 2^{n−j} ),

a quantity which depends only on the cardinalities i and j. But this is true for each similar pair in any other maximal chain, so that

||b_{A_i} − b_{A_j}||_2 = ||b_{B_i} − b_{B_j}||_2    ∀i, j ∈ [1, ..., n]

for each pair of maximal simplices of CO. By the generalization of a well-known theorem of Euclid's, this implies that the two simplices are congruent: Cl(b_{A_1}, · · · , b_Θ) ∼ Cl(b_{B_1}, · · · , b_Θ).

It is easy to see that the components of CO are not congruent with P, even though they both have dimension n − 1. In the binary case, for instance,

P = Cl(b_x, b_y),    ||P|| = ||b_y − b_x|| = √2,

while ||CO^x|| = ||CO^y|| = 1.
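The congruence of the maximal simplices (Theorem 35) can again be verified numerically; the sketch below (illustrative code, ternary frame) compares the Euclidean lengths of corresponding 1-dimensional faces of the simplices associated with two different maximal chains.

from itertools import combinations
from math import sqrt, isclose

theta  = ('x', 'y', 'z')
coords = [frozenset(c) for r in range(1, len(theta)) for c in combinations(theta, r)]

def b_vec(A):  return [1.0 if A <= C else 0.0 for C in coords]     # b_Theta is the zero vector

def dist(A, B):
    return sqrt(sum((u - w) ** 2 for u, w in zip(b_vec(A), b_vec(B))))

chain1 = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
chain2 = [frozenset('z'), frozenset('xz'), frozenset('xyz')]
for i, j in combinations(range(3), 2):
    assert isclose(dist(chain1[i], chain1[j]), dist(chain2[i], chain2[j]))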

9.3.2 Decomposition of maximal simplices into right triangles

An analysis of the norm of the difference of two categorical belief functions can
provide us with additional information about the nature and structure of the maximal
simplices of the consonant subspace.
We know from [258] that in R^{N−2} each triangle

Cl(b_Θ, b_B, b_A)

with Θ ⊋ B ⊋ A is a right triangle, whose right angle is the angle at b_B.
Indeed, we can prove here a much more general result.

Theorem 36. If A_i ⊋ A_j ⊋ A_k then the angle at b_{A_j} in the triangle Cl(b_{A_i}, b_{A_j}, b_{A_k}) is π/2.

Proof. As A_i ⊋ A_j ⊋ A_k we can write:

(b_{A_j} − b_{A_i})(B) = 1 if B ⊇ A_j, B ⊉ A_i, and 0 otherwise;
(b_{A_k} − b_{A_i})(B) = 1 if B ⊇ A_k, B ⊉ A_i, and 0 otherwise;
(b_{A_k} − b_{A_j})(B) = 1 if B ⊇ A_k, B ⊉ A_j, and 0 otherwise.

This implies:

|(b_{A_i} − b_{A_j})(B)| = 1 ⇒ B ⊉ A_i, B ⊇ A_j ⇒ (b_{A_j} − b_{A_k})(B) = 0,

and vice versa, so that the inner product ⟨b_{A_i} − b_{A_j}, b_{A_j} − b_{A_k}⟩ = 0 is nil, and therefore the angle at b_{A_j} is π/2.

All triangles Cl(bAi , bAj , bAk ) in CO such that Ai ) Aj ) Ak are right tri-
angles. But as each maximal simplicial component COC of the consonant complex
has vertices associated with the elements A1 ( · · · ( An of a maximal chain, any
three of them will also form a chain. Hence all 2-dimensional faces of any maximal
component of CO are right triangles. All its 3-dimensional faces (tetrahedrons) have
as faces right triangles (Figure 9.4), and so on.


Fig. 9.4. All the tetrahedrons Cl(bAi , bAj , bAk , bAl ) formed by vertices of a maximal sim-
plex of the consonant subspace, Ai ( Aj ( Ak ( Al , have all right triangles as faces.
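A quick numerical confirmation of Theorem 36 (illustrative code only, on a frame of size four): for every nested triple taken from a maximal chain, the two edge vectors meeting at the middle vertex are orthogonal.

from itertools import combinations

theta  = ('x', 'y', 'z', 'w')
coords = [frozenset(c) for r in range(1, len(theta)) for c in combinations(theta, r)]

def b_vec(A):  return [1.0 if A <= C else 0.0 for C in coords]

chain = [frozenset(theta[:k]) for k in range(1, len(theta) + 1)]   # {x} c {x,y} c {x,y,z} c Theta
for A_k, A_j, A_i in combinations(chain, 3):                       # A_k c A_j c A_i
    u = [a - c for a, c in zip(b_vec(A_i), b_vec(A_j))]
    v = [a - c for a, c in zip(b_vec(A_k), b_vec(A_j))]
    assert abs(sum(x * y for x, y in zip(u, v))) < 1e-12           # right angle at b_{A_j}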

9.4 Consistent belief functions


Although consonant belief functions are characterised by evidence all pointing in the same direction, as we will see in the following they are not the most general class of belief functions associated with collections of consistent pieces of evidence. We introduce this new class of b.f.s starting from an analogy with the notion of consistency in classical logic.

9.4.1 Consistent knowledge bases in classical logic

In classical logic, a set Φ of formulas, or "knowledge base", is said to be consistent if and only if there does not exist a formula φ such that the knowledge base implies both that formula and its negation: Φ ⊢ φ, Φ ⊢ ¬φ. In other words, it is impossible to derive incompatible conclusions from the set of propositions that form a consistent knowledge base. The application of inference rules to inconsistent collections of formulas may lead to incompatible conclusions, depending on the subset of assumptions one starts one's reasoning from [1012].
A variety of approaches have been proposed in the context of classical logics
to address the issue of inconsistent knowledge bases, such as fragmenting the latter
into maximally consistent subsets, limiting the power of the formalism, or adopting
non-classical semantics [1049, 67]. Even when a knowledge base is formally incon-
sistent, however, it may still contain potentially useful information. Paris [1012], for
instance, tackles the problem by not assuming each proposition in the knowledge
base as a fact, but by attributing to it a certain degree of belief in a probabilistic
logic approach. This leads to something similar to a belief function.

9.4.2 Belief functions as uncertain knowledge bases

This parallelism with classical logic reminds us of the fact that belief functions are
also collections of disparate pieces of evidence, incorporated in time as they become
available. As a result, each belief function is likely to contain self-contradictory
information, which is in turn associated with a degree of “internal” conflict. As we
have seen in Chapter ??, conflict and combinability play a central role in the theory
of evidence [1470, 1216, 1063], and have been recently subject to novel analyses
[871, 637, 884].
In propositional logic, propositions or formulas are either true or false, i.e., their
truth value is either 0 or 1 [922]. Formally, an interpretation or model of a formula
φ is a valuation function mapping φ to the truth value “true” (1). Each formula can
therefore be associated with the set of interpretations or models under which its truth
value is 1. If we define a frame of discernment collecting all the possible interpreta-
tions, each formula φ is associated with the subset A(φ) of this frame which collects
all its interpretations.
A straightforward extension of classical logic consists in assigning a probability value to such sets of interpretations, i.e., to each formula. If, however, the available
evidence allows us to define a belief function on the frame of possible interpreta-
tions, each formula A(φ) ⊆ Θ is then naturally assigned a degree of belief b(A(φ))
between 0 and 1 [1104, 572], measuring the total amount of evidence supporting the
proposition “φ is true”.
A belief function can therefore be seen in this context as the generalization of a
knowledge base [1104, 572], i.e., a set of propositions together with their non-zero
belief values: b = {A ⊆ Θ : b(A) 6= 0}.

9.4.3 Consistency in belief logic

To determine what consistency amounts to in such a framework, however, we need to make precise the notion of a "proposition implied by a belief function".
One way to define this is to decide that b ` B ⊆ Θ if B is implied by all the
propositions supported by b:

b ` B ⇔ A ⊆ B ∀A : b(A) 6= 0. (9.2)

An alternative definition requires the proposition B itself to receive non-zero sup-


port by the belief function b:

b ` B ⇔ b(B) 6= 0. (9.3)

Whatever the way we choose to define implication, we can define the class of con-
sistent belief functions as the set of BFs which cannot imply contradictory proposi-
tions.
Definition 79. A belief function b is consistent if there exists no proposition A such
that both A and its negation Ac are implied by b.
When adopting the implication relation (9.2), it is trivial to verify that:

A ⊆ B ∀A : b(A) ≠ 0    ⇔    ∩_{b(A) ≠ 0} A ⊆ B.

Furthermore, as each proposition with non-zero belief value must, by definition, contain a focal element C s.t. m_b(C) ≠ 0, the intersection of all the propositions with non-zero belief reduces to that of all the focal elements of b, i.e., the core of b:

∩_{b(A) ≠ 0} A = ∩_{∃C ⊆ A: m_b(C) ≠ 0} A = ∩_{m_b(C) ≠ 0} C = C_b.

No matter our definition of implication, the class of consistent belief functions corresponds to the set of b.f.s whose core is not empty.

Definition 80. A belief function is said to be consistent if its core is non-empty:

C_b := ∩_{A: m_b(A) ≠ 0} A ≠ ∅.

Indeed we can prove that, under either definition (9.2) or definition (9.3) of the
implication b ` B, Definitions 79 and 80 are equivalent.
Theorem 37. A belief function b : 2Θ → [0, 1] has non-empty core if and only
if there do not exist two complementary propositions A, Ac ⊆ Θ which are both
implied by b in the sense (9.2).

Proof. We have seen above that a proposition A is implied (9.2) by b iff Cb ⊆ A.


Accordingly, in order for both A and Ac to be implied by b we would need Cb = ∅.

Theorem 38. A belief function b : 2^Θ → [0, 1] has non-empty core if and only if there do not exist two complementary propositions A, A^c ⊆ Θ which both enjoy non-zero support from b (i.e., they are both implied by b in the sense (9.3)):

∄ A, A^c : b(A) ≠ 0, b(A^c) ≠ 0.

Proof. By Definition 79, in order for a subset (or proposition, in a propositional-logic interpretation) A ⊆ Θ to have non-zero belief value it has to contain the core of b: A ⊇ C_b. In order to have both b(A) ≠ 0 and b(A^c) ≠ 0 we would need both of them to contain the core; but in that case

A ∩ A^c ⊇ C_b ≠ ∅,

which is absurd, as A ∩ A^c = ∅.

Clearly, if we define the amount of internal conflict of a belief function as

c(b) := Σ_{A,B ⊆ Θ: A ∩ B = ∅} m_b(A) m_b(B),        (9.4)

it follows that:

Theorem 39. A belief function b : 2^Θ → [0, 1] is consistent if and only if its internal conflict is zero, c(b) = 0.

When measuring the amount of conflict between two belief functions to be combined, it is standard practice (although there are exceptions [871]) to adopt as an indicator the quantity m_b(∅), which measures the total mass of non-intersecting focal elements in Dempster's combination. The quantity (9.4) provides a more conservative assessment of the amount of conflict present in the evidence generating the belief function at hand, as it is easy to see that:

c(b) = Σ_{A,B ⊆ Θ: A ∩ B = ∅} m_b(A) m_b(B) = m(∅) + Σ_{∅ ⊊ A,B ⊆ Θ: A ∩ B = ∅} m_b(A) m_b(B) ≥ m(∅).
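The internal conflict (9.4) and the consistency criterion of Definition 80 are straightforward to compute; the sketch below (hypothetical masses, illustrative code only) does so for a consistent belief function on {x, y, z}.

from functools import reduce

m = {frozenset('xy'): 0.4, frozenset('yz'): 0.4, frozenset('xyz'): 0.2}   # illustrative b.p.a.

core = reduce(frozenset.intersection, m.keys())      # intersection of the focal elements
conflict = sum(v1 * v2 for A, v1 in m.items() for B, v2 in m.items() if not (A & B))

assert core == frozenset('y')                        # non-empty core: b is consistent
assert conflict == 0                                 # hence zero internal conflict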

9.5 The geometry of consistent belief functions


To provide a complete overview of the geometry of possibility measures it is then
necessary to study the geometry of consistent belief functions. Here we show that,
just like consonant b.f.s, they live in a simplicial complex whose faces possess cer-
tain geometrical features. We conclude by proposing a natural decomposition of any

arbitrary belief function into |Θ| “consistent components”, each of them living in a
maximal simplex of the consistent complex. Such components can thus be seen as
natural consistent transformations, with different cores, of an arbitrary belief function. The topic of how to approximate a generic b.f. by means of a consistent one will be discussed in greater detail in Chapter 13.

9.5.1 Example: the binary frame


In our running example of a frame of discernment of cardinality 2, consistent belief
functions may obviously have as collection of focal elements either Eb = {{x}, Θ}
or Eb = {{y}, Θ}. Thus, by Theorem 11, all consistent b.f.s on Θ2 = {x, y} live in
the union of two convex components:
CS 2 = CS x ∪ CS y = Cl(bΘ , bx ) ∪ Cl(bΘ , by ).

Fig. 9.5. In the binary case consistent belief functions are constrained to belong to the union
of the two segments CS x = Cl(bΘ , bx ) and CS y = Cl(bΘ , by ) (cfr. the behaviour of conso-
nant b.f.s in Figure 9.1).

9.5.2 The region of consistent belief functions


In the general case, consistent belief functions are characterized, as we know, by the fact that their focal elements have non-empty intersection. All the possible lists of focal elements associated with consistent b.f.s then correspond to all the possible collections of intersecting events:

{A_1, ..., A_m ⊆ Θ : ∩_{i=1}^m A_i ≠ ∅}.

Just as in the consonant case, Theorem 11 implies that all the b.f.s whose focal elements belong to such a collection, no matter the actual values of their basic probability assignment, form a simplex Cl(b_{A_1}, ..., b_{A_m}). Such a collection is 'maximal' when it is not possible to add another focal element A_{m+1} such that ∩_{i=1}^{m+1} A_i ≠ ∅.
It is easy to see that collections of events with non-empty intersection are maximal iff they have the form {A ⊆ Θ : A ∋ x}, where x ∈ Θ is a singleton element. Consequently, the region of consistent belief functions is the union of the following collection of maximal simplices:

CS = ∪_{x ∈ Θ} Cl(b_A, A ∋ x).        (9.5)

There are obviously n := |Θ| such maximal simplices in CS. Each of them has

|{A : A ∋ x}| = |{A ⊆ Θ : A = {x} ∪ B, B ⊆ {x}^c}| = 2^{|{x}^c|} = 2^{n−1}

vertices, so that their dimension as simplices in the belief space is 2^{n−1} − 1 = (dim B)/2 (as the dimension of the whole belief space is dim B = 2^n − 2).
Clearly, CS is connected, as each maximal simplex is by definition connected and b_Θ belongs to all maximal simplices.

9.5.3 Consistent complex

Just as in the consonant case, the region (9.5) of consistent belief functions is an
instance of simplicial complex (see Definition 78) [441].
Theorem 40. CS is a simplicial complex in the belief space B.
As with consonant belief functions, more can be said about the geometry of the maximal faces of CS. In Θ = {x, y, z}, for instance, the consistent complex CS is composed of three maximal simplices of dimension |{A ∋ x}| − 1 = 3 (cfr. Section 9.3.2):

Cl(b_A : A ∋ x) = Cl(b_x, b_{x,y}, b_{x,z}, b_Θ),
Cl(b_A : A ∋ y) = Cl(b_y, b_{x,y}, b_{y,z}, b_Θ),        (9.6)
Cl(b_A : A ∋ z) = Cl(b_z, b_{x,z}, b_{y,z}, b_Θ).
Once again the vertices of each pair of such maximal simplices can be put into a 1-1
correspondence. Consider for instance the pair CS x = Cl(bx , b{x,y} , b{x,z} , bΘ ),
CS z = Cl(bz , b{x,z} , b{y,z} , bΘ ). The desired mapping is:

x ↔ z, {x, z} ↔ {x, z}, {x, y} ↔ {y, z}, Θ ↔ Θ,

for corresponding segments in the two simplices have the same length. For example (remembering that b_A(B) = 1 if B ⊇ A, and b_A(B) = 0 otherwise), Cl(b_x, b_Θ) is congruent with Cl(b_z, b_Θ), as:

||b_x − b_Θ|| = ||b_x|| = ||[1 0 0 1 1 0]'|| = √3 = ||[0 0 1 0 1 1]'|| = ||b_z − b_Θ|| = ||b_z||.

In the same way, Cl(b_{x,z}, b_{x,y}) is congruent with Cl(b_{x,z}, b_{y,z}), as

||b_{x,z} − b_{x,y}|| = ||[0 0 0 −1 1 0]'|| = √2 = ||[0 0 0 0 1 −1]'|| = ||b_{x,z} − b_{y,z}||.

This is true in the general case.

Theorem 41. All maximal simplices of the consistent complex are congruent.

9.6 Natural consistent components


We wish to conclude this chapter devoted to possibility theory by showing that each belief function b can be decomposed into n = |Θ| consistent components with distinct cores, which can be interpreted as natural projections of b onto the maximal components of the consistent simplicial complex. Interestingly, this decomposition turns out to be closely related to the pignistic transformation [1276]:

BetP[b](x) = Σ_{A ⊇ {x}} m_b(A)/|A|.

Indeed, we can write:

b = Σ_{A ⊆ Θ} m_b(A) b_A = Σ_{x ∈ Θ} Σ_{A ∋ x} (m_b(A)/|A|) b_A
  = Σ_{x ∈ Θ} BetP[b](x) · [ Σ_{A ∋ x} (m_b(A)/|A|) b_A ] / BetP[b](x) = Σ_{x ∈ Θ} BetP[b](x) b^x.        (9.7)
The n = |Θ| consistent belief functions

b^x := (1 / BetP[b](x)) Σ_{A ∋ x} (m_b(A)/|A|) b_A,    x ∈ Θ,

can be considered as the 'consistent components' of b on the maximal components CS^x, x ∈ Θ, of the consistent complex.
As a result, each arbitrary belief function b lives in the (n − 1)-dimensional simplex P^b := Cl(b^x, x ∈ Θ) which has these n consistent components as vertices (see Figure 9.6). Strikingly, its convex coordinates in this simplex P^b coincide with the coordinates of the pignistic probability in the probability simplex P, namely:

BetP[b] = Σ_{x ∈ Θ} BetP[b](x) b_x    ↔    b = Σ_{x ∈ Θ} BetP[b](x) b^x.

Obviously, if b ∈ P then b^x = b_x for all x ∈ Θ.
Equation (9.7) draws a bridge between the notions of belief, probability, and
possibility, by associating each belief function with its “natural” probabilistic (the
pignistic function) and possibilistic (the quantities bx ) components.
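The decomposition (9.7) can be verified numerically; the following sketch (assumed masses on a ternary frame, illustrative code only) computes the pignistic values, the consistent components b^x, and checks that b = Σ_x BetP[b](x) b^x coordinate by coordinate.

from itertools import combinations

theta  = ('x', 'y', 'z')
coords = [frozenset(c) for r in range(1, len(theta)) for c in combinations(theta, r)]
m = {frozenset('x'): 0.1, frozenset('xy'): 0.3, frozenset('yz'): 0.4, frozenset('xyz'): 0.2}

def cat(A):  return [1.0 if A <= C else 0.0 for C in coords]      # categorical b_A

betp = {t: sum(v / len(A) for A, v in m.items() if t in A) for t in theta}

def comp(t):                                                      # consistent component b^t
    vec = [0.0] * len(coords)
    for A, v in m.items():
        if t in A:
            vec = [u + (v / len(A)) * c for u, c in zip(vec, cat(A))]
    return [u / betp[t] for u in vec]

b_direct = [sum(v for A, v in m.items() if A <= C) for C in coords]        # b(C)
b_decomp = [sum(betp[t] * comp(t)[i] for t in theta) for i in range(len(coords))]
assert all(abs(u - w) < 1e-10 for u, w in zip(b_direct, b_decomp))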


Fig. 9.6. Pictorial representation of the role of the pignistic values BetP [b](x) for a belief
function and the related pignistic function. Both b and BetP [b] live in a simplex (respectively
P = Cl(bx , x ∈ Θ) and P b = Cl(bx , x ∈ Θ)) on which they possess the same convex
coordinates {BetP [b](x)}. The vertices bx , x ∈ Θ of the simplex P b can be interpreted as
consistent components of the belief function b on the simplicial complex of consistent belief
functions CS.

It is natural to wonder whether the consistent components b^x of b can indeed be interpreted as actual consistent approximations of b, i.e., whether they minimise some sort of distance between b and the consistent complex.
It is immediate to see that

BetP[b] = arg min_{p ∈ P} d(p, b),

where d is any function of the convex coordinates of b in P^b (as these coincide with the pignistic values for both b and BetP[b]).
The consistent approximation will be analysed in more detail in Chapter 12.

9.7 Open questions



Appendix: proofs
Proof of Theorem 40

Property 1. of Definition 78 is trivially satisfied. As a matter of fact, if a simplex


Cl(bA1 , ..., bAn ) corresponds to focal elements with non-empty intersection, clearly
points of any face of this simplex (obtained by selecting a subset of vertices) will
be b.f.s with non-empty core, and will therefore correspond to consistent b.f.s. As
for property 2., consider the intersection of two maximal simplices of CS associated
with two distinct cores C1 , C2 ⊆ Θ:

Cl(bA : A ⊇ C1 ) ∩ Cl(bA : A ⊇ C2 ).

Now, each convex closure of points b_1, ..., b_m in a Cartesian space is trivially included in the affine space they generate:

Cl(b_1, ..., b_m) ⊊ a(b_1, ..., b_m) := { b : b = α_1 b_1 + · · · + α_m b_m, Σ_i α_i = 1 }

(we just need to relax the positivity constraint on the coefficients α_i). But the categorical belief functions {b_A : ∅ ⊊ A ⊊ Θ} are linearly independent (as is straightforward to check), so that a(b_A, A ∈ L_1) ∩ a(b_A, A ∈ L_2) ≠ ∅ (where L_1, L_2 are lists of subsets of Θ) if and only if L_1 ∩ L_2 ≠ ∅. Here L_1 = {A ⊆ Θ : A ⊇ C_1}, L_2 = {A ⊆ Θ : A ⊇ C_2}, so that the condition is

{A ⊆ Θ : A ⊇ C_1} ∩ {A ⊆ Θ : A ⊇ C_2} = {A ⊆ Θ : A ⊇ C_1 ∪ C_2} ≠ ∅.

As C_1 ∪ C_2 ⊇ C_1, C_2, we have that Cl(b_A, A ⊇ C_1 ∪ C_2) is a face of both simplices.

Proof of Theorem 41

We need to find a 1-1 map between the vertices of any two maximal simplices Cl(b_A, A ∋ x), Cl(b_A, A ∋ y) of CS such that corresponding sides are congruent. We first need to rewrite the related collections of events as:

{A ⊆ Θ : A ∋ x} = {A ⊆ Θ : A = B ∪ {x}, B ⊆ {x}^c},
{A ⊆ Θ : A ∋ y} = {A ⊆ Θ : A = B ∪ {y}, B ⊆ {y}^c}.        (9.8)

But, in turn, {B ⊆ {x}^c} = {B ⊆ {x, y}^c} ∪ {B ∌ x, B ∋ y} and {B ⊆ {y}^c} = {B ⊆ {x, y}^c} ∪ {B ∌ y, B ∋ x}. Therefore:

{B ⊆ {x}^c} = {B ⊆ {x, y}^c} ∪ {B = C ∪ {y}, C ⊆ {x, y}^c},
{B ⊆ {y}^c} = {B ⊆ {x, y}^c} ∪ {B = C ∪ {x}, C ⊆ {x, y}^c}.

Let us then define the following map between the events of the two collections (9.8):

{A ⊆ Θ, A ∋ x} → {A ⊆ Θ, A ∋ y},    A = B ∪ {x} ↦ A' = B' ∪ {y},        (9.9)

where

B ↦ B' = B                    if B ⊆ {x, y}^c;
B = C ∪ {y} ↦ B' = C ∪ {x}    if B ⊈ {x, y}^c.        (9.10)
We can prove that (9.9) preserves the length of the segments in the corresponding maximal simplices Cl(b_A, A ∋ x), Cl(b_A, A ∋ y). We first need to find an explicit expression for ||b_A − b_{A'}||, A, A' ⊆ Θ. Again, each categorical b.f. b_A is such that b_A(B) = 1 if B ⊇ A, and b_A(B) = 0 otherwise. If A' ⊇ A then b_{A'}(B) = 1 if B ⊇ A' ⊇ A, and b_{A'}(B) = 0 otherwise. Hence

(b_A − b_{A'})(B) ≠ 0 ⇔ (b_A − b_{A'})(B) = 1 ⇔ B ⊇ A, B ⊉ A',

and

||b_A − b_{A'}|| = √( |{B : B ⊇ A, B ⊉ A'}| ),

a quantity which depends only on the cardinalities |A| and |A' \ A|.
For each pair of vertices A_1 = B_1 ∪ {x}, A_2 = B_2 ∪ {x} in the first component we can distinguish four cases:

1. B_1 ⊆ {x, y}^c, B_2 ⊆ {x, y}^c, in which case B'_1 = B_1, B'_2 = B_2 and

|A'_2 \ A'_1| = |(B'_2 ∪ {y}) \ (B'_1 ∪ {y})| = |B'_2 \ B'_1| = |B_2 \ B_1| = |(B_2 ∪ {x}) \ (B_1 ∪ {x})| = |A_2 \ A_1|,

so that ||b_{A'_2} − b_{A'_1}|| = ||b_{A_2} − b_{A_1}||;

2. B_1 ⊆ {x, y}^c but B_2 ⊈ {x, y}^c, B_2 = C_2 ∪ {y}, in which case B'_1 = B_1, B'_2 = C_2 ∪ {x}, which implies:

A'_2 \ A'_1 = B'_2 \ B'_1 = (C_2 ∪ {x}) \ B_1 = (C_2 \ B_1) ∪ {x},
A_2 \ A_1 = B_2 \ B_1 = (C_2 ∪ {y}) \ B_1 = (C_2 \ B_1) ∪ {y}.

But then |A'_2 \ A'_1| = |A_2 \ A_1|, so that again ||b_{A'_2} − b_{A'_1}|| = ||b_{A_2} − b_{A_1}||;

3. B_1 ⊈ {x, y}^c, B_1 = C_1 ∪ {y}, but B_2 ⊆ {x, y}^c, which by the symmetry of the \ operator yields again ||b_{A'_2} − b_{A'_1}|| = ||b_{A_2} − b_{A_1}||, as in case 2;

4. B_1 ⊈ {x, y}^c, B_1 = C_1 ∪ {y}, B_2 ⊈ {x, y}^c, B_2 = C_2 ∪ {y}, in which case B'_1 = C_1 ∪ {x}, B'_2 = C_2 ∪ {x}, so that:

B'_2 \ B'_1 = (C_2 ∪ {x}) \ (C_1 ∪ {x}) = C_2 \ C_1 = (C_2 ∪ {y}) \ (C_1 ∪ {y}) = B_2 \ B_1.

In all cases ||b_{A'_2} − b_{A'_1}|| = ||b_{A_2} − b_{A_1}|| for pairs of segments Cl(A_1, A_2), Cl(A'_1, A'_2) in the two maximal components associated through the mapping (9.10) introduced above. By the usual generalization of a well-known theorem of Euclid's, this implies that the two simplices are congruent.
Part III

Geometric interplays of uncertainty measures


As we have seen in Chapter 4, the relation between belief and probability in the theory of evidence has been and continues to be an important subject of study. The reason is that a probability transform mapping belief functions to probability measures can be of use in various contexts: to mitigate the inherent exponential complexity of belief calculus (Section 4.4), to make decisions via the obtained probability distributions in a utility theory framework (Section 4.5), or to obtain pointwise estimates of quantities of interest from belief functions (e.g. the pose of an articulated object in computer vision: see [?], Chapter 8, or [280]).
Since both belief and probability measures can be assimilated to points of a Cartesian space (Chapter 6), the problem can be posed in a geometric setting too. In Part III we will apply the geometric approach introduced in Part II to the study of the problem of transforming an uncertainty measure (e.g. a belief function) into a different type of measure (e.g. a probability or a possibility).
10
The affine family of probability transforms
Without loss of generality, we can define a probability transform as a mapping PT :
B → P, from the belief space to the probability simplex, b ∈ B 7→ PT [b] ∈ P such
that an appropriate distance function or similarity measure d from b is minimized
[311]:
PT[b] = arg min_{p ∈ P} d(b, p)        (10.1)

(compare our review of dissimilarity measures in Section 4.8.2).


Note that such a definition requires the probability which results from the transform
to be compatible with the upper and lower bounds the original b.f. b enforces on the
singletons only, and not on all the focal sets as in Equation (3.10). This is a mini-
mal, sensible constraint which does not require probability transforms to adhere to
the upper-lower probability semantics of belief functions (cfr. Chapter 3). As a mat-
ter of fact, important such transforms are not compatible with such semantics, as we
will see here.
In particular, approximation and conditioning approaches explicitly based on tradi-
tional Lp norms are proposed in Chapters 12, 13 and 15.
Here, however, we wish to pursue a wider understanding of the geometry of a num-
ber of probability transforms, and their classification into families composed of
transforms which exhibit common properties.
Indeed, the study of the link between belief functions and probabilities has been posed in a geometric setting by only a few authors in the past.
In particular, Ha and Haddawy [567] have proposed an ‘affine operator’ which can
be considered a generalization of both belief functions and interval probabilities,


and can be used as a tool for constructing convex sets of probability distributions.
Uncertainty is modeled as sets of probabilities represented as ‘affine trees’, while ac-
tions (modifications of the uncertain state) are defined as tree manipulators. A small
number of properties of the affine operator are also presented. In a later work [566]
they presented the interval generalization of the probability cross-product operator,
called convex-closure (cc) operator. They analyzed the properties of the cc-operator
relative to manipulations of sets of probabilities, and presented interval versions of
Bayesian propagation algorithms based on it. Probability intervals were represented
in a computationally efficient fashion, by means of a data structure called pcc-tree,
in which branches are annotated with intervals, and nodes with convex sets of prob-
abilities.
The topic of this chapter is somewhat related to Ha's cc operator, as we deal here with probability transforms which commute (at least under certain conditions) with affine combination. We call this group of transforms the 'affine family', of
which Smets’ pignistic transform (Section 4.4.2) is the foremost representative. We
introduce two new probability transformations of belief functions, both of them de-
rived from purely geometric considerations, that can be grouped with the pignistic
function in the affine family.

Chapter outline

As usual, we first look for insight by considering the simplest case of a binary frame
(Section 10.1). Each belief function b is associated there with three different geo-
metric entities, namely: the simplex of consistent probabilities P[b] = {p ∈ P :
p(A) ≥ b(A) ∀A ⊂ Θ} (see Chapter 4, Section 3.1.4, and [169]); the line (b, plb )
joining b with the related plausibility function plb ; and the orthogonal complement
P ⊥ of the probabilistic subspace P. These in turn determine three different proba-
bilities associated with b, i.e. the barycenter of P[b] or pignistic function BetP [b],
the intersection probability p[b], and the orthogonal projection π[b] of b onto P. In
the binary case all these Bayesian belief functions coincide.
In Section 10.2 we prove that, even though the (‘dual’) line (b, plb ) is always
orthogonal to P, it does not intersect in general the Bayesian simplex. However, it
does intersect the region of Bayesian normalized sum functions (compare Chapter
6, Section 6.3.3), i.e., the generalizations of belief functions obtained by relaxing
the positivity constraint for masses. This intersection yields a Bayesian n.s.f. ς[b].
We later see, in Section 10.3, that ς[b] is in turn associated with a proper Bayesian
belief function p[b], which we call intersection probability. We provide two differ-
ent interpretations of the way this probability distributes the masses of the focal ele-
ments of b to the elements of Θ, both functions of the difference between plausibility
and belief of singletons, and compare the combinatorial and geometric behavior of
p[b] with those of pignistic function and relative plausibility of singletons.
Section 10.4 concerns the study of the orthogonal projection of b onto the prob-
ability simplex P, i.e., the transform (10.1) associated with the classical d = L2
distance. We show that π[b] always exists and is indeed a probability function. After
deriving the condition under which a belief function b is orthogonal to P we give
10.1 Affine transforms in the binary case 307

two equivalent expressions of the orthogonal projection. We see that π[b] can be
reduced to another probability signalling the distance of b from orthogonality, and
that this ‘orthogonality flag’ can be in turn interpreted as the result of a mass redis-
tribution process analogous to that associated with the pignistic transform.
We prove that, just as BetP [b] does, π[b] commutes with the affine combination
operator, and can therefore be expressed as a convex combination of basis pignistic
functions, making orthogonal projection and pignistic function fellow members of
a common ‘affine family’ of probability transformations.
For sake of completeness, the case of unnormalized belief functions (see Section
10.5) is also discussed. We argue that, while the intersection probability p[b] is not
defined for a generic u.b.f. b, the orthogonal projection π[b] does exist and retains
its properties.
Finally, in Section 10.6 more general conditions under which the three affine
transformations coincide are analysed.

10.1 Affine transforms in the binary case

In Chapter 8, Section 8.3.3 (Figure 10.1) we extensively illustrated the geometry of


belief functions living on a binary frame Θ2 = {x, y}. There, both belief B and
plausibility PL spaces are simplices with vertices {bΘ = [0, 0]0 , bx = [1, 0]0 , by =
[0, 1]0 } and {plΘ = [1, 1]0 , plx = bx , ply = by }, respectively.
Let us first recall the definition (8.1) of basic plausibility assignment (b.pl.a., the
Moebius transform of the plausibility function, see Chapter 8, Section 8.1), and its
expression (8.3) in terms of the basic probability assignment:
X
µb (A) = (−1)|A|+1 mb (B),
B⊇A

whenever A is non-empty. We can then compute the b.pl.a. of an arbitrary belief


function b on Θ2 as:
X
µb (x) = (−1)2 mb (B) = mb (x) + mb (Θ) = plb (x),
B3x
X
µb (y) = (−1)2 mb (B) = mb (y) + mb (Θ) = plb (y).
B3y

Thus, the point of R2 which represents its (dual) plausibility function is simply (see
Figure 10.1 again):
plb = plb (x)bx + plb (y)by .
As it was first noticed in Chapter 8, belief and plausibility spaces lie in symmetric
locations with respect to the Bayesian simplex P = Cl(bx , by ). Furthermore, each
pair of measures (b, plb ) determines a line orthogonal to P, where b and plb lie on
symmetric positions on the two sides of P itself.
308 10 The affine family of probability transforms

P'
plΘ=[1,1]'
by=[0,1]'=pl y

PL

P
B
pl b
1−m b(x)
P[b]
~
pl b
p[b]=π[b]=BetP[b]
~
b
m b(y) b
bx =[1,0]'=plx
bΘ=[0,0]' m b(x) 1−m b(y)

Fig. 10.1. In a binary frame Θ2 = {x, y} a belief function b and the corresponding plau-
sibility function plb are always located in symmetric positions with respect to the segment
˜ and belief
P of all the probabilities defined on Θ2 . The associated relative plausibility pl b
b̃ of singletons are just the intersections of the probability simplex P with the line passing
through plb and bΘ = [0, 0]0 and that joining b and bΘ , respectively. Pignistic function, or-
thogonal projection and intersection probability all coincide with the center of the segment
of probabilities P[b] which dominate b (in red).

Clearly, in the binary case the set P[b] = {p ∈ P : p(A) ≥ b(A) ∀A ⊆ Θ} of


all the probabilities dominating b is a segment whose center of mass P[b] is known
[169, 437, 284] to be Smets’ pignistic function [1232, 1276]:
X X mb (A)  mb (Θ)   mb (Θ) 
BetP [b] = bx = bx mb (x) + + by mb (y) + ,
|A| 2 2
x∈Θ A⊃x
(10.2)
and coincides with both the orthogonal projection π[b] of b onto P, and the intersec-
tion p[b] of the line a(b, plb ) with the Bayesian simplex P:

p[b] = π[b] = BetP [b] = P[b].

Inherently epistemic notions such as ‘consistency’ and ‘linearity’ (one of the ratio-
nality principles behind the pignistic transform [1223]) seem to be related to geo-
metric properties such as orthogonality. It is natural to wonder whether this is true
in general, or is just an artifact of the binary frame.
10.2 Geometry of the dual line 309

Incidentally, both the relative plausibility (4.16) and the relative belief (4.18) of
singletons (see Chapter 4, Section 4.4.2 or the original paper [1358]), do not follow
the same pattern. We will consider their behavior separately in Chapter 11.

10.2 Geometry of the dual line

In the binary case the plane R2 in which both B and PL are embedded is the space
of all the normalized sum functions (n.s.f.s) on Θ2 (compare Section 6.3.3). The
region P 0 of all the Bayesian n.s.f.s one can define on Θ = {x, y} is the line:
n o
P 0 = ς ∈ R2 : mς (x) + mς (y) = 1 = a(P),

i.e., the affine space a(P) = a(bx , x ∈ Θ) generated by P 1 .

10.2.1 Orthogonality of the dual line

We first note that P 0 can be written as the translated version of a vector space as
follows:
a(P) = bx + span(by − bx , ∀y ∈ Θ, y 6= x),
where span(by − bx ) denotes the vector space generated by the n − 1 difference
vectors by − bx (n = |Θ|), and bx is the categorical belief function focussed on x
(Section 6.2). Since a categorial b.f. bB focussed on a specific subset B is such that:

1A⊇B
bB (A) = (10.3)
0 otherwise,

these vectors show a rather peculiar symmetry, namely:



1 A ⊃ {y}, A 6⊃ {x}
by − bx (A) = 0 A ⊃ {x}, {y} or A 6⊃ {x}, {y} (10.4)
−1 A 6⊃ {y}, A ⊃ {x}.

The latter can be exploited to prove the following Lemma.


Lemma 9. [by − bx ](Ac ) = −[by − bx ](A) ∀A ⊆ Θ.
1
Here a(v1 , .., vk ) denotes the affine subspace of a Cartesian space Rm generated by a
collection of points v1 , ..., vk ∈ Rm , i.e. the set
.
n X o
a(v1 , .., vk ) = v ∈ Rm : v = α1 v1 + · · · + αk vk , αi = 1 .
i
310 10 The affine family of probability transforms

Proof. By Equation (10.3) [by − bx ](A) = 1 implies

A ⊃ {y}, A 6⊃ {x} ⇒ Ac ⊃ {x}, Ac 6⊃ {y} ⇒ [by − bx ](Ac ) = −1

and vice-versa. On the other hand, [by − bx ](A) = 0 implies A ⊃ {y}, A ⊃ {x}
or A 6⊃ {y}, A 6⊃ {x}. In the first case Ac 6⊃ {x}, {y}, in the second one Ac ⊃
{x}, {y}. Regardless, [by − bx ](Ac ) = 0.

Lemma 9 allows us to prove that, just as in the binary case (see chapter Appendix):
Theorem 42. The line connecting plb and b is orthogonal to the affine space gen-
erated by the probabilistic simplex. Namely:

a(b − plb )⊥a(P).

10.2.2 Intersection with the region of Bayesian normalized sum functions

One might be tempted to conclude that, since a(b, plb ) and P are always orthogonal,
their intersection is the orthogonal projection of b onto P as in the binary case.
Unfortunately, this is not the case for in general they do not intersect each other.
As a matter of fact b and plb belong to a N − 2 = (2n − 2)-dimensional Euclidean
space (recall that we neglect the trivially constant components associated with the
empty set ∅ and the entire frame Θ, Chapter 6), while the simplex P generates a
vector space whose dimension is only n − 1. If n = 2, n − 1 = 1 and 2n − 2 = 2
so that a(P) divides the plane into two half-planes with b on one side and plb on the
other side (see Figure 10.1 again).
Formally, for a point on the line a(b, plb ) to be a probability measure we need
to find a value of α such that b + α(plb − b) ∈ P. Its components are obviously
b(A) + α[plb (A) − b(A)] for any subset A ⊂ Θ, A 6= Θ, ∅. In particular, when
A = {x} is a singleton:

b(x) + α[plb (x) − b(x)] = b(x) + α[1 − b(xc ) − b(x)]. (10.5)

In order for this point to belong to P, it needs to meet the normalization constraint
for singletons, namely:
X X
b(x) + α [1 − b(xc ) − b(x)] = 1.
x∈Θ x∈Θ

The latter yields a single candidate value β[b] for the line coordinate of the desired
intersection, more precisely:
P
1 − x∈Θ b(x) .
α= P c ) − b(x)]
= β[b]. (10.6)
x∈Θ [1 − b(x

Using the terminology of Section 6.3.3, the candidate projection


.
ς[b] = b + β[b](plb − b) = a(b, plb ) ∩ P 0 (10.7)
10.3 The intersection probability 311

(where P 0 denotes once again the set of all Bayesian normalized sum functions in
RN −2 ) is a Bayesian n.s.f., but is not guaranteed to be a Bayesian
Pbelief function.
For normalized sum functions, the normalization condition x∈Θ mς (x) = 1
implies |A|>1 mς (A) = 0, so that P 0 can be written as:
P

 
 X X X 
P0 = ς= mς (A)bA ∈ RN −2 : mς (A) = 1, mς (A) = 0 .
 
A⊂Θ |A|=1 |A|>1
(10.8)

Theorem 43. The coordinates of ς[b] in the reference frame of the categorical
Bayesian belief functions {bx , x ∈ Θ} can be expressed in terms of the basic prob-
ability assignment mb of b as follows:
X
mς[b] (x) = mb (x) + β[b] mb (A), (10.9)
A)x

where:
P
mb (B)
P
1− mb (x)
β[b] = P x∈Θ
 = P |B|>1 . (10.10)
x∈Θ plb (x) − mb (x) |B|>1 mb (B)|B|

Equation (10.9) ensures that mς[b] (x) is positive for each x ∈ Θ. A more
symmetrical-looking version of (10.9) can be obtained after realizing that
P
|B|=1 mb (B)
P = 1,
|B|=1 mb (B)|B|

so that we can write:


P P
|B|=1 mb (B) |B|>1 mb (B)
mς[b] (x) = b(x) P + [plb − b](x) P . (10.11)
|B|=1 mB (B)|B| |B|>1 mb (B)|B|

It is easy to prove that the line a(b, plb ) intersects the actual probability simplex
only for 2-additive belief functions (check the chapter’s Appendix as usual).
Theorem 44. The Bayesian normalised sum function ς[b] is a probability measure,
ς[b] ∈ P, if and only if b is 2-additive, i.e., mb (A) = 0 |A| > 2. In the latter case
plb is the reflection of b through P.
For 2-additive belief functions ς[b] is nothing but the mean probability function
b+plb
2 . In the general case, however, the reflection of b through P not only does not
coincide with plb , but it is not even a plausibility function [271].
312 10 The affine family of probability transforms

plb

ς[b]
p[b]
a(b,plb)
π[b]

P
b

a(P )
P'

Fig. 10.2. The geometry of the line a(b, plb ) and the relative locations of p[b], ς[b] and π[b]
for a frame of discernment of arbitrary size. Each belief function b and the related plausibil-
ity function plb lie on opposite sides of the hyperplane P 0 of all Bayesian normalised sum
functions, which divides the space RN −2 of all n.s.f.s into two halves. The line a(b, plb )
connecting them always intersects P 0 , but not necessarily a(P) (vertical line). This intersec-
tion ς[b] is naturally associated with a probability p[b] (in general distinct from the orthogonal
projection π[b] of b onto P), having the same components in the base {bx , x ∈ Θ} of a(P).
P is a simplex (a segment in the figure) in a(P): π[b] and p[b] are both “true” probabilities.

10.3 The intersection probability


Summarizing, although the dual line a(b, plb ) is always orthogonal to P, it does not
intersect the probabilistic subspace in general, while it does intersect the region of
Bayesian normalized sum P functions in a point which we denoted by ς[b] (10.7).
But of course, since x mς[b] (x) = 1, ς[b] is naturally associated with a
Bayesian belief function, assigning an equal amount of mass to each singleton and
0 to each A : |A| > 1.
Namely, we can define the probability measure
. X
p[b] = mς[b] (x)bx , (10.12)
x∈Θ
10.3 The intersection probability 313

where mς[b] (x) is given by Equation (10.9). Trivially p[b] is a probability measure,
since by definition mp[b] (A) = 0 for |A| > 1, mp[b] (x) = mς[b] (x) ≥ 0 ∀x ∈ Θ,
and by construction:
X X
mp[b] (x) = mς[b] (x) = 1
x∈Θ x∈Θ

We call p[b] the intersection probability associated with b.


The relative geometry of ς[b] and p[b] with respect to the regions of Bayesian belief
and normalised sum functions, respectively, is outlined in Figure 10.2.

10.3.1 Interpretations of the intersection probability

Non-Bayesianity flag A first interpretation of this probability transform follows


from noticing that:
P
1 − x∈Θ mb (x) 1 − kb̃
β[b] = P P = ,
x∈Θ plb (x) − x∈Θ mb (x) ˜ − kb̃
kpl b

where X X X
kb̃ = mb (x) kpl
˜ =
b
plb (x) = mb (A)|A|
x∈Θ x∈Θ A⊂Θ

are the total mass (belief) of singletons and the total plausibility of singletons, re-
spectively (equivalently, the normalization factors for relative belief b̃ and relative
˜ , respectively). Consequently the intersection probability p[b] can be
plausibility pl b
rewritten as:
plb (x) − mb (x)
p[b](x) = mb (x) + (1 − kb̃ ) . (10.13)
˜ − kb̃
kpl b

When b is Bayesian, plb (x) − mb (x) = 0 ∀x ∈ Θ. If b is not Bayesian, there exists


at least a singleton x such that plb (x) − mb (x) > 0.
The Bayesian belief function
P
. A)x mb (A) plb (x) − mb (x)
R[b](x) = P =P 
|A|>1 mb (A)|A| y∈Θ plb (y) − mb (y)

thus measures the relative contribution of each singleton x to the non-Bayesianity of


b. Equation (10.13) shows indeed that the non-Bayesian mass 1 − kb̃ of the original
belief function b is re-assigned by p[b] to each singleton according to its relative
contribution R[b](x) to the non-Bayesianity of b.

Intersection probability and epistemic transforms Clearly, from Equation (10.13)


the flag probability R[b] also relates the intersection probability p[b] to other two
classical Bayesian approximations, the relative plausibility pl˜ and belief b̃ of sin-
b
gletons (cfr. Chapter 4, Section 4.4.2). We just need to rewrite (10.13) as:

p[b] = kb̃ b̃ + (1 − kb̃ )R[b]. (10.14)


314 10 The affine family of probability transforms
P
Since kb̃ = x∈Θ mb (x) ≤ 1, Equation (10.14) implies that the intersection prob-
ability p[b] belongs to the segment linking the flag probability R[b] to the relative
belief of singletons b̃. Its convex coordinate on this segment is the total mass of
singletons kb̃ .
˜ can also be written in terms of b̃ and R[b]
The relative plausibility function pl b
as, by definition (4.16):
plb (x) − mb (x) plb (x) mb (x)
R[b](x) = = −
˜ − kb̃
kpl ˜ − kb̃
kpl ˜ − kb̃
kpl
b b b
kpl˜ k
˜
= plb (x) b
− b̃(x) b̃
,
˜ − kb̃
kpl ˜ − kb̃
kpl
b b

˜ (x) = plb (x)/k ˜ and b̃(x) = mb (x)/k . Therefore:


since pl b plb b̃
! !
˜ = k b̃ k b̃
pl b b̃ + 1 − R[b]. (10.15)
kpl˜
b
kpl˜
b

In conclusion, both the relative plausibility of singletons pl˜ and the intersec-
b
tion probability p[b] belong to the segment Cl(R[b], b̃) joining relative belief b̃ and
probability flag R[b] (see Figure 10.3). The convex coordinate of pl ˜ in Cl(R[b], b̃)
b
(Equation (10.15)) measures the ratio between total mass and plausibility of single-
tons, while that of b̃ measures
P the total mass of singletons kb̃ .
However, since kpl ˜
b
= A⊂Θ mb (A)|A| ≥ 1, we have that kb̃ /kpl ˜ ≤ kb̃ : hence
b
˜ (Figure 10.3 again).
p[b] is closer to R[b] than the relative plausibility function pl b

~ ~
plb b

p[b]
R[b]
P

Fig. 10.3. Location in the probability simplex P of intersection probability p[b] and relative
˜ with respect to the non-Bayesianity flag R[b]. They both lie on
plausibility of singletons pl b
˜ is closer to b̃ than p[b].
the segment joining R[b] and the relative belief of singletons b̃, but pl b

Obviously when kb̃ = 0 (the relative belief of singletons b̃ does not exists, for b
assigns no mass to singletons) the remaining probability approximations coincide:
˜ = R[b] by Equation (10.13).
p[b] = pl b

Meaning of the ratio β[b] and pignistic function To shed more light on p[b] and
get an alternative interpretation of the intersection probability it is useful to compare
p[b] as expressed in Equation (10.13) with the pignistic function:
10.3 The intersection probability 315

. X mb (A) X mb (A)
BetP [b](x) = = mb (x) + .
|A| |A|
A⊃x A⊃x,A6=x

We can notice that in BetP [b] the mass of each event A, |A| > 1 is considered
separately, and its mass mb (A)
P is equally shared among the elements of A. In p[b],
instead, it is the total mass |A|>1 mb (A) = 1 − kb̃ of non-singletons which is
considered, and this total mass is distributed proportionally to their non-Bayesian
contribution to each element of Θ.
How should β[b] be interpreted then? If we write p[b](x) as

p[b](x) = mb (x) + β[b](plb (x) − mb (x)) (10.16)

we can observe that a fraction measured by β[b] of its non-Bayesian contribution


plb (x) − mb (x) is uniformly assigned to each singleton. This leads to another par-
allelism between p[b] and BetP [b]. It suffices to note that, if |A| > 1,
P
|B|>1 mb (B) 1
β[bA ] = P =
|B|>1 m b (B)|B| |A|

so that both p[b](x) and BetP [b](x) assume the form


X
mb (x) + mb (A)βA ,
A⊃x,A6=x

where βA = const = β[b] for p[b], while βA = β[bA ] in case of the pignistic
function.
Under what conditions intersection probability and pignistic function coincide?
A sufficient condition can be easily given for a special class of belief functions.
Theorem 45. Intersection probability and pignistic function coincide for a given
belief function b whenever the focal elements of b have size 1 or k only.

Proof. The desired equality p[b] = BetP [b] is equivalent to:


X X mb (A)
mb (x) + mb (A)β[b] = mb (x) +
|A|
A)x A)x

which in turn reduces to:


X X mb (A)
mb (A)β[b] = .
|A|
A)x A)x

If ∃ k : mb (A) = 0 for |A| =


6 k, |A| > 1 then β[b] = 1/k and the equality is met.

In particular this is true when b is 2-additive.


316 10 The affine family of probability transforms

10.3.2 Example of intersection probability

Let us briefly discuss these two interpretations of p[b] in a simple example. Consider
a ternary frame Θ = {x, y, z}, and a belief function b with b.p.a.:

mb (x) = 0.1, mb (y) = 0, mb (z) = 0.2,


(10.17)
mb ({x, y}) = 0.3, mb ({x, z}) = 0.1, mb ({y, z}) = 0, mb (Θ) = 0.3.

The related basic plausibility assignment is, according to Equation (8.1):


X
µb (x) = (−1)|x|+1 mb (B)
B⊇{x}
= mb (x) + mb ({x, y}) + mb ({x, z}) + mb (Θ) = 0.8,

µb (y) = 0.6, µb (z) = 0.6, µb ({x, y}) = −0.6,

µb ({x, z}) = −0.4, µb ({y, z}) = −0.3, µb (Θ) = +0.3.

Θ Θ

x z x z
y y

x z
y

Fig. 10.4. Sign of non-zero masses assigned to events by the functions discussed in the exam-
ple. Top left: b.p.a. of the belief function (10.17), with 5 focal elements. Right: the associated
b.pl.a. assigns positive masses to all events of size 1 and 3, negative ones to all events of size
2. This is the case for the mass assignment associated with ς (10.9) too. Bottom: the intersec-
tion probability p[b] (10.12) retains, among the latter, only the masses assigned to singletons.

Figure 3-top depicts the subsets of Θ with non-zero b.p.a. (left) and b.pl.a. (right)
induced by the belief function (10.17): dashed lines indicate a negative mass. The
total mass (10.17) accords to singletons is kb̃ = 0.1 + 0 + 0.2 = 0.3 - therefore, the
line coordinate β[b] of the intersection ς[b] of the line a(b, plb ) with P 0 is equal to:
1 − kb̃ 0.7
β[b] = = .
mb ({x, y})|{x, y}| + mb ({x, z})|{x, z}| + mb (Θ)|Θ| 1.7
10.3 The intersection probability 317

By Equation (10.9) the mass assignment of ς[b] is:


0.7
mς[b] (x) = mb (x) + β[b](µb (x) − mb (x)) = 0.1 + 0.7 · = 0.388,
1.7
0.7 0.7
mς[b] (y) = 0 + 0.6 · = 0.247, mς[b] (z) = 0.2 + 0.4 · = 0.365,
1.7 1.7
0.7
mς[b] ({x, y}) = 0.3 − 0.9 · = −0.071,
1.7
mς[b] ({x, z}) = 0.1 − 0.5 · 0.7/1.7 = −0.106,
0.7 0.7
mς[b] ({y, z}) = 0 − 0.3 · = −0.123, mς[b] (Θ) = 0.3 + 0 · = 0.3.
1.7 1.7
We can verify that all singleton masses are indeed non negative and sum to one,
while the masses of the non-singletons events sum up to zero:
−0.071 − 0.106 − 0.123 + 0.3 = 0,
confirming that ς[b] is a Bayesian normalized sum function. Its mass assignment has
signs which are still described by Figure 3-top-right although, as mς[b] is a weighted
average of mb and µb , its mass values are closer to zero.
In order to compare it with the intersection probability we need to recall Equa-
tion (10.13): the non-Bayesian contributions of x, y, z are respectively
plb (x) − mb (x) = mb (Θ) + mb ({x, y}) + mb ({x, z}) = 0.7,
plb (y) − mb (y) = mb ({x, y}) + mb (Θ) = 0.6,
plb (z) − mb (z) = mb ({x, z}) + mb (Θ) = 0.4,
so that the non-Bayesian flag is R(x) = 0.7/1.7, R(y) = 0.6/1.7, R(z) = 0.4/1.7.
For each singleton s the intersection probability value results from adding to the
original b.p.a. mb (s) a share of the mass of the non-singletons events 1 − kb̃ = 0.7
proportional to the value of R(s) (see Figure 3-bottom):
p[b](x) = mb (x) + (1 − kb̃ )R(x) = 0.1 + 0.7 ∗ 0.7/1.7 = 0.388,
p[b](y) = mb (y) + (1 − kb̃ )R(y) = 0 + 0.7 ∗ 0.6/1.7 = 0.247,
p[b](z) = mb (z) + (1 − kb̃ )R(z) = 0.2 + 0.7 ∗ 0.4/1.7 = 0.365.
We can see that p[b] coincides with the restriction of ς[b] to singletons.
Equivalently, β[b] measures the share of plb (x) − mb (x) assigned to each singleton:
p[b](x) = mb (x) + β[b](plb (x) − mb (x)) = 0.1 + 0.7/1.7 ∗ 0.7,
p[b](y) = mb (y) + β[b](plb (y) − mb (y)) = 0 + 0.7/1.7 ∗ 0.6,
p[b](z) = mb (z) + β[b](plb (z) − mb (z)) = 0.2 + 0.7/1.7 ∗ 0.4.

10.3.3 Intersection probability and affine combination


We have seen that p[b] and BetP [b] are closely related probability transforms, linked
by the role of the quantity β[b]. It is natural to wonder whether p[b] exhibits a similar
behavior with respect to the convex closure operator (cfr. Chapter 4, Equation 4.15).
Indeed, although the situation is a bit more complex in this second case, p[b] turns
also out to be related to Cl(.) in a rather elegant way.
Let us introduce the notation β[bi ] = Ni /Di .
318 10 The affine family of probability transforms

Theorem 46. Given two arbitrary belief functions b1 , b2 defined on the same frame
of discernment, the intersection probability of their affine combination α1 b1 + α2 b2
is, for any α1 ∈ [0, 1], α2 = 1 − α1 :
 
p[α1 b1 + α2 b2 ] = α
\ 1 D1 α1 p[b1 ] + α2 T [b1 , b2 ] + α2 D2 α1 T [b1 , b2 ]) + α2 p[b2 ] ,
\
(10.18)
αi Di
where α [ D
i i = α1 D1 +α2 D2 , T [b ,
1 2b ] is the probability with values:
.
T [b1 , b2 ](x) = D̂1 p[b2 , b1 ] + D̂2 p[b1 , b2 ], (10.19)
. Di
with D̂i = D1 +D2 and:
. 
p[b2 , b1 ](x) = mb2 (x) + β[b1 ] plb2 (x) − mb2 (x),
. (10.20)
p[b1 , b2 ](x) = mb1 (x) + β[b2 ] plb1 (x) − mb1 (x) .
Geometrically, p[α1 b1 + α2 b2 ] can be constructed as in Figure 4 as a point of the
simplex Cl(T [b1 , b2 ], p[b1 ], p[b2 ]). The point α1 T [b1 , b2 ] + α2 p[b2 ] is the intersec-
tion of the segment Cl(T, p[b2 ]) with the line l2 passing through α1 p[b1 ] + α2 p[b2 ]
and parallel to Cl(T, p[b1 ]). Dually, α2 T [b1 , b2 ] + α1 p[b1 ] is the intersection of the
segment Cl(T, p[b1 ]) with the line l1 passing through α1 p[b1 ] + α2 p[b2 ] and parallel
to Cl(T, p[b2 ]). p[α1 b1 + α2 b2 ] is finally the point of the segment
Cl(α1 T + α2 p[b2 ], α2 T + α1 p[b1 ])
with convex coordinate α
\1 D1 (or equivalently α2 D2 ).
\

Location of T [b1 , b2 ] in the binary case As an example, let us consider the loca-
tion of T [b1 , b2 ] in the binary belief space B2 (Figure 10.6), where
mbi (Θ)
β[b1 ] = β[b2 ] = = 1/2
2mbi (Θ)
∀b1 , b2 ∈ B2 and p[b] always commutes with the convex closure operator.
Accordingly,
mb1 (Θ) h mb2 (Θ) i mb2 (Θ)
T [b1 , b2 ](x) = mb2 (x) + + ·
mb1 (Θ) + mb2 (Θ) 2 mb1 (Θ) + mb2 (Θ)
h
m (Θ)
i mb1 (Θ) mb2 (Θ)
· mb1 (x) + b12 = p[b2 ] + p[b1 ].
mb1 (Θ) + mb2 (Θ) mb1 (Θ) + mb2 (Θ)
Looking at Figure 10.6, simple trigonometric considerations show that the segment
Cl(p[bi ], T [b1 , b2 ]) has length √m2itan
(Θ)
φ
, where φ is the angle between the segments
Cl(bi , T ) and Cl(p[bi ], T ).
T [b1 , b2 ] is then the unique point of P such that the angles b1\ T p[b1 ] and b2\
T p[b2 ]
coincide, i.e., T is the intersection of P with the line passing through bi and the
reflection of bj through P.
As this reflection (in B2 ) is nothing but plbj :
T [b1 , b2 ] = Cl(b1 , plb2 ) ∩ P = Cl(b2 , plb1 ) ∩ P.
10.3 The intersection probability 319

Τ[b1,b2]
α2
α1T+α 2 p[b2]

^
α1D1 α1
p[b2]
α1
p[α1b1+α2b2]
^
α2D2 α1

α2T+α 1 p[b1]
α1 p[b1] + α2 p[b2]
α2
α2 l1
p[b1] l2

Fig. 10.5. Behavior of the intersection probability p[b] under affine combination. α2 T +
α1 p[b1 ] and α1 T + α2 p[b2 ] lie on inverted locations on the segments joining T [b1 , b2 ] and
p[b1 ], p[b2 ] respectively: αi p[bi ] + αj T is the intersection of the line Cl(T, p[bi ]) with the
parallel to Cl(T, p[bj ]) passing through α1 p[b1 ] + α2 p[b2 ]. The quantity p[α1 b1 + α2 b2 ] is
finally the point of the segment joining them with convex coordinate α[ i Di .

by=[0,1]'
1

0.9
plb
2

0.8
p[b2 ]
0.7

0.6
b2
0.5
m2(Θ) Τ[b1 ,b2 ]
0.4 φ
0.3 pl b1

0.2 p[b1 ]
0.1
b1
0 bx =[1,0]'
0 0.2 0.4 0.6 0.8 1
m1 (Θ)

Fig. 10.6. Location of the probability function T [b1 , b2 ] in the binary belief space B2 .

10.3.4 Intersection probability and convex closure


Although the intersection probability does not commute with affine combination
(Theorem 46), p[b] can still be assimilated to orthogonal projection and pignistic
320 10 The affine family of probability transforms

function. Theorem 47 states the conditions under which p[b] and convex closure
(Cl) commute.

Theorem 47. Intersection probability and convex closure commute iff

T [b1 , b2 ] = D̂1 p[b2 ] + D̂2 p[b1 ]

or, equivalently, either β[b1 ] = β[b2 ] or R[b1 ] = R[b2 ].

Geometrically, only when the two lines l1 , l2 in Figure 10.5 are parallel to a(p[b1 ], p[b2 ])
(i.e. T [b1 , b2 ] ∈ Cl(p[b1 ], p[b2 ]), compare above) the desired quantity p[α1 b1 +
α2 b2 ] belongs to Cl(p[b1 ], p[b2 ]) (i.e., it is also a convex combination of p[b1 ] and
p[b2 ]).
Theorem 47 reflects the two complementary interpretations of p[b] we gave in
terms of β[b] and R[b] (Equations (10.13) and (10.16)):

p[b] = mb (x) + (1 − kb̃ )R[b](x), p[b] = mb (x) + β[b](plb (x) − mb (x)).

If β[b1 ] = β[b2 ] both belief functions assign to each singleton the same share of
their non-Bayesian contribution. If R[b1 ] = R[b2 ] the non-Bayesian mass 1 − kb̃ is
distributed in the same way to the elements of Θ.
A sufficient condition for the commutativity of p[.] and Cl(.) can be obtained
via the following decomposition of β[b]:
P Pn P
|B|>1 mb (B) k=2 |B|=k mb (B) σ2 + · · · + σn
β[b] = P = Pn P =
|B|>1 m b (B)|B| k=2 k · |B|=k m b (B) 2σ 2 + · · · + nσn
(10.21)
. P
where σk = |B|=k mb (B).

Theorem 48. If the ratio between the total mass of focal elements of different car-
dinality is the same for all the belief functions involved, namely:

σ1l σ2l
= ∀l, m ≥ 2 s.t. σ1m , σ2m 6= 0 (10.22)
σ1m σ2m

then intersection probability (considered as an operator mapping belief functions to


probabilities) and convex combination commute.

10.4 Orthogonal projection


Although the intersection of the line a(b, plb ) with the region P 0 of Bayesian n.s.f.s
is not always in P, an orthogonal projection π[b] of b onto a(P) is obviously guaran-
teed to exist as a(P) is nothing but a linear subspace in the space of normalized sum
functions (such as b). An explicit calculation of π[b], however, requires a description
of the orthogonal complement of a(P) in RN −2 .
10.4 Orthogonal projection 321

10.4.1 Orthogonality condition


P
We seek a necessary and sufficient condition for an arbitrary vector v = A⊂Θ vA xA
of RN 5 (where {xA }A∈2Θ is the usual orthonormal reference frame there) to be or-
thogonal to the probabilistic subspace a(P).
The scalar product between v and the generators by − bx of a(P ) is:
X  X
hv, by − bx i = vA xA , by − bx = vA [by − bx ](A)
A⊂Θ A⊂Θ

which, recalling Equation (10.4), becomes:


X X
hv, by − bx i = vA − vA .
A⊃y,A6⊃x A⊃x,A6⊃y

The orthogonal complement a(P)⊥ of a(P) can then be expressed as


 
 X X 
v(P)⊥ = v : vA = vA ∀y 6= x . (10.23)
 
A⊃y,A6⊃x A⊃x,A6⊃y

Lemma 10. A belief function b belongs to the orthogonal complement (10.23) of


a(P) iff:
X X
mb (B)21−|B| = mb (B)21−|B| ∀y 6= x. (10.24)
B⊃y,B6⊃x B⊃x,B6⊃y

By Lemma 10 we can prove that:

Theorem 49. The orthogonal projection π[b] of b onto a(P) can be expressed in
terms of the basic probability assignment mb of b in two equivalent forms:

1 − |A|21−|A|
X X  
1−|A|
π[b](x) = mb (A)2 + mb (A) (10.25)
n
A⊃x A⊂Θ

1 + |Ac |21−|A| 1 − |A|21−|A|


X   X  
π[b](x) = mb (A) + mb (A) .
n n
A⊃x A6⊃x
(10.26)

Equation (10.26) shows that π[b] is indeed a probability, since both 1+|Ac |21−|A| ≥
0 and 1 − |A|21−|A| ≥ 0 ∀|A| = 1, ..., n. This is not at all trivial, as π[b] is the
projection of b onto the affine space a(P), and could have in principle assigned
negative masses to one or more singletons.
This make the orthogonal projection a valid probability transform.
5
The proof is valid for A = Θ, ∅ too, see Section 10.5.
322 10 The affine family of probability transforms

10.4.2 Orthogonality flag

Theorem 49 does not provide any clear intuition about the meaning of π[b] in terms
of degrees of belief. In fact, if we process Equation (10.26) we can reduce it to a
new Bayesian b.f. strictly related to the pignistic function.
Theorem 50. The orthogonal projection of b onto P can be decomposed as:

π[b] = P(1 − kO [b]) + kO [b]O[b],

where P is the uniform probability and:


P 1−|A|
P mb (A)
Ō[b](x) A⊃x mb (A)2 A⊃x 2|A|
O[b](x) = =P 1−|A|
= mb (A)|A|
(10.27)
kO [b] A⊂Θ mb (A)|A|2
P
|A|
A⊂Θ 2

is a Bayesian belief function.


As 0 ≤ |A|21−|A| ≤ 1 for all A ⊂ Θ, kO [b] assumes values in the interval [0, 1].
By Theorem 50 then the orthogonal projection is always located on the line segment
Cl(P, O[b]) joining the uniform, non-informative probability on Θ to the Bayesian
belief function O[b].
The interpretation of O[b] becomes clear when noticing that condition (10.24)
(under which a b.f. b is orthogonal to a(P)) can be rewritten as:
X X
mb (B)21−|B| = mb (B)21−|B|
B⊃y B⊃x

which is in turn equivalent to O[b](x) = const = P for all singletons x ∈ Θ. There-


fore π[b] = P if and only if b⊥a(P), and O − P measures the non-orthogonality of
b with respect to P.
The Bayesian b.f. O[b] deserves then the name of orthogonality flag.

10.4.3 Two mass redistribution processes

A compelling link can be drawn between orthogonal projection and pignistic func-
tion via the orthogonality flag O[b].
Let us introduce the following two belief functions associated with b:

. 1 X mb (A) . 1 X mb (A)
b|| = bA , b2|| = bA ,
k|| |A|
A⊂Θ
k2||
A⊂Θ
2|A|

where k|| and k2|| are the normalization factors needed to make them admissible.

Theorem 51. O[b] is the relative plausibility of singletons of b2|| ; BetP [b] is the
relative plausibility of singletons of b|| .
10.4 Orthogonal projection 323

z m'(x) = m'(y) = m'(z) = 1/3 m(A)


x
y

m'( ) = m'(x) = m'(y) = m'(z) =


z = m'({x,y}) = m'({x,z}) =
x
y m'({y,z}) = m'(A) = 1/8 m(A)

Fig. 10.7. Redistribution processes associated with pignistic transformation and orthogonal
projection. In the pignistic transformation (top) the mass of each focal element A is dis-
tributed among its elements. In the orthogonal projection (bottom) instead (through the or-
thogonality flag), the mass of each f.e. A is divided among all its subsets B ⊂ A. In both
cases, the related relative plausibility of singletons yields a Bayesian belief function.

The two functions b|| and b2|| represent two different processes acting on b (see Fig-
ure 10.7). The first one equally redistributes the mass of each focal element among
its singletons (yielding directly the Bayesian belief function BetP [b]). The second
one equally redistributes the b.p.a. of each focal element A to its subsets B ⊂ A
(∅, A included). In this second case we get an unnormalized [1238] b.f. bU :
X mb (B)
mbU (A) = ,
B⊃A
2|B|

whose relative belief of singletons b˜U is the orthogonality flag O[b].

Example Let us consider again as an example the belief function b on the ternary
frame Θ = {x, y, z} considered in Section 10.3.2:

mb (x) = 0.1, mb (y) = 0, mb (z) = 0.2,


mb ({x, y}) = 0.3, mb ({x, z}) = 0.1, mb ({y, z}) = 0, mb (Θ) = 0.3.

To compute the orthogonality flag O[b] we need to apply the redistribution process
of Figure 10.7-bottom to each focal element of b. In this case their masses are di-
vided among their subsets as follows:
324 10 The affine family of probability transforms

m(x) = 0.1 7→ m0 (x) = m0 (∅) = 0.1/2 = 0.05

m(z) = 0.2 7→ m0 (z) = m0 (∅) = 0.2/2 = 0.1

m({x, y}) = 0.3 7→ m0 ({x, y}) = m0 (x) = m0 (y) = m0 (∅) = 0.3/4 = 0.075

m({x, z}) = 0.1 7→ m0 ({x, z}) = m0 (x) = m0 (z) = m0 (∅) = 0.1/4 = 0.025

m(Θ) = 0.3 7→ m0 (Θ) = m0 ({x, y}) = m0 ({x, z}) = m0 ({y, z}) =


= m0 (x) = m0 (y) = m0 (z) = m0 (∅) = 0.3/8 = 0.0375.

By summing up all the contributions related to singletons we get:

mbU (x) = 0.05 + 0.075 + 0.025 + 0.0375 = 0.1875,


mbU (y) = 0.075 + 0.0375 = 0.1125,
mbU (z) = 0.1 + 0.025 + 0.0375 = 0.1625,

whose sum is the normalization factor kO [b] = mbU (x) + mbU (y) + mbU (z) =
0.4625. After normalisation we get O[b] = [0.405, 0.243, 0.351]0 . The orthogonal
projection π[b] is finally the convex combination of O[b] and P = [1/3, 1/3, 1/3]0
with coefficient kO [b]:

π[b] = P(1 − kO [b]) + kO [b]O[b] = [1/3, 1/3, 1/3]0 (1 − 0.4625)+


+0.4625 [0.405, 0.243, 0.351]0 = [0.366, 0.291, 0.342]0 .

10.4.4 Orthogonal projection and affine combination

As strong additional evidence of their close relationship, orthogonal projection and


pignistic function both commute with affine combination.

Theorem 52. Orthogonal projection and affine combination commute. Namely, if


α1 + α2 = 1 then:

π[α1 b1 + α2 b2 ] = α1 π[b1 ] + α2 π[b2 ].

This property can be used to find an alternative expression of the orthogonal pro-
jection as a convex combination of the pignistic functions associated with all the
categorical belief functions.
Lemma 11. The orthogonal projection of a categorical belief function bA is:

π[bA ] = (1 − |A|21−|A| )P + |A|21−|A| P A ,


1
P
where P A = |A| x∈A bx is the center of mass of all the probability measures with
support in A.
10.4 Orthogonal projection 325

Proof. By Equation (10.27) kO [bA ] = |A|21−|A| , so that


 1−|A|  1
2 x∈A x∈A
Ō[bA ](x) = ⇒ O[bA ](x) = |A|
0 x 6∈ A 0 x∈6 A
1
P
i.e. O[bA ] = |A| x∈A bx = P A .

Theorem 53. The orthogonal projection can be expressed as a convex combination


of all non-informative probabilities with support on a single event A as
.
 X  X
π[b] = P 1 − αA + αA P A , αA = mb (A)|A|21−|A| . (10.28)
A6=Θ A6=Θ

10.4.5 Orthogonal projection and pignistic function


As P A = BetP [bA ] we can appreciate that:
X
BetP [b] = mb (A)BetP [bA ],
A⊂Θ  
X X
π[b] = αA BetP [bA ] + 1 − αA BetP [bΘ ], αA = mb (A)kO [bA ]
A6=Θ A6=Θ
(10.29)
both orthogonal projection and pignistic function are convex combinations of the
collection of categorical pignistic functions. However, as kO [bA ] = |A|21−|A| < 1
for |A| > 2, the orthogonal projection turns out to be closer (when compared to the
pignistic probability) to the vertices associated with events of lower cardinality (see
Figure 10.8).
Let us consider as an example the usual ternary frame Θ3 = {x, y, x}, and a
belief function defined there with focal elements:
mb (x) = 1/3, mb ({x, z}) = 1/3, mb (Θ3 ) = 1/3. (10.30)
According to Equation (10.28) we have:
π[b] = 1/3P {x} + 1/3P {x,z} + (1 − 1/3 − 1/3)P
1 1 bx + bz 1 bx + by + bz 11 1 5
= bx + + = bx + by + bz ,
3 3 2 3 3 18 9 18
and π[b] is the barycenter of the simplex Cl(P {x} , P {x,z} , P) (see Figure 10.9).
On the other hand:
BetP [b](x) = mb1(x) + mb (x,z)
2 + mb (Θ3 )
3 = 11
18 ,
BetP [b](y) = 19 ,
BetP [b](z) = 16 + 19 = 18
5
,
i.e., pignistic function and orthogonal projection coincide: BetP [b] = π[b].
Indeed this is true for each belief function b ∈ B3 defined on a ternary frame,
P by Equation
since P (10.29) when |Θ| = 3 αA = mb (A) for |A| ≤ 2, and 1 −
A Aα = 1 − A6=Θ mb (A) = mb (Θ).
326 10 The affine family of probability transforms

|A|>2

P[bA] = Cl(bx ,x A)∋

π[b]
_ BetP[b]
PA

|A|<3
_
P[bΘ ] = P PΘ

Fig. 10.8. Orthogonal projection π[b] and pignistic function BetP [b] both lie in the simplex
whose vertices are the categorical pignistic functions, i.e., the uniform probabilities with sup-
port on a single event A. However, as the convex coordinates of π[b] are weighted by a factor
kO [bA ] = |A|21−|A| , the orthogonal projection is relatively closer to vertices related to lower
size events.

_
by = P{y}

_ _
P{x,y} P{y,z}
_

BetP[b] = π[b]
_ _
bx = P{x} _ bz = P{z}
P{x,z}

Fig. 10.9. Orthogonal projection and pignistic function for the belief function (10.30) on the
ternary frame Θ3 = {x, y, z}.

10.5 The case of unnormalized belief functions


The above results have been obtained for ‘classical’ belief functions, where the mass
assigned to the empty set is 0: b(∅) = mb (∅) = 0. However, as discussed in Chap-
10.5 The case of unnormalized belief functions 327

ter 4, there are situations (‘open world’ scenarios) in which it makes sense to work
with unnormalized belief functions (u.b.f.) [1238], namely belief functions admit-
ting non-zero support mb (∅) 6= 0 for the empty set [1224]. The mass mb (∅) of
the empty set is an indicator of the amount of internal conflict carried by a belief
function b, but can also be interpreted as the chance that the existing frame of dis-
cernment does not exhaust all the possible outcomes of the problem.
Unnormalized b.f.s are naturally associated with vectors with N = 2|Θ| coordi-
nates. A coordinate frame of basis u.b.f.s can be defined as follows:
{bA ∈ RN , ∅ ⊆ A ⊆ Θ},
.
this time including a vector b∅ = [1 0 · · · 0]0 . Note also that in this case bΘ =
[0 · · · 0 1]0 is not the null vector.
It is natural to wonder whether the above definitions and properties of p[b] and
π[b] hold their validity. Let us consider again the binary case. We now have to use
four coordinates, associated with all the subsets of Θ: ∅, {x}, {y}, and Θ itself.
Remember that for unnormalised belief functions:
X
b(A) = mb (B) A 6= ∅,
∅(B⊆A

i.e., the contribution of the empty set is not considered when computing the belief
value of an event A 6= ∅ 6 . The four-dimensional vectors corresponding to basis
belief and plausibility functions, respectively, are therefore:
b∅ = [1, 0, 0, 0]0 , pl∅ = [0, 0, 0, 0]0 ,
bx = [0, 1, 0, 1]0 , plx = [0, 1, 0, 1]0 = bx ,
by = [0, 0, 1, 1]0 , ply = [0, 0, 1, 1]0 = by ,
bΘ = [0, 0, 0, 1]0 , plΘ = [0, 1, 1, 1]0 .
A striking difference with the ‘classical’ case is that b(Θ) = 1 − mb (∅) = plb (Θ)
which implies that both belief and plausibility spaces are not in general subsets of
the section {v ∈ RN : vΘ = 1} of RN . In other words, u.b.f.s and u.pl.f.s are not
normalized sum functions (n.s.f.s, Section 6.3.3).
As a consequence, the line a(b, plb ) is not guaranteed to intersect the affine space
P 0 of the Bayesian n.s.f.s.
Consider for instance the line connecting b∅ and pl∅ in the binary case:
α b∅ + (1 − α) pl∅ = α [1, 0, 0, 0]0 , α ∈ R.

As P 0 = [a, b, (1 − b), −a]0 , a, b ∈ R there clearly is no value α ∈ R such that




α · [1, 0, 0, 0]0 ∈ P 0 .
Simple calculations show that in fact a(b, plb ) ∩ P 0 6= ∅ iff b(∅) = 0 (i.e. b is
‘classical’) or (trivially) b ∈ P. This is true in the general case.
6
In the unnormalized case the notation b is usually reserved for implicability functions,
while belief functions are denoted by Bel [1223]. In this book however, as the notation
Bel would be impractical when used for vectors, we denote both belief measures and their
vectors by b.
328 10 The affine family of probability transforms

Proposition 40. The intersection probability is well defined for classical belief
functions only.
It is interesting to note that, however, the orthogonality results of Section 10.2.1 are
still valid since Lemma 9 does not involve the empty set, while the proof of Theorem
42 is valid for the components A = ∅, Θ too (as by − bx (A) = 0 for A = ∅, Θ).
Therefore:
Proposition 41. The dual line a(b, plb ) is orthogonal to P for each unnormalised
belief function b, although ς[b] = a(b, plb ) ∩ P 0 exists if and only if b is a classical
belief function.
Analogously, the orthogonality condition (10.24) is not affected by the mass of the
empty set. The orthogonal projection π[b] of a u.b.f. b is then well defined (check
Theorem 49’s proof), and it is still given by Equations (10.25),(10.26), with the
caveat that the summations on the right hand side include ∅ as well:

1 − |A|21−|A|
X X  
π[b](x) = mb (A)21−|A| + mb (A)
n
A⊃x ∅⊆A⊂Θ
c 1−|A|
1 − |A|21−|A|
   
X 1 + |A |2 X
π[b](x) = mb (A) + mb (A) .
n n
A⊃x ∅⊆A6⊃x

10.6 Comparisons within the affine family


In virtue of its neat relation with affine combination, the intersection probability can
be considered a member of a family of Bayesian transforms which also includes pig-
nistic function and orthogonal projection: we call it the ‘affine’ family of probability
transforms.
To complete our analysis we seek to identify conditions under which intersection
probability, orthogonal projections and pignistic function coincide. As a matter of
fact, preliminary sufficient conditions have been already devised [267].
Proposition 42. Intersection probability and orthogonal projection coincide if b is
2-additive, i.e. mb (A) = 0 for all A : |A| > 2.
A similar sufficient condition for the pair p[b], BetP [b] can be found by resorting to
the following decomposition of β[b]:
P Pn P
|B|>1 mb (B) k=2 |B|=k mb (B) σ2 + · · · + σn
β[b] = P = Pn P = ,
|B|>1 mb (B)|B| k=2 (k |B|=k mb (B)) 2 σ2 + · · · + n σn
(10.31)
. P
where as usual σ k = |B|=k mb (B).
Proposition 43. Intersection probability and pignistic function coincide if ∃ k ∈
[2, ..., n] such that σ i = 0 ∀ i 6= k, i.e. the focal elements of b have size 1 or k only.
10.6 Comparisons within the affine family 329

This is the case for binary frames, in which all belief functions meet the conditions
of both Proposition 42 and Proposition 43. As a result, p[b] = BetP [b] = π[b] for
all the b.f.s defined on Θ = {x, y} (see Figure 8.1 again).
More stringent conditions can however be formulated in terms of equal distribu-
tion of masses among focal elements.
Theorem 54. If a belief function b is such that its mass is equally distributed among
focal elements of the same size, namely ∀k = 2, ..., n:

mb (A) = const ∀A : |A| = k, (10.32)

then its pignistic and intersection probabilities coincide: BetP [b] = p[b].
Condition (10.32) is sufficient to guarantee the equality of intersection probabil-
ity and orthogonal projection too.
Theorem 55. If a belief function b meets condition (10.32) (i.e., its mass is equally
distributed among focal elements of the same size) then the related orthogonal pro-
jection and intersection probability coincide.
In the special case of a ternary frame π[b] = BetP [b] [267], so that checking
whether p[b] = BetP [b] is equivalent to check the analogous condition for the
pignistic function. One can prove that [267]:

Proposition 44. For belief functions b defined on a ternary frame, the Lp distance
kp[b] − BetP [b]kp between intersection probability and pignistic function in the
probability simplex has three maxima, corresponding to the three b.f.s with basic
probability assignment:
√ √ 0
mb1 = [0, 0, 0, 3 − 6, √ 0, 0, √6 − 2]0 ,
mb2 = [0, 0, 0, 0, 3 − 6, √0, √6 − 2] ,
mb3 = [0, 0, 0, 0, 0, 3 − 6, 6 − 2]0

regardless the norm p = 1, 2, ∞ chosen.

Proposition 44 opens the way to a more complete quantitative analysis of the differ-
ences between the intersection probability and the other Bayesian transforms of the
same family.

Chapter appendix: proofs

Proof of Theorem 42

Having denoted as usual by xA the A-th axis of the orthonormal reference frame
{xA : ∅ ( A ( Θ} in RN −2 (see Chapter 6, Section 6.1), we can write the
difference b − plb as:
330 10 The affine family of probability transforms
X
plb − b = [plb (A) − b(A)]xA ,
∅(A(Θ

where:
[plb − b](Ac ) = plb (Ac ) − b(Ac ) = 1 − b(A) − b(Ac )
(10.33)
= 1 − b(Ac ) − b(A) = plb (A) − b(A) = [plb − b](A).

The scalar product h·, ·i between the vector plb − b and any arbitrary basis vector
by − bx of a(P) is therefore:
X
hplb − b, by − bx i = [plb − b](A) · [by − bx ](A),
∅(A(Θ

which, by Equation (10.33), becomes:


X n o
[plb − b](A) [by − bx ](A) + [by − bx ](Ac ) .
|A|≤b|Θ|/2c,A6=∅

By Lemma 9 all the addenda in the above expression are nil.

Proof of Theorem 43
P
The numerator of Equation (10.6) is trivially |B|>1 m(B). On the other hand:
X X X
1 − b(xc ) − b(x) = mb (B) − mb (B) − mb (x) = mb (B),
B⊂Θ B⊂xc B⊃x,B6=x

so that the denominator of β[b] becomes:


X X
[plb (y) − b(y)] = (1 − b(y c ) − b(y))
y∈Θ y∈Θ
X X X
= mb (B) = mb (B)|B|,
y∈Θ B⊃y,B6=y |B|>1

yielding (10.10). Equation (10.9) comes directly from (10.5) when we recall that
b(x) = mb (x), ς(x) = mς (x) ∀x ∈ Θ.

Proof of Theorem 44

By definition (10.7) ς[b] reads in terms of the reference frame {bA , A ⊂ Θ} as:
X X X 
mb (A)bA + β[b] µb (A)bA − mb (A)bA =
A⊂Θ X  A⊂Θ A⊂Θ

= bA mb (A) + β[b](µb (A) − mb (A))
A⊂Θ
10.6 Comparisons within the affine family 331

since µb (.) is the Moebius inverse of plb (.). For ς[b] to be a Bayesian belief function,
accordingly, all the components related to non-singleton subsets need to be zero,
mb (A) + β[b](µb (A) − mb (A)) = 0 ∀A : |A| > 1.
This condition reduces to (after recalling expression (10.10) of β[b]):
X X
µb (A) mb (B) + mb (A) mb (B)(|B| − 1) = 0 ∀A : |A| > 1. (10.34)
|B|>1 |B|>1

But as we can write


X X X
mb (B)(|B| − 1) = mb (B) + mb (B)(|B| − 2)
|B|>1 |B|>1 |B|>2

expression (10.34) reads as:


X X
[µb (A) + mb (A)] mb (B) + mb (A) mb (B)(|B| − 2) = 0
|B|>1 |B|>2

or, equivalently:
[mb (A) + µb (A)]M1 [b] + mb (A)M2 [b] = 0 ∀A : |A| > 1, (10.35)
. X . X
after defining: M1 [b] = mb (B), M2 [b] = mb (B)(|B| − 2). Clearly:
|B|>1 |B|>2

M1 [b] = 0 ⇔ mb (B) = 0 ∀B : |B| > 1 ⇔ b ∈ P


M2 [b] = 0 ⇔ mb (B) = 0 ∀B : |B| > 2,
as all the terms inside the summations are non-negative by definition of basic prob-
ability assignment.
We can distinguish three cases: (1) M1 = 0 = M2 (b ∈ P); (2) M1 6= 0 but
M2 = 0; and finally (3) M1 6= 0 6= M2 . If (1) holds then b is a probability (trivially).
If (3) holds Equation (10.35) implies mb (A) = µb (A) = 0, |A| > 1 i.e. b ∈ P,
which is a contradiction.
The only non-trivial case is then (2) M2 = 0. There condition (10.35) becomes:
M1 [b] [mb (A) + µb (A)] = 0, ∀A : |A| > 1.
For all |A| > 2 we have that mb (A) = µb (A) = 0 (since M2 = 0) and the constraint
is met. If |A| = 2, instead:
X
µb (A) = (−1)|A|+1 mb (B) = (−1)2+1 mb (A) = −mb (A)
B⊃A

(since mb (B) = 0 ∀B ⊃ A, |B| > 2) so that µb (A)+mb (A) = 0 and the constraint
is again met.
Finally, as the coordinate β[b] of ς[b] on the line a(b, plb ) can be rewritten as a
function of M1 [b] and M2 [b] as follows:
M1 [b]
β[b] = , (10.36)
M2 [b] + 2M1 [b]
b+plb
if M2 = 0 then β[b] = 1/2 and ς[b] = 2 .
332 10 The affine family of probability transforms

Proof of Theorem 46

By definition the quantity p[α1 b1 + α2 b2 ](x) can be written as:


X
p[α1 b1 + α2 b2 ](x) = mα1 b1 +α2 b2 (x) + β[α1 b1 + α2 b2 ] mα1 b1 +α2 b2 (A),
A)x
(10.37)
where β[α1 b1 + α2 b2 ] =
P P P
|A|>1 mα1 b1 +α2 b2 (A) α1 |A|>1 mb1 (A) + α2 |A|>1 mb2 (A)
P = P P
|A|>1 mα1 b1 +α2 b2 (A)|A| α1 |A|>1 mb1 (A)|A| + α2 |A|>1 mb2 (A)|A|
α1 N1 + α2 N2 α1 D1 β[b1 ] + α2 D2 β[b2 ] \
= = = α1 D1 β[b1 ] + α
\ 2 D2 β[b2 ]
α1 D1 + α2 D2 α1 D1 + α2 D2
once we introduce the notation β[bi ] = Ni /Di .
Replacing this decomposition for β[α1 b1 + α2 b2 ] into Equation (10.37) yields:
 X
α1 mb1 (x) + α2 mb2 (x) + (α\1 D1 β[b1 ] + α2 D2 β[b2 ]) α1
\ mb1 (A) + α2 ·
A)x
X  1 n
· mb2 (A) = (α1 D1 + α2 D2 )(α1 mb1 (x) + α2 mb2 (x))+
α1 D1 + α2 D2
A)x  X o
X
+(α1 D1 β[b1 ] + α2 D2 β[b2 ]) α1 mb1 (A) + α2 mb2 (A) =
A)x
α1 D1 h A)xX
= α1 mb1 (x) + α2 mb2 (x) + β[b1 ] α1 mb1 (A)+
α1 D1 + α2 D2
A)x
X i α2 D2 h
+α2 mb2 (A) + α1 mb1 (x) + α2 mb2 (x) + β[b2 ]·
α1 D1 + α2 D2
A)x
 X X i α1 D1
· α1 mb1 (A) + α2 mb2 (A) = ·
α1 D1 + α2 D2
h  A)x A)x
  i
X X
· α1 mb1 (x) + β[b1 ] mb1 (A) + α2 mb2 (x) + β[b1 ] mb2 (A) +
A)x A)x
α2 D2 h  X 
+ α1 mb1 (x) + β[b2 ] mb1 (A) +
α1 D1 + α2 D2
A)x
 X i α12 D1
+α2 mb2 (x) + β[b2 ] mb2 (A) = p[b1 ]+
α1 D1 + α2 D2
A)x
α22 D2 α1 α2 h i
+ p[b2 ] + D1 p[b2 , b1 ](x) + D2 p[b1 , b2 ](x) .
α1 D1 + α2 D2 α1 D1 + α2 D2
(10.38)
after recalling Equation (10.20).
We can further notice that the function
.
F (x) = D1 p[b2 , b1 ](x) + D2 p[b1 , b2 ](x)
P
is such that x∈Θ F (x) =
10.6 Comparisons within the affine family 333
X 
= D1 mb2 (x) + N1 (plb2 − mb2 (x)) + D2 mb1 (x) + N2 (plb1 − mb1 (x))
x∈Θ
= D1 (1 − N2 ) + N1 D2 + D2 (1 − N1 ) + N2 D1 = D1 + D2

(making use of Equation (10.10)). Thus, T [b1 , b2 ](x) = F (x)/(D1 + D2 ) is a


probability (as T [b1 , b2 ](x) is always non negative), expressed by Equation (10.19).
By Equation (10.38) the quantity p[α1 b1 + α2 b2 ](x) can be expressed as:
1  2
α D1 p[b1 ](x) + α22 D2 p[b2 ](x) + α1 α2 (D1 + D2 )T [b1 , b2 ](x) ,

α1 D1 + α2 D2 1
i.e., Equation (10.18).

Proof of Theorem 47

By Equation (10.18) we have that p[α1 b1 + α2 b2 ] − α1 p[b1 ] − α2 p[b2 ] =

= α1 D1 α1 p[b1 ] + α1 D1 α2 T + α2 D2 α1 T + α2 D2 α2 p[b2 ] − α1 p[b1 ] − α2 p[b2 ]


\ \ \ \
= α1 p[b1 ](α1 D1 − 1) + α1 D1 α2 T + α2 D2 α1 T + α2 p[b2 ](α2 D2 − 1)
\ \ \ \
= −α1 p[b1 ]α2 D2 + α1 D1 α2 T + α2 D2 α1 T − α2 p[b2 ]α1 D1
\ \ \ \
= α1 D1 (α2 T − α2 p[b2 ]) + α2 D2 (α1 T − α1 p[b1 ])
\ \
α1 α2 −
= α D +α D [D1 (T p[b2 ]) + D2 (T − p[b1 ])].
1 1 2 2

This is nil iff

T [b1 , b2 ](D1 + D2 ) = p[b1 ]D2 + p[b2 ]D1 ≡ T [b1 , b2 ] = D̂1 p[b2 ] + D̂2 p[b1 ],

as α1 Dα11+α
α2
2 D2
is always non-zero in non-trivial cases. This is equivalent to (after
replacing the expressions for p[b] (10.16) and T [b1 , b2 ] (10.19))

D1 (plb2 − mb2 (x))(β[b2 ] − β[b1 ]) + D2 (plb1 − mb1 (x))(β[b1 ] − β[b2 ]) = 0,

in turn equivalent to:


h i
(β[b2 ] − β[b1 ]) D1 (plb2 (x) − mb2 (x)) − D2 (plb1 (x) − mb1 (x)) = 0.

Obviously this is true iff β[b1 ] = β[b2 ] or the second factor is zero, i.e.

plb2 (x) − mb2 (x) plb (x) − mb1 (x)


D1 D2 − D1 D2 1 =
D2 D1
= D1 D2 (R[b2 ](x) − R[b1 ](x)) = 0

for all x ∈ Θ, i.e. R[b1 ] = R[b2 ].


334 10 The affine family of probability transforms

Proof of Theorem 48

By Equation (10.21) the equality β[b1 ] = β[b2 ] is equivalent to:

(2σ22 + ... + nσ2n )(σ12 + ... + σ1n ) = (2σ12 + ... + nσ1n )(σ22 + ... + σ2n ).

Let us assume that there exists a cardinality k such that σ1k 6= 0 6= σ2k . We can then
divide the two sides by σ1k and σ2k , obtaining
 σ2 σ n  σ12 σn 
2 2k + .. + k + ... + n 2k k
+ .. + 1 + ... + 1k =
σ σ2 σ1 σ1
 2σ 2 σ n  2
σ σ2n 
= 2 1k + .. + k + ... + n k11 2
+ .. + 1 + ... + .
σ1 σ1 σ2k σ2k

Therefore, if σ1j /σ1k = σ2j /σ2k ∀j 6= k the condition β[b1 ] = β[b2 ] is met. But this
is equivalent to (10.22).

Proof of Lemma 10

When the vector v in (10.23) is a belief function (vA = b(A)) we have that:
X X X X
b(A) = mb (B) = mb (B)2n−1−|B∪{y}| ,
A⊃y,A6⊃x A⊃y,A6⊃x B⊂A B⊂{x}c

since 2n−1−|B∪{y}| is the number of subsets A of {x}c containing both B and y.


The orthogonality condition then becomes:
X X
mb (B)2n−1−|B∪{y}| = mb (B)2n−1−|B∪{x}| ∀y 6= x.
B⊂{x}c B⊂{y}c

Now, sets B ⊂ {x, y}c appear in both summations, with the same coefficient (since
|B ∪ {x}| = |B ∪ {y}| = |B| + 1).
After simplifying the common factor 2n−2 we get (10.24).

Proof of Theorem 49

Finding the orthogonal projection π[b] of b onto a(P) is equivalent to imposing the
condition hπ[b] − b, by − bx i = 0 ∀y 6= x. Replacing the masses of π − b

π(x) − mb (x), x ∈ Θ
−mb (A), |A| > 1

into Equation (10.24) yields, after extracting the singletons x from the summation,
the following system of equations:
10.6 Comparisons within the affine family 335
X
mb (A)21−|A| + mb (y)+


 π(y) = π(x) +



 A⊃y,A6⊃x,|A|>1
X
mb (A)21−|A| ∀y 6= x

−mb (x) − (10.39)

 A⊃x,A6⊃y,|A|>1
 X
π(y) = 1.




y∈Θ

By replacing the first n − 1 equations of (10.39) into the normalization constraint


we get:
h X
mb (A)21−|A| +
P
π(x) + y6=x π(x) + mb (y) − mb (x) +
X i ⊃x,|A|>1
A⊃y,A6

− mb (A)21−|A| = 1,
A⊃x,A6⊃y,|A|>1

which is equivalent to:


X X X
nπ(x) = 1 + (n − 1)mb (x) − mb (y) + mb (A)21−|A|
y6=x y6=x A⊃x,A6⊃y,|A|>1
X X
1−|A|
− mb (A)2 .
y6=x A⊃y,A6⊃x,|A|>1

As for the last two addenda, we can first note that:


X X X
mb (A)21−|A| = mb (A)21−|A| |A|,
y6=x A⊃y,A6⊃x,|A|>1 A6⊃x,|A|>1

as all the events A not containing x do contain some y 6= x, and they are counted |A|
times (i.e. once for each element they contain). As for the last addendum, instead:
X X X
mb (A)21−|A| = mb (A)21−|A| (n − |A|)
y6=x A)x,A6⊃y A⊃x,1<|A|<n
X
= mb (A)21−|A| (n − |A|)
A)x

for n − |A| = 0 when A = Θ. Hence, π(x) is equal to


1h X X
n mb (x) + 1 − mb (y) + n mb (A)21−|A|
n
y∈Θ A)x i
X X
1−|A|
− mb (A)2 |A| − mb (A)21−|A| |A| .
A)x A6⊃x,|A|>1

We then just need to note that:


X X
− mb (y) = − mb (A)|A|21−|A| ,
y∈Θ |A|=1
336 10 The affine family of probability transforms

so that the orthogonal projection can be finally expressed as:


1h X X i
π(x) = n mb (x) + n mb (A)21−|A| + 1 − mb (A)|A|21−|A|
n
A)x A⊂Θ
X X  1 − |A|21−|A| 
1−|A|
= mb (x) + mb (A)2 + mb (A) ,
n
A)x A⊂Θ

namely Equation (10.25). Since


1 |A| 1−|A| 1 + 21−|A| (n − |A|) 1 + 21−|A| |Ac |
21−|A| + − 2 = =
n n n n
the second form (10.26) follows.

Proof of Theorem 50

By Equation (10.26) we can write


1 X X 
π[b](x) = Ō[b](x) + mb (A) − mb (A)|A|21−|A|
n
A⊂Θ A⊂Θ
1 
= Ō[b](x) + 1 − kO [b] .
n
But since
X XX X
Ō[b](x) = mb (A)21−|A| = mb (A)|A|21−|A| = kO [b], (10.40)
x∈Θ x∈Θ A⊃x A⊂Θ

i.e. kO [b] is the normalization factor for Ō[b], the function (10.27) is a Bayesian
belief function, and we can write, as desired, (since P(x) = 1/n):
π[b] = (1 − kO [b])P + kO [b]O[b].

Proof of Theorem 51

By the definition of plausibility function and Equation (10.40) it follows that:


X 1 X mb (A) Ō[b]
plb2|| (x) = mb2|| (A) = |A|
= ,
k2|| 2 2k2||
A⊃x A⊃x
X 1 X X mb (A) kO [b]
plb2|| (x) = |A|
= .
k2|| 2 2k2||
x∈Θ x∈Θ A⊃x

˜ (x) = Ō[b]/kO [b] = O[b]. Similarly:


Hence pl b ||
2

X 1 X mb (A) 1
plb|| (x) = mb|| (A) = = BetP [b](x)
k|| |A| k||
A⊃x A⊃x

and since
P ˜ (x) = BetP [b](x).
BetP [b](x) = 1, pl
x b||
10.6 Comparisons within the affine family 337

Proof of Theorem 52

By Theorem 50 π[b] = (1 − kO [b])P + Ō[b] where


X
kO [b] = mb (A)|A|21−|A|
A⊂Θ

mb (A)21−|A| . Hence:
P
and Ō[b](x) = A⊃x
X
α1 mb1 (A) + α2 mb2 (A) |A|21−|A|

kO [α1 b1 + α2 b2 ] =
A⊂Θ
=α1 kO [b1 ] + α2 kO [b2 ],
X
α1 mb1 (A) + α2 mb2 (A) 21−|A|

Ō[α1 b1 + α2 b2 ](x) =
A⊃x
= α1 Ō[b1 ] + α2 Ō[b2 ],

which in turn implies (since α1 + α2 = 1):

π[α1 b1 + α2 b2 ] = (1 − α1 kO [b1 ] − α2 kO [b2 ])P


 + α1 Ō[b1 ] + α2 Ō[b2 ] 
= α1 (1 − kO [b1 ])P + Ō[b1 ] + α2 (1 − kO [b2 ])P + Ō[b2 ]
= α1 π[b1 ] + α2 π[b2 ].

Proof of Theorem 53

By Theorem 52
hX i X
π[b] = π mb (A)bA = mb (A)π[bA ]
A⊂Θ A⊂Θ

which, by Lemma 11, becomes:


X
mb (A) (1 − |A|21−|A| )P + |A|21−|A| P A
 
π[b] =
A⊂Θ
 
X X
= 1− mb (A)|A|21−|A| P + mb (A)|A|21−|A| P A
X A⊂Θ  A⊂Θ
X 
1−|A|
= mb (A)|A|2 PA + 1 − mb (A)|A|21−|A| P
A6=Θ A⊂Θ
+mb (Θ)|Θ|21−|Θ| P,

i.e., Equation (10.28).

Proof of Theorem 54

If b meets (10.32), then the intersection probability values are, ∀x ∈ Θ:


n n−1

X X
p[b](x) = mb (x) + β[b] mb (A) = mb (x) + β[b] σ k k−1
n
 =
A){x} k=2 k
338 10 The affine family of probability transforms
n−1 n
 
(as there are k−1 events of size k containing x, and k events of size k in total)
n
X k 1 σ 2 + ... + σ n
= mb (x) + β[b] σk = mb (x) + (2σ 2 + ... + nσ n )
n n 2σ 2 + ... + nσ n
k=2
1
= mb (x) + (σ 2 + ... + σ n ),
n
after recalling the decomposition (10.31) of β[b].
On the other hand, under the hypothesis, the pignistic function reads as:
n n n−1

X X mb (A) X σk k−1
BetP [b](x) = mb (x) + = mb (x) + n

k k k
k=2 A){x},|A|=k k=2
n k n
X σk
X σ k
= mb (x) + = mb (x) + ,
k n n
k=2 k=2
(10.41)
and the two functions coincide.

Proof of Theorem 55
The orthogonal projection of a belief function b on the probability simplex P has
the following expression [267] (Equation 10.26):
1 + |Ac |21−|A| 1 − |A|21−|A|
X   X  
π[b](x) = mb (A) + mb (A) .
n n
A⊇{x} A6⊃{x}

Under condition (10.32) it becomes


n 
1 + (n − k)21−k
X  X
π[b](x) = mb (x) + mb (A)
n
k=2 A⊇{x},|A|=k
n  (10.42)
1 − (n − k)21−k
X  X
+ mb (A)
n
k=2 A6⊃{x},|A|=k

k
P
where again A⊇{x},|A|=k mb (A) = σ k/n, while
n−1

X (n − 1)! k!(n − k)! n−k
mb (A) = σ k nk  = σ k = σk .
k
k!(n − k − 1)! n! n
A6⊃{x},|A|=k

Replacing those expressions in Equation (10.42) yields


n  n 
1 + (n − k)21−k k X 1 − (n − k)21−k
 
X n−k
mb (x) + σk + σk =
n n n n
k=2 k=2
n   n
k k kn−k 1X k
X
= mb (x) + σ 2 +σ = m b (x) + σ
n n2 n
k=2 k=2

i.e., the value (10.41) of the intersection probability under the same assumptions.
The epistemic family of probability
transforms
11
We have seen in Chapter 4 that a decision-based approach to probability transfor-
mation is the foundation of Smets’ ‘Transferable Belief Model’ [1231, 1276]. In the
TBM, Smets abandons all notions of multivalued mapping to define belief directly
in terms of basis belief assignments (‘credal’ level), while decisions are made via
the pignistic probability (4.15)
X mb (A)
BetP [b](x) = ,
|A|
A⊇{x}

generated by the associated pignistic transform: BetP : B → P, b 7→ BetP [b].


Initially justified by the Principle of Insufficient Reason, the pignistic probability is
the result of a redistribution process in which the mass of each focal element A is re-
assigned to all its elements x ∈ A on an equal basis. Geometrically, it is the center
of mass of the polytope (3.10) of consistent probabilities [168]. Other proposals,
based on redistribution processes similar to that of the pignistic transform, have
been recently brought forward by Dezert et al. [391], Burger [139], Sudano [1314]
and others.
We have studied in Chapter 10 that the pignistic transform is a member of what
we called the ‘affine’ group of probability transforms, which also includes inter-
section probability and orthogonal projection onto the probability simplex. Such
probability transforms are characterized by their commutativity with affine combi-
nation in our geometric framework, a property called ‘linearity’ in the TBM.
Here we focus on probability transforms which commute with Dempster’s rule of

339
340 11 The epistemic family of probability transforms

combination, and are therefore considered by some scholars as more consistent with
the original Dempster-Shafer framework. We show that the relative plausibility and
relative belief transforms belong to this group, which we call the ‘epistemic family.
of probability transforms.

Chapter content

As we have seen in Chapter 3, belief functions have different, rather conflicting


interpretations. In Section 11.1 we therefore discuss the semantics of relative belief
and plausibility in both the probability-bound and Shafer’s acceptions of the theory.
Within the probability-bound interpretation (Section 11.1.1), as neither transforms
are consistent with the original belief functions, as they cannot be associated with a
valid redistribution of the mass of the focal elements to the singletons. In Shafer’s
formulation of the theory of evidence as an evidence combination process, instead,
the arguments proposed for the plausibility transform can be extended to the case of
the relative belief transform (Section 11.1.2).
Indeed, we argue here that relative plausibility and belief transforms are closely
related probability transformations (Section 11.2). Not only they are characterized
by the fact that the relative belief of singletons can be seen as the relative plausi-
bility of singletons of the associated plausibility function (Section 11.2.2), but both
transforms meet a number of dual properties with respect to Dempster’s rule of
combination (Section 11.2.3). In particular, while pl ˜ commutes with Dempster’s
b
sum of belief functions, b̃ commutes with orthogonal sums of plausibility functions
(compare Chapter 8). Similarly, while pl ˜ perfectly represents a belief function b
b
when combined with any probability distribution (4.17), b̃ perfectly represents the
associated plausibility function plb when combined with a probability through the
natural extension of Dempster’s sum (Section 11.2.4).
The resulting duality is summarised in the following table:

b ↔ plb
˜
pl ↔ b̃
b
˜ ⊕ p ∀p
b ⊕ p = pl ↔ plb ⊕ p = b̃ ⊕ p ∀p
b
˜ [b1 ⊕ b2 ] = pl
pl ˜ [b1 ] ⊕ pl
˜ [b2 ] ↔ b̃[plb ⊕ plb ] = b̃[plb ] ⊕ b̃[plb ].
b b b 1 2 1 2

The symmetry/duality between (relative) plausibility and belief is broken, how-


ever, as the existence of r.b.s. is subject to a strong condition:
X
mb (x) 6= 0, (11.1)
x∈Θ

stressing the issue of its applicability (Section 11.4). Even though this situation is
‘singular’ (in the sense that it excludes most belief and probability measures, Section
11.4.1), in practice the situation in which the mass of all singletons is nil is not so
uncommon. However, in Section 11.4.2 we point out that relative belief is only a
member of a class of relative mass transformations, which can be interpreted as
11.1 Rationale of epistemic transforms 341

low-cost proxies for both plausibility and pignistic transforms (11.4.3). We discuss
their applicability as approximate transformations in two significant scenarios.
The second part of the Chapter is devoted to the study of the geometry of epis-
temic transforms, in both the space of all pseudo belief functions (Section 11.5), in
which the belief space is embedded, and the probability simplex (Section 11.6).
Indeed, the geometry of relative belief and plausibility can be reduced to that of
two specific pseudo belief functions called ‘plausibility of singletons’ (11.14) and
‘belief of singletons’ (11.16), which are introduced in Sections (11.5.1) and (11.5.2)
respectively. Their geometry can be described in terms of three planes (11.5.3) and
angles (11.5.4) in the belief space. Such angles are, in turn, related to a probability
distribution which measures the relative uncertainty on the probabilities of single-
tons determined by b, and can be considered as the third P member of the epistemic
family of transformations. As b̃ does not exist when x mb (x) = 0, this singular
case needs to be discussed separately (Section 11.5.5).
Several examples illustrate the relation between the geometry of the involved func-
tions and their properties in terms of degrees of belief.
As probability transforms map belief functions onto probability distributions, it
makes sense to study their behavior in the simplex of all probabilities as well. We
will get some insight on this in Section 11.6, at least in the case study of a frame of
size 3.
Finally, as a step towards a complete understanding of the probability transfor-
mation problem, we discuss (Section 11.7) what we learned about the relationship
between the affine and epistemic families of probability transformations. Inspired
by the binary case study, we provide sufficient conditions under which all trans-
forms coincide, in terms of equal distribution of masses and equal contribution to
the plausibility of the singletons.

11.1 Rationale of epistemic transforms

As we well know by now, the original semantics of belief functions derive from
Dempster’s analysis of the effect of multi-valued mappings Γ : Ω → 2Θ , x ∈ Ω 7→
Γ (x) ⊆ Θ on evidence available in the form of a probability distribution on the
‘top’ domain Ω on the ‘bottom’ decision set Θ (Section ??). As such, belief values
are probabilities of events implying other events.
In some of his papers [346], however, Dempster himself claimed that the mass
mb (A) associated with a non-singleton event A ⊆ Θ could be understood as a ‘float-
ing probability mass’ which could not be attached to any particular singleton event
x ∈ A because of the lack of precision of the (multi-valued) operator that quantify
our knowledge via the mass function. This has originated a popular but controver-
sial interpretation of belief functions as coherent sets of probabilities determined by
sets of lower and upper bounds on their probability values (Section 3.1.4).
As Shafer admits in [?], there is a sense in which a single belief function can indeed
be interpreted as a consistent system of probability bounds. However, the issue with
342 11 The epistemic family of probability transforms

the probability-bound interpretation of belief functions becomes evident when con-


sidering two or more belief functions addressing the same question but representing
conflicting items of evidence, i.e., when Dempster’s rule is applied to aggregate ev-
idence. In [1149, 1153], Shafer disavowed any probability-bound interpretation, a
position later seconded by Dempster [347].
We will come back to this point in Section 11.1.2, in which we will link the
relative belief transform to Cobb and Shenoy’s arguments [197] in favor of the
plausibility transform as a link between Shafer’s theory of evidence (endowed with
Dempster’s rule) and Bayesian reasoning. To corroborate this argument, in Section
11.1.1 we show that both plausibility and relative belief transforms (unlike Smets’
pignistic transform) are not consistent with a probability-bound interpretation of
belief functions1 .

11.1.1 Semantics within the probability bound interpretation

In their static, probability-bound interpretation, belief functions b : 2Θ → [0, 1] de-


termine each a convex set P[b] of ‘consistent’ probability distributions (3.10). These
are the result of a redistribution process, in which the mass of each focal element
is shared between its elements in an arbitrary proportion [246]. One such probabil-
ity is the pignistic one (4.15). The pignistic transform was originally based on the
Principle of Insufficient Reason (PIR) proposed by Bernoulli, Laplace, and Keynes
[9], which states that ‘if there is no known reason for predicating of our subject
one rather than another of several alternatives, then relatively to such knowledge the
assertions of each of these alternatives have an equal probability’. A direct conse-
quence of the PIR2 in the probability-bound interpretation of b.f.s is that, when con-
sidering how to redistribute the mass of an event A, it is wise to assume equiproba-
bility amongst its singletons - but this yields exactly the pignistic probability.
It is easy to prove that relative belief and plausibility of singletons are not the
result of such a redistribution process, and therefore are not consistent with the
original belief function in the sense defined above.
Indeed, the relative plausibility of singletons (4.16) is the result of a process in
which:
– for each singleton x ∈ Θ a mass reassignment strategy (there could be more than
one) is selected in which the mass of all the events containing it is reassigned to
x, yielding {plb (x), x ∈ Θ};
– however, as different reassignment strategies are supposed to hold for different
singletons (many of which belong to the same higher-size focal elements), this
scenario is not compatible with the existence of a single coherent redistribution
1
Even in this interpretation, however, a rationale for such transformations can be given via
a utility theoretical argument as in the case of the pignistic probability. We will discuss
this in Chapter ??.
2
Later on, however, Smets [1223] advocated that the PIR could not justify by itself the
uniqueness of the pignistic transform, and proposed a justification based on a number of
axioms.
11.1 Rationale of epistemic transforms 343

of mass from focal elements to singletons, as the basic probabiloity of the same
higher cardinality event is assigned to different singletons;
– the obtained plausibility values plb (x) are nevertheless normalized to yield a for-
mally admissible probability distribution.
Similarly, for the relative belief of singletons (4.18):
– for each singleton x ∈ Θ a mass reassignment strategy is selected in which only
the mass of {x} itself is re-assigned to x, yielding {b(x) = mb (x), x ∈ Θ};
– once again this scenario does not correspond to a single valid redistribution pro-
cess, as the mass of all higher-size focal elements is not assigned to any single-
tons;
– the obtained values b(x) are nevertheless normalized to produce a valid probabil-
ity.
The fact that both such probability transforms come from jointly assuming a
number of incompatible redistribution processes is reflected by the fact that the re-
sulting probability distributions are not guaranteed to belong to the set of probabili-
ties (3.10) consistent with b.

Theorem 56. The relative belief of singletons of a belief function b is not always
consistent with b.

Theorem 57. The relative plausibility of singletons of a belief function b is not al-
ways consistent with b.

As an example, consider a belief function on Θ = {x1 , x2 , ..., xn } with two


focal elements:

mb (x1 ) = 0.01, mb ({x2 , ..., xn }) = 0.99. (11.2)

This can be interpreted as the following real-world situation. A number of people


x2 , ..., xn have no money of their own but they are all candidates to inherit the
wealth of a very rich relative. Person x1 is not, but has some little money of their
own. Note that it is not correct to interpret x2 , ..., xn as assured, joint owners of a
certain wealth (say, shares of the same company), as (11.2) is indeed consistent (in
the probability-bound interpretation) with a distribution which assigns probability
0.99 to a single person of the group x2 , ..., xn .
The relative belief of singletons associated with (11.2) is:

b̃(x1 ) = 1, b̃(xi ) = 0 ∀i = 2, ..., n. (11.3)

Clearly this is not a good representative of the set of probabilities consistent with the
above belief function, as it does not contemplate at all the chance the heirs x2 , ..., xn
have to gain a remarkable amount of money.
Indeed, according to Theorem 56, (11.3) is not at all consistent with (11.2).
344 11 The epistemic family of probability transforms

11.1.2 Semantics within Shafer’s interpretation

Shafer has strongly argued against a probability-bound interpretation of belief func-


tions. When these are not taken in isolation but as pieces of evidence to combine,
such an interpretation forces us to consider only groups of belief functions whose
degrees of belief, when interpreted as probability bounds, can be satisfied simulta-
neously (in other words, when their sets of consistent probabilities have non-empty
intersection). In Shafer’s (and Shenoy’s) view, though, when belief functions are
combined via Dempster’s rule this is irrelevant, even though consistent probabili-
ties that simultaneously bound all the belief functions being combined as well as
the resulting b.f. do exist when no renormalization is required in their Dempster’s
combination. Consequently, citing Shafer, authors who support a probability-bound
interpretation of belief functions are uncomfortable with renormalization [1536].
In this context, Cobb and Shenoy [197] have argued in favor of the plausibility
transform as a link between Shafer’s theory of evidence (endowed with Dempster’s
rule) and Bayesian reasoning. Besides some general arguments supporting proba-
bility transformations of belief functions in general, their points more specifically
about the plausibility transform can be summarized as follows:
– a probability transformation consistent with Dempster’s rule can improve our un-
derstanding of the theory of evidence by providing probabilistic semantics for
belief functions, i.e., ‘meanings’ of basic probability assignments in the context
of betting for hypotheses in the frame Θ;
– in opposition to some literature on belief functions suggesting that the theory of
evidence is more expressive than probability theory since the probability model
obtained by using the pignistic transformation leads to non-intuitive results [118],
they show that by using the plausibility transformation method the original belief
function model and the corresponding probability model yield the same qualita-
tive results;
– a probability transformation consistent with Dempster’s rule allows to build prob-
abilistic models by converting/transforming belief function models obtained by
using the belief function semantics of distinct evidence [1182].
Mathematically, they proved [198] that the plausibility transform commutes with
Dempster’s rule, and meets a number of additional properties which they claim ‘al-
low an integration of Bayesian and D-S reasoning that takes advantage of the ef-
ficiency in computation and decision-making provided by Bayesian calculus while
retaining the flexibility in modeling evidence that underlies D-S reasoning’.
In this Chapter we prove that a similar set of (dual) properties hold for the rela-
tive belief transform, associating relative belief and relative plausibility transforms
in a family of probability transformations strongly related to Shafer’s interpretation
of the theory of evidence via Dempster’s rule.
11.2 Dual properties of epistemic transforms 345

11.2 Dual properties of epistemic transforms


Relative belief and plausibility of singletons are, as we show here, linked by a form
of duality, as b̃ can be interpreted as the relative plausibility of singletons of the
plausibility function plb associated with b. Furthermore, b̃ and pl ˜ share a close
b
relationship with Dempster’s evidence combination rule ⊕, as they meet a set of
dual properties with respect to ⊕. This suggests a classification of all the probability
transformations of belief functions in terms of the operator they relate to.

11.2.1 Relative plausibility, Dempster’s rule, and pseudo belief functions


˜ (4.16) com-
Cobb and Shenoy [198] proved that the relative plausibility function pl b
mutes with Dempster’s rule, and meets a number of additional properties3 .

Proposition 45. The following statements hold:


˜ = pl
1. If b = b1 ⊕ · · · ⊕ bm then pl ˜ ⊕ · · · ⊕ pl
˜ : Dempster’s sum and relative
b b1 bm
plausibility commute.
2. If mb is idempotent with respect to Dempster’s rule, i.e. mb ⊕ mb = mb , then
˜ is idempotent with respect to Bayes’ rule.
pl b
3. Let us define the limit of a belief function b as
. .
b∞ = lim bn = lim b ⊕ · · · ⊕ b (n times); (11.4)
n→∞ n→∞

if ∃x ∈ Θ such that plb (x) > plb (y) ∀y 6= x, y ∈ Θ, then pl ˜ ∞ (x) = 1,


b
˜
plb∞ (y) = 0 ∀y 6= x.
4. If ∃A ⊆ Θ (|A| = k) s.t. plb (x) = plb (y) ∀x, y ∈ A, plb (x) > plb (z) ∀x ∈
˜ ∞ (x) = pl
A, z ∈ Ac , then pl ˜ ∞ (y) = 1/k ∀x, y ∈ A, pl
˜ ∞ (z) = 0 ∀z ∈ Ac .
b b b

On his side, Voorbraak had shown that [1358]:


˜ is a perfect representa-
Proposition 46. The relative plausibility of singletons pl b
tive of b in the probability space when combined through Dempster’s rule:
˜ ⊕ p,
b ⊕ p = pl ∀p ∈ P.
b

The relative belief of singletons meets similar dual properties. Their study, however,
requires to extend the analysis to normalised sum functions (also called ‘pseudo
belief functions’, cfr. Section 6.3.3).
3
The original statements from [196] have been reformulated according to the notation used
in this Book.
346 11 The epistemic family of probability transforms

11.2.2 A (broken) symmetry

A direct consequence of the duality between belief and plausibility measures is the
existence of a striking symmetry between (relative) plausibility and belief transform.
A formal proof of this symmetry is based on the following interesting property of
the basic plausibility assignment µb (8.1) [260].
P
Lemma 12. A⊇{x} µb (A) = mb (x).

Theorem 58. Given a pair of belief/plausibility functions b, plb : 2Θ → [0, 1], the
relative belief transform of the belief function b coincides with the plausibility trans-
form of the associated plausibility function plb (interpreted as a pseudo belief func-
tion):
˜ b ].
b̃[b] = pl[pl
The symmetry between relative plausibility and relative belief of singletons is bro-
ken by the fact that the latter is not defined for belief functions with no singleton
focal sets. Since b̃ is itself an instance of relative plausibility (of a plausibility func-
˜ always exists, this fact seems to contradict Theorem 58.
tion plb ), and pl b
This seeming paradox can be explained by the combinatorial nature of belief,
plausibility, and commonality functions. As we provedP in Chapter 8 [260], while
belief measures are sum functions of the form b(A) = B⊂A m(B) whose Moe-
bius transform m is both normalized and non-negative, plausibility measures are
sum functions whose Moebius transform µ is not necessarily non-negative (com-
monality functions are not even normalized).
As a consequence, the quantity
X X X X
plplb (x) = µb (A) = µb (A)|A|
x x A⊇{x} A⊇Θ

˜
can be equal to zero, in which case pl plb = b̃ does not exist.

11.2.3 Dual properties of the relative belief operator

The duality between b̃ and pl ˜ (albeit to some extent imperfect) extends to the pair
b
of transformations’ behavior with respect to Dempster’s rule of combination (2.6).
We have seen in Chapter 7, Section 7.1, that the orthogonal sum can be natu-
rally extended to a pair ς1 , ς2 of pseudo belief functions (p.b.f.s) [265], by simply
applying (2.6) to their Moebius inverses mς1 , mς2 .
Proposition 47. Dempster’s rule defined as in Equation (2.6) when applied to a
pair of pseudo belief functions ς1 , ς2 yields again a pseudo belief function.
We still denote the orthogonal sum of two p.b.f.s ς1 , ς2 by ς1 ⊕ ς2 .
As plausibility functions are pseudo b.f.s, Dempster’s rule can then be formally
applied to them too. It is convenient to introduce a dual form of the relative be-
lief operator, mapping a plausibility function to the corresponding relative belief of
singletons: b̃ : PL → P, plb 7→ b̃[plb ], where
11.2 Dual properties of epistemic transforms 347

. mb (x)
b̃[plb ](x) = P ∀x ∈ Θ (11.5)
y∈Θ mb (y)
P
is defined as usual for b.f.s b such that y mb (y) 6= 0.
Indeed, as b and plb are in 1-1 correspondence, we can indifferently define an op-
erator mapping a belief function b to its relative belief b̃, or mapping the unique
plausibility function plb associated with b to b̃.
The following commutativity theorem follows, as the dual of point 1) in Propo-
sition 45.
Theorem 59. The relative belief operator commutes with respect to Dempster’s
combination of plausibility functions:

b̃[pl1 ⊕ pl2 ] = b̃[pl1 ] ⊕ b̃[pl2 ].

Theorem 59 implies that


b̃[(plb )n ] = (b̃[plb ])n . (11.6)
As an immediate consequence, an idempotence property which is the dual of point
2) of Proposition 45 holds for the relative belief of singletons.
Corollary 11. If plb is idempotent with respect to Dempster’s rule, i.e. plb ⊕ plb =
plb , then b̃[plb ] is itself idempotent: b̃[plb ] ⊕ b̃[plb ] = b̃[plb ].
Proof. By Theorem 59 b̃[plb ] ⊕ b̃[plb ] = b̃[plb ⊕ plb ], and if plb ⊕ plb = plb the thesis
immediately follows. 
The dual results of the remaining two statements of Proposition 45 can be proven
in a similar fashion.
Theorem 60. If ∃x ∈ Θ such that b(x) > b(y) ∀y 6= x, y ∈ Θ, then

b̃[plb∞ ](x) = 1, b̃[plb∞ ](y) = 0 ∀y 6= x.

An similar proof can be provided for the following generalization of Th 60.


Corollary 12. If ∃A ⊆ Θ (|A| = k) s.t. b(x) = b(y) ∀x, y ∈ A, b(x) > b(z)
∀x ∈ A, z ∈ Ac , then

b̃[plb∞ ](x) = b̃[plb∞ ](y) = 1/k ∀x, y ∈ A, b̃[plb∞ ](z) = 0 ∀z ∈ Ac .

A numerical example It is crucial to point out that commutativity (Theorem 59)


and idempotence (Corollary 11) hold for combinations of plausibility functions, and
not of belief functions.
Let us consider as an example the belief function b on the frame of size four
Θ = {x, y, z, w} determined by the following basic probability assignment:

mb ({x, y}) = 0.4, mb ({y, z}) = 0.4, mb (w) = 0.2. (11.7)

Its b.pl.a. is, according to (8.1), given by


348 11 The epistemic family of probability transforms

µb (x) = 0.4, µb (y) = 0.8, µb (z) = 0.4,


(11.8)
µb (w) = 0.2, µb ({x, y}) = −0.4, µb ({y, z}) = −0.4.

To check the validity of Theorems 59 and 60 let us analyse the two series of proba-
bility measures (b̃[plb ])n and b̃[(plb )n ].
By applying Dempster’s rule to the b.pl.a. (11.8) (plb2 = plb ⊕ plb ) we get a new
b.pl.a. µ2b with values µ2b (x) = 4/7, µ2b (y) = 8/7, µ2b (z) = 4/7, µ2b (w) = −1/7,
µ2b ({x, y}) = −4/7, µ2b ({y, z}) = −4/7 (see Figure 11.1). To compute the corre-

{y,z} {y} {z} {y} {y,z}


{x,y} {x} {y} {x,y} {y}
{w} {w}
{z} {z} {z}
{y} {y} {y} {y}
{x} {x} {x}

{x} {y} {z} {w} {x,y}{y,z}

Fig. 11.1. Intersection of focal elements in Dempster’s combination of the b.pl.a. (11.8) with
itself. Non-zero mass events for each addendum µ1 = µ2 = µb correspond to rows/columns
of the table, each entry of the table hosting the related intersection.

sponding relative belief b̃[plb2 ] we first need to get the plausibility values

plb2 ({x, y, z}) = µ2b (x) + µ2b (y) + µ2b (z) + µ2b ({x, y}) + µ2b ({y, z}) = 8/7,
plb2 ({x, y, w}) = 1, plb2 ({x, z, w}) = 1, plb2 ({y, z, w}) = 1
.
which imply (as, by definition, plb (A) = 1 − b(Ac )): b2 (w) = −1/7, b2 (z) =
b2 (y) = b2 (x) = 0. Therefore: b̃[plb2 ] = [0, 0, 0, 1]0 (representing probability distri-
butions as vectors of the form [p(x), p(y), p(z), p(w)]0 ).
Theorem 59 is confirmed as, by (11.7) (being {w} the only singleton with non-
zero mass), b̃ = [0, 0, 0, 1]0 so that b̃ ⊕ b̃ = [0, 0, 0, 1]0 and b̃[.] commutes with plb ⊕.
By combining plb2 with plb one more time we get the b.pl.a.

µ3b (x) = 16/31, µ3b (y) = 32/31, µ3b (z) = 16/31, µ3b (w) = −1/31,
µ3b ({x, y}) = −16/31, µ3b ({y, z}) = −16/31

which corresponds to plb3 ({x, y, z}) = 32/31, plb3 ({x, y, w}) = 1, plb3 ({x, z, w}) =
1, plb3 ({y, z, w}) = 1. Therefore: b3 (w) = −1/31, b3 (z) = b3 (y) = b3 (x) = 0,
and b̃[plb3 ] = [0, 0, 0, 1]0 which again is equal to b̃ ⊕ b̃ ⊕ b̃ as Theorem 59 guarantees.
The series of basic plausibility assignments (µb )n clearly converges to:

µnb (x) → 1/2+ , µ3b (y) → 1+ , µ3b (z) → 1/2+ , µ3b (w) → 0− ,
µb ({x, y}) → −1/2− ,
3
µb ({y, z}) → −1/2− ,
3
11.2 Dual properties of epistemic transforms 349

associated with the following plausibility values: limn→∞ plbn ({x, y, z}) = 1+ ,
plbn ({x, y, w}) = plbn ({x, z, w}) = plbn ({y, z, w}) = 1 ∀n ≥ 1. These correspond
to the following values of belief of singletons: limn→∞ bn (w) = 0− , bn (z) =
bn (y) = bn (x) = 0 ∀n ≥ 1, so that:
n
limn→∞ b̃[plb∞ ](w) = limn→∞ bbn (w)
(w) = 1,
limn→∞ b̃[plb∞ ](x) = limn→∞ b̃[plb∞ ](y) = limn→∞ b̃[plb∞ ](z)
= limn→∞ bn 0(w) = limn→∞ 0 = 0,

in perfect agreement with Theorem 60.

11.2.4 Representation theorem for relative beliefs

A dual of the representation theorem (Proposition 46) for the relative belief trans-
form can also be proven, once we recall the following result on Dempster’s sum of
affine combinations [265] (cfr. Chapter 7, Theorem (15)).
Proposition 48. The orthogonal sum b ⊕ i αi bi , i αi = 1 of a b.f. b with any4
P P
affine combination of belief functions is itself an affine combination of the partial
sums b ⊕ bi X X
b⊕ αi bi = γi (b ⊕ bi ), (11.9)
i i

where γi = Pαi k(b,bi ) and k(b, bi ) is the normalization factor of the partial
j αj k(b,bj )
Dempster’s sum b ⊕ bi .
Again, the duality between b̃ and pl˜ suggests that the relative belief of single-
b
tons represent the associated plausibility function plb , rather than the corresponding
belief function b: b̃ ⊕ p 6= b ⊕ p.
Theorem 61. The relative belief of singletons b̃ perfectly represents the correspond-
ing plausibility function plb when combined with any probability through (extended)
Dempster’s rule:
b̃ ⊕ p = plb ⊕ p
for all Bayesian belief functions p ∈ P.
˜
Theorem 61 can be obtained from Proposition 46 by replacing b with plb and pl b
with b̃ in virtue of their duality.

Example: continued Once again, the representation Theorem 61 is about combina-


tions of plausibility functions (as pseudo b.f.s), not combinations of belief functions.
Going back to the previous example, the combination b ⊕ b of b with itself has basic
probability assignment:
4
In fact the collection {bi } is required to include at least a belief function which is com-
binable with b, [265].
350 11 The epistemic family of probability transforms

mb ({x, y}) · mb ({x, y}) 0.16


mb⊕b ({x, y}) = = = 0.235,
k(b, b) 0.68
mb ({y, z}) · mb ({y, z}) 0.16
mb⊕b ({y, z}) = = = 0.235,
k(b, b) 0.68
mb (w) · mb (w) 0.04
mb⊕b (w) = = = 0.058,
k(b, b) 0.68
mb ({x, y}) · mb ({y, z}) + mb ({y, z}) · mb ({x, y})
mb⊕b (y) = = 0.47,
k(b, b)
which obviously yields:
 0
0.47 0.058
b ⊕ b = 0,
] , 0, 6= b̃ ⊕ b̃ = [0, 0, 0, 1]0 .
0.528 0.528

The main reason for that is that the plausibility function of a sum of two belief
functions is not the sum of the associated plausibilities:

[plb1 ⊕ plb2 ] 6= plb1 ⊕b2 .

11.2.5 Two families of Bayesian approximations

The following table summarizes the duality results we just presented:

b ↔ plb
˜
plb ↔ b̃
˜ ⊕ p ∀p
b ⊕ p = pl ↔ plb ⊕ p = b̃ ⊕ p ∀p
b
˜ [b1 ⊕ b2 ] = pl
pl ˜ [b1 ] ⊕ pl
˜ [b2 ] ↔ b̃[plb1 ⊕ plb2 ] = b̃[plb1 ] ⊕ b̃[plb2 ]
b b b
˜ ⊕ pl[b]
b ⊕ b = b ` pl[b] ˜ = pl[b] ˜ ↔ plb ⊕ plb = plb ` b̃[plb ] ⊕ b̃[plb ] = b̃[plb ].

Note that, just as Voorbraak’s and Cobb’s results are not valid for all pseudo belief
functions but only for proper b.f.s., the above dual results do not hold for all pseudo
belief functions either, but only for those p.b.f.s which are plausibility functions.
These results bring about a classification of all probability transformations in
two families related to Dempster’s sum and affine combination, respectively.
The notion that there exist two distinct families of probability transformations, each
determined by the operator they commute with, was already implicitly present in
the literature. Smets’ linearity axiom [1276], which lays at the foundation of the
pignistic transform, obviously corresponds (even though expressed in a somewhat
different language) to the commutativity with affine combination of belief func-
tions. To address the criticism such axiom was subject to, Smets introduced later a
formal justification based on an expected utility argument in the presence of con-
ditional evidence [1223]. On the other hand, Cobb and Shenoy argued in favour of
the commutativity with respect of Dempster’s rule, on the basis that the Dempster-
Shafer theory of evidence is a coherent framework of which Dempster’s rule is an
integral part, and that a Dempster-compatible transformation can provide a useful
probabilistic semantic for belief functions.
11.4 Generalizations of the relative belief operator 351

Incidentally, there seems to be a flaw in Smets’ argument that the pignistic trans-
form is uniquely determined as the probability transformation which commutes with
affine combination: in [267] and Chapter 10 we indeed proved that the orthogonal
transform (Section 10.4) also enjoys the same property.
Analogously, we showed here that the plausibility transform is not unique as a prob-
ability transformation which commutes with ⊕ (even though, in this latter case, the
transformation is applied to different objects).

11.3 Plausibility transform and convex closure


We add a further element to this ongoing debate by proving that the plausibility
transform, although it does not obviously commute with affine combination, does
commute with the convex closure of belief functions in the belief space B:
n X o
Cl(b1 , ..., bk ) = b ∈ B : b = α1 b1 + · · · + αk bk , αi = 1, αi ≥ 0 ∀i .
i

Let us first study its behavior with respect to affine combination.


Lemma 13. For all α ∈ R we have that
˜
pl[αb ˜ ˜
1 + (1 − α)b2 ] = β1 pl[b1 ] + β2 pl[b2 ],

where
αkpl1 αkpl2
β1 = , β2 = .
αkpl1 + (1 − α)kpl2 αkpl1 + (1 − α)kpl2
It follows that:
Theorem 62. The relative plausibility operator commutes with convex closure in
˜
the belief space: pl[Cl(b ˜ ˜
1 , ..., bk )] = Cl(pl[b1 ], ..., pl[bk ]).

The behavior of the plausibility transform, in this respect, is similar to that of Demp-
ster’s rule (Theorem 6, [265]), supporting the argument that the plausibility trans-
form is indeed naturally associated with the D-S framework.

11.4 Generalizations of the relative belief operator

A serious issue with the relative belief of singletons is its applicability.


In opposition to relative plausibility, b̃ does not exist for a large class of belief func-
tions (those which assign no mass to singletons). Even though this singular case in-
volves only a small fraction of all belief measures (Section 11.4.1), this issue arises
in many practical cases, for instance when we use fuzzy membership functions to
model the evidence.
352 11 The epistemic family of probability transforms

11.4.1 Zero mass to singletons as a singular case

Let us first consider the set of belief functions for which a relative belief of single-
tons does not exist. In the binary case Θ = {x, y}, the existence constraint (11.1)
implies that the only belief function which does not admit relative belief of single-
tons is the vacuous one bΘ : mbΘ (Θ) P = 1. Indeed, for the vacuous belief function
there, mbΘ (x) = mbΘ (y) = 0 so that x mbΘ (x) = 0 and b̃Θ does not exist. Sym-
metrically, the pseudo b.f. ς = plbΘ (for which plbΘ (x) = plbΘ (y) = 1) is such that
˜
plplbΘ = bΘ , so that pl plbΘ does not exist either.
Figure 11.2-left illustrates the geometry of the relative belief operator in the binary
case - the dual singular points bΘ , ς = plbΘ are highlighted.

b y =[0,1]'=pl b y
plb =[1,1]' by
Θ

P
_ _
P{x,y} P{y,z}

_
~ P
b
b

bx _ bz
bΘ=[0,0]' b x =[1,0]'=pl bx P{x,z}

Fig.
h 11.2. Left: The location
i0 of the relative belief of singletons b̃ =
mb (x) mb (y)
,
mb (x)+mb (y) mb (x)+mb (y)
associated with an arbitrary belief function b on {x, y}
is shown. The singular points bΘ = [0, 0]0 and plbΘ = [1, 1]0 are marked by small circles.
Right: The images
P under pignistic function and relative plausibility of the subset of belief
functions {b : x mb (x) = 0} span only a proper subset of the probability simplex. This
region is shown here in the ternary case Θ = {x, y, z} (the triangle delimited by dashed
lines).

The analysis of the binary case shows that the set of belief functions for which
b̃ does not exist is a lower-dimensional subset of the belief space B. To support
this point, we determine here the region spanned by the most common probability
transformations: the plausibility and the pignistic transforms.
Theorem 62 proves that the plausibility transform commutes with convex closure.
As (by Proposition 48, [267]) the pignistic transform (4.15) commutes with affine
combination, we have that BetP also commutes with Cl:

BetP [Cl(b1 , ..., bk )] = Cl(BetP [bi ], i = 1, ..., k).


11.4 Generalizations of the relative belief operator 353

To determine the image under both probability transforms of any convex set Cl(b1 , ..., bk )
of belief functions it is then sufficient to compute the images of its vertices.
.
The space of all belief functions B = {b : 2Θ → [0, 1]}, in particular, is the
convex closure of all the categorical b.f.s bA : B = Cl(bA , A ⊆ Θ) [244] (cfr. Theo-
rem 11). The image of a categorical b.f. bA (a vertex of B) under either plausibility
or pignistic transform is:
P  1
m (B)
|A| x ∈ A =. X mb (B)
˜ (x) = P B⊇{x} bA
pl = PA = A
bA
B⊇{x} m bA
(B)|B| 0 else |B|
B⊇{x}

= BetP [bA ](x). Hence:


˜
BetP [B] = Cl(BetP [bA ], A ⊆ Θ) = Cl(P A , A ⊆ Θ) = P = pl[B].

Pignistic and relative plausibility transform span the whole probability simplex P.
Consider, however, the set of (singular) b.f.s which assign zero
P mass to single-
tons. They live in Cl(bA , |A| > 1), as they have the form b = |A|>1 mb (A)bA ,
P
with mb (A) ≥ 0, |A|>1 mb (A) = 1.
The region of P spanned by their probability transforms is therefore:
˜
pl[Cl(b ˜
A , |A| > 1)] = Cl(plbA , |A| > 1) = Cl(P A , |A| > 1)

= Cl(BetP [bA ], |A| > 1) = BetP [Cl(bA , |A| > 1)].

If (11.1) is not met, both probability transforms span only a limited region of the
probability simplex. In the case of a ternary frame this yields the triangle:

Cl(P {x,y} , P {x,z} , P {y,z} , P Θ ) = Cl(P {x,y} , P {x,z} , P {y,z} )

delimited by dashed lines in Figure 11.2-right.

11.4.2 The family of relative mass probability transformations

One may argue that although the ‘singular’ case concerns only a small fraction of
all belief and probability measures, in many practical application there is a bias to-
wards some particular models which are the most exposed to the problem.
For example, uncertainty is often represented using a fuzzy membership function
[725]. If the membership function has only a finite number of values, then it is
equivalent to a belief function whose focal sets are linearly ordered under set inclu-
sion A1 ⊆ · · · ⊆ An = Θ, |Ai | = i, or ‘consonant’ belief function (see Chapter 2,
[1149, 411]). In consonant b.f.s at most one focal element A1 is a singleton, hence
most information is stored in the non-singleton focal elements.
This train of thoughts leads to the realization that the relative belief transform is
merely one representative of an entire family of probability transformations. Indeed,
it can be thought of as the probability transformation which, given a b.f. b:
354 11 The epistemic family of probability transforms

1. retains the focal elements of size 1 only, yielding an unnormalized belief func-
tion;
2. computes (indifferently) the latter’s relative plausibility/pignistic transforma-
tion:
P P mb (A)
A⊇x,|A|=1 mb (A) mb (x) A⊇x,|A|=1 |A|
b̃(x) = P P = =P P mb (A)
.
y A⊇x,|A|=1 mb (A) kmb
y A⊇x,|A|=1 |A|

A family of natural generalizations of the relative belief transform is thus obtained


by, given an arbitrary belief function b:
1. retaining the focal elements of size s only;
2. computing either the resulting relative plausibility ...
3. ... or the associated pignistic transformation.
Now, both option 2. and option 3. yield the same probability distribution. Indeed,
the application of the relative plausibility transform yields:
X X X
mb (A) mb (A) mb (A)
A⊇{x}:|A|=s A⊇{x}:|A|=s A⊇{x}:|A|=s
p(x) = X X = X = X ,
mb (A) mb (A)|A| s mb (A)
y∈Θ A⊇{y}:|A|=s A⊆Θ:|A|=s A⊆Θ:|A|=s

while applying the pignistic transform produces:


X mb (A) X
s mb (A)
|A|
A⊇{x}:|A|=s A⊇{x}:|A|=s
p(x) = X = , (11.10)
mb (A)
X X
s mb (A)
X
|A| y∈Θ A⊇{y}:|A|=s
y∈Θ A⊇{y}:|A|=s

i.e., the very same result. The following natural extension of the relative belief op-
erator is then well defined.
Definition 81. Given any belief function b : 2Θ → [0, 1] with basic probability
assignment mb , we call relative mass transformation of level s the transform M̃s [b]
which maps b to the probability distribution (11.10).
We denote by m̃s the output of the relative mass transform of level s.

11.4.3 Approximating pignistic probability and relative plausibility

Classical transformations as convex combinations of relative mass transforma-


tions It is easy too see that both relative plausibility of singletons and pignis-
tic probability are convex combinations of all the (n) relative mass probabilities
{m̃s , s = 1, ..., n}.
11.4 Generalizations of the relative belief operator 355
P
Namely, let us we denote by kb,s = A⊆Θ:|A|=s mb (A) the total mass of focal
P
elements of size s, and by plb (x; k) = A⊇{x}:|A|=s mb (A) the contribution to the
plausibility of x of the same size-s focal elements. Immediately:
X X X X
plb (y) = mb (A) = mb (A)|A|
y y A⊇{y} A⊆Θ
n
X  X  Xn
= r mb (A) = rkb,r .
r=1 A⊆Θ,|A|=r r=1

This yields the following convex decomposition of the relative plausibility of sin-
gletons into relative mass probabilities m̃s :
P
˜ (x) = Pplb (x) = P s plb (x; s)
X plb (x; s) X plb (x; s) skb,s
pl b = P = P
plb (y) r rkb,r r rkb,r skb,s r rkb,r
Xy s s
= αs m̃s (x),
s
(11.11)
plb (x;s)
as m̃s (x) = skb,s . The coefficients

skb,s X
αs = P ∝ skb,s = plb (y; s)
r rkb,r y

of the convex combination measure for each level s the total plausibility contribution
of the focal elements of size s.
In the case of the pignistic probability we get:
X mb (A) X 1 X
BetP [b](x) = = mb (A)
|A| s
s
A⊇{x} A⊇{x},|A|=s
X1 (11.12)
X plb (x; s) X
= plb (x; s) = kb,s = kb,s m̃s (x),
s
s s
skb,s s

with coefficients βs = kb,s measuring for each level s the mass contribution of the
focal elements of size s.

Relative mass transforms as low-cost proxies: approximation criteria Accord-


ingly, the relative mass probabilities can be seen as basic components of both the
pignistic and the plausibility transform, associated with the evidence carried by fo-
cal elements of a specific size.
As such transforms can be computed just by considering size-s focal elements, they
can also be thought of as low-cost proxies for both relative plausibility and pignistic
probability, since only the ns size-s focal elements (instead of the initial 2n ) have


to be stored, while all the others can be dropped without further processing.
We can think of two natural criteria for such an approximation of pl, ˜ BetP via
the relative mass transforms.
356 11 The epistemic family of probability transforms

– (C1) we retain the component s whose coefficient αs /βs is the largest in the con-
vex decomposition (11.11)/(11.12);
– (C2) we retain the component associated with the minimal size focal elements.
Clearly,
P the second criterion delivers the classical relative belief transform whenever
x m b (x) 6= 0. When the mass of singletons is nil, instead, (C2) amounts to a
natural extension of the relative belief operator:
P
ext . A⊇{x}:|A|=min mb (A)
b̃ (x) = P . (11.13)
|A|min A⊆Θ:|A|=min mb (A)

The two approximation criteria favour different aspects of the original belief func-
tion. (C1) focuses on the strength of the evidence carried by focal elements of equal
size. Note that the optimal C1 approximations of plausibility or pignistic transform
are in principle distinct:
˜ = arg max skb,s ,
ŝ[pl] ŝ[BetP ] = arg max kb,s .
s s

The optimal approximation for the pignistic probability will not necessarily be the
best approximation of the relative plausibility of singletons as well.
(C2) favors instead the precision of the pieces of evidence involved. Let us compare
these two approaches in two simple scenarios.

Two opposite scenarios While C1 appears to be a sensible, rational principle (the


selected proxy must be the greatest contributor to the actual classical probability
transformation), C2 seems harder to justify. Why should one retain only the smallest
focal elements, regardless their mass? The attractive feature of the relative belief of
singletons, among all possible C2 approximations, is its simplicity: the original mass
is directly re-distributed onto the singletons. What about the ‘extended’ operator
(11.13)?

Scenario 1 Suppose we wish to approximate the plausibility/pignistic transform of


a b.f. b : 2Θ → [0, 1], with b.p.a. mb (A) = mb (B) = , |A| = |B| = 2, and
mb (Θ) = 1 − 2  mb (A) (Figure 11.3-left).
Its relative plausibility of singletons is given by:
˜ (x) ∝ mb (A) + mb (Θ), pl
pl ˜ (y) ∝ mb (A) + mb (B) + mb (Θ),
b b
˜ (z) ∝ mb (B) + mb (Θ), pl
pl ˜ (w) ∝ mb (Θ) ∀w 6= x, y, z.
b b

Its pignistic probability reads as:


mb (A) mb (Θ)
BetP (x) = 2 + n , BetP (y) = mb (A)+m
2
b (B)
+ mbn(Θ) ,
mb (B) mb (Θ)
BetP (z) = 2 + n , BetP (w) = mbn(Θ) ∀w 6= x, y, z.

Both transformations have a profile similar to that of Figure 11.3-right (where we


assumed mb (A) > mb (B)).
11.4 Generalizations of the relative belief operator 357

Fig. 11.3. Left: the original belief function in Scenario 1. Right: corresponding profile of both
relative plausibility of singletons and pignistic probability.

Now, according to criterion (C1), the best approximation (among all relative
mass transforms) of both pl ˜ and BetP [b] is given by selecting the focal element
b
of size n, i.e., Θ, as the greatest contributor to both the convex sums (11.11) and
(11.12). However, it is easy to see that this yields as an approximation the uniform
probability p(w) = 1/n, which is the least informative probability distribution.
In particular, the fact that the available evidence supports to a limited extent the
singletons x, y and z is completed discarded, and no decision is possible.
If, on the other hand, we operate according to criterion (C2), we end up selecting
the size-2 focal elements A and B. The resulting approximation is:

m̃2 (x) ∝ mb (A), m̃2 (y) ∝ mb (A) + mb (B), m̃2 (z) ∝ mb (B),

m̃2 (w) = 0 ∀w 6= x, y, z. This mass assignment has the same profile as that of
˜ or BetP [b] (Figure 11.3-right): any decision made according to the latter will
pl b
correspond to that made on the basis of pl ˜ or BetP [b].
b
In a decision-making sense, therefore, m̃2 = b̃ext is the most correct approximation
of both plausibility and pignistic transforms. We end up making the same decisions,
at a much lower (in general) computation cost.

Scenario 2 Consider now a second scenario, involving a belief function with only
two focal elements A and B, with |A| > |B| and mb (A)  mb (B) (Figure 11.4-
left). Both relative plausibility and pignistic probability have the following values:
˜ (w) = BetP (w) ∝ mb (A) w ∈ A, pl
pl ˜ (w) = BetP (w) ∝ mb (B) w ∈ B,
b b

and correspond to the profile of Figure 11.4-right.


In this second case, (C1) and (C2) generate the uniform probability with support
in A (as mb (A)  mb (B)) and the uniform probability with support in B (as
|B| < |A|), respectively. Therefore, it is (C1) that yields the best approximation of
both plausibility and pignistic transforms in a decision-making perspective.

A critical look In this discussion, the second scenario corresponds to a situation


in which evidence is highly conflicting. In such a case we are given two opposite
358 11 The epistemic family of probability transforms

Fig. 11.4. Left: the b.f. of the second scenario. Right: corresponding profile of both relative
plausibility of singletons and pignistic probability.

decision alternatives, and it is quite difficult to say which one makes more sense.
Should we privilege precision or evidence support?
Some insight on this issue comes from recalling that higher-size focal elements are
expression of ‘epistemic’ uncertainty (in Smets’ terminology), as they come from
missing data/lack of information on the problem at hand. Besides, by their own
nature they allow for a lower resolution for decision making purposes (in the second
scenario above, if we trust (C1) we are left uncertain on whether to pick one of |A|
outcomes, while if adopt (C2) the uncertainty is restricted to |B| outcomes).
In conclusion, it is not irrational, in case of conflicting evidence, to judge larger
size focal elements ‘less reliable’ (as carriers of greater ignorance) than more fo-
cused focal elements. It follows a preference for approximation criterion (C2),
which ultimately supports the case for the relative belief operator and its natural
extension (11.13).

11.5 Geometry in the space of pseudo belief functions


After studying the dual properties of the pair of epistemic transforms, and proposing
a generalisation of the relative belief operator in singular cases, it is time to complete
our understanding of the geometry of probability transformations by considering the
geometric behaviour of epistemic mappings.
In Section 10.1 we had a quick glance at their geometry in the binary case. Using
the terminology we acquired in the last two Chapters, we learned that transforma-
tions of the affine family coincide on a binary frame. On the other hand, we saw that
the members of the epistemic family, the relative belief and the relative plausibility
of singletons, do not follow the same pattern.
To understand the geometry of transformations of the epistemic family, we need to
introduce a pair of pseudo belief functions related to them.
11.5 Geometry in the space of pseudo belief functions 359

11.5.1 Plausibility of singletons and relative plausibility

Let us call plausibility of singletons the pseudo belief function plb : 2Θ → [0, 1]
with Moebius inverse mplb : 2Θ → R given by:
X
mplb (x) = plb (x) ∀x ∈ Θ, mplb (Θ) = 1 − plb (x) = 1 − kplb ,
x
mplb (A) = 0 ∀A ⊆ Θ : |A| =
6 1, n.

Indeed mplb meets the normalization constraint


X X  X 
mplb (A) = plb (x) + 1 − plb (x) = 1.
A⊆Θ x∈Θ x∈Θ

Then, as 1 − kplb ≤ 0, plb is a pseudo belief function (Section 6.1). Note that plb is
instead not a plausibility function.
In the belief space plb is represented by the vector
X X
plb = plb (x) bx + (1 − kplb ) bΘ = plb (x) bx , (11.14)
x∈Θ x∈Θ

as bΘ = 0 is the origin of the reference frame in RN −2 .


Theorem 63. 5 pl fb is the intersection of the line joining vacuous belief function bΘ
and plausibility of singletons plb with the probability simplex.

Proof. By Equations (4.16) (11.14) we have that:


X
pl
fb = fb (x)bx = pl /kpl .
pl b b
x∈Θ

Since bΘ = 0 is the origin of the reference frame, pl fb lies on the segment


Cl(plb , bΘ ). This in turn implies plb = Cl(plb , bΘ ) ∩ P.
f

The geometry of plfb depends on that of pl through Theorem 63. In the binary case
b
plb = plb , and we go back to the situation of Figure 10.1.

11.5.2 Belief of singletons and relative belief

By definition of intersection probability p[b] (10.12), given in Section 11.2.5, it fol-


lows that: X X
p[b] = mb (x)bx + β[b] (plb (x) − mb (x))bx
x∈Θ x∈Θ
X X (11.15)
= (1 − β[b]) mb (x)bx + β[b] plb (x)bx .
x∈Θ x∈Θ

5
This result, at least in the binary case, appeared in [308] too.
360 11 The epistemic family of probability transforms

Analogously to what done for the plausibility of singletons, we can define the belief
function (belief of singletons) b : 2Θ → [0, 1] with basic probability assignment:
mb (x) = mb (x), mb (Θ) = 1 − kmb , mb (A) = 0 ∀A ⊆ Θ : |A| = 6 1, n,
P
where the scalar quantity kmb = x∈Θ mb (x) measures the total mass of single-
tons. The belief of singletons assigns to Θ all the mass b gives to non-singletons. In
the belief space b is represented by the vector:
X X
b= mb (x)bx + (1 − kmb )bΘ = mb (x)bx (11.16)
x∈Θ x∈Θ

(as again bΘ = 0). Equation (11.15) can then be written as:


p[b] = (1 − β[b]) b + β[b] plb . (11.17)
Namely, the intersection probability is the convex combination of belief and plausi-
bility of singletons with coefficient β[b].
In the binary case b = b and plb = plb , so that the plausibility of singletons is a
plausibility function (Figure 10.1).

11.5.3 A three plane geometry

The geometry of relative plausibility and belief of singletons can therefore be re-
duced to that of plb , b.
As we know, a belief function b and the corresponding plausibility function plb
have the same coordinates with respect to the vertices bA , plA of the belief and the
plausibility space, respectively:
X X
b= mb (A)bA ↔ plb = mb (A)plA .
∅6=A⊆Θ ∅6=A⊆Θ

Just as the latter form a pair of ‘dual’ vectors in the respective spaces, plausibility plb
and belief b of singletons have duals (that we can denote by pl b and bb) characterised
b
by having the same coordinates in the plausibility space: b ↔ bb, plb ↔ pl b .
b
They can be written as:
X
bb = mb (x)plx + (1 − kmb )plΘ = b + (1 − kmb )plΘ
x∈Θ
X (11.18)
pl
b =
b plb (x)plx + (1 − kplb )plΘ = plb + (1 − kplb )plΘ
x∈Θ

(as plx = bx for all x ∈ Θ), where, again, plΘ = 1.


We can prove that (see Chapter appendix):
Theorem 64. The line passing through the duals (11.18) of plausibility of single-
tons (11.14) and belief of singletons (11.16) crosses p[b] too, and
b − bb) + bb = p[b] = β[b] (pl − b) + b.
β[b] (pl (11.19)
b b
11.5 Geometry in the space of pseudo belief functions 361

If kmb 6= 0 the geometry of relative plausibility and belief of singletons can there-
fore be described in terms of the three planes

a(plb , p[b], pl
b ),
b a(bΘ , pl
fb , plΘ ), a(bΘ , eb, plΘ )

(see Figure 11.5), where eb = b/kmb is the relative belief of singletons. Namely:

^ −
pl b pl b
~
pl b

φ1
φ2 p[b]

φ3
bΘ plΘ


b b^
~
b
P

Fig. 11.5. Planes and angles describing the geometry of relative plausibility and belief of
singletons, in terms of plausibility of singletons plb and belief of singletons b. Geometrically
two lines or three points are sufficient to uniquely determine a plane passing through them.
The two lines a(b, plb ) and a(b b, pl
b ) uniquely determine a plane a(b, p[b], b
b b). Two other
planes are uniquely determined by the origins of belief bΘ and plausibility plΘ spaces to-
gether with either the relative plausibility of singletons ple or the relative belief of singletons
b
b: a(bΘ , plb , plΘ ) (top of the diagram) and a(bΘ , b, plΘ ) (bottom), respectively. The angles
e e e
φ1 [b], φ2 [b], φ3 [b] are all independent, as the value of each of them reflects a different prop-
erty of the original belief function b. The original belief b and plausibility plb functions do
not appear here for sake of simplicity. They play a role only through the related plausibility
of singletons (11.14) and belief of singletons (11.16).

1. p[b] is the intersection of a(b, plb ) and a(bb, pl


b ), and has the same affine coordi-
b
nate on the two lines (Section 11.5.1). Those two lines then span a plane which
we can denote by:
a(b, p[b], bb) = a(plb , p[b], pl
b ).
b

2. Furthermore, by definition,
362 11 The epistemic family of probability transforms

fb − bΘ = (pl − bΘ )/kpl
pl (11.20)
b b

while (11.18) implies pl b − (1 − kpl )plΘ ]/kpl so that


fb = pl /kpl = [pl
b b b b b

fb − plΘ = (pl
pl b − plΘ )/kpl . (11.21)
b b

By comparing (11.20) and (11.21) we realize that pl fb has the same affine co-
ordinate on the two lines a(bΘ , plb ) and a(plΘ , plb ), which intersect exactly in
b
pl
fb . The functions bΘ , plΘ , pl
fb , pl and pl
b
b therefore determine another plane
b
which we can denote by:
a(bΘ , pl
fb , plΘ ).

3. Analogously, by definition, eb − bΘ = (b − bΘ )/kmb while (11.18) yields eb −


plΘ = (bb − plΘ )/kmb . The relative belief of singletons then has the same affine
coordinate on the two lines a(bΘ , b) and a(plΘ , bb). The latter intersect exactly
in eb. The quantities bΘ , plΘ , eb, b and bb thus determine a single plane denoted by:

a(bΘ , eb, plΘ ).

11.5.4 A geometry of three angles

In the binary case, b = b = pl b = [mb (x), mb (y)]0 , plb = pl = bb = [1 −


b b
0
mb (y), 1 − mb (x)] and all these quantities are coplanar. This suggests a description
of the geometry of plfb , eb in terms of the three angles:

fb\
φ1 [b] = pl
\b
p[b] plb , φ2 [b] = b p[b] pl e\
b , φ3 [b] = b bΘ plb (11.22)
f

(cfr. Figure 11.5 again). Such angles are all independent, and each of them has
a distinct interpretation in terms of degrees of belief as different values of theirs
reflect different properties of the belief function b and the associated probability
transformations.

Orthogonality condition for φ1 [b] We know that the dual line a(b, plb ) is always
orthogonal to P (Section 10.2). The line a(b, plb ), though, is not in general orthog-
onal to the probabilistic subspace.
Formally, the simplex P = Cl(bx , x ∈ Θ) determines an affine (or vector) space
a(P) = a(bx , x ∈ Θ). A set of generators for a(P) is formed by the n − 1 vectors:
by − bx , ∀y ∈ Θ, y 6= x, after picking an arbitrary element x ∈ Θ as reference. The
non-orthogonality of a(b, plb ) and a(P) can therefore be expressed by saying that
for at least one of such basis vectors the scalar product h·i with the difference vector
plb − b (which generates the line a(b, plb ) ) is non-zero:

∃y 6= x ∈ Θ s.t. hplb − b, by − bx i =
6 0. (11.23)

Recall that φ1 [b] as defined in (11.22) is the angle between a(b, plb ) and the specific
line a(eb, pl
fb ) laying on the probabilistic subspace.
11.5 Geometry in the space of pseudo belief functions 363

The condition under which orthogonality holds has a significant interpretation


in terms of the uncertainty expressed the belief function b on the probability value
of each singleton.
Theorem 65. The line a(b, plb ) is orthogonal to the vector space generated by P
(and therefore φ1 [b] = π/2) if and only if:
X
mb (A) = plb (x) − mb (x) = const ∀ x ∈ Θ.
A)x

Relative uncertainty of singletons If b is Bayesian, plb (x) − mb (x) = 0 ∀x ∈ Θ.


If b is not Bayesian, there exists at least a singleton x such that plb (x) − mb (x) > 0.
In this case we can define the probability function
X plb (x) − mb (x) plb − b
R[b] = bx = . (11.24)
kplb − kmb kplb − kmb
x∈Θ

The value R[b](x) indicates how much the uncertainty plb (x)−mb (x) on the proba-
bility value on x ‘weighs’ on the total uncertainty on the probabilities of singletons.
It is the natural to call it relative uncertainty on the probabilities of singletons. When
b is Bayesian, R[b] does not exist.
Corollary 13. The line a(b, plb ) is orthogonal to P iff the relative uncertainty on
the probabilities of singletons is the uniform probability: R[b](x) = 1/|Θ| for all
x ∈ Θ.
If this holds the evidence carried by b yields the same uncertainty on the probability
value of all singletons. By definition of p[b] (10.12) wehave that:

p[b](x) = mb (x) + β[b](plb (x) − mb (x))


1−kmb
= mb (x) + P (plb (y)−m b (y))
(plb (x) − mb (x))
y∈Θ
1−kmb
= mb (x) + (1 − kmb )R[b](x) = mb (x) + n ,

namely the intersection probability re-assigns the mass originally given by b to non-
singletons to each singleton on an equal basis.

Dependence of φ2 on the relative uncertainty The value of φ2 [b] also depends on


the relative uncertainty on the probabilities of singletons.
Theorem 66. Denote by 1 = plΘ the vector [1, .., 1]0 . Then

h1, R[b]i
cos(π − φ2 [b]) = 1 − , (11.25)
kR[b]]k2

where again h1, R[b]i denotes the usual scalar product between the unit vector 1 =
[1, .., 1]0 and the vector R[b] ∈ RN −2 .
364 11 The epistemic family of probability transforms

We can observe that:


1. φ2 [b] = π (cos = 1) iff h1, R[b]i = 0.
But this never happens, as h1, pi = 2n−1 − 1 ∀p ∈ P (see proof of Theorem 66).
2. φ2 [b] = 0 (cos = −1) iff kR[b]k2 = h1, R[b]i/2.
The last situation also never materializes for belief functions defined on non-trivial
frames of discernment.
b ) never coincide ∀b ∈ B
Theorem 67. φ2 [b] 6= 0 and the lines a(b, plb ), a(bb, pl b
when |Θ| > 2; instead φ2 [b] = 0 ∀b ∈ B when |Θ| ≤ 2.

Example Let us see that by comparing the situations of the 2-element and 3-element
frames. If Θ = {x, y} we have that plb (x) − mb (x) = mb (Θ) = plb (y) − mb (y),
and the relative uncertainty function is:
1 1
R[b] = bx + by = P ∀b
2 2
(where P denotes the uniform probability on Θ, Figure 4) and R[b] = 21 1 = 12 plΘ .
In the binary case the angle φ2 [b] is zero for all belief functions. As we learned
b = b = pl
b , plb = pl = bb and the geometry of the epistemic family is planar.
b b
On the other side, if Θ = {x, y, z} not even the vacuous belief function bΘ
meets condition 2. In that case R[bΘ ] = P = 13 bx + 13 by + 13 bz and R is still the
uniform probability. But hR[bΘ ], 1i = 3, while
Dh 1 1 1 2 2 2 i0 h 1 1 1 2 2 2 i0 E 15
hR[bΘ ], R[bΘ ]i = hP, Pi = , = .
333333 333333 9

Unifying condition for the epistemic family The angle φ3 [b] is related to the con-
dition under which relative plausibility of singletons and relative belief of singletons
coincide. As a matter of fact, the angle is nil iff eb = pl
fb , which is equivalent to:

mb (x)/kmb = plb (x)/kplb ∀x ∈ Θ.

Again, this necessary and sufficient condition for φ3 [b] = 0 can expressed in terms
of the relative uncertainty on the probabilities of singletons, as

R[b](x) = (plb (x) −mb (x))/(kplb − kmb )


1 kplb (11.26)
= kplb −kmb kmb mb (x) − mb (x) = mb (x)/kmb ∀x ∈ Θ,

i.e., R[b] = eb, with R[b] ‘squashing’ pl fb onto eb from the outside. In this case the
quantities plb , plb , plb , p[b], b, b, b all lie in the same plane.
b f b e

11.5.5 Singular case

We need to pay some attention to the singular case (from a geometric


P point of view)
in which the relative belief of singletons does not exist: kmb = x mb (x) = 0.
11.6 Geometry in the probability simplex 365

Fig. 11.6. The angle φ2 [b] is nil for all belief functions in the size-two frame Θ = {x, y}, as
R[b] = [1/2, 1/2]0 is parallel to plΘ = 1 for all b.

As a matter of fact the belief of singletons b still exists, even in this case, and by
Equation (11.16) b = bΘ , while bb = plΘ by duality. Recall the description in terms
of planes we gave in Section 11.5.3. In this case the first two planes a(b, p[b], bb) =
a(a(bb, pl
b ), a(b, pl )) = a(a(bΘ , pl
b b
b ), a(plΘ , pl )) = a(bΘ , pl
b b
fb , plΘ ) coincide,
while the third one a(bΘ , b, plΘ ) simply does not exist. The geometry of the epis-
e
temic family reduces to a planar one (see Figure 11.7), which depends only on the
angle φ2 [b]. It is remarkable that, in this case:
1 − kmb   1
p[b](x) = mb (x) + plb (x) − mb (x) = plb (x) = pl
fb (x).
kplb − kmb kplb
Theorem 68. If a belief function b does not admit relative belief of singletons (as
b assigns zero mass to all singletons) then its relative plausibility of singletons and
intersection probability coincide.
Also, in this case the relative uncertainty on the probabilities of singletons coincides
with the relative plausibility of singletons too: R[b] = pl
fb = p[b] (see (11.24)).

11.6 Geometry in the probability simplex


The geometry of relative belief and plausibility of singletons in the space of all
(pseudo) belief functions is a function of three angles and planes. It is also interest-
366 11 The epistemic family of probability transforms

^ −
pl b pl b

~
φ2 p[b]= pl b =R[b]


b = bΘ P plΘ = b^

Fig. 11.7. Geometry of relative plausibility of singletons P


and relative uncertainty on the
probabilities of singletons in the singular case when kmb = x m(x) = 0.

ing, however, to see how they behave as probability distributions in the probability
simplex.
We can observe for instance that, as

R[b](kplb − kmb ) = plb − b = plfb · kpl − eb · km


b b

= plb · kplb − eb · kmb + kplb · eb − kplb · eb


f
fb − eb) + eb(kpl − km ),
= kplb (pl b b

R[b] lies on the line joining eb and pl


fb :

kplb fb − eb).
R[b] = eb + (pl (11.27)
kplb − kmb
Let us study the situation in a simple example.

11.6.1 Geometry in the 3-element frame

Consider a belief function b1 with basic belief assignment

mb1 (x) = 0.5, mb1 (y) = 0.1, mb1 ({x, y}) = 0.3, mb1 ({y, z}) = 0.1

on Θ = {x, y, z}. The probability intervals of the singletons have widths:

plb1 (x) − mb1 (x) = mb1 ({x, y}) = 0.3,


plb1 (y) − mb1 (y) = mb1 ({x, y}) + mb1 ({y, z}) = 0.4,
plb1 (z) − mb1 (z) = mb1 ({y, z}) = 0.1.
11.6 Geometry in the probability simplex 367

Their relative uncertainty is therefore R[b1 ](x) = 3/8, R[b1 ](y) = 1/2, R[b1 ](z) =
1/8. R[b1 ] is plotted as a point of the probability simplex P = Cl(bx , by , bz ) in
Figure 11.8. Its distance from the uniform probability P = [1/3, 1/3, 1/3]0 in P is:
hX i1/2
kP − R[b1 ]k = (1/3 − R[b1 ](x))2
h x1 3 2  1 1 2  1 1 2 i1/2
= − + − + − = 0.073.
3 8 3 2 3 8
The related intersection probability (as kmb1 = 0.6, kplb1 = 0.8 + 0.5 + 0.1 = 1.4,
β[b1 ] = (1 − 0.6)/(1.4 − 0.6) = 1/2)

p[b1 ](x) = 0.5 + 12 0.3 = 0.65, p[b1 ](y) = 0.1 + 12 0.4 = 0.3,
p[b1 ](z) = 0 + 12 0.1 = 0.05,

is plotted as a square (second from the left) on the dotted triangle of Figure 11.8.
A larger uncertainty on the probability of singletons is associated with b2

mb2 (x) = 0.5, mb2 (y) = 0.1, mb2 (z) = 0, mb2 ({x, y}) = 0.4,

in which all the higher-size mass is assigned to a single focal element {x, y}. In that
case plb2 (x) − mb2 (x) = 0.4, plb2 (y) − mb2 (y) = 0.4, plb2 (z) − mb2 (z) = 0 so
that the relative uncertainty on the probabilities of singletons is R[b2 ](x) = 1/2,
R[b2 ](y) = 1/2, R[b2 ](z) = 0 with a Euclidean distance from P equal to d2 =
[(1/6)2 + (1/6)2 + (1/3)2 ]1/2 = 0.408.
The corresponding intersection probability (as β[b2 ] = (1 − 0.6)/0.8 is still 1/2) is
the first square from the left on the above dotted triangle:
1 1
p[b2 ](x) = 0.5 + 0.4 = 0.7, p[b2 ](y) = 0.1 + 0.4 = 0.3, p[b2 ](z) = 0.
2 2
If we spread the mass of non-singletons on to two focal elements to get a third
belief function b3 :

mb3 (x) = 0.5, mb3 (y) = 0.1, mb3 ({x, y}) = 0.2, mb3 ({y, z}) = 0.2

we get the following uncertainty intervals:

plb3 (x) − mb3 (x) = 0.2, plb3 (y) − mb3 (y) = 0.4,
plb3 (z) − mb3 (z) = 0.2,

which correspond to R[b3 ](x) = 1/4, R[b3 ](y) = 1/2, R[b3 ](z) = 1/4 and a dis-
tance from P of 0.2041. The intersection probability assumes the values p[b3 ](x) =
0.5 + 12 0.2 = 0.6, p[b3 ](y) = 0.1 + 12 0.4 = 0.3, p[b3 ](z) = 0 + 12 0.2 = 0.1.
Assigning a certain mass to the singletons determines a set of belief functions
compatible with such a probability assignment. In our example b1 , b2 and b3 all
belong to the following such set:
368 11 The epistemic family of probability transforms

Fig. 11.8. Locations of the members of the epistemic family in the probability simplex
P = Cl(bx , by , bz ) for a 3-element frame Θ = {x, y, z}. The relative uncertainty on the
probability of singletons R[b], the relative plausibility of singletons plfb and the intersection
probability p[b] for the family of belief functions on the 3-element frame defined by the mass
assignment (11.28) lie on the dashed, solid and dotted triangles respectively. The locations
of R[b1 ], R[b2 ], R[b3 ] for the three belief functions b1 , b2 and b3 discussed in the example
are shown. The relative plausibility of singletons and the intersection probability for the same
b.f.s appear on the corresponding triangles in the same order. The relative belief of singletons
b lies on the bottom-left square for all the belief functions of the considered family (11.28).
e

 X 
b : mb (x) = 0.5, mb (y) = 0.1, mb (z) = 0, mb (A) = 0.4 . (11.28)
|A|>1

The corresponding relative uncertainty on the probability of singletons is con-


strained to live in the simplex delimited by the dashed lines in Figure 11.8. Of the
three belief functions we considered, b2 corresponds to the maximal imbalance be-
tween the masses of size-2 focal elements, as it assigns the whole mass to {x, y}.
As a result, R[b2 ] has maximal distance from the uniform probability P. The belief
function b3 spreads instead the mass equally between {x, y} and {y, z}. As a result,
R[b3 ] has minimal distance from P.
Similarly, the intersection probability (11.15) is constrained to live in the simplex
delimited by the dotted lines.
All those belief functions have by definition (4.18) the same relative belief eb. The
lines determined by R[b] and p[b] for each admissible belief function b in the set
(11.28) intersect as a matter of fact in
11.6 Geometry in the probability simplex 369

eb(x) = 5/6, eb(y) = 1/6, b(z) = 0


(bottom left square). This is due to the fact that:
Xh i X
p[b] = mb (x) + (1 − kmb )R[b](x) bx = mb (x)bx + (1 − kmb )R[b]
x x
= kmb eb + (1 − kmb )R[b],

so that eb is collinear with R[b], p[b].


Finally, the associated relative plausibilities of singletons also live in a simplex
(solid lines in Figure 11.8). The probabilities pl b1
e , and pl
e , pl
b2
e are identified as
b3
squares located in the same order as above. According to (11.27), pl e , eb, and R[b]
b
are also collinear for all belief functions b.

11.6.2 Singular case in the 3-element frame

Let us pay attention to the singular case. For each belief function b such that
mb (x) = mb (y) = mb (z) = 0 the plausibilities of the singletons of a size-3 frame
are:
plb (x) = mb ({x, y}) + mb ({x, z}) + mb (Θ) = 1 − mb ({y, z}),
plb (y) = mb ({x, y}) + mb ({y, z}) + mb (Θ) = 1 − mb ({x, z}),
plb (z) = mb ({x, z}) + mb ({y, z}) + mb (Θ) = 1 − mb ({x, y}).
Furthermore, by hypothesis plb (w) − mb (w) = plb (w) for all w ∈ Θ, so that:
X
(plb (w) − mb (w)) = plb (x) + plb (y) + plb (z) =
w
= 2(mb ({x, y}) + mb ({x, z}) + mb ({y, z})) + 3mb (Θ) = 2 + mb (Θ)
and we get:
P
1 − w mb (w) 1 1
β[b] = P =P = .
w (plb (w) − mb (w)) pl
w b (w) 2 + m b (Θ)

Therefore:
plb (x) − mb (x) 1 − mb ({y, z})
R[b](x) = P = ,
w (pl b (w) − m b (w)) 2 + mb (Θ)
1 − mb ({x, z}) 1 − mb ({x, y})
R[b](y) = , R[b](z) = ;
2 + mb (Θ) 2 + mb (Θ)
1 − mb ({y, z})
p[b](x) = mb (x) + β[b](plb (x) − mb (x)) = β[b]plb (x) = ,
2 + mb (Θ)
1 − mb ({x, z}) 1 − mb ({x, y})
p[b](y) = , p[b](z) = ;
2 + mb (Θ) 2 + mb (Θ)
fb (x) = Pplb (x) = 1 − mb ({y, z}) ,
pl
w plb (w) 2 + mb (Θ)
1 − m b ({x, z}) f 1 − mb ({x, y})
pl
fb (y) = , plb (z) =
2 + mb (Θ) 2 + mb (Θ)
370 11 The epistemic family of probability transforms

Fig. 11.9. Simplices spanned by R[b] = p[b] = pl e and BetP [b] = π[b] in the probability
b
simplex for the cardinality 3 frame in the singular case mb (x) = mb (y) = mb (z) = 0, for
different values of mb (Θ). The triangle spanned by R[b] = p[b] = pl e (solid lines) coincides
b
with that spanned by BetP [b] = π[b] for all b such that mb (Θ) = 0. For mb (Θ) = 1/2,
R[b] = p[b] = pl e spans the triangle Cl(p01 , p02 , p03 ) (dotted lines) while BetP [b] = π[b]
b
spans the triangle Cl(p001 , p002 , p003 ) (dashed lines). For mb (Θ) = 1 both groups of transforma-
tions reduce to a single point P.

and R[b] = plfb = p[b] as stated by Theorem 68.


While in the non-singular case all those quantities live in different simplices that
‘converge’ to eb (Figure 11.8), when eb does not exist all such simplices coincide.
For a given value of mb (Θ) this is the triangle with vertices
h i0 h i0
1 1 mb (Θ) 1 mb (Θ) 1
, ,
2+mb (Θ) 2+mb (Θ) 2+mb (Θ) , , ,
2+mb (Θ) 2+mb (Θ) 2+mb (Θ) ,
h i0 (11.29)
mb (Θ) 1 1
2+mb (Θ) , 2+mb (Θ) , 2+mb (Θ) .

As a reference, for mb (Θ) = 0 the latter is the triangle delimited by the points
p1 , p2 , p3 in Figure 11.9 (solid line). For mb (Θ) = 1 we get a single point: P (the
central black square in the Figure). For mb (Θ) = 1/2, instead, (11.29) yields

Cl(p01 , p02 , p03 ) = Cl([2/5, 2/5, 1/5]0 , [2/5, 1/5, 2/5]0 , [1/5, 2/5, 2/5]0 )

(the dotted triangle in Figure 11.9). For comparison let us compute the values of
Smets’ pignistic probability (which in the 3-element case coincide with the orthog-
onal projection [267], see Section 10.4.5). We get:
11.7 Equality conditions for both families of approximations 371
mb ({x,y})+mb ({x,z})
BetP [b](x) = 2 + mb3(Θ) ,
mb ({x,y})+mb ({y,z})
BetP [b](y) = 2 + mb3(Θ) ,
mb ({x,z})+mb ({y,z})
BetP [b](z) = 2 + mb3(Θ) .

Thus, the simplices spanned by the pignistic function for the same sample values of
mb (Θ) are (Figure 11.9 again): mb (Θ) = 1 → P; mb (Θ) = 0 → Cl(p1 , p2 , p3 );
mb (Θ) = 1/2 → Cl(p001 , p002 , p003 ) where

p001 = [5/12, 5/12, 1/6]0 , p002 = [5/12, 1/6, 5/12]0 , p003 = [1/6, 5/12, 5/12]0

(the vertices of the dashed triangle in the figure). The behavior of the two families
of probability transformations is rather similar, at least in the singular case. In both
cases approximations are allowed to span only a proper subset of the probability
simplex P, stressing the pathological situation of the singular case itself.

11.7 Equality conditions for both families of approximations


The rich tapestry of results of Sections 11.5 and 11.6 completes our knowledge of
the geometry of the relation between belief functions and their probability transfor-
mations which started with the affine family in Chapter 10.
The epistemic family is formed by transformations which depend on the balance
between the total plausibility kplb of the elements of the frame, and the total mass
kmb assigned to them. This measure of the relative uncertainty on the probabilities
of singletons is symbolized by the probability distribution R[b].
The examples of Section 11.6 shed some light on the relative behavior of all
probability transformations, at least in the probability simplex. It is now time to
understand under which conditions the probabilities generated by transformations of
different families reduce to the same probability distribution. Theorem 68 is a first
step in this direction: when b does not admit relative belief, its relative plausibility
pl
e and intersection probability p[b] coincide. Once again we gain insight from the
b
binary case.

11.7.1 Equal plausibility distribution in the affine family

Let us first focus on functions of the affine family. In particular, let us consider the
orthogonal projection (10.26) of b onto P [267]

1 + |Ac |21−|A| 1 − |A|21−|A|


X   X  
π[b](x) = mb (A) + mb (A) ,
n n
A⊇{x} A6⊃{x}

and the pignistic transformation (4.15). We can prove that:


Lemma 14. The difference π[b](x) − BetP [b](x) between the probability values of
orthogonal projection and pignistic function is
372 11 The epistemic family of probability transforms

1 − |A|21−|A| 1 − |A|21−|A|
X   X  
mb (A) − mb (A) . (11.30)
n |A|
A⊆Θ A⊇{x}

An immediate consequence of Lemma 14 is that:


Theorem 69. Orthogonal projection and pignistic function coincide iff
X |Ac | X
mb (A)(1 − |A|21−|A| ) = mb (A)(1 − |A|21−|A| ) ∀x ∈ Θ.
|A|
A⊇{x} A6⊃{x}
(11.31)
Theorem 69 gives an exhaustive but rather arid description of the relation between
π[b] and BetP [b]. More significant sufficient conditions can be given in terms of
belief values. Let us denote by
X
plb (x; k) = mb (A)
A⊃{x},|A|=k

the support focal elements of size k provide to each singleton x.


Corollary 14. Each of the following is a sufficient condition for the equality of the
pignistic and orthogonal transformations of a belief function b (BetP [b] = π[b]):
1. mb (A) = 0 for all A ⊆ Θ such that |A| = 6 1, 2, n;
2. the mass of b is equally distributed among all the focal elements A ⊆ Θ of the
same size |A| = k, for all sizes k = 3, ..., n − 1:
P
|B|=k mb (B)
mb (A) = , ∀A : |A| = k, ∀k = 3, .., n − 1;
n
 
k

3. for all singletons x ∈ Θ, and for all k = 3, .., n − 1

plb (x; k) = const = plb (·; k). (11.32)

If mass is equally distributed among higher-size events the orthogonal projection is


the pignistic function (Condition 2). The probability closest to b (in the Euclidean
sense) is also the barycenter of the simplex P[b] of consistent probabilities.
This is also the case when events of the same size contribute with the same amount
to the plausibility of each singleton (Condition 3).
It is easy to see that Condition 1 implies (is stronger than) Condition 2 which in turn
implies Condition 3. All of them are met by belief functions on size-2 frames. In
particular, Corollary 14 implies that
Corollary 15. BetP [b] = π[b] for |Θ| ≤ 3.
11.7 Equality conditions for both families of approximations 373

11.7.2 Equal plausibility distribution as a general condition

As a matter of fact, condition (Equation (11.32)) on the equal distribution of plau-


sibility provides an equality condition for probability transformations of b of both
families.
Consider again the binary case of Figure 10.1. We can appreciate that belief
functions such that mb (x) = mb (y) lay on the bisector of the first quadrant, which
is orthogonal to P. Their relative plausibility is then equal to their orthogonal pro-
jection π[b]. Theorem 65 can indeed be interpreted in terms of equal distribution of
plausibility among singletons. If Equation (11.32) is met for all k =P 2, ..., n−1 (this
is trivially true for k = n) then the uncertainty plb (x) − mb (x) = A){x} mb (A)
on the probability value of each singleton x ∈ Θ becomes:

X n
X X n−1
X
mb (A) = mb (A) = mb (Θ) + plb (·; k), (11.33)
A){x} k=2 |A|=k,A⊃{x} k=2

which is constant for all singletons x ∈ Θ.


The following is then a consequence of Theorem 65 and Equation (11.33).
Corollary 16. If plb (x; k) = const for all x ∈ Θ and for all k = 2, ..., n − 1 then
the line a(b, plb ) is orthogonal to P, and the relative uncertainty on the probabilities
of the singletons is the uniform probability R[b] = P.
The quantity plb (x; k) seems then to be connected to geometric orthogonality in
the belief space. We say that a belief function b ∈ B is orthogonal to P when the
−→
vector b 0 joining the origin 0 of RN −2 with b is orthogonal to it. We showed in
Chapter 10 that this is the case if and only if (10.24):
X X
mb (A)21−|A| = mb (A)21−|A|
A⊃{y},A6⊃{x} A⊃{x},A6⊃{y}

for each pair of distinct singletons x, y ∈ Θ, x 6= y. For instance, the uniform


Bayesian belief function P is orthogonal to P.
Again a sufficient condition for (10.24) can be given in terms of equal distribution
of plausibility. Confirming the intuition given by the binary case, in this case all
probability transformations of b converge to the same probability.
Theorem 70. If plb (x; k) = const = plb (·; k) for all k = 1, ..., n − 1 then b is
orthogonal to P, and
fb = R[b] = π[b] = BetP [b] = P.
pl (11.34)

We can summarise our findings by stating that, if focal elements of the same
size equally contribute to the plausibility of each singleton (plb (x; k) = const) the
following consequences on the relation between all probability transformations and
their geometry hold, as a function of the range of values of |A| = k for which the
hypothesis is true:
374 11 The epistemic family of probability transforms

∀k = 3, ..., n : BetP [b] = π[b];

∀k = 2, ..., n : a(b, plb )⊥P;

∀k = 1, ..., n : b⊥P, pl
fb = eb = R[b] = P = BetP [b] = p[b] = π[b].
Less binding conditions may be harder to formulate - we plan on studying them in
the near future.

Chapter appendix: proofs


Proof of Theorem 56
We just need a simple counterexample. Consider a belief function b : 2Θ → [0, 1] on
. P
Θ = {x1 , x2 , ..., xn }, kmb = x∈Θ mb (x) the total mass it assigns to singletons,
with b.p.a. mb (xi ) = kmb /n for all i, mb ({x1 , x2 }) = 1 − kmb . Then
kmb n − 2
b({x1 , x2 }) = 2 · + 1 − kmb = 1 − kmb ,
n n
1 2
b̃(x1 ) = b̃(x2 ) = ⇒ b̃({x1 , x2 }) = .
n n
For b̃ to be consistent with b (Equation (3.10)) it is necessary that b̃({x1 , x2 }) ≥
b({x1 , x2 }), namely:
2 n−2
≥ 1 − kmb ≡ kmb ≥ 1,
n n
which in turn reduces to kmb = 1. If kmb < 1 (b is not a probability) its relative
belief of singletons is not consistent.

Proof of Theorem 57
Let us pick for simplicity a frame of discernment with just three elements: Θ =
{x1 , x2 , x3 }, and the following b.p.a.:
k
mb ({xi }c ) = ∀i = 1, 2, 3, mb ({x1 , x2 }c ) = mb ({x3 }) = 1 − k.
3
In this case, the plausibility of {x1 , x2 } is obviously: plb ({x1 , x2 }) = 1 − (1 −
k) = k, while the plausibilities Pof the singletons are: plb (x1 ) = plb (x2 ) = 2/3k,
plb (x3 ) = 1 − 1/3k. Therefore x∈Θ plb (x) = 1 + k and the relative plausibility
values are:
˜ (x1 ) = pl
pl ˜ (x2 ) = 2/3k , pl ˜ (x3 ) = 1 − 1/3k .
b b b
1+k 1+k
˜ to be consistent with b we would need that:
For plb

˜ ({x1 , x2 }) = pl
pl ˜ (x2 ) = 4 k 1 ≤ plb ({x1 , x2 }) = k,
˜ (x1 ) + pl
b b b
3 1+k
˜ 6∈ P[b].
which happens if and only if k ≥ 1/3. Therefore, for k < 1/3 pl b
11.7 Equality conditions for both families of approximations 375

Proof of Theorem 58

Each pseudo belief function admits a (pseudo) plausibility


P function, as in the case
of standard b.f.s, which can be computed as plς (A) = B∩A6=∅ mς (B).
For the class of pseudo belief functions ς which correspond to the plausibility of
some belief function
P b (ς = plb for some b ∈ B), their pseudo plausibility function
is plplb (A) = B∩A6=∅ µb (B), as µb (8.1) is the Moebius inverse of plb .
When applied P to the elements x ∈ Θ of the common frame of b, plb this yields
plplb (x) = B3x µb (B) = mb (x) by Lemma 12, which implies

˜ b ](x) = P plplb (x)


pl[pl =P
mb (x)
= b̃[b].
pl
y∈Θ plb (y) y∈Θ mb (y)

Proof of Theorem 59

The basic plausibility assignment of pl1 ⊕ pl2 is, according to (2.6),


1 X
µpl1 ⊕pl2 (A) = µ1 (X)µ2 (Y ).
k(pl1 , pl2 )
X∩Y =A

Therefore, according to Lemma 12, the corresponding relative belief of singletons


b̃[pl1 ⊕ pl2 ](x) (11.5) is proportional to:
X
mpl1 ⊕pl2 (x) = µpl1 ⊕pl2 (A)
A⊇{x}
X X X
µ1 (X)µ2 (Y ) µ1 (X)µ2 (Y )
A⊇{x} X∩Y =A X∩Y ⊇{x}
= = ,
k(pl1 , pl2 ) k(pl1 , pl2 )
(11.35)
where mpl1 ⊕pl2 (x) denotes the b.p.a. of the (pseudo) belief function which corre-
P function pl1 ⊕ pl2 .
sponds to the plausibility
On the other hand, as X⊇{x} µb (X) = mb (x):
X X
b̃[pl1 ](x) ∝ m1 (x) = µ1 (X), b̃[pl2 ](x) ∝ m2 (x) = µ2 (X).
X⊇{x} X⊇{x}

Their Dempster’s combination is therefore:


  
X X X
(b̃[pl1 ]⊕b̃[pl2 ])(x) ∝  µ1 (X)  µ2 (Y ) = µ1 (X)µ2 (Y ),
X⊇{x} Y ⊇{x} X∩Y ⊇{x}

and by normalizing we get (11.35).


376 11 The epistemic family of probability transforms

Proof of Theorem 60

By taking the limit on both sides of Equation (11.6) we get:

b̃[plb∞ ] = (b̃[plb ])∞ . (11.36)

Let us consider the quantity (b̃[plb ])∞ = limn→∞ (b̃[plb ])n on the right hand side.
Since (b̃[plb ])n (x) = K(b(x))n (where K is a constant independent from x), and x
is the unique most believed state, it follows that:

(b̃[plb ])∞ (x) = 1, (b̃[plb ])∞ (y) = 0 ∀y 6= x. (11.37)

Hence, by (11.36), b̃[plb∞ ](x) = 1 and b̃[plb∞ ](y) = 0 for all y 6= x.

Proof of Theorem 61

By virtue of Equation (8.10) we can express a plausibility function as an affine


combination of all the categorical b.f.s bA . We can then apply the commutativity
property (11.9), obtaining
X
plb ⊕ p = ν(A)p ⊕ bA (11.38)
A⊆Θ

where P
µb (A)k(p, bA ) p(x)bx
ν(A) = P p ⊕ bA = x∈A ,
B⊆Θ b µ (B)k(p, bB ) k(p, bA )
P
with k(p, bA ) = x∈A p(x).
By replacing these expressions into (11.38) we get: plb ⊕ p =
X X  X  X  X
µb (A) p(x)bx p(x) µb (A) bx p(x)mb (x)bx
A⊆Θ x∈A x∈Θ A⊇{x}
= X X  = X  X  = x∈Θ
X ,
µb (B) p(y) p(y) µb (B) p(y)mb (y)
B⊆Θ y∈B y∈Θ B⊇{y} y∈Θ

once again by Lemma 12. But this is exactly b̃ ⊕ p, as a direct application of Demp-
ster’s rule (2.6) shows.

Proof of Lemma 13

We first need to analyze the behavior of the plausibility transform with respect to
affine combination of belief functions. By definition, the plausibility values of the
affine combination αb1 + (1 − α)b2 are pl[αb1 + (1 − α)b2 ](x) =
X X
= mαb1 +(1−α)b2 (A) = [αm1 (A) + (1 − α)m2 (A)]
A⊇{x} A⊇{x}
X X
=α m1 (A) + (1 − α) m2 (A) = αpl1 (x) + (1 − α)pl2 (x).
A⊇{x} A⊇{x}
11.7 Equality conditions for both families of approximations 377
P
Hence, after denoting by kpli = y∈Θ pli (y) the total plausibility of the single-
tons with respect to bi , the values of the relative plausibility of singletons can be
˜
computed as: pl[αb 1 + (1 − α)b2 ](x) =

αpl1 (x) + (1 − α)pl2 (x) αpl1 (x) + (1 − α)pl2 (x)


= P =
y∈Θ [αpl 1 (y) + (1 − α)pl 2 (y)] αkpl1 + (1 − α)kpl2
αpl1 (x) (1 − α)pl2 (x)
= +
αkpl1 + (1 − α)kpl2 αkpl1 + (1 − α)kpl2
αkpl1 ˜ (1 − α)kpl2 ˜ (x)
= pl1 (x) + pl
αkpl1 + (1 − α)kpl2 αkpl1 + (1 − α)kpl2 2
˜ (x) + β2 pl
= β1 pl ˜ (x).
1 2

Proof of Theorem 62
The proof follows the structure of that of Theorem 3 and Corollary 3 in [265], on
the commutativity of Dempster’s rule and convex closure.
Formally, we need to prove that:
P P ˜
1. whenever b = k αk bk , αk ≥ 0, k αk = 1, we have that pl[b] =
P ˜
k βk pl[bk ] for some convex coefficients βk ;
2. whenever p ∈ Cl(pl[b ˜ k ], k) (i.e., p = P βk pl[b
˜ k ] with βk ≥ 0, P βk = 1),
k P k
there exists a set of convex coefficients αk ≥ 0, k αk = 1 such that p =
˜ P αk bk ].
pl[ k
Now, condition 1. follows directly P
from Lemma 13. Condition 2., instead, amounts
to proving that there exist αk ≥ 0, k αk = 1 such that:
αk kplk
βk = P ∀k, (11.39)
j αj kplj

which is equivalent to
βk X βk
αk = · αj kplj ∝ ∀k
kplk j kplk
P βk
as j αj kplj does not depend on k. If we pick αk = kplk the system (11.39) is
met: by further normalization we obtain as desired.

Proof of Theorem 64
By Equation (11.18) bb − b = (1 − kmb )plΘ and pl
b − pl = (1 − kpl )plΘ . Hence:
b b b

b − bb) + bb =
β[b](pl b

 
= β[b] plb + (1 − kplb )plΘ − b − (1 − kmb )plΘ + b + (1 − kmb )plΘ
 
= β[b] plb − b + (kmb − kplb )plΘ + b + (1 − kmb )plΘ
 
= b + β[b](plb − b) + plΘ β[b](kmb − kplb ) + 1 − kmb .
378 11 The epistemic family of probability transforms

But by definition of β[b] (10.10):


1 − kmb
β[b](kmb − kplb ) + 1 − kmb = (kmb − kplb ) + 1 − kmb = 0,
kplb − kmb
and (11.19) is met.

Proof of Theorem 65

By definition of bA (bA (C) = 1 if C ⊇ A, 0 otherwise) we have that:


X X X
hbA , bB i = bA (C)bB (C) = 1·1= 1 = kbA∪B k2 .
C⊆Θ C⊇A,B C⊇A∪B

The scalar products of interest can then be written as:


DX E
hplb − b, by − bx i = (plb (z) − mb (z)) bz , by − bx
X z∈Θ
= (plb (z) − mb (z)) [hbz , by i − hbz , bx i]
z∈Θ h i
X
= (plb (z) − mb (z)) kbz∪y k2 − kbz∪x k2 ∀y 6= x.
z∈Θ

We can distinguish three cases:


– if z 6= x, y then |z ∪ x| = |z ∪ y| = 2 and the difference kbz∪x k2 − kbz∪y k2 goes
to zero;
– if z = x then kbz∪x k2 −kbz∪y k2 = kbx k2 −kbx∪y k2 = (2n−2 −1)−(2n−1 −1) =
−2n−2 where n = |Θ|;
– if instead z = y then kbz∪x k2 − kbz∪y k2 = kbx∪y k2 − kby k2 = 2n−2 .
Hence for all y 6= x:

hplb − b, by − bx i = 2n−2 (plb (y) − mb (y)) − 2n−2 (plb (x) − mb (x)),


P
and as A)x mb (A) = plb (x) − mb (x) the thesis follows.

Proof of Corollary 13

As a matter of fact:
X X X
mb (A) = (plb (x) − mb (x)) = kplb − kmb ,
x∈Θ A⊃x,A6=x x∈Θ

so that the condition of Theorem 65 can be written as:


X kplb − kmb
plb (x) − mb (x) = mb (A) = ∀x.
n
A⊃x,A6=x

1
P
Replacing this in (11.24) yields R[b] = x∈Θ n bx .
11.7 Equality conditions for both families of approximations 379

Proof of Theorem 66

By Equation (11.19) p[b] = b + β[b](plb − b).


After recalling that β[b] = (1 − kmb )(kplb − kmb ) we can write:
h i
plb − p[b] = plb − b + β[b](plb − b) = (1 − β[b])(plb − b)
kplb − 1 (11.40)
= (pl − b) = (kplb − 1) R[b]
kplb − kmb b

b = pl + (1 − kpl )plΘ by Equation


by definition (11.24) of R[b]. Moreover, as pl b b b
(11.18), we get:

b − p[b] = (pl
pl h b − plb ) + (plb − p[b]) i
b
b
= plb + (1 − kplb )plΘ − plb + (kplb − 1)R[b] (11.41)
= (1 − kplb )plΘ + (kplb − 1)R[b] = (kplb − 1)(R[b] − plΘ ).

Combining (11.41) and (11.40) then yields:


D E
hpl
b − p[b], pl − p[b]i = (kpl − 1)(R[b] − plΘ ), (kpl − 1)R[b]
b b b b

= (kplb − 1)2 hR[b]


 − plΘ , R[b]i 
= (kplb − 1)2 hR[b], R[b]i − hplΘ , R[b]i
 
= (kplb − 1)2 hR[b], R[b]i − h1, R[b]i .

But now
b − p[b], pl − p[b]i
hpl b b
cos(π − φ2 ) =
kpl − p[b]kkpl − p[b]k
b
b b
where
h i1/2
kpl
b − p[b]k = hpl
b
b − p[b], pl
b
b − p[b]i
b
h i1/2
= (kplb − 1) hR[b] − plΘ , R[b] − plΘ i
h i1/2
= (kplb − 1) hR[b], R[b]i + hplΘ , plΘ i − 2hR[b], plΘ i

and kplb − p[b]k = (kplb − 1)kR[b]k by Equation (11.40). Hence

(kR[b]k2 − h1, R[b]i)


cos(π − φ2 [b]) = p . (11.42)
kR[b]]k kR[b]k2 + h1, 1i − 2hR[b], 1i

We can further simplify this expression by noticing that for all probabilities p ∈
c
P we have h1, pi = 2|{x} | − 1 = 2n−1 − 1 while h1, 1i = 2n − 2, so that
h1, 1i − 2hp, 1i = 0 and being R[b] a probability we get (11.25).
380 11 The epistemic family of probability transforms

Proof of Theorem 67
We make use of (11.42). As φ2 [b] = 0 iff cos(π −φ2 [b]) = −1 the desired condition
is:
(kR[b]k2 − h1, R[b]i)
−1 = p
kR[b]]k kR[b]k2 + h1, 1i − 2hR[b], 1i
i.e., after elevating to the square both numerator and denominator:
kR[b]k2 (kR[b]k2 +h1, 1i−2hR[b], 1i) = kR[b]k4 +h1, R[b]i2 −2h1, R[b]ikR[b]k2 .
After erasing the common terms we get that φ2 [b] is nil if and only if:
h1, R[b]i2 = kR[b]k2 h1, 1i. (11.43)
Condition (11.43) has the form:
hA, Bi2 = kAk2 kBk2 cos2 (AB)
d = kAk2 kBk2

i.e., cos2 (AB)


d = 1, with A = plΘ , B = R[b]. This yields cos(R[b]pl \Θ ) = 1 or
\
cos(R[b]plΘ ) = −1, i.e., φ2 [b] = 0 if and only if R[b] is (anti-)parallel to plΘ = 1.
But this means R[b] = α plΘ for some scalar value α ∈ R, namely:
X
R[b] = −α (−1)|A| bA
A⊆Θ

(since plΘ = − A⊆Θ (−1)|A| bA by Equation (8.11)). But R[b] is a probability


P
(i.e., a linear combination of categorical probabilities bx only) and since the vectors
{bA , A ( Θ} which represent all categorical belief functions are linearly indepen-
dent the two conditions are never jointly met, unless |Θ| = 2.

Proof of Lemma 14
Using the form (10.26) of the orthogonal projection we get π[b] − BetP [b](x) =
1 + |Ac |21−|A| 1 − |A|21−|A|
   
X 1 X
= mb (A) − + mb (A)
n |A| n
A⊇{x} A6⊃{x}

but
1 + |Ac |21−|A| 1 |A| + |A|(n − |A|)21−|A| − n
− = =
n |A| n|A|
(|A| − n)(1 − |A|21−|A| )
 
1 1 1−|A|
= = − 1 − |A|2
n|A| n |A|
so that π[b](x) − BetP [b](x) =
1 − |A|21−|A| 1 − |A|21−|A|
    
X n X
= mb (A) 1− + mb (A)
n |A| n
A⊇{x} A6⊃{x}
(11.44)
or equivalently, Equation (11.30).
11.7 Equality conditions for both families of approximations 381

Proof of Theorem 69

By Equation (11.30) the condition π[b](x) − BetP [b](x) = 0 for all x ∈ Θ reads
as:
1 − |A|21−|A| 1 − |A|21−|A|
X   X  
mb (A) = mb (A) ∀x ∈ Θ
n |A|
A⊆Θ A⊇{x}

i.e.,

1 − |A|21−|A|
   
X X 1 1
mb (A) mb (A)(1 − |A|21−|A| )
= −
n |A| n
A6⊃{x}
 1−|A|
 A⊇{x}  
X 1 − |A|2 X
1−|A| n − |A|
mb (A) = mb (A)(1 − |A|2 )
n |A|n
A6⊃{x} A⊇{x}

for all singletons x ∈ Θ, i.e., (11.31).

Proof of Corollary 14

Let us consider all claims. Equation (11.31) can be expanded as follows:


n−1
X n − k X n−1
X  X
1 − k21−k mb (A) = 1 − k21−k mb (A)
k
k=3 A⊃{x},|A|=k k=3 A6⊃{x},|A|=k
n−1
X  n − k X X 
1−k
≡ 1 − k2 mb (A) − mb (A) = 0
k
k=3 A⊃{x},|A|=k A6⊃{x},|A|=k
n−1
1 − k21−k 
X  X X 
≡ n −k mb (A) = 0
k
k=3 A⊃{x},|A|=k |A|=k

after noticing that 1 − k · 21−k = 0 for k = 1, 2 and the coefficient of mb (Θ) in


Equation (11.31) is zero, since |Θc | = |∅| = 0.
The condition of Theorem 69 can then be rewritten as:
n−1
X  1 − k21−k  X n−1
X X
n mb (A) = (1 − k21−k ) mb (A)
k
k=3 A⊃{x},|A|=k k=3 |A|=k
(11.45)
for all x ∈ Θ. Condition 1 follows immediately from (11.45).
As for Condition 2 the equation becomes:
X
mb (A)
n−1
Xn n−1
n − 1
  |A|=k X X
(1 − k21−k ) = (1 − k21−k ) mb (A)
k k − 1 
n

k=3 k=3 |A|=k
k
382 11 The epistemic family of probability transforms

nn − 1 n
which is verified since = .
k k−1 k
Finally, let us consider Condition 3. Under (11.32) the system of equations
(11.45) reduces to a single equation:
n−1 n−1
X n X X
(1 − k21−k ) plb (.; k) = (1 − k21−k ) mb (A).
k
k=3 k=3 |A|=k

The latter is verified if |A|=k mb (A) = nk plb (.; k) ∀k = 3, ..., n − 1, which is


P
P
in turn equivalent to n plb (.; k) = k |A|=k mb (A) ∀k = 3, ..., n − 1. Under the
hypothesis of the Theorem we get that:
X X X X
n plb (.; k) = k mb (A) = mb (A)|A| = mb (A).
|A|=k |A|=k x∈Θ A⊃{x},|A|=k

Proof of Theorem 70

Condition (10.24) is equivalent to:


X X
mb (A)21−|A| = mb (A)21−|A|
A⊇{y} A⊇{x}
n−1 n−1
X 1 X X 1 X
≡ mb (A) = mb (A)
2k 2k
k=1 |A|=k,A⊃{y} k=1 |A|=k,A⊃{x}
n−1 n−1
X 1 X 1
≡ k
plb (y; k) = plb (x; k)
2 2k
k=1 k=1

for all y 6= x. If plb (x; k) = plb (y; k) ∀y 6= x the equality is met.


To prove (11.34) let us rewrite the values of the pignistic function BetP [b](x)
in terms of plb (x; k) as:
X mb (A) Xn n
X mb (A) X plb (x; k)
BetP [b](x) = = =
|A| k k
A⊇{x} k=1 A⊃{x},|A|=k k=1

which is constant under the hypothesis, yielding BetP [b] = P. Also, as


X n
X X n
X
plb (x) = mb (A) = mb (A) = plb (x; k)
A⊇{x} k=1 A⊃{x},|A|=k k=1

we get Pn
fb (x) = plb (x) = P k=1 pl (x; k)
pl Pn b
kplb x∈Θ k=1 plb (x; k)

which is equal to 1/n if plb (x; k) = plb (·; k) ∀k, x.


Finally, under the same condition:
11.7 Equality conditions for both families of approximations 383
n
 X 1
p[b](x) = mb (x) + β[b] plb (x) − mb (x) = plb (·; 1) + β[b] plb (·; k) = ;
n
k=2
eb(x) = mb (x) = X plb (x; 1)
=
plb (·; 1) 1
= .
km b plb (y; 1) nplb (·; 1) n
y∈Θ
12
Consonant approximation

As we have learned in the last two Chapters, probability transforms are a very well
studied topic in belief calculus, as they useful as a means to reduce the compu-
tational complexity of the framework (Section 4.4), they allow us to reduce deci-
sion making with belief functions to the classical utility theory approach (Section
4.5.1), and are theoretically interesting when understanding the relationship between
Bayesian reasoning and belief theory [197].
Less extensively studied is the problem of mapping a belief function toSpossibil-
ity measures, namely functions P os : 2Θ → [0, 1] on Θ such that P os( i Ai ) =
supi P os(Ai ) for any family {Ai |Ai ∈ 2Θ , i ∈ I} where I is an arbitrary set index.
Their dual ‘necessity’ measures are defined as N ec(A) = 1 − P os(Ac ), and (as we
learned in Chapter 9) have as counterparts in the theory of evidence belief functions
whose focal elements are nested [1149] (‘consonant’ b.f.s).
Approximating a belief function by a necessity measure is then equivalent to map-
ping it to a consonant b.f. [?, 682, 680, 61]. As possibilities are completely deter-
mined by their values on the singletons P os(x), x ∈ Θ, they are less computation-
ally expensive than belief functions (indeed, their complexity is linear in the size of
the frame of discernment, just like standard probabilities’), making the approxima-
tion process interesting for many applications.
Furthermore, just as in the case of Bayesian belief functions, the study of possibility
transforms can shed light on the relation between belief and possibility theory.

385
386 12 Consonant approximation

The consonant approximation problem

Dubois and Prade [?], in particular, have extensively worked on consonant approx-
imations of belief functions [682, 680], suggesting the notion of ‘outer consonant
approximation’.
Several partial orderings between belief functions have been introduced [1481,
410], in connection with the so-called ‘least commitment principle’. The latter plays
a similar role in the ToE as the principle of maximum entropy does in Bayesian
theory. It postulates that, given a set of basic probability assignments compatible
with a set of constraints, the most appropriate is the least informative (according to
one of those orderings).
In particular, belief functions admit (among others) the following order relation:
b ≤ b0 ≡ ∀A ⊆ Θ, b(A) ≤ b0 (A), called ‘weak inclusion’. It is then possible
to define the outer consonant approximations [?] of a belief function b as those
co.b.f.s co such that co(A) ≤ b(A) ∀A ⊆ Θ. Dubois and Prade’s work has been
later extended by Baroni [61] to capacities. In [257], the author of this Book has
provided a comprehensive description of the geometry of the set of outer consonant
approximations. For each possible maximal chain A1 ⊂ · · · ⊂ An = Θ of focal
elements, i.e., a collection of nested subsets of all possible cardinalities from 1 to
|Θ|, a maximal outer consonant approximation with mass assignment:

m0 (Ai ) = b(Ai ) − b(Ai−1 ) (12.1)

can be singled out. The latter mirrors the behavior of the vertices of the credal set of
probabilities dominating a belief function or a 2-alternating capacity [169, 944].
Another interesting approximation has been studied in the context of Smets’
Transferable Belief Model [1276], where the pignistic transform assumes a central
role for decision making. One can then define an ‘isopignistic’ approximation as the
unique consonant belief function whose pignistic probability is identical to that of
the original belief function [409, 415]. Subsequent work has been conducted along
these lines: for instance, the expression of the isopignistic consonant b.f. associated
with a unimodal probability density has been derived in [1222], while in [38], con-
sonant belief functions are constructed from sample data using confidence sets of
pignistic probabilities.

The geometric approach to approximation

In more recent times the opportunity of seeking probability or consonant approx-


imations / transformations of belief functions by minimizing appropriate distance
functions has been explored [267, 273, 245]. As to what distances are the most
appropriate, as we have seen in Section 4.8.2, Jousselme et al [685] have recently
conducted a nice survey of the distance or similarity measures so far introduced
between belief functions, and proposed a number of generalizations of known mea-
sures. Other similarity measures between belief functions have been proposed by
Shi et al [1191], Jiang et al [669], and others [713, 399]. Many of these measures
12 Consonant approximation 387

could be in principle employed to define conditional belief functions, or to approxi-


mate belief functions by necessity or probability measures.
As we have proved in Chapter 9 [257], consonant belief functions (or their vec-
tor counterparts in the belief space) live in a structured collection of simplices or
‘simplicial complex’. Each maximal simplex of the consonant complex is associated
with a maximal chain of nested focal elements: C = {A1 ⊂ A2 ⊂ · · · ⊂ An = Θ}.
Computing the consonant belief function(s) at minimal distance from a given b.f.
involves therefore: 1) computing first a partial solution for each possible maximal
chain; 2) selecting a global approximation among all the partial ones.
Note, however, that geometric approximations can be sought in different Carte-
sian spaces. A belief function can be represented either by the vector of its belief
values, or the vector of its mass values. We call the set of vectors of the first kind
belief space B [244, 292] (Chapter 6), and the collection of vectors of the second
kind mass space M [256]. In both cases the region of consonant belief functions
is a simplicial complex. In the mass space representation, however, we further need
to consider the fact that, because of normalization, only N − 2 mass values (where
N = 2|Θ| ) are sufficient to determine a belief function. Approximations can then be
computed in vectors spaces or dimension N − 1 or N − 2, leading to different but
related results.

Chapter’s content

The theme of this Chapter is to conduct an exhaustive, analytical study of all the
consonant approximations of belief functions. In the first part we will understand the
geometry of classical outer consonant approximations, while in the second part we
will characterise geometric consonant approximations which minimise appropriate
distances from the original belief function.
We will focus in particular on approximations induced by minimizing L1 , L2
or L∞ distances, in both the belief and the mass space, and in both representations
of the latter. Even though we believe the resulting consonant approximations are
likely to be potentially useful in practical applications, our purpose at this stage
is not to empirically compare them with existing approaches such as isopignistic
function and outer approximations, but to initiate a theoretical study of the nature
of consonant approximations induced by geometric distance minimization, starting
with Lp norms as a stepping stone of a more extensive line of research. Our purpose
is to point out their semantics in terms of degrees of belief, their mutual relationships
and to analytically compare them with existing approximations. What emerges is a
picture in which belief-, mass-, and pignistic-based approximations form distinct
families of approximations with different semantics.
In some cases, improper partial solutions (in the sense that they potentially in-
clude negative mass assignments) can be generated by the Lp minimization process.
The set of approximations, in other words, may fall partly outside the simplex of
proper consonant belief functions for a given desired chain of focal elements. This
situation is not entirely new, as outer approximations themselves include infinitely
many improper solutions. Nevertheless, only the subset of acceptable solutions is
388 12 Consonant approximation

retained. In the case of the present work, the set of all (admissible and not) solutions
is typically much simpler to describe geometrically, in terms of simplices or poly-
topes. Computing the set of proper approximations in all cases requires significant
further effort, which for reasons of clarity and length we reserve for the near future.
Additionally, in this Chapter only ‘normalized’ belief functions (i.e., b.f.s whose
mass of the empty set is nil) are considered. Unnormalized b.f.s, however, play an
important role in the TBM [1238] as the mass of the empty set is an indicator of
conflicting evidence. The analysis of the unnormalized case is also left to future
work for lack of sufficient space here.

Summary of main results

We will show that outer consonant approximations form a convex subset of the con-
sonant complex, for every choice of the desired maximal chain C = A1 ⊂ · · · ⊂ An
of focal elements Ai . In particular the set of outer consonant approximations with
chain C, OC [b], is a polytope whose vertices are indexed by all the functions reassin-
ing the mass of each focal element to elements of the chain containing it (‘assign-
ment functions’). In particular, the maximal outer approximation is the vertex of this
polytope associated with the permutation of singletons which produces the desired
maximal chain C.
Two sets of results are reported instead for geometric approximations.
As it turns out, partial approximations in the mass space M amount to redis-
tributing in various ways the mass of focal elements outside the desired maximal
chain to elements of the chain itself (compare [256]). In the (N − 1)-dimensional
representation, the L1 (partial) consonant approximations are such that their mass
values are greater than those of the original belief function on the desired maximal
chain. They form a simplex which is entirely admissible, and whose vertices are
obtained by re-assigning all the mass originally outside the desired maximal chain
C to a single focal element of the chain itself. The barycenter of this simplex is the
L2 partial approximation, which redistributes the mass outside the chain to all the
elements of C on an equal basis. The simplex of L1 , M approximations, in addition,
exhibits interesting relations with outer consonant approximations.
When the partial L∞ approximation is unique, it coincides with the L2 approxima-
tion and the barycenter of the set of L1 approximations, and it is obviously admis-
sible. When it is not unique, it is a simplex whose vertices assign to each element
of the chain (but one) the maximal mass outside the chain: this set is in general not
entirely admissible.
The L1 and L2 partial approximations calculated when adopting a (N − 2) section
of M coincide. For each possible neglected component Ā ∈ C they describe all the
vertices of the simplex of L1 , M partial approximations in the (N − 1)-dimensional
representation. In each such section, the L∞ partial approximations form instead a
(partly admissible) region whose size is determined by the largest mass outside the
desired maximal chain.
Finally, the global approximations in the L1 , L2 , L∞ cases span the simplicial com-
12 Consonant approximation 389

ponents of CO whose chains minimize the sum of mass, sum of square masses, and
maximal mass outside the desired maximal chain, respectively.
In the belief space B, all Lp approximations amount to picking different repre-
sentatives from the n lists of belief values:
n o
Li = b(A), A ⊇ Ai , A 6⊃ Ai+1 ∀i = 1, ..., n.

Belief functions are defined on a partially ordered set, the power set {A ⊆ Θ}, of
which a maximal chain is a maximal totally ordered subset. Therefore, given two
elements of the chain Ai ⊂ Ai+1 , there are a number of ‘intermediate’ focal ele-
ments A which contain the latter but not the former. This list is uniquely determined
by the desired chain.
Indeed, all partial Lp approximations in the belief space have mass m0 (Ai ) =
f (Li )−f (Li−1 ), where f is a simple function of the belief values in the list, such as
max, average, or median. Classical maximal outer and ‘contour-based’ approxima-
tions can also be expressed in the same way. As they would all reduce to the maximal
outer approximation (12.1) if the power set were totally ordered, all these consonant
approximations can be considered as generalization of the the latter. Sufficient con-
ditions on their admissibility can be given in terms of the (partial) plausibility values
of the singletons.
As for global approximations, in the L∞ case they fall on the component(s) associ-
ated with the maximal plausibility singleton(s). In the other two cases they are, for
now, of more difficult interpretation.
Table outline
Chapter (12.1) illustrates the behavior of the different geometric consonant ap-
proximations explored here, in terms of multiplicity/admissibility/global solutions.
First (Section 12.1.1) we study the geometry of the polytope of outer consonant
approximations and its vertices.
We then provide the necessary background on the geometric representation of
belief and mass and the geometric approach to the approximation problem (12.2).
After going through the case study of the binary frame (Section 12.3), we ap-
proach the problem in the mass space (Section 12.4). We: analytically compute the
approximations induced by L1 , L2 and L∞ (12.4.1) norms; discuss their interpre-
tation in terms of mass re-assignment and the relationship between the results in
the mass space versus those on its sections (12.4.2); analyze the computability and
admissibility of global approximations (12.4.3); study the relation of the obtained
approximations with classical outer consonant approximations (12.4.4); and finally,
illustrate the results in the significant ternary case (12.4.5).
In the last part of the Chapter we analyse the Lp approximation problem in
the belief space (Section 12.5). Again, we compute the approximations induced by
L1 (12.5.1), L2 (12.5.2) and L∞ (12.5.3) norms, respectively; we propose a com-
prehensive view of all approximations in the belief space via lists of belief values
determined by the desired maximal chain (Section 12.5.4), and draw some compara-
tive conclusions on the behavior of geometric approximations in the belief and mass
space (12.5.6).
To improve readability, as usual all proofs are collected in an Appendix to be
found at the end of the Chapter.
390 12 Consonant approximation

multiplicity admissibility
global solution(s)
of partial sol. of partial sol.
X
L1 , M simplex entirely arg min mb (B)
C
B6∈C
point, X
L2 , M yes arg min (mb (B))2
bary of L1 , M C
B6∈C
X
arg min mb (B)
point / yes / C
L∞ , M B6∈C
simplex partial / arg min max mb (B)
C B6∈C
point,
L1 , M \ Ā yes as in L1 , M
vertex of L1 , M
point,
L2 , M \ Ā yes as in L2 , M
as in L1 , M \ Ā
not
L∞ , M \ Ā polytope arg min max mb (B)
entirely C B6∈C
depends
L1 , B polytope not easy to interpret
on plb (xi )
depends
L2 , B point not known
on plb (xi )
depends
L∞ , B polytope arg maxC pl(A1 )
on plb (xi )

Table 12.1. Properties of the geometric consonant approximations studied in the second part
of this Chapter, in terms of multiplicity and admissibility of partial solutions, and the related
global solutions.

12.1 Geometry of outer consonant approximations in the


consonant simplex
We first seek a geometric interpretation of the most common approach to the con-
sonant approximation problem: outer consonant approximation.

12.1.1 Outer consonant approximations

With the purpose of finding outer approximations which are maximal with respect to
the weak inclusion relation (4.19) Dubois and Prade have introduced two different
families of approximations.
A first group of is obtained by considering all possible permutations ρ of the
elements {x1 , ..., xn } of the frame of discernment Θ: {xρ(1) , ..., xρ(n) }.
The following family of nested sets can be then built:
n o
S1ρ = {xρ(1) }, S2ρ = {xρ(1) , xρ(2) }, ..., Snρ = {xρ(1) , ..., xρ(n) } ,

so that a new belief function coρ can be defined with b.p.a.:


12.1 Geometry of outer consonant approximations in the consonant simplex 391
X
mcoρ (Sjρ ) = mb (Ei ). (12.2)
i:min{l:Ei ⊆Slρ }=j

Analogously, we can consider all the permutations ρ of the focal elements {E1 , ..., Ek }
of b, {Eρ(1) , ..., Eρ(k) }, and introduce the following family of sets:
n o
S1ρ = Eρ(1) , S2ρ = Eρ(1) ∪ Eρ(2) , ..., Skρ = Eρ(1) ∪ · · · ∪ Eρ(k) .

A new belief function cρ can then be defined with b.p.a.:


X
mcρ (Sjρ ) = mb (Ei ). (12.3)
i:min{l:Ei ⊆Slρ }=j

In general, approximations of the second family (12.3) are generated by the first
family (12.2) too [?, 61].

12.1.2 Geometry in the binary case

In the binary belief space B2 the set O[b] of all the outer consonant approximations
of b is depicted in Figure 12.1-left (dashed lines). It is the intersection of the region
of the points b0 such that ∀A ⊆ Θ b0 (A) ≤ b(A), and the complex CO = COx ∪COy
of consonant b.f.s (cfr. Chapter 9, Figure 9.1). Among them, the co.b.f.s generated

Fig. 12.1. Geometry of outer consonant approximations of a belief function b ∈ B2 .

by the 6 = 3! possible permutations of the three focal elements {x}, {y}, {x, y} of
392 12 Consonant approximation

b (12.3) correspond to the points cρ1 , ..., cρ6 in Figure 12.1, namely the orthogonal
projections of b onto COx , COy , respectively, plus the vacuous belief function bΘ =
0.
Let us denote by OC [b] the intersection of the set O[b] of all outer consonant ap-
proximations with the component COC of the consonant complex, with C a maximal
chain of 2Θ . We can notice that, for each maximal chain C:
1. OC [b] is convex (in this case C = {x, Θ} or {y, Θ});
2. OC [b] is in fact a polytope, i.e. the convex closure of a number of vertices: in
particular a segment in the binary case (Ox,Θ [b] or Oy,Θ [b]);
3. the maximal (with respect to weak inclusion (4.19)) outer approximation of b is
one of the vertices of this polytope OC [b] (coρ , Equation (12.2)), that associated
with the permutation ρ of singletons which generates the chain.
In the binary case there are just two such permutations, ρ1 = {x, y} and ρ2 =
{y, x}, which generate the chains {x, Θ} and {y, Θ}, respectively.
We will prove that all these properties hold in the general case as well.

12.1.3 Convexity

Theorem 71. Let b be a belief function on Θ. For each maximal chain C of 2Θ ,


the set of outer consonant approximations OC [b] of b which belong to the simplicial
component COC of the consonant space CO is convex.
Proof. Consider two belief functions b1 , b2 weakly included in b. Then:

α1 b1 (A) + α2 b2 (A) ≤ α1 b(A) + α2 b(A) = (α1 + α2 )b(A) = b(A)

whenever α1 + α2 = 1, αi ≥ 0. If α1 , α2 are not guaranteed to be non-negative


the sum α1 b1 (A) + α2 b2 (A) can be greater than b(A) (see Figure 12.2). Now, this
holds in particular if the two b.f.s are consonant: Their convex combination, though,
is obviously not guaranteed to be consonant.
If they both belong to the same maximal simplex of the consonant complex, how-
ever, their convex combination still lives in the simplex and α1 b1 + α2 b2 is both
consonant and weakly included in b. 

12.1.4 Weak inclusion and mass re-assignment

A more cogent statement on the shape of O[b] can be proven by means of the follow-
ing result on the basic probability assignment of consonant belief functions weakly
included in b.
Lemma 15. Consider a belief function b with basic probability assignment mb . A
consonant belief function co is weakly included in b, for all A ⊆ Θ co(A) ≤ b(A),
B
if and only if there is a choice of coefficients {αA , B ⊆ Θ, A ⊇ B} satisfying:
X
B B
∀B ⊆ Θ, ∀A ⊇ B, 0 ≤ αA ≤1 ∀B ⊆ Θ, αA =1 (12.4)
A⊇B
12.1 Geometry of outer consonant approximations in the consonant simplex 393

α1 < 0
b(A)

b2(A)
α1 , α 2 > 0

b1(A) b(A)
Fig. 12.2. The convex combination of two belief functions weakly included in b is still weakly
included in b: this does not hold for affine combinations (dashed line).

such that co has basic probability assignment


X
B
mco (A) = αA mb (B). (12.5)
B⊆A

Lemma 15 states that the b.p.a. of any outer consonant approximation of b is ob-
tained by re-assigning the mass of each f.e. A of b to some B ⊇ A.
We will extensively use this result in what follows.

12.1.5 The polytopes OC [b] of outer approximations

Let us call C = {B1 , ..., Bn } (|Bi | = i) the chain of focal elements of a consonant
belief function weakly included in b.
It is natural to conjecture that, for each maximal simplex COC of CO associated
with a maximal chain C, OC [b] is the convex closure of the co.b.f.s oB [b] with b.p.a.:
X
moB [b] (Bi ) = mb (A). (12.6)
A⊆Θ:B(A)=Bi

Each of these vertex co.b.f. is associated with an ‘assignment function’:

B : 2Θ → C
(12.7)
A 7→ B(A) ⊇ A

which maps each subset A to one of the focal elements of the chain C = {B1 ⊂
... ⊂ Bn } which contains it.
394 12 Consonant approximation

Theorem 72. For each simplicial component COC of the consonant space associ-
ated with any maximal chain of focal elements C = {B1 , ..., Bn } the set of outer
consonant approximation of an arbitrary belief function b is the convex closure

OC [b] = Cl(oB [b], ∀B)

of the co.b.f.s (12.6), indexed by all admissible assignment functions (12.7).


In other words, OC [b] is a polytope, the convex closure of a number of belief function
whose number is equal to the number of assignment functions (12.7). Each B is
characterized by assigning each event A to an element Bi ⊇ A of the chain C.
As we will see in the ternary example of Section 12.1.8 the points (12.6) are
not guaranteed to be proper vertices of the polytope OC [b], as some of them can be
obtained by convex combination of others.

12.1.6 Maximal outer approximations

We can prove instead that the outer approximation (12.2) obtained by permuting
the singletons of Θ as in Section 12.1.1 is an actual vertex of OC [b]. More pre-
cisely, all possible permutations of the elements of Θ generate exactly n! differ-
ent outer approximations of b, each of which lies on a single simplicial compo-
nent of the consonant complex. Each such permutation ρ generates a maximal chain
Cρ = {S1ρ , ..., Snρ } of focal elements so that the corresponding belief function will
lie on COCρ .

Theorem 73. The outer consonant approximation coρ (12.2) generated by a per-
mutation ρ of the singleton elements of Θ is a vertex of OCρ [b].

Furthermore, we prove that:

Corollary 17. The maximal outer consonant approximation with maximal chain C
of a belief function b is the vertex (12.2) of OCρ [b] associated with the permutation
ρ of the singletons which generates C = Cρ .

By definition (12.2) coρ assigns the mass mb (A) of each focal element A to the
smallest element of the chain containing A. By Lemma 15 each outer consonant
approximation of b with chain C, co ∈ OCρ [b], is the result of re-distributing the
mass of each focal element A to all its supersets in the chain {Bi ⊇ A, Bi ∈ C}.
But then each such co is weakly included in coρ for its b.p.a. can be obtained by
re-distributing the mass of the minimal superset Bj , where j = min{i : Bi ⊆ A},
to all supersets of A. Hence, coρ is the maximal outer approximation with chain Cρ .

12.1.7 Maximal outer approximations as lower chain measures

A different perspective on maximal outer consonant approximations is provided by


the notion of chain measure [133].
12.1 Geometry of outer consonant approximations in the consonant simplex 395

Let us S be a family of subsets of a non-empty set Θ containing ∅ and Θ itself.


The ‘inner extension’ of a monotone set function µ : S → [0, 1] (s.t. A ⊆ B implies
µ(A) ≤ µ(B)) is:
µ∗ (A) = sup µ(B) (12.8)
B∈S,B⊂A

(dually for the outer extension).


Definition 82. A monotone set function β : S → [0, 1] is called a lower chain
measure, if there exists a chain with respect to set inclusion C ⊂ S, which includes
∅ and Θ, such that:
β = (β|C)∗ |S,
i.e., β is the inner extension of its restriction to the elements of the chain.
We can prove that for a lower chain measure β on S:

β(∩A∈A ) = inf β(A)


A∈A

for all finite set systems A such that ∩A∈A A ∈ S. If this property holds for arbitrary
A and S is closed under arbitrary intersection, then β is called a necessity measure.
Any necessity measure is a lower chain measure, but the converse does not hold.
However, the class of necessity measures coincides with the class of lower chain
measures if Θ is finite.
As consonant belief functions are necessity measures on finite domains, they are
trivially also lower chain measures and vice-versa.
Now, let b be a belief function and C a maximal chain in 2Θ . Then we can build
a chain measure (consonant b.f.) associated with b as:

bC (A) = max b(B). (12.9)


B∈C,B⊆A

We can prove the following.


Theorem 74. The chain measure (12.9) associated with the maximal chain C coin-
cides with the vertex coρ (12.2) of the polytope of outer consonant approximations
OCρ [b] of b associated with the permutation ρ of the elements of Θ which generates
C = Cρ .
The chain measure associated with a belief function b and a maximal chain C is the
maximal outer consonant approximation of b.
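For illustration, a small Python sketch (ours, not the author's code) computes the chain measure (12.9) by taking successive differences of belief values along the chain; on the same illustrative b.p.a. as in the previous sketch it returns the same consonant b.f. as the permutation ρ = (x, y, z), as Theorem 74 predicts:

# Illustrative sketch: the chain measure b_C of Eq. (12.9) and the mass it
# assigns along a maximal chain.
def belief(m, A):
    """Belief value b(A) = sum of the masses of the subsets of A."""
    return sum(mass for B, mass in m.items() if B <= A)

def chain_measure_mass(m, chain):
    """For a nested chain A_1 subset ... subset A_n, b_C(A_i) = b(A_i); the mass
    of the consonant b.f. b_C is recovered by successive differences."""
    bC = [belief(m, A) for A in chain]
    return {A: bC[i] - (bC[i - 1] if i > 0 else 0.0) for i, A in enumerate(chain)}

m = {frozenset('x'): 0.3, frozenset('y'): 0.5,
     frozenset('xy'): 0.1, frozenset('xyz'): 0.1}
chain = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
print(chain_measure_mass(m, chain))
# masses 0.3, 0.6, 0.1 on {x}, {x,y}, Theta: the same consonant b.f. produced
# by the permutation rho = (x, y, z) above.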

12.1.8 Example: outer approximations on the ternary frame

Let us consider as an example a belief function b on a ternary frame Θ = {x, y, z}, and study the polytope of outer consonant approximations with focal elements:

C = { {x}, {x, y}, {x, y, z} }.

According to Theorem 72, the polytope is the convex closure of the consonant approximations associated with all the assignment functions B : 2^Θ → C. There are ∏_{k=1}^{3} k^{2^{3−k}} = 1^4 · 2^2 · 3^1 = 12 such functions. We list them here as vectors of the form:

B = [ B({x}), B({y}), B({z}), B({x, y}), B({x, z}), B({y, z}), B({x, y, z}) ]',

namely:

B1 = [ {x}, {x, y}, Θ, {x, y}, Θ, Θ, Θ ]';
B2 = [ {x}, {x, y}, Θ, Θ, Θ, Θ, Θ ]';
B3 = [ {x}, Θ, Θ, {x, y}, Θ, Θ, Θ ]';
B4 = [ {x}, Θ, Θ, Θ, Θ, Θ, Θ ]';
B5 = [ {x, y}, {x, y}, Θ, {x, y}, Θ, Θ, Θ ]';
B6 = [ {x, y}, {x, y}, Θ, Θ, Θ, Θ, Θ ]';
B7 = [ {x, y}, Θ, Θ, {x, y}, Θ, Θ, Θ ]';
B8 = [ {x, y}, Θ, Θ, Θ, Θ, Θ, Θ ]';
B9 = [ Θ, {x, y}, Θ, {x, y}, Θ, Θ, Θ ]';
B10 = [ Θ, {x, y}, Θ, Θ, Θ, Θ, Θ ]';
B11 = [ Θ, Θ, Θ, {x, y}, Θ, Θ, Θ ]';
B12 = [ Θ, Θ, Θ, Θ, Θ, Θ, Θ ]'.
They correspond to the following co.b.f.s, with b.p.a. [m({x}), m({x, y}), m(Θ)]':

oB1 = [ mb(x), mb(y) + mb(x, y), 1 − b(x, y) ]';
oB2 = [ mb(x), mb(y), 1 − mb(x) − mb(y) ]';
oB3 = [ mb(x), mb(x, y), 1 − mb(x) − mb(x, y) ]';
oB4 = [ mb(x), 0, 1 − mb(x) ]';
oB5 = [ 0, b(x, y), 1 − b(x, y) ]';
oB6 = [ 0, mb(x) + mb(y), 1 − mb(x) − mb(y) ]';    (12.10)
oB7 = [ 0, mb(x) + mb(x, y), 1 − mb(x) − mb(x, y) ]';
oB8 = [ 0, mb(x), 1 − mb(x) ]';
oB9 = [ 0, mb(y) + mb(x, y), 1 − mb(y) − mb(x, y) ]';
oB10 = [ 0, mb(y), 1 − mb(y) ]';
oB11 = [ 0, mb(x, y), 1 − mb(x, y) ]';
oB12 = [ 0, 0, 1 ]'.

Figure 12.3-left shows the resulting polytope OC [b] for a belief function

mb (x) = 0.3, mb (y) = 0.5, mb ({x, y}) = 0.1, mb (Θ) = 0.1, (12.11)

in the component COC = Cl(bx , b{x,y} , bΘ ) of the consonant complex (black tri-
angle in the figure). The polytope OC [b] is plotted in red, together with all the 12
points (12.10) (red squares). Many of them lie on a side of the polytope. However,
the point obtained by permutation of singletons (12.2) is an actual vertex (red star):
it is the first item oB1 of the list (12.10).

Fig. 12.3. Not all the points (12.6) associated with assignment functions are actual ver-
tices of OC [b]. Here the polytope OC [b] of outer consonant approximations with C =
{{x}, {x, y}, Θ} for the belief function (12.11) on Θ = {x, y, z}, is plotted in red, together
with all the 12 points (12.10) (red squares). Many of them lie on a side of the polytope. How-
ever, the point obtained by permutation of singletons (12.2) is an actual vertex (red star). The
minimal and maximal outer approximations with respect to weak inclusion are oB12 and oB1 ,
respectively.

It is interesting to point out that the points (12.10) are ordered with respect to
weak inclusion (we just need to apply its definition, or the re-distribution property of
Lemma 15). The result is summarized in the graph of Figure 12.4. We can appreciate
that the vertex oB1 generated by singleton permutation is indeed the maximal outer
approximation of b, as stated by Corollary 17.
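The enumeration behind this example can be reproduced with a short illustrative script (a sketch of ours, assuming the b.p.a. (12.11)); it lists the admissible images of each subset and accumulates the twelve points (12.10):

# Illustrative sketch: enumerating the 12 assignment functions B : 2^Theta -> C
# for C = {{x}, {x,y}, Theta}, and the corresponding points (12.10),
# for the b.p.a. of Eq. (12.11).
from itertools import combinations, product

Theta = 'xyz'
C = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
m = {frozenset('x'): 0.3, frozenset('y'): 0.5,
     frozenset('xy'): 0.1, frozenset('xyz'): 0.1}

subsets = [frozenset(c) for r in (1, 2, 3) for c in combinations(Theta, r)]
images = [[S for S in C if A <= S] for A in subsets]   # admissible images of each subset

points = []
for B in product(*images):                              # 3*2*1*2*1*1*1 = 12 functions
    o = {S: 0.0 for S in C}
    for A, S in zip(subsets, B):
        o[S] += m.get(A, 0.0)                           # reassign m_b(A) to B(A)
    points.append(tuple(round(o[S], 3) for S in C))
print(len(points), sorted(set(points)))                 # 12 points, some of which coincide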

12.2 Geometric consonant approximation


While in the first part of the Chapter we studied the geometry of the most popular approach to consonant approximation (outer approximation), in the second part we pursue a different approach, based on minimizing appropriate distances between the given belief function and the consonant complex (see Chapter 9).
We first briefly outline an alternative geometric representation of belief functions, as mass (rather than belief) vectors.

12.2.1 Principles of geometric approximation



Fig. 12.4. Partial order of the points (12.10) with respect to weak inclusion. For sake of
simplicity we denote by Bi the co.b.f. oBi associated with the assignment function Bi . An
arrow from Bi to Bj stands for oBj ≤ oBi .

Mass space representation Just as belief functions can be represented as vectors of a sufficiently large Cartesian space (see Chapter 6), they can be associated with the related vector of mass values:

mb = ∑_{∅⊊A⊆Θ} mb(A) mA ∈ R^{N−1},    (12.12)

(compare Equation (6.8) and Theorem 10), where mA is the vector of mass values of the categorical belief function bA: mA(A) = 1, mA(B) = 0 ∀B ≠ A. Note that in R^{N−1}, mΘ = [0, ..., 0, 1]' and cannot be neglected.
Since the mass of any focal element Ā is uniquely determined by all the other masses in virtue of the normalization constraint, we can also choose to represent b.p.a.s as vectors of R^{N−2} of the form:

mb = ∑_{∅⊊A⊆Θ, A≠Ā} mb(A) mA,    (12.13)

in which one component Ā is neglected. This leads to two possible approaches to consonant approximation in the mass space; we will consider both in the following.
Whatever the chosen representation, it is not difficult to prove that the collection M of vectors of the Cartesian space which represent valid basic probability assignments is also a simplex, which we can call the mass space. Depending on whether we choose the first or the second, lower-dimensional, representation, M is either the convex closure

M = Cl( mA, ∅ ⊊ A ⊆ Θ ) ⊂ R^{N−1}

or

M = Cl( mA, ∅ ⊊ A ⊆ Θ, A ≠ Ā ) ⊂ R^{N−2}.

Binary example In the case of a binary frame Θ = {x, y}, since mb (x) ≥ 0,
mb (y) ≥ 0, and mb (x) + mb (y) ≤ 1 we can easily infer that the set B2 = M2 of
all the possible basic probability assignments on Θ2 can be depicted as the triangle
in the Cartesian plane of Figure 12.5, whose vertices are the vectors

bΘ = mΘ = [0, 0]',    bx = mx = [1, 0]',    by = my = [0, 1]',

which correspond respectively to the vacuous belief function bΘ , the Bayesian b.f.
bx with mbx (x) = 1, and the Bayesian b.f. by with mby (y) = 1.

Fig. 12.5. The mass space M2 for a binary frame is a triangle in R2 whose vertices are
the mass vectors associated with the categorical belief functions focused on {x}, {y} and Θ:
mx , my , mΘ . The belief space B2 coincides with M2 when Θ = {x, y}. Consonant b.f.s
live in the union of the segments CO{x,Θ} = Cl(mx , mΘ ) and CO{y,Θ} = Cl(my , mΘ ).
The unique L1 = L2 consonant approximation (circle) and the set of L∞ consonant approx-
imations (dashed segment) on CO{x,Θ} are shown.

The region P2 of all Bayesian belief functions on Θ2 is the diagonal line segment
Cl(mx , my ) = Cl(bx , by ). On Θ2 = {x, y} consonant belief functions can have
as chain of focal elements either {{x}, Θ2 } or {{y}, Θ2 }. Therefore, they live in
the union of two segments (see Figure 12.5):

CO2 = CO{x,Θ} ∪ CO{y,Θ} = Cl(mx , mΘ ) ∪ Cl(my , mΘ ).

Approximation in the consonant complex We have seen in Chapter 9 [257] that


the region COB of consonant belief functions in the belief space is a simplicial
complex, the union of a collection of (maximal) simplices, each of them associated
with a maximal chain C = {A1 ⊂ · · · ⊂ An }, |Ai | = i of subsets of the frame Θ:
COB = ∪_{C={A1⊂···⊂An}} Cl(bA1, ..., bAn).

Analogously, the region COM of consonant belief functions in the mass space M is the simplicial complex:

COM = ∪_{C={A1⊂···⊂An}} Cl(mA1, ..., mAn).

Given a belief function b, we call the consonant approximation of b induced by a distance function d in M (respectively, B) the b.f.(s) co_{M,d}[mb] (co_{B,d}[b]) which minimize(s) the distance d(mb, COM) (d(b, COB)) between b and the consonant simplicial complex in M (B):

co_{M,d}[mb] = arg min_{mco∈COM} d(mb, mco),    co_{B,d}[b] = arg min_{co∈COB} d(b, co),    (12.14)

where mb and b are the vectors of mass and belief values associated with b, respectively (see Figure 12.6).

Fig. 12.6. In order to minimize the distance of a mass vector from a consonant simplicial
complex, we need to find all the partial solutions (12.17) on all the maximal simplices which
form the complex (empty circles), and compare these partial solutions to select a global one
(black circle).

Choice of norm Consonant belief functions are the counterparts of necessity mea-
sures in the theory of evidence, so that their plausibility functions are possibility
measures, which in turn are inherently related to L∞ as P os(A) = maxx∈A P os(x)
(cfr. Section 9.1). It makes therefore sense to conjecture that a consonant transfor-
mation obtained by picking as distance function d in (12.14) one of the classical Lp
norms would be meaningful.
For vectors mb , mb0 ∈ M representing the basic probability assignments of
two belief functions b, b0 , they read as:
||mb − mb'||_{L1} = ∑_{∅⊊B⊆Θ} |mb(B) − mb'(B)|;
||mb − mb'||_{L2} = ( ∑_{∅⊊B⊆Θ} (mb(B) − mb'(B))² )^{1/2};    (12.15)
||mb − mb'||_{L∞} = max_{∅⊊B⊆Θ} |mb(B) − mb'(B)|,

while the same norms in the belief space read as:

||b − b'||_{L1} = ∑_{∅⊊B⊆Θ} |b(B) − b'(B)|;
||b − b'||_{L2} = ( ∑_{∅⊊B⊆Θ} (b(B) − b'(B))² )^{1/2};    (12.16)
||b − b'||_{L∞} = max_{∅⊊B⊆Θ} |b(B) − b'(B)|.

Lp norms have recently been successfully employed in the probability transformation problem [267] and for conditioning [265, 256]. Recall that L2 distance minimisation induces the orthogonal projection of b onto P (Chapter 10).
Clearly, however, a number of other norms can be used to define consonant (or Bayesian) approximations. For instance, generalizations to belief functions of the classical Kullback-Leibler divergence of two probability distributions P, Q,

D_{KL}(P|Q) = ∫_{−∞}^{+∞} p(x) log( p(x)/q(x) ) dx,

or other measures based on information theory, such as fidelity and entropy-based norms [686], can be studied. Many other similarity measures have indeed been proposed [1191, 669, 713, 399]. The application to the approximation problem of similarity measures more specific to belief functions, or inspired by classical probability, is left as future work.
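For concreteness, here is a small illustrative sketch (ours, using NumPy; the two b.p.a.s are arbitrary examples) that evaluates the norms (12.15) and (12.16) between two belief functions on a ternary frame:

# Illustrative sketch: the L1/L2/Linf norms (12.15)-(12.16) between two belief
# functions, in mass-vector and belief-vector form.
import numpy as np
from itertools import combinations

Theta = 'xyz'
subsets = [frozenset(c) for r in (1, 2, 3) for c in combinations(Theta, r)]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def mass_vector(m):
    return np.array([m.get(A, 0.0) for A in subsets])

def belief_vector(m):
    return np.array([belief(m, A) for A in subsets])

def lp_norms(u, v):
    d = u - v
    return {'L1': float(np.abs(d).sum()),
            'L2': float(np.sqrt((d ** 2).sum())),
            'Linf': float(np.abs(d).max())}

# two arbitrary b.p.a.s on Theta = {x, y, z} (the second one consonant)
m1 = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.5}
m2 = {frozenset('x'): 0.2, frozenset('xy'): 0.3, frozenset('xyz'): 0.5}
print('mass space  :', lp_norms(mass_vector(m1), mass_vector(m2)))
print('belief space:', lp_norms(belief_vector(m1), belief_vector(m2)))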
Distance of a point from a simplicial complex As the consonant complex CO is a collection of simplices which generate distinct linear spaces (in both the belief and the mass space), solving the consonant approximation problem involves first finding a number of partial solutions:

co^C_{B,Lp}[b] = arg min_{co∈CO^C_B} ||b − co||_{Lp},    co^C_{M,Lp}[mb] = arg min_{mco∈CO^C_M} ||mb − mco||_{Lp},    (12.17)

(see Figure 12.6), one for each maximal chain C of subsets of Θ. Then, the distance of b from all such partial solutions has to be assessed in order to select a global optimal approximation.
In the following we start, as usual, from the simple but interesting binary case (Figure 12.5). Some of its features are retained in the general case, others are not. Note also that, in the binary case, consonant and consistent [273] (Chapter 13) approximations coincide, and there is no difference between the belief and mass space [256] representations.

12.3 Consonant approximation in the binary belief space


12.3.1 Bayesian Lp approximations

Let us first compute the probabilistic and consonant approximations of a b.f. b ∈ B2 ,


using the classical norms (12.16). In the Bayesian case:

p_{L2}[b] = arg min_{p∈P} ||b − p||_{L2} = [ mb(x) + mb(Θ)/2, mb(y) + mb(Θ)/2 ]'

is the orthogonal projection π[b] of b onto P [267], and coincides with the pignistic function BetP[b] [1231, 1276, 199] only in the binary case (Section 10.4).
The L∞ norm yields the same Bayesian approximation:

p_{L∞}[b] = arg min_{p∈P} ||b − p||_{L∞} = arg min_{p∈P} max{ |b(x) − p(x)|, |b(y) − p(y)| }
          = arg min_{p∈P} max{ |mb(x) − p(x)|, |mb(y) − p(y)| }
          = [ mb(x) + mb(Θ)/2, mb(y) + mb(Θ)/2 ]' = p_{L2}[b] = π[b],

while the optimization problem

arg min_{p∈P} ||b − p||_{L1} = arg min_{p∈P} ( |b(x) − p(x)| + |b(y) − p(y)| ) = arg min_{p∈P} ( |mb(x) − p(x)| + |mb(y) − p(y)| )

has as its solution the entire set of probabilities 'consistent' with b [801, 437]:

P[b] = { p ∈ P : p(A) ≥ b(A) ∀A ⊆ Θ }.    (12.18)

12.3.2 Consonant Lp approximations

As illustrated in Figure 12.6, in the consonant case we need to find a partial ap-
proximation on each component of the consonant complex, to later select a global
approximation among the resulting partial solutions. We get for L2 :
co_{L2}[b] = arg min_{co∈CO} ||b − co||_{L2} = [ mb(x), 0 ]'  if mb(x) ≥ mb(y),    [ 0, mb(y) ]'  if mb(x) ≤ mb(y),    (12.19)

while:

||b − co||_{L1} = |mb(x) − mco(x)| + |mb(y) − mco(y)| = |mb(x) − mco(x)| + mb(y)

for co ∈ COx. This is minimal for mco(x) = mb(x) (mco(y) = 0 by definition). Analogously, for the component COy:

arg min_{co∈COy} ||b − co||_{L1} = [ 0, mb(y) ]',

so that co_{L1}[b] = co_{L2}[b] is again given by Equation (12.19).
The L∞ case is more intriguing. For co ∈ COx the L∞ distance between b and co is given by:

||b − co||_{L∞} = max{ |mb(x) − mco(x)|, |mb(y) − mco(y)| } = max{ |mb(x) − mco(x)|, mb(y) }.

Its minimum arg min_{co∈COx} ||b − co||_{L∞} corresponds to all the consonant belief functions such that |mb(x) − mco(x)| ≤ mb(y), i.e.:

{ co ∈ COx : max{0, mb(x) − mb(y)} ≤ mco(x) ≤ mb(x) + mb(y) }.

An analogous result holds for the COy component. We can thus write arg min_{co∈CO2} ||b − co||_{L∞} = CO[b], where:

CO[b] = { co ∈ COx : mb(x) − mb(y) ≤ mco(x) ≤ mb(x) + mb(y) }   if mb(x) ≥ mb(y),
CO[b] = { co ∈ COy : mb(y) − mb(x) ≤ mco(y) ≤ mb(y) + mb(x) }   if mb(y) ≥ mb(x),

since when mb(x) ≥ mb(y) we have max{0, mb(x) − mb(y)} = mb(x) − mb(y), while when mb(y) ≥ mb(x) we have max{0, mb(y) − mb(x)} = mb(y) − mb(x).

It suffices to compare the expressions we obtained for the consonant approximations co_{L1}[b], co_{L2}[b], co_{L∞}[b] of b to note that co_{L1}[b] = co_{L2}[b] is the barycenter (center of mass) of the above set CO[b] (see Figure 12.5 again).
If we summarize the results we obtained so far: co_{L∞}[b] is the whole set CO[b], just as p_{L1}[b] is the whole set P[b]; co_{L1}[b] and co_{L2}[b] both coincide with the barycenter of CO[b], just as p_{L2}[b] and p_{L∞}[b] coincide with the barycenter of P[b]. We can thus recognize the dual role of the norms L∞ and L1 in the two problems (at least in the binary case). It is natural to call the set CO[b] the collection of consonant belief functions compatible with b.
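A minimal numerical sketch (ours, for illustration only) of these binary-case results, returning the preferred chain, the L1 = L2 approximation and the interval of masses mco(·) spanned by CO[b]:

# Illustrative sketch: binary-case consonant Lp approximations,
# for a b.p.a. (m(x), m(y), m(Theta)) on Theta = {x, y}.
def binary_consonant_lp(mx, my):
    a, b = ('x', mx), ('y', my)
    lead, other = (a, b) if mx >= my else (b, a)        # preferred component CO_{lead, Theta}
    co_l1_l2 = {lead[0]: lead[1], 'Theta': 1.0 - lead[1]}            # barycenter of CO[b]
    linf_interval = (max(0.0, lead[1] - other[1]), lead[1] + other[1])  # masses m_co(lead) in CO[b]
    return lead[0], co_l1_l2, linf_interval

chain, co, interval = binary_consonant_lp(0.5, 0.2)     # m(x)=0.5, m(y)=0.2, m(Theta)=0.3
print(chain, co, interval)   # chain 'x', co = {x: 0.5, Theta: 0.5}, m_co(x) in [0.3, 0.7]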

12.3.3 Compatible consonant belief functions

We can try and give a characterization of compatible co.b.f.s in terms of an order


relation similar to weak inclusion for P [b] (12.18). Looking at Figure 12.7 we can
note that, in the binary case, CO[b] is the set of co.b.f.s co for which:
404 12 Consonant approximation

Fig. 12.7. Characterization of compatible consonant belief functions in terms of the reference frame (X, Y) formed by the probability line and the line P⊥ orthogonal to P in P = [1/2, 1/2]'.

X(co) ≥ X(b),    Y(co) ≥ Y(b)

in the reference frame (X, Y) with origin O = P = [1/2, 1/2]'. The coordinates of a belief function b in this reference frame can be computed through simple trigonometric arguments, and are given by:

X(b) = mb(Θ)/√2,    Y(b) = ( mb(x) − mb(y) )/√2.

More interestingly, X(b) is the L2 distance of b from the Bayesian region P, while Y(b) is the distance between b and the orthogonal complement P⊥ of P:

X(b) = ||b − P||_{L2},    Y(b) = ||b − P⊥||_{L2}.

Furthermore, P ⊥ (or better its segment Cl(bΘ , P) joining bΘ and P) is the set
of belief functions in which the mass is equally distributed among events of the
same size ({x} and {y} in the binary case). This link between orthogonality and
equidistribution is true in the general case too (recall Theorem 70 [283]).
In conclusion, at least in the binary case the consonant belief functions compat-
ible with b ∈ B2 are those which are simultaneously less Bayesian and less equally
distributed than b. The question of the existence of a set of compatible consonant be-
lief functions in the general case is something which we plan to explore in upcoming
work.

12.4 Consonant approximation in the mass space


Let us now compute the analytical form of all Lp consonant approximations in
the mass space, in both its RN −1 and RN −2 forms (see Section 12.2.1, Equations
(12.12) and (12.13)). We start by analyzing the difference vector mb −mco between
the original mass vector and its approximation.
In the complete, (N − 1)-dimensional version M of the mass space (see Equation (12.12)), the mass vector associated with an arbitrary consonant belief function co with maximal chain of focal elements C reads as mco = ∑_{A∈C} mco(A) mA. The sought difference vector is therefore:

mb − mco = ∑_{A∈C} ( mb(A) − mco(A) ) mA + ∑_{B∉C} mb(B) mB.    (12.20)

When picking an (N − 2)-dimensional section of the mass space (see Equation (12.13)), instead, we need to distinguish whether the missing focal element Ā is an element of the desired maximal chain C or not. In the former case, the mass vector associated with the same, arbitrary consonant b.f. co with maximal chain C is mco = ∑_{A∈C, A≠Ā} mco(A) mA, and the difference vector is:

mb − mco = ∑_{A∈C, A≠Ā} ( mb(A) − mco(A) ) mA + ∑_{B∉C} mb(B) mB.    (12.21)

If, instead, the missing component is not an element of C, the arbitrary consonant b.f. is mco = ∑_{A∈C} mco(A) mA, while the difference vector becomes:

mb − mco = ∑_{A∈C} ( mb(A) − mco(A) ) mA + ∑_{B∉C, B≠Ā} mb(B) mB.    (12.22)

One can observe that, since (12.22) coincides with (12.20) (factoring out the miss-
ing component Ā) minimizing the Lp norm of the difference vector in a (N − 2)-
dimensional section of the mass space which leaves out a focal element outside
the desired maximal chain yields the same results as in the complete mass space1 .
Therefore, in what follows we only consider consonant approximations in (N − 2)-
dimensional sections obtained by excluding a component associated with an element
Ā ∈ C of the desired maximal chain.
In the following we denote by CO^C_{M\Ā,Lp}[mb] (upper case) the set of partial Lp approximations of b with maximal chain C in the section of the mass space which excludes Ā ∈ C. We drop the superscript C for global solutions, drop \Ā for solutions in the complete mass space, and use co^C_{M\Ā,Lp}[mb] (lower case) for pointwise solutions and for the barycenters of sets of solutions.

¹ The absence of the missing component mb(Ā) in (12.22) in fact implies a small difference when it comes to the L∞ approximation: Theorem 77 and Equation (12.26), concerning the vertices of the polytope of L∞ approximations, remain valid as long as we replace max_{B∉C} mb(B) with max_{B∉C, B≠Ā} mb(B).

12.4.1 Results of Lp consonant approximation in the mass space

L1 approximation Minimising the L1 norm of the difference vectors (12.20),


(12.21) yields the following result.

Theorem 75. Given a belief function b : 2^Θ → [0, 1] with basic probability assignment mb, the partial L1 consonant approximations of b with maximal chain of focal elements C in the complete mass space M form the set of co.b.f.s co with chain C such that mco(A) ≥ mb(A) for all A ∈ C. They form a simplex:

CO^C_{M,L1}[mb] = Cl( m^Ā_{L1}[mb], Ā ∈ C ),    (12.23)

whose vertices have b.p.a.:

m^Ā_{L1}[mb](A) = mb(A) + ∑_{B∉C} mb(B)   if A = Ā,
m^Ā_{L1}[mb](A) = mb(A)                    if A ∈ C, A ≠ Ā,    (12.24)

and whose barycenter has mass assignment:

co^C_{M,L1}[mb](A) = mb(A) + (1/n) ∑_{B∉C} mb(B)   ∀A ∈ C.    (12.25)

The set of global L1 approximations of b is the union of the simplices (12.23) associated with the maximal chain(s) which maximize their total original mass:

CO_{M,L1}[mb] = ∪_{C ∈ arg max_C ∑_{A∈C} mb(A)} CO^C_{M,L1}[mb].

The partial L1 consonant approximation co^C_{M\Ā,L1}[mb] of b in the section of the mass space M with missing component Ā ∈ C is unique, and has b.p.a. (12.24). The global such approximation(s) are also associated with the maximal chains arg max_C ∑_{A∈C} mb(A).
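A hedged illustrative sketch (ours, not the book's code) of the vertices (12.24) and barycenter (12.25) for a fixed maximal chain:

# Illustrative sketch: vertices (12.24) and barycenter (12.25) of the set of
# partial L1 consonant approximations in the complete mass space M.
def l1_partial_approximations(m, chain):
    outside = sum(mass for A, mass in m.items() if A not in chain)
    vertices = {}
    for A_bar in chain:                                  # one vertex per element of the chain
        vertices[A_bar] = {A: m.get(A, 0.0) + (outside if A == A_bar else 0.0) for A in chain}
    n = len(chain)
    barycenter = {A: m.get(A, 0.0) + outside / n for A in chain}
    return vertices, barycenter

m = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.5}
chain = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
vertices, barycenter = l1_partial_approximations(m, chain)
print(barycenter)   # each chain element gains (0.3 + 0.5)/3 of the outside mass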

L2 approximation In order to find the L2 consonant approximation(s) in M, instead, it is convenient to recall that the minimal L2 distance between a point and a vector space is attained by the point of the vector space V such that the difference vector is orthogonal to all the generators gi of V:

arg min_{q∈V} ||p − q||_{L2} = q̂ ∈ V : ⟨p − q̂, gi⟩ = 0 ∀i,

whenever p ∈ R^m and V = span({gi, i}). Instead of minimizing the L2 norm of the difference vector ||mb − mco||_{L2}, we can thus impose a condition of orthogonality between the difference vector mb − mco and each component CO^C_M of the consonant complex in the mass space.

Theorem 76. Given a belief function b : 2^Θ → [0, 1] with basic probability assignment mb, the partial L2 consonant approximation of b with maximal chain of focal elements C in the complete mass space M has mass assignment (12.25):

co^C_{M,L2}[mb] = co^C_{M,L1}[mb].

The set of all global L2 approximations is:

CO_{M,L2}[mb] = ∪_{C ∈ arg min_C ∑_{B∉C} (mb(B))²} co^C_{M,L2}[mb],

i.e., the union of the partial solutions associated with the maximal chains of focal elements which minimise the sum of the squared masses outside the chain.
The partial L2 consonant approximation of b in the section of the mass space with missing component Ā ∈ C is unique, and coincides with the L1 partial consonant approximation in the same section (12.24):

co^C_{M\Ā,L2}[mb] = co^C_{M\Ā,L1}[mb].

The global L2 approximations in the section form the union of the related partial approximations, associated with the chains arg min_C ∑_{B∉C} (mb(B))².
Note that global solutions in the L1 and L2 cases fall in general onto different sim-
plicial components of CO.

L∞ approximation

Theorem 77. Given a belief function b : 2^Θ → [0, 1] with basic probability assignment mb, the partial L∞ consonant approximations of b with maximal chain of focal elements C in the complete mass space M form a simplex:

CO^C_{M,L∞}[mb] = Cl( m^Ā_{L∞}[mb], Ā ∈ C ),

whose vertices have b.p.a.:

m^Ā_{L∞}[mb](A) = mb(A) + max_{B∉C} mb(B)                                              if A ∈ C, A ≠ Ā,
m^Ā_{L∞}[mb](A) = mb(Ā) + max_{B∉C} mb(B) + ∑_{B∉C} mb(B) − n · max_{B∉C} mb(B)        if A = Ā,    (12.26)

whenever the belief function to approximate is such that:

max_{B∉C} mb(B) ≥ (1/n) ∑_{B∉C} mb(B).    (12.27)

When the opposite is true, the sought partial L∞ consonant approximation reduces to a single consonant belief function: the barycenter of the above simplex, which coincides with the partial L2 approximation (and the barycenter of the L1 partial approximations) (12.25).

When (12.27) holds, the global L∞ consonant approximations are associated with the maximal chain(s) of focal elements:

arg min_C max_{B∉C} mb(B);    (12.28)

otherwise, they correspond to the maximal chains:

arg min_C ∑_{B∉C} mb(B).

The partial L∞ consonant approximations of b in the section of the mass space M with missing component Ā ∈ C form a set CO^C_{M\Ā,L∞}[mb] whose elements have b.p.a. mco such that:

mb(A) − max_{B∉C} mb(B) ≤ mco(A) ≤ mb(A) + max_{B∉C} mb(B)   ∀A ∈ C, A ≠ Ā.    (12.29)

Its barycenter reassigns all the mass originally outside the desired maximal chain C to Ā, leaving the masses of the other elements of the chain untouched (12.24):

co^C_{M\Ā,L∞}[mb] = co^C_{M\Ā,L2}[mb] = m^Ā_{L1}[mb].

The related global approximations of b are associated with the optimal chain(s) (12.28).

12.4.2 Semantics of partial consonant approximations in M

N − 1 representation Summarizing, the partial Lp approximations of an arbitrary mass function mb in the complete mass space M are:

CO^C_{M,L1}[mb] = Cl( m^Ā_{L1}[mb], Ā ∈ C ) = { co ∈ CO^C_M : mco(A) ≥ mb(A) ∀A ∈ C };

co^C_{M,L2}[mb] = co^C_{M,L1}[mb] :  mco(A) = mb(A) + (1/n) ∑_{B∉C} mb(B);    (12.30)

CO^C_{M,L∞}[mb] = Cl( m^Ā_{L∞}[mb], Ā ∈ C )   if (12.27) holds,   co^C_{M,L2}[mb]   otherwise.

We can observe that, for each desired maximal chain of focal elements C:
1. the L1 partial approximations of b are those consonant b.f.s whose mass assign-
ment dominates that of b over all the elements of the chain;
2. this set is a fully admissible simplex, whose vertices are obtained by re-
assigning all the mass outside the desired chain to a single focal element of
the chain itself (see (12.24));
3. its barycenter coincides with the L2 partial approximation with the same chain,
which redistributes the original mass of focal elements outside the chain to all
the elements of the chain on an equal basis (12.25);

4. when the partial L∞ approximation is unique, it coincides with the L2 approximation and the barycenter of the L1 approximations;
5. when it is not unique, it is a simplex whose vertices assign to each element of the chain (but one) the maximal mass outside the chain, and whose barycenter is again the L2 approximation.
Note that the simplex of L∞ partial solutions (point 5.) may fall outside the sim-
plex of consonant belief functions with the same chain - therefore, some of those
approximations will not be admissible.
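The following illustrative sketch (ours) computes, for a fixed chain, the L2 partial approximation (12.25) and the L∞ vertices (12.26) after checking condition (12.27); the example confirms that some L∞ vertices can indeed fail to be admissible:

# Illustrative sketch: partial L2 and Linf consonant approximations in the
# complete mass space M for a fixed maximal chain (cf. (12.30)).
def l2_linf_partial_in_M(m, chain):
    n = len(chain)
    outside = [mass for A, mass in m.items() if A not in chain]
    tot, mx = sum(outside), max(outside, default=0.0)
    l2 = {A: m.get(A, 0.0) + tot / n for A in chain}      # = barycenter of the L1 simplex
    if mx >= tot / n:                                     # condition (12.27): a whole simplex of solutions
        linf = [{A: m.get(A, 0.0) + (mx + tot - n * mx if A == A_bar else mx) for A in chain}
                for A_bar in chain]                        # vertices (12.26)
    else:                                                  # otherwise the Linf solution is unique
        linf = [l2]
    return l2, linf

m = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.5}
chain = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
l2, linf = l2_linf_partial_in_M(m, chain)
print(l2)
for v in linf:
    print(v)   # some vertex masses can be negative, i.e. not admissible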

N − 2 representation When adopting an (N − 2)-dimensional section M \ Ā of the mass space, instead, the partial Lp approximations are:

co^C_{M\Ā,L1}[mb] = co^C_{M\Ā,L2}[mb] = co^C_{M\Ā,L∞}[mb] :
    mco(A) = mb(A)                      for A ∈ C, A ≠ Ā,
    mco(Ā) = mb(Ā) + ∑_{B∉C} mb(B);

CO^C_{M\Ā,L∞}[mb] = { co ∈ CO^C_M : |mco(A) − mb(A)| ≤ max_{B∉C} mb(B)  ∀A ∈ C, A ≠ Ā }.    (12.31)
Therefore, for each desired maximal chain C:
– the L1 and L2 partial approximations are uniquely determined, and coincide with the barycenter of the set of L∞ partial approximations;
– their interpretation is straightforward: all the mass outside the chain is re-assigned to a single focal element Ā ∈ C of the chain;
– the set of L∞ (partial) approximations falls entirely inside the simplex of admissible consonant b.f.s only if each focal element in the desired chain has mass greater than all the focal elements outside the chain:

min_{A∈C} mb(A) ≥ max_{B∉C} mb(B);

– the latter set forms a generalized rectangle in the mass space M, whose size is determined by the largest mass outside the desired maximal chain.

Comparison and general overview As a general trait, approximations in the mass


space amount to some redistribution of the original mass to focal elements of the
desired maximal chain. The relationships between the different Lp consonant ap-
proximations in the full mass space and those in any arbitrary (N − 2)-dimensional
section of M are summarized in the diagram of Figure 12.8: while being both ac-
ceptable geometric representations of mass vectors, the two approaches generate
different but related results.
By Equation (12.31), the L1 /L2 approximations in all the (N − 2)-dimensional
sections of the mass space (and the barycenters of the related sets of L∞ approx-
imations) track all the vertices of the L1 simplex in M. As it is quite arbitrary to

Fig. 12.8. Graphical representation of the relationships between the different (partial) Lp
consonant approximations with desired maximal chain C, in the related simplex COCM of
the consonant complex CO. Approximations in the full mass space M and approximations
computed in a (N − 2)-dimensional section with missing component Ā ∈ C are compared.
In the latter case, the special case Ā = Θ is highlighted.

select the component Ā to neglect, the latter simplex (and its barycenter) seems to play a privileged role.
L∞ approximations in any such section are not entirely admissible, and do not show a particular relation with the simplex of L1, M solutions. The relation between the L∞ partial solutions in the full mass space and those computed in its sections M \ Ā remains to be determined.

Theorem 78. Given a belief function b : 2Θ → [0, 1] with b.p.a. mb and a maximal
chain of focal elements C, the partial L∞ consonant approximations of b in the
complete mass space COCM,L∞ [mb ] are not necessarily partial L∞ approximations
COCM\Ā,L∞ [mb ] in the section excluding Ā. However, for all Ā ∈ C the two sets of
approximations share the vertex (12.26).

Notice that in the ternary case (n = 3) condition (12.73),

max_{B∉C} mb(B) > (1/(n − 2)) ∑_{B∉C} mb(B)

(see the proof of Theorem 78 in the Appendix), becomes:

max_{B∉C} mb(B) > ∑_{B∉C} mb(B),

which is impossible. Therefore, if |Θ| = 3 the set of L∞ partial consonant approximations of a b.f. b in the full mass space (the blue triangle in Figure 12.8) is a subset of the set of its L∞ partial consonant approximations in the section of M which neglects the component Ā, for any choice of Ā ∈ C (see Section 12.4.5).

12.4.3 Computability and admissibility of global solutions

As far as global solutions are concerned, we can observe the following facts:
– in the L1 case, in both the (N − 1)- and (N − 2)-dimensional representations, the optimal chain(s) are:

arg min_C ∑_{B∉C} mb(B) = arg max_C ∑_{A∈C} mb(A);

– in the L2 case, again in both representations, these are:

arg min_C ∑_{B∉C} (mb(B))²;

– in the L∞ case, the optimal chain(s) are:

arg min_C max_{B∉C} mb(B),

unless the approximation is unique in the full mass space, in which case the optimal chains behave as in the L1 case.

Admissibility of partial and global solutions Concerning their admissibility, we know that all L1/L2 partial solutions are always admissible, in both representations of mass vectors. As for the L∞ case, in the full mass space not even global solutions are guaranteed to have all admissible vertices (Equation (12.26)). Indeed:

∆ = ∑_{B∉C} mb(B) − n · max_{B∉C} mb(B) ≤ 0

when condition (12.27) holds; therefore m^Ā_{L∞}[mb](Ā) can be negative.
In M \ Ā, by (12.29), the set of L∞ approximations is entirely admissible iff:

min_{A∈C, A≠Ā} mb(A) ≥ max_{B∉C} mb(B).    (12.32)

A counterexample shows that minimizing max_{B∉C} mb(B) (i.e., considering global L∞ solutions) does not necessarily imply (12.32): think of a belief function b with no focal elements of cardinality 1, but several focal elements of cardinality 2.
The computation of the admissible part of this set of solutions is not trivial, and is left to future work.

Computational complexity of global solutions In terms of computability, finding the global L1/L2 approximations therefore involves finding the maximal mass/square-mass chain(s). This is expensive, as we have to examine all n! of them. The most favorable case (in terms of complexity) is the L∞ one, as all the chains which do not contain the maximal-mass element(s) are optimal. Looking for the maximal-mass focal elements requires a single pass over the list of focal elements, with complexity O(2^n) rather than O(n!). On the other hand, in this case the global consonant approximations are spread over a potentially large number of simplicial components of CO, and are therefore less informative.

12.4.4 Relation with other approximations

This behavior compares unfavorably with that of two other natural consonant approximations.

Definition 83. Given a belief function b : 2^Θ → [0, 1], its isopignistic consonant approximation [415] is defined as the unique consonant b.f. co_iso[b] such that BetP[co_iso[b]] = BetP[b]. Its contour function is:

pl_{co_iso[b]}(x) = ∑_{x'∈Θ} min{ BetP[b](x), BetP[b](x') }.    (12.33)

It is well known that, given the contour function plb of a consistent belief function b : 2^Θ → [0, 1] (such that max_x plb(x) = 1), we can obtain the unique consonant b.f. which has plb as its contour function via the following formulae:

mco(Ai) = plb(xi) − plb(x_{i+1}),   i = 1, ..., n − 1;      mco(An) = plb(xn),    (12.34)

where x1, ..., xn are the singletons of Θ sorted by plausibility value, and Ai = {x1, ..., xi} for all i. Such a unique transformation is not in general feasible for arbitrary belief functions.
The isopignistic transform builds a contour function (possibility distribution) from the pignistic values of the singletons, in the following way. Given the list of singletons x1, ..., xn ordered by pignistic value, (12.33) reads as:

pl_{co_iso[b]}(xi) = 1 − ∑_{j=1}^{i−1} ( BetP[b](xj) − BetP[b](xi) ) = ∑_{j=i}^{n} BetP[b](xj) + (i − 1) BetP[b](xi).

By applying (12.34) we obtain the following mass values:

m_{co_iso[b]}(Ai) = i · ( BetP[b](xi) − BetP[b](x_{i+1}) ),   i = 1, ..., n.    (12.35)

Definition 84. Given a belief function b : 2^Θ → [0, 1], its contour-based consonant approximation with maximal chain of focal elements C = {A1 ⊂ · · · ⊂ An} has mass assignment:

m_{co_con[b]}(A1) = 1 − plb(x2);
m_{co_con[b]}(Ai) = plb(xi) − plb(x_{i+1}),   i = 2, ..., n − 1;    (12.36)
m_{co_con[b]}(An) = plb(xn),

where xi = Ai \ A_{i−1} for all i = 1, ..., n.
Such an approximation uses the (unnormalized) contour function of an arbitrary b.f. b as if it were a possibility distribution, by replacing the plausibility of the maximal element with 1, and applying the mapping (12.34).
In order to guarantee their admissibility, both the isopignistic and the contour-based approximations require sorting (respectively) the pignistic and the plausibility values of the singletons (an operation whose complexity is O(n log n)). On top of that, though, one must add the complexity of actually computing the values BetP[b](x) (plb(x)) from a mass vector, which requires n scans (one for each singleton x), with an overall complexity of n · 2^n.
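Both transformations are easy to compute; the sketch below (our illustrative code, not the author's implementation) builds them from the pignistic and plausibility values of the singletons, following (12.35) and (12.36):

# Illustrative sketch: isopignistic (12.35) and contour-based (12.36)
# consonant approximations on a finite frame.
def betp(m, x):
    """Pignistic probability of a singleton x."""
    return sum(mass / len(A) for A, mass in m.items() if x in A)

def pl(m, x):
    """Plausibility (contour function) of a singleton x."""
    return sum(mass for A, mass in m.items() if x in A)

def isopignistic(m, frame):
    xs = sorted(frame, key=lambda x: -betp(m, x))        # singletons sorted by pignistic value
    vals = [betp(m, x) for x in xs] + [0.0]
    chain = [frozenset(xs[:i + 1]) for i in range(len(xs))]
    return {chain[i]: (i + 1) * (vals[i] - vals[i + 1]) for i in range(len(xs))}

def contour_based(m, xs):
    """xs: singletons in the order induced by the desired chain A_i = {x_1,...,x_i}."""
    pls = [1.0] + [pl(m, x) for x in xs[1:]]             # plausibility of x_1 replaced by 1
    chain = [frozenset(xs[:i + 1]) for i in range(len(xs))]
    out = {chain[i]: pls[i] - pls[i + 1] for i in range(len(xs) - 1)}
    out[chain[-1]] = pl(m, xs[-1])
    return out

m = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.5}
print(isopignistic(m, 'xyz'))              # masses 0.15, 0.1, 0.75 on {x}, {x,y}, Theta
print(contour_based(m, ['x', 'y', 'z']))   # masses 0.7, -0.2, 0.5 (not admissible here)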
An interesting relationship between outer consonant and L1 consonant approxi-
mation in the mass space M can also be pointed out.
Theorem 79. Given a belief function b : 2Θ → [0, 1], the set of partial L1 conso-
nant approximations COCM,L1 [mb ] with maximal chain of focal elements C in the
complete mass space and the set OC C [b] of its partial outer consonant approxima-
tions with the same chain have non-empty intersection. This intersection contains
at least the convex closure of the candidate vertices of OC C [b] whose assignment
functions are such that B(Ai ) = Ai for all i = 1, ..., n.
Proof. Clearly if B(Ai ) = Ai for all i = 1, ..., n, then the mass mb (Ai ) is re-
assigned to Ai itself for each element Ai of the chain. Hence mco (Ai ) ≥ mb (Ai ),
and the co.b.f. belongs to COCM,L1 [mb ] (see Equation (12.30)). 
In particular, both co^C_max[b] (12.2) and

co^C_{M\Θ,L1/2}[mb] :   mco(A) = mb(A)   for A ∈ C, A ≠ Θ;    mco(Θ) = mb(Θ) + ∑_{B∉C} mb(B),    (12.37)

belong to both the (partial) outer and the L1, M consonant approximations. The quantity (12.37) is generated by the trivial assignment function assigning all the mass mb(B), B ⊆ Θ, B ∉ C, to An = Θ: B(B) = Θ for all B ∉ C.
A negative result can, on the other hand, be proven for L∞ approximations.
Theorem 80. Given a belief function b : 2Θ → [0, 1], the set of its (partial) outer
consonant approximations OC C [b] with maximal chain C and the set of its partial
L∞ approximations (in the complete mass space) with the same chain may have
empty intersection.
In particular, coCmax [b] is not necessarily an L∞ , M approximation of b.

12.4.5 Ternary example

To conclude the analysis of L∞ consonant approximations in the mass space, it


can be useful to compare the different results in the toy case of a ternary frame,
Θ = {x, y, z}.
Let the desired consonant approximation have maximal chain C = {{x} ⊂
{x, y} ⊂ Θ}. Figure 12.9 illustrates the different partial Lp consonant approxima-
tions in M in the simplex of consonant belief functions with chain C, for a belief
function b with masses:

mb (x) = 0.2, mb (y) = 0.3, mb (x, z) = 0.5. (12.38)

Notice that only the Lp approximations in the section with Ā = Θ are shown for
sake of simplicity. The example confirms the general picture of their relationships
given in Figure 12.8.
According to the formulae on page 8 of [250] (see also Section 12.1.8), the set of outer consonant approximations of (12.38) with chain {{x}, {x, y}, Θ} is the convex closure of the points:

m_{B1,B2} = [ mb(x), mb(y), 1 − mb(x) − mb(y) ]',
m_{B3,B4} = [ mb(x), 0, 1 − mb(x) ]',
m_{B5,B6} = [ 0, mb(x) + mb(y), 1 − mb(x) − mb(y) ]',
m_{B7,B8} = [ 0, mb(x), 1 − mb(x) ]',    (12.39)
m_{B9,B10} = [ 0, mb(y), 1 − mb(y) ]',
m_{B11,B12} = [ 0, 0, 1 ]'.

These points are plotted in Figure 12.9 as empty squares. We can observe that, as
proven by Theorem 79, both coCmax [b] (12.2) and coCM\Θ,L1 /2 [mb ] (12.37) belong
to the intersection of (partial) outer and L1 , M consonant approximations.
The example also suggests that (partial) outer consonant approximations are
included in L∞ consonant approximations, calculated by neglecting the component
Ā = Θ. However, this is not so as attested by the binary case Θ = {x, y}, for
which the L∞ , M \ Θ solutions satisfy, for the maximal chain C = {{x} ⊂ Θ}:
mb (x) − mb (y) ≤ mco (x) ≤ mb (x) + mb (y), while the outer approximations are
such that 0 ≤ mco (x) ≤ mb (x).
As for the isopignistic and contour-based approximations, they are given in this case by the vectors:

m_iso = [ 0.15, 0.1, 0.75 ]',
m_con = [ 1 − plb(y), plb(y) − plb(z), plb(z) ]' = [ 0.7, −0.2, 0.5 ]'.

The pignistic values of the elements in this example are BetP [b](x) = 0.45,
BetP [b](y) = 0.3, BetP [b](z) = 0.25 so that the chain associated with the
isopignistic approximation is indeed {{x}, {x, y}, Θ}. Notice though that ‘pseudo’
isopignistic approximations can be computed for all chains via Equation (12.35),
none of which will be admissible. The contour-based approximation is not admissi-
ble in this case, as singletons have a different plausibility ordering.

While no relationship whatsoever seems to link isopignistic and Lp consonant


approximations (as expected), the former appears to be an outer approximation as
well. As for the contour-based approximation, it coincides in this example with a
vertex of the set of L∞ , M approximations. However, this is not generally true: just
compare Equations (12.26) and (12.36).

Fig. 12.9. The simplex COC in the mass space of consonant belief functions with maximal
chain C = {{x} ⊂ {x, y} ⊂ Θ} defined on Θ = {x, y, z}, and the Lp partial consonant
approximations in M of the belief function with basic probabilities (12.38). The L2 , M
approximation is plotted as a red square, as the barycenter of both the sets of L1 , M (blue
triangle) and L∞ , M (green triangle) approximations. The maximal outer approximation
is denoted by a yellow square, the contour-based approximation is a vertex of the triangle
L∞ , M. The various Lp approximations are also depicted for the section M \ Θ of the mass
space: the unique L1 /L2 approximation is a vertex of L1 , M, while the polytope of L∞
approximations in the section is depicted in light green. The related set OC C [b] of partial
outer consonant approximations (12.39) is also shown for comparison (light yellow), while
the isopignistic function is represented by a star.

12.5 Consonant approximation in the belief space


We have seen that consonant approximations in the mass space have quite natural
semantics in terms of mass redistributions. As we see in this Section, (partial) Lp
approximations in the belief space are instead closely associated with lists of belief

values determined by the desired maximal chain, and through the latter to other
natural approximations.
We first need to make explicit the analytical form of the difference vector b − co
between the original b.f. b and the desired approximation co.

Lemma 16. Given a belief function b : 2^Θ → [0, 1] and an arbitrary consonant b.f. co defined on the same frame, with maximal chain of focal elements C = {A1 ⊂ · · · ⊂ An}, the difference between the corresponding vectors in the belief space is:

b − co = ∑_{A⊉A1} b(A) xA + ∑_{i=1}^{n−1} ∑_{A⊇Ai, A⊉A_{i+1}} xA ( γ(Ai) + b(A) − ∑_{j=1}^{i} mb(Aj) ),    (12.40)

where

γ(A) = ∑_{B⊆A, B∈C} ( mb(B) − mco(B) )

and {xA, ∅ ≠ A ⊊ Θ} is the usual orthonormal reference frame in the belief space B (Section 6.1).

12.5.1 L1 approximation

A compact expression for the set of partial L1 consonant approximations in B can be found in terms of the innermost values of lists of belief values very much related to the maximal outer consonant approximation (12.2), as we will see in Section 12.5.4.

Theorem 81. Given a belief function b : 2^Θ → [0, 1] and a maximal chain of focal elements C = {A1 ⊂ · · · ⊂ An} in Θ, the partial L1 consonant approximations CO^C_{B,L1}[b] in the belief space with maximal chain C have mass vectors forming the following convex closure:

Cl( [ b1, b2 − b1, · · · , bi − b_{i−1}, · · · , 1 − b_{n−1} ]',  bi ∈ {γ^i_{int1}, γ^i_{int2}}  ∀i = 1, ..., n − 1 ),    (12.41)

where γ^i_{int1}, γ^i_{int2} are the innermost (median) elements of the list of belief values:

Li = { b(A), A ⊇ Ai, A ⊉ A_{i+1} }.    (12.42)

In particular, b_{n−1} = γ^{n−1}_{int1} = γ^{n−1}_{int2} = b(A_{n−1}).
Note that, even though the approximation is computed in B, we present the result
in terms of mass assignments as they are simpler and easier to interpret. The same
holds for the other Lp approximations in B.
Due to the partially ordered nature of 2^Θ, the innermost values of the above lists (12.42) cannot be identified analytically in full generality (even though they can easily be computed numerically). Nevertheless, the partial L1 approximations in B can be derived analytically in some cases. By (12.41), the barycenter of the set of partial L1 consonant approximations in B has mass vector:
m_{co^C_{B,L1}[b]} = [ (γ^1_{int1} + γ^1_{int2})/2,   (γ^2_{int1} + γ^2_{int2})/2 − (γ^1_{int1} + γ^1_{int2})/2,   · · · ,   1 − b(A_{n−1}) ]'.    (12.43)
The global L1 approximation(s) can be easily derived from the expression of the norm of the difference vector (see the proof of Theorem 81, Equation (12.76)).

Theorem 82. Given a belief function b : 2^Θ → [0, 1], its global L1 consonant approximations CO_{B,L1}[b] in B live in the collection of partial such approximations associated with the maximal chain(s) which maximize the cumulative lower halves of the lists of belief values Li (12.42):

arg max_C ∑_i ∑_{b(A)∈Li, b(A)≤γ^i_{int1}} b(A).    (12.44)
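An illustrative sketch (ours) of the lists (12.42), of the vertices (12.41) of the L1 polytope and of its barycenter (12.43), for a fixed maximal chain:

# Illustrative sketch: the lists L_i of Eq. (12.42), the vertices (12.41) of the
# set of partial L1 approximations in B, and its barycenter (12.43).
from itertools import combinations, product

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def lists_L(m, frame, chain):
    """L_i = { b(A) : A contains A_i but not A_{i+1} }, i = 1, ..., n-1."""
    subsets = [frozenset(c) for r in range(1, len(frame) + 1) for c in combinations(frame, r)]
    return [sorted(belief(m, A) for A in subsets if chain[i] <= A and not chain[i + 1] <= A)
            for i in range(len(chain) - 1)]

def innermost(L):
    """The two innermost (median) elements of a sorted list."""
    k = len(L)
    return (L[k // 2 - 1], L[k // 2]) if k % 2 == 0 else (L[k // 2], L[k // 2])

def mass_vector(bs):
    """[b_1, b_2 - b_1, ..., 1 - b_{n-1}], as in Eq. (12.41)."""
    return [bs[0]] + [bs[i] - bs[i - 1] for i in range(1, len(bs))] + [1.0 - bs[-1]]

m = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.5}
chain = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
Ls = lists_L(m, 'xyz', chain)
vertices = {tuple(mass_vector(list(bs))) for bs in product(*map(innermost, Ls))}
barycenter = mass_vector([sum(innermost(L)) / 2.0 for L in Ls])
print(Ls)           # [[0.2, 0.7], [0.5]] for this example
print(vertices)     # the (at most 2^(n-1)) distinct vertices of the L1 polytope
print(barycenter)   # [0.45, 0.05, 0.5]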

12.5.2 (Partial) L2 approximation

To find the partial consonant approximation(s) at minimal L2 distance from b in B, we need to impose the orthogonality of the difference vector b − co with respect to any given simplicial component CO^C_B of the complex COB:

⟨b − co, bAj − bΘ⟩ = ⟨b − co, bAj⟩ = 0   ∀Aj ∈ C, j = 1, ..., n − 1,    (12.45)

as bΘ = 0 is the origin of the Cartesian space in B, and bAj − bΘ, j = 1, ..., n − 1, are the generators of the component CO^C_B.
Using once again expression (12.40), the orthogonality conditions (12.45) translate into the following linear system of equations:

∑_{A∉C} mb(A) ⟨bA, bAj⟩ + ∑_{A∈C, A≠Θ} ( mb(A) − mco(A) ) ⟨bA, bAj⟩ = 0    (12.46)

for all j = 1, ..., n − 1. This is a linear system of n − 1 equations in the n − 1 unknowns mco(Ai), i = 1, ..., n − 1. The resulting L2 partial approximation of b is also a function of the lists of belief values (12.42).
a function of the list of belief values (12.42).
Theorem 83. Given a belief function b : 2^Θ → [0, 1], its partial² L2 consonant approximation co^C_{B,L2}[b] in B with maximal chain C = {A1 ⊂ · · · ⊂ An} is unique, and has basic probability assignment:

m_{co^C_{B,L2}[b]}(Ai) = ave(Li) − ave(L_{i−1})   ∀i = 1, ..., n,    (12.47)

where L0 = {0}, and ave(Li) is the average of the list of belief values Li (12.42):

ave(Li) = ( 1 / 2^{|A^c_{i+1}|} ) ∑_{A⊇Ai, A⊉A_{i+1}} b(A).    (12.48)

² The computation of the global L2 approximation(s) is rather involved. We plan to address this issue in the near future.

12.5.3 L∞ approximation

Partial approximations The behavior of partial L∞ approximations in B shows


similarities with that of their L1 counterparts, as they also form a convex set of
solutions for each desired maximal chain.
Theorem 84. Given a belief function b : 2^Θ → [0, 1], its partial L∞ consonant approximations in the belief space CO^C_{B,L∞}[b] with maximal chain of focal elements C = {A1 ⊂ · · · ⊂ An} have mass vectors which live in the following convex closure of 2^{n−1} vertices:

Cl( [ b1, b2 − b1, · · · , bi − b_{i−1}, · · · , 1 − b_{n−1} ]',
    bi ∈ { −b(A^c_1) + ( b(Ai) + b({x_{i+1}}^c) )/2,   b(A^c_1) + ( b(Ai) + b({x_{i+1}}^c) )/2 }   ∀i = 1, ..., n − 1 ).    (12.49)

The barycenter co^C_{B,L∞}[b] of this set has mass assignment:

m_{co^C_{B,L∞}[b]}(Ai) = ( b(A1) + b({x2}^c) )/2                                        for i = 1,
m_{co^C_{B,L∞}[b]}(Ai) = ( b(Ai) − b(A_{i−1}) )/2 + ( plb({xi}) − plb({x_{i+1}}) )/2    for i = 2, ..., n − 1,    (12.50)
m_{co^C_{B,L∞}[b]}(Ai) = 1 − b(A_{n−1})                                                 for i = n.
Note that, since b(A^c_1) = 1 − plb(A1) = 1 − plb(x1), the size of the polytope (12.49) of partial L∞ approximations of b is a function of the plausibility of the innermost desired focal element only. As expected, it reduces to zero only when b is a consistent belief function (see Section 9.4) and A1 = {x1} has plausibility 1.
A straightforward interpretation of the barycenter of the partial L∞ approximations in B in terms of degrees of belief is possible when we notice that, for all i = 1, ..., n:

mco(Ai) = ( m_{co^C_max[b]}(Ai) + m_{co_con[b]}(Ai) )/2

(recall Equations (12.2) and (12.36)), i.e., (12.50) is the average of the maximal outer consonant approximation and of what we called the 'contour-based' consonant approximation (Definition 84).

Global approximations To compute the global L∞ approximation of the original belief function b in B, we need to locate, as usual, the partial solution whose L∞ distance from b is the smallest.
Given the expression (12.74) of the L∞ norm of the difference vector (see the proof of Theorem 84), such a partial distance is (for each maximal chain C = {A1 ⊂ · · · ⊂ An = Θ}) equal to b(A^c_1). Therefore the global L∞ consonant approximations of b in the belief space are associated with the chains of focal elements:

arg min_C b(A^c_1) = arg min_C ( 1 − plb(A1) ) = arg max_C plb(A1).

Theorem 85. Given a belief function b : 2^Θ → [0, 1], the set of global L∞ consonant approximations of b in the belief space is the collection of partial approximations associated with the maximal chains whose smallest focal element is the maximal-plausibility singleton:

CO_{B,L∞}[b] = ∪_{C : A1 = arg max_x plb(x)} CO^C_{B,L∞}[b].

12.5.4 Approximations in B as generalized maximal outer approximations

As it appears from Theorems 81, 83 and 84, a comprehensive view of our results on Lp consonant approximation in the belief space can be given in terms of the lists of belief values (12.42):

Li = { b(A), A ⊇ Ai, A ⊉ A_{i+1} }   ∀i = 1, ..., n,

n included, as Ln = {b(Θ)} = {1}.


Indeed, the basic probability assignments of all the partial approximations in the
belief space are differences of simple functions of belief values taken from these
lists, which are uniquely determined by the desired chain for focal elements A1 ⊂
· · · ⊂ An . Namely:
mcoCmax [b] (Ai ) = min(Li ) − min(Li−1 );
mcoCcon [b] (Ai ) = max(Li ) − max(Li−1 );
int1 (Li ) + int2 (Li ) int1 (Li−1 ) + int2 (Li−1 )
mcoCB,L [b] (Ai ) = − ; (12.51)
1 2 2
mcoCB,L [b] (Ai ) = ave(Li ) − ave(Li−1 );
2
max(Li ) + min(Li ) max(Li−1 ) + min(Li−1 )
mcoCB,L [b] (Ai ) = − ,
∞ 2 2
where the expression for coCB,L∞ [b] comes directly from (12.50).
As for each vertex of the L1 polytope, either one of the innermost elements of the
i-th list int1 (Li ), int2 (Li ) is picked from the list Li , for each component of the
mass vector. This yields:
mco (Ai ) = int1 (Li )/int2 (Li ) − int1 (Li−1 )/int2 (Li−1 ),
(where / denotes the alternative choice). For each vertex of the L∞ polytope, either
max(Li ) or min(Li ) is selected, yielding:
mco (Ai ) = max(Li )/ min(Li ) − max(Li−1 )/ min(Li−1 ).
The different approximations in B (12.51) correspond therefore to different
choices of a representative for the list Li . The maximal outer approximation coCmax [b]
is obtained by picking as representative min(Li ), coCcon [b] amounts to picking
max(Li ), the barycenter of the L1 approximations to choosing the average inner-
most (median) value, the barycenter of the L∞ approximations to the average out-
ermost value, L2 to picking the overall average value of the list. Each vertex of the
L1 solutions amounts to selecting, for each component, either one of the innermost
values; each vertex of the L∞ polytope, either one of the outermost values.
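A compact illustrative sketch (ours) of this unified view, in which each approximation corresponds to a different representative of the lists Li:

# Illustrative sketch: the table (12.51) as different choices of a
# representative value rep(L_i) of each list L_i.
from itertools import combinations

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def lists_L(m, frame, chain):
    subsets = [frozenset(c) for r in range(1, len(frame) + 1) for c in combinations(frame, r)]
    return [sorted(belief(m, A) for A in subsets if chain[i] <= A and not chain[i + 1] <= A)
            for i in range(len(chain) - 1)] + [[1.0]]

def masses(Ls, rep):
    vals = [rep(L) for L in Ls]
    return [vals[0]] + [vals[i] - vals[i - 1] for i in range(1, len(vals))]

median   = lambda L: (L[(len(L) - 1) // 2] + L[len(L) // 2]) / 2.0   # average innermost value
midrange = lambda L: (L[0] + L[-1]) / 2.0                            # average outermost value
mean     = lambda L: sum(L) / len(L)

m = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.5}
chain = [frozenset('x'), frozenset('xy'), frozenset('xyz')]
Ls = lists_L(m, 'xyz', chain)
print('co_max (outer)  :', masses(Ls, min))
print('co_con (contour):', masses(Ls, max))
print('L1 barycenter   :', masses(Ls, median))
print('L2              :', masses(Ls, mean))
print('Linf barycenter :', masses(Ls, midrange))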

Interpretation of the list Li Belief functions are defined on a partially ordered


set, the power set 2Θ = {A ⊆ Θ}, of which a maximal chain is a maximal totally
ordered subset. Therefore, given two elements of the chain Ai ⊂ Ai+1 , there are a
number of ‘intermediate’ focal elements A which contain the latter but not the for-
mer. If 2Θ were to be a totally ordered set, the list Li would contain a single element
b(Ai ) and all the Lp approximations (12.51) would reduce to the maximal outer con-
sonant approximation coCmax [b], with b.p.a. mcoCmax [b] (Ai ) = b(Ai ) − b(Ai−1 ). The
diversity of Lp approximations in B is therefore a consequence of belief functions
being defined on partially ordered sets: together with the contour-based approxima-
tion (12.36), they can all be seen as different generalizations of the maximal outer
consonant approximation.
Relations between approximations in B The list Li is composed of 2^{|A^c_{i+1}|} = 2^{n−(i+1)} elements, for all i = 1, ..., n. Obviously |L0| = 1 by definition.
We can therefore infer the following relationships between the various Lp approximations in B:
– the barycenter of the set of L∞ approximations is always the average of the maximal outer and the contour-based approximations;
– ave(Li) = ( max(Li) + min(Li) )/2 = ( int1(Li) + int2(Li) )/2 whenever |Li| ≤ 2, i.e. for i ≥ n − 2 or i = 0; therefore the last two components of the L1-barycenter, L2 and L∞-barycenter approximations coincide, namely:

m_{co^C_{B,L1}[b]}(Ai) = m_{co^C_{B,L2}[b]}(Ai) = m_{co^C_{B,L∞}[b]}(Ai)

for i = n − 1, n;
– in particular, co^C_{B,L1}[b] = co^C_{B,L2}[b] = co^C_{B,L∞}[b] whenever |Θ| = n ≤ 3;
– all the pointwise approximations in (12.51) coincide on the last component:

m_{co^C_max[b]}(An) = m_{co^C_con[b]}(An) = m_{co^C_{B,L1}[b]}(An) = m_{co^C_{B,L2}[b]}(An) = m_{co^C_{B,L∞}[b]}(An) = 1 − b(A_{n−1}).

Admissibility As is clear from the table of Equation (12.51), all the Lp approximations in the belief space are differences of vectors of all positive values; indeed, differences of shifted versions of the same positive vector. As the vectors

[ ( int1(Li) + int2(Li) )/2, i = 1, ..., n ]',   [ ( max(Li) + min(Li) )/2, i = 1, ..., n ]',   [ ave(Li), i = 1, ..., n ]'

are not guaranteed to be monotonically increasing for an arbitrary maximal chain C, none of the related partial approximations are guaranteed to be entirely admissible. However, sufficient conditions under which they are admissible can be worked out by studying the structure of the lists of belief values (12.42).
Let us first consider co_max and co_con. As min(L_{i−1}) = b(A_{i−1}) ≤ b(Ai) = min(Li), the maximal partial outer approximation is admissible for all maximal chains C. As for the contour-based approximation, max(Li) = b(Ai ∪ A^c_{i+1}) = b({x_{i+1}}^c) = 1 − plb(x_{i+1}) while max(L_{i−1}) = 1 − plb(xi), so that max(Li) − max(L_{i−1}) = plb(xi) − plb(x_{i+1}), which is guaranteed to be non-negative if the chain C is generated by singletons sorted by their plausibility values. Thus, as:

m_{co^C_{B,L∞}[b]}(Ai) = ( max(Li) − max(L_{i−1}) )/2 + ( min(Li) − min(L_{i−1}) )/2,

the barycenter of the set of L∞, B approximations is also admissible on the same chain(s).
A similar but more sophisticated sufficient condition holds in the L1 and L2 cases.

Theorem 86. If a maximal chain C is generated by singletons sorted by their values

pl_{A_{i+2}}(xi) = ∑_{B⊆A_{i+2}, B∋xi} mb(B)

(where pl_{A_{i+2}}(xi) measures the plausibility of xi given A_{i+2}), then both the partial L2 consonant approximation and the barycenter of the L1 consonant approximations in the belief space with maximal chain C are admissible.

12.5.5 Graphical comparison in a ternary example


As we did in the mass space case, it can be helpful to visualize the outcomes of
Lp consonant approximation in the belief space when Θ = {x, y, z}, and com-
pare them with approximations in the mass space on the same example of Section
12.4.5 (Figure 12.10). To obtain a homogeneous comparison, we plot both sets of

Fig. 12.10. Comparison between Lp partial consonant approximations in the mass M and
belief B spaces for the belief function with basic probabilities (12.38) on Θ = {x, y, z}. The
L2 , B approximation is plotted as a red square, as the barycenter of both the sets of L1 , B
(blue segment) and L∞ , B (green quadrangle) approximations. Contour-based and maximal
outer approximations are in this example the extreme of the segment L1 , B (blue squares).
The polytope of partial outer consonant approximations (yellow), the isopignistic approxima-
tion (star) and the various Lp partial approximations in M (in gray levels) are also drawn.

approximations in the belief and in the mass space as vectors of mass values. When

Θ = {x, y, z} and A1 = {x}, A2 = {x, y}, A3 = {x, y, z}, the relevant lists of belief values are L1 = {b(x), b(x, z)} and L2 = {b(x, y)}, so that:

min(L1) = int1(L1) = b(x),    max(L1) = int2(L1) = b(x, z),    ave(L1) = ( b(x) + b(x, z) )/2;
min(L2) = int1(L2) = max(L2) = int2(L2) = ave(L2) = b(x, y).

Therefore, the set of L1 partial consonant approximations is, by Equation (12.41), a segment Cl(m^1_{L1}, m^2_{L1}), with vertices:

m^1_{L1} = [ b(x), b(x, y) − b(x), 1 − b(x, y) ]',
m^2_{L1} = [ b(x, z), b(x, y) − b(x, z), 1 − b(x, y) ]'    (12.52)

(see Figure 12.10). Note that this set is not entirely admissible, not even in this ternary example.
The partial L2 approximation in B is, by (12.51), unique, with mass vector:

m_{co_{B,L2}[b]} = m_{co_{B,L∞}[b]} = [ ( b(x) + b(x, z) )/2,   b(x, y) − ( b(x) + b(x, z) )/2,   1 − b(x, y) ]',    (12.53)

and coincides with the barycenter of the set of partial L∞ approximations (note that this is not so in the general case).
As for the full set of partial L∞ approximations, this has vertices (12.49):

[ ( b(x) + b(x, z) )/2 − b(y, z),   b(x, y) − ( b(x) + b(x, z) )/2,              1 − b(x, y) + b(y, z) ]';
[ ( b(x) + b(x, z) )/2 − b(y, z),   b(x, y) − ( b(x) + b(x, z) )/2 + 2b(y, z),   1 − b(x, y) − b(y, z) ]';
[ ( b(x) + b(x, z) )/2 + b(y, z),   b(x, y) − ( b(x) + b(x, z) )/2 − 2b(y, z),   1 − b(x, y) + b(y, z) ]';
[ ( b(x) + b(x, z) )/2 + b(y, z),   b(x, y) − ( b(x) + b(x, z) )/2,              1 − b(x, y) − b(y, z) ]',

which, as expected, are not all admissible (see Figure 12.10 again).
The example hints at the possibility that the contour-based approximation and/or
the L2 , L∞ barycenter approximations in the belief space be related to the set of L1
approximations in the full mass space: this deserves further analysis. On the other
hand, we know that the maximal partial outer approximation (12.2) is not in general
a vertex of the polygon of L1 partial approximations in B, unlike what the ternary
example (for which int1 (L1 ) = b(x)) suggests.

12.5.6 Some conclusions

Belief versus mass space approximations By comparing the results of Section


12.4 and Section 12.5 we can draw a number of conclusions:

– Lp consonant approximation in the mass space is basically associated with differ-


ent but related mass redistribution processes: the mass outside the desired chain
of focal elements is re-assigned in some way to the elements of the chain;
– their relationships with classical outer approximations (on one hand) and approx-
imations based on the pignistic transform (on the other) are rather weak;
– the various Lp approximations in M are characterized by natural geometric rela-
tions;
– consonant approximation in the belief space is inherently linked to the lists of
belief values of focal elements ‘intermediate’ between each pair of elements of
the desired chain;
– the classical outer consonant approximations and contour-based approximations
are also approximations of the same type - indeed, they can all be seen as different
generalizations of the maximal outer approximation, induced by the nature of
partially ordered set of the power set;
– in the mass space, some partial approximations are always entirely admissible
and should be preferred (this is the case for the L1 and L2 approximations in M),
some others are not;
– as for the belief case, even though all partial Lp approximations are differences
between shifted versions of the same positive vector, admissibility is not guaran-
teed for all maximal chains; however, they are admissible for chains generated by
singletons sorted by their plausibility (or partial plausibility) values.

Approximation in the mass space and general imaging in belief revision As


it is the case for geometric conditioning in the mass space [256], results of con-
sonant approximations in the mass space can be interpreted as a generalization of
Lewis’ imaging approach to belief revision, originally formulated in the context of
probabilities [841]. The idea behind imaging is that, upon observing that some state
x ∈ Θ is impossible, you transfer the probability initially assigned to x completely
towards the remaining state you deem the most similar to x [1032].
Peter Gärdenfors [502] extended Lewis' idea by allowing a fraction λi of the probability of such a state x to be re-distributed to each remaining state xi (∑_i λi = 1).
In the case of partial consonant approximation of belief functions, the mass m(B) of each focal element not in the desired maximal chain C should be re-assigned to the 'closest' focal element $A \in C$ in the chain. If no information on the similarity between focal elements is available, or none makes sense in a particular context, this ignorance translates into allowing all possible sets of weights $\lambda(A)$, $A \in C$, for Gärdenfors' (generalized) belief revision by imaging. This yields the set of partial $L_1$ consonant approximations in $\mathcal{M}$. If such ignorance is instead expressed by assigning equal weight $\lambda(A)$ to each $A \in C$, the resulting partial consonant approximation is the unique partial $L_2$ approximation, the barycenter of the polytope of $L_1$ partial approximations.
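The following minimal Python sketch (our own illustration, not the author's code; the dictionary-based mass representation and the function name are assumptions) makes this reading concrete: it redistributes the mass of the focal elements outside a desired chain according to a given set of imaging weights λ(A), and recovers the barycentric L2 solution when the weights are uniform.

# Illustrative sketch: Gardenfors-style generalised imaging onto a chain of
# focal elements. Masses are dictionaries mapping frozensets to floats.
def image_onto_chain(m_b, chain, weights):
    """Reassign the mass of every focal element outside `chain` to the
    elements of the chain according to `weights` (which must sum to 1)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    m_co = {A: m_b.get(A, 0.0) for A in chain}
    outside = sum(v for B, v in m_b.items() if B not in chain)
    for A in chain:
        m_co[A] += weights[A] * outside
    return m_co

# Example on Theta = {x, y, z} with chain {x} < {x,y} < Theta
x, xy, xyz = frozenset('x'), frozenset('xy'), frozenset('xyz')
m_b = {x: 0.2, frozenset('y'): 0.3, xy: 0.1, frozenset('yz'): 0.4}
chain = [x, xy, xyz]
uniform = {A: 1.0 / len(chain) for A in chain}   # equal weights: L2 barycenter
print(image_onto_chain(m_b, chain, uniform))

With non-uniform weight vectors the same routine sweeps the whole polytope of partial L1 consonant approximations in the mass space.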

Three families of consonant approximations As we pointed out above, $L_p$ consonant approximation in the belief space amounts instead to generalizing, in different but related ways, the classical approach incarnated by the maximal outer approximation (12.2). The latter, together with the contour-based approximation (12.36), therefore forms a different, coherent family of consonant approximations.
As for the isopignistic approximation, it seems to be completely unrelated to approximations in either the mass or the belief space, as it naturally fits in the context of the Transferable Belief Model and the use of the pignistic function. It will be interesting, in this respect, to study the properties of geometric consonant approximations with respect to other major probability transforms, such as the orthogonal projection, the intersection probability, and the relative plausibility and belief of singletons (since the latter seem to be related to the plausibilities of the singletons).
Isopignistic, mass-space and belief-space consonant approximations form three dis-
tinct families of approximations, with fundamentally different rationales: which ap-
proach to use will therefore vary according to the chosen framework, and the prob-
lem at hand.

Appendix
Proof of Lemma 15

(1) Sufficiency. If Equation (12.5) holds for all focal elements A ⊆ Θ then:
\[
co(A) = \sum_{X\subseteq A} m_{co}(X) = \sum_{X\subseteq A}\sum_{B\subseteq X} \alpha^B_X m_b(B) = \sum_{B\subseteq A} m_b(B) \sum_{B\subseteq X\subseteq A} \alpha^B_X,
\]
where, by Condition (12.4), $\sum_{B\subseteq X\subseteq A} \alpha^B_X \le \sum_{X\supseteq B} \alpha^B_X = 1$. Therefore:
\[
co(A) \le \sum_{B\subseteq A} m_b(B) = b(A),
\]

i.e., co is weakly included in b.


(2) Necessity. Let us denote by $C = \{B_1, \dots, B_n\}$, $n = |\Theta|$, the chain of focal elements of co, and consider first the subsets $A \subseteq \Theta$ such that $A \not\supset B_1$ ($A \notin C$). In this case $co(A) = 0 \le m_b(A)$ whatever the mass assignment of b: we then just need to focus on the elements $A = B_i \in C$ of the chain.
We need to prove that for all Bi ∈ C:
\[
m_{co}(B_i) = \sum_{B\subseteq B_i} \alpha^B_{B_i} m_b(B) \qquad \forall i = 1,\dots,n. \tag{12.54}
\]

Let us introduce the notation $\alpha^B_i \doteq \alpha^B_{B_i}$ for the sake of simplicity. For each i we can sum up the first i equations of system (12.54) and obtain the equivalent system of equations:
\[
co(B_i) = \sum_{B\subseteq B_i} \beta^B_i m_b(B) \qquad \forall i = 1,\dots,n, \tag{12.55}
\]
as $co(B_i) = \sum_{j=1}^{i} m_{co}(B_j)$, for co is consonant. For all $B \subseteq \Theta$ the coefficients $\beta^B_i \doteq \sum_{j=1}^{i} \alpha^B_j$ need to satisfy:
\[
0 \le \beta^B_{i_{min}} \le \dots \le \beta^B_n = 1, \tag{12.56}
\]
where $i_{min} = \min\{j : B_j \supseteq B\}$.
We can prove by induction on i that if co is weakly included in b, i.e., co(Bi ) ≤
b(Bi ) for all i = 1, ..., n, then there exists a solution {βiB , B ⊆ Bi , i = 1, ..., n} to
system (12.55) which meets the constraint (12.56).
Let us look for solutions of the form:
\[
\beta^B_i \;\text{free},\;\; B \subseteq B_{i-1}; \qquad
\beta^B_i = \frac{co(B_i) - \sum_{X\subseteq B_{i-1}} \beta^X_i m_b(X)}{\sum_{X\subseteq B_i,\, X\not\subset B_{i-1}} m_b(X)}, \;\; B \subseteq B_i,\, B \not\subset B_{i-1}, \tag{12.57}
\]
in which the coefficients (variables) $\beta^B_i$ associated with subsets of the previous focal element $B_{i-1}$ are left unconstrained, while all the coefficients associated with subsets that are in $B_i$ but not in $B_{i-1}$ are set to a common value, which depends on the free variables $\beta^X_i$, $X \subseteq B_{i-1}$.
Step i = 1. We get:
\[
\beta^{B_1}_1 = \frac{co(B_1)}{m_b(B_1)},
\]
which is such that $0 \le \beta^{B_1}_1 \le 1$ as $co(B_1) \le m_b(B_1)$, and trivially satisfies the first equation of system (12.55): $co(B_1) = \beta^{B_1}_1 m_b(B_1)$.
Step i. Suppose there exists a solution (12.57) for $\{B \subseteq B_j,\, j = 1, \dots, i-1\}$. We first have to show that all solutions of the form (12.57) for i solve the i-th equation of system (12.55). When we replace (12.57) into the i-th equation of (12.55) we get (as the variables $\beta^B_i$ in (12.57) do not depend on B for all $B \subseteq B_i$, $B \not\subset B_{i-1}$):
\[
co(B_i) = \sum_{B\subseteq B_{i-1}} \beta^B_i m_b(B) + \frac{co(B_i) - \sum_{X\subseteq B_{i-1}} \beta^X_i m_b(X)}{\sum_{X\subseteq B_i,\, X\not\subset B_{i-1}} m_b(X)} \sum_{B\subseteq B_i,\, B\not\subset B_{i-1}} m_b(B),
\]
i.e., $co(B_i) = co(B_i)$ and the equation is met.


We also need to show, though, that there exist solutions of the above form (12.57) that meet the ordering constraint (12.56), i.e.,
\[
0 \le \beta^B_i \le 1, \;\; B \subseteq B_i,\, B \not\subset B_{i-1}; \qquad \beta^B_{i-1} \le \beta^B_i \le 1, \;\; B \subseteq B_{i-1}. \tag{12.58}
\]
The constraints (12.58) generate constraints on the free variables in (12.57), i.e., $\{\beta^B_i,\, B \subseteq B_{i-1}\}$. Given the shape of (12.57), those conditions (in the same order as in (12.58)) assume the form:
\[
\begin{cases}
\displaystyle\sum_{B\subseteq B_{i-1}} \beta^B_i m_b(B) \le co(B_i) \\[6pt]
\displaystyle\sum_{B\subseteq B_{i-1}} \beta^B_i m_b(B) \ge co(B_i) - \sum_{B\subseteq B_i,\, B\not\subset B_{i-1}} m_b(B) \\[6pt]
\beta^B_i \ge \beta^B_{i-1} \quad \forall B \subseteq B_{i-1} \\[3pt]
\beta^B_i \le 1 \quad \forall B \subseteq B_{i-1}.
\end{cases} \tag{12.59}
\]
Let us call 1., 2., 3., 4. the above constraints on the free variables $\{\beta^B_i,\, B \subseteq B_{i-1}\}$.
– 1. and 2. are trivially compatible;
– 1. is compatible with 3., as replacing $\beta^B_i = \beta^B_{i-1}$ into 1. yields (due to the $(i-1)$-th equation of the system):
\[
\sum_{B\subseteq B_{i-1}} \beta^B_i m_b(B) = \sum_{B\subseteq B_{i-1}} \beta^B_{i-1} m_b(B) = co(B_{i-1}) \le co(B_i);
\]

– 4. is compatible with 2., as replacing $\beta^B_i = 1$ into 2. yields:
\[
\sum_{B\subseteq B_{i-1}} \beta^B_i m_b(B) = \sum_{B\subseteq B_{i-1}} m_b(B) \ge co(B_i) - \sum_{B\subseteq B_i,\, B\not\subset B_{i-1}} m_b(B),
\]
which is equivalent to:
\[
\sum_{B\subseteq B_i} m_b(B) = b(B_i) \ge co(B_i),
\]
which is in turn true by hypothesis (co is weakly included in b);


– 4. and 1. are clearly compatible, as we just need to choose βiB small enough;
– 2. and 3. are compatible, as we just need to choose βiB large enough.
In conclusion, all the constraints in Equation (12.59) are mutually compatible.
Hence there exists an admissible solution to the i-th equation of system (12.55),
which proves the induction step.

Proof of Theorem 72

We need to prove that:


1. each co.b.f. $co \in \mathcal{CO}^C$ such that $co(A) \le b(A)$ for all $A \subseteq \Theta$ can be written as a convex combination of the points (12.6):
\[
co = \sum_{\mathcal{B}} \alpha_{\mathcal{B}}\, o_{\mathcal{B}}[b], \qquad \sum_{\mathcal{B}} \alpha_{\mathcal{B}} = 1, \quad \alpha_{\mathcal{B}} \ge 0 \;\; \forall \mathcal{B};
\]
2. vice versa, each convex combination of the $o_{\mathcal{B}}[b]$ satisfies $\sum_{\mathcal{B}} \alpha_{\mathcal{B}}\, o_{\mathcal{B}}[b](A) \le b(A)$ for all $A \subseteq \Theta$.

Let us consider 2. first. By definition of belief function
\[
o_{\mathcal{B}}[b](A) = \sum_{B\subseteq A,\, B\in C} m_{o_{\mathcal{B}}[b]}(B),
\]
where $m_{o_{\mathcal{B}}[b]}(B) = \sum_{X\subseteq B :\, \mathcal{B}(X)=B} m_b(X)$. Therefore:
\[
o_{\mathcal{B}}[b](A) = \sum_{B\subseteq A,\, B\in C} \;\sum_{X\subseteq B :\, \mathcal{B}(X)=B} m_b(X) = \sum_{X\subseteq B_i :\, \mathcal{B}(X)=B_j,\, j\le i} m_b(X), \tag{12.60}
\]
where $B_i$ is the largest element of the chain C included in A. Since $B_i \subseteq A$, the quantity (12.60) is obviously no greater than $\sum_{B\subseteq A} m_b(B) = b(A)$. Hence:
\[
\sum_{\mathcal{B}} \alpha_{\mathcal{B}}\, o_{\mathcal{B}}[b](A) \le \sum_{\mathcal{B}} \alpha_{\mathcal{B}}\, b(A) = b(A) \sum_{\mathcal{B}} \alpha_{\mathcal{B}} = b(A) \qquad \forall A \subseteq \Theta.
\]

Let us now prove point 1. According to Lemma 15, if $co(A) \le b(A)$ for all $A \subseteq \Theta$, then the mass $m_{co}(B_i)$ of each event $B_i$ of the chain is:
\[
m_{co}(B_i) = \sum_{A\subseteq B_i} m_b(A)\, \alpha^A_{B_i}. \tag{12.61}
\]

We then need to write (12.61) as a convex combination of the $m_{o_{\mathcal{B}}[b]}(B_i)$, i.e.:
\[
\sum_{\mathcal{B}} \alpha_{\mathcal{B}}\, o_{\mathcal{B}}[b](B_i) = \sum_{\mathcal{B}} \alpha_{\mathcal{B}} \sum_{X\subseteq B_i :\, \mathcal{B}(X)=B_i} m_b(X) = \sum_{X\subseteq B_i} m_b(X) \sum_{\mathcal{B}(X)=B_i} \alpha_{\mathcal{B}}.
\]
In other words, we need to show that the system of equations
\[
\alpha^A_{B_i} = \sum_{\mathcal{B}(A)=B_i} \alpha_{\mathcal{B}} \qquad \forall i = 1,\dots,n;\;\; \forall A \subseteq B_i \tag{12.62}
\]
has at least one solution $\{\alpha_{\mathcal{B}}\}$ such that $\sum_{\mathcal{B}} \alpha_{\mathcal{B}} = 1$ and $\alpha_{\mathcal{B}} \ge 0$ for all $\mathcal{B}$. The normalization constraint is in fact trivially satisfied, as from (12.62) it follows that
\[
\sum_{B_i\supseteq A} \alpha^A_{B_i} = 1 = \sum_{B_i\supseteq A} \sum_{\mathcal{B}(A)=B_i} \alpha_{\mathcal{B}} = \sum_{\mathcal{B}} \alpha_{\mathcal{B}}.
\]

Using the normalization constraint, the system of equations (12.62) reduces to:
\[
\alpha^A_{B_i} = \sum_{\mathcal{B}(A)=B_i} \alpha_{\mathcal{B}} \qquad \forall i = 1,\dots,n-1;\;\; \forall A \subseteq B_i. \tag{12.63}
\]

We can show that each equation in the reduced system (12.63) involves at least one variable $\alpha_{\mathcal{B}}$ which is not present in any other equation. Formally, the set of assignment functions which meet the constraint of equation $(A, B_i)$ but not all the others is not empty:
\[
\Big\{ \mathcal{B} : (\mathcal{B}(A) = B_i) \wedge \bigwedge_{\substack{j=1,\dots,n-1 \\ j\ne i}} (\mathcal{B}(A) \ne B_j) \wedge \bigwedge_{\substack{A'\ne A \\ j=1,\dots,n-1}} (\mathcal{B}(A') \ne B_j) \Big\} \ne \emptyset. \tag{12.64}
\]
But the assignment functions $\mathcal{B}$ such that $\mathcal{B}(A) = B_i$ and $\mathcal{B}(A') = \Theta$ for all $A' \ne A$ all meet condition (12.64). Indeed, they obviously satisfy $\mathcal{B}(A) \ne B_j$ for all $j \ne i$, while clearly $\mathcal{B}(A') = \Theta \ne B_j$ for all $A' \subseteq \Theta$, as $j < n$ so that $B_j \ne \Theta$.
A non-negative solution of (12.63) (and hence of (12.62)) can be obtained by setting, for each equation, one of the variables $\alpha_{\mathcal{B}}$ equal to the left-hand side $\alpha^A_{B_i}$, and all the others to zero.

Proof of Theorem 73

The proof is divided into two parts.

1. We first need to find an assignment $\mathcal{B} : 2^\Theta \to C_\rho$ which generates $co_\rho$. Each singleton $x_i$ is mapped by $\rho$ to the position j: $i = \rho(j)$. Then, given any event $A = \{x_{i_1}, \dots, x_{i_m}\}$, its elements are mapped to the new positions $j_{i_1}, \dots, j_{i_m}$, where $i_1 = \rho(j_{i_1}), \dots, i_m = \rho(j_{i_m})$. But then the map
\[
\mathcal{B}_\rho(A) \doteq \mathcal{B}_\rho(\{x_{i_1}, \dots, x_{i_m}\}) = S_j^\rho = \{x_{\rho(1)}, \dots, x_{\rho(j)}\}, \qquad j \doteq \max\{j_{i_1}, \dots, j_{i_m}\},
\]
maps each event A to the smallest $S_i^\rho$ in the chain which contains A: $j = \min\{i : A \subseteq S_i^\rho\}$. Therefore it generates a co.b.f. with b.p.a. (12.2), i.e., $co_\rho$.
2. In order for $co_\rho$ to be an actual vertex, we need to ensure that it cannot be written as a convex combination of the other (pseudo-)vertices $o_{\mathcal{B}}[b]$:
\[
co_\rho = \sum_{\mathcal{B}\ne\mathcal{B}_\rho} \alpha_{\mathcal{B}}\, o_{\mathcal{B}}[b], \qquad \sum_{\mathcal{B}\ne\mathcal{B}_\rho} \alpha_{\mathcal{B}} = 1, \quad \alpha_{\mathcal{B}} \ge 0 \;\; \forall \mathcal{B}\ne\mathcal{B}_\rho.
\]
As $m_{o_{\mathcal{B}}}(B_i) = \sum_{A :\, \mathcal{B}(A)=B_i} m_b(A)$, the above condition reads as:
\[
\sum_{A\subseteq B_i} m_b(A) \Big( \sum_{\mathcal{B} :\, \mathcal{B}(A)=B_i} \alpha_{\mathcal{B}} \Big) = \sum_{A\subseteq B_i :\, \mathcal{B}_\rho(A)=B_i} m_b(A) \qquad \forall B_i \in C.
\]

Remembering that $\mathcal{B}_\rho(A) = B_i$ iff $A \subseteq B_i$, $A \not\subset B_{i-1}$, we get:
\[
\sum_{A\subseteq B_i} m_b(A) \Big( \sum_{\mathcal{B} :\, \mathcal{B}(A)=B_i} \alpha_{\mathcal{B}} \Big) = \sum_{A\subseteq B_i,\, A\not\subset B_{i-1}} m_b(A) \qquad \forall B_i \in C.
\]
For $i = 1$ the condition is $m_b(B_1) \big( \sum_{\mathcal{B} :\, \mathcal{B}(B_1)=B_1} \alpha_{\mathcal{B}} \big) = m_b(B_1)$, namely:
\[
\sum_{\mathcal{B} :\, \mathcal{B}(B_1)=B_1} \alpha_{\mathcal{B}} = 1, \qquad \sum_{\mathcal{B} :\, \mathcal{B}(B_1)\ne B_1} \alpha_{\mathcal{B}} = 0.
\]
Replacing the above equalities into the second constraint ($i = 2$) yields:
\[
m_b(B_2\setminus B_1) \Big( \sum_{\substack{\mathcal{B} :\, \mathcal{B}(B_1)=B_1 \\ \mathcal{B}(B_2\setminus B_1)\ne B_2}} \alpha_{\mathcal{B}} \Big) + m_b(B_2) \Big( \sum_{\substack{\mathcal{B} :\, \mathcal{B}(B_1)=B_1 \\ \mathcal{B}(B_2)\ne B_2}} \alpha_{\mathcal{B}} \Big) = 0,
\]
which implies $\alpha_{\mathcal{B}} = 0$ for all the assignment functions $\mathcal{B}$ such that $\mathcal{B}(B_2\setminus B_1) \ne B_2$ or $\mathcal{B}(B_2) \ne B_2$. The only non-zero coefficients can then be the $\alpha_{\mathcal{B}}$ such that $\mathcal{B}(B_1) = B_1$, $\mathcal{B}(B_2\setminus B_1) = B_2$, $\mathcal{B}(B_2) = B_2$.
By induction we get that $\alpha_{\mathcal{B}} = 0$ for all $\mathcal{B} \ne \mathcal{B}_\rho$.

Proof of Theorem 74

Let us denote as usual by $\{B_1, \dots, B_n\}$ the elements of the maximal chain C. By definition, the masses $co_\rho$ assigns to the elements of the chain are:
\[
m_{co_\rho}(B_i) = \sum_{B\subseteq B_i,\, B\not\subset B_{i-1}} m_b(B),
\]
so that the belief value of $co_\rho$ on an arbitrary event $A \subseteq \Theta$ can be written as:
\[
co_\rho(A) = \sum_{B_i\subseteq A,\, B_i\in C} m_{co_\rho}(B_i) = \sum_{B_i\subseteq A} \;\sum_{B\subseteq B_i,\, B\not\subset B_{i-1}} m_b(B) = \sum_{B\subseteq B_{i_A}} m_b(B) = b(B_{i_A}),
\]
where $B_{i_A}$ is the largest element of the chain included in A. But then, as the elements $B_1 \subset \dots \subset B_n$ of the chain are nested and any belief function b is monotone:
\[
co_\rho(A) = b(B_{i_A}) = \max_{B_i\in C,\, B_i\subseteq A} b(B_i),
\]
i.e., $co_\rho$ is indeed (12.9).

Proof of Theorem 75

$R^{N-1}$ representation The $L_1$ norm of the difference vector (12.20) is:
\[
\|m_b - m_{co}\|_{L_1} = \sum_{A\in C} |m_b(A) - m_{co}(A)| + \sum_{B\notin C} m_b(B) = \sum_{A\in C} |\beta(A)| + \sum_{B\notin C} m_b(B),
\]
as a function of the variables $\{\beta(A) \doteq m_b(A) - m_{co}(A),\; A \in C,\, A \ne \Theta\}$. Since
\[
\sum_{A\in C} \beta(A) = \sum_{A\in C} \big( m_b(A) - m_{co}(A) \big) = \sum_{A\in C} m_b(A) - 1 = -\sum_{B\notin C} m_b(B),
\]
we have that $\beta(\Theta) = -\sum_{B\notin C} m_b(B) - \sum_{A\in C,\, A\ne\Theta} \beta(A)$. Therefore, the above norm reads as:
\[
\|m_b - m_{co}\|_{L_1} = \Big| -\sum_{B\notin C} m_b(B) - \sum_{A\in C,\, A\ne\Theta} \beta(A) \Big| + \sum_{A\in C,\, A\ne\Theta} |\beta(A)| + \sum_{B\notin C} m_b(B). \tag{12.65}
\]

The norm (12.65) is a function of the form:
\[
\sum_i |x_i| + \Big| -\sum_i x_i - k \Big|, \qquad k \ge 0, \tag{12.66}
\]
which has an entire simplex of minima, namely $x_i \le 0$ for all i, $\sum_i x_i \ge -k$. See Figure 12.11 for the case of two variables, $x_1$ and $x_2$ (i.e., the case of a maximal chain of just three elements, $|\Theta| = n = 3$). The minima of the $L_1$ norm (12.65) are therefore

Fig. 12.11. The minima of a function of the form (12.66) with two variables x1 , x2 form the
triangle x1 ≤ 0, x2 ≤ 0, x1 + x2 ≥ −k.

the solutions to the following system of constraints:
\[
\begin{cases}
\beta(A) \le 0 & \forall A \in C,\, A \ne \Theta, \\[4pt]
\displaystyle\sum_{A\in C,\, A\ne\Theta} \beta(A) \ge -\sum_{B\notin C} m_b(B). 
\end{cases} \tag{12.67}
\]

This reads, in terms of the mass assignment $m_{co}$ of the desired consonant approximation, as:
\[
\begin{cases}
m_{co}(A) \ge m_b(A) & \forall A \in C,\, A \ne \Theta, \\[4pt]
\displaystyle\sum_{A\in C,\, A\ne\Theta} \big( m_b(A) - m_{co}(A) \big) \ge -\sum_{B\notin C} m_b(B). 
\end{cases} \tag{12.68}
\]

Note that the last constraint reduces to:
\[
\sum_{A\in C,\, A\ne\Theta} m_b(A) - 1 + m_{co}(\Theta) \ge \sum_{A\in C} m_b(A) - 1,
\]
i.e., $m_{co}(\Theta) \ge m_b(\Theta)$. Therefore the partial $L_1$ approximations in $\mathcal{M}$ are those consonant b.f.s co such that $m_{co}(A) \ge m_b(A)$ for all $A \in C$. The vertices of the set of partial approximations (12.67) (see Figure 12.11) are given by the vectors of variables $\{\beta_{\bar A},\, \bar A \in C\}$ such that $\beta_{\bar A}(\bar A) = -\sum_{B\notin C} m_b(B)$ and $\beta_{\bar A}(A) = 0$ for $A \ne \bar A$ whenever $\bar A \ne \Theta$, while $\beta_\Theta = 0$. Immediately, in terms of masses, the vertices of the set of partial $L_1$ approximations have b.p.a. (12.24) and barycenter (12.25).
To find the global $L_1$ consonant approximation(s) over the whole consonant complex, we need to locate the component $\mathcal{CO}^C_{\mathcal{M}}$ at minimal $L_1$ distance from $m_b$. All the partial approximations (12.68) onto $\mathcal{CO}^C_{\mathcal{M}}$ have $L_1$ distance from $m_b$ equal to $2\sum_{B\notin C} m_b(B)$. Therefore, the minimal-distance component(s) of the complex are those whose maximal chains originally have maximal mass with respect to $m_b$.

$R^{N-2}$ representation Consider now the difference vector (12.21). Its $L_1$ norm is:
\[
\|m_b - m_{co}\|_{L_1} = \sum_{A\in C,\, A\ne\bar A} |m_b(A) - m_{co}(A)| + \sum_{B\notin C} m_b(B),
\]
which is obviously minimized by $m_b(A) = m_{co}(A)$ for all $A \in C$, $A \ne \bar A$, i.e., (12.24). Such a (unique) partial approximation onto $\mathcal{CO}^C_{\mathcal{M}}$ has $L_1$ distance from $m_b$ given by $\sum_{B\notin C} m_b(B)$. Therefore, the minimal-distance component(s) of the consonant complex are once again those associated with the maximal chains:
\[
\arg\min_C \sum_{B\notin C} m_b(B) = \arg\max_C \sum_{A\in C} m_b(A).
\]

Proof of Theorem 76

As the generators of $\mathcal{CO}^C_{\mathcal{M}}$ are the vectors in $\mathcal{M}$: $\{m_A - m_\Theta,\, A \in C,\, A \ne \Theta\}$, we need to impose:
\[
\langle m_b - m_{co}, m_A - m_\Theta \rangle = 0
\]
for all $A \in C$, $A \ne \Theta$.

$R^{N-1}$ representation In the complete mass space the vector $m_A - m_\Theta$ is such that $(m_A - m_\Theta)(B) = 1$ if $B = A$, $-1$ if $B = \Theta$, and $0$ if $B \ne A, \Theta$. Hence, the orthogonality condition becomes $\beta(A) - \beta(\Theta) = 0$ for all $A \in C$, $A \ne \Theta$, where again $\beta(A) = m_b(A) - m_{co}(A)$. Since
\[
\beta(\Theta) = -\sum_{B\notin C} m_b(B) - \sum_{A\in C,\, A\ne\Theta} \beta(A)
\]
(see the proof of Theorem 75), the orthogonality condition becomes:
\[
2\beta(A) + \sum_{B\notin C} m_b(B) + \sum_{B\in C,\, B\ne A,\Theta} \beta(B) = 0
\]
for all $A \in C$, $A \ne \Theta$. Its solution is clearly:
\[
\beta(A) = \frac{-\sum_{B\notin C} m_b(B)}{n} \qquad \forall A \in C,\, A \ne \Theta,
\]
as by substitution:
\[
-\frac{2}{n}\sum_{B\notin C} m_b(B) + \sum_{B\notin C} m_b(B) - \frac{n-2}{n}\sum_{B\notin C} m_b(B) = 0;
\]
we obtain (12.25).
To find the global $L_2$ approximation(s), we need to compute the $L_2$ distance of $m_b$ from the closest such partial solution. We have:
\[
\|m_b - m_{co}\|^2_{L_2} = \sum_{A\subseteq\Theta} \big( m_b(A) - m_{co}(A) \big)^2 = \sum_{A\in C} \frac{\big( \sum_{B\notin C} m_b(B) \big)^2}{n^2} + \sum_{B\notin C} \big( m_b(B) \big)^2 = \frac{\big( \sum_{B\notin C} m_b(B) \big)^2}{n} + \sum_{B\notin C} \big( m_b(B) \big)^2,
\]
which is minimized by the component $\mathcal{CO}^C_{\mathcal{M}}$ that minimizes $\sum_{B\notin C} \big( m_b(B) \big)^2$.

$R^{N-2}$ representation In the case of a section of the mass space with missing component $\bar A \in C$, as there $m_{\bar A} = 0$, the orthogonality condition reads as:
\[
\langle m_b - m_{co}, m_A \rangle = \beta(A) = 0 \qquad \forall A \in C,\, A \ne \Theta,
\]
i.e., $\beta(A) = 0$ for all $A \in C$, $A \ne \bar A$, and we get (12.24) once again. The optimal distance is, in this case:
\[
\|m_b - m_{co}\|^2_{L_2} = \sum_{A\subseteq\Theta} \big(\beta(A)\big)^2 = \sum_{B\notin C} \big( m_b(B) \big)^2 + \big(\beta(\bar A)\big)^2 = \sum_{B\notin C} \big( m_b(B) \big)^2 + \Big( \sum_{B\notin C} m_b(B) \Big)^2,
\]
which is once again minimized by the maximal chain(s) $\arg\min_C \sum_{B\notin C} \big( m_b(B) \big)^2$.

Proof of Theorem 77

$R^{N-1}$ representation In the complete mass space, the $L_\infty$ norm of the difference vector is:
\[
\|m_b - m_{co}\|_{L_\infty} = \max\Big\{ \max_{A\in C} |\beta(A)|,\; \max_{B\notin C} m_b(B) \Big\}.
\]
As $\beta(\Theta) = \sum_{B\in C} m_b(B) - 1 - \sum_{B\in C,\, B\ne\Theta} \beta(B)$, we have that
\[
|\beta(\Theta)| = \Big| \sum_{B\notin C} m_b(B) + \sum_{B\in C,\, B\ne\Theta} \beta(B) \Big|
\]
and the norm to minimize becomes:
\[
\|m_b - m_{co}\|_{L_\infty} = \max\Big\{ \max_{A\in C,\, A\ne\Theta} |\beta(A)|,\; \Big| \sum_{B\notin C} m_b(B) + \sum_{B\in C,\, B\ne\Theta} \beta(B) \Big|,\; \max_{B\notin C} m_b(B) \Big\}. \tag{12.69}
\]
This is a function of the form
\[
\max\big\{ |x_1|, |x_2|, |x_1 + x_2 + k_1|, k_2 \big\}, \qquad 0 \le k_2 \le k_1 \le 1. \tag{12.70}
\]

Such a function has two possible behaviors in terms of its minimal points in the plane $(x_1, x_2)$.
Case 1. If $k_1 \le 3k_2$, its contour function has the form rendered in Figure 12.12 (left). The set of minimal points is given by $x_i \ge -k_2$, $x_1 + x_2 \le k_2 - k_1$. In the general case of an arbitrary number $m-1$ of variables $x_1, \dots, x_{m-1}$ such that $x_i \ge -k_2$, $\sum_i x_i \le k_2 - k_1$, the set of minimal points is a simplex with m vertices: each vertex $v^i$ is such that
\[
v^i(j) = -k_2 \;\; \forall j \ne i; \qquad v^i(i) = -k_1 + (m-1)k_2
\]
(obviously $v^m = [-k_2, \dots, -k_2]'$).


Concerning (12.69), in the first case ($\max_{B\notin C} m_b(B) \ge \frac{1}{n}\sum_{B\notin C} m_b(B)$) the set of partial $L_\infty$ approximations is given by the following system of inequalities:
\[
\begin{cases}
\beta(A) \ge -\max_{B\notin C} m_b(B) & A \in C,\, A \ne \Theta, \\[4pt]
\displaystyle\sum_{B\in C,\, B\ne\Theta} \beta(B) \le \max_{B\notin C} m_b(B) - \sum_{B\notin C} m_b(B). 
\end{cases}
\]
This determines a simplex of solutions $Cl(m^{\bar A}_{L_\infty}[m_b],\, \bar A \in C)$ with vertices:
\[
m^{\bar A}_{L_\infty}[m_b] : \quad
\begin{cases}
\beta_{\bar A}(A) = -\max_{B\notin C} m_b(B) & A \in C,\, A \ne \bar A, \\[4pt]
\beta_{\bar A}(\bar A) = -\displaystyle\sum_{B\notin C} m_b(B) + (n-1)\max_{B\notin C} m_b(B), 
\end{cases}
\]
or, in terms of their b.p.a.s, (12.26). Its barycenter has mass assignment:
\[
\frac{\sum_{\bar A\in C} m^{\bar A}_{L_\infty}[m_b](A)}{n} = \frac{n\, m_b(A) + \sum_{B\notin C} m_b(B)}{n} = m_b(A) + \frac{\sum_{B\notin C} m_b(B)}{n},
\]
for all $A \in C$, i.e., the $L_2$ partial approximation (12.25). The corresponding minimal $L_\infty$ norm of the difference vector is, according to (12.69), equal to $\max_{B\notin C} m_b(B)$.

Fig. 12.12. Left: contour function (level sets) and minimal points (white triangle) of a func-
tion of the form (12.70), when k1 ≤ 3k2 . In the example k2 = 0.4 and k1 = 0.5. Right:
contour function and minimal point of a function of the form (12.70), when k1 > 3k2 . In this
example k2 = 0.1 and k1 = 0.5.

Case 2. In the second case, $k_1 > 3k_2$, i.e., for the norm (12.69),
\[
\max_{B\notin C} m_b(B) < \frac{1}{n}\sum_{B\notin C} m_b(B),
\]
the contour function of (12.70) is as in Figure 12.12 (right). There is a single minimal point, located at $[-k_1/3, -k_1/3]'$. For an arbitrary number $m-1$ of variables the minimal point is
\[
[-k_1/m, \dots, -k_1/m]',
\]
i.e., for the system (12.69):
\[
\beta(A) = -\frac{1}{n}\sum_{B\notin C} m_b(B) \qquad \forall A \in C,\, A \ne \Theta.
\]
In terms of basic probability assignments, this yields (12.25) (the mass of $\Theta$ is obtained by normalization). The corresponding minimal $L_\infty$ norm of the difference vector is $\frac{1}{n}\sum_{B\notin C} m_b(B)$.

$R^{N-2}$ representation In the section of the mass space with missing component $\bar A \in C$, the $L_\infty$ norm of the difference vector (12.21) is:
\[
\|m_b - m_{co}\|_{L_\infty} = \max_{\emptyset\subsetneq A\subsetneq\Theta} |m_b(A) - m_{co}(A)| = \max\Big\{ \max_{A\in C,\, A\ne\bar A} |\beta(A)|,\; \max_{B\notin C} m_b(B) \Big\}, \tag{12.71}
\]
which is minimized by:
\[
|\beta(A)| \le \max_{B\notin C} m_b(B) \qquad \forall A \in C,\, A \ne \bar A, \tag{12.72}
\]
i.e., in the mass coordinates $m_{co}$, (12.29). According to (12.71), the corresponding minimal $L_\infty$ norm is $\max_{B\notin C} m_b(B)$. Clearly, the vertices of the set (12.72) are all the vectors of $\beta$ variables such that $\beta(A) = \pm\max_{B\notin C} m_b(B)$ for all $A \in C$, $A \ne \bar A$. Its barycenter is given by $\beta(A) = 0$ for all $A \in C$, $A \ne \bar A$, i.e., (12.24).

Proof of Theorem 78
By (12.26), the vertex $m^{\bar A}_{L_\infty}[m_b]$ of $\mathcal{CO}^C_{\mathcal{M},L_\infty}[m_b]$ meets the constraints (12.29) for $\mathcal{CO}^C_{\mathcal{M}\setminus\bar A,L_\infty}[m_b]$. As for the other vertices of $\mathcal{CO}^C_{\mathcal{M},L_\infty}[m_b]$ (12.26), let us check the conditions on
\[
\Delta \doteq \sum_{B\notin C} m_b(B) - n \max_{B\notin C} m_b(B)
\]
under which $m^{\bar A}_{L_\infty}[m_b]$ meets (12.29). If $\Delta$ were positive:
\[
n \max_{B\notin C} m_b(B) < \sum_{B\notin C} m_b(B) \quad\equiv\quad \max_{B\notin C} m_b(B) < \frac{1}{n}\sum_{B\notin C} m_b(B),
\]
which cannot happen by constraint (12.27). Therefore, $\Delta$ is non-positive. In order for the vertex not to belong to (12.29) we need $m_b(\bar A) + \max_{B\notin C} m_b(B) + \Delta < m_b(\bar A) - \max_{B\notin C} m_b(B)$, i.e.:
\[
\max_{B\notin C} m_b(B) > \frac{1}{n-2}\sum_{B\notin C} m_b(B), \tag{12.73}
\]
which cannot be ruled out under condition (12.27).

Proof of Lemma 16

In the belief space the original belief function b and the desired consonant approximation co are written as:
\[
b = \sum_{\emptyset\subsetneq A\subsetneq\Theta} b(A)\, x_A, \qquad co = \sum_{A\supseteq A_1} \Big( \sum_{B\subseteq A,\, B\in C} m_{co}(B) \Big) x_A.
\]
Their difference vector is therefore:
\[
\begin{aligned}
b - co &= \sum_{A\not\supset A_1} b(A)\, x_A + \sum_{A\supseteq A_1} x_A \Big( b(A) - \sum_{B\subseteq A,\, B\in C} m_{co}(B) \Big) \\
&= \sum_{A\not\supset A_1} b(A)\, x_A + \sum_{A\supseteq A_1} x_A \Big( \sum_{\emptyset\subsetneq B\subseteq A} m_b(B) - \sum_{B\subseteq A,\, B\in C} m_{co}(B) \Big) \\
&= \sum_{A\not\supset A_1} b(A)\, x_A + \sum_{A\supseteq A_1} x_A \Big( \sum_{B\subseteq A,\, B\in C} \big( m_b(B) - m_{co}(B) \big) + \sum_{B\subseteq A,\, B\notin C} m_b(B) \Big) \\
&= \sum_{A\not\supset A_1} b(A)\, x_A + \sum_{A\supseteq A_1} x_A \Big( \gamma(A) + \sum_{B\subseteq A,\, B\notin C} m_b(B) \Big) \\
&= \sum_{A\not\supset A_1} b(A)\, x_A + \sum_{A\supseteq A_1} x_A \Big( \gamma(A) + b(A) - \sum_{j=1}^{i} m_b(A_j) \Big),
\end{aligned} \tag{12.74}
\]
after introducing the auxiliary variables
\[
\gamma(A) = \sum_{B\subseteq A,\, B\in C} \big( m_b(B) - m_{co}(B) \big).
\]
All the terms in (12.74) associated with subsets $A \supseteq A_i$, $A \not\supset A_{i+1}$ depend on the same auxiliary variable $\gamma(A_i)$, while the difference in the component $x_\Theta$ is trivially $1 - 1 = 0$. Therefore, we obtain (12.40).

Proof of Theorem 80

To understand the relationship between the sets $\mathcal{CO}^C_{\mathcal{M},L_\infty}[m_b]$ and $\mathcal{OC}^C[b]$, let us rewrite the system of constraints for $L_\infty$ approximations in $\mathcal{M}$ under condition (12.27) as:
\[
\begin{cases}
m_{co}(A) - m_b(A) \le \max_{B\notin C} m_b(B) & A \in C,\, A \ne \Theta, \\[4pt]
\displaystyle\sum_{A\in C,\, A\ne\Theta} \big( m_{co}(A) - m_b(A) \big) \ge \sum_{B\notin C} m_b(B) - \max_{B\notin C} m_b(B). 
\end{cases} \tag{12.75}
\]
Indeed, when (12.27) does not hold, $co^C_{\mathcal{M},L_\infty}[m_b] = co^C_{\mathcal{M},L_2}[m_b]$, which is in general outside $\mathcal{OC}^C[b]$.
To be a pseudo-vertex of the set of partial outer approximations, a co.b.f. co must be the result of re-assigning the mass of each focal element to an element of the chain which contains it. Imagine that all the focal elements not in the desired chain C have the same mass: $m_b(B) = const$ for all $B \notin C$. Then, only up to $n-1$ of them can be reassigned to elements of the chain different from $\Theta$. Indeed, if we reassigned n outside focal elements to such elements of the chain, in the absence of mass redistribution internal to the chain, some $A \in C$ would surely violate the first constraint in (12.75), as it would receive mass from at least two outside focal elements, yielding:
\[
m_{co}(A) - m_b(A) \ge 2\max_{B\notin C} m_b(B) > \max_{B\notin C} m_b(B).
\]

Indeed, this is true even if mass redistribution does take place within the chain. Suppose that some mass $m_b(A)$, $A \in C$, is reassigned to some other $A' \in C$. By the first constraint in (12.75), this is allowed only if $m_b(A) \le \max_{B\notin C} m_b(B)$. Therefore the mass of just one outside focal element can still be reassigned to A, while now none can be reassigned to $A'$. In both cases, since the number of elements outside the chain, $m = 2^n - 1 - n$, is greater than n (unless $n \le 2$), the second equation of (12.75) implies:
\[
(n-1)\max_{B\notin C} m_b(B) \ge (m-1)\max_{B\notin C} m_b(B),
\]
which cannot hold under (12.27).

Proof of Theorem 81

After recalling the expression (12.40) of the difference vector $b - co$ in the belief space, the latter's $L_1$ norm reads as:
\[
\|b - co\|_{L_1} = \sum_{i=1}^{n-1} \;\sum_{A\supseteq A_i,\, A\not\supset A_{i+1}} \Big| \gamma(A_i) + b(A) - \sum_{j=1}^{i} m_b(A_j) \Big| + \sum_{A\not\supset A_1} |b(A)|. \tag{12.76}
\]
The norm (12.76) can be decomposed into a number of summations, each of which depends on a single auxiliary variable $\gamma(A_i)$. Such components are of the form $|x + x_1| + \dots + |x + x_n|$, with an even number of 'nodes' $-x_i$.
Let us consider the simple function of Figure 12.13 (left): it is easy to see that similar functions are minimized by the interval of values comprised between their two innermost nodes, i.e., in the case of the norm (12.76):
\[
\sum_{j=1}^{i} m_b(A_j) - \gamma^i_{int_1} \le \gamma(A_i) \le \sum_{j=1}^{i} m_b(A_j) - \gamma^i_{int_2} \qquad \forall i = 1,\dots,n-1. \tag{12.77}
\]

This is equivalent to:
\[
\gamma^i_{int_1} \le \sum_{j=1}^{i} m_{co}(A_j) \le \gamma^i_{int_2} \qquad \forall i = 1,\dots,n-2, \tag{12.78}
\]
while $m_{co}(A_{n-1}) = b(A_{n-1})$, as by definition (12.42) $\gamma^{n-1}_{int_1} = \gamma^{n-1}_{int_2} = b(A_{n-1})$.
This is a set of constraints of the form $l_1 \le x \le u_1$, $l_2 \le x+y \le u_2$, $l_3 \le x+y+z \le u_3$, also expressed as $l_1 \le x \le u_1$, $l_2 - x \le y \le u_2 - x$, $l_3 - (x+y) \le z \le u_3 - (x+y)$. This is a polytope whose $2^{n-2}$ vertices are obtained by assigning to $x$, $x+y$, $x+y+z$ and so on either their lower or their upper bound. For the specific set (12.78), this yields exactly (12.41).

Fig. 12.13. Left: minimising the L1 distance from the consonant subspace involves functions
such as the one depicted above, |x + 1| + |x + 3| + |x + 7| + |x + 8|, which is minimised
by 3 ≤ x ≤ 7. Right: minimising the L∞ distance from the consonant subspace involves
functions of the form max{|x + x1 |, ..., |x + xn |} (in bold).

Proof of Theorem 82
The minimal value of a function of the form $|x + x_1| + \dots + |x + x_n|$ is:
\[
\sum_{i\ge int_2} x_i - \sum_{i\le int_1} x_i.
\]
In the case of the $L_1$ norm (12.76), such a minimal attained value is:
\[
\sum_{A :\, A\supseteq A_i,\, A\not\supset A_{i+1},\, b(A)\ge\gamma_{int_2}} b(A) \;-\; \sum_{A :\, A\supseteq A_i,\, A\not\supset A_{i+1},\, b(A)\le\gamma_{int_1}} b(A),
\]
since in the difference the addenda $\sum_{j=1}^{i} m_b(A_j)$ disappear. Overall, the minimal $L_1$ norm is:
\[
\sum_{i=1}^{n-2} \Bigg( \sum_{\substack{A :\, A\supseteq A_i,\, A\not\supset A_{i+1}, \\ b(A)\ge\gamma_{int_2}}} b(A) - \sum_{\substack{A :\, A\supseteq A_i,\, A\not\supset A_{i+1}, \\ b(A)\le\gamma_{int_1}}} b(A) \Bigg) + \sum_{A\not\supset A_1} b(A)
= \sum_{\emptyset\subsetneq A\subsetneq\Theta,\, A\ne A_{n-1}} b(A) - 2\sum_{i=1}^{n-2} \sum_{\substack{A :\, A\supseteq A_i,\, A\not\supset A_{i+1}, \\ b(A)\le\gamma_{int_1}}} b(A),
\]
which is minimised by the chains (12.44).

Proof of Theorem 83
By replacing the hypothesized solution (12.47) for the $L_2$ approximation in $\mathcal{B}$ into the system of constraints (12.46) we get, for all $j = 1, \dots, n-1$:
\[
\sum_{A\subsetneq\Theta} m_b(A) \langle b_A, b_{A_j} \rangle - ave(\mathcal{L}_{n-1}) \langle b_{A_{n-1}}, b_{A_{n-1}} \rangle - \sum_{i=1}^{n-2} ave(\mathcal{L}_i) \Big( \langle b_{A_i}, b_{A_j} \rangle - \langle b_{A_{i+1}}, b_{A_j} \rangle \Big) = 0,
\]
where $\langle b_{A_{n-1}}, b_{A_{n-1}} \rangle = 1$ for all j, while (since $\langle b_A, b_B \rangle = |\{C \subsetneq \Theta : C \supseteq A, B\}| = 2^{|(A\cup B)^c|} - 1$):
\[
\langle b_{A_i}, b_{A_j} \rangle - \langle b_{A_{i+1}}, b_{A_j} \rangle = \langle b_{A_j}, b_{A_j} \rangle - \langle b_{A_j}, b_{A_j} \rangle = 0
\]
whenever $i < j$, and
\[
\langle b_{A_i}, b_{A_j} \rangle - \langle b_{A_{i+1}}, b_{A_j} \rangle = \big( |\{A \supseteq A_i, A_j\}| - 1 \big) - \big( |\{A \supseteq A_{i+1}, A_j\}| - 1 \big) = |\{A \supseteq A_i\}| - |\{A \supseteq A_{i+1}\}| = 2^{|A_{i+1}^c|}
\]
whenever $i \ge j$. The system of constraints becomes (as $2^{|A_n^c|} = 2^{|\emptyset|} = 1$):
\[
\sum_{A\subsetneq\Theta} m_b(A) \langle b_A, b_{A_j} \rangle - \sum_{i=j}^{n-1} ave(\mathcal{L}_i)\, 2^{|A_{i+1}^c|} = 0 \qquad j = 1, \dots, n-1,
\]
which, given the expression (12.48) for $ave(\mathcal{L}_i)$, reads as:
\[
\sum_{A\subsetneq\Theta} m_b(A) \langle b_A, b_{A_j} \rangle - \sum_{i=j}^{n-1} \sum_{A\supseteq A_i,\, A\not\supset A_{i+1}} b(A) = 0 \qquad j = 1, \dots, n-1. \tag{12.79}
\]

Let us study the second addendum of each equation above. We get:
\[
\sum_{i=j}^{n-1} \sum_{A\supseteq A_i,\, A\not\supset\{x_{i+1}\}} b(A) = \sum_{A_j\subseteq A\subsetneq\Theta} b(A),
\]
as any $A \supseteq A_j$, $A \ne \Theta$, is such that $A \supseteq A_i$ and $A \not\supset A_{i+1}$ for some $A_i$ in the desired maximal chain which contains $A_j$. Indeed, let us define $x_{i+1}$ as the lowest-index element (according to the ordering associated with the desired focal chain $A_1 \subset \dots \subset A_n$, i.e., $x_j \doteq A_j \setminus A_{j-1}$) among those singletons in $A^c$. By construction, $A \supseteq A_i$ and $A \not\supset \{x_{i+1}\}$.
Finally:
\[
\sum_{A_j\subseteq A\subsetneq\Theta} b(A) = \sum_{A_j\subseteq A\subsetneq\Theta} \sum_{C\subseteq A} m_b(C) = \sum_{C\subsetneq\Theta} m_b(C)\, \big| \{A : C \subseteq A \subsetneq \Theta,\, A \supseteq A_j\} \big|,
\]
where
\[
\big| \{A : C \subseteq A \subsetneq \Theta,\, A \supseteq A_j\} \big| = \big| \{A : A \supseteq (C\cup A_j),\, A \ne \Theta\} \big| = 2^{|(C\cup A_j)^c|} - 1 = \langle b_C, b_{A_j} \rangle.
\]
Therefore, summarizing:
\[
\sum_{i=j}^{n-1} \sum_{A\supseteq A_i,\, A\not\supset\{x_{i+1}\}} b(A) = \sum_{C\subsetneq\Theta} m_b(C) \langle b_C, b_{A_j} \rangle.
\]
By replacing the latter into (12.79) we obtain the trivial identity $0 = 0$.



Proof of Theorem 84

Given the expression (12.40) for the difference vector of interest in the belief space, we can compute the explicit form of its $L_\infty$ norm as:
\[
\|b - co\|_\infty = \max\Big\{ \max_i \max_{A\supseteq A_i,\, A\not\supset A_{i+1}} \Big| \gamma(A_i) + b(A) - \sum_{j=1}^{i} m_b(A_j) \Big|,\; \max_{A\not\supset A_1} \sum_{B\subseteq A} m_b(B) \Big\}
= \max\Big\{ \max_i \max_{A\supseteq A_i,\, A\not\supset A_{i+1}} \Big| \gamma(A_i) + b(A) - \sum_{j=1}^{i} m_b(A_j) \Big|,\; b(A_1^c) \Big\}, \tag{12.80}
\]
as $\max_{A\not\supset A_1} \sum_{B\subseteq A} m_b(B) = b(A_1^c)$. Now, (12.80) can be minimized separately for each $i = 1, \dots, n-1$. Clearly, the minimum is attained when the variable elements in (12.80) are not greater than the constant element $b(A_1^c)$:
\[
\max_{A\supseteq A_i,\, A\not\supset A_{i+1}} \Big| \gamma(A_i) + b(A) - \sum_{j=1}^{i} m_b(A_j) \Big| \le b(A_1^c). \tag{12.81}
\]
The left-hand side of (12.81) is a function of the form $\max\{|x + x_1|, \dots, |x + x_n|\}$ (see Figure 12.13, right). Such functions are minimized by $x = -\frac{x_{min}+x_{max}}{2}$ (see Figure 12.13, right, again). In the case of (12.81), such minimum and maximum offset values are, respectively,
\[
\gamma^i_{min} = b(A_i) - \sum_{j=1}^{i} m_b(A_j), \qquad
\gamma^i_{max} = b(\{x_{i+1}\}^c) - \sum_{j=1}^{i} m_b(A_j) = b(A_i + A_{i+1}^c) - \sum_{j=1}^{i} m_b(A_j),
\]

once we have defined $\{x_{i+1}\} = A_{i+1} \setminus A_i$. As, for each value of $\gamma$, $|\gamma(A_i) + \gamma|$ is dominated by either $|\gamma(A_i) + \gamma^i_{min}|$ or $|\gamma(A_i) + \gamma^i_{max}|$, the norm of the difference vector is minimized by the values of $\gamma(A_i)$ such that:
\[
\max\Big\{ |\gamma(A_i) + \gamma^i_{min}|,\; |\gamma(A_i) + \gamma^i_{max}| \Big\} \le b(A_1^c) \qquad \forall i = 1, \dots, n-1,
\]
i.e.:
\[
-\frac{\gamma^i_{min} + \gamma^i_{max}}{2} - b(A_1^c) \le \gamma(A_i) \le -\frac{\gamma^i_{min} + \gamma^i_{max}}{2} + b(A_1^c) \qquad \forall i = 1, \dots, n-1.
\]
In terms of mass assignments, this is equivalent to:
\[
-b(A_1^c) + \frac{b(A_i) + b(\{x_{i+1}\}^c)}{2} \le \sum_{j=1}^{i} m_{co}(A_j) \le b(A_1^c) + \frac{b(A_i) + b(\{x_{i+1}\}^c)}{2}. \tag{12.82}
\]

Once again this is a set of constraints of the form $l_1 \le x \le u_1$, $l_2 \le x+y \le u_2$, $l_3 \le x+y+z \le u_3$, also expressed as $l_1 \le x \le u_1$, $l_2 - x \le y \le u_2 - x$, $l_3 - (x+y) \le z \le u_3 - (x+y)$, which is a polytope with vertices obtained by assigning to $x$, $x+y$, $x+y+z$, etc., either their lower or their upper bound. This generates $2^{n-1}$ possible combinations, which for the specific set (12.82) yields (see the proof of Theorem 81) Equation (12.49).
As for the barycenter of (12.49), we have that:
\[
\begin{aligned}
m_{co}(A_1) &= \frac{b(A_1) + b(\{x_2\}^c)}{2}, \\
m_{co}(A_i) &= \frac{b(A_i) + b(\{x_{i+1}\}^c)}{2} - \frac{b(A_{i-1}) + b(\{x_i\}^c)}{2} = \frac{b(A_i) - b(A_{i-1})}{2} + \frac{pl_b(\{x_i\}) - pl_b(\{x_{i+1}\})}{2}, \\
m_{co}(A_n) &= 1 - \sum_{i=2}^{n-1} \Big[ \frac{b(A_i) + b(\{x_{i+1}\}^c)}{2} - \frac{b(A_{i-1}) + b(\{x_i\}^c)}{2} \Big] - \frac{b(A_1) + b(\{x_2\}^c)}{2} = 1 - b(A_{n-1}).
\end{aligned}
\]

Proof of Theorem 86

To study the admissibility of the $L_2$ approximation and the barycenter of the $L_1$ approximations, we need to go deeper into the structure of $\mathcal{L}_i$. The latter can be expressed as
\[
\mathcal{L}_i = \big\{ b(A_i + B),\; \emptyset \subseteq B \subseteq A_{i+1}^c \big\},
\]
whereas $\mathcal{L}_{i-1}$ contains $2 \cdot |\mathcal{L}_i|$ elements and can then be written as the union of two lists of $|\mathcal{L}_i|$ elements each:
\[
\mathcal{L}_{i-1} = \mathcal{L}^1_{i-1} \cup \mathcal{L}^2_{i-1} = \big\{ b(A_{i-1} + B),\; \emptyset \subseteq B \subseteq A_{i+1}^c \big\} \cup \big\{ b(A_{i-1} + x_{i+1} + B),\; \emptyset \subseteq B \subseteq A_{i+1}^c \big\}.
\]

Now, $b(A_i + B) \ge b(A_{i-1} + B)$ for all $\emptyset \subseteq B \subseteq A_{i+1}^c$, so that each element of $\mathcal{L}_i$ dominates the corresponding element of $\mathcal{L}^1_{i-1}$, no matter the chain C. On the other hand, $b(A_i + B) \ge b(A_{i-1} + x_{i+1} + B)$ for all $\emptyset \subseteq B \subseteq A_{i+1}^c$ iff
\[
pl_{A_{i+2}}(x_i) \ge pl_{A_{i+2}}(x_{i+1}), \tag{12.83}
\]
where $pl_A(x) = \sum_{B\subseteq A,\, B\ni x} m_b(B)$. If the latter condition holds:
\[
\frac{b(A_{i-1} + B) + b(A_{i-1} + x_{i+1} + B)}{2} \le b(A_i + B) \qquad \forall \emptyset \subseteq B \subseteq A_{i+1}^c,
\]
so that:
\[
ave(\mathcal{L}_i) = \frac{\sum_{\emptyset\subseteq B\subseteq A_{i+1}^c} b(A_i + B)}{|\mathcal{L}_i|} \ge \frac{\sum_{\emptyset\subseteq B\subseteq A_{i+1}^c} \frac{b(A_{i-1}+B) + b(A_{i-1}+x_{i+1}+B)}{2}}{|\mathcal{L}_i|} = ave(\mathcal{L}_{i-1}).
\]
As for the barycenter of the $L_1$ approximations, obviously $int_1(\mathcal{L}_i) \ge b(A_i + B)$ for all B s.t. $b(A_i + B) \le int_1(\mathcal{L}_i)$, so that $int_1(\mathcal{L}_i) \ge b(A_i + x_{i+1} + B)$ for all B s.t. $b(A_i + B) \le int_1(\mathcal{L}_i)$ as well. Hence $int_1(\mathcal{L}_i) \ge int_1(\mathcal{L}_{i-1})$, since $int_1(\mathcal{L}_i)$ dominates at least half the elements of $\mathcal{L}_{i-1}$.
In the same way, $int_2(\mathcal{L}_i) \ge b(A_i + B)$ for all B s.t. $b(A_i + B) \le int_2(\mathcal{L}_i)$, so that $int_2(\mathcal{L}_i) \ge b(A_i + x_{i+1} + B)$ for the same Bs; hence $int_2(\mathcal{L}_i) \ge int_2(\mathcal{L}_{i-1})$, since $int_2(\mathcal{L}_i)$ dominates at least $|\mathcal{L}_{i-1}|/2 + 2$ elements of $\mathcal{L}_{i-1}$.
Summarising, if $C = \{A_1 \subset \dots \subset A_n\}$, with $A_i = \{x_1, \dots, x_i\}$, is such that (12.83) holds for all i, then:
\[
ave(\mathcal{L}_i) \ge ave(\mathcal{L}_{i-1}), \qquad \frac{int_1(\mathcal{L}_i) + int_2(\mathcal{L}_i)}{2} \ge \frac{int_1(\mathcal{L}_{i-1}) + int_2(\mathcal{L}_{i-1})}{2}
\]
for all i, and both $co^C_{\mathcal{B},L_1}[b]$ and $co^C_{\mathcal{B},L_2}[b]$ are admissible.
13 Consistent approximation

As we know, belief functions are complex objects, in which different and sometimes contradictory bodies of evidence may coexist, as they mathematically describe the fusion of possibly conflicting expert opinions and/or imprecise or corrupted measurements, etcetera. As a consequence, making decisions based on such objects can be
misleading. As we discussed in the second part of Chapter 9, this is a well known
problem in classical logics, where the application of inference rules to inconsis-
tent knowledge bases (sets of propositions) may lead to incompatible conclusions
[1012]. We have also seen that belief functions can be interpreted as generalisations
of knowledge bases in which a belief value, rather than a truth one, is attributed to
each formula (interpreted as the set of worlds for which that formula is true).
We have also identified consistent belief functions, belief functions whose focal ele-
ments have non-empty intersection, as the natural counterparts of consistent knowl-
edge bases in belief theory.
Analogously to consistent knowledge bases, consistent belief functions are characterized by null internal conflict. It may therefore be desirable to transform a generic belief function into a consistent one prior to making a decision, or picking a course of action. This is all the more valuable as several important operators used to update or elicit evidence represented as belief measures, like Dempster's sum [343] and disjunctive combination [1225] (cf. Section 4.2), do not preserve consistency.
We have seen in this book how the transformation problem is spelled out in the probabilistic [311, 267] (Chapters 10 and 11) and possibilistic [?] (Chapter 12) cases. As we argued for probability transforms, consistent transformations can be defined as the solutions to a minimization problem of the form:
\[
cs[b] = \arg\min_{cs\in\mathcal{CS}} dist(b, cs), \tag{13.1}
\]


where b is the original belief function, dist an appropriate distance measure between
belief functions, and CS denotes the collection of all consistent b.f.s. We call (13.1)
the consistent transformation problem. Once again, by plugging in different distance
functions in (13.1) we get different consistent transformations. We refer to Section
4.8.2 for a review of dissimilarity measures for belief functions.
As we did for consonant belief functions in Chapter 12, in this Chapter we focus on what happens when the classical $L_p$ norms are applied to the consistent approximation problem. The $L_\infty$ norm, in particular, is closely related to consistent belief functions, as the region of consistent b.f.s can be expressed as
\[
\mathcal{CS} = \Big\{ b : \max_{x\in\Theta} pl_b(x) = 1 \Big\},
\]
i.e., the set of b.f.s for which the $L_\infty$ norm of the 'contour function' $pl_b(x)$ is equal to 1. In addition, consistent belief functions relate to possibility distributions, and possibility measures $Pos$ are inherently associated with $L_\infty$, as $Pos(A) = \max_{x\in A} Pos(x)$.
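As a quick numerical illustration of this characterisation (a hypothetical helper of our own, not part of the book's formalism), one can test consistency by computing the contour function and checking whether its maximum equals 1:

# Sketch: a belief function is consistent iff max_x pl_b(x) = 1,
# i.e., iff all focal elements share at least one common element.
def contour(m_b, frame):
    """pl_b(x) = sum of the masses of the focal elements containing x."""
    return {x: sum(v for B, v in m_b.items() if x in B) for x in frame}

def is_consistent(m_b, frame, tol=1e-9):
    return max(contour(m_b, frame).values()) > 1.0 - tol

frame = {'x', 'y', 'z'}
m_b = {frozenset('xy'): 0.6, frozenset('xz'): 0.3, frozenset('x'): 0.1}
print(contour(m_b, frame))       # pl(x) = 1.0, pl(y) = 0.6, pl(z) = 0.3
print(is_consistent(m_b, frame)) # True: all focal elements contain x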

Chapter content

In this Chapter, therefore, we address the Lp consistent transformation problem in


full generality, and discuss the semantics of the results. Since (Chapter 9) consistent
belief functions live in a simplicial complex CS [441, ?], a partial solution has to
be found separately for each maximal simplex CS x of the consistent complex, each
associated with an ultrafilter {A ⊇ {x}}, x ∈ Θ of focal elements. Global solutions
are identified as the partial solutions at minimal distance from the original b.f. We
conduct our analysis in both the mass space M of basic probability vectors and the
belief space B of vectors of belief values.
In the mass space representation, the partial L1 consistent approximation focussed on a certain element x of the frame is simply obtained by reassigning all the mass outside the filter {A ⊇ {x}} to Θ. The global approximation is associated, as expected, with cores containing the maximal-plausibility element(s) of Θ. The L∞ approximation generates a 'rectangle' of partial approximations, with barycenter in the L1 partial approximation. The corresponding global approximation spans the components focussed on the element(s) x such that max_{B⊉x} m_b(B) is minimal. The L2 partial approximation coincides with the L1 one when the mass vectors do not include m_b(Θ) as a component; otherwise, the L2 partial approximation reassigns the mass b(x^c) outside the desired filter to each element of the filter focussed on x on an equal basis.
In the belief space representation, the partial approximations determined by both the L1 and L2 norms are unique and coincide, besides having a rather elegant interpretation in terms of classical inner approximations [?, 61]. The L1/L2 consistent approximation onto each component CS^x of CS indeed generates the consistent transformation focused on x, i.e., a new belief function whose focal elements have the form A' = A ∪ {x} whenever A is a focal element of the original b.f. b. The associated global L1/L2 solutions do not in general lie on the component of the consistent complex related to the maximal-plausibility element.

The L∞ norm determines instead an entire polytope of solutions, whose barycenter lies on the L1/L2 approximation and which it is natural to associate with the polytope of inner Bayesian approximations. Global optimal L∞ approximations do focus on the maximal-plausibility element, and their center of mass is the consistent transformation focused on it.

Chapter outline

We briefly recall in Section 13.1 how to solve the transformation problem separately
for each maximal simplex of the consistent complex. We then proceed to solve the
L1 -, L2 - and L∞ -consistent approximation problems in full generality, in both the
mass (Section 13.2) and the belief (Section 13.3) space representations. In Section
13.4 we compare and interpret the outcomes of Lp approximations in the two frame-
works, with the help of the ternary example.

13.1 The Lp consistent approximation problem


As we have seen in Chapter 9, the region $\mathcal{CS}$ of consistent belief functions in the belief space is the union
\[
\mathcal{CS}_{\mathcal{B}} = \bigcup_{x\in\Theta} Cl(b_A, A \ni x)
\]
of a number of (maximal) simplices, each associated with a 'maximal ultrafilter' $\{A \supseteq \{x\}\}$, $x \in \Theta$, of subsets of $\Theta$ (those containing a given element x).
It is not difficult to see that the same holds in the mass space, where the consistent complex is the union
\[
\mathcal{CS}_{\mathcal{M}} = \bigcup_{x\in\Theta} Cl(m_A, A \ni x)
\]
of maximal simplices $Cl(m_A, A \ni x)$ formed by the mass vectors associated with all the belief functions whose core contains a particular element x of $\Theta$.
As in the consonant case (see Section 12.2.1), solving (13.1), in particular in the case of $L_p$ norms, involves finding a number of partial solutions:
\[
cs^x_{L_p}[b] = \arg\min_{cs\in\mathcal{CS}^x_{\mathcal{B}}} \|b - cs\|_{L_p}, \qquad cs^x_{L_p}[m_b] = \arg\min_{m_{cs}\in\mathcal{CS}^x_{\mathcal{M}}} \|m_b - m_{cs}\|_{L_p} \tag{13.2}
\]
in the belief and mass space, respectively. Then, the distance of b from all such partial solutions needs to be assessed in order to select a global, optimal approximation. As a matter of fact, an analysis of the outcomes of $L_p$ consistent approximation in the case of a binary frame has already been carried out in Section 12.3 (since for $|\Theta| = 2$ consonant and consistent belief functions coincide).

13.2 Consistent approximation in M


Let us therefore compute the analytical form of all $L_p$ consistent approximations in the mass space. We start by describing the difference vector $m_b - m_{cs}$ between the original mass vector and its approximation. Using the notation
\[
m_{cs} = \sum_{B\supseteq\{x\},\, B\ne\Theta} m_{cs}(B)\, m_B, \qquad m_b = \sum_{B\subsetneq\Theta} m_b(B)\, m_B
\]
(as in $R^{N-2}$, $m_b(\Theta)$ is not included, by normalization), the difference vector can be expressed as:
\[
m_b - m_{cs} = \sum_{B\supseteq\{x\},\, B\ne\Theta} \big( m_b(B) - m_{cs}(B) \big) m_B + \sum_{B\not\supset\{x\}} m_b(B)\, m_B. \tag{13.3}
\]
Its $L_p$ norms are then given by the following expressions:
\[
\begin{aligned}
\|m_b - m_{cs}\|^{\mathcal{M}}_{L_1} &= \sum_{B\supseteq\{x\},\, B\ne\Theta} |m_b(B) - m_{cs}(B)| + \sum_{B\not\supset\{x\}} |m_b(B)|, \\
\|m_b - m_{cs}\|^{\mathcal{M}}_{L_2} &= \sqrt{ \sum_{B\supseteq\{x\},\, B\ne\Theta} |m_b(B) - m_{cs}(B)|^2 + \sum_{B\not\supset\{x\}} |m_b(B)|^2 }, \\
\|m_b - m_{cs}\|^{\mathcal{M}}_{L_\infty} &= \max\Big\{ \max_{B\supseteq\{x\},\, B\ne\Theta} |m_b(B) - m_{cs}(B)|,\; \max_{B\not\supset\{x\}} |m_b(B)| \Big\}.
\end{aligned} \tag{13.4}
\]
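The three norms in (13.4) are straightforward to evaluate numerically. The sketch below is our own illustration (the function name and the dictionary-based representation are assumptions, not the book's notation); it assumes the mass dictionary excludes Θ, as in the N−2 representation.

import math

def lp_distances(m_b, m_cs, x):
    """L1, L2 and Linf distances of Equation (13.4); m_cs is assumed to be
    non-zero only on subsets containing x (Theta excluded)."""
    inside = [B for B in m_b if x in B]        # B containing x, B != Theta
    outside = [B for B in m_b if x not in B]   # B not containing x
    d_in = [abs(m_b[B] - m_cs.get(B, 0.0)) for B in inside]
    d_out = [abs(m_b[B]) for B in outside]
    l1 = sum(d_in) + sum(d_out)
    l2 = math.sqrt(sum(d ** 2 for d in d_in) + sum(d ** 2 for d in d_out))
    linf = max(d_in + d_out)
    return l1, l2, linf

m_b = {frozenset('x'): 0.2, frozenset('y'): 0.1, frozenset('xy'): 0.4, frozenset('yz'): 0.3}
m_cs = {frozenset('x'): 0.2, frozenset('xy'): 0.4}   # a candidate focused on x
print(lp_distances(m_b, m_cs, 'x'))                  # (0.4, 0.316..., 0.3)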

13.2.1 L1 approximation
Let us first tackle the $L_1$ case. After introducing the auxiliary variables $\beta(B) \doteq m_b(B) - m_{cs}(B)$, we can write the $L_1$ norm of the difference vector as:
\[
\|m_b - m_{cs}\|^{\mathcal{M}}_{L_1} = \sum_{B\supseteq\{x\},\, B\ne\Theta} |\beta(B)| + \sum_{B\not\supset\{x\}} |m_b(B)|, \tag{13.5}
\]
which is obviously minimized by $\beta(B) = 0$ for all $B \supseteq \{x\}$, $B \ne \Theta$. Thus:


Theorem 87. Given an arbitrary belief function $b : 2^\Theta \to [0,1]$ and an element $x \in \Theta$ of its frame of discernment, its unique $L_1$ consistent approximation $cs^x_{L_1,\mathcal{M}}[m_b]$ in $\mathcal{M}$ with core containing x is the consistent b.f. whose mass distribution coincides with that of b on all the subsets containing x:
\[
m_{cs^x_{L_1,\mathcal{M}}[m_b]}(B) =
\begin{cases}
m_b(B) & \forall B \supseteq \{x\},\, B \ne \Theta, \\
m_b(\Theta) + b(\{x\}^c) & B = \Theta.
\end{cases} \tag{13.6}
\]
The mass of all the subsets not in the desired principal ultrafilter $\{B \supseteq \{x\}\}$ is simply reassigned to $\Theta$.
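Theorem 87 translates into a one-line mass transfer. The following sketch (illustrative code of our own, with hypothetical helper names; the mass of Θ is carried explicitly here) computes the partial L1 consistent approximation focused on a chosen singleton:

def l1_consistent_approx_mass_space(m_b, frame, x):
    """Partial L1 consistent approximation in M focused on x, as in (13.6):
    masses of subsets containing x are kept, everything else goes to Theta."""
    theta = frozenset(frame)
    m_cs, outside = {}, 0.0
    for B, v in m_b.items():
        if x in B and B != theta:
            m_cs[B] = v
        elif B != theta:
            outside += v          # mass of a subset not containing x
    m_cs[theta] = m_b.get(theta, 0.0) + outside
    return m_cs

frame = {'x', 'y', 'z'}
m_b = {frozenset('x'): 0.2, frozenset('y'): 0.1, frozenset('xy'): 0.4, frozenset('yz'): 0.3}
print(l1_consistent_approx_mass_space(m_b, frame, 'x'))
# {x}: 0.2, {x,y}: 0.4, Theta: 0.4  -- b({x}^c) = 0.4 reassigned to Theta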

Global approximation. The global $L_1$ consistent approximation in $\mathcal{M}$ coincides with the partial approximation (13.6) at minimal distance from the original mass vector $m_b$. By (13.5), the partial approximation focussed on x has distance $b(\{x\}^c) = \sum_{B\not\supset\{x\}} m_b(B)$ from $m_b$. The global $L_1$ approximation $m_{cs_{L_1,\mathcal{M}}[m_b]}$ is therefore the (union of the) partial approximation(s) associated with the maximal-plausibility singleton(s):
\[
\hat x = \arg\min_x b(\{x\}^c) = \arg\max_x pl_b(x).
\]

13.2.2 L∞ approximation

In the $L_\infty$ case, the desired norm of the difference vector is:
\[
\|m_b - m_{cs}\|^{\mathcal{M}}_{L_\infty} = \max\Big\{ \max_{B\supseteq\{x\},\, B\ne\Theta} |\beta(B)|,\; \max_{B\not\supset\{x\}} m_b(B) \Big\}.
\]
The above norm is trivially minimized by $\{\beta(B)\}$ such that:
\[
|\beta(B)| \le \max_{B'\not\supset\{x\}} m_b(B') \qquad \forall B \supseteq \{x\},\, B \ne \Theta,
\]
namely:
\[
-\max_{B'\not\supset\{x\}} m_b(B') \le m_b(B) - m_{cs}(B) \le \max_{B'\not\supset\{x\}} m_b(B') \qquad \forall B \supseteq \{x\},\, B \ne \Theta.
\]
Theorem 88. Given an arbitrary belief function $b : 2^\Theta \to [0,1]$ and an element $x \in \Theta$ of its frame of discernment, its $L_\infty$ consistent approximations $cs^x_{L_\infty,\mathcal{M}}[m_b]$ with core containing x in $\mathcal{M}$ are those whose mass values on all the subsets containing x differ from the original ones by at most the maximum mass of the subsets not in the ultrafilter. Namely, for all $B \supseteq \{x\}$, $B \ne \Theta$:
\[
m_b(B) - \max_{C\not\supset\{x\}} m_b(C) \;\le\; m_{cs^x_{L_\infty,\mathcal{M}}[m_b]}(B) \;\le\; m_b(B) + \max_{C\not\supset\{x\}} m_b(C). \tag{13.7}
\]

Clearly, this set of solutions can also include pseudo-belief functions.
Global approximation. Once again, the global $L_\infty$ consistent approximation in $\mathcal{M}$ coincides with the partial approximation (13.7) at minimal distance from the original b.p.a. $m_b$. The partial approximation focussed on x has distance $\max_{B\not\supset\{x\}} m_b(B)$ from $m_b$. The global $L_\infty$ approximation $m_{cs_{L_\infty,\mathcal{M}}[m_b]}$ is therefore the (union of the) partial approximation(s) associated with the singleton(s) such that:
\[
\hat x = \arg\min_x \max_{B\not\supset\{x\}} m_b(B).
\]

13.2.3 L2 approximation

To find the $L_2$ partial consistent approximation(s) in $\mathcal{M}$ we resort, as usual, to imposing that the difference vector be orthogonal to all the generators of the linear space involved. In the consistent case, we need to impose the orthogonality of the difference vector $m_b - m_{cs}$ with respect to the subspace $\mathcal{CS}^x_{\mathcal{M}}$ associated with consistent mass functions focused on $\{x\}$.
The generators of (the linear space spanned by) $\mathcal{CS}^x_{\mathcal{M}}$ are the vectors $m_B - m_{\{x\}}$, for all $B \supsetneq \{x\}$. The desired orthogonality condition therefore reads as
\[
\langle m_b - m_{cs}, m_B - m_{\{x\}} \rangle = 0,
\]
where $m_b - m_{cs}$ is given by Equation (13.3), while $(m_B - m_{\{x\}})(C) = 1$ if $C = B$, $-1$ if $C = \{x\}$, and 0 elsewhere. Therefore, using once again the variables $\{\beta(B)\}$, the condition simplifies as follows:
\[
\langle m_b - m_{cs}, m_B - m_{\{x\}} \rangle =
\begin{cases}
\beta(B) - \beta(\{x\}) = 0 & \forall B \supsetneq \{x\},\, B \ne \Theta; \\
-\beta(\{x\}) = 0 & B = \Theta.
\end{cases} \tag{13.8}
\]

Notice that, when using vectors $m_b$ of $R^{N-1}$ (including $B = \Theta$; compare (12.12)),
\[
m_b = \sum_{\emptyset\subsetneq B\subseteq\Theta} m_b(B)\, m_B, \tag{13.9}
\]
to represent belief functions, the orthogonality condition reads instead as:
\[
\langle m_b - m_{cs}, m_B - m_{\{x\}} \rangle = \beta(B) - \beta(\{x\}) = 0 \qquad \forall B \supsetneq \{x\}. \tag{13.10}
\]

Theorem 89. Given an arbitrary belief function $b : 2^\Theta \to [0,1]$ and an element $x \in \Theta$ of its frame of discernment, its unique $L_2$ partial consistent approximation $cs^x_{L_2,\mathcal{M}}[m_b]$ with core containing x in $\mathcal{M}$ coincides with its partial $L_1$ approximation $cs^x_{L_1,\mathcal{M}}[m_b]$.
However, when using the mass representation (13.9) in $R^{N-1}$, the partial $L_2$ approximation is obtained by redistributing to each element of the ultrafilter $\{B \supseteq \{x\}\}$ an equal fraction of the mass of the focal elements not in it:
\[
m_{cs^x_{L_2,\mathcal{M}}[m_b]}(B) = m_b(B) + \frac{b(\{x\}^c)}{2^{|\Theta|-1}} \qquad \forall B \supseteq \{x\}. \tag{13.11}
\]
The partial $L_2$ approximation in $R^{N-1}$ thus redistributes the mass equally to all the elements of the ultrafilter.
Global approximation. The global $L_2$ consistent approximation in $\mathcal{M}$ is again given by the partial approximation (13.11) at minimal $L_2$ distance from $m_b$. In the $N-2$ representation, by the definition (13.4) of the $L_2$ norm in $\mathcal{M}$, the partial approximation focussed on x has (squared) distance from $m_b$:
\[
\big( b(\{x\}^c) \big)^2 + \sum_{B\not\supset\{x\}} \big( m_b(B) \big)^2 = \Big( \sum_{B\not\supset\{x\}} m_b(B) \Big)^2 + \sum_{B\not\supset\{x\}} \big( m_b(B) \big)^2,
\]
which is minimized by the element(s) $\hat x \in \Theta$ such that
\[
\hat x = \arg\min_x \sum_{B\not\supset\{x\}} \big( m_b(B) \big)^2.
\]
B6⊃{x}

In the $(N-1)$-dimensional representation, instead, the partial approximation focussed on x has (squared) distance from $m_b$:
\[
\sum_{B\supseteq\{x\},\, B\ne\Theta} \Big( m_b(B) - m_b(B) - \frac{b(\{x\}^c)}{2^{|\Theta|-1}} \Big)^2 + \sum_{B\not\supset\{x\}} \big( m_b(B) \big)^2 = \frac{\big( \sum_{B\not\supset\{x\}} m_b(B) \big)^2}{2^{|\Theta|-1}} + \sum_{B\not\supset\{x\}} \big( m_b(B) \big)^2,
\]

which is minimized by the same singleton(s). Note that, even though (in the $N-2$ representation) the partial $L_1$ and $L_2$ approximations coincide, the global approximations may in general fall on different components of the consistent complex.

13.3 Consistent approximation in the belief space

13.3.1 L1 /L2 approximations

We have seen that in the mass space (at least in its $N-2$ representation, Theorem 89) the $L_1$ and $L_2$ approximations coincide. This is true in the belief space in the general case as well. We will gather some intuition about the general solution by considering first the slightly more complex case of a ternary frame, $\Theta = \{x, y, z\}$. We will use the notation:
\[
cs = \sum_{B\supseteq\{x\}} m_{cs}(B)\, b_B, \qquad b = \sum_{B\subsetneq\Theta} m_b(B)\, b_B.
\]

Linear system for L2 A consistent belief function $cs \in \mathcal{CS}^x$ is a solution of the $L_2$ approximation problem if $b - cs$ is orthogonal to all the generators $\{b_B - b_\Theta = b_B,\; \{x\} \subseteq B \subsetneq \Theta\}$ of $\mathcal{CS}^x$:
\[
\langle b - cs, b_B \rangle = 0 \qquad \forall B : \{x\} \subseteq B \subsetneq \Theta.
\]
As $b - cs = \sum_{A\subsetneq\Theta} \big( m_b(A) - m_{cs}(A) \big) b_A = \sum_{A\subsetneq\Theta} \beta(A)\, b_A$, the condition becomes:
\[
\sum_{A\supseteq\{x\}} \beta(A) \langle b_A, b_B \rangle + \sum_{A\not\supset\{x\}} m_b(A) \langle b_A, b_B \rangle = 0 \qquad \forall B : \{x\} \subseteq B \subsetneq \Theta. \tag{13.12}
\]

Linear system for L1 In the $L_1$ case, the minimization problem to solve is:
\[
\arg\min \sum_{A\supseteq\{x\}} \Big| \sum_{B\subseteq A} m_b(B) - \sum_{B\subseteq A,\, B\supseteq\{x\}} m_{cs}(B) \Big|
= \arg\min_\beta \sum_{A\supseteq\{x\}} \Big| \sum_{B\subseteq A,\, B\supseteq\{x\}} \beta(B) + \sum_{B\subseteq A,\, B\not\supset\{x\}} m_b(B) \Big|,
\]
which is clearly solved by setting all the addenda to zero. This yields the following linear system:
\[
\sum_{B\subseteq A,\, B\supseteq\{x\}} \beta(B) + \sum_{B\subseteq A,\, B\not\supset\{x\}} m_b(B) = 0 \qquad \forall A : \{x\} \subseteq A \subsetneq \Theta. \tag{13.13}
\]

Linear transformation in the ternary case An interesting fact emerges when comparing the linear systems for $L_1$ and $L_2$ in the ternary case $\Theta = \{x, y, z\}$:
\[
\begin{cases}
3\beta(x) + \beta(x,y) + \beta(x,z) + m_b(y) + m_b(z) = 0 \\
\beta(x) + \beta(x,y) + m_b(y) = 0 \\
\beta(x) + \beta(x,z) + m_b(z) = 0,
\end{cases}
\qquad
\begin{cases}
\beta(x) = 0 \\
\beta(x) + \beta(x,y) + m_b(y) = 0 \\
\beta(x) + \beta(x,z) + m_b(z) = 0.
\end{cases} \tag{13.14}
\]
The solution is the same for both, as the second linear system can be obtained from the first one by a simple linear transformation of rows (we just need to substitute the first equation $e_1$ of the first system with the difference $e_1 \mapsto e_1 - e_2 - e_3$).

Linear transformation in the general case This holds in the general case, too.
Lemma 17.
\[
\sum_{B\supseteq A} \langle b_B, b_C \rangle (-1)^{|B\setminus A|} =
\begin{cases}
1 & C \subseteq A, \\
0 & \text{otherwise.}
\end{cases}
\]
Corollary 18. The linear system (13.12) can be reduced to the system (13.13) through the following linear transformation of rows:
\[
row_A \mapsto \sum_{B\supseteq A} row_B\, (-1)^{|B\setminus A|}. \tag{13.15}
\]
To obtain both the $L_2$ and the $L_1$ consistent approximations of b, it then suffices to solve the system (13.13) associated with the $L_1$ norm.
Theorem 90. The unique solution of the linear system (13.13) is:

β(A) = −mb (A \ {x}) ∀A : {x} ⊆ A ( Θ.



The theorem is proved by simple substitution.
Therefore, the partial consistent approximations of b on the maximal simplicial component $\mathcal{CS}^x$ of the consistent complex have basic probability assignment
\[
m_{cs^x_{L_1}[b]}(A) = m_{cs^x_{L_2}[b]}(A) = m_b(A) - \beta(A) = m_b(A) + m_b(A\setminus\{x\})
\]
for all events A such that $\{x\} \subseteq A \subsetneq \Theta$. The value of $m_{cs}(\Theta)$ can be obtained by normalization, to get:
\[
\begin{aligned}
m_{cs}(\Theta) &= 1 - \sum_{\{x\}\subseteq A\subsetneq\Theta} m_{cs}(A) = 1 - \sum_{\{x\}\subseteq A\subsetneq\Theta} \big( m_b(A) + m_b(A\setminus\{x\}) \big) \\
&= 1 - \sum_{\{x\}\subseteq A\subsetneq\Theta} m_b(A) - \sum_{\{x\}\subseteq A\subsetneq\Theta} m_b(A\setminus\{x\}) = 1 - \sum_{A\ne\Theta,\, \{x\}^c} m_b(A) = m_b(\{x\}^c) + m_b(\Theta),
\end{aligned}
\]
as all events $B \not\supset \{x\}$ can be written as $B = A\setminus\{x\}$ for $A = B\cup\{x\}$.


Corollary 19.
\[
m_{cs^x_{L_1}[b]}(A) = m_{cs^x_{L_2}[b]}(A) = m_b(A) + m_b(A\setminus\{x\})
\]
for all $x \in \Theta$ and for all A such that $\{x\} \subseteq A \subseteq \Theta$.

Interpretation as focused consistent transformations The expression of the basic probability assignment of the L1/L2 consistent approximations of b (Corollary 19) is simple and elegant. It also has a straightforward interpretation: to get a consistent belief function focused on a singleton x, the mass contribution of all events B such that B ∪ {x} = A is assigned to A. But there are just two such events: A itself, and A \ {x}.

Example The partial consistent approximation with core {x} of an example belief
function defined on a frame Θ = {x, y, z, w} is illustrated in Figure 13.1.

Fig. 13.1. A belief function on Θ = {x, y, z, w} (left) and its partial L1 /L2 consistent
approximation in B with core {x} (right).

The b.f. with focal elements {y}, {y, z} and {x, z, w} is transformed by the mapping
\[
\{y\} \mapsto \{x\}\cup\{y\} = \{x, y\}, \qquad
\{y, z\} \mapsto \{x\}\cup\{y, z\} = \{x, y, z\}, \qquad
\{x, z, w\} \mapsto \{x\}\cup\{x, z, w\} = \{x, z, w\}
\]
into a consistent belief function with focal elements {x, y}, {x, y, z} and {x, z, w}, and the same basic probability assignment.
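The focused transformation of Corollary 19 is easy to reproduce computationally. The sketch below is our own illustration (the function name is hypothetical, and the three mass values are made up, since the example above only specifies the focal elements):

from collections import defaultdict

def focused_consistent_transformation(m_b, x):
    """Belief-space L1/L2 partial consistent approximation focused on x:
    every focal element A is mapped to A ∪ {x}; masses of A and A\{x} merge."""
    m_cs = defaultdict(float)
    for A, v in m_b.items():
        m_cs[frozenset(A) | {x}] += v
    return dict(m_cs)

# Example of Figure 13.1, Theta = {x, y, z, w}, with hypothetical masses
m_b = {frozenset('y'): 0.5, frozenset('yz'): 0.3, frozenset('xzw'): 0.2}
print(focused_consistent_transformation(m_b, 'x'))
# {x,y}: 0.5, {x,y,z}: 0.3, {x,z,w}: 0.2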
The partial solutions of the $L_1$/$L_2$ consistent approximation problem turn out to be related to the classical inner consonant approximations of a belief function b, i.e., the set of consonant belief functions co such that $co(A) \ge b(A)$ for all $A \subseteq \Theta$. Dubois and Prade [?] have proven that such approximations exist iff b is consistent. When b is not consistent, a 'focused consistent transformation' can be applied to get a new belief function $b'$ such that
\[
m_{b'}(A\cup\{x_i\}) = m_b(A) \qquad \forall A \subseteq \Theta,
\]
where $x_i$ is the element of $\Theta$ with the highest plausibility.
Corollary 19 then states that the $L_1$/$L_2$ consistent approximation onto each component $\mathcal{CS}^x$ of the consistent complex $\mathcal{CS}$ is simply the consistent transformation focused on x in the above sense.

Global optimal solution for L1 To find the global consistent approximation of b, we need to seek the partial approximation(s) $cs^x_{L_{1/2}}[b]$ at minimal distance from b, i.e., we need to minimize the distance $\|b - cs^x_{L_{1/2}}[b]\|$. In the $L_1$ case we obtain the following result.
Theorem 91. The global $L_1$ consistent approximation of a belief function b is:
\[
cs_{L_1}[b] \doteq \arg\min_{cs\in\mathcal{CS}} \|b - cs^x_{L_1}[b]\| = cs^{\hat x}_{L_1}[b],
\]
i.e., the partial approximation associated with the singleton element:
\[
\hat x = \arg\min_{x\in\Theta} \sum_{A\subseteq\{x\}^c} b(A). \tag{13.16}
\]

In the binary case $\Theta = \{x, y\}$, the condition of Theorem 91 reduces to
\[
\hat x = \arg\min_x \sum_{A\subseteq\{x\}^c} b(A) = \arg\min_x m_b(\{x\}^c) = \arg\max_x pl_b(x),
\]
and the global approximation falls on the component of the consistent complex associated with the element of maximal plausibility. Unfortunately, in the case of an arbitrary frame $\Theta$, (13.16) does not necessarily select the maximal-plausibility element:
\[
\arg\min_{x\in\Theta} \sum_{A\subseteq\{x\}^c} b(A) \ne \arg\max_{x\in\Theta} pl_b(x),
\]
as a simple counterexample can prove.



Global optimal solution for L2
Theorem 92. The global $L_2$ consistent approximation of a belief function $b : 2^\Theta \to [0,1]$ is:
\[
cs_{L_2}[b] \doteq \arg\min_{cs\in\mathcal{CS}} \|b - cs^x_{L_2}[b]\| = cs^{\hat x}_{L_2}[b],
\]
i.e., the partial approximation associated with the singleton element:
\[
\hat x = \arg\min_{x\in\Theta} \sum_{A\subseteq\{x\}^c} \big( b(A) \big)^2.
\]
Once again, in the binary case the condition of Theorem 92 reads as:
\[
\hat x = \arg\min_x \sum_{A\subseteq\{x\}^c} \big( b(A) \big)^2 = \arg\min_x \big( m_b(\{x\}^c) \big)^2 = \arg\max_x pl_b(x),
\]
and the global approximation for $L_2$ also falls on the component of the consistent complex associated with the element of maximal plausibility, while this is not generally true for an arbitrary frame.

13.3.2 L∞ consistent approximation

As observed in the binary case, for each component CS x of the consistent com-
plex the set of partial L∞ -approximations form a polytope whose center of mass is
exactly equal to the partial L1 /L2 approximation.
Theorem 93. Given an arbitrary belief function b : 2Θ → [0, 1] and an element
x ∈ Θ of its frame of discernment, its L∞ partial consistent approximation with
core containing x in the belief space CSLx ∞ ,B [mb ] is determined by the following
system of constraints:
X X
−b(xc ) − mb (B) ≤ γ(A) ≤ b(xc ) − mb (B), (13.17)
B⊆A,B6⊃{x} B⊆A,B6⊃{x}

where
. X X 
γ(A) = β(B) = mb (B) − mcs (B) . (13.18)
B⊆A,B⊇{x} B⊆A,B⊇{x}

This defines a high-dimensional ‘rectangle’ in the space of solutions {γ(A), {x} ⊆


A ( Θ}.
Corollary 20. Given a belief function $b : 2^\Theta \to [0,1]$, its partial $L_1$/$L_2$ approximation onto any given component $\mathcal{CS}^x_{\mathcal{B}}$ of the consistent complex $\mathcal{CS}_{\mathcal{B}}$ in the belief space is the geometric barycenter of the set of its $L_\infty$ consistent approximations on the same component.
As the $L_\infty$ distance between b and $\mathcal{CS}^x$ is minimal for the singleton element(s) x which minimise(s) $\|b - cs^x_{L_\infty}\|_\infty = b(\{x\}^c)$, it also follows that:
Corollary 21. The global $L_\infty$ consistent approximations of a belief function $b : 2^\Theta \to [0,1]$ in $\mathcal{B}$ form the union of the partial $L_\infty$ approximations of b onto the component(s) of the consistent complex associated with the maximal-plausibility element(s):
\[
\mathcal{CS}_{L_\infty,\mathcal{B}}[m_b] = \bigcup_{x = \arg\max pl_b(x)} \mathcal{CS}^x_{L_\infty,\mathcal{B}}[m_b].
\]

13.4 Approximations in the belief versus the mass space

Summarizing, in the mass space M:


– the partial L1 consistent approximation focussed on a certain element x of the
frame is obtained by reassigning all the mass b(xc ) outside the filter to Θ;
– the global approximation is associated, as expected, with cores containing the
maximal plausibility element(s) of Θ;
– the L∞ approximation generates a ‘rectangle’ of partial approximations, with
barycenter in the L1 partial approximation;
– the corresponding global approximations span the component(s) focussed on the element(s) x such that max_{B⊉x} m_b(B) is minimal;
– the L2 partial approximation coincides with the L1 one in the N − 2 representation;
– in the N − 1 representation the L2 partial approximation reassigns the mass outside the desired filter (b(x^c)) to each element of the filter focussed on x on an equal basis;
– the global approximations in the L2 case are more difficult to interpret.
In the belief space:
– the partial L1 and L2 approximations coincide on each component of the consistent complex, and are unique;
– for each x ∈ Θ they coincide with the consistent transformation [?] focused on x: for all events A such that {x} ⊆ A ⊆ Θ,
\[
m_{cs^x_{L_1}[b]}(A) = m_{cs^x_{L_2}[b]}(A) = m_b(A) + m_b(A\setminus\{x\});
\]
– the L1 global consistent approximation is associated with the singleton(s) x ∈ Θ such that
\[
\hat x = \arg\min_x \sum_{A\subseteq\{x\}^c} b(A),
\]
while the L2 global approximation is associated with
\[
\hat x = \arg\min_x \sum_{A\subseteq\{x\}^c} \big( b(A) \big)^2;
\]
neither appears to have a simple epistemic interpretation;



– the set of partial L∞ solutions forms a polytope on each component of the consistent complex, whose center of mass lies on the partial L1/L2 approximation;
– the global L∞ solutions fall on the component(s) associated with the maximal-plausibility element(s), and their center of mass, when such an element is unique, is the consistent transformation focused on the maximal-plausibility singleton [?].
Approximations in both the mass and the belief space reassign the total mass b(x^c) outside the filter focussed on x, although in different ways. However, mass-space consistent approximations do so either on an equal basis, or by favouring no particular focal element in the filter (i.e., by reassigning the entire mass to Θ). They do not distinguish focal elements by virtue of their set-theoretic relationships with the subsets B ⊉ x outside the filter.
In contrast, approximations in the belief space do so according to the focussed consistent transformation principle.

13.4.1 Comparison on a ternary example

For the sake of completeness, it is useful to illustrate the different approximations in the toy case of a ternary frame, $\Theta = \{x, y, z\}$. Assuming we want the consistent approximation to focus on x, Figure 13.2 illustrates the different partial consistent approximations in the simplex $Cl(m_x, m_{x,y}, m_{x,z}, m_\Theta)$ of consistent belief functions focussed on x in a ternary frame, for the belief function with masses:
\[
m_b(x) = 0.2, \quad m_b(y) = 0.1, \quad m_b(z) = 0, \quad m_b(x,y) = 0.4, \quad m_b(x,z) = 0, \quad m_b(y,z) = 0.3. \tag{13.19}
\]

This is a tetrahedron with four vertices, delimited by dark solid edges.


The set of mass-space partial L∞ approximations is represented there by the green cube. As expected, it does not entirely fall inside the tetrahedron of admissible consistent belief functions. Its barycenter (the green star) coincides with the (mass-space) L1 partial consistent approximation. The L2 approximation in the N − 2 representation also coincides, as expected, with the L1 approximation. There seems to exist a strong case for the latter, as it possesses a natural interpretation in terms of mass assignment: all the mass outside the filter is reassigned to Θ, increasing the overall uncertainty of the belief state.
The mass-space L2 partial approximation in the N − 1 representation (blue star) is distinct from the latter, but still falls inside the polytope of L∞ partial approximations and is admissible, as it falls in the interior of the simplicial component. It possesses quite a strong interpretation, as it splits the mass not in the filter focused on x equally among all subsets in the filter.
The unique L1/L2 partial approximation in B is shown as a red square. It has something in common with the $L_2^{N-2,\mathcal{M}} = L_1^{\mathcal{M}} = L_\infty^{\mathcal{M}}$ approximation (green star), as they both fall on the same face of the simplex of consistent belief functions focussed on x (highlighted in yellow): they assign zero mass to {x, z}, which fails to be supported by any focal element of the original belief function.
456 13 Consistent approximation

Fig. 13.2. The simplex (solid black tetrahedron) Cl(mx , mx,y , mx,z , mΘ ) of consistent be-
lief functions focussed on x on the ternary frame Θ = {x, y, z}, and the associated Lp partial
consistent approximations for the example belief function (13.19).

Chapter appendix
Proof of Theorem 89

In the N − 2 representation, by (13.8) we have that β(B) = 0, i.e., mcs (B) =


mb (B) ∀B ⊇ {x}, B 6= Θ. By normalization we get mcs (Θ) = mb (Θ) + mb (xc ):
but this is exactly the L1 approximation (13.6).
In the N − 1 representation, by (13.10) we have that mcs (B) = mcs (x) + mb (B) −
mb (x) for all B ) {x}. By normalizing we get:
X X
mcs (B) = mcs (x) + mcs (B)
{x}⊆B⊆Θ {x}(B⊆Θ
= 2|Θ|−1 mcs (x) + plb (x) − 2|Θ|−1 mb (x) = 1,

i.e., mcs (x) = mb (x) + (1 − plb (x))/2|Θ|−1 , as there are 2|Θ|−1 subsets in the
ultrafilter containing x.
By replacing the value of mcs (x) into the first equation we get (13.11).

Proof of Corollary 18

If we apply the linear transformation (13.15) to the system (13.12) we get:


X X X 
β(C)hbB , bC i + mb (C)hbB , bC i (−1)|B\A| =
B⊇A C6⊃{x}
XC⊇{x} X X X
= β(C) hbB , bC i(−1)|B\A| + mb (C) hbB , bC i(−1)|B\A|
C⊇{x} B⊇A C6⊃{x} B⊇A
13.4 Approximations in the belief versus the mass space 457

∀A : {x} ⊆ A ( Θ. Therefore by Lemma 17 we obtain:



 X X
βC + mb (C) = 0 ∀A : {x} ⊆ A ( Θ,

C⊇{x},C⊆A C6⊃{x},C⊆A

i.e. the system of equations (13.13).

Proof of Lemma 17

We first note that, by definition of categorical belief function bA :


X X c
hbB , bC i = 1= 1 = 2|(B∪C) | − 1.
D⊇B,C;D6=Θ E((B∪C)c

Therefore:
X X c
hbB , bC i(−1)|B\A| = (2|(B∪C) | − 1)(−1)|B\A|
B⊆A B⊆A
X c X
= 2|(B∪C) | (−1)|B\A| − (−1)|B\A| (13.20)
B⊆A B⊆A
X c
= 2|(B∪C) | (−1)|B\A|
B⊆A

|B\A|
X X c
as (−1)|B\A| = 1|A |−k
(−1)k = 0 by Newton’s binomial (8.4).
B⊆A k=0
As both B ⊇ A and C ⊇ A the set B can be decomposed into the disjoint sum
B = A + B 0 + B 00 , where ∅ ⊆ B 0 ⊆ C \ A, ∅ ⊆ B 00 ⊆ (C ∪ A)c .
The quantity (13.20) can then be written as:
X X c 00 0 00
2|(A∪C)| −|B | (−1)|B |+|B | =
∅⊆B 0 ⊆C\A ∅⊆B 00 ⊆(C∪A)c
0 00 c
−|B 00 |
X X
= (−1)|B |
(−1)|B | 2|(A∪C)| ,
∅⊆B 0 ⊆C\A ∅⊆B 00 ⊆(C∪A)c

where
00 c
−|B 00 | c c
X
(−1)|B | 2|(A∪C)| = [2 + (−1)]|(A∪C)| = 1|(A∪C)| = 1,
∅⊆B 00 ⊆(C∪A)c

again by Newton’s binomial. We obtain therefore:


X 0
(−1)|B | ,
∅⊆B 0 ⊆C\A

which is nil when C \ A 6= ∅ and equal to 1 when C \ A = ∅, i.e., for C ⊆ A.


458 13 Consistent approximation

Proof of Theorem 91

The L1 distance between the partial approximation and b can be easily computed
as: kb − csxL1 [b]kL1 =
X
= |b(A) − csxL1 [b](A)|
A⊆Θ
X X X
= b(A) −
|b(A) − 0| + mcs (B)

A6⊃{x}
X X A⊇{x}
X B⊆A,B⊇{x}
X
= b(A) + mb (B) − (mb (B) + mb (B \ {x}))


A6⊃{x} A⊇{x} B⊆A B⊆A,B⊇{x}
X X X X
= b(A) + m (B) − mb (B \ {x})

b
A6⊃{x} A⊇{x} B⊆A,B6⊃{x} B⊆A,B⊇{x}
X X X X
= b(A) + mb (C) − mb (C)


A6⊃{x} A⊇{x} C⊆A\{x} C⊆A\{x}
X X
= b(A) = b(A).
A6⊃{x} A⊆{x}c

Proof of Theorem 92

The L2 distance between the partial approximation and b can be computed as: kb −
csxL2 [b]k2 =
X XX X 2
= (b(A) − csxL2 [b](A))2 = mb (B) − mcs (B)
A⊆Θ A⊆Θ B⊆A B⊆A,B⊇{x}
XX X X 2
= mb (B) − mb (B) − mb (B \ {x})
A⊆Θ B⊆A B⊆A,B⊇{x} B⊆A,B⊇{x}
X X  X X 2
2
= (b(A)) + mb (B) − mb (B \ {x})
A6⊃{x} A⊇{x} B⊆A,B6⊃{x} B⊆A,B⊇{x}
X X  X X 2
2
= (b(A)) + mb (C) − mb (C)
A6⊃{x} A⊇{x} C⊆A\{x} C⊆A\{x}

X X
so that kb − csxL2 [b]k2 = (b(A))2 = (b(A))2 .
A6⊃{x} A⊆{x}c

Proof of Theorem 93

The desired set of approximations is, by definition:


 X 
X
x

CSL∞ [b] = arg min max mb (B) − mcs (B) .
mcs (.) A(Θ
B⊆A B⊆A,B⊇{x}
13.4 Approximations in the belief versus the mass space 459

The quantity maxA(Θ has as lower limit the value associated with the largest norm
which does not depend on mcs (.), i.e.:
 X X 
mcs (B) ≥ b({x}c ).

max mb (B) −
A(Θ
B⊆A B⊆A,B⊇{x}

Equivalently, we can write:


 X X 
mb (B) ≥ b({x}c ).

max β(B) +
A(Θ
B⊆A,B⊇{x} B⊆A,B6⊃{x}

In the above constraint only the expressions associated with A ⊇ {x} contain vari-
able terms β(B). Therefore, the desired optimal values are such that:

 X X
mb (B) ≤ b({x}c )

β(B) + {x} ⊆ A ( Θ.

B⊆A,B⊇{x} B⊆A,B6⊃{x}
(13.21)
After introducing the change of variables (13.18), system (13.21) reduces to:

 X

mb (B) ≤ b({x}c )

γ(A) +
{x} ⊆ A ( Θ

B⊆A,B6⊃{x}

whose solution is (13.17).

Proof of Corollary 20

The centre of mass of the set (13.17) of solutions to the L∞ consistent approxima-
tion problem is given by:
X
γ(A) = − mb (B), {x} ⊆ A ( Θ;
B⊆A,B6⊃{x}

which reads in the space of the variables {β(A), {x} ⊆ A ( Θ} as



 X X
β(B) = − mb (B), {x} ⊆ A ( Θ.

B⊆A,B⊇{x} B⊆A,B6⊃{x}

But this is exactly the linear system (13.13) which determines the L1 /L2 consistent
approximation csxL1/2 [b] of b onto CS x .
Part IV

A geometric approach to uncertainty


14
Geometric inference

463
15
Geometric conditioning

As we have seen in Chapter 4, Section 4.3, various distinct definitions of conditional


belief functions have been proposed in the past. As Dempster’s original approach
[342] in his multi-valued mapping framework was almost immediately and strongly
criticised, a string of subsequent proposals [802, 169, 464, 661, 516, 353, 1526,
651, 1327] in different mathematical setups was brought forward. A good overview
of such approaches has been given in Section 4.3.
Quite recently, the idea of formulating the problem geometrically [111, 317,
905] has emerged. Lehrer [827], in particular, has proposed such a geometric ap-
proach to determine the conditional expectation of non-additive probabilities (such
as belief functions).
As we extensively appreciated in Chapters 12 and 13, the use of norm minimisation
for posing and solving problems such as probability and possibility transform has
been gaining traction. In a similar way, conditional belief functions can be defined
by minimising a suitable distance function between the original b.f. b and the ‘con-
ditioning simplex’ BA associated with the conditioning event A, i.e., the set of belief
functions whose b.p.a. assigns mass to subsets of A only:

bd (.|A) = arg min


0
d(b, b0 ). (15.1)
b ∈BA

Such geometrical approach to conditioning contributes to our arsenal of approaches


to the problem and, as we will see in this Chapter, is a promising candidate to the
role of general framework for conditioning.
As in the consonant and consistent approximation problem, any dissimilarity
measure [399, 669, 685, 1191, 713] could be in principle plugged in the above min-
imization problem (15.1) to define conditional belief functions. In [256] the author

465
466 15 Geometric conditioning

of this Book has computed some conditional belief functions generated by minimis-
ing Lp norms in the ‘mass space’ (cfr. Chapter 12, Section 15.5), where b.f.s are
represented by the vectors of their basic probabilities.

Chapter content

In this Chapter we explore the geometric conditioning approach in both the mass
space M and the belief space B, in which belief functions are represented by the
vectors of their belief values b(A) (Chapter 6). We adopt once again distance mea-
sures d of the classical Lp family, as a first step towards a complete analysis of the
geometric approach to conditioning. We show that geometric conditional b.f.s in B
are more complex than in the mass space, less naive objects whose interpretation in
terms of degrees of belief is however less natural.

Conditioning in the mass space In summary, the L1 -conditional belief functions


in M with conditioning event A form a polytope in which each vertex is the b.f.
obtained by re-assigning the entire mass not contained in A to a single subset of A,
{B ⊆ A}. In turn, the unique L2 conditional b.f. is the barycenter
P of this polytope,
i.e., the belief function obtained by re-assigning the mass B6⊂A m(B) to each
focal element {B ⊆ A} on an equal basis. Such results can be interpreted as a
generalization of Lewis’ imaging approach to belief revision, originally formulated
in the context of probabilities [841].
The idea behind imaging is that, upon observing that some state x ∈ Θ is im-
possible, you transfer the probability initially assigned to x completely towards the
remaining state you deem the most similar to x [1032]. Peter Gärdenfors [502] ex-
tended Lewis’ idea by allowing a fractionPλi of the probability of such state x to be
re-distributed to all remaining states xi ( i λi = 1).
In the case of belief functions, the mass m(C) of each focal element not included
in A should be re-assigned to the ‘closest’ focal element in {B ⊆ A}. If no in-
formation on the similarity between focal elements is available or make sense in
a particular context, ignorance translates into allowing all possible set of weights
λ(B) for Gärdenfors’ (generalized) belief revision by imaging. This yields the set
of L1 conditional b.f.s. If such ignorance is expressed by assigning instead equal
weight λ(B) to each B ⊆ A, the resulting revised b.f. is the unique L2 conditional
b.f., the barycenter of the L1 polytope.

Conditioning in the belief space Conditional belief functions in the belief space
seem to have rather less straightforward interpretations than the corresponding
quantities in the mass space. The barycenter of the set of L∞ conditional belief
functions can be interpreted as follows: the mass of all the subsets whose intersec-
tion with A is C ( A is re-assigned by the conditioning process half to C, and half
to A itself. While in the M case the barycenter of L1 conditional b.f.s is obtained
by reassigning the mass of all B 6⊂ A to each B ( A on equal grounds, for the
barycenter of L∞ conditional b.f.s in B normalization is achieved by adding or sub-
tracting their masses according to the cardinality of C (even or odd). As a result, the
15.1 Conditioning in belief calculus: a concrete scenario 467

obtained mass function is not necessarily non-negative: again, such version of ge-
ometrical conditioning may generated pseudo belief functions. Furthermore, while
being quite similar to it, the L2 conditional belief function in B is distinct from the
barycenter of the L∞ conditional b.f.s.
In the L1 case, not only the resulting conditional pseudo belief functions are not
guaranteed to be proper belief functions, but it appears difficult to find simple inter-
pretations for these results in terms of degrees of belief.
A number of interesting cross relations between conditional belief functions of
the two representation domains appear to exist from an empirical comparison, and
remain to be investigated further.

Chapter outline

We commence by illustrating in Section 15.1 the crucial role of conditioning in a


real world scenario drawn from an important computer vision application, data as-
sociation, and an approach to this problem based on belief calculus. In Section 15.2
we define the notion of geometric conditional belief function. In Section 15.3 we
pick the ‘mass’ representation of b.f.s, and prove the analytical forms of the L1 , L2
and L∞ conditional belief functions in M, respectively. We discuss their interpre-
tation in terms of degrees of belief (Section 15.3.5), and hint to an interesting link
with Lewis’ imaging [841], when generalized to belief functions (Section 15.3.6).
Section 15.4 is dedicated to the derivation of geometric conditional belief functions
in the belief space. In Section 15.4.1 we prove the analytical form of L2 conditional
b.f.s in B, and propose a preliminary interpretation for them. We do the same in
Sections 15.4.2 and 15.4.3 for L1 and L∞ conditional belief functions in the be-
lief space, respectively. We conclude the Chapter with a critical discussion of the
obtained results. In Section 15.5 a comparison of conditioning in mass and belief
space is run, with the help of the usual ternary case study.
Finally, in Section 15.6 a number of future developments for the geometric ap-
proach to conditioning are discussed, outlining a programme of research for the
future.

15.1 Conditioning in belief calculus: a concrete scenario


15.1.1 Model-based data association

The data association problem is one of the more intensively studied computer vi-
sion applications for its important role in the implementation of automated defense
systems, and its connections to the classical field of ‘structure from motion’, i.e., the
reconstruction of a rigid scene from a sequence of images.
A number of feature points moving in the 3D space are followed by one or more
cameras and appear in an image sequence as ‘unlabeled’ points (i.e. we do not know
the correspondences between points appearing in two consecutive frames). A typical
example consists of a set of markers set at fixed positions on a moving articulated
468 15 Geometric conditioning

body (e.g., a human body): in order to reconstruct the trajectory of the cloud of
markers (or of the underlying body) we need to associate feature points belonging
to pairs of consecutive images, Ik and Ik+1 .
A classical approach to the data association problem called joint probabilistic
data association filter (JPDAF) [58], is based on tuning a number of Kalman filters
(each associated with a single feature point), whose aim is to predict the future po-
sition of each target, in order to produce the most probable labeling of the cloud of
points in the next image.
Unfortunately, the JPDAF method suffers from a number of drawbacks: for exam-
ple, when several features converge to a small region (‘coalescence’ [116]) the al-
gorithm cannot tell them apart. Several techniques have been proposed to overcome
this sort of problems [648].
However, assume that the feature points represent fixed locations {Mi , i =
1, ..., M } on an articulated body, and that we know the rigid motion constraints
between pairs of markers. This is equivalent to possessing a topological model of
the articulated body, represented by an undirected graph whose edges correspond
to rigid motion constraints. We can then exploit such a-priori information to solve
the association task in critical situations where several points fall into the validation
region of a single filter.
A topological model of the body to track, for instance, can provide:
– a prediction constraint, encoding the likelihood of a measurement mki at time k
of being associated with a measurement mk−1 i of the previous image;
– an occlusion constraint, expressing the chance that a given marker of the model
is occluded in the current image;
– a metric constraint, representing the knowledge of the lengths of the links, which
can be learned from the history of the past associations;
– a rigid motion constraint on pairs of markers.
Belief calculus provides a coherent framework in which to combine all these sources
of information, and cope with possible conflicts. Indeed all these constraints can be
expressed as belief functions over a suitable frame of discernment, namely set of
possible associations mi ↔ mj between feature points.

15.1.2 Rigid motion constraints as conditional belief functions

By reflecting on the nature of the above constraints we can note that the information
carried by predictions of filters and occlusions inherently concerns associations be-
tween feature points belonging to consecutive images, while other conditions (such
as the metric constraint) can be expressed instantaneously in the frame of the cur-
rent time-k associations. Finally, a number of bodies of evidence depend on the
model-measurement associations mk−1 i ↔ Mj at the previous time step. This is the
case of belief functions encoding the information carried by the motion of the body,
expression of rigid motion constraints.
We can then introduce the frame of discernment of past model-to-feature asso-
ciations:
15.1 Conditioning in belief calculus: a concrete scenario 469

Fig. 15.1. Rigid motion constraints in the data association problem involve the combination
of a set of conditional belief functions in each partition of the joint association space in a
single total function.

k−1 .
ΘM = {mk−1
i ↔ Mj , ∀i = 1, ..., nk−1 ∀j = 1, ..., M, }

the feature-to-feature association frame:


.
Θkk−1 = {mk−1
i ↔ mkj , ∀i = 1, ..., nk−1 ∀j = 1, ..., nk },

and the frame of current model-to-feature associations:


k .
ΘM = {mki ↔ Mj , ∀i = 1, ..., nk ∀j = 1, ..., M }, (15.2)

where nk is the number of feature points {mki } appearing in image Ik (see Figure
15.1.
All the available pieces of evidence can be combined on the ‘minimal refinement’
k−1
(see [1149] of [?] of all these frames, the product association frame ΘM ⊗ Θkk−1 .
k
The result is later projected onto the current association set ΘM in order to yield the
best current estimate.
Crucially, rigid motion constraints can be expressed in a conditional way only:
hence, the computation of a belief estimate of the desired current model-measurement
associations (15.2) involves combining conditional belief functions defined over the
product association frame.
The purpose of this Chapter is to study how such conditional belief functions can be
induced by minimizing geometric distances between belief measures.
470 15 Geometric conditioning

15.2 Geometric conditional belief functions


Given an arbitrary conditioning event A ⊆ Θ, the vector ma associated with any
belief function a whose mass supports only focal elements {∅ ( B ⊆ A} included
in a given event A can be decomposed as:
X
ma = ma (B)mB . (15.3)
∅(B⊆A

The set of all such vectors is the simplex:


.
MA = Cl(mB , ∅ ( B ⊆ A).

The same is true in the belief space, where (the vector a associated with) each b.f. a
assigning mass to focal elements included in A only is decomposable as:
X
a= a(B)bB .
∅(B⊆A

.
These vectors live in a simplex BA = Cl(bB , ∅ ( B ⊆ A). We call MA and BA
the conditioning simplices in the mass and the belief space, respectively.

Definition 85. Given a belief function b : 2Θ → [0, 1], we call geometric condi-
tional belief function induced by a distance function d in M (B) the belief func-
tion(s) bd,M (.|A) (bd,B (.|A)) on Θ which minimize(s) the distance d(mb , MA )
(d(b, BA )) between the mass (belief) vector representing b and the conditioning
simplex associated with A in M (B).

As recalled above, a large number of proper distance functions or mere dissimilarity


measures between belief functions have been proposed in the past, and many others
can be conjectured or designed [685]. As we did in Chapters 12 and 13 we consider
here as distance functions the three major Lp norms d = L1 , d = L2 and d = L∞
in both the mass (12.15) and belief (12.16) spaces .

15.3 Geometric conditional belief functions in M


15.3.1 Conditioning by L1 norm

Given a belief function b with basic probability assignment mb collected in a vec-


tor mb ∈ M, its L1 conditional version(s) bL1 ,M (.|A) has/have basic probability
assignment mL1 ,M (.|A) such that:
.
mL1 ,M (.|A) = arg min kmb − ma kL1 . (15.4)
ma ∈MA

Using the expression (12.15) of the L1 norm in the mass space M, (15.4) becomes:
15.3 Geometric conditional belief functions in M 471
X
arg min kmb − ma kL1 = arg min |mb (B) − ma (B)|.
ma ∈MA ma ∈MA
∅(B⊆Θ

By exploiting the fact that the candidate solution ma is an element of MA (Equation


(15.3)) we can greatly simplify this expression.

Lemma 18. The difference vector mb − ma in M has the form:


X  X 
mb − ma = β(B)mB + b(A) − 1 − β(B) mA
∅(B(A
X ∅(B(A (15.5)
+ mb (B)mB
B6⊂A

.
where β(B) = mb (B) − ma (B).

Theorem 94. Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty


focal element ∅ ( A ⊆ Θ, the set of L1 conditional belief functions bL1 ,M (.|A) with
respect to A in M is the set of b.f.s with core in A such that their mass dominates
that of b over all the proper subsets of A:
n o
bL1 ,M (.|A) = a : 2Θ → [0, 1] : Ca ⊆ A, ma (B) ≥ mb (B) ∀∅ ( B ⊆ A .
(15.6)

Geometrically, the set of L1 conditional belief function in M has the form of a


simplex.

Theorem 95. Given a b.f. b : 2Θ → [0, 1] and an arbitrary non-empty focal element
∅ ( A ⊆ Θ, the set of L1 conditional belief functions bL1 ,M (.|A) with respect to A
in M is the simplex

ML1 ,A [b] = Cl(m[b]|B


L1 A, ∅ ( B ⊆ A)

whose vertex m[b]|B


L1 A, ∅ ( B ⊆ A, has coordinates {ma (B)} such that

 ma (B) = mb (B) + 1 − b(A) = mb (B) + plb (Ac ),

ma (X) = mb (X) ∀∅ ( X ( A, X 6= B.

(15.7)

It is important to notice that all the vertices of the L1 conditional simplex fall inside
MA proper (as the mass assignment (15.7) is non-negative for all subsets X). A
priori, some of them could have belonged to the linear space generated by MA
but outside the simplex MA (i.e., some of the solutions ma (B) could have been
negative). This is indeed the case for geometrical belief functions induced by other
norms, as we will see in the following.
472 15 Geometric conditioning

15.3.2 Conditioning by L2 norm

Let us now compute the analytical form of the L2 conditional belief function(s) in
the mass space. We make use of the form (15.5) of the difference vector mb − ma ,
where again ma is an arbitrary vector of the conditional simplex MA . As usual,
rather than minimising the norm of the difference we seek the point of conditioning
simplex such that the difference vector is orthogonal to all the generators of a(MA ).

Theorem 96. Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty


focal element ∅ ( A ⊆ Θ, the unique L2 conditional belief function bL2 ,M (.|A)
with respect to A in M is the b.f. whose basic probability assignment redistributes
the mass 1 − b(A) to each focal element B ⊆ A in an equal way: ∀∅ ( B ⊆ A

1 X plb (Ac )
mL2 ,M (B|A) = mb (B) + mb (B) = mb (B) + |A| .
2|A| − 1 B6⊂A 2 − 1 (15.8)

According to Equation (15.8) the L2 conditional belief function is unique, and corre-
sponds to the mass function which redistributes the mass the original belief function
assigns to focal elements not included in A to each and all the subsets of A in an
equal, even way.
L2 and L1 conditional belief functions in M exhibit a strong relationship.
Theorem 97. Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty
focal element ∅ ( A ⊆ Θ, the L2 conditional belief function bL2 ,M (.|A) with
respect to A in M is the center of mass of the simplex ML1 ,A [b] of L1 conditional
belief functions with respect to A in M.

Proof. By definition the center of mass of ML1 ,A [b], whose vertices are given by
(15.7), is the vector
1 X
|A|
m[b]|B
L1 A
2 −1
∅(B⊆A

1 h
|A|
i
whose entry B is given by mb (B)(2 − 1) + (1 − b(A)) , i.e., (15.8).
2|A| − 1

15.3.3 Conditioning by L∞ norm

Similarly, we can use Equation (15.5) to minimize the L∞ distance between the
original mass vector mb and the conditioning subspace MA .

Theorem 98. Given a belief function b : 2Θ → [0, 1] with b.p.a. mb , and an ar-
bitrary non-empty focal element ∅ ( A ⊆ Θ, the set of L∞ conditional belief
functions mL∞ ,M (.|A) with respect to A in M forms the simplex:

ML∞ ,A [b] = Cl(m[b]|B̄


L∞ A, B̄ ⊆ A)

with vertices
15.3 Geometric conditional belief functions in M 473

m[b]|B̄
(
L∞ (B|A) = mb (B) + max mb (C) ∀B ⊆ A, B 6= B̄
C6⊂A
|A|
m[b]|B̄
P
L∞ (B̄|A) = mb (B̄) + C6⊂A mb (C) − (2 − 2) maxC6⊂A mb (C),
(15.9)
whenever:
1 X
max mb (C) ≥ |A| mb (C). (15.10)
C6⊂A 2 − 1 C6⊂A

It reduces to the single belief function:


1 X
mL∞ ,M (B|A) = mb (B) + mb (C) ∀B ⊆ A
2|A| − 1 C6⊂A

whenever:
1 X
max mb (C) < mb (C). (15.11)
C6⊂A 2|A|− 1 C6⊂A

The latter is the barycenter of the simplex of L∞ conditional b.f.s in the former case,
and coincides with the L2 conditional belief function (15.8).

Note that, as (15.9) is not guaranteed to be non-negative, the simplex of L∞


conditional belief functions in M does not necessarily fall entirely inside the con-
ditioning simplex MA , i.e., it may include pseudo belief functions. Its vertices are
obtained by assigning the maximum mass not in the conditioning event to all its
subsets indifferently. Normalization is then achieved, rather than by normalization
(as in Dempster’s rule) by subtracting of the total mass in excess of 1 in the spe-
cific component B̄. This behavior is exhibited by other geometric conditional belief
functions as shown in the following.

15.3.4 A case study: the ternary frame

If |A| = 2, A = {x, y}, the conditional simplex is 2-dimensional, with three vertices
mx , my and mx,y . For a b.f. b on Θ = {x, y, z} Theorem 94 states that the vertices
of the simplex ML1 ,A of L1 conditional belief functions in M are:
{x}
m[b]|L1 {x, y} = [mb (x) + plb (z), mb (y), mb (x, y) ]0 ,
{y}
m[b]|L1 {x, y} = [mb (x), mb (y) + plb (z), mb (x, y) ]0 ,
{x,y}
m[b]|L1 {x, y} = [mb (x), mb (y), mb (x, y) + plb (z) ]0 .

Figure 15.2 shows such simplex in the case of a belief function b on the ternary
frame Θ = {x, y, z} and basic probability assignment

m = [0.2, 0.3, 0, 0, 0.5, 0]0 , (15.12)

i.e., mb (x) = 0.2, mb (y) = 0.3, mb (x, z) = 0.5.


In the case of the belief function (15.12) of the above example, by Equation
(15.8) its L2 conditional belief function in M has b.p.a.:
474 15 Geometric conditioning

Fig. 15.2. The simplex (solid red triangle) of L1 conditional belief functions in M associated
with the belief function with mass assignment (15.12) in Θ = {x, y, z}. The related unique
L2 conditional belief function in M is also plotted as a red square. It coincides with the center
of mass of the L1 set. The set of L∞ conditional (pseudo) belief functions is also depicted
(green triangle).

1 − b(x, y) plb (z) plb (z)


m(x) = mb (x) + = mb (x) + , m(y) = mb (y) + ,
3 3 3
plb (z)
m(x, y) = mb (x, y) + .
3
(15.13)
Figure 15.2 visually confirms that such L2 conditional belief function lies in the
barycenter of the simplex of the related L1 conditional b.f.s.
For what concerns L∞ conditional belief functions, the b.f. (15.12) is such that
n o
max mb (C) = max mb (z), mb (x, z), mb (y, z), mb (Θ) = mb (x, z)
C6⊂A
1 X 1 0.5
= 0.5 ≥ |A| mb (C) = mb (x, z) = .
2 −1 C6⊂A
3 3

We hence fall under condition (15.10), and there is a whole simplex of L∞ con-
ditional belief function (in M). According to Equation (15.9) such simplex has
2|A| − 1 = 3 vertices, namely (taking into account the nil masses in (15.12)):
{x}
m[b]|L∞ ,M {x, y} = [mb (x) − mb (x, z), mb (y) + mb (x, z), mb (x, z) ]0 ,
{y}
m[b]|L∞ ,M {x, y} = [mb (x) + mb (x, z), mb (y) − mb (x, z), mb (x, z) ]0 ,
{x,y}
m[b]|L∞ ,M {x, y} = [mb (x) + mb (x, z), mb (y) + mb (x, z), −mb (x, z) ]0 .
(15.14)
We can notice that the set of L∞ conditional (pseudo) b.f.s is not entirely admis-
sible, but its admissible part contains the set of L1 conditional b.f.s, which amounts
therefore a more ‘conservative’ approach to conditioning. Indeed, the latter is the
triangle inscribed in the former, determined by its median points. Note also that
both the L1 and L∞ simplices have the same barycenter in the L2 conditional b.f.
(15.13).
15.3 Geometric conditional belief functions in M 475

15.3.5 Features of geometric conditional belief functions in M

From the analysis of geometric conditioning in the space of mass functions M a


number of facts arise:
– Lp conditional b.f.s, albeit obtained by minimizing purely geometric distances,
possess very simple and elegant interpretations in terms of degrees of belief;
– while some of them correspond to pointwise conditioning, some others form en-
tire polytopes of solutions whose vertices also have simple interpretations;
– conditional belief functions associated with the major L1 , L2 and L∞ norms are
closely related to each other;
– in particular, while distinct, both the L1 and L∞ simplices have barycenter in (or
coincide with, in case 2) the L2 conditional b.f.;
– they are all characterized by the fact that, in the way they re-assign mass from
focal elements B 6⊂ A not in A to focal elements in A, they do not distinguish
between subsets which have non-empty intersection with A and those which have
not.
The last point is quite interesting: mass-based geometric conditional b.f.s do not
seem to care about the contribution focal elements make to the plausibility of the
conditioning event A, but only to whether they contribute or not to the degree of
belief of A. The reason is, roughly speaking, that in mass vectors mb the mass of a
P entry of mb . In opposition,
given focal element appears only in the corresponding
belief vectors b are such that each entry b(B) = X⊆B mb (X) of theirs contains
information about the mass of all the subsets of B. As a result, it is to be expected
that geometric conditioning in the belief space B will see the mass redistribution
process function in a manner linked to the contribution of each focal element to the
plausibility of the conditioning event A.
We will see this in detail in Section 15.4.

15.3.6 Interpretation as general imaging for belief functions

Just as we did in Section 12.5.6 for consonant approximations in the mass space, we
can provide an interesting interpretation of geometric conditional belief functions in
the mass space in the framework of the ‘imaging’ approach [1032].
Suppose we briefly glimpse at a transparent urn filled with black or white balls,
and are asked to assign a probability value to the possible ‘configurations’ of the
urn. Suppose also that we are given three options: 30 black balls and 30 white balls
(state a); 30 black balls and 20 white balls (state b); 20 black balls and 20 white
balls (state c). Hence, Θ = {a, b, c}. Since the observation only gave us the vague
impression of having seen approximately the same number of black and white balls,
we would probably deem the states a and c equally likely, but at the same time we
would tend to deem the event ‘a or c’ twice as likely as the state b. Hence, we assign
probability 1/3 to each of the states. Now, we are told that state c is false. How do
we revise the probabilities of the two remaining states a and b?
Lewis [841] argued that, upon observing that a certain state x ∈ Θ is impossible,
476 15 Geometric conditioning

we should transfer the probability originally allocated to x to the remaining state


deemed the ‘most similar’ to x. In this case, a is the state most similar to c, as they
both consider an equal number of black and white balls. We obtain (2/3, 1/3) as
probability values of a and b, respectively.
Gärdenfors further extended Lewis’ idea (‘general imaging’) by allowing to trans-
fer a part λ of the probability 1/3, initially assigned to c, towards state a, and the
remaining part 1 − λ to state b. These fractions should be independent of the initial
probabilistic state of belief.
What happens when our state of belief is described by a belief function, and we
are told that A is true? In the general imaging framework we need to re-assign the
mass m(C) of each focal element not included in A to all the focal elements B ⊆ A,
according to some weights {λ(B), B ⊆ A}.
Suppose there is no reason to attribute larger weights to any focal element in A.
One option is to represent our complete ignorance about the similarities between
C and each B ⊆ A as a vacuous belief function on the set of weights. If applied
to all the focal elements C not included in A, this results in an entire polytope of
revised belief functions, each associated with an arbitrary normalized weighting. It
is not difficult to see that this coincides with the set L1 conditional belief functions
bL1 ,M (.|A) of Theorem 94.
On the other hand, we can represent the same ignorance as a uniform probability
distribution on the set of weights {λ(B), B ⊆ A}, for all C 6⊂ A. Again, it is easy
to see that general imaging produces in this case a single revised belief function, the
L2 conditional belief functions bL2 ,M (.|A) of Theorem 96.
As a final remark, the ‘information order independence’ axiom of belief revision
states that the revised belief should not depend on the order in which the information
is made available. In our case, the revised (conditional) b.f.s obtained by observing
first an event A and later another event A0 should be the same as the ones obtained
by revising first with respect to A0 and then A. Both the L1 and L2 geometric con-
ditioning operators presented here meet such axiom, supporting the case for their
rationality.

15.4 Geometric conditioning in the belief space


To analyse the problem of geometric conditioning by projecting a belief function
b represented by the corresponding vector b of belief values onto an appropriate
conditioning simplex BA = Cl(bB , ∅ ( B ⊆ A) let us write the difference vector
between b and an arbitrary point a of BA as:
X X
vecb − a = mb (B)bB − ma (B)bB
∅(B⊆Θ
X ∅(B⊆A X
= (mb (B) − ma (B))bB + mb (B)bB (15.15)
∅(B⊆A B6⊂A
X X
= β(B)bB + mb (B)bB ,
∅(B⊆A B6⊂A
15.4 Geometric conditioning in the belief space 477

where once again β(B) = mb (B) − ma (B).

15.4.1 L2 conditioning in B

We start with the L2 norm, as this seems to have a more straightforward interpreta-
tion in the belief space.

Theorem 99. Given a belief function b : 2Θ → [0, 1] with b.p.a. mb , and an arbi-
trary non-empty focal element ∅ ( A ⊆ Θ, the L2 conditional b.f. bL2 ,B (.|A) with
respect to A in the belief space B is unique, and has basic probability assignment:
X X
mL2 ,B (C|A) = mb (C) + mb (B ∪ C)2−|B| + (−1)|C|+1 mb (B)2−|B|
B⊆Ac B⊆Ac
(15.16)
for each proper subset ∅ ( C ( A of the event A.

Example: the ternary frame In the ternary case the unique L2 mass space- condi-
tional belief function has b.p.a. ma such that:

mb (z) + mb (x, z)
ma (x) = mb (x) + ,
2
mb (z) + mb (y, z)
ma (y) = mb (y) + , (15.17)
2
mb (x, z) + mb (y, z)
ma (x, y) = mb (x, y) + mb (Θ) + .
2
At a first glance, each focal element B ⊆ A seems to be assigned to a fraction of
the original mass mb (X) of all focal elements X of b such that X ⊆ B ∪ Ac . This
contribution seems proportional to the size of X ∩ Ac , i.e., how much the focal
element of b falls outside the conditioning event A.
Notice that Dempster’s conditioning b⊕ (.|A) = b ⊕ bA yields in this case:

mb (x) + mb (x, z) mb (x, y) + mb (Θ)


m⊕ (x|A) = , m⊕ (x, y|A) = .
1 − mb (z) 1 − mb (z)

L2 conditioning in the belief space differs from its ‘sister’ operation in the mass
space (Theorem 96) in that it makes use of the set-theoretic relations between focal
elements, just as Dempster’s rule does. However, contrarily to Dempster’s condi-
tioning, it does not apply any normalization, as even subsets of Ac ({z} in this case)
contribute as addenda to the mass of the resulting conditional belief function.

Interpretation As for the general case (15.16), we can notice that the (unique) L2
conditional belief function in the belief space is not guaranteed to be a proper belief
function, as some masses can be negative, due to the addendum
X
(−1)|C|+1 mb (B)2−|B| .
B⊆Ac
478 15 Geometric conditioning

The quantity shows, however, an interesting connection with the redistribution pro-
cess associated with the orthogonal projection π[b] of a belief function onto the
probability simplex ([267], Section 10.4), in which the mass of each subset A is
re-distributed among all its subsets B ⊆ A on an equal basis.
Here (15.16), the mass of each focal element not included in A is also broken
into 2|B| parts, equal to the number of its subsets. Only one such part is re-attributed
to C = B ∩ A, while the rest is re-distributed to A itself.

15.4.2 L1 conditioning in B

To discuss L1 conditioning in the belief space we need to write explicitly the differ-
ence vector b − a.
Lemma 19. The L1 norm of the difference vector b − a can be written as
X
kb − akL1 = γ(B ∩ A) + b(B) − b(B ∩ A)

∅(B∩A(A

so that the L1 conditional belief functions in B are the solutions of the following
minimization problem:
X
arg minγ kb − akL1 = arg min γ(B ∩ A) + b(B) − b(B ∩ A) ,

γ
∅(B∩A(A
P
where β(B) = mb (B) − ma (B) and γ(B) = C⊆B β(B).
As we also noticed in the L1 minimization problem in the mass space, each group of
addenda which depend on the same variable γ(X), ∅ ( X ( A, can be minimized
separately. Therefore, the set of L1 conditional belief functions in the belief space
B is determined by the following minimization problem:
X
arg min γ(X) + b(B) − b(X) ∀∅ ( X ( A. (15.18)

γ(X)
B:B∩A=X

The functions appearing in (15.18) are of the form |x+k1 |+...+|x+km |, where m
is even. Such functions are minimized by the interval determined by the two central
‘nodes’ −kint1 ≤ −kint2 (see Figure 15.3 for an example, and compare the proof
of Theorem 81, Chapter 12).
In the case of system (15.18) this yields:
X X
b(X) − b(Bint 1
) ≤ γ(X) ≤ b(X) − b(Bint2
), (15.19)
X X
where Bint 1
and Bint 2
are the central, median values of the collection {b(B), B ∩
A = X}. Unfortunately, it is not possible, in general, to determine the median
values of such a collection of belief values, as belief functions are defined on a
partially (rather than totally) ordered set (the power set 2Θ ).
15.4 Geometric conditioning in the belief space 479

Fig. 15.3. The function |x + 1| + |x + 4| + |x + 7| + |x + 8| is minimized by the interval of


values delimited by the two central nodes −4 and −7.

The special case |Ac | = 1 This is possible, however, in the special case in which
|Ac | = 1 (i.e., the conditioning event is of cardinality n − 1). In this case:
X
Bint1
= b(X + Ac ), X
Bint2
= b(X),

so that the solution in the variables {γ(X)} is:

b(X) − b(X + Ac ) ≤ γ(X) ≤ 0, ∅ ( X ( A.

It is not difficult to see that, in the variables {β(X)}, the solution reads as:
X  
b(X) − b(X + Ac ) ≤ β(X) ≤ − b(B) − b(B + Ac ) ,
∅(B(X

∅ ( X ( A, i.e., in the mass of the desired L1 conditional belief function:


X    
mb (X)+ b(B)−b(B+Ac ) ≤ ma (X) ≤ mb (X)+ b(X +Ac )−b(X) .
∅(B(X
(15.20)
Not only the resulting conditional (pseudo) belief functions are not guaranteed to be
proper belief functions (see Equation (15.20)), but it difficult to find straightforward
interpretations for these results in terms of degrees of belief. On these grounds, we
would be tempted to conclude that the L1 norm is not suitable to induce conditioning
in belief calculus.
However, the analysis of the ternary case seems to hint otherwise. The L1 con-
ditional belief functions bL1 ,B3 (.|{x, y}) with respect to A = {x, y} in B3 are all
those b.f.s with core included in A such that the conditional mass of B ( A falls
between b(B) and b(B ∪ Ac ):

b(B) ≤ mL1 ,B (B|A) ≤ b(B ∪ Ac ).


480 15 Geometric conditioning

The barycenter of the L1 solutions is:


mb (z) + mb (x, z) mb (z) + mb (y, z)
β(x) = − , β(y) = − ,
2 2
i.e., the L2 conditional b.f. (15.17), just like in the case of geometric conditioning in
the mass space M. The same can be easily proved for all A ⊆ {x, y, z}.
Theorem 100. For every belief function b : 2{x,y,z} → [0, 1], the unique L2 con-
ditional belief function bL1 ,B3 (.|{x, y}) with respect to A ⊆ {x, y, z} in B3 is the
barycenter of the polytope of L1 conditional b.f.s with respect to A in B3 .

15.4.3 L∞ conditioning in B
Let us finally approach the problem of finding L∞ conditional belief functions given
an event A, starting with the ternary case study.
The ternary case In the ternary case, kb − akL∞ = max∅(B(Θ |b(B) − a(B)| =
n
max |b(x) − a(x)|, |b(y) − a(y)|, |b(z)|, |b(x, y) − a(x, y)|, |b(x, z) − a(x, z)|,
o n
|b(y, z) − a(y, z)| = max |mb (x) − ma (x)|, |mb (y) − ma (y)|, |mb (z)|,
|mb (x) + mb (y) + mb (x, y) − ma (x) − ma (y) − ma (x, y)|, |mb (x)+o
+mb (z) + mb (x, z) − ma (x)|, |mb (y) + mb (z) + mb (y, z) − ma (y)| =
n
max |β(x)|, |β(y)|, mb (z), 1 − b(x, y), |β(x) + mb (z) + mb (x, z)|,
o
|β(y) + mb (z) + mb (y, z)| ,

which is minimized by (as 1 − b(x, y) ≥ mb (z)):


n o
β(x) : max |β(x)|, |β(x) + mb (z) + mb (x, z)| ≤ 1 − b(x, y)
n o
β(y) : max |β(y)|, |β(y) + mb (z) + mb (y, z)| ≤ 1 − b(x, y).

On the left hand side we have functions of the form max{|x|, |x + k|}. The interval
of values in which such a function is below a certain threshold k 0 ≥ k is [−k 0 , k 0 −k].
This yields:
b(x, y) − 1 ≤ β(x) ≤ 1 − b(x, y) − (mb (z) + mb (x, z))
(15.21)
b(x, y) − 1 ≤ β(y) ≤ 1 − b(x, y) − (mb (z) + mb (y, z)).
The solution in the masses of the sought L∞ conditional b.f. reads as:
mb (x) − mb (y, z) − mb (Θ) ≤ ma (x) ≤ 1 − (mb (y) + mb (x, y))
(15.22)
mb (y) − mb (x, z) − mb (Θ) ≤ ma (y) ≤ 1 − (mb (x) + mb (x, y)).
Its barycenter is clearly given by:
ma (x) = mb (x) + mb (z)+m 2
b (x,z)
ma (y) = mb (y) + mb (z)+m 2
b (y,z)

mb (x,z)+mb (y,z)
ma (x, y) = 1 − ma (x) − ma (y) = mb (x, y) + mb (Θ) + 2
(15.23)
i.e., the L2 conditional belief function (15.17) as computed in the ternary case.
15.4 Geometric conditioning in the belief space 481

The general case From the expression


P (19) of the difference b − a we get, after
introducing the variables γ(C) = X⊆C β(X), ∅ ( C ⊆ A:
X X
max |b − a(B)| = max β(C) + mb (C)

∅(B(Θ ∅(B(Θ
C⊆A∩B C⊆B,C6⊂A
X
= max γ(A ∩ B) + mb (C)

∅(B(Θ
( C⊆B,C6⊂A
X
= max max mb (C) ,

B:B∩A=∅
C⊆B,C6⊂A P
maxB:B∩A6=∅,A γ(A ∩ B) + C⊆B,C6⊂A mb (C) ,

)
X
max γ(A) + mb (C) ,

B:B∩A=A,B6=Θ
C⊆B,C6⊂A
(15.24)
where, once again, γ(A) = b(A) − 1.

Lemma 20. The values γ ∗ (X) which minimize (15.24) are, ∀ ∅ ( X ( A:


X
−(1 − b(A)) ≤ γ ∗ (X) ≤ (1 − b(A)) − mb (C). (15.25)
C∩Ac 6=∅,C∩A⊆X

Lemma 20 can be used to prove the following form of the set of L∞ conditional
belief functions in B.
Theorem 101. Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty
focal element ∅ ( A ⊆ Θ, the set of L∞ conditional belief functions bL∞ ,B (.|A)
with respect to A in B is the set of b.f.s with focal elements in {X ⊆ A} which meet
the following constraints for all ∅ ( X ⊆ A:
X
mb (X) + mb (C) + (2|X| − 1)(1 − b(A)) ≤ ma (X) ≤ mb (X)
C∩Ac 6=∅,∅⊆C∩A⊆X X X
+(2|X| − 1)(1 − b(A)) − mb (C) − (−1)|X| mb (B).
C∩Ac 6=∅,∅⊆C∩A(X B⊆Ac
(15.26)
This result appears of rather difficult interpretation in terms of mass allocation. Nev-
ertheless, the ternary example we will see in Section 15.5.2 seems to suggest that
this set, or at least its admissible part, has some nice properties worth to explore.
For instance, its barycenter has a much simpler form.

Barycenter of the L∞ solution The barycenter of (15.25) is


1 X
γ(X) = − mb (C),
2
C∩Ac 6=∅,C∩A⊆X

a solution which corresponds, in the set of variables {β(X)}, to the system:


482 15 Geometric conditioning
(
X 1 X
β(C) + mb (C) = 0, ∀∅ ( X ( A. (15.27)
2
∅(C⊆X C∩Ac 6=∅,C∩A⊆X

The following result proceeds from the latter expression.


Theorem 102. The center of mass of the set of L∞ conditional belief functions
bL∞ ,B (.|A) with respect to A in the belief space B is the unique solution of the
system of equations (15.27), and has basic probability assignment:
1 X h i
mL∞ ,B (C|A) = mb (C) + mb (B ∪ C) + (−1)|C|+1 mb (B)
2
∅(B⊆Ac
1 X 1
= mb (C) + mb (B + C) + (−1)|C|+1 b(Ac ).
2 c
2
∅(B⊆A
(15.28)

15.5 Mass space versus belief space conditioning


To conclude this overview of geometric conditioning via Lp norms, it is worth com-
paring the outcomes of mass space- versus belief space- Lp conditioning.

15.5.1 Geometric conditioning: a summary

Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty conditioning


focal element ∅ ( A ⊆ Θ:
1. the set of L1 conditional belief functions bL1 ,M (.|A) with respect to A in M is
the set of b.f.s with core in A such that their mass dominates that of b over all
the subsets of A:
n o
bL1 ,M (.|A) = a : Ca ⊆ A, ma (B) ≥ mb (B) ∀∅ ( B ⊆ A .

Such a set is a simplex ML1 ,A [b] = Cl(m[b]|B


L1 A, ∅ ( B ⊆ A) whose vertices
ma = m[b]|B L1 A have b.p.a.:

ma (B) = mb (B) + 1 − b(A) = mb (B) + plb (Ac ),



ma (X) = mb (X) ∀∅ ( X ( A, X 6= B;

2. the unique L2 conditional belief function bL2 ,M (.|A) with respect to A in M


is the b.f. whose b.p.a. redistributes the mass 1 − b(A) = plb (Ac ) to each focal
element B ⊆ A in an equal way:
plb (Ac )
mL2 ,M (B|A) = mb (B) + , (15.29)
2|A| − 1
∀∅ ( B ⊆ A, and corresponds to the center of mass of the simplex ML1 ,A [b]
of L1 conditional b.f.s.
15.5 Mass space versus belief space conditioning 483

3. the L∞ conditional b.f. either coincides with the L2 one, or forms a simplex
obtained by assigning the maximal mass outside A (rather than the sum of such
masses plb (Ac )) to all subsets of A (but one) indifferently.
L1 and L2 conditioning are closely related in the mass space, and have a compelling
interpretation in terms of general imaging [1032, 502].
The L2 and L∞ conditional b.f.s just computed in the belief space are instead:
X X
mL2 ,B (B|A) = mb (B) + mb (B + C)2−|C| + (−1)|B|+1 mb (C)2−|C|
C⊆Ac C⊆Ac
1 X 1
mL∞ ,B (B|A) = mb (B) + mb (B + C) + (−1)|B|+1 b(Ac ).
2 c
2
∅(C⊆A

As for the L2 case, the result makes a lot of sense in the ternary case, but it is difficult
to interpret in its general form (above). It seems to be related to the process of mass
redistribution among all subsets, as it happens with the (L2 induced) orthogonal
projection of a belief function onto the probability simplex. In both expressions
above we can note that normalization is achieved by alternatively subtracting and
summing a quantity, rather than via a ratio or, as in Equation (15.29), by reassigning
the mass of all B 6⊂ A to each B ( A on equal grounds.
We can interpret the barycenter of the set of L∞ conditional belief functions
as follows: the mass of all the subsets whose intersection with A is C ( A is re-
assigned by the conditioning process half to C, and half to A itself. In the case of
C = A itself, by normalization, all the subsets D ⊇ A including A have their whole
mass re-assigned to A, consistently with the above interpretation. The mass b(Ac )
of the subsets which have no relation with the conditioning event A is used to guar-
antee the normalization of the resulting mass distribution. As a result, the obtained
mass function is not necessarily non-negative: again, such version of geometrical
conditioning may generated pseudo belief functions.
The L1 case is also intriguing, as in that case it appears impossible to obtain a
general analytic expression, whereas in the special cases in which this is possible
the result has potentially interesting interpretations, as confirmed by the empirical
comparison of Section 15.5.2.
Generally speaking, though, Lp conditional belief functions in the belief space
seem to have rather less straightforward interpretations than the corresponding
quantities in the mass space.

15.5.2 Comparison on the ternary example

We conclude by comparing the different approximations in the case study of a


ternary frame, Θ = {x, y, z}, already introduced in Section 15.3.4.
Assuming again that the conditioning event is A = {x, y}, the unique L2 con-
ditional belief function in B is given by Equation (15.17), while the L∞ conditional
b.f.s form the set determined by Equation (15.22), with barycenter in (15.23).
By Theorem 94 the vertices of ML1 ,{x,y} [b] are instead:
484 15 Geometric conditioning
{x}
m[b]|L1 {x, y} = [mb (x) + plb (z), mb (y), mb (x, y)]0 ,
{y}
m[b]|L1 {x, y} = [mb (x), mb (y) + plb (z), mb (x, y)]0 ,
{x,y}
m[b]|L1 {x, y} = [mb (x), mb (y), mb (x, y) + plb (z)]0 .

By Theorem 96 the L2 conditional belief function given {x, y} in M has b.p.a.:

1 − b(x, y) plb (z)


m(x) = mb (x) + = mb (x) + ,
3 3
plb (z) plb (z)
m(y) = mb (y) + , m(x, y) = mb (x, y) + .
3 3
Figure 15.4 illustrates the different geometric conditional belief functions given
A = {x, y} for the belief function with masses as in (15.12), i.e., mb (x) = 0.2,
mb (y) = 0.3, mb (x, z) = 0.5. In this case the conditional simplex is 2-dimensional,
with three vertices bx , by and bx,y . The picture confirms that mL2 ,M (.|A) lies in
the barycenter of the simplex of the related L1 conditional b.f.s. The same is true
(in the ternary case) for mL2 ,B (.|A) which is the barycenter of the (green) polytope
of mL∞ ,B (.|A) conditional b.f.s. The latter does not fall entirely in the admissible
conditional simplex C(bx , by , bx,y ), but a good portion of it does.

Fig. 15.4. The simplex (red triangle) of L1 , M conditional belief functions associated with
the belief function with mass assignment (15.12) in Θ = {x, y, z}, with conditioning event
A = {x, y}. The related L2 , M conditional belief function is plotted as a red square, and
coincides with the center of mass of the L1 set. The set of L∞ , M conditional belief functions
is represented as the green triangle containing L2 , M. The set of L∞ , B conditional b.f.s is
drawn as a yellow rectangle, and also falls partly outside the conditioning simplex (black
triangle). The set of L1 conditional b.f.s in B is a (light blue) line segment with barycenter
in the L2 conditional b.f. (black square). In the ternary case L2 , B is the barycenter of this
rectangle. Interesting cross - relations between conditional functions in M and B seem to
emerge which are not clearly reflected by their analytical expressions computed here.
15.6 An outline of future research 485

The set of L1 conditional belief functions in B is a line segment with barycenter


in the L2 conditional b.f. mL2 ,B (.|A), which is:
– entirely included in the set of L∞ approximations in both B and M, i.e., a more
conservative approach to conditioning;
– entirely admissible.
It seems that, hard as it is to compute, L1 conditioning in the belief space produces
interesting results. A number of interesting cross relations between conditional be-
lief functions in the two representation domains appear to exist:
1. mL∞ ,B (.|A) seems to contain mL1 ,M (.|A), while
2. the two L2 conditional b.f.s mL2 ,M (.|A) and mL2 ,B (.|A) appear to lie on the
same line joining opposite vertices of mL∞ ,B (.|A);
3. mL∞ ,B (.|A) and mL∞ ,M (.|A) have several vertices in common.
Finding the admissible parts of mL∞ ,B (.|A) and mL∞ ,M (.|A) remains an open
problem.

15.6 An outline of future research


This Chapter’s sketch of the geometric conditioning approach opens a number of
interesting questions.
We may wonder, for instance, what classes of conditioning rules can be gener-
ated by such a distance minimization process. Do they span all known definitions of
conditioning (Section 4.3), once one applies a sufficiently general class of dissimi-
larity measures?
A related question links geometric conditioning with combination rules. Indeed,
we have seen in Chapter 7 that Dempster’s combination rule can be decomposed
into a convex combination of Dempster’s conditioning with respect to all possible
events A: X X
b ⊕ b0 = b ⊕ m0 (A)bA = µ(A)b ⊕ bA ,
A⊆Θ A⊆Θ
0
where µ(A) ∝ m (A)plb (A). We can imagine to reverse this link, and generate
combination rules ] as convex combinations of conditioning operators b|]
A:
X X
b ] b0 = m0 (A)b ] bA = m0 (A)b|]
A.
A⊆Θ A⊆Θ

Additional constraints may have to be imposed in order to obtain a unique result, for
instance commutativity with affine combination (or linearity, in Smets’ terminology
[1221]).
In the near future we plan to explore the world of combination rules induced
by conditioning rules, starting from the different geometrical conditional processes
introduced here.
486 15 Geometric conditioning

Appendix
Proof of Lemma 18

By definition:
X X
mb − ma = mb (B)mB − ma (B)mB .
∅(B⊆Θ ∅(B⊆A

.
The change of variables β(B) = mb (B) − ma (B) further yields:
X X
mb − ma = β(B)mB + mb (B)mB . (15.30)
∅(B⊆A B6⊂A

Observe that the variables {β(B), ∅ ( B ⊆ A} are not all independent. Indeed:
X X X
β(B) = mb (B) − ma (B) = b(A) − 1
∅(B⊆A ∅(B⊆A ∅(B⊆A
P
as ∅(B⊆A ma (B) = 1 by definition, since ma ∈ MA . As a consequence, in
optimization problem (15.4) only 2|A| − 2 variables are independent (as ∅ is not
included), while: X
β(A) = b(A) − 1 − β(B).
∅(B(A

By replacing the above equality into (15.30) we get Equation (15.5).

Proof of Theorem 94

The minima of the L1 norm of the difference vector are given by the set of con-
straints: 
X≤0
 β(B) ∀∅ ( B ( A
β(B) ≥ b(A) − 1. (15.31)

∅(B(A

In the original simplicial coordinates {ma (B), ∅ ( B ⊆ A} of the candidate solu-


tion ma in MA such system reads as:

 mX b (B) − ma (B) ≤ 0 ∀∅ ( B ( A
 
mb (B) − ma (B) ≥ b(A) − 1,

∅(B(A

i.e., ma (B) ≥ mb (B) ∀∅ ( B ⊆ A.


15.6 An outline of future research 487

Proof of Theorem 95

It is easy to see that, by Equation (15.31), the 2|A| − 2 vertices of the simplex of L1
conditional belief functions in M (denoted by m[b]|B L1 A, where ∅ ( B ⊆ A) are
determined by the following solutions:

m[b]|AL1 A : β(X) = 0 ∀∅ ( X ( A,

β(B) = b(A) − 1,
m[b]|B
L1 A : ∀∅ ( B ( A.
β(X) = 0 ∀∅ ( X ( A, X 6= B

In the {ma (B)} coordinates, the vertex m[b]|B


L1 A is the vector ma ∈ MA defined
by Equation (15.7).

Proof of Theorem 96

The generators of MA are all the vectors mB − mA , ∀∅ ( B ( A, and have the


following structure:

[0, · · · , 0, 1, 0, · · · , 0, −1, 0, · · · , 0]0

with all zero entries but entry B (equal to 1) and entry A (equal to -1). Making use
of Equation (15.30), condition hmb − ma , mB − mA i = 0 assumes then a very
simple form X
β(B) − b(A) + 1 + β(X) = 0
∅(X(A,X6=B

for all possible generators of MA , i.e.:


X
2β(B) + β(X) = b(A) − 1 ∀∅ ( B ( A. (15.32)
∅(X(A,X6=B

System (15.32) is a linear system of 2|A| − 2 equations in 2|A| − 2 variables (the


β(X)), that can be written as Aβ = (b(A) − 1)1, where 1 is the vector of the
appropriate size with all entries at 1. Its unique solution is trivially β = (b(A) − 1) ·
A−1 1. The matrix A and its inverse are
   
2 1 ··· 1 d −1 · · · −1
1 2 ··· 1
 . A−1 = 1  −1 d · · · −1  ,
 
A=  ···  d+1  ··· 
1 1 ··· 2 −1 −1 · · · d

where d is the number of rows (or columns) of A. It is easy to see that A−1 1 =
1 |A|
d+1 1, where in our case d = 2 − 2.
The solution to (15.32) is then, in matrix form:
1
β = A−1 1 · (b(A) − 1) = 1(b(A) − 1)
2|A| − 1
488 15 Geometric conditioning

or, more explicitly:


b(A) − 1
β(B) = ∀∅ ( B ( A.
2|A| − 1
Thus, in the {ma (B)} coordinates the L2 conditional belief function reads as:
1 − b(A) plb (Ac )
ma (B) = mb (B) + |A|
= mb (B) + |A| ∀∅ ( B ⊆ A,
2 −1 2 −1
A included.

Proof of Theorem 98

The L∞ norm of the difference vector (15.5) reads as kmb − ma kL∞ =


 
X
= max |β(B)|, ∅ ( B ( A; |mb (B)|, B 6⊂ A; b(A) − 1 −
β(B) .
∅(B(A

As X
X X
b(A) − 1 − β(B) =
mb (B) + β(B) ,

∅(B(A B6⊂A ∅(B(A

the above norm simplifies as:


 X X 

max |β(B)|, ∅ ( B ( A; max{mb (B)}; mb (B) + β(B) .
B6⊂A
B6⊂A ∅(B(A
(15.33)
This is a function of the form
n X o
f (x1 , ..., xm−1 ) = max |xi | ∀i, xi + k1 , k2 ,

(15.34)
i

with 0 ≤ k2 ≤ k1 ≤ 1. These functions were studied in the proof of Theorem 77,


Chapter 12, and illustrated in Figure 12.12. For norm (15.33) the condition k2 ≥
k1 /m for functions of the form (15.34) reads as:
1 X
max mb (C) ≥ mb (C). (15.35)
C6⊂A 2|A| − 1 C6⊂A

P the set of L∞ conditional belief


In such a case (cfr. the proof of Theorem 77 again)
functions is given by the constraints xi ≥ −k2 , i xi ≤ k2 − k1 , namely:

 β(B) ≥ − max

C6⊂A
mb (C) ∀B ( A,
X X
β(B) ≤ max mb (C) − mb (C).
C6⊂A


B(A C6⊂A

This is a simplex Cl(m[b]|LB̄



A, B̄ ⊆ A), where each vertex m[b]|L


A is charac-
terized by the following values βB̄ of the auxiliary variables:
15.6 An outline of future research 489

 βB̄ (B) = − max

C6 ⊂A
mb (C) ∀B ⊆ A, B 6= B̄
X
|A|
β (B̄) = − mb (C) + (2 − 2) max mb (C)
 B̄
 C6⊂A
C6⊂A

or, in terms of their basic probability assignments, (15.9).


The barycenter of this simplex can be computed as follows:
X
m[b]|L


(B|A)
B̄⊆A
mL∞ ,M (B|A) =
2|A| − 1 X X
(2|A| − 1)mb (B) + mb (C) mb (C)
C6⊂A C6⊂A
= = mb (B) + ,
2|A| − 1 2|A| −1
i.e., the L2 conditional belief function (15.8). The corresponding minimal L∞ norm
of the difference vector is, according to (15.33), equal to maxC6⊂A mb (C).
When (15.35) does not hold:
1 X
max mb (C) < mb (C) (15.36)
C6⊂A 2|A| − 1 C6⊂A

system (15.33) has as unique solution


1 X
β(B) = − mb (C) ∀B ( A
2|A|− 1 C6⊂A

or, in terms of basic probability assignments:


1 X
mL∞ ,M (B|A) = mb (B) + mb (C) ∀B ⊆ A.
2|A| − 1 C6⊂A

1
P
The corresponding minimal L∞ norm of the difference vector is: 2|A| −1 C6⊂A mb (C).

Proof of Theorem 99

The orthogonality of the difference vector with respect to the generators $b_C - b_A$, $\emptyset\subsetneq C\subsetneq A$ of the conditional simplex,
$$\langle b-a,\, b_C-b_A\rangle = 0 \qquad \forall\,\emptyset\subsetneq C\subsetneq A$$
(where $b-a$ is given by Equation (15.15)), reads as:
$$\sum_{B\not\subset A} m_b(B)\big[\langle b_B,b_C\rangle - \langle b_B,b_A\rangle\big] + \sum_{B\subseteq A}\beta(B)\big[\langle b_B,b_C\rangle - \langle b_B,b_A\rangle\big] = 0$$
for all $\emptyset\subsetneq C\subsetneq A$. Now, categorical belief functions are such that:
$$\langle b_B,b_C\rangle = |\{Y\supseteq B\cup C,\ Y\neq\Theta\}| = 2^{|(B\cup C)^c|}-1 \tag{15.37}$$
and $\langle b_B,b_A\rangle = 2^{|(B\cup A)^c|}-1$. As $(B\cup A)^c = A^c$ when $B\subseteq A$, the system of orthogonality conditions is equivalent to, $\forall\,\emptyset\subsetneq C\subsetneq A$:
$$\sum_{B\not\subset A} m_b(B)\big[2^{|(B\cup C)^c|}-2^{|(B\cup A)^c|}\big] + \sum_{B\subsetneq A}\beta(B)\big[2^{|(B\cup C)^c|}-2^{|A^c|}\big] = 0. \tag{15.38}$$
This is a system of $2^{|A|}-2$ equations in the $2^{|A|}-2$ variables $\{\beta(B),\ B\subsetneq A\}$.
In the $\{\beta(B)\}$ variables (15.16) reads as:
$$\beta(C) = -\sum_{B\subseteq A^c} m_b(B\cup C)\,2^{-|B|} + (-1)^{|C|}\sum_{B\subseteq A^c} m_b(B)\,2^{-|B|}.$$
To prove Theorem 99 we just need to replace the above expression into the system of constraints (15.38). We obtain, for all $\emptyset\subsetneq C\subsetneq A$:
$$\sum_{B\not\subset A} m_b(B)\big[2^{|(B\cup C)^c|}-2^{|(B\cup A)^c|}\big] + \sum_{B\subsetneq A}\Big[-\sum_{X\subseteq A^c} m_b(X\cup B)\,2^{-|X|} + (-1)^{|B|}\sum_{X\subseteq A^c} m_b(X)\,2^{-|X|}\Big]\big[2^{|(B\cup C)^c|}-2^{|A^c|}\big] = 0.$$
Now, whenever $B\not\subset A$ it can be decomposed as $B = X+Y$, with $\emptyset\subsetneq X\subseteq A^c$, $\emptyset\subseteq Y\subseteq A$. Therefore $B\cup C = (Y\cup C)+X$, $B\cup A = A+X$ and, since
$$2^{-|X|}\big(2^{|(Y\cup C)^c|}-2^{|A^c|}\big) = 2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|},$$
we can write the above system of constraints as:
$$\sum_{\substack{\emptyset\subsetneq X\subseteq A^c\\ \emptyset\subseteq Y\subseteq A}} m_b(X+Y)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big] + \sum_{\substack{\emptyset\subsetneq X\subseteq A^c\\ \emptyset\subsetneq Y\subsetneq A}}\big[(-1)^{|Y|}m_b(X)-m_b(X\cup Y)\big]\,2^{-|X|}\big[2^{|(Y\cup C)^c|}-2^{|A^c|}\big] = 0.$$
As
$$2^{-|X|}\big(2^{|(Y\cup C)^c|}-2^{|A^c|}\big) = 2^{n-|Y\cup C|-|X|}-2^{n-|A|-|X|} = 2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|},$$
the system further simplifies as:
$$\sum_{\substack{\emptyset\subsetneq X\subseteq A^c\\ \emptyset\subseteq Y\subseteq A}} m_b(X+Y)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big] + \sum_{\substack{\emptyset\subsetneq X\subseteq A^c\\ \emptyset\subsetneq Y\subsetneq A}}\big[(-1)^{|Y|}m_b(X)-m_b(X+Y)\big]\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big] = 0.$$
After separating in the first sum the contributions of $Y=\emptyset$ and $Y=A$, noting that $A\cup C = A$ as $C\subset A$, and splitting the second one into a part which depends on $m_b(X)$ and one which depends on $m_b(X+Y)$, the system of constraints becomes, again for all $\emptyset\subsetneq C\subsetneq A$:
$$\begin{aligned}
&\sum_{\emptyset\subsetneq X\subseteq A^c}\sum_{\emptyset\subsetneq Y\subsetneq A} m_b(X+Y)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big] + \sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\big[2^{|(X+C)^c|}-2^{|(X+A)^c|}\big]\\
&+ \sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X+A)\big[2^{|(X+A)^c|}-2^{|(X+A)^c|}\big] + \sum_{\emptyset\subsetneq X\subseteq A^c}\sum_{\emptyset\subsetneq Y\subsetneq A}(-1)^{|Y|}m_b(X)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big]\\
&+ \sum_{\emptyset\subsetneq X\subseteq A^c}\sum_{\emptyset\subsetneq Y\subsetneq A}\big(-m_b(X+Y)\big)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big] = 0.
\end{aligned}$$
By further simplification we obtain:
$$\sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\big[2^{|(X+C)^c|}-2^{|(X+A)^c|}\big] + \sum_{\emptyset\subsetneq X\subseteq A^c}\sum_{\emptyset\subsetneq Y\subsetneq A}(-1)^{|Y|}m_b(X)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big] = 0. \tag{15.39}$$
The first addendum is easily reduced to:
$$\sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\big[2^{|(X+C)^c|}-2^{|(X+A)^c|}\big] = \sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\,2^{-|X|}\big(2^{|C^c|}-2^{|A^c|}\big).$$
As for the second one, we have:
$$\begin{aligned}
&\sum_{\emptyset\subsetneq X\subseteq A^c}\sum_{\emptyset\subsetneq Y\subsetneq A}(-1)^{|Y|}m_b(X)\big[2^{|[(Y\cup C)+X]^c|}-2^{|(A+X)^c|}\big]
= \sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\,2^{-|X|}\sum_{\emptyset\subsetneq Y\subsetneq A}(-1)^{|Y|}\big[2^{|(Y\cup C)^c|}-2^{|A^c|}\big]\\
&= \sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\,2^{-|X|}\Big[\sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|}\big(2^{|(Y\cup C)^c|}-2^{|A^c|}\big) - \big(2^{|(\emptyset\cup C)^c|}-2^{|A^c|}\big) - (-1)^{|A|}\big(2^{|(A\cup C)^c|}-2^{|A^c|}\big)\Big]\\
&= \sum_{\emptyset\subsetneq X\subseteq A^c} m_b(X)\,2^{-|X|}\Big[\sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|}\big(2^{|(Y\cup C)^c|}-2^{|A^c|}\big) - \big(2^{|C^c|}-2^{|A^c|}\big)\Big]. \tag{15.40}
\end{aligned}$$
At this point we can notice that:
$$\sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|}\big(2^{|(Y\cup C)^c|}-2^{|A^c|}\big) = \sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|}2^{|(Y\cup C)^c|} - 2^{|A^c|}\sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|} = \sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|}2^{|(Y\cup C)^c|},$$
since $\sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|} = 0$ by Newton's binomial.
As for the remaining term in (15.40), using a standard technique we can decompose $Y$ into the disjoint sum
$$Y = (Y\cap C) + (Y\setminus C)$$
and rewrite it as:
$$\sum_{|Y\cap C|=0}^{|C|}\binom{|C|}{|Y\cap C|}\sum_{|Y\setminus C|}\binom{|A\setminus C|}{|Y\setminus C|}(-1)^{|Y\cap C|+|Y\setminus C|}\,2^{n-|Y\setminus C|-|C|}
= 2^{n-|C|}\sum_{|Y\cap C|=0}^{|C|}\binom{|C|}{|Y\cap C|}(-1)^{|Y\cap C|}\sum_{|Y\setminus C|}\binom{|A\setminus C|}{|Y\setminus C|}(-1)^{|Y\setminus C|}2^{-|Y\setminus C|},$$
where
$$\sum_{|Y\setminus C|}\binom{|A\setminus C|}{|Y\setminus C|}(-1)^{|Y\setminus C|}2^{-|Y\setminus C|} = \Big(-1+\frac{1}{2}\Big)^{|A\setminus C|} = -2^{-|A\setminus C|}$$
by Newton's binomial, so that we obtain:
$$\sum_{\emptyset\subseteq Y\subseteq A}(-1)^{|Y|}2^{|(Y\cup C)^c|} = -2^{n-|C|-|A\setminus C|}\sum_{|Y\cap C|=0}^{|C|}\binom{|C|}{|Y\cap C|}(-1)^{|Y\cap C|} = 0,$$
again by Newton's binomial. By replacing this result in cascade into (15.40) and (15.39), we have that the system of constraints is always met, as it reduces to the equality $0 = 0$.

Proof of Lemma 19

After introducing the auxiliary variables $\beta(B) = m_b(B) - m_a(B)$ and $\gamma(B) = \sum_{C\subseteq B}\beta(C)$, the desired norm becomes:
$$\begin{aligned}
\|b-a\|_{L_1} &= \sum_{\emptyset\subsetneq B\subsetneq\Theta} |b(B)-a(B)| = \sum_{\emptyset\subsetneq B\subsetneq\Theta}\Big|\sum_{\emptyset\subsetneq C\subseteq A\cap B}\beta(C) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big|\\
&= \sum_{\emptyset\subsetneq B\subsetneq\Theta}\Big|\gamma(A\cap B) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big|\\
&= \sum_{B:B\cap A=\emptyset}\Big|\sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big| + \sum_{B:B\cap A\neq\emptyset,A}\Big|\gamma(A\cap B) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big|\\
&\quad + \sum_{B:B\cap A=A,\,B\neq\Theta}\Big|\gamma(A) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big|,
\end{aligned}$$
where
$$\gamma(A) = \sum_{C\subseteq A}\beta(C) = \sum_{C\subseteq A} m_b(C) - \sum_{C\subseteq A} m_a(C) = b(A) - 1.$$
Thus, the first and the third addenda above are constant, and since
$$\sum_{C\subseteq B,\,C\not\subset A} m_b(C) = b(B) - b(B\cap A),$$
we obtain, as desired:
$$\arg\min_\gamma \|b-a\|_{L_1} = \arg\min_\gamma \sum_{\emptyset\subsetneq B\cap A\subsetneq A}\Big|\gamma(B\cap A) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big|.$$

Proof of Lemma 20

The first term in (15.24) is such that
$$\max_{B:B\cap A=\emptyset}\Big|\sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big| = \max_{B\subseteq A^c}\sum_{C\subseteq B} m_b(C) = \max_{B\subseteq A^c} b(B) = b(A^c).$$
For the third one we have instead:
$$\Big|\sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big| \le \sum_{C\cap A^c\neq\emptyset} m_b(C) = pl_b(A^c) = 1 - b(A),$$
which is maximized when $B = A$, in which case it is equal to:
$$\Big|b(A) - 1 + \sum_{C\subseteq A,\,C\not\subset A} m_b(C)\Big| = |b(A)-1+0| = |b(A)-1| = 1 - b(A).$$
Therefore, the $L_\infty$ norm (15.24) of the difference $b-a$ reduces to:
$$\max_{\emptyset\subsetneq B\subsetneq\Theta}|b-a|(B) = \max\Big\{\max_{B:B\cap A\neq\emptyset,A}\Big|\gamma(A\cap B) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big|,\ 1-b(A)\Big\}, \tag{15.41}$$
which is obviously minimized by all the values $\gamma^*(X)$ such that:
$$\max_{B:B\cap A\neq\emptyset,A}\Big|\gamma^*(A\cap B) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big| \le 1 - b(A).$$
The variable term in (15.41) can be decomposed into collections of terms which depend on the same individual variable $\gamma(X)$:
$$\max_{B:B\cap A\neq\emptyset,A}\Big|\gamma(A\cap B) + \sum_{C\subseteq B,\,C\not\subset A} m_b(C)\Big| = \max_{\emptyset\subsetneq X\subsetneq A}\,\max_{\emptyset\subseteq Y\subseteq A^c}\Big|\gamma(X) + \sum_{\emptyset\subsetneq Z\subseteq Y}\sum_{\emptyset\subseteq W\subseteq X} m_b(Z+W)\Big|,$$
where $B = X+Y$, with $X = A\cap B$ and $Y = B\cap A^c$. Note that $Z\neq\emptyset$, as $C = Z+W\not\subset A$.
Therefore, the global optimal solution decomposes into a collection of solutions $\{\gamma^*(X),\ \emptyset\subsetneq X\subsetneq A\}$ for each individual problem, where:
$$\gamma^*(X)\ :\ \max_{\emptyset\subseteq Y\subseteq A^c}\Big|\gamma^*(X) + \sum_{\emptyset\subsetneq Z\subseteq Y}\sum_{\emptyset\subseteq W\subseteq X} m_b(Z+W)\Big| \le 1 - b(A). \tag{15.42}$$
We distinguish three cases.
1. If $\gamma^*(X)\ge 0$ we have that:
$$\gamma^*(X)\ :\ \max_{\emptyset\subseteq Y\subseteq A^c}\Big|\gamma^*(X) + \sum_{\emptyset\subsetneq Z\subseteq Y}\sum_{\emptyset\subseteq W\subseteq X} m_b(Z+W)\Big| = \gamma^*(X) + \sum_{\emptyset\subsetneq Z\subseteq A^c}\sum_{\emptyset\subseteq W\subseteq X} m_b(Z+W) = \gamma^*(X) + \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \le 1-b(A),$$
since when $\gamma^*(X)\ge 0$ the argument to maximize is non-negative, and its maximum is trivially achieved by $Y = A^c$. Hence, all the
$$\gamma^*(X)\ :\ \gamma^*(X) \le 1 - b(A) - \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \tag{15.43}$$
are optimal.
2. If $\gamma^*(X) < 0$ the maximum in (15.42) can be achieved by either $Y = A^c$ or $Y = \emptyset$, and we are left with the two corresponding terms in the max:
$$\gamma^*(X)\ :\ \max\Big\{\Big|\gamma^*(X) + \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C)\Big|,\ -\gamma^*(X)\Big\} \le 1 - b(A). \tag{15.44}$$
Now, either
$$\gamma^*(X) + \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \ge -\gamma^*(X)$$
or vice versa. In the first case, since the argument of the absolute value has to be non-negative:
$$\gamma^*(X) \ge -\frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C).$$
Furthermore, the optimality condition is met when:
$$\gamma^*(X) + \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \le 1 - b(A),$$
which is equivalent to:
$$\gamma^*(X) \le 1 - b(A) - \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) = \sum_{C\cap A^c\neq\emptyset,\,C\cap(A\setminus X)\neq\emptyset} m_b(C),$$
in turn trivially true for $\gamma^*(X) < 0$ and $m_b(C)\ge 0$ for all $C$. Therefore, all
$$0 \ge \gamma^*(X) \ge -\frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \tag{15.45}$$
are optimal as well.
3. In the last case:
$$\gamma^*(X) + \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \le -\gamma^*(X),$$
i.e., $\gamma^*(X) \le -\frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C)$. Optimality is met for
$$-\gamma^*(X) \le 1 - b(A) \ \equiv\ \gamma^*(X) \ge b(A) - 1,$$
which is satisfied for all
$$b(A) - 1 \le \gamma^*(X) \le -\frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C). \tag{15.46}$$
Putting (15.43), (15.45) and (15.46) together we have the thesis.

Proof of Theorem 101

Following Lemma 20, it is not difficult to see by induction that, in the original auxiliary variables $\{\beta(X)\}$, the set of $L_\infty$ conditional b.f.s in $\mathcal{B}$ is determined by the following constraints:
$$-K(X) + (-1)^{|X|}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) \ \le\ \beta(X) \ \le\ K(X) - \sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C), \tag{15.47}$$
where we have defined:
$$K(X) = (2^{|X|}-1)(1-b(A)) - \sum_{C\cap A^c\neq\emptyset,\ \emptyset\subseteq C\cap A\subsetneq X} m_b(C).$$
In the masses of the sought $L_\infty$ conditional b.f.s, (15.47) becomes:
$$m_b(X) - K(X) + \sum_{\emptyset\subsetneq B\subseteq A^c} m_b(X+B) \ \le\ m_a(X) \ \le\ m_b(X) + K(X) - (-1)^{|X|}\sum_{B\subseteq A^c} m_b(B),$$
which reads as, after replacing the expression for $K(X)$:
$$\begin{aligned}
m_b(X) &+ \sum_{\emptyset\subsetneq B\subseteq A^c} m_b(X+B) + \sum_{\substack{C\cap A^c\neq\emptyset\\ \emptyset\subseteq C\cap A\subsetneq X}} m_b(C) - (2^{|X|}-1)(1-b(A)) \ \le\ m_a(X)\\
&\le\ m_b(X) + (2^{|X|}-1)(1-b(A)) - \sum_{\substack{C\cap A^c\neq\emptyset\\ \emptyset\subseteq C\cap A\subsetneq X}} m_b(C) - (-1)^{|X|}\sum_{B\subseteq A^c} m_b(B).
\end{aligned}$$
By further trivial simplification we obtain the result, as desired.

Proof of Theorem 102

The proof is by substitution. In the $\{\beta(B)\}$ variables the thesis reads as:
$$\beta(C) = \frac{1}{2}\sum_{\emptyset\subsetneq B\subseteq A^c}(-1)^{|C|}\big[m_b(B) - m_b(B\cup C)\big]. \tag{15.48}$$
By replacing (15.48) in (15.27) we get, since
$$\sum_{\emptyset\subsetneq C\subseteq X}(-1)^{|C|} = 0 - (-1)^0 = -1$$
by Newton's binomial:
$$\begin{aligned}
&\frac{1}{2}\sum_{\substack{\emptyset\subsetneq C\subseteq X\\ \emptyset\subsetneq B\subseteq A^c}}(-1)^{|C|}\big[m_b(B) - m_b(B\cup C)\big] + \frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C)\\
&= \frac{1}{2}\sum_{\emptyset\subsetneq B\subseteq A^c} m_b(B)\sum_{\emptyset\subsetneq C\subseteq X}(-1)^{|C|} - \frac{1}{2}\sum_{\substack{\emptyset\subsetneq B\subseteq A^c\\ \emptyset\subsetneq C\subseteq X}} m_b(B\cup C) + \frac{1}{2}\sum_{\substack{C\cap A^c\neq\emptyset\\ C\cap A\subseteq X}} m_b(C)\\
&= -\frac{1}{2}\sum_{\emptyset\subsetneq B\subseteq A^c} m_b(B) - \frac{1}{2}\sum_{\substack{\emptyset\subsetneq B\subseteq A^c\\ \emptyset\subsetneq C\subseteq X}} m_b(B\cup C) + \frac{1}{2}\sum_{\substack{C\cap A^c\neq\emptyset\\ C\cap A\subseteq X}} m_b(C)\\
&= -\frac{1}{2}\sum_{\emptyset\subsetneq B\subseteq A^c}\sum_{\emptyset\subseteq C\subseteq X} m_b(B\cup C) + \frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C)\\
&= -\frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) + \frac{1}{2}\sum_{C\cap A^c\neq\emptyset,\,C\cap A\subseteq X} m_b(C) = 0
\end{aligned}$$
for all $\emptyset\subsetneq X\subsetneq A$, and system (15.27) is met.


16
Decision making with epistemic transforms
As we learned in Chapter 4, decision making with belief functions has been exten-
sively studied. Approaches based on (upper/lower) expected utility (e.g. Strat’s) and
multicriteria decision making, in particular, have attracted much attention.
In the Transferable Belief Model [1276, 1223], in particular, decision making is
done by maximising the expected utility of actions based on the pignistic transform:
$$E[u] = \sum_{\omega\in\Omega} u(f,\omega)\, BetP(\omega),$$

where Ω is the collection of all the possible outcomes ω, F is the set of possible
actions f , and the utility function is defined on F × Ω.
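As a purely illustrative aside (not part of the original formulation), the following minimal Python sketch computes the pignistic transform from a mass assignment and ranks actions by their expected utility; the frame, focal elements, utility values and function names are hypothetical, chosen only to make the formula concrete.

def pignistic(mass):
    """BetP(w) = sum of m(A)/|A| over all focal elements A containing w."""
    bet = {}
    for A, m in mass.items():
        for w in A:
            bet[w] = bet.get(w, 0.0) + m / len(A)
    return bet

def expected_utility(action, bet, utility):
    """E[u] = sum_w u(action, w) BetP(w)."""
    return sum(utility[(action, w)] * p for w, p in bet.items())

# hypothetical mass assignment and utility table
mass = {('x',): 0.5, ('y', 'z'): 0.3, ('x', 'y', 'z'): 0.2}
utility = {('f1', 'x'): 1.0, ('f1', 'y'): 0.0, ('f1', 'z'): 0.0,
           ('f2', 'x'): 0.4, ('f2', 'y'): 0.4, ('f2', 'z'): 0.4}
bet = pignistic(mass)                       # BetP(x) ~ 0.57, BetP(y) = BetP(z) ~ 0.22
best = max(('f1', 'f2'), key=lambda f: expected_utility(f, bet, utility))

Under these made-up numbers the rule selects the first action, since most of the pignistic mass concentrates on x.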
As we know, besides satisfying a number of sensible rationality principles, this prob-
ability transform has a nice geometric interpretation in the probability simplex as the
barycenter of the credal set of probability measures consistent with b:
$$\mathcal{P}[b] \doteq \big\{ p\in\mathcal{P} : p(A)\ge b(A)\ \forall A\subseteq\Theta \big\}.$$

Betting and credal semantics seem to be connected, in the case of the pignistic transform. Unfortunately, while their geometry in the belief space is well understood, a credal semantics is still lacking for most of the transforms we studied in the last part of the Book.
We address this issue in the framework of probability intervals [1333, 320] (Section 5.3), which we briefly recall below.
A set of probability intervals or interval probability system is a system of constraints on the probability values of a probability distribution $p : \Theta\to[0,1]$ on a finite domain $\Theta$, of the form
$$\mathcal{P}(l,u) \doteq \big\{ p : l(x)\le p(x)\le u(x),\ \forall x\in\Theta \big\}. \tag{16.1}$$

Probability intervals have been introduced as a tool for uncertain reasoning in [320],
where combination and marginalization of intervals were studied in detail and spe-
cific constraints for such intervals to be consistent and tight were given.
A typical way in which probability intervals arise is through measurement errors,
for measurements can be inherently of interval nature (due to the finite resolution of
the instruments) [642]. In such a case, the probability interval of interest is the class
of probability measures consistent with the measured interval.
A set of constraints of the form (16.1) determines a convex set of probabilities or
‘credal set’ [838]. Lower and upper probabilities (cfr. Section 5.1) determined by
P(l, u) on any event A ⊆ Θ can be easily obtained from the lower and upper
bounds (l, u) as follows:
$$\underline{P}(A) = \max\Big\{\sum_{x\in A} l(x),\ 1-\sum_{x\notin A} u(x)\Big\}, \qquad \overline{P}(A) = \min\Big\{\sum_{x\in A} u(x),\ 1-\sum_{x\notin A} l(x)\Big\}. \tag{16.2}$$
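A minimal sketch of how (16.2) can be evaluated in practice is given below (Python; the bounds are illustrative values of our own choosing and the function name is not from any library).

def lower_upper(A, l, u):
    """Lower/upper probability of event A induced by interval bounds (l, u), Eq. (16.2)."""
    in_l = sum(l[x] for x in A)
    in_u = sum(u[x] for x in A)
    out_l = sum(v for x, v in l.items() if x not in A)
    out_u = sum(v for x, v in u.items() if x not in A)
    return max(in_l, 1.0 - out_u), min(in_u, 1.0 - out_l)

# illustrative interval system on a three-element frame
l = {'x': 0.2, 'y': 0.1, 'z': 0.3}
u = {'x': 0.4, 'y': 0.5, 'z': 0.6}
print(lower_upper({'x', 'y'}, l, u))    # (0.4, 0.7)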
Making decisions based on credal sets is not trivial, for the natural extensions
of the classical expected utility rule amount to multiple potentially optimal deci-
sions [1343]. Alternatively, similarly to what is done for belief functions, we can seek
a single probability measure to represent the credal set associated with a set of prob-
ability intervals.

Fig. 16.1. The focus of a pair of simplices is, in non-pathological situations, the unique inter-
section of the lines joining their corresponding vertices.

Chapter content

As we show here, the credal set associated with a probability interval possesses an
interesting structure, as it can be decomposed into a pair of simplices.

Indeed, the probabilities consistent with a certain interval system (16.1) lie in
the intersection of two simplices: a ‘lower simplex’ T 1 [b] determined by the lower
bound b(x) ≤ p(x), and an ‘upper simplex’ T n−1 [b] determined by the upper con-
straint p(x) ≤ plb (x):
$$T^1[b] \doteq \big\{ p : p(x)\ge b(x)\ \forall x\in\Theta\big\}, \qquad T^{n-1}[b] \doteq \big\{ p : p(x)\le pl_b(x)\ \forall x\in\Theta\big\}.$$

This allows us to provide probability transforms of the epistemic family (Chapter 11) with a credal semantics similar to that of the pignistic function. We prove here that each of those transformations can be described in a homogeneous fashion as the focus $f(S,T)$ of a pair $S, T$ of simplices, i.e., the unique probability measure associated with the same (simplicial) coordinates in the two simplices. When the focus
of S and T falls within their intersection, it coincides with the unique intersection
of the lines joining corresponding vertices of S and T (see Figure 16.1).
In particular we prove that, while the relative belief of singletons is the focus of
{P, T 1 [b]}, the relative plausibility of singletons is the focus of {P, T n−1 [b]}, and
the intersection probability that of {T 1 [b], T n−1 [b]}. Their focal coordinates encode
major features of the underlying belief function: the total mass it assigns to single-
tons, their total plausibility, and the fraction of the related probability interval which
determines the intersection probability.
As the centre of mass is a special case of focus, this credal interpretation of epis-
temic transforms potentially paves the way for TBM-like frameworks based on those
transformations.

Chapter outline

We start by proving that the credal set associated with a system of probability in-
tervals can be decomposed in terms of a pair of upper and lower simplices (Section
16.1). We point out that the intersection probability, although originally defined for
belief functions (Chapter 10), is closely linked to the notion of interval probabil-
ity system and can be seen as the natural representative of the associated credal set
(Section 16.2).
Drawing inspiration from the analysis of the ternary case (Section 16.3), we
prove in Section 16.4 that all the considered probability transformations (relative
belief and plausibility of singletons, intersection probability) are geometrically the
foci of different pairs of simplices, and discuss the meaning of the mapping asso-
ciated with a focus in terms of mass assignment. We prove that upper and lower
simplices can themselves be interpreted as the sets of probabilities consistent with
belief and plausibility of singletons.
The conclusions of this analysis are used in Section 16.5 to prospect alternative
decision frameworks based on the introduced credal interpretations of upper and
lower probability constraints and the associated probability transformations.
In Section 16.6 preliminary results are discussed which show that relative belief and
plausibility play an interesting role in determining the safest betting strategy in an

adversarial game scenario in which the decision maker has to minimize their max-
imal loss/maximize their minimal return, in a modified Wald approach to decision
making.

16.1 The credal set of probability intervals


Just as belief functions do, probability interval systems admit a credal representa-
tion, which for intervals associated with belief functions is also strictly related to
the credal set P[b] of all consistent probabilities.
By definition (3.10) of P[b] it follows that the polygon of consistent probabilities
can be decomposed into a number of polytopes
$$\mathcal{P}[b] = \bigcap_{i=1}^{n-1}\mathcal{P}^i[b], \tag{16.3}$$
where $\mathcal{P}^i[b]$ is the set of probabilities meeting the lower probability constraint for size-$i$ events:
$$\mathcal{P}^i[b] \doteq \big\{ p\in\mathcal{P} : p(A)\ge b(A),\ \forall A : |A| = i \big\}.$$

Note that for i = n the constraint is trivially met by all distributions: P n [b] = P.

Lower and upper simplices A simple and elegant geometric description can be
given if we consider instead the credal sets:
$$T^i[b] \doteq \big\{ p\in\mathcal{P}' : p(A)\ge b(A),\ \forall A : |A| = i \big\}.$$

Here $\mathcal{P}'$ denotes the set of all pseudo-probabilities on $\Theta$, i.e., the functions $p : \Theta\to\mathbb{R}$ which meet the normalization constraint $\sum_{x\in\Theta} p(x) = 1$ but not necessarily the non-negativity one: there may exist an element $x$ such that $p(x) < 0$.
In particular we focus here on the set of pseudo-probability measures which meet
the lower constraint on singletons:
$$T^1[b] \doteq \big\{ p\in\mathcal{P}' : p(x)\ge b(x)\ \forall x\in\Theta \big\}, \tag{16.4}$$
and the set $T^{n-1}[b]$ of pseudo-probabilities which meet the analogous constraint on events of size $n-1$:
$$\begin{aligned}
T^{n-1}[b] &\doteq \big\{ p\in\mathcal{P}' : p(A)\ge b(A)\ \forall A : |A| = n-1 \big\}\\
&= \big\{ p\in\mathcal{P}' : p(\{x\}^c)\ge b(\{x\}^c)\ \forall x\in\Theta \big\}\\
&= \big\{ p\in\mathcal{P}' : p(x)\le pl_b(x)\ \forall x\in\Theta \big\},
\end{aligned} \tag{16.5}$$

i.e., the set of pseudo-probabilities which meet the upper bound for the elements x
of Θ.

Simplicial form The extension to pseudo-probabilities allows us to prove that the credal sets (16.4) and (16.5) have the form of simplices.

Theorem 103. The credal set $T^1[b]$ or lower simplex can be written as
$$T^1[b] = Cl(t^1_x[b],\ x\in\Theta), \tag{16.6}$$
the convex closure of the vertices
$$t^1_x[b] = \sum_{y\neq x} m_b(y)\, b_y + \Big(1 - \sum_{y\neq x} m_b(y)\Big) b_x. \tag{16.7}$$
Dually, the upper simplex $T^{n-1}[b]$ reads as the convex closure
$$T^{n-1}[b] = Cl(t^{n-1}_x[b],\ x\in\Theta) \tag{16.8}$$
of the vertices
$$t^{n-1}_x[b] = \sum_{y\neq x} pl_b(y)\, b_y + \Big(1 - \sum_{y\neq x} pl_b(y)\Big) b_x. \tag{16.9}$$

To further clarify those results, let us denote by
$$k_b \doteq \sum_{x\in\Theta} m_b(x) \le 1, \qquad k_{pl_b} \doteq \sum_{x\in\Theta} pl_b(x) \ge 1,$$
the total mass and the total plausibility of singletons, respectively. By Equation (16.7), each vertex $t^1_x[b]$ of the lower simplex is a probability that adds the mass $1-k_b$ of non-singletons to the mass of the element $x$, leaving all the others unchanged:
$$m_{t^1_x[b]}(x) = m_b(x) + 1 - k_b, \qquad m_{t^1_x[b]}(y) = m_b(y)\ \forall y\neq x.$$

As $m_{t^1_x[b]}(z)\ge 0$ $\forall z\in\Theta$, $\forall x$ (all the $t^1_x[b]$ are actual probabilities), we have that
$$T^1[b] = \mathcal{P}^1[b] \tag{16.10}$$
is completely included in the probability simplex.


On the other hand, the vertices (16.9) of the upper simplex are not guaranteed to be valid probabilities. Each vertex $t^{n-1}_x[b]$ assigns to each element $y$ of $\Theta$ different from $x$ its plausibility $pl_b(y)$, while it subtracts from $pl_b(x)$ the plausibility 'in excess' $k_{pl_b}-1$:
$$m_{t^{n-1}_x[b]}(x) = pl_b(x) + (1 - k_{pl_b}), \qquad m_{t^{n-1}_x[b]}(y) = pl_b(y)\ \forall y\neq x.$$
Now, as $1-k_{pl_b}$ can be a negative quantity, $m_{t^{n-1}_x[b]}(x)$ can be negative, and $t^{n-1}_x[b]$ is not guaranteed to be a 'true' probability.
We will have a confirmation of this in the example of Section 9.2.2.
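The vertex formulae (16.7) and (16.9) are easy to compute mechanically; the sketch below (Python, purely illustrative) builds both families of vertices from the singleton masses and plausibilities, and shows how an upper-simplex vertex may indeed carry a negative 'mass'. The singleton values used here coincide with those of the example discussed later in Section 16.3.

def simplex_vertices(m_s, pl_s):
    """Vertices t1_x (16.7) of the lower simplex and t^{n-1}_x (16.9) of the upper one."""
    k_b, k_pl = sum(m_s.values()), sum(pl_s.values())
    lower, upper = {}, {}
    for x in m_s:
        t1 = dict(m_s); t1[x] += 1.0 - k_b      # move the whole non-singleton mass onto x
        tn = dict(pl_s); tn[x] += 1.0 - k_pl    # remove the plausibility 'in excess' from x
        lower[x], upper[x] = t1, tn
    return lower, upper

m_s  = {'x': 0.2, 'y': 0.1, 'z': 0.3}   # singleton beliefs
pl_s = {'x': 0.4, 'y': 0.5, 'z': 0.6}   # singleton plausibilities
low, up = simplex_vertices(m_s, pl_s)
# low['x'] is approximately (0.6, 0.1, 0.3): a valid probability
# up['x']  is approximately (-0.1, 0.5, 0.6): a pseudo-probability, negative on x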

Lower and upper simplices and probability intervals By comparing Equations (16.1), (16.4) and (16.5), it is clear that the credal set associated with a set of probability intervals $\mathcal{P}(l,u)$ is nothing but the intersection
$$\mathcal{P}(l,u) = T[l]\cap T[u]$$
of the lower and upper simplices associated with its lower and upper bound constraints, where
$$T[l] = \big\{ p\in\mathcal{P}' : p(x)\ge l(x)\ \forall x\in\Theta\big\}, \qquad T[u] = \big\{ p\in\mathcal{P}' : p(x)\le u(x)\ \forall x\in\Theta\big\}.$$
In particular, when lower and upper bounds are those enforced by a pair of belief and plausibility measures on the singletons, $l(x) = b(x)$ and $u(x) = pl_b(x)$:
$$\mathcal{P}[b, pl_b] = T^1[b]\cap T^{n-1}[b].$$

16.2 Intersection probability and probability intervals


There are clearly many ways of selecting a single measure to represent a collection
of probability intervals (16.1). Note, however, that each of the intervals [l(x), u(x)],
x ∈ Θ, has the same importance in the definition of the system of constraints (16.1),
as there is no reason for the different elements x of the domain to be treated differ-
ently. It is then sensible to request that the desired representative probability should
behave homogeneously in each element x of the frame Θ.
Mathematically, this translates into seeking a probability distribution $p : \Theta\to[0,1]$ such that
$$p(x) = l(x) + \alpha\big(u(x) - l(x)\big)$$
for all the elements $x$ of $\Theta$, and some constant value $\alpha\in[0,1]$ (see Figure 16.2). Such a value needs to lie between 0 and 1 in order for the sought probability distribution $p$ to belong to the interval.

Fig. 16.2. An illustration of the notion of intersection probability for an interval probability system (16.1).

It is easy to see that there is indeed a unique solution to this problem. It suffices to enforce the normalization constraint
$$\sum_x p(x) = \sum_x\big[ l(x) + \alpha\big(u(x)-l(x)\big)\big] = 1$$
to understand that the unique such value $\alpha$ is given by
$$\alpha = \beta[(l,u)] \doteq \frac{1-\sum_{x\in\Theta} l(x)}{\sum_{x\in\Theta}\big(u(x)-l(x)\big)}. \tag{16.11}$$

Definition 86. The intersection probability p[(l, u)] : Θ → [0, 1] associated with
the interval probability system (16.1) is the probability measure:
p[(l, u)](x) = β[(l, u)]u(x) + (1 − β[(l, u)])l(x), (16.12)
with β[(l, u)] given by Equation (16.11).
The ratio β[(l, u)] (16.11) measures the fraction of each probability interval which
we need to add to the lower bound l(x) to obtain a valid probability function (sum-
ming to one).
It is easy to see that when (l, u) are a pair of belief/plausibility measures (b, plb ), we
obtain the intersection probability we defined for belief functions (Section 10.3). Al-
though originally defined by geometric means, the intersection probability is really
‘the’ rational probability transform for general interval probability systems.
As was the case for $p[b]$, $p[(l,u)]$ can also be written as:
$$p[(l,u)](x) = l(x) + \Big(1 - \sum_x l(x)\Big) R[(l,u)](x), \tag{16.13}$$
where
$$R[(l,u)](x) \doteq \frac{u(x)-l(x)}{\sum_{y\in\Theta}\big(u(y)-l(y)\big)} = \frac{\Delta(x)}{\sum_{y\in\Theta}\Delta(y)}, \tag{16.14}$$

∆(x) measures the width of the probability interval for x, and R[(l, u)] : Θ →
[0, 1] measures how much the uncertainty on the probability value of each singleton
‘weighs’ on the total width of the interval system (16.1). We will therefore call it
P uncertainty on singletons. We can then say that p[(l, u)] distributes the mass
relative
(1 − x l(x)) which is necessary to obtain a valid probability to each singleton x ∈
Θ according to the relative uncertainty R[(l, u)](x) it carries in the given interval.
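A short implementation of (16.11)-(16.12) may help fix ideas; the following sketch (Python, with illustrative bounds of our own choosing) returns both $\beta[(l,u)]$ and the intersection probability.

def intersection_probability(l, u):
    """p(x) = l(x) + beta (u(x) - l(x)), with beta as in (16.11)."""
    beta = (1.0 - sum(l.values())) / sum(u[x] - l[x] for x in l)
    return {x: l[x] + beta * (u[x] - l[x]) for x in l}, beta

l = {'x': 0.2, 'y': 0.1, 'z': 0.3}
u = {'x': 0.4, 'y': 0.5, 'z': 0.6}
p, beta = intersection_probability(l, u)   # beta = 4/9; p sums to one by construction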

16.3 Credal interpretation of Bayesian transforms: The ternary case
Let us start from the case of a frame of cardinality three: Θ = {x, y, z}.
Consider the belief function:
$$m_b(x) = 0.2, \quad m_b(y) = 0.1, \quad m_b(z) = 0.3, \qquad m_b(\{x,y\}) = 0.1, \quad m_b(\{y,z\}) = 0.2, \quad m_b(\Theta) = 0.1. \tag{16.15}$$
Figure 16.3 illustrates the geometry of the related credal set P[b] in the simplex
Cl(bx , by , bz ) of all the probability measures on Θ.
It is well known that the credal set associated with a belief function is a polytope
whose vertices are associated with all possible permutations of singletons.

Proposition 49. Given a belief function $b : 2^\Theta\to[0,1]$, the simplex $\mathcal{P}[b]$ of the probability measures consistent with $b$ is the polytope:
$$\mathcal{P}[b] = Cl(p^\rho[b]\ \forall\rho),$$
where $\rho$ is any permutation $\{x_{\rho(1)},\ldots,x_{\rho(n)}\}$ of the singletons of $\Theta$, and the vertex $p^\rho[b]$ is the Bayesian b.f. such that
$$p^\rho[b](x_{\rho(i)}) = \sum_{A\ni x_{\rho(i)},\ A\not\ni x_{\rho(j)}\ \forall j<i} m_b(A). \tag{16.16}$$

By Proposition 49, $\mathcal{P}[b]$ has as vertices the probabilities $\rho^1[b],\ldots,\rho^5[b]$ identified by red squares in Figure 16.3, namely:
$$\begin{array}{ll}
\rho^1 = (x,y,z): & \rho^1[b](x) = .4,\ \ \rho^1[b](y) = .3,\ \ \rho^1[b](z) = .3;\\
\rho^2 = (x,z,y): & \rho^2[b](x) = .4,\ \ \rho^2[b](y) = .1,\ \ \rho^2[b](z) = .5;\\
\rho^3 = (y,x,z): & \rho^3[b](x) = .2,\ \ \rho^3[b](y) = .5,\ \ \rho^3[b](z) = .3;\\
\rho^4 = (z,x,y): & \rho^4[b](x) = .3,\ \ \rho^4[b](y) = .1,\ \ \rho^4[b](z) = .6;\\
\rho^5 = (z,y,x): & \rho^5[b](x) = .2,\ \ \rho^5[b](y) = .2,\ \ \rho^5[b](z) = .6
\end{array} \tag{16.17}$$
(as the permutations $(y,x,z)$ and $(y,z,x)$ yield the same probability distribution).
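The vertices (16.17) can be checked mechanically. The sketch below (Python, written only for this example) enumerates the permutations of $\Theta$ and assigns each focal element's mass to its earliest element in the ordering, as prescribed by (16.16).

from itertools import permutations

def consistent_vertices(mass, frame):
    """Vertices of P[b]: each focal element's mass goes to its first element in rho (16.16)."""
    vertices = set()
    for rho in permutations(frame):
        p = {x: 0.0 for x in frame}
        for A, m in mass.items():
            p[min(A, key=rho.index)] += m        # earliest element of A in the ordering rho
        vertices.add(tuple(round(p[x], 6) for x in frame))
    return vertices

mass = {('x',): 0.2, ('y',): 0.1, ('z',): 0.3,
        ('x', 'y'): 0.1, ('y', 'z'): 0.2, ('x', 'y', 'z'): 0.1}
print(consistent_vertices(mass, ('x', 'y', 'z')))   # five distinct vertices, as in (16.17)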
We can notice a number of interesting facts:
1. P[b] (the polygon delimited by the red squares) is the intersection of the two
triangles (2-dimensional simplices) T 1 [b] and T 2 [b];
2. the relative belief of singletons,
$$\tilde{b}(x) = \frac{.2}{.6} = \frac{1}{3}, \qquad \tilde{b}(y) = \frac{.1}{.6} = \frac{1}{6}, \qquad \tilde{b}(z) = \frac{.3}{.6} = \frac{1}{2},$$
is the intersection of the lines joining the corresponding vertices of the probability simplex $\mathcal{P}$ and the lower simplex $T^1[b]$;
3. the relative plausibility of singletons,
$$\widetilde{pl}_b(x) = \frac{pl_b(x)}{k_{pl_b}} = \frac{m_b(x)+m_b(\{x,y\})+m_b(\Theta)}{.4+.5+.6} = \frac{4}{15}, \qquad \widetilde{pl}_b(y) = \frac{.5}{.4+.5+.6} = \frac{1}{3}, \qquad \widetilde{pl}_b(z) = \frac{2}{5},$$
is the intersection of the lines joining the corresponding vertices of $\mathcal{P}$ and the upper simplex $T^2[b]$;

Fig. 16.3. The simplex of probabilities consistent with the belief function (16.15) defined on
{x, y, z} is shown. Its vertices (red squares) are given by (16.17). Intersection probability, rel-
ative belief and plausibility of singletons are the foci of the pairs of simplices {T 1 [b], T 2 [b]},
{T 1 [b], P} and {P, T 2 [b]}, respectively. In the ternary case T 1 [b] and T 2 [b] are normal tri-
angles. Geometrically, their focus is the intersection of the lines joining their corresponding
vertices (dashed lines for {T 1 [b], P},{P, T 2 [b]}; solid lines for {T 1 [b], T 2 [b]}).

4. finally, the intersection probability
$$p[b](x) = m_b(x) + \beta[b]\big(m_b(\{x,y\}) + m_b(\Theta)\big) = .2 + \frac{.4}{1.5-.6}\, .2 \simeq .289,$$
$$p[b](y) = .1 + \frac{.4}{.9}\, .4 \simeq .278, \qquad p[b](z) = .3 + \frac{.4}{.9}\, .3 \simeq .433,$$
is the unique intersection of the lines joining the corresponding vertices of the upper $T^2[b]$ and lower $T^1[b]$ simplices.
Point 1. can be explained by noticing that in the ternary case, by Equation (16.3),
P[b] = T 1 [b] ∩ T 2 [b].
Although Figure 16.3 suggests that $\tilde{b}$, $\widetilde{pl}_b$ and $p[b]$ might be consistent with $b$, this is a mere artifact of the ternary case, for we proved in Theorem 56 that neither the relative belief of singletons nor the relative plausibility of singletons necessarily belongs to the credal set $\mathcal{P}[b]$.
Indeed, the point of this Chapter is that these epistemic transforms $\tilde{b}$, $\widetilde{pl}_b$, $p[b]$ are consistent with the interval probability system $\mathcal{P}[b, pl_b]$ associated with $b$:
$$\tilde{b},\ \widetilde{pl}_b,\ p[b] \in \mathcal{P}[b, pl_b] = T^1[b]\cap T^{n-1}[b].$$

Their geometric behavior as described by points 2., 3. and 4. holds in the general
case, as we will see in Section 16.4.

16.4 Credal geometry of probability transformations


16.4.1 Focus of a pair of simplices

Definition 87. Consider an arbitrary pair of simplices in $\mathbb{R}^{n-1}$, denoted by $S = Cl(s_1,\ldots,s_n)$ and $T = Cl(t_1,\ldots,t_n)$. We call focus of the pair $(S,T)$ the unique point $f(S,T)$ of $\mathbb{R}^{n-1}$ which has the same affine coordinates in both simplices:
$$f(S,T) = \sum_{i=1}^n\alpha_i s_i = \sum_{j=1}^n\alpha_j t_j, \qquad \sum_{i=1}^n\alpha_i = 1. \tag{16.18}$$
Such a point always exists. As a matter of fact, condition (16.18) can be written as
$$\sum_{i=1}^n\alpha_i(s_i - t_i) = 0.$$
As the vectors $\{s_i - t_i,\ i = 1,\ldots,n\}$ cannot be linearly independent in $\mathbb{R}^{n-1}$ (since there are $n$ of them), there exists a set of real numbers $\{\alpha_i',\ i = 1,\ldots,n\}$ which meets the above condition. By normalizing these real numbers so that they sum to 1, we obtain the coordinates of the focus.
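Numerically, the focus can be found by solving exactly this linear system. The sketch below (Python/NumPy, with made-up vertex coordinates) stacks the vectors $s_i - t_i$ together with the normalization row and solves for the $\alpha_i$.

import numpy as np

def focus(S, T):
    """Solve sum_i alpha_i (s_i - t_i) = 0 with sum_i alpha_i = 1; return the focus point."""
    S, T = np.asarray(S, float), np.asarray(T, float)
    n = S.shape[0]
    A = np.vstack([(S - T).T, np.ones(n)])           # (n-1)+1 equations in n unknowns
    rhs = np.zeros(n); rhs[-1] = 1.0
    alpha = np.linalg.lstsq(A, rhs, rcond=None)[0]   # least squares copes with degenerate pairs
    return alpha @ S, alpha

# made-up pair of triangles in the plane (n = 3 vertices each)
S = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
T = [[0.1, 0.1], [0.6, 0.2], [0.2, 0.5]]
point, alpha = focus(S, T)     # alpha @ S and alpha @ T coincide up to numerical error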
The following establishes an elegant sufficient condition for a point to be a focus
of two simplices.
Theorem 104. Given two simplices $S = Cl(s_1,\ldots,s_n)$ and $T = Cl(t_1,\ldots,t_n)$ with the same number of points, whenever $p$ is such that
$$p = \bigcap_{i=1}^n a(s_i, t_i) \tag{16.19}$$
(where $a(s_i,t_i)$ denotes the line passing through $s_i$ and $t_i$), then $p = f(S,T)$ is the focus of $S$ and $T$.


Notice that the barycenter itself of a simplex is a special case of focus. Indeed,
the center of mass of a d-dimensional simplex S is the intersection of the medians
of S, i.e. the lines joining each vertex with the barycenter of the opposite (d − 1
dimensional) face (see Figure 16.4). But those barycenters for all d − 1 dimensional
faces form themselves a simplex T .

16.4.2 Probability transformations as foci

The notion of focus of a pair of simplices provides a unified geometric interpretation


of the coherent family of Bayesian approximations formed by relative belief and
plausibility of singletons, and intersection probability.
Theorem 105. The relative belief of singletons is the focus of the pair of simplices
{P, T 1 [b]}.

Fig. 16.4. The barycenter of a simplex is a special case of focus.


A dual result can be proven for the relative plausibility of singletons.
Theorem 106. The relative plausibility of singletons is the focus of {P, T n−1 [b]}.

Proof. We just need to replace mb (x) with plb (x) in the proof of Theorem 105.

It is interesting to note that the affine coordinate of both belief and plausibility of
singletons as foci on the respective intersecting lines (16.19) has a meaning in terms
of degrees of belief.
Theorem 107. The affine coordinate of $\tilde{b}$ as the focus of $\{\mathcal{P}, T^1[b]\}$ on the corresponding intersecting lines is the reciprocal $\frac{1}{k_b}$ of the total mass of singletons.

Theorem 108. The affine coordinate of $\widetilde{pl}_b$ as the focus of $\{\mathcal{P}, T^{n-1}[b]\}$ on the corresponding intersecting lines is the reciprocal $\frac{1}{k_{pl_b}}$ of the total plausibility of singletons.

Similar results hold for the intersection probability.


Theorem 109. For each belief function b, the intersection probability p[b] is the
focus of the pair of upper and lower simplices (T n−1 [b], T 1 [b]).

Theorem 110. The coordinate of the intersection probability as focus of the pair
{T 1 [b], T n−1 [b]} on the corresponding intersecting lines coincides with the ratio
β[b] (10.10).

The fraction α = β[b] of the width of the probability interval that generates the
intersection probability can be read in the probability simplex as its coordinate on
any of the lines determining the focus of {T 1 [b], T n−1 [b]}.

16.4.3 Semantics of foci and a rationality principle

The pignistic function adheres to sensible rationality principles, and as a consequence


it has a clean geometrical interpretation as center of mass of the credal set associated
with a belief function b. Similarly, the intersection probability has an elegant geo-
metric behavior with respect to the credal set associated with an interval probability
system, being the focus of the related upper and lower simplices.

It is quite straightforward to notice that the geometric notion of focus turns out to possess a simple semantics in terms of probability constraints. Selecting the focus
of two simplices representing two different constraints (i.e., the point with the same
convex coordinates in the two simplices) means adopting the single probability dis-
tribution which meets both constraints in exactly the same way.
If we assume homogeneous behavior in the two sets of constraints {p(x) ≥
b(x) ∀x}, {p(x) ≤ plb (x) ∀x} as a rationality principle for the probability transfor-
mation of an interval probability system, then the intersection probability necessar-
ily follows as the unique solution to the problem.

16.4.4 Mapping associated with a probability transformation

Interestingly, each pair of simplices $S = Cl(s_1,\ldots,s_n)$, $T = Cl(t_1,\ldots,t_n)$ in $\mathbb{R}^{n-1}$ is naturally associated with a mapping, which maps each point of $\mathbb{R}^{n-1}$ with simplicial coordinates $\alpha_i$ in $S$ to the point of $\mathbb{R}^{n-1}$ with the same simplicial coordinates $\alpha_i$ in $T$:
$$F_{S,T} : \mathbb{R}^{n-1}\to\mathbb{R}^{n-1}, \qquad v = \sum_{i=1}^n\alpha_i s_i \ \mapsto\ F_{S,T}(v) = \sum_{i=1}^n\alpha_i t_i. \tag{16.20}$$

Clearly the focus is the (unique) fixed point of this transformation: FS,T (f (S, T )) =
f (S, T ). Each Bayesian transformation in 1-1 correspondence with a pair of sim-
plices (relative plausibility, relative belief, and intersection probability) determines
therefore a mapping of probabilities to probabilities.
The mapping (16.20) induced by the relative belief of singletons is actually quite interesting. Any probability distribution $p = \sum_x p(x)\, b_x$ is mapped by $F_{\mathcal{P},T^1[b]}$ to the probability distribution:
$$\begin{aligned}
F_{\mathcal{P},T^1[b]}(p) &= \sum_x p(x)\, t^1_x[b] = \sum_{x\in\Theta} p(x)\Big[\sum_{y\neq x} m_b(y)\, b_y + \Big(1-\sum_{y\neq x} m_b(y)\Big) b_x\Big]\\
&= \sum_{x\in\Theta} b_x\Big[\Big(1-\sum_{y\neq x} m_b(y)\Big)p(x) + m_b(x)\big(1-p(x)\big)\Big]\\
&= \sum_{x\in\Theta} b_x\Big[p(x) - p(x)\sum_{y\in\Theta} m_b(y) + m_b(x)\Big] = \sum_{x\in\Theta} b_x\Big[m_b(x) + p(x)(1-k_b)\Big],
\end{aligned} \tag{16.21}$$
the probability obtained by adding to the belief value of each singleton $x$ a fraction $p(x)$ of the mass $(1-k_b)$ of non-singletons. In particular, (16.21) maps the relative uncertainty of singletons $R[b]$ to the intersection probability $p[b]$:
$$F_{\mathcal{P},T^1[b]}(R[b]) = \sum_{x\in\Theta} b_x\Big[m_b(x) + R[b](x)(1-k_b)\Big] = \sum_{x\in\Theta} b_x\, p[b](x) = p[b].$$

In a similar fashion, the relative plausibility of singletons is associated with the mapping:
$$\begin{aligned}
F_{\mathcal{P},T^{n-1}[b]}(p) &= \sum_{x\in\Theta} p(x)\, t^{n-1}_x[b] = \sum_{x\in\Theta} p(x)\Big[\sum_{y\neq x} pl_b(y)\, b_y + \Big(1-\sum_{y\neq x} pl_b(y)\Big) b_x\Big]\\
&= \sum_{x\in\Theta} b_x\Big[\Big(1-\sum_{y\neq x} pl_b(y)\Big)p(x) + pl_b(x)\big(1-p(x)\big)\Big]\\
&= \sum_{x\in\Theta} b_x\Big[p(x) - p(x)\sum_{y\in\Theta} pl_b(y) + pl_b(x)\Big] = \sum_{x\in\Theta} b_x\Big[pl_b(x) + p(x)(1-k_{pl_b})\Big],
\end{aligned} \tag{16.22}$$

which generates a probability by subtracting from the plausibility of each singleton $x$ a fraction $p(x)$ of the plausibility $k_{pl_b}-1$ in "excess".
It is curious to note that the map associated with $\widetilde{pl}_b$ also maps $R[b]$ to $p[b]$. Indeed:
$$\begin{aligned}
F_{\mathcal{P},T^{n-1}[b]}(R[b]) &= \sum_{x\in\Theta} b_x\Big[pl_b(x) + R[b](x)(1-k_{pl_b})\Big]\\
&= \sum_{x\in\Theta} b_x\Big[pl_b(x) + \frac{1-k_{pl_b}}{k_{pl_b}-k_b}\big(pl_b(x)-m_b(x)\big)\Big]\\
&= \sum_{x\in\Theta} b_x\Big[pl_b(x) + (\beta[b]-1)\big(pl_b(x)-m_b(x)\big)\Big]\\
&= \sum_{x\in\Theta} b_x\Big[\beta[b]\, pl_b(x) + (1-\beta[b])\, m_b(x)\Big] = \sum_{x\in\Theta} b_x\, p[b](x) = p[b].
\end{aligned}$$

A similar mapping exists for the intersection probability too.
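The fact that both maps send $R[b]$ to the intersection probability is easy to verify numerically; the sketch below (Python, using the singleton values of the running example) implements (16.14) and the closed form of (16.21).

def relative_uncertainty(m_s, pl_s):
    """R[b](x) = (pl_b(x) - m_b(x)) / (k_pl - k_b), cfr. (16.14)."""
    den = sum(pl_s.values()) - sum(m_s.values())
    return {x: (pl_s[x] - m_s[x]) / den for x in m_s}

def map_lower(p, m_s):
    """F_{P,T^1[b]}(p)(x) = m_b(x) + p(x)(1 - k_b), the closed form in (16.21)."""
    k_b = sum(m_s.values())
    return {x: m_s[x] + p[x] * (1.0 - k_b) for x in m_s}

m_s  = {'x': 0.2, 'y': 0.1, 'z': 0.3}
pl_s = {'x': 0.4, 'y': 0.5, 'z': 0.6}
print(map_lower(relative_uncertainty(m_s, pl_s), m_s))   # the intersection probability p[b]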

16.4.5 Upper and lower simplices as consistent probabilities

Relative belief and plausibility are then the foci associated with lower T 1 [b] and
upper T n−1 [b] simplices, the incarnations of lower and upper constraints on single-
tons. We can close the circle opened by the analogy with the pignistic transforma-
tion by showing that those two simplices can in fact also be interpreted as the sets of
probabilities consistent with the plausibility (11.14) and belief (11.16) of singletons,
respectively (cfr. Chapter 11).

Indeed, the set of pseudo-probabilities consistent with a pseudo belief function $\varsigma$ can be defined as:
$$\mathcal{P}[\varsigma] \doteq \big\{ p\in\mathcal{P}' : p(A)\ge\varsigma(A)\ \forall A\subseteq\Theta \big\},$$

just as we did for ‘standard’ belief functions. We can then prove the following result.
Theorem 111. The simplex $T^1[b] = \mathcal{P}^1[b]$ of the lower probability constraint for singletons (16.4) is the set of probabilities consistent with the belief of singletons $\bar{b}$:
$$T^1[b] = \mathcal{P}[\bar{b}].$$
The simplex $T^{n-1}[b]$ of the upper probability constraint for singletons (16.5) is the set of pseudo-probabilities consistent with the plausibility of singletons $\overline{pl}_b$:
$$T^{n-1}[b] = \mathcal{P}[\overline{pl}_b].$$

A straightforward consequence is that:

Corollary 22. The barycenter $t^1[b]$ of the lower simplex $T^1[b]$ is the pignistic transform of $\bar{b}$:
$$t^1[b] \doteq bary(T^1[b]) = BetP[\bar{b}].$$
The barycenter $t^{n-1}[b]$ of the upper simplex $T^{n-1}[b]$ is the pignistic transform of $\overline{pl}_b$:
$$t^{n-1}[b] \doteq bary(T^{n-1}[b]) = BetP[\overline{pl}_b].$$

Proof. As the pignistic function is the center of mass of the simplex of consistent probabilities, and the upper and lower simplices are the sets of probabilities consistent with $\bar{b}$ and $\overline{pl}_b$ respectively (by Theorem 111), the thesis follows.

Another corollary stems from the fact that the pignistic function and affine combination commute:
$$BetP[\alpha_1 b_1 + \alpha_2 b_2] = \alpha_1\, BetP[b_1] + \alpha_2\, BetP[b_2]$$
whenever $\alpha_1 + \alpha_2 = 1$.

Corollary 23. The intersection probability is the convex combination of the barycenters of the lower and upper simplices, with coefficient (10.10):
$$p[b] = \beta[b]\, t^{n-1}[b] + (1-\beta[b])\, t^1[b].$$

16.5 Alternative versions of the Transferable Belief Model


In summary, all the considered Bayesian transformations of a belief function (pig-
nistic function, relative plausibility, relative belief, and intersection probability) pos-
sess a simple credal interpretation in the probability simplex. Such interpretations

have a common denominator, in the sense that they can all be linked to different
(credal) sets of probabilities, in this way extending the classical interpretation of the
pignistic transformation as barycenter of the polygon of consistent probabilities.
As $\mathcal{P}[b]$ is the credal set associated with a belief function $b$, the upper and lower simplices geometrically embody the probability interval associated with $b$:
$$\mathcal{P}[b, pl_b] = \big\{ p\in\mathcal{P} : b(x)\le p(x)\le pl_b(x),\ \forall x\in\Theta \big\}.$$

By applying the notion of focus to all the possible pairs of simplices in the triad $\{\mathcal{P}, T^1[b], T^{n-1}[b]\}$ we obtain in turn all the different Bayesian transformations considered here:
$$\begin{aligned}
\{\mathcal{P}, T^1[b]\} &: \quad f(\mathcal{P}, T^1[b]) = \tilde{b},\\
\{\mathcal{P}, T^{n-1}[b]\} &: \quad f(\mathcal{P}, T^{n-1}[b]) = \widetilde{pl}_b,\\
\{T^1[b], T^{n-1}[b]\} &: \quad f(T^1[b], T^{n-1}[b]) = p[b].
\end{aligned} \tag{16.23}$$

Their coordinates as foci encode major features of the underlying belief function:
the total mass it assigns to singletons, their total plausibility, and the fraction β of
the related probability interval which yields the intersection probability.
The credal interpretation of upper, lower, and interval probability constraints on singletons lays, in perspective, the foundations for the formulation of TBM-like frameworks for such systems.
We can think of the TBM as a pair {P[b], BetP [b]} formed by a credal set linked
to each belief function b (in this case the polytope of consistent probabilities) and
a probability transformation (the pignistic function). As the barycenter of a simplex
is a special case of focus, the pignistic transformation is just another probability
transformation induced by the focus of two simplices.
The results of this Chapter suggest therefore similar frameworks:
$$\Big\{\{\mathcal{P}, T^1[b]\},\ \tilde{b}\Big\}, \qquad \Big\{\{\mathcal{P}, T^{n-1}[b]\},\ \widetilde{pl}_b\Big\}, \qquad \Big\{\{T^1[b], T^{n-1}[b]\},\ p[b]\Big\},$$

in which lower, upper, and interval constraints on probability distributions on P are


represented by similar pairs, formed by the associated credal set (in the form of a
pair of simplices) and by the probability transformation determined by their focus.
Decisions are then made based on the appropriate focus probability: the relative belief, the relative plausibility, or the intersection probability, respectively.
In the TBM [1220] disjunctive/conjunctive combination rules are applied to be-
lief functions to update or revise our state of belief according to new evidence.
The formulation of similar alternative frameworks for lower, upper, and interval probability systems would then require the design of specific evidence elicitation/revision operators for such credal sets.

16.6 A game/utility theory interpretation


In this perspective, an interesting interpretation for relative belief and plausibility of
singletons can be provided in a game/utility theory context [1357, 1302, 662].
In expected utility theory [1357], a decision maker can choose between a number
of ‘lotteries’ (probability distributions) Li in order to maximize their expected return
or utility, calculated as:
$$E(L_i) = \sum_{x\in\Theta} u(x)\cdot p_i(x),$$

where u is a utility function u : Θ → R+ which measures the relative satisfaction


(for us) of the different outcomes x ∈ Θ of the lottery, and pi (x) is the probability
of x under lottery Li .

16.6.1 The cloaked carnival wheel scenario

Consider instead the following game theory scenario, inspired by Strat’s expected
utility approach to decision making with belief functions [1303, 1121] (Section
4.5.1).
In a country fair, people are asked to bet on one of the possible outcomes of a
spinning carnival wheel. Suppose the outcomes are {♣, ♦, ♥, ♠}, and that they each
have the same utility (return) to the player. This is equivalent to a lottery (probability
distribution), in which each outcome has a probability proportional to the area of
the corresponding sectors on the wheel. However, the fair manager decides to make
the game more interesting by covering part of the wheel. Players are still asked
to bet on a single outcome, knowing that the manager is allowed to rearrange the
hidden sector of the wheel as he pleases (see Figure 16.5). Clearly, this situation

Fig. 16.5. The modified carnival wheel, in which part of the spinning wheel is cloaked.

can be described as a belief function, in particular one in which the fraction of area
associated with the hidden sector is assigned as mass to the whole decision space
{♣, ♦, ♥, ♠}. If additional (partial) information is provided, for instance that ♦
cannot appear in the hidden sector, different belief functions must be chosen instead.
Regardless of the particular belief function b (set of probabilities) at hand, the rule
allowing the manager to pick an arbitrary distribution of outcomes in the hidden
section mathematically translates into allowing him/her to choose any probability
distribution p ∈ P[b] consistent with b in order to damage the player. Supposing the
aim of the player is to maximize their minimal chance of winning the bet, which
outcome (singleton) should they pick?

16.6.2 A minimax/maximin decision strategy

In the probability-bound interpretation, the belief value of each singleton x ∈ Θ


measures the minimal support x can receive from a distribution of the family asso-
ciated with the belief function b:

$$b(x) = \min_{p\in\mathcal{P}[b]} p(x).$$
Hence $x_{\mathrm{maximin}} \doteq \arg\max_{x\in\Theta} b(x)$ is the outcome which maximizes such mini-
mal support. In the example of Figure 16.5, as ♣ is the outcome which occupies the
largest share of the visible part of the wheel, the safest bet (the one which guarantees
the maximal chance in the worst case) is indeed ♣. In a more formal language, ♣
is the singleton with the largest belief value. Now, if we normalize to compute the
r.b.s. this outcome is obviously conserved:

$$x_{\mathrm{maximin}} = \arg\max_{x\in\Theta}\tilde{b}(x) = \arg\max_{x\in\Theta}\,\min_{p\in\mathcal{P}[b]} p(x).$$

In conclusion, if the utility function is constant (i.e., no element of Θ can be pre-


ferred over the others), xmaximin (the peak(s) of the relative belief of singletons)
represents the best possible defensive strategy aimed at maximizing the minimal
utility of the possible outcomes.
Dually, plb (x) measures the maximal possible support to x by a distribution
consistent with b, so that
$$x_{\mathrm{minimax}} = \arg\min_{x\in\Theta}\widetilde{pl}_b(x) = \arg\min_{x\in\Theta}\,\max_{p\in\mathcal{P}[b]} p(x)$$

is the outcome which minimizes the maximal possible support.


Suppose, for the sake of simplicity, that the loss function l : Θ → R+ which measures
the relative dissatisfaction of the outcomes is constant, and that in the same game
theory setup our opponent is (again) free to pick a consistent probability distribution
p ∈ P[b]. Then the element with minimal relative plausibility is the best possible
defensive strategy aimed at minimizing the maximum possible loss.
Note that when the utility function is not constant the above minimax and maximin
problems naturally generalize as
$$x_{\mathrm{maximin}} = \arg\max_{x\in\Theta}\tilde{b}(x)\, u(x), \qquad x_{\mathrm{minimax}} = \arg\min_{x\in\Theta}\widetilde{pl}_b(x)\, l(x).$$

While in classical utility theory the decision maker has to select the best ‘lottery’
(probability distribution) in order to maximize the expected utility, here the ‘lottery’
is chosen by their opponent (given the available partial evidence), and the decision
maker is left with betting on the safest strategy (element of Θ).
Relative belief and plausibility of singletons play then a crucial role in determining
the safest betting strategy in an adversarial scenario in which the decision maker has
to minimize their maximal loss/maximize their minimal return.
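To summarise the decision rule, the following sketch (Python, with made-up masses and plausibilities for the four suits of the cloaked-wheel scenario) returns the maximin and minimax bets; constant utility and loss are assumed unless tables are supplied, and all numbers are illustrative only.

def betting_strategies(m_s, pl_s, utility=None, loss=None):
    """Maximin bet via relative belief, minimax bet via relative plausibility."""
    u = utility or {x: 1.0 for x in m_s}
    l = loss or {x: 1.0 for x in m_s}
    k_b, k_pl = sum(m_s.values()), sum(pl_s.values())
    x_maximin = max(m_s, key=lambda x: (m_s[x] / k_b) * u[x])
    x_minimax = min(m_s, key=lambda x: (pl_s[x] / k_pl) * l[x])
    return x_maximin, x_minimax

# made-up visible-sector masses; pl adds the cloaked fraction (0.35) to each singleton
m_s  = {'clubs': 0.35, 'diamonds': 0.15, 'hearts': 0.10, 'spades': 0.05}
pl_s = {x: m + 0.35 for x, m in m_s.items()}
print(betting_strategies(m_s, pl_s))    # ('clubs', 'spades')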

Appendix: proofs
Proof of Theorem 103

Lemma 21. The points {t1x [b], x ∈ Θ} are affinely independent.

Proof. Let us suppose, against the thesis, that there exists an affine decomposition of one of the points, say $t^1_x[b]$, in terms of the others:
$$t^1_x[b] = \sum_{z\neq x}\alpha_z t^1_z[b], \qquad \alpha_z\ge 0\ \forall z\neq x, \quad \sum_{z\neq x}\alpha_z = 1.$$
But then we would have, by definition of $t^1_z[b]$:
$$\begin{aligned}
t^1_x[b] = \sum_{z\neq x}\alpha_z t^1_z[b] &= \sum_{z\neq x}\alpha_z\Big[\sum_{y\neq z} m_b(y)\, b_y + \big(m_b(z) + 1 - k_b\big) b_z\Big]\\
&= m_b(x)\, b_x\sum_{z\neq x}\alpha_z + \sum_{z\neq x} b_z\, m_b(z)(1-\alpha_z) + \sum_{z\neq x}\alpha_z m_b(z)\, b_z + (1-k_b)\sum_{z\neq x}\alpha_z b_z\\
&= \sum_{z\neq x} m_b(z)\, b_z + m_b(x)\, b_x + (1-k_b)\sum_{z\neq x}\alpha_z b_z.
\end{aligned}$$
The latter is equal to (16.7),
$$t^1_x[b] = \sum_{z\neq x} m_b(z)\, b_z + \big(m_b(x) + 1 - k_b\big) b_x,$$
if and only if $\sum_{z\neq x}\alpha_z b_z = b_x$. But this is impossible, as the categorical probabilities $b_x$ are trivially affinely independent.

Proof of Theorem 103. Let us detail the proof for $T^1[b]$. We need to show that:
1. all the points which belong to $Cl(t^1_x[b], x\in\Theta)$ satisfy $p(x)\ge m_b(x)$ too;
2. all the points which do not belong to the above polytope do not meet the constraint either.
Concerning item 1., as
$$t^1_x[b](y) = \begin{cases} m_b(y) & x\neq y,\\[4pt] 1 - \displaystyle\sum_{z\neq y} m_b(z) = m_b(y) + 1 - k_b & x = y,\end{cases}$$
the condition $p\in Cl(t^1_x[b], x\in\Theta)$ is equivalent to:
$$p(y) = \sum_{x\in\Theta}\alpha_x t^1_x[b](y) = m_b(y)\sum_{x\neq y}\alpha_x + (1-k_b)\alpha_y + m_b(y)\alpha_y \qquad \forall y\in\Theta,$$
where $\sum_x\alpha_x = 1$ and $\alpha_x\ge 0$ $\forall x\in\Theta$. Therefore:
$$p(y) = m_b(y)(1-\alpha_y) + (1-k_b)\alpha_y + m_b(y)\alpha_y = m_b(y) + (1-k_b)\alpha_y \ge m_b(y),$$
as $1-k_b$ and $\alpha_y$ are both non-negative quantities.
Point 2. If $p\notin Cl(t^1_x[b], x\in\Theta)$ then $p = \sum_{x\in\Theta}\alpha_x t^1_x[b]$, where $\exists z\in\Theta$ such that $\alpha_z < 0$. But then:
$$p(z) = m_b(z) + (1-k_b)\alpha_z < m_b(z),$$
as $(1-k_b)\alpha_z < 0$, unless $k_b = 1$, in which case $b$ is already a probability.
By Lemma 21 the points $\{t^1_x[b], x\in\Theta\}$ are affinely independent: hence $T^1[b]$ is a simplex.
Dual proofs for Lemma 21 and Theorem 103 can be provided for $T^{n-1}[b]$ by simply replacing the belief values of singletons with their plausibility values.

Proof of Theorem 104


Suppose indeed that a point $p$ is such that:
$$p = \alpha s_i + (1-\alpha) t_i, \qquad \forall i = 1,\ldots,n \tag{16.24}$$
(i.e., $p$ lies on the line passing through $s_i$ and $t_i$ $\forall i$). Then necessarily:
$$t_i = \frac{1}{1-\alpha}\big[p - \alpha s_i\big] \qquad \forall i = 1,\ldots,n.$$
If $p$ has coordinates $\{\alpha_i, i = 1,\ldots,n\}$ in $T$, $p = \sum_{i=1}^n\alpha_i t_i$, then:
$$p = \sum_{i=1}^n\alpha_i t_i = \frac{1}{1-\alpha}\sum_i\alpha_i\big[p - \alpha s_i\big] = \frac{1}{1-\alpha}\Big[p\sum_i\alpha_i - \alpha\sum_i\alpha_i s_i\Big] = \frac{1}{1-\alpha}\Big[p - \alpha\sum_i\alpha_i s_i\Big].$$
The latter implies that $p = \sum_i\alpha_i s_i$, i.e., $p$ is the focus of $(S,T)$.

Proof of Theorem 105

We need to prove that $\tilde{b}$ has the same simplicial coordinates in $\mathcal{P}$ and $T^1[b]$. By definition (4.18), $\tilde{b}$ can be expressed in terms of the vertices of the probability simplex $\mathcal{P}$ as:
$$\tilde{b} = \sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, b_x.$$
We then need to prove that $\tilde{b}$ can be written as the same affine combination
$$\tilde{b} = \sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, t^1_x[b]$$
in terms of the vertices $t^1_x[b]$ of $T^1[b]$. Replacing (16.7) in the above equation yields:
$$\begin{aligned}
\sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, t^1_x[b] &= \sum_{x\in\Theta}\frac{m_b(x)}{k_b}\Big[\sum_{y\neq x} m_b(y)\, b_y + \Big(1-\sum_{y\neq x} m_b(y)\Big) b_x\Big]\\
&= \sum_{x\in\Theta} b_x\frac{m_b(x)}{k_b}\sum_{y\neq x} m_b(y) + \sum_{x\in\Theta} b_x\frac{m_b(x)}{k_b} - \sum_{x\in\Theta} b_x\frac{m_b(x)}{k_b}\sum_{y\neq x} m_b(y) = \sum_{x\in\Theta} b_x\frac{m_b(x)}{k_b} = \tilde{b}.
\end{aligned}$$

Proof of Theorem 107

In the case of the pair $\{\mathcal{P}, T^1[b]\}$ we can compute the (affine) line coordinate $\alpha$ of $\tilde{b} = f(\mathcal{P}, T^1[b])$ by imposing condition (16.24). The latter assumes the following form (being $s_i = b_x$, $t_i = t^1_x[b]$):
$$\begin{aligned}
\sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, b_x &= t^1_x[b] + \alpha\big(b_x - t^1_x[b]\big) = (1-\alpha)\, t^1_x[b] + \alpha\, b_x\\
&= (1-\alpha)\Big[\sum_{y\neq x} m_b(y)\, b_y + \big(1 - k_b + m_b(x)\big) b_x\Big] + \alpha\, b_x\\
&= b_x\Big[(1-\alpha)\big(1-k_b+m_b(x)\big) + \alpha\Big] + \sum_{y\neq x} m_b(y)(1-\alpha)\, b_y,
\end{aligned}$$
and for $1-\alpha = \frac{1}{k_b}$, i.e. $\alpha = \frac{k_b-1}{k_b}$, the condition is met.

Proof of Theorem 108

Again, we can compute the line coordinate $\alpha$ of $\widetilde{pl}_b = f(\mathcal{P}, T^{n-1}[b])$ by imposing condition (16.24). The latter assumes the form (being $s_i = b_x$, $t_i = t^{n-1}_x[b]$):
$$\begin{aligned}
\sum_{x\in\Theta}\frac{pl_b(x)}{k_{pl_b}}\, b_x &= t^{n-1}_x[b] + \alpha\big(b_x - t^{n-1}_x[b]\big) = (1-\alpha)\, t^{n-1}_x[b] + \alpha\, b_x\\
&= (1-\alpha)\Big[\sum_{y\neq x} pl_b(y)\, b_y + \big(1 - k_{pl_b} + pl_b(x)\big) b_x\Big] + \alpha\, b_x\\
&= b_x\Big[(1-\alpha)\big(1-k_{pl_b}+pl_b(x)\big) + \alpha\Big] + \sum_{y\neq x} pl_b(y)(1-\alpha)\, b_y.
\end{aligned}$$
For $1-\alpha = \frac{1}{k_{pl_b}}$, i.e. $\alpha = \frac{k_{pl_b}-1}{k_{pl_b}}$, the condition is met.

Proof of Theorem 109


We need to show that $p[b]$ has the same simplicial coordinates in $T^1[b]$ and $T^{n-1}[b]$. These coordinates turn out to be the values of the relative uncertainty function (16.14) for $b$:
$$R[b](x) = \frac{pl_b(x) - m_b(x)}{k_{pl_b} - k_b}. \tag{16.25}$$
Recalling the expression (16.7) of the vertices of $T^1[b]$, the point of the simplex $T^1[b]$ with coordinates (16.25) is:
$$\begin{aligned}
\sum_x R[b](x)\, t^1_x[b] &= \sum_x R[b](x)\Big[\sum_{y\neq x} m_b(y)\, b_y + \Big(1-\sum_{y\neq x} m_b(y)\Big) b_x\Big]\\
&= \sum_x R[b](x)\Big[\sum_{y\in\Theta} m_b(y)\, b_y + (1-k_b)\, b_x\Big]\\
&= \sum_x b_x\Big[(1-k_b)\, R[b](x) + m_b(x)\sum_y R[b](y)\Big] = \sum_x b_x\Big[(1-k_b)\, R[b](x) + m_b(x)\Big],
\end{aligned}$$
as $R[b]$ is a probability ($\sum_y R[b](y) = 1$). By Equation (16.13) the above quantity coincides with $p[b]$.
The point of $T^{n-1}[b]$ with the same coordinates $\{R[b](x), x\in\Theta\}$ is again:
$$\begin{aligned}
\sum_x R[b](x)\, t^{n-1}_x[b] &= \sum_x R[b](x)\Big[\sum_{y\neq x} pl_b(y)\, b_y + \Big(1-\sum_{y\neq x} pl_b(y)\Big) b_x\Big]\\
&= \sum_x R[b](x)\Big[\sum_{y\in\Theta} pl_b(y)\, b_y + (1-k_{pl_b})\, b_x\Big]\\
&= \sum_x b_x\Big[(1-k_{pl_b})\, R[b](x) + pl_b(x)\sum_y R[b](y)\Big] = \sum_x b_x\Big[(1-k_{pl_b})\, R[b](x) + pl_b(x)\Big]\\
&= \sum_x b_x\Big[\frac{1-k_b}{k_{pl_b}-k_b}\, pl_b(x) - \frac{1-k_{pl_b}}{k_{pl_b}-k_b}\, m_b(x)\Big],
\end{aligned}$$
which is equal to $p[b]$ by Equation (16.25).

Proof of Theorem 110

Again, we need to impose condition (16.24) on the pair $\{T^1[b], T^{n-1}[b]\}$, or
$$p[b] = t^1_x[b] + \alpha\big(t^{n-1}_x[b] - t^1_x[b]\big) = (1-\alpha)\, t^1_x[b] + \alpha\, t^{n-1}_x[b]$$
for all the elements $x\in\Theta$ of the frame, $\alpha$ being some constant. This is equivalent to (after replacing the expressions (16.7), (16.9) of $t^1_x[b]$ and $t^{n-1}_x[b]$):
$$\begin{aligned}
\sum_{x\in\Theta} b_x\Big[m_b(x) + \beta[b]\big(pl_b(x)-m_b(x)\big)\Big] &= (1-\alpha)\Big[\sum_{y\in\Theta} m_b(y)\, b_y + (1-k_b)\, b_x\Big] + \alpha\Big[\sum_{y\in\Theta} pl_b(y)\, b_y + (1-k_{pl_b})\, b_x\Big]\\
&= b_x\Big[(1-\alpha)(1-k_b) + (1-\alpha)m_b(x) + \alpha\, pl_b(x) + \alpha(1-k_{pl_b})\Big] + \sum_{y\neq x} b_y\Big[(1-\alpha)m_b(y) + \alpha\, pl_b(y)\Big]\\
&= b_x\Big\{(1-k_b) + m_b(x) + \alpha\big[pl_b(x) + (1-k_{pl_b}) - m_b(x) - (1-k_b)\big]\Big\} + \sum_{y\neq x} b_y\Big[m_b(y) + \alpha\big(pl_b(y)-m_b(y)\big)\Big].
\end{aligned}$$
If we set $\alpha = \beta[b] = \frac{1-k_b}{k_{pl_b}-k_b}$, we get for the coefficient of $b_x$ in the above expression (i.e., the probability value of $x$):
$$\frac{1-k_b}{k_{pl_b}-k_b}\Big[pl_b(x) + (1-k_{pl_b}) - m_b(x) - (1-k_b)\Big] + (1-k_b) + m_b(x) = \beta[b]\big[pl_b(x)-m_b(x)\big] + (1-k_b) + m_b(x) - (1-k_b) = p[b](x).$$
On the other hand:
$$m_b(y) + \alpha\big(pl_b(y)-m_b(y)\big) = m_b(y) + \beta[b]\big(pl_b(y)-m_b(y)\big) = p[b](y)$$
for all $y\neq x$, no matter the choice of $x$.

Proof of Theorem 111

For each belief function b, the vertices of the consistent polytope P[b] are generated
by a permutation ρ of the elements of Θ (16.16). This is true for the b.f. b̄ too, i.e.,
the vertices of P[b̄] are also generated by permutations of singletons.
In this case, however:
– given such a permutation ρ = (xρ(1) , ..., xρ(n) ) the mass of Θ (the only non-
singleton focal element of b̄) is assigned according to the mechanism of Propo-
sition 49 to xρ(1) , while all the other elements receive only their original mass
mb (xρ(j) ), j > 1;
– therefore all the permutations ρ putting the same element in the first place yield
the same vertex of P[b̄];

– hence there are just n such vertices, one for each choice of the first element
xρ(1) = x;
– but this vertex, a probability distribution, has mass values (simplicial coordinates
in P):
m(x) = mb (x) + (1 − kb ), m(y) = mb (y) ∀y 6= x,
as (1 − kb ) is the mass b̄ assigns to Θ;
– the latter clearly corresponds to t1x [b] (16.7).
A similar proof holds for the case of $\overline{pl}_b$, as Proposition 1 remains valid for pseudo b.f.s too.

Proof of Corollary 23
By Equation (11.17) the intersection probability $p[b]$ lies on the line joining $\overline{pl}_b$ and $\bar{b}$, with coordinate $\beta[b]$:
$$p[b] = \beta[b]\,\overline{pl}_b + (1-\beta[b])\,\bar{b}.$$
If we apply the pignistic transformation we get directly:
$$BetP[p[b]] = p[b] = BetP\big[\beta[b]\,\overline{pl}_b + (1-\beta[b])\,\bar{b}\big] = \beta[b]\, BetP[\overline{pl}_b] + (1-\beta[b])\, BetP[\bar{b}] = \beta[b]\, t^{n-1}[b] + (1-\beta[b])\, t^1[b]$$
by Corollary 22.
Part V

The future of uncertainty


17
An agenda for the future

As we have seen in this Book, the theory of belief functions is a modeling lan-
guage for representing and combining elementary items of evidence, which do not
necessarily come in the form of sharp statements, with the goal of maintaining a
mathematical representation of our beliefs about those aspects of the world which
we are unable to predict with reasonable certainty.
While arguably a more appropriate mathematical description of uncertainty than
classical probability theory, the theory of evidence is relatively simple to implement
and it does not require us to abandon the notion of event, as is the case, for instance, with
Walley’s imprecise probability theory. It is grounded in the beautiful mathematics
of random sets, which constitute the natural continuous extension of belief func-
tions, and exhibits strong relationships with many other theories of uncertainty. As
mathematical objects, belief functions have interesting properties in terms of their
geometry, algebra, and combinatorics. This Book was, in particular, dedicated to
the geometric approach to belief and other uncertainty measures proposed by the
Author.
Despite initial objections on the computational complexity of a naive implemen-
tation of the theory of evidence, evidential reasoning can actually be implemented
on large sample spaces and in situations involving the combination of numerous
pieces of evidence. Elementary items of evidence often induce simple belief func-
tions, which can be combined very efficiently with complexity O(n + 1). We do not
need to assign mass to all subsets, but we need to be allowed to do so when neces-
sary (e.g. in case of missing data) – this directly implies a random set description.
Most relevantly, the most plausible hypothesis can be found without computing the
whole combined belief function. At any rate, Monte-Carlo approximations can be
easily implemented when the explicit result of the combination is required. Last but


not least, local propagation schemes allow for the parallelisation of belief function
reasoning just as it happens with Bayesian networks.
As we saw in Chapter 4, statistical evidence can be represented in belief theory
in several ways:
– by likelihood-based belief functions, in a way that generalises both likelihood-
based and Bayesian inference;
– via Dempster’s inference approach, which makes use of auxiliary variables;
– in the framework of the Generalised Bayesian Theorem proposed by Smets.
Decision making strategies based on intervals of expected utilities can be formulated, which produce decisions that are more cautious than traditional ones, and
are able to explain the empirical aversion to second-order uncertainty highlighted in
Ellsberg’s paradox.
The extension of the theory, originally formulated for finite sample spaces, to continuous domains can be tackled via the Borel interval representation initially brought forward by Strat and Smets, when the analysis is restricted to intervals of real values. In the more general case of arbitrary subsets of the real domain, the theory of random sets is the natural mathematical framework to adopt.
An array of estimation, classification and regression tools based on the theory of belief functions is already available, and more can be envisaged.

Open issues

As we have had the chance to appreciate, a number of important issues remain open.
For instance, the correct epistemic interpretation of belief function theory should
be clarified once and for all: we argue here that belief measures should be seen as
random variables for set-valued observations (recall the random die example of
Chapter 1).
What is the most appropriate mechanism for evidence combination is also still de-
bated. The reason is that the choice seems to depend on meta-information on the
reliability and independence of the sources involved which is hardly accessible. As
we argue here (and we have hinted at in Chapter 4), working with intervals of belief
functions may be the way forward, as this acknowledges the meta-uncertainty on
the nature of the sources generating the evidence.
The same holds for conditioning, as we showed.
Finally, the theory of belief functions on Borel intervals of the real line is rather elegant; but, if we want to achieve full generality, the way forward is to ground the theory in the mathematics of random sets.

A research programme

We think it appropriate to conclude this Book by outlining what, in our view, is the
research agenda for the future development of random set and belief function theory.
For obvious reasons we will only touch on a few of the most interesting developments,

without being able to go beyond a certain level of detail. However, we hope this will
stimulate the reader to pursue some of the research directions and contribute to the
further development of the theory in the near future.
Although random set theory as a mathematical formalism is quite well devel-
oped, thanks in particular to the work of Ilya Molchanov [], a theory of statistical
inference with random sets is not yet in sight.
In Section 17.1 of this final Chapter we briefly touch upon, in particular, the following
points:
– the notion of generalised lower and upper likelihoods (Section 17.1.1), to go
beyond inference with belief functions which takes classical likelihood at face
value;
– the formulation of a framework for logistic regression with belief functions,
which makes use of these generalised lower and upper likelihoods (Section
17.1.2);
– fiducial inference with belief functions is also possible (Section 17.1.3), as pro-
posed by .. and Gong [].
– the generalisation of the classical total probability theorem for random sets
(17.1.4), starting with belief functions [?];
– the generalisation of classical limit theorems (central limit theorem, law of large numbers) to the case of random sets (17.1.5): this allows, for instance, a rigorous definition of Gaussian random sets and belief functions (17.1.5);
– the introduction of parametric models based on random sets (Section 17.1.6), which will allow us to perform robust hypothesis testing (Section 17.1.6), thus laying the foundations for a theory of frequentist inference with random sets (Section 17.1.6);
– the development of a theory of random variables and processes in which the un-
derlying probability space is replaced by a random set space (Section 17.1.7):
in particular, this requires the generalisation of the notion of Radon-Nikodym
derivative to belief measures (17.1.7).
The geometric approach to uncertainty is also open to a number of further de-
velopments (Section 17.2), including:
– the geometry of combination rules other than Dempster’s (Section 17.2.1), and
the associated conditioning operators (17.2.2);
– the possibility of conducting inference in a geometric fashion, by finding a com-
mon representation for both belief measures and the data that drives the inference
(17.2.3);
– the geometry of continuous extension of belief functions needs to be explored
(Section 17.2.4): starting from the geometry of belief functions on Borel intervals
to later tackle the general random set representation;
– in Chapters ?? we provided a first extension to possibility theory; a geometric
analysis of other uncertainty measures is in order (Section 17.2.5), including major
ones such as capacities and gambles;
– newer geometrical representations (Section 17.2.6), based on isoperimeters of
convex bodies or exterior algebras.
Other theoretical developments are necessary, in our view, in particular:
– a set of prescriptions for reasoning with intervals of belief functions (17.3.1): as
we saw in Chapter 4, this seems to be the natural way to avoid the entangled issue
of choosing a combination rule;
– the full development of a theory of random set graphical models (17.3.2), which requires
merging what are currently two separate lines of research: (1) belief functions on
graphical models and (2) evidential networks (compare Chapter 4, Section 4.4.5);
– the further development of machine learning tools based on belief theory (cfr.
Section 4.7), able to tackle the current trends in the field, such as transfer learn-
ing, deep learning. Here, in particular, we briefly discuss the idea of random set
random forests (Section 17.3.3).
Last but not least, it is important that we show how to tackle high impact prob-
lems using random set theory (Section 17.4).
Here we discuss in particular:
– rare event prediction using generalised logistic regression (Section 17.4.1);
– the possible creation of a framework for climatic change predictions based on ran-
dom sets (17.4.2), which overcomes the limitations of existing (albeit neglected)
Bayesian approaches;
– new robust foundations for machine learning (17.4.3), obtained by generalising
Probably Approximately Correct (PAC) analysis to the case in which the distribution
from which the data are sampled is not assumed to be known.

17.1 A statistical random set theory


17.1.1 Lower and upper likelihoods

The traditional likelihood function (Chapter 1, Section 1.3.3) is a conditional probability
of the data given a parameter θ ∈ Θ, i.e., a family of PDFs over X parameterised
by θ. Most of the work on belief function inference just takes the notion
of likelihood as a given, and constructs belief functions from an input likelihood
function (cfr. Chapter 4, Section 4.1.1).
However, there is no reason why we should not define a 'belief likelihood function'
mapping a sample observation x ∈ X to a real number, rather than use the
conventional likelihood to construct belief functions.
It is natural to define such a belief likelihood function as a family of belief functions
on X, BelX (.|θ), parameterised by θ ∈ Θ. Note that this is the input of Smets'
Generalised Bayesian Theorem, a collection of 'conditional' belief functions. Such
a belief likelihood takes values on sets of outcomes, A ⊆ X – individual outcomes
are just a special case.

This seems to provide a natural setting for computing likelihoods of set-valued
observations, in coherence with the random set philosophy that underpins this
Book.

Belief likelihood function of repeated trials What can we say about the belief
likelihood function of a series of trials? Note that this is defined on arbitrary subsets
A of X1 × · · · × Xn , where Xi denotes the space of quantities that can be observed
at time i. A series of sharp observations is then a tuple x = (x1 , ..., xn ) ∈ X1 ×
· · · × Xn .
Definition 88. The value of the belief likelihood function on an arbitrary subset A
of X1 × · · · × Xn is:
$$Bel_{X_1 \times \cdots \times X_n}(A|\theta) \doteq \left[ Bel_{X_1}^{\uparrow \times_i X_i} \circledast \cdots \circledast Bel_{X_n}^{\uparrow \times_i X_i} \right](A|\theta), \qquad (17.1)$$

where $Bel_{X_j}^{\uparrow \times_i X_i}$ is the vacuous extension of $Bel_{X_j}$ to the Cartesian product $X_1 \times \cdots \times X_n$ where the observed tuples live, and $\circledast$ is an arbitrary combination rule.
Can we express a belief likelihood value (17.1) as a function of the belief values
of the individual trials? The answer is yes: if we merely wish to compute likelihood
values of tuples of individual outcomes x = (x1 , ..., xn ) ∈ X1 × · · · × Xn,
rather than sets of outcomes, the following decomposition holds.
Theorem 112. When using either ∩ or ⊕ as a combination rule in the definition
of belief likelihood function, the following decomposition holds:
$$Bel_{X_1 \times \cdots \times X_n}(\{(x_1, ..., x_n)\}|\theta) = \prod_{i=1}^n Bel_{X_i}(x_i), \qquad Pl_{X_1 \times \cdots \times X_n}(\{(x_1, ..., x_n)\}|\theta) = \prod_{i=1}^n Pl_{X_i}(x_i). \qquad (17.2)$$

Proof.
Definition 89. We call the quantities
$$\underline{L}(x = \{x_1, ..., x_n\}) \doteq Bel_{X_1 \times \cdots \times X_n}(\{(x_1, ..., x_n)\}|\theta), \qquad \overline{L}(x = \{x_1, ..., x_n\}) \doteq Pl_{X_1 \times \cdots \times X_n}(\{(x_1, ..., x_n)\}|\theta) \qquad (17.3)$$
the lower and upper likelihood, respectively, of the sample x = {x1 , ..., xn }.


Decomposition (17.2) amounts to a property of conditional conjunctive inde-
pendence (recall our discussion of the Generalised Bayes Theorem, Section 4.3.2),
but only for series of sharp samples x.
One can prove that similar regularities hold when using the more cautious disjunc-
tive combination .∪
An open question is whether this does generalise to arbitrary subsets of samples
A ⊂ X1 × · · · × Xn .

Bernoulli trials example Consider once again the Bernoulli trials example, in which
the single outcome space is binary: Xi = X = {H, T}. We know that, under the
assumptions of conditional independence and equidistribution, the traditional likelihood
for a series of Bernoulli trials reads as p^k (1 − p)^{n−k}, where p = P(H), k is
the number of successes and n the total number of trials.
Let us then compute the belief likelihood function for a series of Bernoulli trials,
under the similar assumption that the belief functions BelXi = BelX , i = 1, ..., n,
coincide, with BelX parameterised by p = m({H}), q = m({T}) (with p + q ≤ 1
this time). We seek the belief function on X = {H, T} which best describes the
observed sample, i.e., the optimal values of the two parameters p and q.
Under the equal mass distribution assumption, applying Theorem 112 yields the
following expression for the lower and upper likelihoods of the sample x =
{x1 , ..., xn }, respectively:
$$\underline{L}(\{x_1, ..., x_n\}) = Bel_X(\{x_1\}) \cdots Bel_X(\{x_n\}) = p^k q^{n-k}, \qquad \overline{L}(\{x_1, ..., x_n\}) = Pl_X(\{x_1\}) \cdots Pl_X(\{x_n\}) = (1-q)^k (1-p)^{n-k}. \qquad (17.4)$$
After normalisation, these can be seen as probability distribution functions (PDFs)
over the (belief) space B of all belief functions definable on X (compare Chapter 6).

Fig. 17.1. Lower (left) and upper (right) likelihood functions plotted over the space of belief
functions defined on the frame X = {H, T }, parameterised by p = m(H) (X axis) and
q = m(T ) (Y axis), for the case of k = 6 successes over n = 10 trials.

Figure 17.1 plots both the lower and the upper likelihood (17.4) for the case of
k = 6 successes over n = 10 trials.
Note that the lower likelihood (left) subsumes the traditional likelihood p^k (1 − p)^{n−k}
as its section for p + q = 1. Indeed, the maximum of the lower likelihood is
the traditional ML estimate p = k/n, q = 1 − p. This makes sense, for the lower
likelihood is highest for the most committed belief functions (i.e., for probability
measures).
The upper likelihood (right) has a unique maximum in p = q = 0: this is the
vacuous belief function on {H, T}, with m({H, T}) = 1.

The interval of belief functions joining the maximum of the lower likelihood with that
of the upper likelihood is the set of belief functions such that p/q = k/(n − k), i.e.,
those which preserve the ratio between the observed empirical counts. Once again
the maths leads us to think in terms of intervals of belief functions, rather than
individual ones.
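As a sanity check, the behaviour just described can be reproduced numerically. The following Python sketch (an illustration of ours, not part of the theory: the grid resolution and variable names are arbitrary choices) evaluates the lower and upper likelihoods (17.4) over the admissible mass assignments (p, q), p + q ≤ 1, and locates their maxima.

```python
import numpy as np

# Lower and upper likelihoods of a Bernoulli sample with k successes in n
# trials, as functions of the mass assignment p = m({H}), q = m({T}).
def lower_likelihood(p, q, k, n):
    return p**k * q**(n - k)

def upper_likelihood(p, q, k, n):
    return (1 - q)**k * (1 - p)**(n - k)

k, n = 6, 10
ps = np.linspace(0.0, 1.0, 1001)
qs = np.linspace(0.0, 1.0, 1001)
P, Q = np.meshgrid(ps, qs, indexing='ij')
valid = (P + Q) <= 1.0                       # admissible mass assignments on {H, T}

L_low = np.where(valid, lower_likelihood(P, Q, k, n), -np.inf)
L_up  = np.where(valid, upper_likelihood(P, Q, k, n), -np.inf)

i, j = np.unravel_index(np.argmax(L_low), L_low.shape)
print("argmax of lower likelihood:", ps[i], qs[j])   # ~ (0.6, 0.4): the ML estimate p = k/n
i, j = np.unravel_index(np.argmax(L_up), L_up.shape)
print("argmax of upper likelihood:", ps[i], qs[j])   # (0.0, 0.0): the vacuous belief function
```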

17.1.2 Generalised logistic regression

Bernoulli trials are central in statistics: generalising their likelihood, as we just did,
allows us to represent uncertainty in a number of regression problems.
For instance, in logistic regression (recall Chapter 1, Section 1.4.7):

$$\pi_i = P(Y_i = 1|x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}, \qquad 1 - \pi_i = P(Y_i = 0|x_i) = \frac{e^{-(\beta_0 + \beta_1 x_i)}}{1 + e^{-(\beta_0 + \beta_1 x_i)}}, \qquad (17.5)$$
the two scalar parameters β0, β1 are estimated by maximising the likelihood of the
sample, where the likelihood function is:
$$L(\beta_0, \beta_1|Y) = \prod_{i=1}^n \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i}.$$

Here, Yi ∈ {0, 1} and πi is a function of β0, β1. Maximising L(β0, β1|Y) yields a
single conditional PDF P(Y|x).
As in the Bernoulli series experiment (Section 17.1.1), we can generalise logistic regression
to a belief function setting by replacing the conditional probability (πi, 1 − πi)
on X = {0, 1} with a belief function (pi = m({1}), qi = m({0})) on 2X. Note that,
just as in traditional logistic regression, this time the belief functions Beli associated
with different input values xi are not equally distributed.
Lower and upper likelihoods can then be computed as:
$$\underline{L}(\beta|Y) = \prod_{i=1}^n p_i^{Y_i} q_i^{1-Y_i}, \qquad \overline{L}(\beta|Y) = \prod_{i=1}^n (1 - q_i)^{Y_i} (1 - p_i)^{1-Y_i}.$$

The problem is how to generalise the logit link between the observations x and the
outputs y, for just assuming (17.5) does not yield any analytical dependency for qi.
In other words, we seek a logit-type analytical mapping between observations and
belief functions over a binary frame.
A first, simple proposal may consist of just adding a parameter β2 such that the
following relationship holds:

$$q_i = m(Y_i = 0|x_i) = \beta_2 \, \frac{e^{-(\beta_0 + \beta_1 x_i)}}{1 + e^{-(\beta_0 + \beta_1 x_i)}}. \qquad (17.6)$$
We can then seek lower and upper optimal estimates for the parameter vector β =
[β0 , β1 , β2 ]:

$$\arg\max_\beta \underline{L} \mapsto \underline{\beta}_0, \underline{\beta}_1, \underline{\beta}_2, \qquad \arg\max_\beta \overline{L} \mapsto \overline{\beta}_0, \overline{\beta}_1, \overline{\beta}_2. \qquad (17.7)$$

Plugging these optimal parameters into (17.5) and (17.6) will then yield a lower and
an upper family of conditional belief functions given x (once again, an interval of
belief functions):
$$Bel_X(\cdot|\underline{\beta}, x), \qquad Bel_X(\cdot|\overline{\beta}, x).$$
An analysis of the validity of such a straightforward extension of the logit map-
ping, and the exploration of alternative ways of generalising it are research questions
potentially very interesting to pursue.
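A minimal computational sketch of this generalised scheme might look as follows (Python). The use of scipy.optimize.minimize, the synthetic data and the box constraint β2 ∈ [0, 1] (which guarantees pi + qi ≤ 1 under the link (17.6)) are our own illustrative assumptions, not prescriptions of the theory.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masses(beta, x):
    """Mass assignment (p_i, q_i) on {1, 0} for each input x_i, after Eqs. (17.5)-(17.6)."""
    b0, b1, b2 = beta
    p = sigmoid(b0 + b1 * x)          # p_i = m({1} | x_i)
    q = b2 * (1.0 - p)                # q_i = m({0} | x_i), Eq. (17.6)
    return p, q

def neg_log_lower(beta, x, y):
    p, q = masses(beta, x)
    eps = 1e-12                       # guards against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(q + eps))

def neg_log_upper(beta, x, y):
    p, q = masses(beta, x)
    eps = 1e-12
    return -np.sum(y * np.log(1 - q + eps) + (1 - y) * np.log(1 - p + eps))

# Toy data (hypothetical): y tends to 1 for larger x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < sigmoid(0.5 + 2.0 * x)).astype(float)

bounds = [(None, None), (None, None), (0.0, 1.0)]   # beta_2 in [0, 1] keeps p_i + q_i <= 1
beta_lower = minimize(neg_log_lower, x0=[0.0, 0.0, 0.5], args=(x, y), bounds=bounds).x
beta_upper = minimize(neg_log_upper, x0=[0.0, 0.0, 0.5], args=(x, y), bounds=bounds).x
print("lower-likelihood estimate:", beta_lower)
print("upper-likelihood estimate:", beta_upper)   # tends towards less committed assignments
```

As in the Bernoulli case, the two estimates bound an interval of belief functions rather than pinning down a single conditional probability.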

17.1.3 Fiducial inference with belief functions

17.1.4 The total probability theorem for random sets

Spies (Section 4.3.3) and others have posed themselves the problem of generalising
the law of total probability
$$P(A) = \sum_{i=1}^N P(A|B_i) P(B_i),$$

where {B1 , ..., BN } is a disjoint partition of the sample space, to the case of belief
functions. They mostly did so from the angle of producing a generalisation of Jeffrey's
combination rule – nevertheless, the question goes rather beyond their original
intentions, as it involves understanding the space of solutions to the generalised total
probability problem.
The problem of generalising the total probability theorem to belief functions can
be posed as follows (Figure 17.2).

Theorem 113. (Total belief theorem) Suppose Θ and Ω are two frames of dis-
cernment, and ρ : 2Ω → 2Θ the unique refining between them. Let b0 be a belief
function defined over Ω = {ω1 , ..., ω|Ω| }. Suppose there exists a collection of be-
lief functions bi : 2Πi → [0, 1], where Π = {Π1 , ..., Π|Ω| }, Πi = ρ({ωi }), is the
partition of Θ induced by its coarsening Ω.
Then, there exists a belief function b : 2Θ → [0, 1] such that:
1. b0 is the restriction of b to Ω, b0 = b|Ω (Equation (2.11), Chapter 2);

2. b ⊕ bΠi = bi ∀i = 1, ..., |Ω|, where bΠi is the categorical belief function with
b.p.a. mΠi (Πi ) = 1, mΠi (B) = 0 for all B ≠ Πi .

It can be proven that any solution to the total belief problem must have focal
elements which obey the following structure [?].

Proposition 50. Each focal element $e_k$ of a total belief function b meeting the requirements
of Theorem 113 is the union of exactly one focal element for each of
the conditional belief functions whose domain Πi is a subset of ρ(Ek ), where Ek is
the smallest focal element of the a-priori belief function b0 such that $e_k \subset \rho(E_k)$.
Namely:
$$e_k = \bigcup_{i: \Pi_i \subset \rho(E_k)} e_i^{j_i}, \qquad (17.8)$$
where $e_i^{j_i} \in \mathcal{E}_{b_i}$ for all i, and $\mathcal{E}_{b_i}$ denotes the list of focal elements of $b_i$.

Fig. 17.2. Pictorial representation of the total belief theorem hypotheses (Theorem 113).

If we enforce the a-priori function b0 to have only disjoint focal elements (i.e.,
b0 to be the vacuous extension of a Bayesian function defined on some coarsening
of Ω), we have what we call the restricted total belief theorem.
In this special case it suffices to solve the |Eb0 | sub-problems obtained by consid-
ering each focal element E of b0 separately, and then combine the resulting partial
solutions by simply weighing the resulting basic probability assignments using the
a-priori mass mb0 (E), to obtain a fully normalized total belief function.
For each individual focal element of b0 the task of finding a suitable solution to
the total belief problem translates into a linear algebra problem.
A candidate solution to the sub-problem of the restricted total belief problem associated
with $E \in \mathcal{E}_{b_0}$ is the solution to a linear system with $n_{min} = \sum_{i=1,...,N} (n_i - 1) + 1$ equations and $n_{max} = \prod_i n_i$ unknowns:
$$A x = b, \qquad (17.9)$$

where each column of A is associated with an admissible (i.e., meeting the struc-
ture of Lemma ??) focal element ej of the candidate total belief function, x =
[mb (e1 ), · · · , mb (en )] and n = nmin is the number of equalities generated by the
N conditional constraints.
Since the rows of the solution system (??) are linearly independent, any system
of equations obtained by selecting nmin columns from A has a unique solution. A
minimal solution to the restricted total belief problem (??) (i.e., a solution with the
minimum number of focal elements) is then uniquely determined by the solution of
a system of equations obtained by selecting nmin columns from the nmax columns
of A.
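Purely as an illustration of this column-selection step (and not of the constructive column-substitution procedure discussed next), the following Python sketch brute-forces the search for a square subsystem of Ax = b whose unique solution is a non-negative mass assignment. The toy matrix A, vector b and tolerance below are hypothetical choices of ours.

```python
import numpy as np
from itertools import combinations

def minimal_total_belief_solution(A, b):
    """Search the square subsystems of Ax = b (each column of A encodes an
    admissible candidate focal element) for one whose unique solution is a
    non-negative mass assignment."""
    n_eq, n_cols = A.shape
    for cols in combinations(range(n_cols), n_eq):
        sub = A[:, cols]
        if np.linalg.matrix_rank(sub) < n_eq:
            continue                      # singular selection, skip it
        x = np.linalg.solve(sub, b)
        if np.all(x >= -1e-12):           # admissible (non-negative) masses
            return cols, x
    return None

# Hypothetical toy instance: 3 constraint equations, 4 candidate columns.
A = np.array([[1., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 1., 1.]])
b = np.array([0.7, 0.6, 0.7])
print(minimal_total_belief_solution(A, b))   # -> ((0, 1, 2), array([0.3, 0.4, 0.3]))
```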

Definition 90. We define a class T of transformations acting on transformable
columns e of a candidate minimal solution system via the following formal sum:
$$e \mapsto e' = -e + \sum_{i \in C} e_i - \sum_{j \in S} e_j, \qquad (17.10)$$
where C, |C| < N, is a covering set of companions of e (i.e., every component of e is
covered by at least one of them), and a number of selection columns S, |S| = |C| − 2,
are employed to compensate the side effects of C so as to yield an admissible column (i.e.,
a candidate focal element meeting the structure of Lemma ??).

We call the elements of T column substitutions.

Theorem 114. Column substitutions of the class T reduce the absolute value of the
most negative solution component.

One can then use Theorem 114 to prove that there always exists a selection of columns of
A (focal elements of the total belief function) such that the resulting square linear
system has a positive vector as its solution. This can be done in a constructive way, by
applying a transformation of the type (17.10) recursively to the column associated
with the most negative component, so as to obtain a path in the solution space which
eventually leads to the desired solution.
The following sketch of an existence proof for the restricted total belief theorem
exploits the effects of column substitutions of type T on the solution components:
1. at each column substitution the most negative solution component decreases by
Theorem 114;
2. if we keep substituting the most negative variable we keep obtaining distinct
linear systems, for at each step the transformed column is assigned a positive
solution component and therefore, if we follow the proposed procedure, cannot
be changed back to a negative one by applying transformations of class T ;
3. this implies that there can be no cycles in the associated path in the solution
space;
4. the number $\binom{n_{max}}{n_{min}}$ of solution systems is obviously finite, hence the procedure
must terminate.
Unfortunately, counterexamples show that there are 'transformable' columns
(associated with negative solution components) which do not admit a transformation
of the type (17.10): although they do have companions on every partition Πi,
such counterexamples do not admit a complete collection of 'selection' columns.
A possible way forward is to arrange all the candidate minimal solution systems related
to a problem of a given size {ni , i = 1, ..., N} into a solution graph (Figure 17.3),
and study the behaviour of column substitutions there.

Fig. 17.3. The solution graph associated with the restricted total belief problem with N = 2,
n1 = 3 and n2 = 2.

17.1.5 Limit theorems for random sets

Total probability is only one important result of classical probability theory that
needs to be generalised to the wider setting of random sets.

Gaussian random sets The Gaussian distribution is central in probability theory
and its applications, hence the name 'normal'. It possesses very nice properties:
its first two sample moments are sufficient statistics, and it is the PDF with maximum entropy
among those with a given mean and standard deviation. Furthermore, the central limit
theorem shows that sums of i.i.d. random variables (with finite variance) are asymptotically
Gaussian, so that whenever test statistics or estimators are functions of sums of random
variables, they will have asymptotically normal distributions.
An interesting option is to investigate how Gaussian distributions are transformed
under (appropriate) multivalued mappings, in Dempster's original setting.
This involves, in particular, exploring the space of such mappings to seek the most sensible
and convenient ones.

Central limit theorem In order to properly define a Gaussian belief function, however,
we need to generalise the classical central limit theorem to random
sets. The old proposal by Dempster and Liu merely transfers normal distributions
on the real line by Cartesian product with R^m (cfr. Chapter 3, Section 3.3.4).
Both the central limit theorem and the law(s) of large numbers have already been
generalised to imprecise probabilities1.
A central limit theorem for belief functions was recently formulated by Boston
University's Larry G. Epstein and Kyoungwon Seo2. Xiaomin Shi (from Shandong
University) has separately brought forward a number of central limit theorems for
belief measures3.
1 See 'Introduction to Imprecise Probabilities', http://onlinelibrary.wiley.com/book/10.1002/9781118763117.
2 http://people.bu.edu/lepstein/files-research/CLT-Nov17-2011.pdf
3 https://arxiv.org/pdf/1501.00771.pdf

17.1.6 Frequentist inference with random sets

Random sets are mathematical objects detached from any specific interpretation.
Just as probability measures are used by both Bayesians and frequentists for their
analyses, random sets can also be employed in different ways according to the in-
terpretation they are provided with.
In particular, it is natural to think of a generalised frequentist framework in
which random experiments are designed by assuming a specific random set distribu-
tion, rather than a conventional one, in order to better cope with the ever-occurring
set-valued observations.

Parameterised families of random sets The first necessary step is to introduce
parametric models based on random sets.
Recall Dempster's random set interpretation (Figure 17.4). Should the multivalued
mapping Γ which defines a random set be 'designed', or derived from the
problem?

Fig. 17.4. Describing the family of random sets (right) induced by families of probability
distributions in the source probability space (left) is the first step towards a generalisation of
frequentist inference to random sets.
For instance, in the cloaked die example (Section 1.4.5) it is the occlusion which
generates the multi-valued mapping and we have no control over it. In other situa-
tions, however, it may make sense to impose a parameterised family of mappings

Γ (.|θ) : Ω → 2Θ

which, given a (fixed) probability on the source space Ω, would yield as a result a
parameterised family of random sets.
The alternative is to fix the multi-valued mapping (e.g., when it is given by
the problem), and model the source probability by a classical parametric model. A
Gaussian or binomial family of source probabilities would then induce a family of
‘Gaussian’ or ‘binomial’ random sets (see Figure 17.4 again).

Hypothesis testing with random sets As we know, in hypothesis testing (Section
1.3.3) designing an experiment amounts to choosing a family of probability distributions
which is assumed to generate the observed data. If parameterised families
of random sets can be constructed, they can then be plugged into the frequentist
inference machinery, after an obvious generalisation of some of the steps involved.
Hypothesis testing with random sets would then read as follows:
1. state the relevant null (H0) and alternative hypotheses;
2. state the assumptions about the form of the random set (mass assignment) describing
the observations, in place of a conventional distribution;
3. state the relevant test statistic T (a quantity derived from the sample) – only this
time the sample contains set-valued observations;
4. derive the mass assignment of the test statistic under the null hypothesis (from
the assumptions);
5. set a significance level (α);
6. compute from the observations the observed value tobs of the test statistic T –
now this will also be set-valued;
7. calculate the conditional belief value, under H0, of sampling a test statistic
at least as extreme as the observed value (in place of the p-value);
8. reject the null hypothesis, in favour of the alternative hypothesis, if and only if
this conditional belief value is less than the significance level.

17.1.7 Random set variables

We know that random sets are set-valued random variables: nevertheless, the ques-
tion stands as to whether one can build random variables on top of random set (be-
lief) spaces, rather than the usual probability space.
Just as in the classical case, we need a mapping from Θ to a measurable space
(e.g. the positive real half-line):
$$f : \Theta \rightarrow \mathbb{R}^+ = [0, +\infty],$$
where this time Θ is the co-domain of a multivalued mapping Γ : Ω → 2Θ with
source probability space Ω.

Generalising the Radon-Nikodym derivative For a classical continuous random
variable X, we can compute its probability density function (PDF) as its Radon-Nikodym
derivative (RND), namely the measurable function p such that
$$P[X \in A] = \int_A p \, d\mu,$$
where µ denotes the dominating measure (e.g. the Lebesgue measure on the real line).
An interesting question is: can we compute a (generalised) PDF for a random
set random variable as defined above?

The extension of the Radon-Nikodym derivative for set functions4 was first studied
by Harding et al in 1997. Yann Rebille (2009) has also investigated the problem in
his ‘A Radon-Nikodym derivative for almost subadditive set functions’5 . Graf, on
the other hand, has tackled the problem of defining the RND for capacities, rather
than probability measures []. The following summary of the problem is abstracted
from Molchanov’s ‘Theory of Random Sets’6 .
Assume that the two capacities µ, ν are monotone, subadditive and continuous
from below.

Definition 91. (Absolute continuity) A capacity ν is absolutely continuous with respect
to another capacity µ if, for every A ∈ F, ν(A) = 0 whenever µ(A) = 0.

The definition is the same as for standard measures. However, while for standard measures
absolute continuity is equivalent to the integral relation $\nu(A) = \int_A \phi \, d\mu$ (i.e., to ν being
an indefinite integral of µ), this is no longer true for general capacities.

Strong decomposition Indeed, for capacities (as opposed to probability measures),
absolute continuity does not guarantee the existence of an RN derivative. To understand
this, consider the case of a finite Θ, |Θ| = n. Then any measurable function
f : Θ → R+ is determined by just n numbers, which do not suffice to uniquely
define a capacity on 2Θ (which has 2^n degrees of freedom).

Definition 92. The pair (µ, ν) has the strong decomposition property if, ∀α ≥ 0,
there exists a measurable set Aα ∈ F such that
α(ν(A) − ν(B)) ≤ µ(A) − µ(B)   if B ⊂ A ⊂ Aα ,
α(ν(A) − ν(A ∩ Aα )) ≥ µ(A) − µ(A ∩ Aα )   ∀A.

In rough words, the strong decomposition condition states that, for each bound α,
the ‘incremental ratio’ of the two capacities is bounded by α in the sub-power set
capped by some event Aα .
Note that all standard measures meet the strong decomposition property.

A Radon-Nikodym theorem for capacities The RN theorem for capacities then
reads as follows [].

Theorem 115. For every two capacities µ and ν, ν is an indefinite integral of µ if
and only if the pair (µ, ν) has the strong decomposition property and ν is absolutely
continuous with respect to µ.

A number of problems remain open. The conditions of the theorem (which holds
for general capacities) need to be elaborated for the case of completely alternating
capacities (distributions of random closed sets).
4 https://www.math.nmsu.edu/~jharding/
5 https://halshs.archives-ouvertes.fr/hal-00441923/document
6 http://www.springer.com/jp/book/9781852338923

As Molchanov notes [], the strong decomposition property for ν = TX and µ = TY
means that
$$\alpha P_X(\mathcal{F}_A^B) \le P_Y(\mathcal{F}_A^B) \quad \text{if } B \subset A \subset A_\alpha,$$
and
$$\alpha P_X(\mathcal{F}_A^{A \cap A_\alpha}) \ge P_Y(\mathcal{F}_A^{A \cap A_\alpha}) \quad \forall A,$$
where $\mathcal{F}_A^B = \{C \in \mathcal{F} : B \subset C \subset A\}$.
Nguyen [] has proposed a constructive approach to RN derivatives for capacities of
random sets, similar to the one in constructive measure theory based on derivatives
of set functions [?].

17.2 Developing the geometric approach


The geometric approach to uncertainty measures, the main focus of this Book, also
has much room for further extension. On the one hand, the geometric language
needs to be applied to all the aspects of the reasoning chain, including combination and
conditioning, but also (potentially) inference. As we move on from belief functions on
finite frames to random sets on arbitrary domains, the geometry of these continuous
formulations poses new questions.
On the other hand, the formalism needs to tackle (besides probability, possibility
and belief measures) other important mathematical descriptions of uncertainty, first
of all general monotone capacities and gambles (or variations thereof).
Finally, new more sophisticated geometric representations of belief measures can be
sought, in terms of either exterior algebras or areas of projections of convex bodies.

17.2.1 Geometry of other combination rules


The study of the geometry of the notions of evidence combination and belief update,
started with the rather elegant analysis of Dempster's rule provided in Chapter ??,
will find a natural continuation in understanding the geometric behaviour of the
other main combination operators, including Yager's and Dubois and Prade's rules, the
conjunctive ∩ and disjunctive ∪ rules, the cautious and bold rules, Josang's consensus
operator, and Murphy's and Deng's averaging operators.
The final goal of this work would be the ability to describe the ‘cone’ of pos-
sible future belief states under stronger or weaker assumptions on reliability and
independence of sources.
The inversion of combination results via geometric means (i.e., canonical de-
composition in its various forms) can also be pursued.

17.2.2 Geometry of other conditioning operators


Our analysis of geometric conditioning is also only in its infancy. We analysed the
case of Lp norms, but what happens when we plug different norms into the associated
optimisation problem? Even more importantly, is geometric conditioning a
general encompassing framework for conditioning in belief calculus, i.e., can we
express any conditioning operator as the minimisation of an appropriate distance from the
conditioning simplex?
Furthermore, the geometry of all the main conditioning operators remains to be
understood, including lower and upper envelopes, Suppes' 'geometric' conditioning,
and Smets' unnormalised conditioning.

17.2.3 Geometric inference

An intriguing question is whether we can pose the inference problem in this setting
as well. Namely, we seek a geometric representation general enough to encode both
the data driving the inference and the (belief) measures possibly resulting from the
inference, in such a way that the inferred measure minimises some sort of distance
from the empirical data.

17.2.4 Geometry of continuous formulations

This Book has mainly concerned itself with the geometric representation of finite
belief measures. Nevertheless, here we can start providing some insights on how to
extend this approach to belief functions on infinite spaces.

Geometry of Borel belief functions

Geometry of random sets

17.2.5 A true geometry of uncertainty

A true geometry of uncertainty will require the ability to manipulate in our geo-
metric language any (or most) forms of uncertainty measures (compare the partial
hierarchy reported in Chapter ??).
Probability and possibility measures are, as we know, special cases of belief
functions: therefore, their geometric interpretation does not require any extension
of the notion of belief space (as we extensively learned in Parts II and III of this
Book).
Most other uncertainty measures, however, are not special cases of belief functions
– in fact, a number of them are more general than belief functions, such as for in-
stance probability intervals (2-monotone capacities), general monotone capacities,
upper/lower previsions.
Tackling these more general measures requires therefore an extension of the geomet-
ric belief space able to encapsulate the most general such representation. Arguably,
this will lead to a geometric theory of imprecise probabilities, starting from gambles
and sets of desirable gambles.

Geometry of capacities

Geometry of gambles

17.2.6 Fancier geometries

Representing belief functions as mere vectors of mass or belief values is not entirely
satisfactory. Basically, when doing so all vector components are indistinguishable,
while they correspond to values assigned to subsets of Θ of different cardinality.
Other geometrical representations of belief functions on finite spaces can nevertheless
be imagined, which take into account the qualitative difference between events
of different cardinality.

Capacities as isoperimeters of convex bodies Convex bodies, for instance, are the
subject of a fascinating field of study.
Any convex body in Rn obviously possesses 2^n distinct orthogonal projections
onto the 2^n subspaces generated by all possible subsets of coordinate axes (see
Figure 17.5).

Fig. 17.5. Given a convex body K in the Cartesian space Rn , endowed with coordinates
x1 , ..., xn , the function ν assigning to each subset of coordinates S = {xi1 , ..., xim } the
(hyper)-volume ν(S) of the orthogonal projection K|S of K onto the linear subspace gener-
ated by S = {xi1 , ..., xim } is a capacity.

This idea is clearly related to the notion of Grassman manifold, i.e. the manifold
of all linear subspaces of a given vector space.
It is easy to see that, given a convex body K in the Cartesian space Rn , endowed
with coordinates x1 , ..., xn , the function ν assigning to each subset of coordinates
S = {xi1 , ..., xim } the (hyper)-volume ν(S) of the orthogonal projection K|S of K
onto the linear subspace generated by S = {xi1 , ..., xim } is a capacity.
Under what conditions is this capacity monotone? Under what conditions is this
capacity a belief function (i.e. an infinitely-monotone capacity)?

Belief functions and exterior algebras

17.3 Completing the theory of evidence


17.3.1 Reasoning with intervals of belief functions

As we saw, intervals of belief functions pop up all the time when reasoning or making
inferences in this theory. A set of prescriptions for manipulating them needs to be developed,
starting, perhaps, from the geometry of convex sets of belief functions.

17.3.2 Graphical models

17.3.3 Random set random forests

Decision trees A decision tree is a recursive divide and conquer structure, in which
at each step:
1. an attribute is selected to partition the training set in an optimal manner;
2. the current training set is split into training subsets according to the values of
the selected attribute.
A typical attribute selection criterion is based on the information gain Info(S) − Info_A(S),
where information is measured by the classical Shannon entropy
$$Info(S) \doteq -\sum_{c \in C} p_c \log_2 p_c,$$
where $p_c$ is the proportion of objects in S with class label c, and
$$Info_A(S) = \sum_{a \in range(A)} \frac{|S_A^a|}{|S|} \, Info(S_A^a),$$
where $S_A^a$ denotes the subset of objects of S whose value of attribute A is a.

The information gain criterion favours attributes with a larger number of values over
those with fewer possible values.
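For concreteness, a minimal Python implementation of the entropy-based information gain criterion, run on made-up data (the attribute names and records below are purely illustrative), could read as follows.

```python
import math
from collections import Counter

def info(labels):
    """Shannon entropy Info(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(samples, attribute, label):
    """Information gain Info(S) - Info_A(S) for splitting `samples` on `attribute`."""
    labels = [s[label] for s in samples]
    total = info(labels)
    by_value = {}
    for s in samples:
        by_value.setdefault(s[attribute], []).append(s[label])
    conditional = sum(len(subset) / len(samples) * info(subset)
                      for subset in by_value.values())
    return total - conditional

# Toy dataset (hypothetical).
S = [{'outlook': 'sunny', 'play': 'no'}, {'outlook': 'sunny', 'play': 'no'},
     {'outlook': 'rain', 'play': 'yes'}, {'outlook': 'overcast', 'play': 'yes'}]
print(info_gain(S, 'outlook', 'play'))   # -> 1.0 (the split separates the classes perfectly)
```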

Belief decision trees A belief decision tree [] is composed of the same elements
as a traditional decision tree but, at each step, class information on the items of
the dataset is expressed by a basic probability assignment (b.p.a.) over the set of
possible classes C for each object.
The average such b.p.a. is then computed, and the pignistic probability of the result
is used to compute the entropy InfoA (S). The attribute with the highest gain ratio is
selected and, eventually, each leaf is labelled by a b.p.a. expressing a belief about the
actual class of the object, rather than a unique class.
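The node-scoring step just described (average b.p.a., pignistic transformation, entropy) can be sketched as follows in Python; the b.p.a.s and helper names are illustrative only, although the pignistic transform used, BetP(c) = Σ_{A∋c} m(A)/|A|, is the standard one.

```python
import math
from itertools import chain

def average_bpa(bpas):
    """Average the b.p.a.s (dicts: frozenset of classes -> mass) of the objects in a node."""
    keys = set(chain.from_iterable(bpas))
    return {A: sum(m.get(A, 0.0) for m in bpas) / len(bpas) for A in keys}

def pignistic(m):
    """Pignistic probability BetP(c) = sum over focal elements A containing c of m(A)/|A|."""
    classes = set(chain.from_iterable(m))
    return {c: sum(mass / len(A) for A, mass in m.items() if c in A) for c in classes}

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Two objects with partially uncertain class labels over C = {a, b} (hypothetical data).
bpas = [{frozenset({'a'}): 0.7, frozenset({'a', 'b'}): 0.3},
        {frozenset({'b'}): 0.5, frozenset({'a', 'b'}): 0.5}]
m_avg = average_bpa(bpas)
print(entropy(pignistic(m_avg)))   # entropy of the node, used in place of Info(S) when splitting
```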

Random forests Random forests [?] are ensembles of decision trees built by random
selection of sub-training sets with replacement.
At each step one selects a random subset of the features, as well as a set of thresholds
for these features, and then chooses the single best feature and threshold (using
entropy or the Gini impurity $\sum_c p_c (1 - p_c)$). The process is repeated until all the trees
are fully grown.
For multi-label problems the entropy has to be computed over sets of labels: Varma, for
instance [], assumes independence of the individual labels.

Random set random forests The idea behind random set random forests is that the
training data for a given subset provides sample statistics for a random set (a belief
function).
Consequently, a measure of entropy/uncertainty for random sets should be used
to perform the splitting. This can be done efficiently by Monte-Carlo sampling of
subsets (cfr. Chapter 4, Section 4.4.3). As we saw in Chapter 4, Section 4.8.3, a
number of generalisations of entropy to belief functions have been proposed – which
of them is the most suitable for decision purposes will be an interesting topic of research.

Fig. 17.6. Both the lower and the upper optimal belief functions (17.11) amount to a convex
envelope of logistic functions.

17.4 High-impact applications


17.4.1 Rare events

Using the framework of generalised logistic regression (Section 17.1.2), we can
employ belief functions in a cautious approach to rare event prediction.
The outcome of the coupled optimisation problems (17.7) applied to a series of
training data is a pair of lower and upper belief functions on Y = {0, 1}:
$$Bel_X(\cdot|\underline{\beta}, x), \qquad Bel_X(\cdot|\overline{\beta}, x), \qquad (17.11)$$

associated with the optimal parameter vectors $\underline{\beta}$ and $\overline{\beta}$, respectively.
After training, any new observation x can simply be plugged into $Bel_X(\cdot|\underline{\beta}, x)$ and
$Bel_X(\cdot|\overline{\beta}, x)$, yielding a pair of lower and upper belief functions on Y. Once again,
the result of the regression is an interval of belief functions, rather than an individual
object. Note also that each bound belief function is equivalent to a convex envelope
of logistic functions, making apparent the greater robustness of generalised logistic
regression (Figure 17.6).
The question arises of how this robust estimate of rare events relates to the results of
classical logit regression.

17.4.2 Climatic change

Climatic change [] is a paramount example of a problem which requires predictions
to be made under heavy uncertainty, due to the imprecision associated with
any climate model, the long-term nature of the predictions involved, and the second-order
uncertainty affecting the statistical parameters at play.
A typical question a policymaker may ask a climate scientist is, for instance [?]:

“What is the probability that a doubling of atmospheric CO2 from pre-industrial
levels will raise the global mean temperature by at least 2°C?”

Rougier [?] has very nicely outlined a Bayesian approach to climate modelling
and prediction, in which the predictive distribution for future climate is found by
conditioning future climate on the observed values for historical and current climate.
A number of challenges arise:
– in climate prediction the collection of uncertain quantities for which the climate
scientist must specify prior probabilities can be large;
– specifying a prior distribution over climate vectors is very challenging.
Considering that people spend thousands of hours collecting climate data and constructing
climate models, it is surprising how little attention is devoted to
quantifying our judgements about how the two are related.
In this Section, climate is represented as a vector of measurements y, collected
at a given time. Its components include, for instance, the level of CO2 concentration
at the various points of a grid.
More precisely, the climate vector y = (yh , yf ) collects both historical and present
(yh ) and future (yf ) climate values. A measurement error e is introduced to take
into account errors due to, for instance, a seasick technician or atmospheric turbulence.
The actual measurement vector is therefore
$$z \doteq y_h + e.$$
The Bayesian treatment of the problem makes use of a number of assumptions.
For starters:
Axiom 1 Climate and measurement error are independent: e⊥y.


Axiom 2 The measurement error is Gaussian distributed, with 0 mean and covari-
ance Σ e : e ∼ N (0, Σ e ).
Thanks to these assumptions, the predictive distribution for the climate given the
measured values z = z̃ is
$$p(y|z = \tilde{z}) \propto N(\tilde{z} - y_h|0, \Sigma_e)\, p(y), \qquad (17.12)$$

which requires us to specify a prior distribution for the climate vector y itself.

Climate models The choice of such a prior p(y) is extremely challenging, because
y is such a large collection of quantities, and these component quantities are linked
by complex interdependencies, such as those arising from the laws of nature.
The role of the climate model is then to induce a distribution on the climate itself: the
model plays the role of a parametric model in statistical inference (Section 1.3.3).
Namely, a climate model is a deterministic mapping from a collection of param-
eters x (equation coefficients, initial conditions, forcing functions) to a vector of
measurements (the ‘climate’):

x → y = g(x) (17.13)

where g belongs to a predefined ‘model space’ G.


Climate scientists call model evaluation an actual value g(x), computed for
some specific set of parameter values x. The reason is that the analytical mapping
(17.13) is generally not known, and only images of specific input parameters can be
computed or sampled (at a cost).
A climate scientist considers, on a priori grounds given by their past experience, that
some choices of x are better than others, i.e., that there exists a set of parameter values
x∗ such that
$$y = g(x^*) + \epsilon^*,$$
where ε∗ is termed the 'model discrepancy'.

Prediction via a parametric model The difference between the climate vector and
any model evaluation can be decomposed into two parts:
$$y - g(x) = [g(x^*) - g(x)] + \epsilon^*.$$
The first part is a contribution that may be reduced by a better choice of the input
parameters x; the second part is, instead, an irreducible contribution that arises from the
model's own imperfections.
Note that x∗ is not just a statistical parameter, though, for it relates to physical quan-
tities, so that climate scientists have a clear intuition of its effects. Consequently,
scientists may be able to exploit their expertise to provide a prior p(x∗ ) on the input
parameters.
In this Bayesian framework for climate prediction, two more assumptions are
needed.

Axiom 3 'Best' input, discrepancy and measurement error are mutually (statistically)
independent: x∗ ⊥ ε∗ ⊥ e.

Axiom 4 The model discrepancy ε∗ is Gaussian distributed, with mean 0 and covariance Σε.

Axioms 3 and 4 then allow us to compute the desired climate prior as
$$p(y) = \int N(y - g(x^*)|0, \Sigma_\epsilon)\, p(x^*)\, dx^*, \qquad (17.14)$$

which can be plugged into (17.12) to yield a Bayesian prediction of future climate
values.
In practice, as we said, the climate model function g(·) is not known – we only
possess a sample of model evaluations {g(x1 ), ..., g(xn )}. We call model validation
the process of tuning the covariances Σε, Σe, and checking the validity of the
Gaussianity Axioms 2 and 4.
This can be done by using (17.12) to predict past/present climates p(z), and applying
some hypothesis testing to the result. If the observed value z̃ is in the tail of the
distribution, the model parameters (if not the entire set of model assumptions) need
to be corrected. As Rougier admits [], responding to bad validation results is not
straightforward.

Model calibration Assuming that the model has been validated, it needs to be 'calibrated',
i.e., we need to find the desired 'best' value x∗ of the model's parameters.
Indeed, under Axioms 1–4 we can compute
$$p(x^*|z = \tilde{z}) \propto p(z = \tilde{z}|x^*)\, p(x^*) = N(\tilde{z} - g(x^*)|0, \Sigma_\epsilon + \Sigma_e)\, p(x^*).$$
As we know, MAP estimation could be applied to the above posterior distribution; however,
the presence of multiple modes could make it ineffective.

Bayesian posterior prediction Alternatively, we can apply full Bayesian inference
to compute
$$p(y_f|z = \tilde{z}) = \int p(y_f|x^*, z = \tilde{z})\, p(x^*|z = \tilde{z})\, dx^*, \qquad (17.15)$$
where p(yf |x∗, z = z̃) is Gaussian with a mean which depends on z̃ − g(x).
The posterior prediction (17.15) highlights two routes for climate data to impact on
future climate predictions:
– by concentrating the distribution p(x∗ |z = z̃) relative to the prior p(x∗ ), depend-
ing on both quantity and quality of the climate data;
– by shifting the mean of p(yf |x∗ , z = z̃) away from g(x), depending on the size
of the difference z̃ − g(x).

Role of model evaluations Let us go back to the initial question: what is the probability
that a doubling of atmospheric CO2 will raise the global mean temperature
by at least 2°C by 2100?
Let Q ⊂ Y be the set of climates y for which the global mean temperature is
at least 2°C higher in 2100. The probability of the event of interest can then be
computed by integration, as follows:
$$Pr(y_f \in Q|z = \tilde{z}) = \int f(x^*)\, p(x^*|z = \tilde{z})\, dx^*.$$
The inner integral $f(x) = \int_Q N(y_f|\mu(x), \Sigma)\, dy_f$ can be computed directly; the
outer one requires numerical integration, e.g.:
– naive Monte-Carlo: $\int \simeq \frac{1}{n} \sum_{i=1}^n f(x_i)$, with $x_i \sim p(x^*|z = \tilde{z})$;
– weighted sampling: $\int \simeq \frac{\sum_{i=1}^n w_i f(x_i)}{\sum_{i=1}^n w_i}$, with $x_i \sim p(x^*)$ weighted by the likelihood $w_i \propto p(z = \tilde{z}|x^* = x_i)$.
Sophisticated models which take a long time to evaluate may not provide enough
samples for the prediction to be statistically significant,
albeit they may make the prior p(x∗) and the covariance Σε easier to specify.
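The weighted sampling estimator above is a standard self-normalised importance sampling scheme. A toy one-dimensional Python sketch, in which the prior, the likelihood and the inner integral f are made-up stand-ins chosen only to illustrate the estimator, is the following.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D stand-ins for the prior p(x*), the likelihood p(z = z_tilde | x*)
# and f(x) = Pr(y_f in Q | x); none of these comes from a real climate model.
def prior_sample(n):            return rng.normal(0.0, 1.0, size=n)
def likelihood(x, z_tilde=1.0): return np.exp(-0.5 * (z_tilde - x) ** 2)
def f(x):                       return 1.0 / (1.0 + np.exp(-(x - 0.5)))  # placeholder for the inner integral

n = 100_000
xs = prior_sample(n)            # x_i ~ p(x*)
w = likelihood(xs)              # w_i proportional to p(z = z_tilde | x* = x_i)
w /= w.sum()                    # self-normalised importance weights
estimate = np.sum(w * f(xs))    # ~ Pr(y_f in Q | z = z_tilde)
print(estimate)
```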

Modelling climate with belief functions There are a number of issues with making
climate inferences in the Bayesian framework. Many assumptions are necessary
(e.g. Gaussianity), most of them introduced to make the calculations practical rather
than anything else. Although the prior on climates is reduced to a prior on the parameters
of a climate model, there is no obvious way of picking p(x∗): it is far easier to say
which choices are wrong (e.g. uniform priors). Significant parameter tuning is also
required (e.g. for Σε, Σe).
Quite a lot of work remains to be done to develop a random set framework for climate
prediction, but a few landmarks can already be set:
– avoid committing to priors p(x∗) on the correct climate model parameters;
– use the climate model as a parametric model to infer either a belief function on the space of
climates Y,
– or a belief function on the space of parameters (e.g. covariances) of the distribution on Y.

17.4.3 Statistical learning theory

Machine learning is nowadays being applied to new, challenging real-world problems, such as
smart cars navigating a complex, dynamic environment, or robot surgical assistants
capable of predicting the surgeon's needs. Existing theory and algorithms, however,
typically focus on fitting the observable outputs in the training data, which may lead,
for instance, an autonomous driving system to perform well on validation tests but fail
catastrophically when tested in the real world. Such systems are unable to predict how
they will behave in a radically new setting (e.g., how does a smart car cope with driving
through extreme weather conditions?), and most of them have no way of detecting whether
their underlying assumptions have been violated: they will happily continue to predict and
act even on inputs that are completely outside the scope of what they have actually learned.
It is therefore imperative to ensure that these algorithms behave predictably in the wild.

PAC learning Classical statistical learning theory [Vapnik] contemplates 'generalisation'
criteria which are based on a naive correlation between smoothness and
generality, and makes PAC predictions on the reliability of a training set which are based on
simple quantities such as the number of samples N.
The generalisation problem arises because the training error differs from the expected
generalisation error – in classification problems,
$$E_{x \sim D}[\delta(h(x) \neq y(x))] \neq \sum_{n=1}^N \delta(h(x_n) \neq y(x_n)),$$
where the training data x = [x1 , ..., xN ] are assumed to be drawn from a distribution D,
h(x) is the predicted label for input x and y(x) the actual label.

Definition 93. (Probably Approximately Correct learning) The learning algorithm
finds, with probability at least 1 − δ, a model h ∈ H which is approximately
correct, i.e., which makes a training error of no more than ε.

The main result of PAC learning is that we can relate the required size N of a
training sample to the size of the model space H, namely
$$\log|H| \leq N\epsilon - \log\frac{1}{\delta},$$
so that the minimum number of training examples, given ε, δ and |H|, is
$$N \geq \frac{1}{\epsilon}\left(\log|H| + \log\frac{1}{\delta}\right).$$
For infinite-dimensional hypothesis spaces H, |H| is replaced by the Vapnik-Chervonenkis dimension.

Definition 94. (Vapnik-Chervonenkis dimension) The VC dimension of H is the maximum
number of points that can be successfully shattered by a hypothesis h ∈ H
(i.e., they can be correctly classified by some h ∈ H for all possible binary labellings
of these points).

Such bounds, however, dramatically overestimate the number of training instances required,
and are of little use for model selection, as they are too wide: in practice, people resort to cross
validation instead.
Nevertheless, VC theory provides the only justification for max-margin linear SVMs:
for the space Hm of linear classifiers with margin m,
$$VC_{SVM} = \min\left\{D, \frac{4R^2}{m^2}\right\} + 1,$$
where R is the radius of the smallest hypersphere enclosing all the training data and D is the
dimension of the data space.

Large margin classifiers As the VC dimension of Hm decreases when m grows,
it is desirable to select linear boundaries with max margin.
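To make the bounds above concrete, the following small Python snippet computes the PAC sample-complexity bound and the margin-based VC bound for a couple of arbitrary parameter choices (the numbers used are purely illustrative).

```python
import math

def pac_sample_size(h_size, eps, delta):
    """Minimum N such that log|H| <= N*eps - log(1/delta)."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

def vc_margin_svm(D, R, m):
    """VC bound for linear classifiers with margin m, data inside a ball of radius R in R^D."""
    return min(D, 4.0 * R**2 / m**2) + 1

print(pac_sample_size(h_size=2**20, eps=0.05, delta=0.01))   # -> 370 examples suffice
print(vc_margin_svm(D=1000, R=1.0, m=0.2))                   # -> 101: the margin shrinks the bound
```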

Imprecise-theoretical foundations for machine learning The issues with traditional
statistical learning theory have recently been recognised by many researchers
[Ermon, http://futureoflife.org/AI/2015awardees; Liang, http://cs.stanford.edu/~pliang/; Weller].
What about deep learning? Nobody really has a clue of why it works. Approaches should
provide worst-case guarantees: at present it is not possible to rule out completely
unexpected behaviours or catastrophic failures.
Percy Liang has proposed a new generation of machine learning algorithms which, rather than
learning models that predict accurately on a target distribution, use minimax optimisation
to learn models that are suitable for any target distribution within a 'safe' family.
The concept does evoke imprecise probability: minimax models similar to Liang's are
naturally associated with convex sets of probabilities, and imprecise probabilities naturally
arise whenever the data are insufficient to allow the estimation of a probability distribution.
Training sets in virtually all applications of machine learning constitute a glaring
example of data which is:
– insufficient in quantity (think of a Google object detection from images routine
trained on even a few million images, compared to the thousands of billions of
images out there);
– insufficient in quality (as they are selected based on criteria such as cost, availability
or mental attitudes, therefore biasing the whole learning process).
Uncertainty theory may be able to provide worst-case, cautious predictions, delivering
AI agents aware of their own limitations. The research programme is then a generalisation
of the concept of Probably Approximately Correct learning which does not assume the
probability distribution of the data to be known.
References

1.
2.
3.
4. Tech. report.
5.
6. Weighing evidence: The design and comparison of probability thought experiments,
Tech. report, Research paper, 75th anniversary colloquium series Harvard Business
School, 1983.
7. Evidentiary value: philosophical, judicial and psychological aspects of a theory
(P. Gardenfors, B. Hansson, and N. E. Sahlin, eds.), 1988.
8. Nilson’s probabilistic entailment extended to dempster-shafer theory, International
Journal of Approximate Reasoning 2 (1988), no. 3, 339 – 340.
9. A. Laurentini A. Bottino and P. Zuccone, Towards non-intrusive motion capture, Asian
Conf. on Computer Vision, 1998.
10. M Lecours A Cheaito and E Bosse, Modified dempster-shafer approach using an ex-
pected utility interval decision rule, Proc. SPIE 3719, Sensor Fusion: Architectures,
Algorithms, and Applications III, vol. 34, 1999.
11. S. Abel, The sum-and-lattice points method based on an evidential reasoning system
applied to the real-time vehicle guidance problem, Uncertainty in Artificial Intelli-
gence 2 (Lemmer and Kanal, eds.), 1988, pp. 365–370.
12. A. Agarwal and B. Triggs, A local basis representation for estimating human pose
from cluttered images, 2006, pp. I:50–59.
13. Ankur Agarwal and Bill Triggs, 3d human pose from silhouettes by relevance vector
regression, cvpr 02 (2004), 882–888.
14. , Learning to track 3d human motion from silhouettes, ICML ’04: Proceed-
ings of the twenty-first international conference on Machine learning (New York, NY,
USA), ACM Press, 2004, p. 2.
15. J. Aggarwal and Q. Cai, Human motion analysis: a review, Computer Vision and Im-
age Understanding 73 (1999).
16. , Human motion analysis: a review, IEEE Proc. Nonrigid and Articulated Mo-
tion Workshop, June 1997, pp. 90–102.
17. J. Aggarwal, Q. Cai, W. Liao, and B. Sabata, Articulated and elastic non-rigid mo-
tion: A review, IEEE Proc. Nonrigid and Articulated Motion Workshop, Austin, Texas,
1994, pp. 2–14.
18. , Nonrigid motion analysis: articulated and elastic motion, CVIU 70 (1998),
142–156.
19. Martin Aigner, Combinatorial theory, Classics in Mathematics, Springer, New York,
1979.
20. J. Aitchinson, Discussion on professor Dempster’s paper, Journal of the Royal Statis-
tical Society B 30 (1968), 234–237.


21. K. Akita, Image sequence analysis of real world human motion, Pattern Recognition
17 (1984), 73–83.
22. R. Almond, Belief function models for simple series and parallel systems, Tech. report,
Department of Statistics, University of Washington, Tech. Report 207, 1991.
23. R. G. Almond, Fusion and propagation of graphical belief models: an implementation
and an example, PhD dissertation, Department of Statistics, Harvard University, 1990.
24. , Graphical belief modeling, Chapman and Hall/CRC, 1995.
25. Diego A. Alvarez, On the calculation of the bounds of probability of events using
infinite random sets, International Journal of Approximate Reasoning 43 (2006), no. 3,
241 – 267.
26. J. Amat, M. Casals, and M. Frigola, Stereoscopic systems for human body tracking in
natural scenes, Int. Workshop on Modeling People at ICCV’99, September 1999.
27. P. An and W. M. Moon, An evidential reasoning structure for integrating geophysical,
geological and remote sensing data, Proceedings of IEEE, 1993, pp. 1359–1361.
28. Z. An, Relative evidential support, PhD dissertation, University of Ulster, 1991.
29. Z. An, D. A. Bell, and J. G. Hughes, Relation-based evidential reasoning, International
Journal of Approximate Reasoning 8 (1993), 231–251.
30. Z. An, D.A. Bell, and J.G. Hughes, Relation-based evidential reasoning, International
Journal of Approximate Reasoning 8 (1993), no. 3, 231 – 251.
31. K.A. Andersen and J.N. Hooker, A linear programming framework for logics of un-
certainty, Decision Support Systems 16 (1996), 39–53.
32. , A linear programming framework for logics of uncertainty, Decision Support
Systems 16 (1996), no. 1, 39 – 53.
33. B. Anrig, R. Haenni, and N. Lehmann, ABEL - a new language for assumption-based
evidential reasoning under uncertainty, Tech. report, Institute of Informatics, Univer-
sity of Fribourg, 1997.
34. A. Antonucci and F. Cuzzolin, Credal sets approximation by lower probabilities: Ap-
plication to credal networks, Proc. of IPMU 2010, 2010.
35. A. Appriou, Knowledge propagation in information fusion processes, Keynote talk,
wtbf’10, Brest, France, mars 2010.
36. O. Aran, T. Burger, A. Caplier, and L. Akarun, Sequential Belief-Based Fusion of
Manual and Non-manual Information for Recognizing Isolated Signs, Gesture-Based
Human-Computer Interaction and Simulation (2009), 134–144.
37. Oya Aran, Thomas Burger, Alice Caplier, and Lale Akarun, A belief-based sequen-
tial fusion approach for fusing manual and non-manual signs, Pattern Recognition 42
(2009), no. 5, 812–822.
38. A. Aregui and T. Denoeux, Constructing consonant belief functions from sample data
using confidence sets of pignistic probabilities, International Journal of Approximate
Reasoning 49 (2008), no. 3, 575–594.
39. M. Armstrong and A. Zisserman, Robust object tracking, Proc. ACCV’95, Singapore,
vol. 1, December 1995, pp. 58–62.
40. Krassimir T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets and Systems 20 (1986),
no. 1, 87 – 96.
41. C.I. Attwood, G.D. Sullivan, and K.D. Baker, Model-based recognition of human
posture using single synthetic images, Fifth Alvey Vision Conference, Reading, UK,
1989.
42. T. Augustin, Modeling weak information with generalized basic probability assign-
ments, Data Analysis and Information Systems - Statistical and Conceptual Ap-
proaches (H. H. Bock and W. Polasek, eds.), Springer, 1996, pp. 101–113.

43. Thomas Augustin, Generalized basic probability assignments, International Journal


of General Systems 34 (2005), no. 4, 451–463.
44. A. Ayoun and Philippe Smets., Data association in multi-target detection using the
transferable belief model, Intern. J. Intell. Systems (2001).
45. A. Azarbayejani and A. Pentland, Real-time self calibrating stereo person tracking
using 3-d shape estimation from blob features, Proc. of the 13th International Confer-
ence on Pattern Recognition, vol. 3, 1996, pp. 627–632.
46. A. Azarbayejani, C.R. Wren, and A. Pentland, Real-time 3-d tracking of the human
body, IMAGE’COM 96, Bordeaux, France, May 1996.
47. J. Baldwin, Support logic programming, Tech. report, Tech. Report ITRC 65, Infor-
mation Technology Research Center, Univ. of Bristol, 1985.
48. J. F. Baldwin, Evidential support logical programming, Fuzzy Sets and Systems 24
(1985), 1–26.
49. J. F. Baldwin, Evidential support logic programming, Fuzzy Sets Syst. 24 (1987),
no. 1, 1–26.
50. , Combining evidences for evidential reasoning, International Journal of Intel-
ligent Systems 6 (1991), no. 6, 569–616.
51. , Fuzzy logic and fuzzy control: Ijcai ’91 workshops on fuzzy logic and fuzzy
control sydney, australia, august 24, 1991 proceedings, ch. A theory of mass assign-
ments for artificial intelligence, pp. 22–34, Springer Berlin Heidelberg, Berlin, Hei-
delberg, 1994.
52. J. F. Baldwin, Towards a general theory of evidential reasoning, Proceedings of the 3rd
International Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems (IPMU’90) (B. Bouchon-Meunier, R.R. Yager, and L.A.
Zadeh, eds.), Paris, France, 2-6 July 1990, pp. 360–369.
53. , Combining evidences for evidential reasoning, International Journal of Intel-
ligent Systems 6:6 (September 1991), 569–616.
54. J. F. Baldwin, T. P. Martin, and B. W. Pilsworth, Fril- fuzzy and evidential reasoning
in artificial intelligence, John Wiley & Sons, Inc., New York, NY, USA, 1995.
55. James F. Baldwin, Advances in the dempster-shafer theory of evidence, John Wiley &
Sons, Inc., New York, NY, USA, 1994, pp. 513–531.
56. Mohua Banerjee and Didier Dubois, Symbolic and quantitative approaches to reason-
ing with uncertainty: 10th European conference, ECSQARU 2009, Verona, Italy, July 1-3,
2009, proceedings, ch. A Simple Modal Logic for Reasoning about Revealed Beliefs,
pp. 805–816, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
57. G. Banon, Distinction between several subsets of fuzzy measures, Fuzzy Sets and Sys-
tems 5 (1981), no. 3, 291 – 305.
58. Yaakov Bar-Shalom and Thomas E. Fortmann, Tracking and data association, Aca-
demic Press, Inc., 1988.
59. J.A. Barnett, Computational methods for a mathematical theory of evidence, Proc. of
the 7th International Joint Conference on Artificial Intelligence (IJCAI-81), 1981, pp. 868–875.
60. Jonathan Baron, Second-order probabilities and belief functions, Theory and Decision
23 (1987), no. 1, 25–36.
61. P. Baroni, Extending consonant approximations to capacities, Proceedings of IPMU,
2004, pp. 1127–1134.
62. Pietro Baroni and Paolo Vicig, Symbolic and quantitative approaches to reasoning
with uncertainty: 6th European conference, ECSQARU 2001, Toulouse, France, Septem-
ber 19-21, 2001, proceedings, ch. On the Conceptual Status of Belief Functions with
Respect to Coherent Lower Probabilities, pp. 328–339, Springer Berlin Heidelberg,
Berlin, Heidelberg, 2001.
63. Pietro Baroni and Paolo Vicig, Transformations from imprecise to precise probabili-
ties, ECSQARU, 2003, pp. 37–49.
64. J. L. Barron, D. J. Fleet, and S. S. Beauchemin, Performance of optical flow tech-
niques, International Journal of Computer Vision, vol. 12(1), 1994, pp. 43–77.
65. Jean-Pierre Barthélemy, Monotone functions on finite lattices: An ordinal approach to
capacities, belief and necessity functions, pp. 195–208, Physica-Verlag HD, Heidel-
berg, 2000.
66. O. Basir, F. Karray, and Hongwei Zhu, Connectionist-based Dempster-Shafer eviden-
tial reasoning for data fusion, Trans. Neur. Netw. 16 (2005), no. 6, 1513–1530.
67. D. Batens, C. Mortensen, and G. Priest, Frontiers of paraconsistent logic, Studies in
logic and computation (J.P. Van Bendegem, ed.), vol. 8, Research Studies Press, 2000.
68. M. Bauer, A Dempster-Shafer approach to modeling agent preferences for plan recog-
nition, User Modeling and User-Adapted Interaction 5:3-4 (1995), 317–348.
69. , Approximation algorithms and decision making in the Dempster-Shafer the-
ory of evidence – An empirical study, International Journal of Approximate Reasoning
17 (1997), 217–237.
70. , Approximations for decision making in the Dempster-Shafer theory of evi-
dence, Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence
(E. Horvitz and F. Jensen, eds.), Portland, OR, USA, 1-4 August 1996, pp. 73–80.
71. A. Baumberg and D. Hogg, Learning flexible models from image sequences,
ECCV’94, Stockholm (J. Eklundh, ed.), vol. 800, 1994, pp. 299–308.
72. P. Beardsley, P. Torr, and A. Zisserman, 3D model acquisition from extended image
sequences, Proc. of ECCV’96, Cambridge, UK, vol. 2, April 1996, pp. 683–695.
73. D. A. Bell and J. W. Guan, Discounting and combination operations in evidential
reasoning, Uncertainty in Artificial Intelligence. Proceedings of the Ninth Conference
(1993) (D. Heckerman and A. Mamdani, eds.), Washington, DC, USA, 9-11 July 1993,
pp. 477–484.
74. D. A. Bell, J. W. Guan, and G. M. Shapcott, Using the Dempster-Shafer orthogonal
sum for reasoning which involves space, Kybernetes 27:5 (1998), 511–526.
75. D.A. Bell, J.W. Guan, and Suk Kyoon Lee, Generalized union and project operations
for pooling uncertain and imprecise information, Data and Knowledge Engineering
18 (1996), 89–117.
76. , Generalized union and project operations for pooling uncertain and impre-
cise information, Data Knowledge Engineering 18 (1996), no. 2, 89 – 117.
77. Yakov Ben-Haim, Info-gap decision theory, second ed., Academic Press, Oxford, 2006.
78. Kent Bendall, Belief-theoretic formal semantics for first-order logic and probability,
Journal of Philosophical Logic 8 (1979), no. 1, 375–397.
79. A. Bendjebbour and W. Pieczynski, Unsupervised image segmentation using
Dempster-Shafer fusion in a Markov fields context, Proceedings of the Interna-
tional Conference on Multisource-Multisensor Information Fusion (FUSION’98)
(R. Hamid, A. Zhu, and D. Zhu, eds.), vol. 2, Las Vegas, NV, USA, 6-9 July 1998,
pp. 595–600.
80. S. Benferhat, A. Saffiotti, and Ph. Smets, Belief functions and default reasoning, Procs.
of the 11th Conf. on Uncertainty in AI. Montreal, Canada, 1995, pp. 19–26.
81. S. Benferhat, Alessandro Saffiotti, and Philippe Smets, Belief functions and de-
fault reasonings, Tech. report, Universite’ Libre de Bruxelles, Technical Report
TR/IRIDIA/95-5, 1995.
82. R. J. Beran, On distribution-free statistical inference with upper and lower probabili-
ties, Annals of Mathematical Statistics 42 (1971), 157–168.
83. J. O. Berger, Robust Bayesian analysis: Sensitivity to the prior, Journal of Statistical Plan-
ning and Inference 25 (1990), 303–328.
84. N. Bergman and A. Doucet, Markov chain Monte Carlo data association for target
tracking, IEEE Int. Conference on Acoustics, Speech and Signal Processing, 2000.
85. Ulla Bergsten and Johan Schubert, Dempster’s rule for evidence ordered in a complete
directed acyclic graph, International Journal of Approximate Reasoning 9 (1993), 37–
73.
86. Ulla Bergsten, Johan Schubert, and P. Svensson, Applying data mining and ma-
chine learning techniques to submarine intelligence analysise, Proceedings of the
Third International Conference on Knowledge Discovery and Data Mining (KDD’97)
(D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy, eds.), Newport Beach,
USA, 14-17 August 1997, pp. 127–130.
87. R. Bertschy and P.A. Monney, A generalization of the algorithm of heidtmann to non-
monotone formulas, Journal of Computational and Applied Mathematics 76 (1996),
no. 12, 55 – 76.
88. P. Besnard and Jurg Kohlas, Evidence theory based on general consequence relations,
Int. J. of Foundations of Computer Science 6 (1995), no. 2, 119–135.
89. Philippe Besnard and Jürg Kohlas, Evidence theory based on general conse-
quence relations, International Journal of Foundations of Computer Science 6 (1995),
no. 2, 119–135.
90. B. Besserer, S. Estable, and B. Ulmer, Multiple knowledge sources and evidential
reasoning for shape recognition, Proceedings of IEEE, 1993, pp. 624–631.
91. Albrecht Beutelspacher and Ute Rosenbaum, Projective geometry, Cambridge Uni-
versity Press, Cambridge, 1998.
92. Malcolm Beynon, Bruce Curry, and Peter Morgan, The Dempster-Shafer theory of
evidence: approach to multicriteria decision modeling, OMEGA: The International
Journal of Management Science 28 (2000), 37–50.
93. KK Bharadwaj, Neerja, and GC Goel, Hierarchical censored production rules (hcprs)
system employing the Dempster-Shafer uncertainty calculus, Information and Soft-
ware Technology 36 (1994), no. 3, 155 – 164.
94. A.G. Bharatkumar, K.E. Diagle, M.G. Pandy, Q. Cai, and J.K. Aggarwal, Lower limb
kinematics of human walking with the medial axis transformation, Workshop on Mo-
tion of Non-Rigid and Articulated Objects, Austin, Texas, 1994.
95. P. Bhattacharya, On the Dempster-Shafer evidence theory and non-hierarchical aggre-
gation of belief structures, IEEE Transactions on Systems, Man, and Cybernetics -
Part A: Systems and Humans 30 (2000), no. 5, 526–536.
96. Loredana Biacino, Fuzzy subsethood and belief functions of fuzzy events, Fuzzy Sets
and Systems 158 (2007), no. 1, 38 – 49.
97. Elisabetta Binaghi, L. Luzi, P. Madella, F. Pergalani, and A. Rampini, Slope insta-
bility zonation: a comparison between certainty factor and fuzzy Dempster-Shafer
approaches, Natural Hazards 17 (1998), 77–97.
98. Elisabetta Binaghi, P. Madella, I. Gallo, and A. Rampini, A neural refinement strategy
for a fuzzy Dempster-Shafer classifier of multisource remote sensing images, Proceed-
ings of the SPIE - Image and Signal Processing for Remote Sensing IV, vol. 3500,
Barcelona, Spain, 21-23 Sept. 1998, pp. 214–224.
99. Elisabetta Binaghi and Paolo Madella, Fuzzy Dempster-Shafer reasoning for rule-
based classifiers, International Journal of Intelligent Systems 14 (1999), 559–583.
100. J. Binder, D. Koeller, S. Russell, and K. Kanazawa, Adaptive probabilistic networks
with hidden variables, Machine Learning, vol. 29, 1997, pp. 213–244.
101. G. Birkhoff, Abstract linear dependence and lattices, American Journal of Mathemat-
ics 57 (1935), 800–804.
102. , Lattice theory (3rd edition), Amer. Math. Soc. Colloquium Publications, Vol.
25, Providence, RI, 1967.
103. R. Bissig, Jurg Kohlas, and N. Lehmann, Fast-division architecture for Dempster-
Shafer belief functions, Qualitative and Quantitative Practical Reasoning, First In-
ternational Joint Conference on Qualitative and Quantitative Practical Reasoning;
ECSQARU–FAPR’97 (D. Gabbay, R. Kruse, A. Nonnengart, and H.J. Ohlbach, eds.),
Springer, 1997.
104. G. Biswas and T. S. Anand, Using the Dempster-Shafer scheme in a mixed-initiative
expert system shell, Uncertainty in Artificial Intelligence, volume 3 (L.N. Kanal, T.S.
Levitt, and J.F. Lemmer, eds.), North-Holland, 1989, pp. 223–239.
105. M. Black and P. Anandan, The robust estimation of multiple motions: parametric and
piecewise smooth flow fields, Computer Vision and Image Understanding, vol. 63(1),
January 1996, pp. 75–104.
106. M. J. Black, Explaining optical flow events with parameterized spatio-temporal mod-
els, Proc. of Conference on Computer Vision and Pattern Recognition, vol. 1, 1999,
pp. 326–332.
107. M.J. Black and A.D. Jepson, Eigentracking: Robust matching and tracking of articu-
lated objects using a view-based representation, ECCV’96, 1996, pp. 329–342.
108. M.J. Black and Y. Yacoob, Tracking and recognizing rigid and non-rigid facial mo-
tions using local parametric models of image motions, Proceedings of the International
Conference on Computer Vision ICCV’95, Cambridge, MA, 1995, pp. 374–381.
109. P. Black, Is Shafer general Bayes?, Proceedings of the Third AAAI Uncertainty in
Artificial Intelligence Workshop, 1987, pp. 2–9.
110. , An examination of belief functions and other monotone capacities, PhD dis-
sertation, Department of Statistics, Carnegie Mellon University, 1996, Pgh. PA 15213.
111. , Geometric structure of lower probabilities, Random Sets: Theory and Appli-
cations (Goutsias, Malher, and Nguyen, eds.), Springer, 1997, pp. 361–383.
112. Paul K. Black, Is Shafer general Bayes?, CoRR abs/1304.2711 (2013).
113. A. Blake and M. Isard, Active contours, Springer-Verlag, April 1998.
114. I. Bloch, Information combination operators for data fusion: a comparative review
with classification, IEEE Transactions on Systems, Man, and Cybernetics - Part A:
Systems and Humans 26 (1996), no. 1, 52–67.
115. Isabelle Bloch, Some aspects of Dempster-Shafer evidence theory for classification
of multi-modality medical images taking partial volume effect into account, Pattern
Recognition Letters 17 (1996), 905–919.
116. Edwin A. Bloem and Henk A.P. Blom, Joint probabilistic data association methods
avoiding track coalescence, Proceedings of the 34th Conference on Decision and Con-
trol, December 1995.
117. Aaron F. Bobick and Andrew D. Wilson, Learning visual behavior for gesture analy-
sis, IEEE Symposium on Computer Vision, November 1995.
118. P.L. Bogler, Shafer-Dempster reasoning with applications to multisensor target iden-
tification systems, IEEE Transactions on Systems, Man and Cybernetics 17 (1987),
no. 6, 968–977.
119. H. Borotschnig, L. Paletta, M. Prantl, and A. Pinz, A comparison of probabilistic, pos-
sibilistic and evidence theoretic fusion schemes for active object recognition, Comput-
ing 62 (1999), 293–319.
120. Michael Boshra and Hong Zhang, Accommodating uncertainty in pixel-based verifi-
cation of 3-d object hypotheses, Pattern Recognition Letters 20 (1999), 689–698.
121. E. Bosse and J. Roy, Fusion of identity declarations from dissimilar sources using the
Dempster-Shafer theory, Optical Engineering 36:3 (March 1997), 648–657.
122. J. R. Boston, A signal detection system based on Dempster-Shafer theory and compar-
ison to fuzzy detection, IEEE Transactions on Systems, Man, and Cybernetics - Part
C: Applications and Reviews 30:1 (February 2000), 45–51.
123. L. Boucher, T. Simons, and P. Green, Evidential reasoning and the combination of
knowledge and statistical techniques in syllable based speech recognition, Proceed-
ings of the NATO Advanced Study Institute, Speech Recognition and Understanding.
Recent Advances, Trends and Applications (P. Laface and R. De Mori, eds.), Cetraro,
Italy, 1-13 July 1990, pp. 487–492.
124. S. Boucheron and E. Gassiat, Optimal error exponents for HMM order estimation,
IEEE Trans. Info. Th. 48 (2003), 964–980.
125. R. Bowden, T. Mitchell, and M. Sarhadi, Reconstructing 3D pose and motion from a
single camera view, BMVC’98, Southampton, UK, 1998, pp. 904–913.
126. M. Brand, Shadow puppetry, ICCV’99, Corfu, Greece, September 1999.
127. M. Brand, N. Oliver, and A. Pentland, Coupled HMM for complex action recogni-
tion, Proc. of Conference on Computer Vision and Pattern Recognition, vol. 29, 1997,
pp. 213–244.
128. Jerome J. Braun, Dempster-Shafer theory and Bayesian reasoning in multisensor data
fusion, 2000, pp. 255–266.
129. C. Bregler, Learning and recognizing human dynamics in video sequences, Proc. of
the Conference on Computer Vision and Pattern Recognition, 1997, pp. 568–574.
130. C. Bregler and J. Malik, Video motion capture, Tech. report, UCB//CSD-97-973, Com-
puter Science Dept., U.C. Berkeley, 1997.
131. , Estimating and tracking kinematic chains, Proceedings of the Conference on
Computer Vision and Pattern Recognition CVPR’98, Santa Barbara, CA, June 1998.
132. , Tracking people with twists and exponential maps, Proceedings of the Con-
ference on Computer Vision and Pattern Recognition CVPR’98, Santa Barbara, CA,
June 1998.
133. M. Brüning and D. Denneberg, Max-min σ-additive representation of monotone mea-
sures, Statistical Papers 34 (2002), 23–35.
134. Noel Bryson and Ayodele Mobolurin, Qualitative discriminant approach for gener-
ating quantitative belief functions, IEEE Transactions on Knowledge and Data Engi-
neering 10 (1998), 345–348.
135. B. G. Buchanan and E. H. Shortliffe, Rule-based expert systems, Addison-Wesley,
Reading (MA), 1984.
136. D. M. Buede and J. W. Martin, Comparison of Bayesian and Dempster-Shafer fusion,
In 1989 Tri-Service Data Fusion Symposium, 1989, pp. 81–101.
137. Dennis M. Buede and Paul Girardi, Target identification comparison of Bayesian and
Dempster-Shafer multisensor fusion, IEEE Transactions on Systems, Man, and Cyber-
netics Part A: Systems and Humans. 27 (1997), 569–577.
138. A. Bundy, Incidence calculus: A mechanism for probability reasoning, Journal of au-
tomated reasoning 1 (1985), 263–283.
139. T. Burger, Defining new approximations of belief function by means of Dempster's
combination, Proceedings of the Workshop on the theory of belief functions, 2010.
140. T. Burger, O. Aran, A. Urankar, L. Akarun, and A. Caplier, A dempster-shafer theory
based combination of classifiers for hand gesture recognition, Computer Vision and
Computer Graphics - Theory and Applications, Lecture Notes in Communications in
Computer and Information Science (2008).
141. T. Burger and A. Caplier, A Generalization of the Pignistic Transform for Partial
Bet, Proceedings of the 10th European Conference on Symbolic and Quantitative
Approaches to Reasoning with Uncertainty (ECSQARU), Verona, Italy, July 1-3,
Springer-Verlag New York Inc, 2009, pp. 252–263.
142. T. Burger and F. Cuzzolin, The barycenters of the k-additive dominating belief func-
tions and the pignistic k-additive belief functions, First International Workshop on the
Theory of Belief Functions (BELIEF’10), Brest, France, 2010.
143. T. Burger and F. Cuzzolin, The barycenters of the k-additive dominating belief func-
tions & the pignistic k-additive belief functions, (2010).
144. T. Burger, Y. Kessentini, and T. Paquet, Dealing with precise and imprecise decisions
with a Dempster-Shafer theory based algorithm in the context of handwritten word
recognition, 2010 12th International Conference on Frontiers in Handwriting Recog-
nition, IEEE, 2010, pp. 369–374.
145. Thomas Burger, Oya Aran, and Alice Caplier, Modeling hesitation and conflict: A
belief-based approach for multi-class problems, Machine Learning and Applications,
Fourth International Conference on (2006), 95–100.
146. P. Burman, A comparative study of ordinary cross-validation, v-fold cross-validation
and the repeated learning-testing methods, Biometrika 76(3) (1989), 503–514.
147. A. C. Butler, F. Sadeghi, S. S. Rao, and S. R. LeClair, Computer-aided de-
sign/engineering of bearing systems using the Dempster-Shafer theory, Artificial In-
telligence for Engineering Design, Analysis and Manufacturin 9:1 (January 1995),
1–11.
148. R. Buxton, Modelling uncertainty in expert systems, International Journal of Man-
Machine Studies 31 (1989), 415–476.
149. C. Hu, Y. Li, Q. Tu, and S. Ma, Extraction of parametric human model for posture
recognition using genetic algorithm, Fourth International Conference on Automatic
Face and Gesture Recognition, Grenoble, France, March 2000.
150. C. Wen, X. Xu, and Z. Li, Research on unified description and extension of combination
rules of evidence based on random set theory, The Chinese Journal of Electronics 17.
151. C. Yaniz, J. Rocha, and F. Perales, 3D region graph for reconstruction of human motion,
Workshop on Perception of Human Motion at ECCV, 1998.
152. Q. Cai and J.K. Aggarwal, Tracking human motion using multiple cameras, Interna-
tional Conference on Pattern Recognition, 1996.
153. Q. Cai, A. Mitiche, and J.K. Aggarwal, Tracking human motion in an indoor environ-
ment, International Conference on Image Processing, 1995.
154. C. Camerer and M. Weber, Recent developments in modeling preferences: uncertainty
and ambiguity, Journal of Risk and Uncertainty 5 (1992), 325–370.
155. L. Campbell and A. Bobick, Recognition of human body motion using phase space
constraints, ICCV’95, Cambridge, MA, 1995.
156. F. Campos and S. Cavalcante, An extended approach for Dempster-Shafer theory, In-
formation Reuse and Integration, 2003. IRI 2003. IEEE International Conference on,
Oct 2003, pp. 338–344.
157. F. Campos and F. M. C. de Souza, Extending Dempster-Shafer theory to overcome
counter intuitive results, 2005 International Conference on Natural Language Process-
ing and Knowledge Engineering, Oct 2005, pp. 729–734.
158. F. Campos and F.M.C. de Souza, Extending Dempster-Shafer theory to overcome
counter intuitive results, Proceedings of IEEE NLP-KE ’05, vol. 3, 2005, pp. 729–
734.
159. J. Cano, M. Delgado, and S. Moral, An axiomatic framework for propagating uncer-
tainty in directed acyclic networks, International Journal of Approximate Reasoning 8
(1993), 253–280.
160. J. Carlson and R.R. Murphy, Use of Dempster-Shafer conflict metric to adapt sensor
allocation to unknown environments, Tech. report, Safety Security Rescue Research
Center, University of South Florida, 2005.
161. Lucas Caro and Araabi Babak Nadjar, Generalization of the Dempster-Shafer theory:
a fuzzy-valued measure, IEEE Transactions on Fuzzy Systems 7 (1999), 255–270.
162. W. F. Caselton and W. Luo, Decision making with imprecise probabilities: Dempster-
Shafer theory and application, Water Resources Research 28 (1992), 3071–3083.
163. M. E. G. V. Cattaneo, Combining belief functions issued from dependent sources.,
ISIPTA, 2003, pp. 133–147.
164. Marco E. G. V. Cattaneo, Combining belief functions issued from dependent sources.,
ISIPTA, 2003, pp. 133–147.
165. Subhash Challa and Don Koks, Bayesian and Dempster-Shafer fusion, Sadhana 29
(2004), no. 2, 145–174.
166. T.-J. Cham and J. Rehg, A multiple hypothesis approach to figure tracking, Proceed-
ings of CVPR’99, Fort Collins, Colorado, vol. 2, 1999, pp. 239–245.
167. M. Chan, D. Metaxas, and S. Dickinson, A new approach to tracking 3-d objects in
2-d image sequences, Proc. of AAAI’94, Seattle, WA, August 1994.
168. A. Chateauneuf and J. Y. Jaffray, Some characterizations of lower probabilities and
other monotone capacities through the use of Möbius inversion, Mathematical Social
Sciences 17 (1989), 263–283.
169. A. Chateauneuf and J.Y. Jaffray, Some characterizations of lower probabilities and
other monotone capacities through the use of Möbius inversion, Math. Soc. Sci. 17
(1989), 263–283.
170. A. Chateauneuf and J.-C. Vergnaud, Ambiguity reduction through new statistical data,
International Journal of Approximate Reasoning 24 (2000), 283–299.
171. Alain Chateauneuf, On the use of capacities in modeling uncertainty aversion and risk
aversion, Journal of Mathematical Economics 20 (1991), no. 4, 343–369.
172. Alain Chateauneuf, Combination of compatible belief functions and relations of speci-
ficity, Papiers d'économie mathématique et applications, Université Panthéon-Sorbonne
(Paris 1), 1992.
173. , Decomposable capacities, distorted probabilities and concave capacities,
Mathematical Social Sciences 31 (1996), no. 1, 19 – 37.
174. Alain Chateauneuf and Jean-Yves Jaffray, Some characterizations of lower probabili-
ties and other monotone capacities through the use of Möbius inversion, Mathematical
Social Sciences 17 (1989), no. 3, 263–283.
175. Alain Chateauneuf and Jean-Yves Jaffray, Local Möbius transforms on monotone ca-
pacities, Proceedings of the European Conference on Symbolic and Quantitative Ap-
proaches to Reasoning and Uncertainty (London, UK, UK), ECSQARU ’95, Springer-
Verlag, 1995, pp. 115–124.
176. C. W. R. Chau, P. Lingras, and S. K. M. Wong, Upper and lower entropies of be-
lief functions using compatible probability functions, Proceedings of the 7th Interna-
tional Symposium on Methodologies for Intelligent Systems (ISMIS'93) (J. Komorowski
and Z.W. Ras, eds.), Trondheim, Norway, 15-18 June 1993, pp. 306–315.
177. A. Cheaito, M. Lecours, and E. Bosse, A non-ad-hoc decision rule for the Dempster-
Shafer method of evidential reasoning, Proceedings of the SPIE - Sensor Fusion: Ar-
chitectures, Algorithms, and Applications II, Orlando, FL, USA, 16-17 April 1998,
pp. 44–57.
178. , Study of a modified Dempster-Shafer approach using an expected utility in-
terval decision rule, Proceedings of the SPIE - Sensor Fusion: Architectures, Algo-
rithms, and Applications III, vol. 3719, Orlando, FL, USA, 7-9 April 1999, pp. 34–42.
179. P. Cheeseman, Probabilistic versus fuzzy reasoning, Uncertainty in Artificial Intelli-
gence 2 (J. Lemmer and L. Kanal, eds.), 1986, pp. 85–102.
180. S. S. Chen, Evidential logic and Dempster-Shafer theory, Proceedings of the ACM
SIGART International Symposium on Methodologies for Intelligent Systems (New
York, NY, USA), ISMIS ’86, ACM, 1986, pp. 201–206.
181. Shiuh-Yung Chen, Wei-Chung Lin, and Chin-Tu Chen, Spatial reasoning based on
multivariate belief functions, Proceedings of IEEE, 1992, pp. 624–626.
182. , Evidential reasoning based on Dempster-Shafer theory and its application
to medical image analysis, Proceedings of SPIE - Neural and Stochastic Methods in
Image and Signal Processing II, vol. 2032, San Diego, CA, USA, 12-13 July 1993,
pp. 35–46.
183. Y. Y. Chen, Statistical inference based on the possibility and belief measures, Trans-
actions of the American Mathematical Society 347 (1995), 1855–1863.
184. , Statistical inference based on the possibility and belief measures, Transac-
tions of the American Mathematical Society 347 (1995), 1855–1863.
185. Chen Yi-lei and Wang Jun-jie, An improved method of D-S evidential reasoning, Acta
Simulata Systematica Sinica 1 (2004).
186. G. Cheung, T. Kanade, J. Bouguet, and M. Holler, A real time system for robust 3D
voxel reconstruction of human motions, Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition CVPR’00, Hilton Head Island, SC, USA, vol. 2,
July 2000, pp. 714–720.
187. L. Cholvy, Applying theory of evidence in multisensor data fusion: a logical interpre-
tation, Information Fusion, 2000. FUSION 2000. Proceedings of the Third Interna-
tional Conference on, vol. 1, July 2000, pp. TUB4/17–TUB4/24 vol.1.
188. L. Cholvy, Using logic to understand relations between DSmT and Dempster-Shafer
Theory, Symbolic and Quantitative Approaches to Reasoning with Uncertainty (2009),
264–274.
189. Laurence Cholvy, Towards another logical interpretation of theory of evidence and a
new combination rule, Rule, Conference IPMU 2002, 2002, pp. 1–5.
190. Laurence Cholvy, Symbolic and quantitative approaches to reasoning with uncer-
tainty: 10th European conference, ECSQARU 2009, Verona, Italy, July 1-3, 2009, proceed-
ings, ch. Using Logic to Understand Relations between DSmT and Dempster-Shafer
Theory, pp. 264–274, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
191. C. Christensen and S. Corneliussen, Tracking of articulated objects using model-based
computer vision, Tech. report, Laboratory of Image Analysis, Aalborg University,
Denmark, 1997.
192. J.M. Chung and N. Ohnishi, Cue circles: Image feature for measuring 3-d motion of
articulated objects using sequential image pair, Int. Conf. Automatic Face and Ges-
ture Recognition, Nara, Japan, 1998.
193. M. Clarke and Nic Wilson, Efficient algorithms for belief functions based on the re-
lationship between belief and probability, Proceedings of the European Conference
on Symbolic and Quantitative Approaches to Uncertainty (R. Kruse and P. Siegel, eds.),
Marseille, France, 15-17 October 1991, pp. 48–52.
194. J. Van Cleynenbreugel, S. A. Osinga, F. Fierens, P. Suetens, and A. Oosterlinck, Road
extraction from multitemporal satellite images by an evidential reasoning approach,
Pattern Recognition Letters 12:6 (June 1991), 371–380.
195. Etienne Côme, Laurent Bouillaut, Patrice Aknin, and Allou Samé, Bayesian network
for railway infrastructure diagnosis, IPMU, 2006.
196. B. Cobb and P.P. Shenoy, On the plausibility transformation method for translating
belief function models to probability models, Int. J. Approx. Reasoning 41 (2006),
no. 3, 314–330.
197. B. R. Cobb and P. P. Shenoy, A comparison of bayesian and belief function reasoning,
Information Systems Frontiers 5(4) (2003), 345–358.
198. , A comparison of methods for transforming belief function models to probabil-
ity models, Proceedings of ECSQARU’2003, Aalborg, Denmark, July 2003, pp. 255–
266.
199. B.R. Cobb and P.P. Shenoy, On transforming belief function models to probability
models, Tech. report, University of Kansas, School of Business, Working Paper No.
293, February 2003.
200. Paul R. Cohen and Milton R. Grinberg, A framework for heuristic reasoning about
uncertainty, Proceedings of the Eighth International Joint Conference on Artificial
Intelligence - Volume 1 (San Francisco, CA, USA), IJCAI’83, Morgan Kaufmann
Publishers Inc., 1983, pp. 355–357.
201. , Readings from the ai magazine, American Association for Artificial Intelli-
gence, Menlo Park, CA, USA, 1988, pp. 559–566.
202. D. Comaniciu, V. Ramesh, and P. Meer, Kernel-based object tracking, IEEE Trans.
PAMI 25 (2003).
203. Roger Cooke and Philippe Smets, Self-conditional probabilities and probabilistic in-
terpretations of belief functions, Annals of Mathematics and Artificial Intelligence 32
(2001), no. 1, 269–285.
204. K. Coombs, D. Freel, D. Lampert, and S. Brahm, Using Dempster-Shafer methods
for object classification in the theater ballistic missile environment, Proceedings of
the SPIE - Sensor Fusion: Architectures, Algorithms, and Applications III, vol. 3719,
Orlando, FL, USA, 7-9 April 1999, pp. 103–113.
205. C.R. Corlin and J. Ellesggard, Real time tracking of a human arm, Tech. report, Lab-
oratory of Image Analysis, Aalborg University, Denmark, 1998.
206. M. Covell, A. Rahimi, M. Harville, and T. Darrell, Articulated pose estimation using
brightness- and depth-constancy constraints, Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition CVPR’00, Hilton Head Island, SC, USA,
July 2000, pp. 438–445.
207. F. G. Cozman, Calculation of posterior bounds given convex sets of prior probability
measures and likelihood functions, Journal of Computational and Graphical Statistics
8(4) (1999), 824–838.
208. , Credal networks, Artificial Intelligence 120 (2000), 199–233.
209. Fabio G. Cozman, Credal networks, Artificial Intelligence 120 (2000), 199–233.
210. Fabio G. Cozman and Serafı́n Moral, Reasoning with imprecise probabilities, Interna-
tional Journal of Approximate Reasoning 24 (2000), 121–123.
211. Fabio Gagliardi Cozman, Computing posterior upper expectations, International Jour-
nal of Approximate Reasoning 24 (2000), no. 23, 191 – 205.
212. H. H. Crapo and G.-C. Rota, On the foundations of combinatorial theory: combinato-
rial geometries, M.I.T. Press, Cambridge, Mass., 1970.
213. A. Cretual, F. Chaumette, and P. Bouthemy, Complex object tracking by visual ser-
voing based on 2d image motion, International Conference on Pattern Recognition,
1998.
214. Valerie Cross and T. Sudkamp, Compatibility measures for fuzzy evidential reason-
ing, Proceedings of the Fourth International Conference on Industrial and Engineering
Applications of Artificial Intelligence and Expert Systems, Kauai, HI, USA, 2-5 June
1991, pp. 72–78.
215. Valerie Cross and Thomas Sudkamp, Compatibility and aggregation in fuzzy eviden-
tial reasoning, Proceedings of IEEE, 1991, pp. 1901–1906.
216. J. Crowley, P. Stelmaszyk, T. Skordas, and P. Puget, Measurement and integration of
3D structures by tracking edge lines, Int. J. Computer Vision 8 (1992), 29–52.
217. Peter Cucka and Azriel Rosenfeld, Evidence-based pattern-matching relaxation, Pat-
tern Recognition 26 (1993), no. 9, 1417 – 1427.
218. W. Cui and D. I. Blockley, Interval probability theory for evidential support, Interna-
tional Journal of Intelligent Systems 5 (1990), no. 2, 183–192.
219. Shawn P. Curley and James I. Golden, Using belief functions to represent degrees of
belief, Organizational Behavior and Human Decision Processes 58 (1994), no. 2, 271
– 303.
220. F. Cuzzolin, Probabilistic approximations of belief functions, in preparation.
221. , Probabilistic approximations of belief functions, preparing for submission to
the IEEE Transactions on Systems, Man and Cybernetics B.
222. , Visions of a generalized probability theory, PhD dissertation, Università di
Padova, Dipartimento di Elettronica e Informatica, 19 February.
223. , Lattice modularity and linear independence, 18th British Combinatorial
Conference, Brighton, UK, 2001.
224. , Canonical decomposition of belief functions in the belief space, in prepara-
tion (2002).
225. , Geometry of Dempster’s rule of combination, IEEE Transactions on Systems,
Man and Cybernetics part B 34 (2004), no. 2, 961–977.
226. , Simplicial complexes of finite fuzzy sets, Proceedings of the 10th International
Conference on Information Processing and Management of Uncertainty IPMU’04,
Perugia, Italy, 2004, pp. 1733–1740.
227. , Algebraic structure of the families of compatible frames of discernment, An-
nals of Mathematics and Artificial Intelligence 45(1-2) (2005), 241–274.
228. , Probabilistic approximations of belief functions, in preparation (2005).
229. , The geometry of relative plausibility and belief of singletons, submitted to
the International Journal of Approximate Reasoning (2006).
230. , Learning evidential models for object pose estimation, submitted to the In-
ternational Journal of Approximate Reasoning (2006).
231. , Two new Bayesian approximations of belief functions based on convex geom-
etry, submitted to the IEEE Trans. on Systems, Man, and Cybernetics - part B (2006).
232. , Dual properties of relative belief of singletons, submitted to the IEEE Tr.
Fuzzy Systems (2007).
233. , Geometry of relative plausibility and belief of singletons, submitted to the
Annals of Mathematics and Artificial Intelligence (2007).
234. , On the orthogonal projection of a belief function, Symbolic and Quantitative
Approaches to Reasoning with Uncertainty, Lecture Notes in Computer Science, vol.
4724/2007, Springer Berlin / Heidelberg, 2007, pp. 356–367.
235. , On the relationship between the notions of independence in matroids, lattices,
and boolean algebras, British Combinatorial Conference (BCC’07), Reading, UK,
2007.
236. , Relative plausibility, affine combination, and Dempster’s rule, Tech. report,
INRIA Rhone-Alpes, 2007.
237. F. Cuzzolin, Two new Bayesian approximations of belief functions based on convex
geometry, IEEE Transactions on Systems, Man, and Cybernetics, Part B 37 (2007),
no. 4, 993–1008.
238. F. Cuzzolin, A geometric approach to the theory of evidence, IEEE Transactions on
Systems, Man and Cybernetics part C (2007 (to appear)).
239. F. Cuzzolin, A geometric approach to the theory of evidence, IEEE Transactions on
Systems, Man, and Cybernetics, Part C: Applications and Reviews 38 (2008), no. 4,
522–534.
240. F. Cuzzolin, Alternative formulations of the theory of evidence based on basic plau-
sibility and commonality assignments, Proceedings of the Pacific Rim International
Conference on Artificial Intelligence (PRICAI’08), Hanoi, Vietnam, 2008.
241. F. Cuzzolin, Dual properties of the relative belief of singletons, PRICAI 2008: Trends
in Artificial Intelligence (2008), 78–90.
242. F. Cuzzolin, Dual properties of the relative belief of singletons, Proceedings of the
Tenth Pacific Rim Conference on Artificial Intelligence (PRICAI’08), Hanoi, Viet-
nam, December 15-19 2008, 2008.
243. , Dual properties of the relative belief of singletons, Proceedings of the Pacific
Rim International Conference on Artificial Intelligence (PRICAI’08), Hanoi, Vietnam,
2008.
244. , A geometric approach to the theory of evidence, IEEE Transactions on Sys-
tems, Man, and Cybernetics - Part C 38 (2008), no. 4, 522–534.
245. , Lp consistent approximations of belief functions, IEEE Transactions on Fuzzy
Systems (under review) (2008).
246. , On the credal structure of consistent probabilities, Logics in Artificial Intel-
ligence, vol. 5293/2008, Springer Berlin / Heidelberg, 2008, pp. 126–139.
247. F. Cuzzolin, Semantics of the relative belief of singletons, Interval/Probabilistic Un-
certainty and Non-Classical Logics (2008), 201–213.
248. F. Cuzzolin, Semantics of the relative belief of singletons, International Workshop on
Uncertainty and Logic UNCLOG’08, Kanazawa, Japan, 2008.
249. , Semantics of the relative belief of singletons, Workshop on Uncertainty and
Logic, Kanazawa, Japan, March 25-28 2008, 2008.
250. , Complexes of outer consonant approximations, Proceedings of EC-
SQARU’09, Verona, Italy, 2009.
251. , Complexes of outer consonant approximations, Proceedings of EC-
SQARU’09, 2009.
252. , Credal semantics of bayesian transformations in terms of probability inter-
vals, IEEE Transactions on Systems, Man, and Cybernetics - Part B (to appear) (2009).
253. , The intersection probability and its properties, Symbolic and Quantitative
Approaches to Reasoning with Uncertainty - Lecture Notes in Artificial Intelligence,
vol. 5590/2009, Springer, Berlin / Heidelberg, 2009, pp. 287–298.
254. , Rationale and properties of the intersection probability, submitted to Artifi-
cial Intelligence Journal (2009).
255. F. Cuzzolin, Credal semantics of Bayesian transformations in terms of probability
intervals, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
40 (2010), no. 2, 421–432.
256. F. Cuzzolin, Geometric conditioning of belief functions, Proceedings of BELIEF’10,
Brest, France, 2010.
257. , The geometry of consonant belief functions: simplicial complexes of necessity
measures, Fuzzy Sets and Systems 161 (2010), no. 10, 1459–1479.
258. , The geometry of consonant belief functions: simplicial complexes of necessity
measures, Fuzzy Sets and Systems (2010).
259. , On consistent belief functions, submitted to the IEEE Transactions on Sys-
tems, Man and Cybernetics - part B (2010).
260. , Three alternative combinatorial formulations of the theory of evidence, In-
telligent Data Analysis 14 (2010), no. 4, 439–464.
261. , Geometric conditional belief functions in the belief space, submitted to
ISIPTA’11, Innsbruck, Austria, 2011.
262. , Lp consonant approximations of belief functions, submitted to EC-
SQARU’11, Belfast, UK, 2011.
263. , Lp consonant approximations of belief functions in the mass space, submitted
to ISIPTA’11, Innsbruck, Austria, 2011.
264. , On consistent approximations of belief functions in the mass space, Submit-
ted to ECSQARU’11, Belfast, UK, 2011.
265. , Geometry of Dempster’s rule of combination, IEEE Transactions on Systems,
Man and Cybernetics part B 34:2 (April 2004), 961–977.
266. , Geometry of Dempster’s rule of combination, submitted to the IEEE Trans-
actions on Systems, Man and Cybernetics B (August 2002).
267. , Two new Bayesian approximations of belief functions based on convex geom-
etry, IEEE Transactions on Systems, Man, and Cybernetics - Part B 37 (August 2007),
no. 4.
268. , Geometry and combinatorics of plausibility and commonality functions, sub-
mitted to the International Journal of Uncertainty, Fuzziness, and Knowledge-Based
Systems (December 2006).
269. , Geometry and combinatorics of plausibility and commonality functions, sub-
mitted to the IJFUKS (December 2006).
270. , Geometrical structure of belief space and conditional subspaces, submitted
to the IEEE Transactions on Systems, Man and Cybernetics C (January 2003).
271. , The geometry of relative plausibilities, Proceedings of the 11th International
Conference on Information Processing and Management of Uncertainty IPMU’06,
special session on Fuzzy measures and integrals, capacities and games, July 2-7, 2006.
272. , Geometrical structure of belief space and conditional subspaces, submitted
to the IEEE Transactions on Systems, Man and Cybernetics part C (July 2005).
273. , Consistent approximation of belief functions, Proceedings of ISIPTA’09,
Durham, UK, June 2009.
274. , Geometrical structure of belief space and conditional subspaces, submitted
to the IEEE Transactions on Systems, Man and Cybernetics part C (November 2002).
275. , On the properties of relative plausibilities, Proceedings of the International
Conference of the IEEE Systems, Man, and Cybernetics Society (SMC’05), Hawaii,
USA, October 10-12, 2005.
276. , Geometry of fuzzy sets, submitted to the IEEE Transactions on Fuzzy Systems
(October 2002).
277. , Possibilistic approximations of belief functions, submitted to the IEEE Trans-
actions on Fuzzy Systems (October 2002).
278. , Algebraic structure of the families of compatible frames of discernment, sub-
mitted to a Special Issue of the Annals of Mathematics and Artificial Intelligence
(September 2002).
279. , Geometry of Dempster’s rule, Proceedings of FSDK02, Singapore, 18-22
November 2002.
280. F. Cuzzolin and R. Frezza, Evidential modeling for pose estimation, Proceedings of
the 4th International Symposium on Imprecise Probabilities and Their Applications
(ISIPTA’05), Pittsburgh, July 2005.
281. F. Cuzzolin, R. Frezza, A. Bissacco, and S. Soatto, Towards unsupervised detection
of actions in clutter, Proc. of the 2002 Asilomar Conference on Signals, Systems, and
Computers, vol. 1, 2002, pp. 463–467.
282. F. Cuzzolin, A. Sarti, and S. Tubaro, Action modeling with volumetric data, Proc. of
ICIP’04, vol. 2, 2004, pp. 881–884.
283. Fabio Cuzzolin, Geometry of relative plausibility and relative belief of singletons,
Annals of Mathematics and Artificial Intelligence 59 (2010), 47–79.
284. Fabio Cuzzolin, Geometry of upper probabilities, Proceedings of the 3rd Interna-
tional Symposium on Imprecise Probabilities and Their Applications (ISIPTA'03), July
2003.
285. , Families of compatible frames of discernment as semimodular lattices, Proc.
of the International Conference of the Royal Statistical Society (RSS2000), September
2000.
286. , Families of compatible frames of discernment as semimodular lattices, Proc.
of the International Conference of the Royal Statistical Society (RSS2000), September
2000.
287. Fabio Cuzzolin, Alessandro Bissacco, Ruggero Frezza, and Stefano Soatto, Towards
unsupervised detection of actions in clutter, submitted to the International Conference
on Computer Vision (ICCV2001), June 2001.
288. Fabio Cuzzolin and Ruggero Frezza, An evidential reasoning framework for object
tracking, SPIE - Photonics East 99 - Telemanipulator and Telepresence Technologies
VI (Matthew R. Stein, ed.), vol. 3840, 19-22 September 1999, pp. 13–24.
289. , An evidential reasoning framework for object tracking, SPIE - Photonics East
99 - Telemanipulator and Telepresence Technologies VI (Matthew R. Stein, ed.), vol.
3840, 19-22 September 1999, pp. 13–24.
290. , Integrating feature spaces for object tracking, Proc. of the International Sym-
posium on the Mathematical Theory of Networks and Systems (MTNS2000), 21-25
June 2000.
291. , Integrating feature spaces for object tracking, Proc. of the International Sym-
posium on the Mathematical Theory of Networks and Systems (MTNS2000), 21-25
June 2000.
292. , Geometric analysis of belief space and conditional subspaces, Proceedings
of the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), Cornell University, Ithaca, NY, 26-29 June 2001.
293. , Geometric analysis of belief space and conditional subspaces, Proceedings
of the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), 26-29 June 2001.
294. , Lattice structure of the families of compatible frames, Proceedings of
the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), 26-29 June 2001.
295. , Lattice structure of the families of compatible frames, Proceedings of
the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), 26-29 June 2001.
296. , Sequences of belief functions and model-based data association, submitted to
the IAPR Workshop on Machine Vision Applications (MVA2000), November 28-30,
2000.
297. , Sequences of belief functions and model-based data association, submitted to
the IAPR Workshop on Machine Vision Applications (MVA2000), November 28-30,
2000.
298. D. DiFranco, T. Cham, and J. Rehg, Reconstruction of 3-d figure motion from 2d cor-
respondences, Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition CVPR’01, Hawaii, December 2001.
299. F. Correa da Silva and Alan Bundy, On some equivalence relations between incidence
calculus and Dempster-Shafer theory of evidence, CoRR abs/1304.1126 (2013).
300. Wagner Texeira da Silva and Ruy Luiz Milidiu, Algorithms for combining belief func-
tions, International Journal of Approximate Reasoning 7 (1992), 73–94.
301. Bruce D’Ambrosio, A hybrid approach to reasoning under uncertainty, International
Journal of Approximate Reasoning 2 (1988), no. 1, 29 – 45.
302. M. Daniel, Transformations of belief functions to probabilities, Tech. report, Institute
of Computer Science, Academy of Sciences of the Czech Republic.
303. , Associativity and contradiction in combination of belief functions, Proceed-
ings of Information Processing and Management Uncertainty in Knowledge-based
Systems (IPMU00), 2000, pp. 133–140.
304. , Combination of belief functions and coarsening/refinement, Proceedings of
Information Processing and Management Uncertainty in Knowledge-based Systems
(IPMU02), 2002, pp. 587–594.
305. M. Daniel, Associativity in combination of belief functions; a derivation of minC com-
bination, Soft Computing 7 (2003), no. 5, 288–296.
306. M. Daniel, Consistency of probabilistic transformations of belief functions, IPMU,
2004, pp. 1135–1142.
307. , MinC combination of belief functions: derivation and formulas, Tech. report,
Tech Report No. 964, Acad. Sci. of the Czech. Republic, 2006.
308. , On transformations of belief functions to probabilities, International Journal
of Intelligent Systems 21 (2006), no. 3, 261–282.
309. , New approach to conflicts within and between belief functions, Tech. report,
Technical Report 1062, Institute of Computer Science, Academy of Sciences of the
Czech Republic, 2009.
310. , Non-conflicting and conflicting parts of belief functions, Proceedings of
ISIPTA, 2011.
311. , On transformations of belief functions to probabilities, International Journal
of Intelligent Systems, special issue on Uncertainty Processing 21(3) (February 2006),
261 – 282.
312. , A generalization of the classic combination rules to DSm hyper-power sets,
Information Security Journal 20 (January 2006), 50–64.
313. , Algebraic structures related to Dempster-Shafer theory, Proceedings of the
5th International Conference on Information Processing and Management of Uncer-
tainty in Knowledge-Based Systems (IPMU’94) (B. Bouchon-Meunier, R.R. Yager,
and L.A. Zadeh, eds.), Paris, France, 4-8 July 1994, pp. 51–61.
314. Milan Daniel, Contribution of DSm approach to the belief function theory, Proceedings
of IPMU.
315. Milan Daniel, Distribution of contradictive belief masses in combination of belief
functions, pp. 431–446, Springer US, Boston, MA, 2000.
316. , Conflicts within and between belief functions, pp. 696–705, Springer Berlin
Heidelberg, Berlin, Heidelberg, 2010.
317. V.I. Danilov and G.A. Koshevoy, Cores of cooperative games, superdifferentials of
functions and the Minkowski difference of sets, Journal of Mathematical Analysis Ap-
plications 247 (2000), 1–14.
318. N. Daucher, M. Dhome, J. Lapreste, and G. Rives, Modeled object pose estimation and
tracking by monocular vision, BMVC’93, Guildford, UK, September 1993, pp. 249–
258.
319. S.J. Davey and S.B. Colgrove, A unified probabilistic data assotiation filter with mul-
tiple models, Tech. Report DSTO-TR-1184, Surveillance System Division, Electonic
and Surveillance Reserach Lab., 2001.
320. L. de Campos, J. Huete, and S. Moral, Probability intervals: a tool for uncertain rea-
soning, Int. J. Uncertainty Fuzziness Knowledge-Based Syst. 1 (1994), 167–196.
321. , Probability intervals: a tool for uncertain reasoning, IJUFKS 1 (1994), 167–
196.
322. Gert de Cooman, A behavioural model for vague probability assessments, Fuzzy Sets
and Systems 154 (2005), no. 3, 305 – 358.
323. Gert de Cooman and D. Aeyels, A random set description of a possibility measure and
its natural extension, (1998), submitted for publication.
324. Gert de Cooman and Marco Zaffalon, Updating beliefs with incomplete observations,
Artif. Intell. 159 (2004), no. 1-2, 75–125.
325. J. Kampé de Fériet, Interpretation of membership functions of fuzzy sets in terms of
plausibility and belief, Fuzzy Information and Decision Processes (M. M. Gupta and
E. Sanchez, eds.), North-Holland, Amsterdam, 1982, pp. 93–98.
326. J. Kampé de Fériet, Interpretation of membership functions of fuzzy sets in terms
of plausibility and belief, Fuzzy Information and Decision Processes (M. M. Gupta
and E. Sanchez, eds.), North-Holland, Amsterdam, 1982, pp. 93–98.
327. F. Dupin de Saint Cyr, J. Lang, and T. Schiex, Penalty logic and its link with Dempster-
Shafer theory, Proceedings of UAI’94, 1994, pp. 204–211.
328. Florence Dupin de Saint-Cyr, Jérôme Lang, and Thomas Schiex, Penalty logic and its
link with Dempster-Shafer theory, CoRR abs/1302.6804 (2013).
329. Q. Delamarre and O. Faugeras, 3D articulated models and multi-view tracking with
silhouettes, Proceedings of ICCV’99, Kerkyra, Greece, vol. 2, 20-27 September 1999,
pp. 716–721.
330. , Finding pose of hand in video images: a stereo-based approach, IEEE Pro-
ceedings of the International Conference on Automatic Face and Gesture Recognition
FG’98, Japan, April 1998, pp. 585–590.
331. , 3D articulated models and multi-view tracking with physical forces, Special
Issue of Computer Vision and Image Understanding on Modeling People 81 (March
2001), 328–357.
332. F. Delmotte and D. Gacquer, Detection of defective sources with belief functions, Pro-
ceedings of IPMU08, 2008.
333. D.F. DeMenthon and L.S. Davis, Model-based object pose in 25 lines of code, Int. J.
Computer Vision 15 (June 1995), 123–141.
334. S. Demotier, W. Schon, and T. Denoeux, Risk assessment based on weak information
using belief functions: a case study in water treatment, IEEE Transactions on Systems,
Man and Cybernetics, Part C 36(3) (May 2006), 382– 396.
335. A. P. Dempster, New methods for reasoning towards posterior distributions based on
sample data, Annals of Mathematical Statistics 37 (1966), 355–374.
336. , Upper and lower probability inferences based on a sample from a finite uni-
variate population, Biometrika 54 (1967), 515–528.
337. , Bayes, Fisher, and belief functions, Bayesian and Likelihood Methods in
Statistics and Econometrics (S. Geisser, J. S. Hodges, S. J. Press, and A. Zellner, eds.),
1990.
338. , Construction and local computation aspects of network belief functions, In-
fluence Diagrams, Belief Nets and Decision Analysis (R. M. Oliver and J. Q. Smith,
eds.), Wiley, Chichester, 1990.
339. , Normal belief functions and the Kalman filter, Tech. report, Department of
Statistics, Harvard University, Cambridge, MA, 1990.
340. A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete
data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39 (1977), no. 1, 1–38.
341. A.P. Dempster, Belief functions in the 21st century: A statistical perspective.
342. , Upper and lower probabilities induced by a multivariate mapping, Annals of
Mathematical Statistics 38 (1967), 325–339.
343. , A generalization of Bayesian inference, Journal of the Royal Statistical Soci-
ety, Series B 30 (1968), 205–247.
344. , Upper and lower probabilities generated by a random closed interval, Annals
of Mathematical Statistics 39 (1968), 957–966.
345. , Upper and lower probabilities inferences for families of hypothesis with
monotone density ratios, Annals of Mathematical Statistics 40 (1969), 953–969.
346. , A generalization of Bayesian inference, Classic Works of the Dempster-
Shafer Theory of Belief Functions, 2008, pp. 73–104.
347. , Lindley’s paradox: Comment, Journal of the American Statistical Association
77:378 (June 1982), 339–341.
348. A.P. Dempster and Augustine Kong, Uncertain evidence and artificial analysis, Tech.
report, S-108, Department of Statistics, Harvard University, 1986.
349. Arthur P. Dempster, A generalization of Bayesian inference, Classic Works of the
Dempster-Shafer Theory of Belief Functions, 2008, pp. 73–104.
350. C. Van den Acker, Belief function representation of statistical audit evidence, Interna-
tional Journal of Intelligent Systems 15 (2000), 277–290.
351. Y. Deng and W.-K. Shi, A modified combination rule of evidence theory, Journal of
Shanghai Jiaotong University (2003).
352. Y. Deng, Dong Wang, and Qi Li, An improved combination rule in fault diagnosis
based on Dempster-Shafer theory, 2008 International Conference on Machine Learning
and Cybernetics, vol. 1, July 2008, pp. 212–216.
353. D. Denneberg, Conditioning (updating) non-additive probabilities, Ann. Operations
Res. 52 (1994), 21–42.
354. Dieter Denneberg, Conditioning (updating) non-additive measures, Annals of Opera-
tions Research 52 (1994), no. 1, 21–42.
355. Dieter Denneberg, Representation of the Choquet integral with the σ-additive Möbius
transform, Fuzzy Sets and Systems 92 (1997), no. 2, 139–156, Fuzzy Measures and
Integrals.
356. , Totally monotone core and products of monotone measures, International
Journal of Approximate Reasoning 24 (2000), 273–281.
357. Dieter Denneberg and Michel Grabisch, Interaction transform of set functions over a
finite set, Information Sciences 121 (1999), 149–170.
358. T. Denoeux, Inner and outer approximation of belief structures using a hierarchi-
cal clustering approach, Int. Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems 9(4) (2001), 437–460.
359. , The cautious rule of combination for belief functions and some extensions,
2006 9th International Conference on Information Fusion, July 2006, pp. 1–8.
360. , Construction of predictive belief functions using a frequentist approach,
IPMU, 2006.
361. , Conjunctive and disjunctive combination of belief functions induced by non
distinct bodies of evidence, Artificial Intelligence (2007).
362. , A new justification of the unnormalized dempster’s rule of combination from
the Least Commitment Principle, Proceedings of FLAIRS’08, Special Track on Un-
certaint Reasoning, 2008.
363. T. Denoeux and A. Ben Yaghlane, Approximating the combination of belief functions
using the fast moebius transform in a coarsened frame, International Journal of Ap-
proximate Reasoning 31(1-2) (October 2002), 77–101.
364. Thierry Denoeux, Modeling vague beliefs using fuzzy-valued belief structures, Fuzzy
Sets and Systems.
365. , A k-nearest neighbour classification rule based on Dempster-Shafer theory,
IEEE Transactions on Systems, Man, and Cybernetics 25:5 (1995), 804–813.
366. , Analysis of evidence-theoretic decision rules for pattern classification, Pat-
tern Recognition 30:7 (1997), 1095–1107.
367. , Reasoning with imprecise belief structures, International Journal of Approx-
imate Reasoning 20 (1999), 79–111.
368. , Reasoning with imprecise belief structures, International Journal of Approx-
imate Reasoning 20 (1999), 79–111.
369. Thierry Denœux, Allowing imprecision in belief representation using fuzzy-valued be-
lief structures, pp. 269–281, Springer US, Boston, MA, 2000.
370. Thierry Denoeux, Allowing imprecision in belief representation using fuzzy-valued
belief structures, Proceedings of IPMU’98, vol. 1, July Paris, 1998, pp. 48–55.
371. , An evidence-theoretic neural network classifier, Proceedings of the 1995
IEEE International Conference on Systems, Man, and Cybernetics (SMC’95), vol. 3,
October 1995, pp. 712–717.
372. Thierry Denoeux and A.P. Dempster, The Dempster-Shafer calculus for statisticians,
International Journal of Approximate Reasoning 48 (2008), no. 2, 365 – 377.
373. Thierry Denoeux and G. Govaert, Combined supervised and unsupervised learning
for system diagnosis using Dempster-Shafer theory, Proceedings of the International
Conference on Computational Engineering in Systems Applications, Symposium on
Control, Optimization and Supervision, CESA ’96 IMACS Multiconference, vol. 1,
Lille, France, 9-12 July 1996, pp. 104–109.
374. Thierry Denoeux, Jürg Kohlas, and Paul-André Monney, An algebraic theory for statis-
tical information based on the theory of hints, International Journal of Approximate
Reasoning 48 (2008), no. 2, 378 – 398.
375. T. Denouex, Inner and outer approximation of belief structures using a hierarchical
clustering approach, International Journal of Uncertainty, Fuzziness and Knowledge-
Based Systems 9(4) (2001), 437–460.
376. Thierry Denoeux, Reasoning with imprecise belief structures, International Journal of
Approximate Reasoning 20 (1999), no. 1, 79 – 111.
377. , Conjunctive and disjunctive combination of belief functions induced by
nondistinct bodies of evidence, Artificial Intelligence 172 (2008), no. 2, 234 – 264.
378. M. C. Desmarais and J. Liu, Experimental results on user knowledge assessment with
an evidential reasoning methodology, Proceedings of the 1993 International Workshop
on Intelligent User Interfaces (W.D. Gray, W.E. Hefley, and D. Murray, eds.), Orlando,
FL, USA, 4-7 January 1993, pp. 223–225.
379. S. Destercke and T. Burger, Toward an axiomatic definition of conflict between belief
functions, IEEE Transactions on Cybernetics 43.
380. S. Destercke and D. Dubois, Idempotent conjunctive combination of belief functions:
Extending the minimum rule of possibility theory, Information Sciences 181 (2011),
no. 18, 3925 – 3945.
381. S. Destercke, D. Dubois, and E. Chojnacki, Unifying Practical Uncertainty Represen-
tations: I. Generalized P-Boxes, ArXiv e-prints (2008).
382. Sébastien Destercke and Thomas Burger, Revisiting the notion of conflicting belief
functions, pp. 153–160, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
383. Sébastien Destercke and Didier Dubois, Can the minimum rule of possibility theory
be extended to belief functions?, pp. 299–310, Springer Berlin Heidelberg, Berlin,
Heidelberg, 2009.
384. Sebastien Destercke, Didier Dubois, and Eric Chojnacki, Cautious conjunctive merg-
ing of belief functions, pp. 332–343, Springer Berlin Heidelberg, Berlin, Heidelberg,
2007.
385. M. Deutsch-Mccleish, A model for non-monotonic reasoning using Dempster’s rule,
Uncertainty in Artificial Intelligence 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and
J.F. Lemmer, eds.), Elsevier Science Publishers, 1991, pp. 481–494.
386. M. Deutsch-McLeish, A study of probabilities and belief functions under conflicting
evidence: comparisons and new method, Proceedings of the 3rd International Confer-
ence on Information Processing and Management of Uncertainty in Knowledge-Based
Systems (IPMU’90) (B. Bouchon-Meunier, R.R. Yager, and L.A. Zadeh, eds.), Paris,
France, 2-6 July 1990, pp. 41–49.
387. M. Deutsch-McLeish, P. Yao, Fei Song, and T. Stirtzinger, Knowledge-acquisition
methods for finding belief functions with an application to medical decision mak-
ing, Proceedings of the International Symposium on Artificial Intelligence (F.J. Cantu-
Ortiz and H. Terashima-Marin, eds.), Cancun, Mexico, 13-15 November 1991, pp. 231–
237.
388. Mary Deutsch-McLeish, A study of probabilities and belief functions under conflict-
ing evidence: Comparisons and new methods, pp. 41–49, Springer Berlin Heidelberg,
Berlin, Heidelberg, 1991.
389. J. Deutscher, A. Blake, and I. Reid, Articulated body motion capture by annealed par-
ticle filtering, Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition CVPR’00, Hilton Head Island, SC, USA, July 2000, pp. 126–133.
390. J. Deutscher, A. Davidson, and I. Reid, Automatic partitioning of high dimensional
search spaces associated with articulated body motion capture, Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition CVPR’01, Hawaii,
December 2001.
391. J. Dezert and F. Smarandache, A new probabilistic transformation of belief mass as-
signment, 2007.
392. J. Dezert, F. Smarandache, and M. Daniel, The generalized pignistic transformation,
Arxiv preprint cs/0409007 (2004).
393. J. Dezert, P. Wang, and A. Tchamova, On the validity of Dempster-Shafer theory, Pro-
ceedings of the 15th International Conference on Information Fusion (FUSION 2012),
July 2012, pp. 655–660.
394. Jean Dezert, Foundations for a new theory of plausible and paradoxical reasoning, In-
formation and Security (Tzv. Semerdjiev, ed.), Bulgarian Academy of Sciences, 2002.
395. Jean Dezert, An introduction to the theory of plausible and paradoxical reasoning,
pp. 12–23, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
396. Jean Dezert and Florentin Smarandache, On the generation of hyper-powersets for the
DSmT, Proceedings of the Fusion 2003 Conference, 2003, pp. 8–11.
397. , An introduction to DSm theory of plausible, paradoxist, uncertain, and im-
precise reasoning for information fusion, Proc. of the 13th International Congress of
Cybernetics and Systems, 2005.
398. P. Diaconis, Review of 'A mathematical theory of evidence', Journal of the American
Statistical Association 73:363 (1978), 677–678.
399. J. Diaz, M. Rifqi, and B. Bouchon-Meunier, A similarity measure between basic belief
assignments, Proceedings of FUSION’06, 2006.
400. S. Dickinson and D. Metaxas, Integrating qualitative and quantitative shape recovery,
Int. J. Computer Vision 13 (1994), 1–20.
401. S. Dickinson, A. Pentland, and A. Rosenfeld, 3-d shape recovery using distributed
aspect matching, IEEE Trans. PAMI 14 (1992), 174–198.
402. S.J. Dickinson and D. Metaxas, Integrating qualitative and quantitative object repre-
sentations in the recovery and tracking of 3-d shape, Computational and Psychophys-
ical Mechanism of Visual Coding (L. Harris and M. Jenkin, eds.), Cambridge Univer-
sity Press, New York, NY.
403. R. P. Dilworth, Dependence relations in a semimodular lattice, Duke Math. J. 11
(1944), 575–587.
404. A. F. Dragoni, P. Giorgini, and A. Bolognini, Distributed knowledge elicitation
through the Dempster-Shafer theory of evidence: a simulation study, Proceedings of
the Second International Conference on Multi-Agent Systems (ICMAS’96), Kyoto,
Japan, 10-13 December 1996, p. 433.
405. T. Drummond and R. Cipolla, Real-time tracking of complex structures with on-line
camera calibration, Proc. of BMVC’99, Nottingham, 1999, pp. 574–583.
406. , Real-time tracking of multiple articulated structures in multiple views,
ECCV’00, Dublin, Ireland, 2000.
407. I. Dryden and K.V. Mardia, General shape distributions in a plane, Adv. Appl. Prob.
23 (1991), 259–276.
408. Werner Dubitzky, Alex G. Büchner, John G. Hughes, and David A. Bell, Towards
concept-oriented databases, Data and Knowledge Engineering 30 (1999), 23–55.
409. D. Dubois and H. Prade, Unfair coins and necessity measures: towards a possibilistic
interpretation of histograms, Fuzzy Sets and Systems 10 (1983), no. 1, 15–20.
410. , A set-theoretic view of belief functions: Logical operations and approxima-
tions by fuzzy sets, International Journal of General Systems 12 (1986), 193–226.
411. , Possibility theory, Plenum Press, New York, 1988.
412. , Consonant approximations of belief functions, International Journal of Ap-
proximate Reasoning 4 (1990), 419–449.
413. , On the combination of evidence in various mathematical frameworks, Relia-
bility Data Collection and Analysis (J. Flamm and T. Luisi, eds.), 1992, pp. 213–241.
414. D. Dubois, H. Prade, and S. Sandri, On possibility/probability transformations, 1993.
415. D. Dubois, H. Prade, and S.A. Sandri, On possibility-probability transformations,
Fuzzy Logic: State of the Art (R. Lowen and M. Lowen, eds.), Kluwer Academic
Publisher, 1993, pp. 103–112.
416. D. Dubois, H. Prade, and Ph. Smets, New semantics for quantitative possibility theory,
Proc. of the 6th European Conference on Symbolic and Quantitative Approaches to
Reasoning and Uncertainty (ECSQARU 2001) (Toulouse, France) (S. Benferhat and
Ph. Besnard, eds.), Springer-Verlag, 2001, pp. 410–421.
417. Didier Dubois, Hélène Fargier, and Henri Prade, Comparative uncertainty, belief func-
tions and accepted beliefs, Proceedings of the Fourteenth Conference on Uncertainty
in Artificial Intelligence (San Francisco, CA, USA), UAI’98, Morgan Kaufmann Pub-
lishers Inc., 1998, pp. 113–120.
418. Didier Dubois, M. Grabisch, Henri Prade, and Philippe Smets, Using the transferable
belief model and a qualitative possibility theory approach on an illustrative example:
the assessment of the value of a candidate, Intern. J. Intell. Systems (2001).
419. Didier Dubois, Michel Grabisch, Henri Prade, and Philippe Smets, Assessing the value
of a candidate: Comparing belief function and possibility theories, Proceedings of the
Fifteenth Conference on Uncertainty in Artificial Intelligence (San Francisco, CA,
USA), UAI’99, Morgan Kaufmann Publishers Inc., 1999, pp. 170–177.
420. Didier Dubois and Henri Prade, On several representations of an uncertain body of
evidence, Fuzzy Information and Decision Processes (M. M. Gupta and E. Sanchez,
eds.), North Holland, Amsterdam, 1982, pp. 167–181.
421. Didier Dubois and Henri Prade, Combination and propagation of uncertainty with be-
lief functions: A reexamination, Proceedings of the 9th International Joint Conference
on Artificial Intelligence - Volume 1 (San Francisco, CA, USA), IJCAI’85, Morgan
Kaufmann Publishers Inc., 1985, pp. 111–113.
422. , On the unicity of Dempster's rule of combination, International Journal of In-
telligent Systems 1 (1986), no. 2, 133–142.
423. Didier Dubois and Henri Prade, On the unicity of Dempster’s rule of combination,
International Journal of Intelligent Systems 1 (1986), 133–142.
424. , A set theoretical view of belief functions, International Journal of Intelligent
Systems 12 (1986), 193–226.
425. , The mean value of a fuzzy number, Fuzzy Sets and Systems 24 (1987), 279–
300.
426. , The principle of minimum specificity as a basis for evidential reasoning, Un-
certainty in Knowledge-Based Systems (B. Bouchon and R. R. Yager, eds.), Springer-
Verlag, Berlin, 1987, pp. 75–84.
427. , Properties of measures of information in evidence and possibility theories,
Fuzzy Sets and Systems 24 (1987), 161–182.
428. Didier Dubois and Henri Prade, A tentative comparison of numerical approximate
reasoning methodologies, Int. J. Man-Mach. Stud. 27 (1987), no. 5-6, 717–728.
429. Didier Dubois and Henri Prade, Representation and combination of uncertainty with
belief functions and possibility measures, Computational Intelligence 4 (1988), 244–
264.
430. , Modeling uncertain and vague knowledge in possibility and evidence theo-
ries, Uncertainty in Artificial Intelligence, volume 4 (R. D. Shachter, T. S. Levitt, L. N.
Kanal, and J. F. Lemmer, eds.), North-Holland, 1990, pp. 303–318.
431. , Epistemic entrenchment and possibilistic logic, Artificial Intelligence 50
(1991), 223–239.
432. , Focusing versus updating in belief function theory, Tech. report, Internal Re-
port IRIT/91-94/R, IRIT, Universite P. Sabatier, Toulouse, France, 1991.
433. , Evidence, knowledge, and belief functions, International Journal of Approxi-
mate Reasoning 6 (1992), 295–319.
434. , Evidence, knowledge, and belief functions, International Journal of Approxi-
mate Reasoning 6 (1992), no. 3, 295 – 319.
435. , A survey of belief revision and updating rules in various uncertainty models,
International Journal of Intelligent Systems 9 (1994), 61–100.
436. , Bayesian conditioning in possibility theory, Fuzzy Sets and Systems 92
(1997), 223–240.
437. Didier Dubois, Henri Prade, and Philippe Smets, New semantics for quantitative pos-
sibility theory, ISIPTA, 2001, pp. 152–161.
438. , A definition of subjective possibility, Int. J. Approx. Reasoning 48 (2008),
no. 2, 352–364.
439. Didier Dubois and Ronald R. Yager, Fuzzy set connectives as combinations of belief
structures, Information Sciences 66 (1992), no. 3, 245 – 276.
440. Didier Dubois and Henri Prade, Representation and combination of uncertainty with
belief functions and possibility measures, Computational Intelligence 4 (1988), no. 3,
244–264.
441. B.A. Dubrovin, S.P. Novikov, and A.T. Fomenko, Sovremennaja geometrija. Metody i
prilozenija [Modern geometry: methods and applications], Nauka, Moscow, 1986.
442. , Geometria contemporanea 3 [Contemporary geometry 3], Editori Riuniti, 1989.
443. Jacques Dubucs and Glenn Shafer, A betting interpretation for probabilities and
Dempster-Shafer degrees of belief, International Journal of Approximate Reasoning 52
(2011), no. 2, 127–136.
444. Richard O. Duda and Peter E. Hart, Pattern classification and scene analysis, John
Wiley and Sons Inc., 1973.
445. Richard O. Duda, Peter E. Hart, and Nils J. Nilsson, Subjective Bayesian methods for
rule-based inference systems, Proceedings of the June 7-10, 1976, National Computer
Conference and Exposition (New York, NY, USA), AFIPS ’76, ACM, 1976, pp. 1075–
1082.
446. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern classification, Wiley, 2001.
447. V. Dugat and S. Sandri, Complexity of hierarchical trees in evidence theory, ORSA
Journal of Computing 6 (1994), 37–49.
448. J.S. Duncan, R.L. Owen, and P. Anandan, Measurement of nonrigid motion using
contour shape descriptors, Proc. of CVPR’91, 1991, pp. 318–324.
449. Stephen D. Durham, Jeffery S. Smolka, and Marco Valtorta, Statistical consistency
with Dempster’s rule on diagnostic trees having uncertain performance parameters,
International Journal of Approximate Reasoning 6 (1992), 67–81.
450. A. Dutta, Reasoning with imprecise knowledge in expert systems, Information Sci-
ences 37 (1985), 3–24.
451. Ludmila Dymova and Pavel Sevastjanov, An interpretation of intuitionistic fuzzy sets
in terms of evidence theory: Decision making aspect, Knowledge-Based Systems 23
(2010), no. 8, 772 – 782.
452. E. Boyer and M.-O. Berger, 3D surface reconstruction using occluding contours,
CAIP'95, Prague, Czech Republic, vol. 970, September 1995.
453. E. Marchand, P. Bouthemy, F. Chaumette, and V. Moreau, Robust real-time visual track-
ing using a 2D-3D model-based approach, Proceedings of ICCV'99, Kerkira, Greece,
vol. 1, September 1999, pp. 262–268.
454. W.F. Eddy and G.P. Pei, Structures of rule-based belief functions, IBM J.Res.Develop.
30 (1986), 43–101.
455. H. J. Einhorn and R. M. Hogarth, Decision making under ambiguity, Journal of Busi-
ness 59 (1986), S225–S250.
456. A.M. Elgammal and C.S. Lee, Inferring 3d body pose from silhouettes using activity
manifold learning, 2004, pp. II: 681–688.
457. R. Elliot, L. Aggoun, and J. Moore, Hidden Markov models: estimation and control,
1995.
458. Z. Elouedi, K. Mellouli, and P. Smets, Assessing sensor reliability for multisensor data
fusion within the transferable belief model, IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics) 34 (2004), no. 1, 782–787.
459. Z. Elouedi, K. Mellouli, and Philippe Smets, Decision trees using belief function the-
ory, Proceedings of the Eighth International Conference IPMU: Information Process-
ing and Management of Uncertainty in Knowledge-based Systems, vol. 1, Madrid,
2000, pp. 141–148.
460. , Classification with belief decision trees, Proceedings of the Ninth Inter-
national Conference on Artificial Intelligence: Methodology, Systems, Architectures:
AIMSA 2000, Varna, Bulgaria, 2000.
461. I. A. Essa and A. O. Pentland, Facial expression recognition using a dynamic model
and motion energy, Proc. of the 5th Conference on Computer Vision, 1995, pp. 360–
367.
462. G. Donato et al., Classifying facial actions, IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 21(10), October 1999, pp. 974–989.
463. F. Du, W. Shi, and Y. Deng, Feature extraction of evidence and its application in modifi-
cation of evidence theory, Journal of Shanghai Jiaotong University (2004), 164–168.
464. R. Fagin and Joseph Y. Halpern, A new approach to updating beliefs, Uncertainty in
Artificial Intelligence, 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and J.F. Lemmer,
eds.), 1991, pp. 347–374.
465. R. Fagin, Joseph Y. Halpern, and Nimrod Megiddo, A logic for reasoning about prob-
abilities, Inf. Comput. 87 (1990), no. 1-2, 78–128.
466. R. Fagin and J.Y. Halpern, Uncertainty, belief and probability, Proc. Intl. Joint Conf.
in AI (IJCAI-89), 1989, pp. 1161–1167.
467. , Uncertainty, belief, and probability, Proc. of AAAI’89, 1989, pp. 1161–1167.
468. Xianfeng Fan and Ming J. Zuo, Fault diagnosis of machines based on D-S evidence
theory. Part 1: D-S evidence theory and its improvement, Pattern Recognition Letters
27 (2006), no. 5, 366–376.
469. Tao Feng, Shao-Pu Zhang, and Ju-Sheng Mi, The reduction and fusion of fuzzy cover-
ing systems based on the evidence theory, International Journal of Approximate Rea-
soning 53 (2012), no. 1, 87 – 103.
470. Juan M. Fernández-Luna, Juan F. Huete, Benjamin Piwowarski, Abdelaziz Kallel, and
Sylvie Le Hégarat-Mascle, Combination of partially non-distinct beliefs: The cautious-
adaptive rule, International Journal of Approximate Reasoning 50 (2009), no. 7,
1000–1021.
471. C. Ferrari and G. Chemello, Coupling fuzzy logic techniques with evidential reason-
ing for sensor data interpretation, Proceedings of Intelligent Autonomous Systems 2
(T. Kanade, F.C.A. Groen, and L.O. Hertzberger, eds.), vol. 2, Amsterdam, Nether-
lands, 11-14 December 1989, pp. 965–971.
472. Scott Ferson, Roger B. Nelsen, Janos Hajagos, Daniel J. Berleant, and Jianzhong
Zhang, Dependence in probabilistic modeling, Dempster-Shafer theory, and prob-
ability bounds analysis, Sandia National Laboratories Report SAND2004-3072 (October
2004), 1–151.
473. A. Filippidis, Fuzzy and Dempster-Shafer evidential reasoning fusion methods for de-
riving action from surveillance observations, Proceedings of the Third International
Conference on Knowledge-Based Intelligent Information Engineering Systems, Ade-
laide, September 1999, pp. 121–124.
474. , A comparison of fuzzy and Dempster-Shafer evidential reasoning fusion
methods for deriving course of action from surveillance observations, International
Journal of Knowledge-Based Intelligent Engineering Systems 3:4 (October 1999),
215–222.
475. T. L. Fine, Review of a mathematical theory of evidence, Bulletin of the American
Mathematical Society 83 (1977), 667–672.
476. Terrence Fine, Book reviews - a mathematical theory of evidence, Bulletin of The
American Mathematical Society 83 (1977).
477. Terrence L. Fine, Lower probability models for uncertainty and nondeterministic pro-
cesses, Journal of Statistical Planning and Inference 20 (1988), no. 3, 389 – 411.
478. B. De Finetti, Theory of probability, Wiley, London, 1974.
479. Guido Fioretti, A mathematical theory of evidence for G.L.S. Shackle, Mind & Society
2 (2001), no. 1, 77–98.
480. Peter C. Fishburn, Decision and value theory, Wiley, New York, 1964.
481. T. Fister and R. Mitchell, Modified Dempster-Shafer with entropy based belief body
compression.
482. A. Fitzgibbon, D. Eggert, and R. Fisher, High-level cad model acquisition from range
images, Computer-Aided Design 29 (1997), 321–330.
483. Dale Fixen and Ronald P. S. Mahler, The modified Dempster-Shafer approach to clas-
sification, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems
and Humans 27:1 (January 1997), 96–104.
484. Dale Fixsen and Ronald P.S. Mahler, Modified Dempster-Shafer approach to classi-
fication, IEEE Transactions on Systems, Man, and Cybernetics Part A:Systems and
Humans. 27 (1997), 96–104.
485. Mihai Cristian Florea, Jean Dezert, Pierre Valin, Florentin Smarandache, and Anne-
Laure Jousselme, Adaptative combination rule and proportional conflict redistribution
rule for information fusion, CoRR abs/cs/0604042 (2006).
486. Mihai Cristian Florea, Anne-Laure Jousselme, Dominic Grenier, and Éloi Bossé, Com-
bining belief functions and fuzzy membership functions, Sensor Fusion: Architectures,
Algorithms, and Applications VII, Proceedings of SPIE, Vol. 5099, 2003.
487. Philippe Fortemps, Jobshop scheduling with imprecise durations: A fuzzy approach,
IEEE Transactions on Fuzzy Systems 5 (1997), 557–569.
488. S. Foucher, J.-M. Boucher, and G. B. Benie, Multiscale and multisource classification
using Dempster-Shafer theory, Proceedings of IEEE, 1999, pp. 124–128.
489. Patrizio Frosini, Measuring shape by size functions, Proceedings of SPIE on Intelli-
gent Robotic Systems, vol. 1607, 1991, pp. 122–133.
490. Chao Fu and Shanlin Yang, Conjunctive combination of belief functions from depen-
dent sources using positive and negative weight functions, Expert Systems with Ap-
plications 41 (2014), no. 4, Part 2, 1964 – 1972.
491. P. Fua, Using probability density functions in the framework of evidential reasoning,
Uncertainty in Knowledge-Based Systems, Lectures Notes in Computer science 286
(1986), 243–252.
492. P. Fua, A. Gruen, R. Plankers, N. D’Apuzzo, and D. Thalmann, Human body modeling
and motion analysis from video sequences, International Symposium on Real Time
Imaging and Dynamic Analysis, Hakodate, Japan, June 1998.
493. H. Fujiyoshi and A.J. Lipton, Real-time human motion analysis by image skeletoniza-
tion, Workshop on Applications of Computer Vision, 1998.
494. Robert M. Fung and Chee Yee Chong, Metaprobability and Dempster-Shafer in evi-
dential reasoning, CoRR abs/1304.3427 (2013).
495. G. de Cooman, F. Hermans, and E. Quaeghebeur, Imprecise Markov chains and their
limit behavior, Tech. report, Universiteit Gent, 2009.
496. G. Gennari, A. Chiuso, F. Cuzzolin, and R. Frezza, Integrating shape and dynamic
probabilistic models for data association and tracking, CDC'02, Las Vegas, Nevada,
December 2002.
497. G. Xin, Y. Xiao, and H. You, An improved Dempster-Shafer algorithm for resolving the
conflicting evidences, International Journal of Information Technology 11 (2005).
498. Zhun-ga Liu, Jean Dezert, Quan Pan, and Grégoire Mercier, Combination of sources
of evidence with different discounting factors based on a new dissimilarity measure,
Decision Support Systems 52 (2011), no. 1, 133–141.
499. Haim Gaifman, Causation, chance and credence: Proceedings of the Irvine Conference
on Probability and Causation, Volume 1, ch. A Theory of Higher Order Probabilities,
pp. 191–219, Springer Netherlands, Dordrecht, 1988.
500. Fabio Gambino, Giovanni Ulivi, and Marilena Vendittelli, The transferable belief
model in ultrasonic map building, Proceedings of IEEE, 1997, pp. 601–608.
501. H. Garcia-Compeán, J.M. López-Romero, M.A. Rodriguez-Segura, and M. So-
colovsky, Principal bundles, connections and BRST cohomology, Tech. report, Los
Alamos National Laboratory, hep-th/9408003, July 1994.
502. P. Gardenfors, Knowledge in flux: Modeling the dynamics of epistemic states, MIT
Press, Cambridge, MA, 1988.
503. Thomas D. Garvey, John D. Lowrance, and Martin A. Fischler, An inference technique
for integrating knowledge from disparate sources, Proceedings of the 7th International
Joint Conference on Artificial Intelligence - Volume 1 (San Francisco, CA, USA),
IJCAI’81, Morgan Kaufmann Publishers Inc., 1981, pp. 319–325.
504. W. L. Gau and D. J. Buehrer, Vague sets, IEEE Transactions on Systems, Man, and
Cybernetics 23 (1993), no. 2, 610–614.
505. D. M. Gavrila, The visual analysis of human movement: A survey, Computer Vision
and Image Understanding, vol. 73, 1999, pp. 82–98.
506. D. M. Gavrila and L. S. Davis, 3D model-based tracking of humans in action: A
multi-view approach, Proceedings of CVPR’96, San Francisco, CA, 18-20 June 1996,
pp. 73–80.
507. , Towards 3D model-based tracking and recognition of human movement:
A multi-view approach, International Workshop on Face and Gesture Recognition,
Zurich, 1995.
508. D.M. Gavrila, The visual analysis of human movement: a survey, Computer Vision
and Image Understanding 73 (1999), 82–98.
509. Jörg Gebhardt and Rudolf Kruse, The context model: An integrating view of vagueness
and uncertainty, International Journal of Approximate Reasoning 9 (1993), no. 3,
283–314.
510. W. Genxiu, Belief function combination and local conflict management, Computer
Engineering and Applications 40.
511. Geok See Ng and Harcharan Singh, Data equalisation with evidence combination for
pattern recognition, Pattern Recognition Letters 19 (1998), 227–235.
512. T. George and N.R. Pal, Quantification of conflict in Dempster-Shafer framework: a
new approach, International Journal of General Systems 24.
513. Janos J. Gertler and Kenneth C. Anderson, An evidential reasoning extension to quan-
titative model-based failure diagnosis, IEEE Transactions on Systems, Man, and Cy-
bernetics 22:2 (March/April 1992), 275–289.
514. G. Giacinto, R. Paolucci, and F. Roli, Application of neural networks and statisti-
cal pattern recognition algorithms to earthquake risk evaluation, Pattern Recognition
Letters 18 (1997), 1353–1362.
515. M. A. Giese and T. Poggio, Morphable models for the analysis and synthesis of com-
plex motion patterns, International Journal of Computer Vision, vol. 38(1), 2000,
pp. 1264–1274.
516. I. Gilboa and D. Schmeidler, Updating ambiguous beliefs, Journal of economic theory
59 (1993), 33–49.
517. , Additive representations of non-additive measures and the Choquet integral,
Annals of Operations Research 52 (1994), no. 1, 43–65.
518. Itzhak Gilboa and David Schmeidler, Updating ambiguous beliefs, Journal of Eco-
nomic Theory 59 (1993), no. 1, 33 – 49.
519. Itzhak Gilboa and David Schmeidler, Additive representations of non-additive mea-
sures and the Choquet integral, Annals of Operations Research 52 (1994), no. 1, 43–
65.
520. R. Giles, Foundations for a theory of possibility, Fuzzy Information and Decision
Processes (1982), 183–195.
521. Peter R. Gillett, Monetary unit sampling: a belief-function implementation for au-
dit and accounting applications, International Journal of Approximate Reasoning 25
(2000), 43–70.
522. M. L. Ginsberg, Non-monotonic reasoning using Dempster’s rule, Proc. 3rd National
Conference on AI (AAAI-84), 1984, pp. 126–129.
523. Lluís Godo, Petr Hájek, and Francesc Esteva, A fuzzy modal logic for belief functions,
Fundam. Inf. 57 (2003), no. 2-4, 127–146.
524. M. Goldszmidt and J. Pearl, Default ranking: A practical framework for evidential
reasoning, belief revision and update, In Proceedings of the 3rd International Confer-
ence on Knowledge Representation and Reasoning, 1992, pp. 661–672.
525. Forouzan Golshani, Enrique Cortes-Rello, and Thomas H. Howell, Dynamic route
planning with uncertain information, Knowledge-based Systems 9 (1996), 223–232.
526. I. J. Good, Subjective probability as the measure of a non-measurable set, Logic,
Methodology, and Philosophy of Science (P. Suppes E. Nagel and A. Tarski, eds.),
Stanford Univ. Press, 1962, pp. 319–329.
527. I. R. Goodman, Fuzzy sets as equivalence classes of random sets, Recent Develop-
ments in Fuzzy Sets and Possibility Theory, 1982, pp. 327–343.
528. I. R. Goodman and Hung T. Nguyen, Uncertainty models for knowledge-based sys-
tems, North Holland, New York, 1985.
529. J. Gordon and E. H. Shortliffe, Readings in uncertain reasoning, Morgan Kaufmann
Publishers Inc., San Francisco, CA, USA, 1990, pp. 529–539.
530. J. Gordon and E. H. Shortliffe, A method for managing evidential reasoning in a hi-
erarchical hypothesis space: a retrospective, Artificial Intelligence 59:1-2 (February
1993), 43–47.
531. J. Gordon and Edward H. Shortliffe, A method for managing evidential reasoning in
hierarchical hypothesis spaces, Artificial Intelligence 26 (1985), 323–358.
532. Jean Gordon and Edward H. Shortliffe, A method for managing evidential reasoning
in a hierarchical hypothesis space, Artificial Intelligence 26 (1985), 323–357.
533. Jean Goubault-Larrecq, Automata, languages and programming: 34th International
Colloquium, ICALP 2007, Wrocław, Poland, July 9-13, 2007, Proceedings, ch. Continu-
ous Capacities on Continuous State Spaces, pp. 764–776, Springer Berlin Heidelberg,
Berlin, Heidelberg, 2007.
534. John Goutsias, Modeling random shapes: an introduction to random closed set the-
ory, Tech. report, Department of Electrical and Computer Engineering, Johns Hopkins
University, Baltimore, JHU/ECE 90-12, April 1998.
535. John Goutsias, Ronald P.S. Mahler, and Hung T. Nguyen, Random sets: theory and
applications (IMA Volumes in Mathematics and Its Applications, Vol. 97), Springer-
Verlag, December 1997.
536. M. Grabisch, K-order additive discrete fuzzy measures and their representation, Fuzzy
sets and systems 92 (1997), 167–189.
537. , The Moebius transform on symmetric ordered structures and its application
to capacities on finite sets, Discrete Mathematics 287 (1-3) (2004), 17–34.
538. , Belief functions on lattices, Int. J. of Intelligent Systems (2006).
539. M. Grabisch, T. Murofushi, and M. Sugeno, Fuzzy measures and integrals: theory and
applications, New York: Springer, 2000.
540. Michel Grabisch, Belief functions on lattices, CoRR abs/0811.3373 (2008).
541. Michel Grabisch, Hung T. Nguyen, and Elbert A. Walker, Fundamentals of uncer-
tainty calculi with applications to fuzzy inference, Kluwer Academic Publishers, 1995.
542. M. Grabisch, Belief functions on lattices, Int. J. of Intelligent Systems (2009), 1–20.
543. Siegfried Graf, A Radon-Nikodym theorem for capacities, Journal für die reine und
angewandte Mathematik 320 (1980), 192–214.
544. K. Grauman, G. Shakhnarovich, and T.J. Darrell, Inferring 3d structure with a statis-
tical image-based shape model, 2003, pp. 641–648.
545. Frank J. Groen and Ali Mosleh, Foundations of probabilistic inference with uncertain
evidence, International Journal of Approximate Reasoning 39 (2005), no. 1, 49 – 83.
546. E. Grosicki, M. Carre, J.M. Brodin, and E. Geoffrois, Results of the RIMES evaluation
campaign for handwritten mail processing, International Conference on Document
Analysis and Recognition (2009), 941–945.
547. Benjamin N. Grosof, Evidential confirmation as transformed probability, CoRR
abs/1304.3439 (2013).
548. , An inequality paradigm for probabilistic knowledge, CoRR abs/1304.3418
(2013).
549. J. Guan, D. A. Bell, and V. R. Lesser, Evidential reasoning and rule strengths in expert
systems, Proceedings of AI and Cognitive Science '90 (M.F. McTear and N. Creaney,
eds.), Ulster, UK, 20-21 September 1990, pp. 378–390.
550. J. W. Guan and D. A. Bell, Generalizing the Dempster-Shafer rule of combination to
Boolean algebras, Proceedings of the IEEE International Conference on Developing
and Managing Intelligent System Projects, March 1993, pp. 229–236.
551. , The Dempster-Shafer theory on Boolean algebras, Chinese Journal of Ad-
vanced Software Research 3:4 (November 1996), 313–343.
552. , Evidential reasoning in intelligent system technologies, Proceedings of
the Second Singapore International Conference on Intelligent Systems (SPICIS’94),
vol. 1, Singapore, 14-17 November 1994, pp. 262–267.
553. , A linear time algorithm for evidential reasoning in knowledge base systems,
Proceedings of the Third International Conference on Automation, Robotics and Com-
puter Vision (ICARCV ’94), vol. 2, Singapore, 9-11 November 1994, pp. 836–840.
554. J. W. Guan, D. A. Bell, and Z. Guan, Evidential reasoning in expert systems: computa-
tional methods, Proceedings of the Seventh International Conference on Industrial and
Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-94)
(F.D. Anger, R.V. Rodriguez, and M. Ali, eds.), Austin, TX, USA, 31 May - 3 June
1994, pp. 657–666.
555. Jiwen Guan and David A. Bell, Discounting and combination operations in evidential
reasoning, CoRR abs/1303.1511 (2013).
556. Jiwen Guan, David A. Bell, and Victor R. Lesser, AI and Cognitive Science '90: Uni-
versity of Ulster at Jordanstown, 20–21 September 1990, ch. Evidential Reasoning and
Rule Strengths in Expert Systems, pp. 378–390, Springer London, London, 1991.
557. Jiwen Guan, Jasmina Pavlin, and Victor R. Lesser, Combining evidence in the extended
Dempster-Shafer theory, pp. 163–178, Springer London, London, 1990.
558. J.W. Guan and D.A. Bell, Approximate reasoning and evidence theory, Information
Sciences 96 (1997), no. 3, 207 – 235.
559. Zhang Guang-Quan, Semi-lattice structure of all extensions of the possibility measure
and the consonant belief function on the fuzzy set, Fuzzy Sets and Systems 43 (1991),
no. 2, 183 – 188.
560. M. Guironnet, D. Pellerin, and Michèle Rombaut, Camera motion classification
based on the transferable belief model, Proceedings of EUSIPCO'06, Florence, Italy,
2006.
561. GUO Hua-wei, SHI Wen-kang, DENG Yong, and CHEN Zhi-jun, Evidential conflict
and its 3D strategy: discard, discover and disassemble?, Systems Engineering and
Electronics 6 (2007).
562. Michael A. S. Guth, Uncertainty analysis of rule-based expert systems with Dempster-
Shafer mass assignments, International Journal of Intelligent Systems 3 (1988), no. 2,
123–139.
563. H. Guo, W. Shi, Q. Liu, and Y. Deng, A new combination rule of evidence, Journal of
Shanghai Jiaotong University 40 (2006), no. 11, 1895–1900.
564. H. Moon, R. Chellappa, and A. Rosenfeld, 3D object tracking using shape-encoded par-
ticle propagation, Proceedings of the Eighth IEEE International Conference on Com-
puter Vision (ICCV'01), Vancouver, Canada, July 9-12, 2001, pp. 307–314.
565. H. Tao, H.S. Sawhney, and R. Kumar, Object tracking with Bayesian estimation of
dynamic layer representation, IEEE Transactions on PAMI 24 (January 2002), 75–89.
566. V. Ha and P. Haddawy, Geometric foundations for interval-based probabilities,
KR’98: Principles of Knowledge Representation and Reasoning (Anthony G. Cohn,
Lenhart Schubert, and Stuart C. Shapiro, eds.), San Francisco, California, 1998,
pp. 582–593.
567. , Theoretical foundations for abstraction-based probabilistic planning, Proc.
of the 12th Conference on Uncertainty in Artificial Intelligence, August 1996,
pp. 291–298.
568. M. Ha-Duong, Hierarchical fusion of expert opinion in the transferable belief model,
application on climate sensitivity, Working Papers halshs-00112129-v3, HAL, 2006.
569. Peter Haddawy, A variable precision logic inference system employing the Dempster-
Shafer uncertainty calculus, PhD dissertation, University of Illinois at Urbana-
Champaign.
570. R. Haenni, Are alternatives to Dempster's rule of combination real alternatives? Com-
ments on "About the belief function combination and the conflict management prob-
lem", Information Fusion 3 (2002), 237–239.
571. , Shedding new light on Zadeh's criticism of Dempster's rule of combination,
2005 7th International Conference on Information Fusion, vol. 2, July 2005, 6 pp.
572. , Towards a unifying theory of logical and probabilistic reasoning, Proceed-
ings of ISIPTA’05, 2005.
573. , Aggregating referee scores: an algebraic approach, COMSOC’08, 2nd In-
ternational Workshop on Computational Social Choice (U. Endriss and W. Goldberg,
eds.), 2008, pp. 277–288.
574. R. Haenni and N. Lehmann, Resource bounded and anytime approximation of belief
function computations, International Journal of Approximate Reasoning 31(1-2) (Oc-
tober 2002), 103–154.
575. R. Haenni, J.W. Romeijn, G. Wheeler, and J. Williamson, Possible semantics for a
common framework of probabilistic logics, UncLog'08, International Workshop on
Interval/Probabilistic Uncertainty and Non-Classical Logics (Ishikawa, Japan) (V. N.
Huynh, Y. Nakamori, H. Ono, J. Lawry, V. Kreinovich, and H. T. Nguyen, eds.), Ad-
vances in Soft Computing, no. 46, pp. 268–279.
576. Rolf Haenni, Ordered valuation algebras: a generic framework for approximating
inference, International Journal of Approximate Reasoning 37 (2004), no. 1, 1 – 41.
577. Rolf Haenni and Stephan Hartmann, Modeling partially reliable information sources:
A general approach based on Dempster-Shafer theory, Information Fusion 7 (2006),
no. 4, 361–379, Special Issue on the Seventh International Conference on Information
Fusion, Part I.
578. Rolf Haenni and Norbert Lehmann, Probabilistic argumentation systems: a new per-
spective on the dempster-shafer theory, International Journal of Intelligent Systems 18
(2003), no. 1, 93–106.
579. G. Hager, S-W. Lee, and B-J. You, Model-based 3-d object tracking using projective
invariance, Proceedings of the International Conference on Robotics and Automation,
1999.
580. P. Hajek, Deriving Dempster's rule, Proceedings of IPMU'92, 1992, pp. 73–75.
581. P. Hájek, Proceedings of the ISSEK'94 workshop on mathematical and statistical meth-
ods in artificial intelligence, ch. On Logics of Approximate Reasoning II, pp. 147–
155, Springer Vienna, Vienna, 1995.
582. P. Hajek, Getting belief functions from Kripke models, International Journal of General
Systems 24 (1996), 325–327.
583. , A note on belief functions in mycin-like systems, Proceedings of Aplikace
Umele Inteligence AI ’90, Prague, Czechoslovakia, 20-22 March 1990, pp. 19–26.
584. P. Hájek and D. Harmanec, An exercise in Dempster-Shafer theory, International
Journal of General Systems 20 (1992), no. 2, 137–142.
585. P. Hajek and D. Harmanec, On belief functions (the present state of Dempster-Shafer
theory), Advanced topics in AI (Marik, ed.), Springer-Verlag, 1992.
586. Petr Hájek, Knowledge representation and reasoning under uncertainty: Logic at
work, ch. On logics of approximate reasoning, pp. 17–29, Springer Berlin Heidelberg,
Berlin, Heidelberg, 1994.
587. Jim W. Hall and Jonathan Lawry, Generation, combination and extension of random
set approximations to coherent lower and upper probabilities, Reliability Engineering
& System Safety 85 (2004), no. 1-3, 89–101, Alternative Representations of Epistemic
Uncertainty.
588. J. Y. Halpern and R. Fagin, Two views of belief: belief as generalized probability and
belief as evidence, Artificial Intelligence 54 (1992), 275–317.
589. Joseph Y. Halpern and Ronald Fagin, Two views of belief: Belief as generalized prob-
ability and belief as evidence, Artif. Intell. 54 (1992), no. 3, 275–317.
590. Joseph Y. Halpern and Riccardo Pucella, A logic for reasoning about evidence, Pro-
ceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (San
Francisco, CA, USA), UAI’03, Morgan Kaufmann Publishers Inc., 2003, pp. 297–304.
591. Joseph Y. Halpern and Riccardo Pucella, Reasoning about expectation, CoRR
abs/1407.7184 (2014).
592. J.Y. Halpern, Reasoning about uncertainty, MIT Press, 2003.
593. Deqiang Han, Chongzhao Han, and Yi Yang, A modified evidence combination ap-
proach based on ambiguity measure, Information Fusion, 2008 11th International
Conference on, June 2008, pp. 1–6.
594. F. Harary and W. T. Tutte, Matroids versus graphs, The many facets of graph theory,
Lecture Notes in Math., Vol. 110, Springer-Verlag, Berlin, 1969, pp. 155–170.
595. D. Harmanec, G. Klir, and G. Resconi, On modal logic interpretation of Dempster-
Shafer theory, International Journal of Intelligent Systems 9 (1994), 941–951.
596. D. Harmanec, G. Klir, and Z. Wang, Modal logic interpretation of Dempster-Shafer
theory: an infinite case, International Journal of Approximate Reasoning 14 (1996),
81–93.
597. David Harmanec, Toward a characterisation of uncertainty measure for the Dempster-
Shafer theory, Proceedings of the Eleventh Conference on Uncertainty in Artificial
Intelligence (P. Besnard and S. Hanks, eds.), Montreal, Que., Canada, 18-20 August 1995,
pp. 255–261.
598. David Harmanec and Petr Hajek, A qualitative belief logic, International Journal of
Uncertainty, Fuzziness and Knowledge-Based Systems (1994).
599. David Harmanec, George J. Klir, and Zhenyuan Wang, Modal logic interpretation of
dempster-shafer theory: An infinite case, International Journal of Approximate Rea-
soning 14 (1996), no. 23, 81 – 93.
600. C.J. Harris, Tracking with rigid models, Active Vision (A. Black and A. Yuille, eds.),
MIT Press, Cambridge, MA, 1992.
601. H. Y. Hau and R. L. Kashyap, On the robustness of dempster’s rule of combina-
tion, Tools for Artificial Intelligence, 1989. Architectures, Languages and Algorithms,
IEEE International Workshop on, Oct 1989, pp. 578–582.
602. Kanako Hayashi, Lionel Heng, and Vikram Srivastava, Pose estimation from occluded
images, 2006.
603. David Heckerman, An empirical comparison of three inference methods, CoRR
abs/1304.2357 (2013).
604. , Probabilistic interpretations for MYCIN's certainty factors, CoRR
abs/1304.3419 (2013).
605. Sylvie Le Hégarat-Mascle, Isabelle Bloch, and D. Vidal-Madjar, Application of
Dempster-Shafer evidence theory to unsupervised classification in multisource remote
sensing, IEEE Transactions on Geoscience and Remote Sensing 35:4 (July 1997),
1018–1031.
606. Stanislaw Heilpern, Representation and application of fuzzy numbers, Fuzzy Sets and
Systems 91 (1997), 259–268.
607. Y. Hel-Or and M. Werman, Constraint fusion for recognition and localization of ar-
ticulated objects, Int. J. Computer Vision 19 (1996), 5–28.
608. J.C. Helton, J.D. Johnson, and W.L. Oberkampf, An exploration of alternative ap-
proaches to the representation of uncertainty in model predictions, Reliability Engi-
neering & System Safety 85 (2004), no. 1-3, 39–71, Alternative Representations of
Epistemic Uncertainty.
609. Ebbe Hendon, Hans Jorgen Jacobsen, Birgitte Sloth, and Torben Tranaes, The product
of capacities and belief functions, Mathematical Social Sciences 32 (1996), 95–108.
610. L. Herda, P. Fua, R. Plankers, R. Boulic, and D. Thalmann, Skeleton-based motion
capture for robust reconstruction of human motion, Computer Animation (May 2000).
611. T. Herron, T. Seidenfeld, and L. Wasserman, Divisive conditioning: further results on
dilation, Philosophy of Science 64 (1997), 411–444.
612. H.T. Hestir, H.T. Nguyen, and G.S. Rogers, A random set formalism for evidential
reasoning, Conditional Logic in Expert Systems, North Holland, 1991, pp. 309–344.
613. A. Hilton, Towards model-based capture of a person’s shape, appearance and motion,
International Workshop on Modeling People at ICCV’99, Corfu, Greece, September
1999.
614. Petr Hájek, Getting belief functions from Kripke models, International Journal of
General Systems 24 (1996), no. 3, 325–327.
615. Eyke Hüllermeier, Similarity-based inference as evidential reasoning, International
Journal of Approximate Reasoning 26 (2001), no. 2, 67–100.
616. J. Hodges, S. Bridges, C. Sparrow, B. Wooley, B. Tang, and C. Jun, The development of
an expert system for the characterization of containers of contaminated waste, Expert
Systems with Applications 17 (1999), 167–181.
617. J. Hoey and J. J. Little, Representation and recognition of complex human motion,
Proc. of the Conference on Computer Vision and Pattern Recognition, vol. 1, 2000,
pp. 752–759.
618. James C. Hoffman and Robin R. Murphy, Comparison of Bayesian and Dempster-
Shafer theory for sensing: A practitioner's approach, in SPIE Proc. on Neural and
Stochastic Methods in Image and Signal Processing II, 1993, pp. 266–279.
619. D. Hogg, Model-based vision: A program to see a walking person, Image Vision Com-
put. 1 (1983), 5–20.
620. A. Honda and M. Grabisch, Entropy of capacities on lattices and set systems, To ap-
pear in Information Science (2006).
621. Lang Hong, Recursive algorithms for information fusion using belief functions with
applications to target identification, Proceedings of IEEE, 1992, pp. 1052–1057.
622. Takahiko Horiuchi, A new theory of evidence for non-exclusive elementary
propositions, International Journal of Systems Science 27 (1996), no. 10, 989–994.
623. Takahiko Horiuchi, Decision rule for pattern classification by integrating interval
feature values, IEEE Transactions on Pattern Analysis and Machine Intelligence 20
(1998), 440–448.
624. Kevin S. Van Horn, Constructing a logic of plausible inference: a guide to Cox's theo-
rem, International Journal of Approximate Reasoning 34 (2003), no. 1, 3–24.
625. N. Howe, M. Leventon, and W. Freeman, Bayesian reconstruction of 3D human mo-
tion from single-camera video, Neural Information Processing Systems, Denver, Col-
orado, November 1999.
626. N.R. Howe, Silhouette lookup for automatic pose tracking, 2004, p. 15.
627. Y. Hsia and Prakash P. Shenoy, An evidential language for expert systems, Method-
ologies for Intelligent Systems (Z. Ras, ed.), North Holland, 1989, pp. 9–16.
628. , Macevidence: A visual evidential language for knowledge-based systems,
Tech. report, No 211, School of Business, University of Kansas, 1989.
629. Y. T. Hsia, A belief function semantics for cautious non-monotonicity, Tech. report,
Technical Report TR/IRIDIA/91-3, Université Libre de Bruxelles, 1991.
630. , Characterizing belief functions with minimal commitment, Proceedings of
IJCAI-91, 1991, pp. 1184–1189.
631. Yen-Teh Hsia, Characterizing belief with minimum commitment, Proceedings of the
12th International Joint Conference on Artificial Intelligence - Volume 2 (San Fran-
cisco, CA, USA), IJCAI’91, Morgan Kaufmann Publishers Inc., 1991, pp. 1184–1189.
632. Y.T. Hsia and Ph. Smets, Belief functions and non-monotonic reasoning, Tech. report,
Université Libre de Bruxelles, Technical Report IRIDIA/TR/1990/3, 1990.
633. Lifang Hu, Xin Guan, Yong Deng, Deqiang Han, and You He, Measuring conflict
functions in generalized power space, Chinese Journal of Aeronautics 24 (2011), no. 1,
65–73.
634. T.S. Huang, Modeling, analysis and visualization on nonrigid object motion, Proc. of
the 10th IEEE Int. Conf. on Pattern Recognition, vol. 1, 1990, pp. 361–364.
635. R. Hummel and M. Landy, A statistical viewpoint on the theory of evidence, IEEE
Transactions on PAMI (1988), 235–247.
636. R.A. Hummel and L.M. Manevitz, Combining bodies of dependent information, Pro-
ceedings of IJCAI, 1987, pp. 1015–1017.
637. A. Hunter and W. Liu, Fusion rules for merging uncertain information, Information
Fusion 7(1) (2006), 97–134.
638. D. Hunter, Dempster-Shafer versus probabilistic logic, Proceedings of the Third
AAAI Uncertainty in Artificial Intelligence Workshop, 1987, pp. 22–29.
639. Daniel Hunter, Dempster-Shafer vs. probabilistic logic, CoRR abs/1304.2713 (2013).
640. E. Hunter, Visual estimation of articulated motion using the expectation-constrained
maximization algorithm, PhD dissertation, University of California at San Diego, Oc-
tober 1999.
641. E.A. Hunter, P.H. Kelly, and R.C. Jain, Estimation of articulated motion using kine-
matically constrained mixture densities, Workshop on Motion of Non-Rigid and Ar-
ticulated Objects, Puerto Rico, USA, 1997.
642. V.-N. Huynh, Y. Nakamori, H. Ono, J. Lawry, V. Kreinovich, and H.T. Nguyen (eds.),
Interval / probabilistic uncertainty and non-classical logics, Springer, 2008.
643. I. Iancu, PROSUM - Prolog system for uncertainty management, International Journal of
Intelligent Systems 12 (1997), 615–627.
644. Laurie Webster II, Jen-Gwo Chen, Simon S. Tan, Carolyn Watson, and André de Ko-
rvin, Validation of authentic reasoning expert systems, Information Sciences 117
(1999), 19–46.
645. S. S. Intille and A. F. Bobick, Visual recognition of multi agent action using binary
temporal relations, Proc. of the Conf. on Computer Vision and Pattern Recognition,
vol. 1, 1999, pp. 56–62.
646. Horace H. S. Ip and Richard C. K. Chiu, Evidential reasoning for facial gesture recog-
nition from cartoon images, Proceedings of IEEE, 1994, pp. 397–401.
647. Horace H. S. Ip and Hon-Ming Wong, Evidential reasoning in foreign exchange rates
forecasting, Proceedings of IEEE, 1991, pp. 152–159.
648. Michael Isard and Andrew Blake, Contour tracking by stochastic propagation of
conditional density, Proceedings of the European Conference of Computer Vision
(ECCV96), 1996, pp. 343–356.
649. Mitsuru Ishizuka, Inference methods based on extended Dempster & Shafer’s theory
for problems with uncertainty/fuzziness, New Generation Computing 1 (1983), no. 2,
159–168.
650. Mitsuru Ishizuka, K.S. Fu, and James T.P. Yao, Inference procedures under uncer-
tainty for the problem-reduction method, Information Sciences 28 (1982), no. 3, 179
– 206.
651. M. Itoh and T. Inagaki, A new conditioning rule for belief updating in the Dempster-
Shafer theory of evidence, Transactions of the Society of Instrument and Control En-
gineers 31:12 (December 1995), 2011–2017.
652. Y. A. Ivanov and A. F. Bobick, Recognition of visual activities and interactions by
stochastic parsing, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.
22(8), 2000, pp. 852–872.
653. Y. Iwai, K. Ogaki, and M. Yachida, Posture estimation using structure and motion
models, ICCV’99, Corfu, Greece, September 1999.
654. S. Iwasawa, J. Ohya, K. Takahashi, T. Sakaguchi, S. Kawato, K. Ebihara, and S. Mor-
ishima, Real-time estimation of human body posture from trinocular images, Interna-
tional Workshop on Modeling People at ICCV’99, Corfu, Greece, September 1999.
655. J. Deutscher, B. North, B. Bascle, and A. Blake, Tracking through singularities and
discontinuities by random sampling, Proceedings of ICCV'99, 1999, pp. 1144–1149.
656. J. Ma, W. Liu, D. Dubois, and H. Prade, Revision rules in the theory of evidence, Pro-
ceedings of ICTAI 2010, vol. 1, 2010, pp. 295–302.
657. Nathan Jacobson, Basic algebra I, Freeman and Company, New York, 1985.
658. J. Y. Jaffray, Application of linear utility theory for belief functions, Uncertainty and
Intelligent Systems, Springer-Verlag, Berlin, 1988, pp. 1–8.
659. , Coherent bets under partially resolving uncertainty and belief functions, The-
ory and Decision 26 (1989), 99–105.
660. , Linear utility theory for belief functions, Operation Research Letters 8
(1989), 107–112.
661. , Bayesian updating and belief functions, IEEE Transactions on Systems, Man
and Cybernetics 22 (1992), 1144–1152.
662. J. Y. Jaffray and P. P. Wakker, Decision making with belief functions: compatibility
and incompatibility with the sure-thing principle, Journal of Risk and Uncertainty 8
(1994), 255–271.
663. Jean-Yves Jaffray, On the maximum-entropy probability which is consis-
tent with a convex capacity, International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems 03 (1995), no. 01, 27–33.
664. J.Y. Jaffray, Dynamic decision making with belief functions, Advances in the
Dempster-Shafer Theory of Evidence (R.R. Yager, M. Fedrizzi, and J. Kacprzyk, eds.),
Wiley, New York, 1994, pp. 331–352.
665. F. Janez, Fusion de sources d'information définies sur des référentiels non exhaustifs
différents. Solutions proposées sous le formalisme de la théorie de l'évidence [Fusion of
information sources defined on different non-exhaustive frames of discernment: solutions
proposed within the formalism of evidence theory], PhD dissertation, University of
Angers, France.
666. Fabrice Janez and Alain Appriou, Theory of evidence and non-exhaustive frames of
discernment: Plausibilities correction methods, International Journal of Approximate
Reasoning 18 (1998), no. 1, 1 – 19.
667. R. Jeffrey, Conditioning, kinematics, and exchangeability, Causation, chance, and cre-
dence 1 (1988), 221–255.
668. R.C. Jeffrey, The logic of decision, McGraw-Hill, 1965.
669. W. Jiang, A. Zhang, and Q. Yang, A new method to determine evidence discounting
coefficient, Lecture Notes in Computer Science, vol. 5226/2008, 2008, pp. 882–887.
670. Jianping Yang, Hong-Zhong Huang, Qiang Miao, and Rui Sun, A novel information
fusion method based on Dempster-Shafer evidence theory for conflict resolution, Intel-
ligent Data Analysis 15.
671. N. Jojic, J. Gu, H.C. Shen, and T. Huang, 3-d reconstruction of multipart self-
occluding objects, Asian Conference on Computer Vision, 1998.
672. A. Josang, M. Daniel, and P. Vannoorenberghe, Strategies for combining conflicting
dogmatic beliefs, Proceedings of Fusion 2003, vol. 2, 2003, pp. 1133–1140.
673. , Strategies for combining conflicting dogmatic beliefs, Information Fusion,
2003. Proceedings of the Sixth International Conference of, vol. 2, July 2003,
pp. 1133–1140.
674. Audun Jøsang, A logic for uncertain probabilities, Int. J. Uncertain. Fuzziness Knowl.-
Based Syst. 9 (2001), no. 3, 279–311.
675. Audun Jøsang and Zied Elouedi, Symbolic and quantitative approaches to reasoning
with uncertainty: 9th European Conference, ECSQARU 2007, Hammamet, Tunisia, October
31 - November 2, 2007, Proceedings, ch. Interpreting Belief Functions as Dirichlet
Distributions, pp. 393–404, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
676. Audun Josang and Simon Pope, Dempster’s rule as seen by little colored balls, Com-
put. Intell. 28 (2012), no. 4, 453–474.
677. Audun Josang, Simon Pope, and David McAnally, Normalising the consensus opera-
tor for belief fusion, IPMU, 2006.
678. A. V. Joshi, S. C. Sahasrabudhe, and K. Shankar, Sensitivity of combination schemes
under conflicting conditions and a new method, pp. 39–48, Springer Berlin Heidel-
berg, Berlin, Heidelberg, 1995.
679. C. Joslyn, Towards an empirical semantics of possibility through maximum uncer-
tainty, Proc. IFSA 1991 (R. Lowen and M. Roubens, eds.), vol. A, 1991, pp. 86–89.
680. , Possibilistic normalization of inconsistent random intervals, Advances in
Systems Science and Applications (1997), 44–51.
681. C. Joslyn and S. Ferson, Approximate representations of random intervals for hybrid
uncertain quantification in engineering modeling, Proceedings of the 4th International
Conference on Sensitivity Analysis of Model Output (SAMO 2004) (K.M. Hanson and
F.M. Hemez, eds.), 2004, pp. 453–469.
682. C. Joslyn and G. Klir, Minimal information loss possibilistic approximations of ran-
dom sets, Proc. 1992 FUZZ-IEEE Conference, San Diego, 1992, pp. 1081–1088.
683. Cliff Joslyn and Luis Rocha, Towards a formal taxonomy of hybrid uncertainty repre-
sentations, Information Sciences 110 (1998), 255–277.
684. A. Jouan, L. Gagnon, E. Shahbazian, and P. Valin, Fusion of imagery attributes with
non-imaging sensor reports by truncated Dempster-Shafer evidential reasoning, Pro-
ceedings of the International Conference on Multisource-Multisensor Information Fu-
sion (FUSION’98) (R. Hamid, A. Zhu, and D. Zhu, eds.), vol. 2, Las Vegas, NV, USA,
6-9 July 1998, pp. 549–556.
685. A.-L. Jousselme and P. Maupin, On some properties of distances in evidence theory,
Proceedings of BELIEF’10, Brest, France, 2010.
686. , Distances in evidence theory: Comprehensive survey and generalizations,
International Journal of Approximate Reasoning (2011, in press).
687. A. Jøsang and S. Pope, Normalising the consensus operator for belief fusion, 2006.
688. Audun Jøsang, Artificial reasoning with subjective logic, 1997.
689. , The consensus operator for combining beliefs, Artificial Intelligence 141
(2002), no. 1, 157–170.
690. , Subjective evidential reasoning, In Proceedings of the International Confer-
ence on Information Processing and Management of Uncertainty (IPMU2002, 2002.
691. Audun Jøsang, Javier Diaz, and Maria Rifqi, Cumulative and averaging fusion of be-
liefs, Information Fusion 11 (2010), no. 2, 192–200.
692. Audun Jøsang and David McAnally, Multiplication and comultiplication of beliefs, In-
ternational Journal of Approximate Reasoning 38 (2005), no. 1, 19–51.
693. S.X. Ju, M.J. Black, and Y. Yacoob, Cardboard people: A parameterized model of
articulated motion, Proceedings of the International Conference on Automatic Face
and Gesture Recognition, 1996, pp. 38–44.
694. B. H. Juang and L. R. Rabiner, A probabilistic distance measure for hidden Markov
models, AT&T Technical Journal Vol. 64(2) (February 1985), 391–408.
695. S. Jung and K. Wohn, Tracking and motion estimation of the articulated object: a
hierarchical kalman filter approach, Real-Time Imaging 3 (1997), 415–432.
696. F. Jurie, Model-based object tracking in cluttered scenes with occlusions, Intelligent
Robots and Systems IROS’97, vol. 2, 1997, pp. 886–892.
697. I. A. Kakadiaris and D. Metaxas, Model-based estimation of 3D human motion with
occlusion based on active multi-viewpoint selection, Proceedings of the Conference
on Computer Vision and Pattern Recognition CVPR’96, San Francisco, CA, 18-20
June 1996, pp. 81–87.
698. , Three-dimensional human body model acquisition from multiple views, In-
ternational Journal on Computer Vision 30 (1998), 191–218.
699. I. A. Kakadiaris, D. Metaxas, and R. Bajcsy, Active part-decomposition, shape and
motion estimations of articulated objects: A physics-based approach, Proceedings
of the Conference on Computer Vision and Pattern Recognition CVPR’94, 1994,
pp. 980–984.
700. Y. Kameda, M. Minoh, and K. Ikeda, Three dimensional pose estimation of an articu-
lated object from its silhouette image, Asian Conference on Computer Vision, 1993.
701. M. Karan, Frequency tracking and hidden Markov models, Ph.D. thesis, 1995.
702. R. Karlsoon and F. Gustafsson, Monte carlo data association for multiple target track-
ing, IEEE Workshop on Target Tracking, 2001.
703. M. Kayanuma and M. Hagiwara, A new method to detect object and estimate the
position and the orientation from an image using a 3-d model having feature points,
IEEE Conference on Systems, Man and Cybernetics SMC’99, vol. 4, 1999, pp. 931–
936.
704. D.G. Kendall, Stochastic geometry, ch. Foundations of a theory of random sets,
pp. 322–376, Wiley, London, 1974.
705. , A survey of the statistical theory of shape, Statistical Science 4(2) (1989),
87–120.
706. R. Kennes, Evidential reasoning in a categorial perspective: conjunction and disjunc-
tion on belief functions, Uncertainty in Artificial Intelligence 6 (B. D'Ambrosio, P. Smets,
and P. P. Bonissone, eds.), Morgan Kaufmann, San Mateo, CA, 1991, pp. 174–181.
707. , Computational aspects of the Moebius transformation of graphs, IEEE Trans-
actions on Systems, Man, and Cybernetics 22 (1992), 201–223.
708. R. Kennes and Philippe Smets, Computational aspects of the Moebius transformation,
Uncertainty in Artificial Intelligence 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and
J.F. Lemmer, eds.), Elsevier Science Publishers, 1991, pp. 401–416.
709. , Fast algorithms for Dempster-Shafer theory, Uncertainty in Knowledge
Bases, Lecture Notes in Computer Science 521 (B. Bouchon-Meunier, R.R. Yager, and
L.A. Zadeh, eds.), Springer-Verlag, Berlin, 1991, pp. 14–23.
710. Y. Kessentini, T. Burger, and T. Paquet, Evidential ensemble hmm classifier for hand-
writing recognition, Proceedings of IPMU, 2010.
711. Y. Kessentini, T. Paquet, and A. Ben Hamadou, Off-line handwritten word recogni-
tion using multi-stream hidden markov models, Pattern Recognition Letters 30 (2010),
no. 1, 60–70.
712. J. M. Keynes, Fundamental ideas, A Treatise on Probability, Ch. 4 (1921).
713. V. Khatibi and G.A. Montazer, A new evidential distance measure based on belief
intervals, Scientia Iranica - Transactions D: Computer Science and Engineering and
Electrical Engineering 17 (2010), no. 2, 119–132.
714. Josef Kittler, Mohamad Hatef, Robert P.W. Duin, and Jiri Matas, On combining clas-
sifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998),
no. 3, 226–239.
715. D. A. Klain and G.-C. Rota, Introduction to geometric probability, Cambridge Uni-
versity Press, 1997.
716. F. Klawonn and E. Schwecke, On the axiomatic justification of Dempster's rule of
combination, International Journal of Intelligent Systems 7 (1992), no. 5, 469–478.
717. F. Klawonn and E. Schwecke, On the axiomatic justification of Dempster's rule of com-
bination, International Journal of Intelligent Systems 7 (1990), 469–478.
718. F. Klawonn and Philippe Smets, The dynamic of belief in the transferable belief model
and specialization-generalization matrices, Proceedings of the 8th Conference on Un-
certainty in Artificial Intelligence (D. Dubois, M.P. Wellman, B. D'Ambrosio, and
Ph. Smets, eds.), 1992, pp. 130–137.
719. Frank Klawonn and Philippe Smets, The dynamic of belief in the transferable belief
model and specialization-generalization matrices, Proceedings of the Eighth Interna-
tional Conference on Uncertainty in Artificial Intelligence (San Francisco, CA, USA),
UAI’92, Morgan Kaufmann Publishers Inc., 1992, pp. 130–137.
720. J. Klein and O. Colot, Automatic discounting rate computation using a dissent crite-
rion, Workshop on the theory of belief functions (BELIEF 2010), 2010, pp. 1–6.
721. John Klein, Christèle Lecomte, and Pierre Miché, Hierarchical and conditional combination of belief functions induced by visual tracking, International Journal of Approximate Reasoning 51 (2010), no. 4, 410–428.
722. G. J. Klir, Dynamic decision making with belief functions, Measures of uncertainty in the Dempster-Shafer theory of evidence (R. R. Yager, M. Fedrizzi, and J. Kacprzyk, eds.), Wiley, New York, 1994, pp. 35–49.
723. G. J. Klir and T. A. Folger, Fuzzy sets, uncertainty and information, Prentice Hall,
Englewood Cliffs (NJ), 1988.
724. G. J. Klir and A. Ramer, Uncertainty in the Dempster-Shafer theory: a critical re-
examination, International Journal of General Systems 18 (1990), 155–166.
725. G. J. Klir and B. Yuan, Fuzzy sets and fuzzy logic: theory and applications, Prentice
Hall PTR, Upper Saddle River, NJ, 1995.
726. George J. Klir, Principles of uncertainty: What are they? why do we need them?, Fuzzy
Sets and Systems 74 (1995), 15–31.
727. , On fuzzy-set interpretation of possibility theory, Fuzzy Sets and Systems 108
(1999), 263–273.
728. , Generalized information theory: aims, results, and open problems, Reliability Engineering & System Safety 85 (2004), no. 1–3, 21–38, Alternative Representations of Epistemic Uncertainty.
729. , Generalized information theory: aims, results, and open problems, Reliability Engineering & System Safety 85 (2004), no. 1–3, 21–38, Alternative Representations of Epistemic Uncertainty.
730. George J. Klir and David Harmanec, Generalized information theory, Kybernetes 25
(1996), no. 7/8, 50–67.
731. George J. Klir, Wang Zhenyuan, and David Harmanec, Constructing fuzzy measures
in expert systems, Fuzzy Sets and Systems 92 (1997), 251–264.
732. M. A. Klopotek, A. Matuszewski, and S. T. Wierzchon, Overcoming negative-valued
conditional belief functions when adapting traditional knowledge acquisition tools to
Dempster-Shafer theory, Proceedings of the International Conference on Computa-
tional Engineering in Systems Applications, Symposium on Modelling, Analysis and
Simulation, CESA ’96 IMACS Multiconference, vol. 2, Lille, France, 9-12 July 1996,
pp. 948–953.
733. M.A. Klopotek and S.T. Wierzchon, An interpretation for the conditional belief func-
tion in the theory of evidence, Foundations of intelligent systems - Lecture Notes in
Computer Science, vol. 1609/1999, Springer Berlin/Heidelberg, 1999, pp. 494–502.
734. Mieczysław A. Kłopotek and Sławomir T. Wierzchoń, Rough Sets and Current Trends in Computing: First International Conference, RSCTC'98, Warsaw, Poland, June 22–26, 1998, Proceedings, ch. A New Qualitative Rough-Set Approach to Modeling Belief Functions, pp. 346–354, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.
735. Mieczysław Alojzy Kłopotek and Sławomir Tadeusz Wierzchoń, Belief functions in
business decisions, ch. Empirical Models for the Dempster-Shafer-Theory, pp. 62–
112, Physica-Verlag HD, Heidelberg, 2002.
736. W.W. Koczkodaj, A new definition of consistency of pairwise comparisons, Mathemat-
ical and Computer Modelling 18 (1993), no. 7, 79 – 84.
737. E. T. Kofler and C. T. Leondes, Algorithmic modifications to the theory of evidential reasoning, Journal of Algorithms 17:2 (September 1994), 269–279.
738. Jurg Kohlas, Modeling uncertainty for plausible reasoning with belief, Tech. Report
116, Institute for Automation and Operations Research, University of Fribourg, 1986.
739. , The logic of uncertainty: potential and limits of probability theory for managing uncertainty in expert systems, Tech. Report 142, Institute for Automation and Operations Research, University of Fribourg, 1987.
740. , Conditional belief structures, Probability in Engineering and Information
Science 2 (1988), no. 4, 415–433.
741. , Modeling uncertainty with belief functions in numerical models, Europ. J. of
Operational Research 40 (1989), 377–388.
742. , Evidential reasoning about parametric models, Tech. Report 194, Institute
for Automation and Operations Research, University Fribourg, 1992.
743. , Support and plausibility functions induced by filter-valued mappings, Int. J.
of General Systems 21 (1993), no. 4, 343–363.
744. , Mathematical foundations of evidence theory, Tech. Report 94-09, Institute
of Informatics, University of Fribourg, 1994, Lectures to be presented at the Interna-
tional School of Mathematics “G. Stampacchia” Mathematical Methods for Handling
Partial Knowledge in Artificial Intelligence Erice, Sicily, June 19-25, 1994.
745. , Mathematical foundations of evidence theory, Mathematical Models for Han-
dling Partial Knowledge in Artificial Intelligence (G. Coletti, D. Dubois, and R. Scoz-
zafava, eds.), Plenum Press, 1995, pp. 31–64.
746. , The mathematical theory of evidence – a short introduction, System Mod-
elling and Optimization (J. Dolezal, ed.), Chapman and Hall, 1995, pp. 37–53.
747. , Allocation of arguments and evidence theory, Theoretical Computer Science
171 (1997), 221–246.
748. Jurg Kohlas and P. Besnard, An algebraic study of argumentation systems and evidence
theory, Tech. Report 95–13, Institute of Informatics, University of Fribourg, 1995.
749. Jurg Kohlas and H.W. Brachinger, Argumentation systems and evidence theory, Ad-
vances in Intelligent Computing – IPMU’94, Paris (B. Bouchon-Meunier, R.R. Yager,
and L.A. Zadeh, eds.), Springer, 1994, pp. 41–50.
750. Jürg Kohlas and Christian Eichenberger, Uncertain information, pp. 128–160,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
751. Jurg Kohlas and Paul-André Monney, Modeling and reasoning with hints, Tech. Re-
port 174, Institute for Automation and Operations Research, University of Fribourg,
1990.
752. , Propagating belief functions through constraint systems, Int. J. Approximate
Reasoning 5 (1991), 433–461.
753. , Representation of evidence by hints, Advances in the Dempster-Shafer The-
ory of Evidence (R.R. Yager, J. Kacprzyk, and M. Fedrizzi, eds.), John Wiley, New
York, 1994, pp. 473–492.
754. , Theory of evidence – a survey of its mathematical foundations, applications and computational analysis, ZOR – Mathematical Methods of Operations Research 39 (1994), 35–68.
755. , A mathematical theory of hints - an approach to the Dempster-Shafer theory
of evidence, Lecture Notes in Economics and Mathematical Systems, Springer-Verlag,
1995.
756. , A mathematical theory of hints. an approach to Dempster-Shafer theory of
evidence, Lecture Notes in Economics and Mathematical Systems, vol. 425, Springer-
Verlag, 1995.
757. Jurg Kohlas, Paul-André Monney, R. Haenni, and N. Lehmann, Model-based diag-
nostics using hints, Symbolic and Quantitative Approaches to Uncertainty, European
Conference ECSQARU95, Fribourg (Ch. Fridevaux and J. Kohlas, eds.), Springer,
1995, pp. 259–266.
758. P. Kohli and Ph. Torr, Efficiently solving dynamic Markov random fields using graph
cuts, Proceedings of ICCV’05, vol. 2, 2005, pp. 922–929.
759. Don Koks and Subhash Challa, An introduction to Bayesian and Dempster-Shafer data
fusion, Tech. report, Defence Science and Tech Org, 2003.
760. D. Koller and H.-H. Nagel, Model-based object tracking in monocular image se-
quences of road traffic scenes, Int. J. Computer Vision 10 (1993), 257–281.
761. H. Kollnig and H.-H. Nagel, 3D pose estimation by fitting image gradients directly to
polyhedral models, ICCV’95, Boston, MA, May 1995, pp. 569–574.
762. Augustine Kong, Multivariate belief functions and graphical models, PhD disserta-
tion, Harvard University, Department of Statistics, 1986.
763. B. O. Koopman, The bases of probability, Bull. Amer. Math. Soc. 46.
764. , The axioms and algebra of intuitive probability, Ann. Math. 41 (1940), 269–
292.
765. P. Korpisaari and J. Saarinen, Dempster-Shafer belief propagation in attribute fu-
sion, Proceedings of the Second International Conference on Information Fusion (FU-
SION’99), vol. 2, Sunnyvale, CA, USA, 6-8 July 1999, pp. 1285–1291.
766. G.A. Koshevoy, Distributive lattices and products of capacities, Journal of Mathemat-
ical Analysis Applications 219 (1998), 427–441.
767. Volker Kraetschmer, Constraints on belief functions imposed by fuzzy random variables: Some technical remarks on Römer–Kandel, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 28 (1998), 881–883.
768. I. Kramosil, Probabilistic analysis of belief functions.
769. Ivan Kramosil, Expert systems with non-numerical belief functions, Problems of Con-
trol and Information Theory 17 (1988), 285–295.
770. , Possibilistic belief functions generated by direct products of single possibilis-
tic measures, Neural Network World 9:6 (1994), 517–525.
771. , Approximations of believeability functions under incomplete identification of
sets of compatible states, Kybernetika 31 (1995), 425–450.
772. , Dempster-Shafer theory with indiscernible states and observations, Interna-
tional Journal of General Systems 25 (1996), 147–152.
773. , Expert systems with non-numerical belief functions, Problems of control and
information theory 16 (1996), 39–53.
774. , Belief functions generated by signed measures, Fuzzy Sets and Systems 92
(1997), 157–166.
775. , Probabilistic analysis of Dempster-Shafer theory. part one, Tech. report,
Academy of Science of the Czech Republic, Technical Report 716, 1997.
776. , Probabilistic analysis of Dempster-Shafer theory. part three., Tech. report,
Academy of Science of the Czech Republic, Technical Report 749, 1998.
777. , Probabilistic analysis of Dempster-Shafer theory. part two., Tech. report,
Academy of Science of the Czech Republic, Technical Report 749, 1998.
778. , Fuzzy measures and integrals measure-theoretic approach to the inversion
problem for belief functions, Fuzzy Sets and Systems 102 (1999), no. 3, 363 – 369.
779. , Measure-theoretic approach to the inversion problem for belief functions,
Fuzzy Sets and Systems 102 (1999), 363–369.
780. Ivan Kramosil, Dempster combination rule with Boolean-like processed belief functions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 09 (2001), no. 01, 105–121.
781. Ivan Kramosil, Belief functions generated by fuzzy and randomized compatibility re-
lations, Fuzzy Sets and Systems 135 (2003), no. 3, 341 – 366.
782. , Nonspecificity degrees of basic probability assignments in Dempster-Shafer
theory, Computers and Artificial Intelligence 18:6 (April-June 1993), 559–574.
783. , Belief functions with nonstandard values, Proceedings of Qualitative and Quantitative Practical Reasoning (Dov Gabbay, Rudolf Kruse, Andreas Nonnengart, and H. J. Ohlbach, eds.), Bonn, June 1997, pp. 380–391.
784. , Dempster combination rule for signed belief functions, International Journal
of Uncertainty, Fuzziness and Knowledge-Based Systems 6:1 (February 1998), 79–
102.
785. , Jordan decomposition of signed belief functions, Proceedings of the inter-
national conference on Information Processing and Management of Uncertainty in
Knowledge-Based Systems (IPMU’96), Granada, Universidad de Granada, July 1996,
pp. 431–434.
786. , Monte-Carlo estimations for belief functions, Proceedings of the Fourth International Conference on Fuzzy Sets Theory and Its Applications, vol. 16, Liptovsky Jan, Slovakia, 2-6 Feb. 1998, pp. 339–357.
787. , Definability of belief functions over countable sets by real-valued ran-
dom variables, IPMU. Information Processing and Management of Uncertainty in
Knowledge-Based Systems (Svoboda V., ed.), vol. 3, Paris, July 1994, pp. 49–50.
788. , Toward a Boolean-valued Dempster-Shafer theory, LOGICA '92 (Svoboda V., ed.), Prague, 1993, pp. 110–131.
789. , A probabilistic analysis of Dempster combination rule, The Logica. Year-
book 1997 (Childers Timothy, ed.), Prague, 1997, pp. 174–187.
790. , Measure-theoretic approach to the inversion problem for belief functions,
Proceedings of IFSA’97, Seventh International Fuzzy Systems Association World
Congress, vol. 1, Prague, Academia, June 1997, pp. 454–459.
791. , Strong law of large numbers for set-valued random variables, Proceedings
of the 3rd Workshop on Uncertainty Processing in Expert Systems, Prague, University
of Economics, September 1994, pp. 122–142.
792. David H. Krantz and John Miyamoto, Priors and likelihood ratios as evidence, Journal
of the American Statistical Association 78 (June 1983), 418–423.
793. P. Krause and D. Clark, Representing uncertain knowledge, Kluwer, Dordrecht, 1993.
794. R. Krause and E. Schwecke, Specialization: a new concept for uncertainty handling
with belief functions, International Journal of General Systems 18 (1990), 49–60.
795. Vladik Kreinovich, Claude Langrand, and Hung T. Nguyen, Combining fuzzy and probabilistic knowledge using belief functions, Tech. report, University of Texas at El Paso, Departmental Technical Reports (CS), Paper 414, 2001.
796. R. Kruse, D. Nauck, and F. Klawonn, Reasoning with mass, Uncertainty in Artificial Intelligence (B. D. D'Ambrosio, P. Smets, and P. P. Bonissone, eds.), Morgan Kaufmann, San Mateo, CA, 1991, pp. 182–187.
797. R. Kruse, E. Schwecke, and F. Klawonn, On a tool for reasoning with mass distribu-
tion, Proceedings of the 12th International Joint Conference on Artificial Intelligence
(IJCAI91), vol. 2, 1991, pp. 1190–1195.
798. Rudolf Kruse and Erhard Schwecke, Specialization - a new concept for uncertainty
handling with belief functions, International Journal of General Systems 18 (1990),
no. 1, 49–60.
799. J.J. Kuch and T. Huang, Vision based hand modeling and tracking for virtual telecon-
ferencing and telecollaboration, Proc. of the Fifth ICCV, pp. 666–672.
800. J. Kühr and D. Mundici, De Finetti theorem and Borel states in [0, 1]-valued algebraic logic, International Journal of Approximate Reasoning 46 (2007), no. 3, 605–616.
801. H. Kyburg, Bayesian and non-Bayesian evidential updating, Artificial Intelligence
31:3 (1987), 271–294.
802. H. E. Kyburg, Bayesian and non-Bayesian evidential updating, Artificial Intelligence
31 (1987), 271–293.
803. Henry E. Kyburg, Jr., Interval-valued probabilities, 1998.
804. L. Goncalves, E. Di Bernardo, E. Ursella, and P. Perona, Monocular tracking of the human arm in 3D, Proceedings of the International Conference on Computer Vision ICCV'95, Cambridge, MA, 1995, pp. 764–770.
805. M. Lamata and S. Moral, Calculus with linguistic probabilities and belief, Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, pp. 133–152.
806. M.T. Lamata and S. Moral, Classification of fuzzy measures, Fuzzy Sets and Systems
33 (1989), no. 2, 243 – 253.
807. S. Mac Lane, A lattice formulation for transcendence degrees and p-bases, Duke
Math. J. 4 (1938), 455–468.
808. K. Laskey and P.E. Lehner, Belief maintenance: an integrated approach to uncertainty management, Proceedings of the Seventh National Conference on Artificial Intelligence (AAAI-88), vol. 1, 1988, pp. 210–214.
809. K. B. Laskey, Beliefs in belief functions: an examination of Shafer’s canonical exam-
ples, AAAI Third Workshop on Uncertainty in Artificial Intelligence, Seattle, 1987,
pp. 39–46.
810. Kathryn Blackmond Laskey, Belief in belief functions: An examination of Shafer's canonical examples, CoRR abs/1304.2715 (2013).
811. Kathryn Blackmond Laskey and Paul E. Lehner, Assumptions, beliefs and probabili-
ties, Artificial Intelligence 41 (1989), 65–77.
812. , Assumptions, beliefs and probabilities, Artificial Intelligence 41 (1989),
no. 1, 65 – 77.
813. Chia-Hoang Lee, A comparison of two evidential reasoning schemes, Artificial Intel-
ligence 35 (1988), 127–134.
814. E. S. Lee and Q. Zhu, Fuzzy and evidential reasoning, Physica-Verlag, Heidelberg,
1995.
815. E.S. Lee and Qing Zhu, An interval Dempster-Shafer approach, Computers Mathe-
matics with Applications 24 (1992), no. 7, 89 – 95.
816. H.J. Lee and Z. Chen, Determination of 3D human body posture from a single view,
Computer Vision, Graphics, and Image Processing 30 (1985), 148–168.
817. , Knowledge-guided visual perception of 3-d human gait from a single image
sequence, IEEE Transactions on Systems, Man, and Cybernetics 22 (March 1992).
818. Seung-Jae Lee, Sang-Hee Kang, Myeon-Song Choi, Sang-Tae Kim, and Choong-Koo
Chang, Protection level evaluation of distribution systems based on Dempster-Shafer
theory of evidence, Proceedings of the IEEE Power Engineering Society Winter Meet-
ing, vol. 3, Singapore, 23-27 January 2000, pp. 1894–1899.
819. E. Lefevre, O. Colot, and P. Vannoorenberghe, Belief function combination and con-
flict management, Information Fusion 3 (2002), no. 2, 149 – 162.
820. , Belief functions combination and conflict management, Information Fusion
Journal 3 (2002), no. 2, 149–162.
821. E. Lefevre, O. Colot, P. Vannoorenberghe, and D. de Brucq, A generic framework for resolving the conflict in the combination of belief structures, Proceedings of the Third International Conference on Information Fusion (FUSION 2000), vol. 1, July 2000, pp. MOD4/11–MOD4/18.
822. Eric Lefevre, Olivier Colot, and Patrick Vannoorenberghe, Reply to the comments of R. Haenni on the paper "Belief functions combination and conflict management", Information Fusion 4 (2003), 63–65.
823. Éric Lefèvre, Zied Elouedi, and David Mercier, Towards an alarm for opposition con-
flict in a conjunctive combination of belief functions, pp. 314–325, Springer Berlin
Heidelberg, Berlin, Heidelberg, 2011.
824. Éric Lefèvre and Zied Elouedi, How to preserve the conflict as an alarm in the combination of belief functions?, Decision Support Systems 56 (2013), 326–333.
825. Norbert Lehmann, Argumentation systems and belief functions, PhD dissertation, Université de Fribourg, 2001.
826. Norbert Lehmann and Rolf Haenni, An alternative to outward propagation for Dempster-Shafer belief functions, Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU'99), Lecture Notes in Computer Science, London, 5-9 July 1999.
827. E. Lehrer, Updating non-additive probabilities - a geometric approach, Games and
Economic Behavior 50 (2005), 42–57.
828. J. F. Lemmer and H. E. Kyburg, Jr., Conditions for the existence of belief functions corresponding to intervals of belief, Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), Anaheim, CA, USA, 14-19 July 1991, pp. 488–493.
829. John F. Lemmer, Confidence factors, empiricism and the Dempster-Shafer theory of evidence, CoRR abs/1304.3437 (2013).
830. J. F. Lemmers, Confidence factors, empiricism, and the Dempster-Shafer theory of
evidence, Uncertainty in Artificial Intelligence (L. N. Kanal and J. F. Lemmers, eds.),
North Holland, Amsterdam, 1986, pp. 167–196.
831. F. Lerasle, G. Rives, and M. Dhome, Human body limbs tracking by multi-ocular
vision, Scandinavian Conference on Image Analysis, Lappeenranta, Finland, 1997.
832. , Tracking of human limbs by multiocular vision, CVIU 75 (September 1999),
229–246.
833. S. A. Lesh, An evidential theory approach to judgement-based decision making, PhD
dissertation, Department of Forestry and Environmental Studies, Duke University, De-
cember 1986.
834. H. Leung, Y. Li, E. Bosse, M. Blanchette, and K. C. C. Chan, Improved multiple
target tracking using Dempster-Shafer identification, Proceedings of the SPIE - Signal
Processing, Sensor Fusion, and Target Recognition VI, vol. 3068, Orlando, FL, USA,
21-24 April 1997, pp. 218–27.
835. Henry Leung and Jiangfeng Wu, Bayesian and Dempster-Shafer target identification
for radar surveillance, IEEE Transactions on Aerospace and Electronic Systems 36:2
(April 2000), 432–447.
836. M.E. Leventon and W.T. Freeman, Bayesian estimation of 3D human motion from an
image sequence, Tech. report, TR-98-06, Mitsubishi Electric Research Lab, 1998.
837. I. Levi, The enterprise of knowledge, MIT Press, 1980.
838. , The enterprise of knowledge, MIT Press, 1980.
839. , The enterprise of knowledge: An essay on knowledge, credal probability, and
chance, The MIT Press, Cambridge, Mass., 1980.
840. , Consonance, dissonance and evidentiary mechanism, Festschrift for Sören Halldén, Theoria, 1983, pp. 27–42.
841. D.K. Lewis, Probabilities of conditionals and conditional probabilities, Philosophical
Review 85 (1976), 297–315.
842. Bicheng Li, Bo Wang, Jun Wei, Yuqi Huang, and Zhigang Guo, Efficient combination
rule of evidence theory, 2001, pp. 237–240.
843. Jinping Li, Qingbo Yang, and Bo Yang, Dempster-Shafer theory is a special case of vague sets theory, Proceedings of the International Conference on Information Acquisition, June 2004, pp. 50–53.
844. S.Z. Li, Q.D. Fu, L. Gu, B. Scholkopf, Y. Cheng, and H.J. Zhang, Kernel machine
based learning for multi-view face detection and pose estimation, 2001, pp. II: 674–
679.
845. Xinde Li, Xianzhong Dai, Jean Dezert, and Florentin Smarandache, Fusion of impre-
cise qualitative information, Applied Intelligence 33 (2010), no. 3, 340–351.
846. Z. Li and L. Uhr, Evidential reasoning in a computer vision system, Uncertainty in
Artificial Intelligence 2 (Lemmer and Kanal, eds.), North Holland, Amsterdam, 1988,
pp. 403–412.
847. LIANG Chang-yong, HUANG Yong-qing, CHEN Zeng-ming, and TONG Jian-jun, A method of dispelling the absurdities of Dempster-Shafer's rule of combination, Systems Engineering-Theory and Practice 3 (2005).
848. Chen Liang-zhou, Shi Wen-kang, Deng Yong, and Zhu Zhen-fu, A new fusion ap-
proach based on distance of evidences, Journal of Zhejiang University SCIENCE A 6
(2005), no. 5, 476–482.
849. C.-C. Lien and C.-L. Huang, Model-based articulated hand motion tracking for ges-
ture recognition, Image and Vision Computing 16 (February 1998), 121–134.
850. Ee-Peng Lim, Jaideep Srivastava, and Shashi Shekar, Resolving attribute incompati-
bility in database integration: an evidential reasoning approach, Proceedings of IEEE,
1994, pp. 154–163.
851. T. Y. Lin, Granular computing on binary relations II: Rough set representations and
belief functions, Rough Sets In Knowledge Discovery, PhysicaVerlag, 1998, pp. 121–
140.
852. T. Y. Lin, Rough Sets and Current Trends in Computing: First International Conference, RSCTC'98, Warsaw, Poland, June 22–26, 1998, Proceedings, ch. Fuzzy Partitions II: Belief Functions, A Probabilistic View, pp. 381–386, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.
853. T. Y. Lin, Measure theory on granular fuzzy sets, Fuzzy Information Processing Soci-
ety, 1999. NAFIPS. 18th International Conference of the North American, Jul 1999,
pp. 809–813.
854. Tsau Young Lin and Churn-Jung Liau, Belief functions based on probabilistic multivalued random variables, 1997, pp. 269–272.
855. T.Y. Lin and Y.Y. Yao, Neighborhoods systems: measure, probability and belief func-
tions, Proceedings of The Fourth Workshop on Rough Sets, Fuzzy Sets and Machine
Discovery, 1996, pp. 202–207.
856. LIN Zuo-Quan, MU Ke-Dian, and HAN Qing, An approach to combination of conflicting evidences by disturbance of ignorance, Journal of Software 8 (2004), 005.
857. Dennis V. Lindley, The probability approach to the treatment of uncertainty in artifi-
cial intelligence and expert systems, Statist. Sci. 2 (1987), 17–24.
858. Pawan Lingras and S. K. Michael Wong, Two perspectives of the Dempster-Shafer theory of belief functions, International Journal of Man-Machine Studies 33 (1990), no. 4, 467–487.
859. S. Linnainmaa, D. Harwood, and L.S. Davis, Pose determination of a three-dimensional object using triangle pairs, IEEE Trans. PAMI (September 1988), 634–647.
860. Baoding Liu, Uncertainty theory, Springer-Verlag, 2004.
861. Guilong Liu, Rough set theory based on two universal sets and its applications,
Knowledge-Based Systems 23 (2010), no. 2, 110 – 115.
862. J. S. Liu and Y. Wu, Parameter expansion for data augmentation, Journal of the Amer-
ican Statistical Association, vol. 94, 1999, pp. 1264–1274.
863. Jiming Liu and Michel C. Desmarais, Method of learning implication networks from
empirical data: algorithm and monte-carlo simulation-based validation, IEEE Trans-
actions on Knowledge and Data Engineering 9 (1997), 990–1004.
864. L. Liu, Model combination using gaussian belief functions, Tech. report, School of
Business, University of Kansas, Lawrence, KS, 1995.
865. Lei Jian Liu, Jing Yu Yang, and Jian Feng Lu, Data fusion for detection of early stage
lung cancer cells using evidential reasoning, Proceedings of the SPIE - Sensor Fusion
VI, vol. 2059, Boston, MA, USA, 7-8 September 1993, pp. 202–212.
866. Liping Liu, Model combination using Gaussian belief functions, Tech. report, School
of Business, University of Kansas, Lawrence, KS, 1995.
867. , Propagation of Gaussian belief functions, Learning Models from Data: AI and Statistics (D. Fisher and H. J. Lenz, eds.), Springer, New York, 1996, pp. 79–88.
868. , A theory of Gaussian belief functions, International Journal of Approximate Reasoning 14 (1996), 95–126.
869. , Local computation of Gaussian belief functions, International Journal of Approximate Reasoning 22 (1999), 217–248.
870. Liping Liu, C. Shenoy, and P. P. Shenoy, Knowledge representation and integration
for portfolio evaluation using linear belief functions, IEEE Transactions on Systems,
Man, and Cybernetics - Part A: Systems and Humans 36 (2006), no. 4, 774–785.
871. W. Liu, Analyzing the degree of conflict among belief functions, Artif. Intell. 170
(2006), no. 11, 909–924.
872. , Measuring conflict between possibilistic uncertain information through belief
function theory, Knowledge Science, Engineering and Management, Lecture Notes in
Computer Science, vol. 4092, 2006, pp. 265–277.
873. W. Liu, D. McBryan, and A. Bundy, Method of assigning incidences, Applied Intelli-
gence 9 (1998), 139–161.
874. Weiru Liu, Analyzing the degree of conflict among belief functions, Artificial Intelli-
gence 170 (2006), no. 11, 909 – 924.
875. Weiru Liu, Propositional, probabilistic and evidential reasoning: Integrating numerical and symbolic approaches, 1st ed., Physica-Verlag GmbH, Heidelberg, Germany, 2010.
876. Weiru Liu and Alan Bundy, The combination of different pieces of evidence using
incidence calculus, Dept. of Artificial Intelligence, Univ. of Edinburgh (1992).
877. Weiru Liu and Alan Bundy, A comprehensive comparison between generalized incidence calculus and the Dempster-Shafer theory of evidence, Int. J. Hum.-Comput. Stud. 40 (1994), no. 6, 1009–1032.
878. Weiru Liu and Jun Hong, Reinvestigating Dempster's idea on evidence combination, Knowledge and Information Systems 2 (2000), no. 2, 223–241.
879. Weiru Liu, Jun Hong, M. F. McTear, and J. G. Hughes, An extended framework for
evidential reasoning system, International Journal of Pattern Recognition and Artificial
Intelligence 7:3 (June 1993), 441–457.
880. Weiru Liu, Jun Hong, M.F. McTear, and J.G. Hughes, An extended framework for evidential reasoning systems, International Journal of Pattern Recognition and Artificial Intelligence 07 (1993), no. 03, 441–457.
881. Weiru Liu, Jun Hong, and Micheal F. McTear, An extended framework for evidential
reasoning systems, Proceedings of IEEE, 1990, pp. 731–737.
882. Liu Da You, Ouyang Ji Hong, Tang Hai Ying, Chen Jian Zhong, and Yu Qiang Yuan, Research on a simplified evidence theory model, Journal of Computer Research and Development (1999).
883. L. Ljung and T. Söderström, Theory and practice of recursive identification, MIT Press, 1983.
884. K. C. Lo, Agreement and stochastic independence of belief functions, Mathematical
Social Sciences 51(1) (2006), 1–22.
885. G. Lohmann, An evidential reasoning approach to the classification of satellite im-
ages, Symbolic and Qualitative Approaches to Uncertainty (R. Kruse and P. Siegel,
eds.), Springer-Verlag, Berlin, 1991, pp. 227–231.
886. W. Long and Y.-H. Yang, Log-tracker: An attribute based approach to tracking human
body motion, Pattern Recognition and Artificial Intelligence 5 (1991), 439–458.
887. Pierre Loonis, El-Hadi Zahzah, and Jean-Pierre Bonnefoy, Multi-classifiers neural
network fusion versus Dempster-Shafer’s orthogonal rule, Proceedings of IEEE, 1995,
pp. 2162–2165.
888. D. Lowe, Integrated treatment of matching and measurement errors for robust model-
based motion tracking, ICCV’90, 1990, pp. 436–440.
889. , Fitting parameterised 3-d models to images, IEEE Trans. PAMI 13 (1991),
441–450.
890. , Robust model-based motion tracking through the integration of search and
estimation, International Journal on Computer Vision 8 (1992), 113–122.
891. John D. Lowrance, Evidential reasoning with gister: A manual, Tech. report, Artificial
Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA.,
1987.
892. , Automated argument construction, Journal of Statistical Planning Inference
20 (1988), 369–387.
893. , Evidential reasoning with gister-cl: A manual, Tech. report, Artificial Intel-
ligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA., 1994.
894. John D. Lowrance and T. D. Garvey, Evidential reasoning: A developing concept, Pro-
ceedings of the Internation Conference on Cybernetics and Society (Institute of Elec-
trical and Electronical Engineers, eds.), 1982, pp. 6–9.
895. , Evidential reasoning: an implementation for multisensor integration, Tech.
report, SRI International, Menlo Park, CA, Technical Note 307, 1983.
896. John D. Lowrance, T. D. Garvey, and Thomas M. Strat, A framework for evidential-
reasoning systems, Proceedings of the National Conference on Artificial Intelligence
(American Association for Artificial Intelligence, ed.), 1986, pp. 896–903.
897. John D. Lowrance, T.D. Garvey, and Thomas M. Strat, A framework for evidential
reasoning systems, Readings in uncertain reasoning (Shafer and Pearl, eds.), Morgan
Kaufman, 1990, pp. 611–618.
898. C.-P. Lu, G.D. Hager, and E. Mjolsness, Fast and globally convergent pose estimation
from video images, IEEE Trans. PAMI 22 (2000), 610–622.
899. S. Y. Lu and H. E. Stephanou, A set-theoretic framework for the processing of uncer-
tain knowledge.
900. Y. Luo, F.J. Perales, and J.J. Villanueva, An automatic rotoscopy system for human
motion base on a biomechanical graphical model, Computers and Graphics 16 (1992).
901. M. Brown, T. Drummond, and R. Cipolla, 3D model acquisition by tracking 2D wireframes, BMVC 2000, 2000.
902. M. Falk, J. Hüsler, and R.-D. Reiss, Laws of small numbers: Extremes and rare events, (2004).
903. M. Ringer and J. Lasenby, Modelling and tracking of articulated motion from multiple camera views, BMVC 2000, 2000, pp. 172–181.
904. Jianbing Ma, Weiru Liu, Didier Dubois, and Henri Prade, Bridging Jeffrey's rule, AGM revision and Dempster conditioning in the theory of evidence, International Journal on Artificial Intelligence Tools 20 (2011), no. 4, 691–720.
905. S. Maass, A philosophical foundation of non-additive measure and probability, Theory
and decision 60 (2006), 175–191.
906. A. Madabhushi and J. K. Aggarwal, A Bayesian approach to human activity recog-
nition, Proc. of the 2nd International Workshop on Visual Surveillance, June 1999,
pp. 25–30.
907. R. Mahler, Using a priori evidence to customize Dempster-Shafer theory, Proceedings of the 6th Nat. Symp. on Sensor Fusion, vol. 1, 1993, pp. 331–345.
908. , Can the Bayesian and Dempster-Shafer approaches be reconciled? Yes, 2005 7th International Conference on Information Fusion, vol. 2, July 2005, 8 pp.
909. Ronald P. S. Mahler, Combining ambiguous evidence with respect to ambiguous a priori knowledge. Part II: Fuzzy logic, Fuzzy Sets and Systems 75 (1995), 319–354.
910. David A. Maluf, Monotonicity of entropy computations in belief functions, Intelligent
Data Analysis 1 (1997), 207–213.
911. G. Markakis, A boolean generalization of the Dempster-Shafer construction of be-
lief and plausibility functions, Proceedings of the Fourth International Conference
on Fuzzy Sets Theory and Its Applications, Liptovsky Jan, Slovakia, 2-6 Feb. 1998,
pp. 117–125.
912. I. Marsic, Evidential reasoning in visual recognition, Proceedings Intelligent Engi-
neering Systems Through Artificial Neural Networks (C.H. Dagli, B.R. Fernandez,
J. Ghosh, and R.T.S. Kumara, eds.), vol. 4, St. Louis, MO, USA, 13-16 November
1994, pp. 511–516.
913. A. Martin, Reliability and combination rule in the theory of belief functions, Informa-
tion Fusion, 2009. FUSION ’09. 12th International Conference on, July 2009, pp. 529–
536.
914. A. Martin, A. L. Jousselme, and C. Osswald, Conflict measure for the discounting
operation on belief functions, Information Fusion, 2008 11th International Conference
on, June 2008, pp. 1–8.
915. Arnaud Martin, About conflict in the theory of belief functions, pp. 161–168, Springer
Berlin Heidelberg, Berlin, Heidelberg, 2012.
916. Arnaud Martin, Marie-Hélène Masson, Frédéric Pichon, Didier Dubois, and Thierry Denœux, Relevance and truthfulness in information correction and fusion, International Journal of Approximate Reasoning 53 (2012), no. 2, 159–175.
917. Arnaud Martin and Christophe Osswald, A new generalization of the proportional
conflict redistribution rule stable in terms of decision, CoRR abs/0806.1797 (2008).
918. , Toward a combination rule to deal with partial conflict and specificity in
belief functions theory, CoRR abs/0806.1640 (2008).
919. R. Martin and C. Liu, Inferential models: A framework for prior-free posterior proba-
bilistic inference, ArXiv e-prints (2012).
920. F. Martinerie and P. Foster, Data association and tracking from distributed sensors us-
ing hidden Markov models and evidential reasoning, Proceedings of 31st Conference
on Decision and Control, Tucson, December 1992, pp. 3803–3804.
921. M.-H. Masson and T. Denoeux, Belief functions and cluster ensembles, ECSQARU,
July 2009, pp. 323–334.
922. B. Mates, Elementary logic, Oxford University Press, 1972.
923. G. Matheron, Random sets and integral geometry, Wiley Series in Probability and
Mathematical Statistics.
924. , Random sets and integral geometry, Wiley, 1970.
925. , Random sets and integral geometry, Wiley, NY, 1975.
926. S. Mathevet, L. Trassoudaine, P. Checchin, and J. Auzon, Combinaison de segmentations en régions, Traitement du Signal (1999).
927. Thomas Maurer and Christoph von der Malsburg, Tracking and learning graphs and
pose on image sequences of faces, FG ’96: Proceedings of the 2nd International
Conference on Automatic Face and Gesture Recognition (FG ’96) (Washington, DC,
USA), IEEE Computer Society, 1996, p. 76.
928. Sally McClean and Bryan Scotney, Using evidence theory for the integration of dis-
tributed databases, International Journal of Intelligent Systems 12 (1997), 763–776.
929. Sally McClean, Bryan Scotney, and Mary Shapcott, Using background knowledge in
the aggregation of imprecise evidence in databases, Data and Knowledge Engineering
32 (2000), 131–143.
930. G. McLachlan and D. Peel, Finite mixture models, Wiley-Interscience, 2000.
931. G. V. Meghabghab and D. B. Meghabghab, Multiversion information retrieval: per-
formance evaluation of neural networks vs. Dempster-Shafer model, Proceedings of
the Third Golden West International Conference on Intelligent Systems (E.A. Yfantis,
ed.), Las Vegas, NV, USA, 6-8 June 1994, pp. 537–545.
932. T. Melkonyan and R. Chambers, Degree of imprecision: Geometric and algebraic ap-
proaches, International Journal of Approximate Reasoning (2006).
933. K. Mellouli, On the propagation of beliefs in networks using the Dempster-Shafer
theory of evidence, PhD dissertation, University of Kansas, School of Business, 1986.
934. Khaled Mellouli and Zied Elouedi, Pooling experts opinion using Dempster-Shafer
theory of evidence, Proceedings of IEEE, 1997, pp. 1900–1905.
935. D. Mercier, T. Denoeux, and M. h. Masson, General correction mechanisms for weak-
ening or reinforcing belief functions, 2006 9th International Conference on Informa-
tion Fusion, July 2006, pp. 1–7.
936. D. Mercier, T. Denoeux, and M. Masson, Refined sensor tuning in the belief function
framework using contextual discounting, IPMU, 2006.
937. David Mercier, Thierry Denœux, and Marie-Hélène Masson, Belief function correc-
tion mechanisms, pp. 203–222, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
938. David Mercier, Benjamin Quost, and Thierry Denœux, Contextual discounting of be-
lief functions, pp. 552–562, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005.
939. David Mercier, Benjamin Quost, and Thierry Denœux, Refined modeling of sensor reliability in the belief function framework using contextual discounting, Information Fusion 9 (2008), no. 2, 246–258.
940. D. Metaxas and D. Terzopoulos, Shape and nonrigid motion estimation through
physics-based synthesis, IEEE Trans. Pattern Analysis and Machine Intelligence 15
(1993), 580–591.
941. D. Meyer, J. Denzler, and H. Niemann, Model based extraction of articulated objects
in image sequences, Fourth International Conference on Image Processing, 1997.
942. I. Mikic, Human body model acquisition and tracking using multi-camera voxel data,
PhD dissertation, University of California at San Diego, January 2002.
943. I. Mikic, M. Trivedi, E. Hunter, and P. Cosman, Articulated body posture estimation
from multi-camera voxel data, Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition CVPR’01, Hawaii, December 2001.
944. E. Miranda, I. Couso, and P. Gil, Extreme points of credal sets generated by 2-
alternating capacities, International Journal of Approximate Reasoning 33 (2003),
95–115.
945. E. Miranda and G. de Cooman, Marginal extension in the theory of coherent lower
previsions, Int. J. of Approximate Reasoning 46 (2007), no. 1, 188–225.
946. Enrique Miranda, A survey of the theory of coherent lower previsions, International Journal of Approximate Reasoning 48 (2008), no. 2, 628–658, In Memory of Philippe Smets (1938–2005).
947. Enrique Miranda, Hung T. Nguyen, and Jürg Kohlas, Uncertain information: Random variables in graded semilattices (special section on random sets and imprecise probabilities), International Journal of Approximate Reasoning 46 (2007), no. 1, 17–34.
948. P. Miranda, M. Grabisch, and P. Gil, On some results of the set of dominating k-
additive belief functions, IPMU, 2004, pp. 625–632.
949. P. Miranda, M. Grabisch, and P. Gil, Dominance of capacities by k-additive belief
functions, European Journal of Operational Research 175 (2006), 912–930.
950. T. Moeslund, Summaries of 107 computer vision-based human motion capture papers,
Tech. report, Laboratory of Image Analysis, Aalborg University, Denmark, 1999.
951. T. Moeslund and E. Granum, A survey of computer vision-based human motion cap-
ture, Image and Vision Computing 81 (2001), 231–268.
952. T.B. Moeslund and E. Granum, 3D human pose estimation using 2D-data and an
alternative phase space representation, Workshop on Human Modeling, Analysis and
Synthesis at CVPR2000, Hilton Head Island, June 2000.
953. , Multiple cues in model-based human motion capture, Fourth International
Conference on Automatic Face and Gesture Recognition, Grenoble, France, March
2000.
954. S. M. Mohiddin and T. S. Dillon, Evidential reasoning using neural networks, Pro-
ceedings of IEEE, 1994, pp. 1600–1606.
955. I. Molchanov, Theory of random sets, Springer-Verlag, 2005.
956. Paul-André Monney, Analyzing linear regression models with hints and the Dempster-Shafer theory, International Journal of Intelligent Systems 18 (2003), no. 1, 5–29.
957. Paul-André Monney, Moses W. Chan, Enrique H. Ruspini, and Marco E.G.V. Cattaneo, Belief functions combination without the assumption of independence of the information sources, International Journal of Approximate Reasoning 52 (2011), no. 3, 299–315.
958. Paul-André Monney, Moses W. Chan, Enrique H. Ruspini, and Johan Schubert, Conflict management in Dempster-Shafer theory using the degree of falsity (special issue on dependence issues in knowledge-based systems), International Journal of Approximate Reasoning 52 (2011), no. 3, 449–460.
959. Paul-Andre Monney, A mathematical theory of arguments for statistical evidence,
Physica, 19 November 2002.
960. Paul-André Monney, Planar geometric reasoning with the theory of hints, Computational Geometry. Methods, Algorithms and Applications, Lecture Notes in Computer Science, vol. 553 (H. Bieri and H. Noltemeier, eds.), 1991, pp. 141–159.
961. Paul-André Monney, Dempster specialization matrices and the combination of belief
functions, pp. 316–327, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.
962. Andrew Moore, Very fast EM-based mixture model clustering using multiresolution kd-trees, Advances in Neural Information Processing Systems (340 Pine Street, 6th Fl., San Francisco, CA 94104) (M. Kearns and D. Cohn, eds.), Morgan Kaufmann, April 1999, pp. 543–549.
963. D. Moore, I. Essa, and M. Hayes III, Exploiting human actions and object context for
recognition tasks, Proc. of the International Conference on Computer Vision, vol. 1,
1999, pp. 80–86.
964. S. Moral and L. M. de Campos, Partially specified belief functions, Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (D. Heckerman and A. Mamdani, eds.), Washington, DC, USA, 9-11 July 1993, pp. 492–499.
965. S. Moral and N. Wilson, Importance sampling Monte-Carlo algorithms for the calculation of Dempster-Shafer belief, Proc. of IPMU'96, 1996.
966. Serafin Moral and Antonio Salmeron, A Monte-Carlo algorithm for combining Dempster-Shafer belief based on approximate pre-computation, Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU'99), Lecture Notes in Computer Science, London, 5-9 July 1999.
967. D. Morris and J.M. Rehg, Singularity analysis for articulated object tracking, Pro-
ceedings of CVPR’98, 1998, pp. 289–296.
968. E. Moutogianni and M. Lalmas, A Dempster-Shafer indexing for structured document
retrieval: implementation and experiments on a Web museum collection, IEE Two-day
Seminar. Searching for Information: Artificial Intelligence and Information Retrieval
Approaches, Glasgow, UK, 11-12 Nov. 1999, pp. 20–21.
969. O. Munkelt, C. Ridder, D. Hansel, and W. Hafner, A model driven 3D image interpre-
tation system applied to person detection in video images, International Conference
on Pattern Recognition, 1998.
970. T. Murai, M. Miyakoshi, and M. Shimbo, Soundness and completeness theorems between the Dempster-Shafer theory and logic of belief, Proceedings of the Third IEEE Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), June 1994, pp. 855–858, vol. 2.
971. Tetsuya Murai, Yasuo Kudo, and Yoshiharu Sato, Discovery Science: 6th International Conference, DS 2003, Sapporo, Japan, October 17-19, 2003, Proceedings, ch. Association Rules and Dempster-Shafer Theory of Evidence, pp. 377–384, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
972. Toshiaki Murofushi and Michio Sugeno, Some quantities represented by the Choquet integral, Fuzzy Sets and Systems 56 (1993), no. 2, 229–235.
973. Catherine K. Murphy, Combining belief functions when evidence conflicts, Decision
Support Systems 29 (2000), 1–9.
974. , Combining belief functions when evidence conflicts, Decision Support Sys-
tems 29 (2000), no. 1, 1 – 9.
975. R. R. Murphy, Adaptive rule of combination for observations over time, Multisensor
Fusion and Integration for Intelligent Systems, 1996. IEEE/SICE/RSJ International
Conference on, Dec 1996, pp. 125–131.
976. Robin R. Murphy, Dempster-Shafer theory for sensor fusion in autonomous mobile
robots, IEEE Transactions on Robotics and Automation 14 (1998), 197–206.
977. N. Jojic, M. Turk, and T. Huang, Tracking self-occluding articulated objects in dense disparity maps, Proceedings of the IEEE International Conference on Computer Vision ICCV'99, Corfu, Greece, September 1999.
978. N. Werghi, R. Fisher, A. Ashbrook, and C. Robertson, Object reconstruction by incorporating geometric constraints in reverse engineering, Computer-Aided Design 31 (1999), 363–399.
979. Louis Narens, Theories in probability: An examination of logical and qualitative foun-
dations, World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2007.
980. R. E. Neapolitan, The interpretation and application of belief functions, Applied Arti-
ficial Intelligence 7:2 (April-June 1993), 195–204.
981. Richard E. Neapolitan, The interpretation and application of belief functions, Applied
Artificial Intelligence 7 (1993), no. 2, 195–204.
982. H. Nguyen, On random sets and belief functions, Classic Works of the Dempster-
Shafer Theory of Belief Functions, 2008, pp. 105–116.
983. H. T. Nguyen and Philippe Smets, On dynamics of cautious belief and conditional
objects, International Journal of Approximate Reasoning 8 (1993), 89–104.
984. H.T. Nguyen, On random sets and belief functions, J. Mathematical Analysis and Ap-
plications 65 (1978), 531–542.
985. , An introduction to random sets, Taylor and Francis, 2006.
986. H.T. Nguyen and T. Wang, Belief functions and random sets, Applications and Theory
of Random Sets, The IMA Volumes in Mathematics and its Applications, Vol. 97,
Springer, 1997, pp. 243–255.
987. Ojelanki K. Ngwenyama and Noel Bryson, Generating belief functions from qualita-
tive preferences: An approach to eliciting expert judgments and deriving probability
functions, Data and Knowledge Engineering 28 (1998), 145–159.
988. S. Niyogi and W.T. Freeman, Example-based head tracking, Proceedings of the Sec-
ond International Conference on Automatic Face and Gesture Recognition, 1996,
pp. 374–378.
989. S.A. Niyogi and E.H. Adelson, Analyzing and recognizing walking figures in xyt,
CVPR’94, 1994.
990. J. Njastad, S. Grinaker, and G.A. Storhaug, Estimating parameters in a 2 1/2 D human model, 11th Scandinavian Conference on Image Analysis, Greenland, 1999.
991. H. Ogawa, K.S. Fu, and J.T.P. Yao, An inexact inference for damage assessment of
existing structures, International Journal of Man-Machine Studies 22 (1985), no. 3,
295 – 306.
992. J. Ohya and F. Kishino, Human posture estimation from multiple images using genetic
algorithm, Proceedings of ICPR, 1994.
993. K. Okada and C. von der Malsburg, Pose-invariant face recognition with parametric
linear subspaces, 2002, pp. 64–69.
994. R. Okada, Y. Shirai, and J. Miura, Object tracking based on optical flow and depth,
Proc. of the Conference on Multisensor Fusion and Integration for Intelligent Systems,
1996, pp. 565–571.
995. E.J. Ong and S. Gong, Tracking hybrid 2D-3D human models from multiple views,
International Workshop on Modeling People at ICCV’99, Corfu, Greece, September
1999.
996. C. Ordonez and E. Omiecinski, FREM: Fast and robust EM clustering for large data sets, 2002.
997. J. O’Rourke and N. Badler, Model-based analysis of human motion using constraint
propagation, IEEE Trans. Pattern Analysis and Machine Intelligence 2 (1980), 522–
536.
998. Pekka Orponen, Dempster's rule of combination is #P-complete, Artificial Intelligence 44 (1990), 245–253.
999. , Dempster's rule of combination is #P-complete, Artificial Intelligence 44 (1990), no. 1, 245–253.
1000. Margarita Osadchy, Yann Le Cun, and Matthew L. Miller, Synergistic face detection
and pose estimation with energy-based models, J. Mach. Learn. Res. 8 (2007), 1197–
1215.
1001. C. Osswald and A. Martin, Understanding the large family of Dempster-Shafer theory's fusion operators - a decision-based measure, 2006 9th International Conference on Information Fusion, July 2006, pp. 1–7.
1002. J. Oxley, Matroid theory, Oxford University Press, 1992.
1003. James G. Oxley, Matroid theory, Oxford University Press, Great Clarendon Street,
Oxford, UK, 1992.
1004. P. David, D.F. DeMenthon, and R. Duraiswami, SoftPOSIT: Simultaneous pose and correspondence determination, ECCV'02 (A. Heyden et al., ed.), 2002, pp. 698–714.
1005. Amir Padovitz, Arkady Zaslavsky, and Seng W. Loke, A unifying model for representing and reasoning about context under uncertainty, Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2006.
1006. Daniel Pagac, Eduardo M. Nebot, and Hugh Durrant-Whyte, An evidential approach
to map-bulding for autonomous vehicles, IEEE Transactions on Robotics and Automa-
tion 14, No 4 (August 1998), 623–629.
1007. N. Pal, J. Bezdek, and R. Hemasinha, Uncertainty measures for evidential reasoning
i: a review, International Journal of Approximate Reasoning 7 (1992), 165–183.
1008. , Uncertainty measures for evidential reasoning i: a review, International Jour-
nal of Approximate Reasoning 8 (1993), 1–16.
1009. P. Palacharla and P. C. Nelson, Understanding relations between fuzzy logic and evidential reasoning methods, Proceedings of the Third IEEE International Conference on Fuzzy Systems, vol. 1.
1010. P. Palacharla and P.C. Nelson, Evidential reasoning in uncertainty for data fusion, Pro-
ceedings of the Fifth International Conference on Information Processing and Man-
agement of Uncertainty in Knowledge-Based Systems, vol. 1, 1994, pp. 715–720.
1011. J. B. Paris, A note on the dutch book method, In Proceedings of the Second Interna-
tional Symposium on Imprecise Probabilities and Their applications, 2001.
1012. Jeff B. Paris, David Picado-Muino, and Michael Rosefield, Information from incon-
sistent knowledge: A probability logic approach, Interval / Probabilistic Uncertainty
and Non-classical Logics, Advances in Soft Computing (V.-N. Huynh, Y. Nakamori,
H. Ono, J. Lawry, V. Kreinovich, and H.T. Nguyen, eds.), vol. 46, Springer-Verlag,
Berlin - Heidelberg, 2008.
1013. Simon Parsons and E. H. Mamdani, Qualitative Dempster-Shafer theory, 1993.
1014. Simon Parsons, Some qualitative approaches to applying the Dempster-Shafer theory, 1994.
1015. Simon Parsons and Alessandro Saffiotti, A case study in the qualitative verification
and debugging of numerical uncertainty, International Journal of Approximate Rea-
soning 14 (1996), 187–216.
1016. V. Pavlovic, J. Rehg, T.-J. Cham, and K. Murphy, A dynamic Bayesian network
approach to figure tracking using learned dynamical models, Proceedings of the
ICCV’99, 1999, pp. 94–101.
1017. Zdzisław Pawlak, Rough sets, International Journal of Computer & Information Sci-
ences 11 (1982), no. 5, 341–356.
1018. Zdzislaw Pawlak, Vagueness and uncertainty: A rough set perspective, Computational
Intelligence 11 (1995), no. 2, 227–232.
1019. Zdzisław Pawlak, Rough set theory and its applications to data analysis, Cybernetics and Systems 29 (1998), no. 7, 661–688.
1020. J. Pearl, Readings in uncertain reasoning, Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1990, pp. 540–574.
1021. Judea Pearl, On evidential reasoning in a hierarchy of hypotheses, Artificial Intelli-
gence 28:1 (1986), 9–15.
1022. Judea Pearl, On evidential reasoning in a hierarchy of hypotheses, Artif. Intell. 28
(1986), no. 1, 9–15.
1023. , Probabilistic reasoning in intelligent systems: Networks of plausible infer-
ence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.
1024. Judea Pearl, Reasoning with belief functions: a critical assessment, Tech. report,
UCLA, Technical Report R-136, 1989.
1025. , Reasoning with belief functions: an analysis of compatibility, International
Journal of Approximate Reasoning 4 (1990), 363–389.
1026. , Reasoning with belief functions: An analysis of compatibility, International
Journal of Approximate Reasoning 4 (1990), 363–389.
1027. , Rejoinder to comments on ‘reasoning with belief functions: an analysis of
compatibility’, International Journal of Approximate Reasoning 6 (1992), 425–443.
1028. M. Pechwitz, S.S. Maddouri, V. Maergner, N. Ellouze, and H. Amiri, IFN/ENIT - database of handwritten Arabic words, Colloque International Francophone sur l'Ecrit et le Document (2002), 129–136.
1029. A. Pentland, Automatic extraction of deformable models, Int. J. Computer Vision 4
(1990), 107–126.
1030. A. Pentland and B. Horowitz, Recovery of non-rigid motion and structure, IEEE Trans.
Pattern Analysis and Machine Intelligence 13 (1991), 730–742.
1031. F.J. Perales and J. Torres, A system for human motion matching between synthetic and
real images based on biomechanic graphical models, IEEE Workshop on Motion of
Non-rigid and Articulated Objects, Austin, Texas, 1994.
1032. Andrs Perea, A model of minimal probabilistic belief revision, Theory and Decision
67 (2009), no. 2, 163–222.
1033. Joseph S. J. Peri, Dempster-Shafer theory, Bayesian theory, and measure theory, 2005, pp. 378–389.
1034. C. Perneel, H. Van De Velde, and M. Acheroy, A heuristic search algorithm based on
belief functions, Proceedings of Fourteenth International Avignon Conference, vol. 1,
Paris, France, 30 May-3 June 1994, pp. 99–107.
1035. Laurent Perrussel, Luis Enrique Sucar, and Michael Scott Balch, Mathematical foundations for a theory of confidence structures (selected papers on uncertain reasoning at FLAIRS 2010), International Journal of Approximate Reasoning 53 (2012), no. 7, 1003–1019.
1036. C. Peterson, Local Dempster-Shafer theory, Tech. Report CSC-AMTAS-98001, C.S.C. Internal Report, 1998.
1037. Simon Petit-Renaud and Thierry Denoeux, Handling different forms of uncertainty in regression analysis: a fuzzy belief structure approach, Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU'99), Lecture Notes in Computer Science, London, 5-9 July 1999.
1038. F. Pichon and T. Denoeux, T-norm and uninorm-based combination of belief functions,
Fuzzy Information Processing Society, 2008. NAFIPS 2008. Annual Meeting of the
North American, May 2008, pp. 1–6.
1039. F. Pichon and T. Denœux, Interpretation and computation of alpha-junctions for combining belief functions, Proceedings of the 6th International Symposium on Imprecise Probability: Theories and Applications (ISIPTA'09), 2009.
1040. W. Pieczynski, Unsupervised Dempster-Shafer fusion of dependent sensors, Proceed-
ings of the 4th IEEE Southwest Symposium on Image Analysis and Interpretation,
Austin, TX, USA, 2-4 April 2000, pp. 247–251.
1041. W. Ping and Y. Genqing, Improvement method for the combining rule of Dempster-Shafer evidence theory based on reliability, Journal of Systems Engineering and Electronics 16 (2005), no. 2, 471–474.
1042. C. S. Pinhanez and A. F. Bobick, Human action detection using pnf propagation of
temporal constraints, Proc. of the Conference on Computer Vision and Pattern Recog-
nition, 1998, pp. 898–904.
1043. Axel Pinz, Manfred Prantl, Harald Ganster, and Hermann Kopp-Borotschnig, Active
fusion - a new method applied to remote sensing image interpretation, Pattern Recog-
nition Letters 17 (1996), 1349–1359.
1044. L. Polkowski and A. Skowron, Rough mereology: A new paradigm for approximate
reasoning, International Journal of Approximate Reasoning 15 (1996), 333–365.
1045. , Rough mereology: A new paradigm for approximate reasoning, International
Journal of Approximate Reasoning 15 (1996), no. 4, 333 – 365, Rough Sets.
1046. R. Poppe and M. Poel, Comparison of silhouette shape descriptors for example-based
human pose recovery, 2006, pp. 541–546.
1047. R. W. Poppe and M. Poel, Example-based pose estimation in monocular images using
compact fourier descriptors, CTIT Technical Report series TR-CTIT-05-49, Univer-
sity of Twente, Enschede, 2005.
1048. R.W. Poppe, Evaluating example-based pose estimation: Experiments on the hu-
maneva sets, Online Proceedings of the Workshop on Evaluation of Articulated Hu-
man Motion and Pose Estimation (EHuM) at the International Conference on Com-
puter Vision and Pattern Recognition (CVPR) (Minnesota, Minneapolis), June 2007,
pp. 1–8.
1049. G. Priest, R. Routley, and J. Norman, Paraconsistent logic: Essays on the inconsistent,
Philosophia Verlag, 1989.
1050. G. Provan, An analysis of ATMS-based techniques for computing Dempster-Shafer
belief functions, Proceedings of the International Joint Conference on Artificial Intel-
ligence, 1989.
1051. Gregory Provan, An analysis of exact and approximation algorithms for Dempster-
Shafer theory, Tech. report, Department of Computer Science, University of British
Columbia, Tech. Report 90-15, 1990.
1052. , The validity of Dempster-Shafer belief functions, International Journal of Ap-
proximate Reasoning 6 (1992), 389–399.
1053. Gregory M. Provan, The application of Dempster-Shafer theory to a logic-based visual
recognition system, Uncertainty in Artificial Intelligence, 5 (L. N. Kanal M. Henrion,
R. D. Schachter and J. F. Lemmers, eds.), North Holland, Amsterdam, 1990, pp. 389–
405.
1054. , A logic-based analysis of Dempster-Shafer theory, International Journal of
Approximate Reasoning 4 (1990), 451–495.
1055. Q. Ye, X.P. Wu, and Y.X. Song, An evidence combination method of introducing weight
factors, Fire Control and Command Control 32.
1056. R. Qian and T. Huang, Motion analysis of articulated objects with applications to
human ambulatory patterns, DARPA’92, 1992, pp. 549–553.
1057. B. Quost, T. Denoeux, and M. Masson, One-against-all classifier combination in the
framework of belief functions, IPMU, 2006.
1058. R. Plankers, P. Fua, and N. D'Apuzzo, Automated body modeling from video sequences,
International Workshop on Modeling People at ICCV’99, Corfu, Greece, September
1999.
1059. Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications
in speech recognition, Proceedings of the IEEE, 1989, pp. 257–286.
1060. Andrej Rakar, Đani Juričić, and Peter Ballé, Transferable belief model in fault diagnosis,
Engineering Applications of Artificial Intelligence 12 (1999), 555–567.
1061. E. Ramasso, M. Rombaut, and D. Pellerin, Forward-Backward-Viterbi procedures in
the Transferable Belief Model for state sequence analysis using belief functions, Sym-
bolic and Quantitative Approaches to Reasoning with Uncertainty (2007), 405–417.
1062. A. Ramer, Uniqueness of information measure in the theory of evidence, Random Sets
and Systems 24 (1987), 183–196.
1063. A. Ramer and G. J. Klir, Measures of discord in the Dempster-Shafer theory., Infor-
mation Sciences 67 (1993), no. 1-2, 35–50.
1064. Arthur Ramer, Text on evidence theory: comparative review, International Journal of
Approximate Reasoning 14 (1996), 217–220.
1065. Rajesh P.N. Rao and Dana H. Ballard, Dynamic model of visual recognition predicts
neural response properties in the visual cortex, Tech. report, Department of Computer
Science, University of Rochester, November 1995.
1066. A. Rashidi and H. Ghassemian, Extended Dempster-Shafer theory for multi-
system/sensor decision fusion, 2003.
1067. C. Rasmussen and G.D. Hager, Joint probabilistic techniques for tracking multi-part
objects, Int. Conf. on Computer Vision and Pattern Recognition, 1998.
1068. , Probabilistic data association methods for tracking complex visual objects,
IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001), 560–576.
1069. Bonnie K. Ray and David H. Krantz, Foundations of the theory of evidence: Resolving
conflict among schemata, Theory and Decision 40 (1996), no. 3, 215–234.
1070. Yann Rébillé, A Yosida-Hewitt decomposition for totally monotone set functions on locally
compact σ-compact topological spaces, International Journal of Approximate Reason-
ing 48 (2008), no. 3, 676–685, Special Section on Choquet Integration in honor of
Gustave Choquet (1915–2006) and Special Section on Nonmonotonic and Uncertain
Reasoning.
1071. S. Reece, Qualitative model-based multisensor data fusion and parameter estima-
tion using infinity-norm Dempster-Shafer evidential reasoning, Proceedings of the
SPIE - Signal Processing, Sensor Fusion, and Target Recognition VI (A. Heckerman,
D.; Mamdani, ed.), vol. 3068, Orlando, FL, USA, 21-24 April 1997, pp. 52–63.
1072. Marek Reformat, Michael R. Berthold, Hatem Masri, and Fouad Ben Abdelaziz, Be-
lief linear programming, International Journal of Approximate Reasoning 51 (2010),
no. 8, 973 – 983.
1073. Helen M. Regan, Scott Ferson, and Daniel Berleant, Equivalence of methods for un-
certainty propagation of real-valued random variables, International Journal of Ap-
proximate Reasoning 36 (2004), no. 1, 1 – 30.
1074. Giuliana Regoli, Decision theory and decision analysis: Trends and challenges,
ch. Rational Comparisons and Numerical Representations, pp. 113–126, Springer
Netherlands, Dordrecht, 1994.
1075. J. Rehg, Visual analysis of high dof articulated objects with application to hand track-
ing, PhD dissertation, Carnegie Mellon University, April 1995.
1076. J. Rehg and T. Kanade, Digiteyes: Vision-based human hand tracking, Tech. report,
CS-TR-93-220, Carnegie Mellon University, School of Computer Science, 1993.
1077. , Visual tracking of high dof articulated structures: an application to human
hand tracking, Proc. of the Third European Conference on Computer Vision, Stock-
holm, Sweden (J. Eklundh, ed.), vol. 2, 1994, pp. 35–46.
1078. J. M. Rehg and T. Kanade, Model-based tracking of self-occluding articulated objects,
Proceedings of the International Conference on Computer Vision ICCV’95, Cam-
bridge, MA, 20-23 June 1995, pp. 618–623.
1079. James M. Rehg and Takeo Kanade, Digiteyes: Vision-based human hand tracking,
Tech. report, School of Computer Science, Carnegie Mellon University, CMU-CS-93-
220, December 1993.
1080. , Visual tracking of self-occluding articulated objects, Tech. report, School of
Computer Science, Carnegie Mellon University, CMU-CS-94-224, December 1994.
1081. G. Resconi, G. Klir, U. St. Clair, and D. Harmanec, On the integration of uncertainty
theories, Fuzziness and Knowledge-Based Systems 1 (1993), 1–18.
1082. G. Resconi, A.J. van der Wal, and D. Ruan, Speed-up of the monte carlo method
by using a physical model of the Dempster-Shafer theory, International Journal of
Intelligent Systems 13 (1998), 221–242.
1083. Germano Resconi, George J Klir, David Harmanec, and Ute St Clair, Interpretations
of various uncertainty theories using models of modal logic: a summary, Fuzzy Sets
and Systems 80 (1996), 7–14.
1084. Germano Resconi, George J. Klir, David Harmanec, and Ute St. Clair, Interpretations
of various uncertainty theories using models of modal logic: A summary, Fuzzy Sets
and Systems 80 (1996), no. 1, 7 – 14, Fuzzy Modeling.
1085. Sergio Rinaldi and Lorenzo Farina, I sistemi lineari positivi: teoria e applicazioni,
Città Studi Edizioni.
1086. B. Ristic and P. Smets, Belief function theory on the continuous space with an appli-
cation to model based classification, IPMU, 2004, pp. 1119–1126.
1087. B. Ristic and Ph. Smets, The TBM global distance measure for the association of
uncertain combat ID declarations, Information Fusion 7(3) (2006), 276–284.
1088. Christoph Roemer and Abraham Kandel, Applicability analysis of fuzzy inference by
means of generalized Dempster-Shafer theory, IEEE Transactions on Fuzzy Systems
3:4 (November 1995), 448–453.
1089. Christopher Roesmer, Nonstandard analysis and Dempster-shafer theory, Interna-
tional Journal of Intelligent Systems 15 (2000), 117–127.
1090. K. Rohr, Towards model-based recognition of human movements in image sequences,
CVGIP: Image Understanding 59 (1994), 94–115.
1091. , Human movement analysis based on explicit motion models, ch. 8, pp. 171–198,
Kluwer Academic Publishers, Dordrecht/Boston, 1997.
1092. Christoph Romer and Abraham Kandel, Constraints on belief functions imposed by
fuzzy random variables, IEEE Transactions on Systems, Man, and Cybernetics, Part
B: Cybernetics 25 (1995), 86–99.
1093. R. Rosales and S. Sclaroff, Learning and synthesizing human body motion and pos-
ture, Fourth Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, France,
March 2000.
1094. R. Rosales, M. Siddiqui, J. Alon, and S. Sclaroff, Estimating 3d body pose using un-
calibrated cameras, 2001, pp. I:821–827.
1095. Kimmo I Rosenthal, Quantales and their applications, Longman scientific and tech-
nical, Longman house, Burnt Mill, Harlow, Essex, UK, 1990.
1096. David Ross, Random sets without separability, Annals of Probability 14:3 (July 1986),
1064–1069.
1097. Dan Roth, On the hardness of approximate reasoning, Artificial Intelligence 82
(1996), no. 1–2, 273–302.
1098. E. H. Ruspini, J.D. Lowrance, and T. M. Strat, Understanding evidential reasoning,
International Journal of Approximate Reasoning 6 (1992), 401–424.
1099. E.H. Ruspini, Epistemic logics, probability and the calculus of evidence, Proc. 10th
Intl. Joint Conf. on AI (IJCAI-87), 1987, pp. 924–931.
1100. Enrique H. Ruspini, The logical foundations of evidential reasoning, Tech. report, SRI
International, Menlo Park, CA, Technical Note 408, 1986.
1101. Enrique H. Ruspini, Classic works of the Dempster-Shafer theory of belief func-
tions, ch. Epistemic Logics, Probability, and the Calculus of Evidence, pp. 435–448,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
1102. Matthew J. Ryan, Violations of belief persistence in Dempster-Shafer equilibrium,
Games and Economic Behavior 39 (2002), no. 1, 167–174.
1103. S. Le Hégarat-Mascle, I. Bloch, and D. Vidal-Madjar, Introduction of neighborhood
information in evidence theory and application to data fusion of radar and optical
images with partial cloud cover, Pattern Recognition 31 (1998), 1811–1823.
1104. Alessandro Saffiotti, A belief-function logic, Université Libre de Bruxelles, MIT Press,
pp. 642–647.
1105. , A hybrid framework for representing uncertain knowledge, Procs. of the 8th
AAAI Conf. Boston, MA, 1990, pp. 653–658.
1106. , A hybrid belief system for doubtful agents, Uncertainty in Knowledge Bases,
Lecture Notes in Computer Science 251, Springer-Verlag, 1991, pp. 393–402.
1107. , Using Dempster-Shafer theory in knowledge representation, Uncertainty in
Artificial Intelligence 6 (B. D'Ambrosio, P. Smets, and P. P. Bonissone, eds.), Morgan
Kaufmann, San Mateo, CA, 1991, pp. 417–431.
1108. , A belief function logic, Proceedings of the 10th AAAI Conf. San Jose, CA,
1992, pp. 642–647.
1109. , Issues of knowledge representation in Dempster-Shafer’s theory, Advances
in the Dempster-Shafer theory of evidence (R.R. Yager, M. Fedrizzi, and J. Kacprzyk,
eds.), Wiley, 1994, pp. 415–440.
1110. , Using Dempster-Shafer theory in knowledge representation, CoRR
abs/1304.1123 (2013).
1111. Alessandro Saffiotti, S. Parsons, and E. Umkehrer, Comparing uncertainty manage-
ment techniques, Microcomputers in Civil Engineering 9 (1994), 367–380.
1112. Alessandro Saffiotti and E. Umkehrer, PULCINELLA: A general tool for propagating
uncertainty in valuation networks, Tech. report, IRIDIA, Université Libre de Brux-
elles, 1991.
1113. Leonard J. Savage, The foundations of statistics, John Wiley & Sons, Inc., 1954.
1114. K. Schneider, Dempster-Shafer analysis for species presence prediction of the winter
wren (Troglodytes troglodytes), Proceedings of the 1st International Conference on
GeoComputation (R.J. Abrahart, ed.), vol. 2, Leeds, UK, 17-19 Sept. 1996, p. 738.
1115. J. Schubert, Cluster-based specification techniques in Dempster-Shafer theory, Pro-
ceedings of ECSQARU’95 (C. Froidevaux and J. Kohlas, eds.), 1995.
1116. , Managing inconsistent intelligence, Information Fusion, 2000. FUSION
2000. Proceedings of the Third International Conference on, vol. 1, July 2000,
pp. TUB4/10–TUB4/16 vol.1.
1117. Johan Schubert, On nonspecific evidence, International Journal of Intelligent Systems
8:6 (1993), 711–725.
1118. , Cluster-based specification techniques in Dempster-Shafer theory for an evi-
dential intelligence analysis of multiple-target tracks, PhD dissertation, Royal Institute
of Technology, Sweden, 1994.
1119. , Cluster-based specification techniques in Dempster-Shafer theory for an evi-
dential intelligence analysis of multiple-target tracks, AI Communications 8:2 (1995),
107–110.
1120. , Finding a posterior domain probability distribution by specifying nonspe-
cific evidence, International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems 3:2 (1995), 163–185.
1121. , On ρ in a decision-theoretic apparatus of Dempster-Shafer theory, Interna-
tional Journal of Approximate Reasoning 13 (1995), 185–200.
1122. , Specifying nonspecific evidence, International Journal of Intelligent Systems
11 (1996), 525–563.
1123. , Fast Dempster-Shafer clustering using a neural network structure, Informa-
tion, Uncertainty and Fusion (B. Bouchon-Meunier, R. R. Yager, and L. A. Zadeh, eds.),
Kluwer Academic Publishers (SECS 516), Boston, MA, 1999, pp. 419–430.
1124. , Managing decomposed belief functions, IPMU, 2006.
1125. , Conflict management in Dempster-Shafer theory by sequential discounting
using the degree of falsity, 2008.
1126. Johan Schubert, The internal conflict of a belief function, pp. 169–177, Springer Berlin
Heidelberg, Berlin, Heidelberg, 2012.
1127. Johan Schubert, Simultaneous Dempster-Shafer clustering and gradual determination
of number of clusters using a neural network structure, Proceedings of the 1999 Infor-
mation, Decision and Control Conference (IDC’99), Adelaide, Australia, 8-10 Febru-
ary 1999, pp. 401–406.
1128. , Creating prototypes for fast classification in Dempster-Shafer clustering,
Proceedings of the International Joint Conference on Qualitative and Quantitative
Practical Reasoning (ECSQARU / FAPR ’97), Bad Honnef, Germany, 9-12 June 1997.
1129. , A neural network and iterative optimization hybrid for Dempster-Shafer
clustering, Proceedings of EuroFusion98 International Conference on Data Fusion
(EF'98) (M. Bedworth and J. O'Brien, eds.), Great Malvern, UK, 6-7 October 1998, pp. 29–
36.
1130. , Fast Dempster-Shafer clustering using a neural network structure, Proceed-
ings of the Seventh International Conference on Information Processing and Man-
agement of Uncertainty in Knowledge-based Systems (IPMU’98), Université de La
Sorbonne, Paris, France, 6-10 July 1998, pp. 1438–1445.
1131. Scott Ferson, Vladik Kreinovich, Lev Ginzburg, Davis S. Myers, and Kari Sentz, Con-
structing probability boxes and Dempster-Shafer structures, Tech. report, Sandia Na-
tional Laboratories, SAND2002-4015, 2003.
1132. R. Scozzafava, Subjective probability versus belief functions in artificial intelligence,
International Journal of General Systems 22:2 (1994), 197–206.
1133. Romano Scozzafava, Subjective probability versus belief functions in artificial intelli-
gence, International Journal of General Systems 22 (1993), no. 2, 197–206.
1134. W.B. Seales and O.D. Faugeras, Building three-dimensional object models from image
sequences, CVIU 61 (1995), 308–324.
1135. H. Segawa, H. Shioya, N. Hiraki, and T. Totsuka, Constraint-conscious smoothing
framework for the recovery of 3D articulated motion from image sequences, Fourth Int.
Conf. on Automatic Face and Gesture Recognition, Grenoble, France, March 2000.
1136. H. Segawa and T. Totsuka, Torque-based recursive filtering approach to the recovery
of 3D articulated motion from image sequences, ICCV’99, Corfu, Greece, September
1999.
1137. T. Seidenfeld, Statistical evidence and belief functions, Proc. of the Biennial Meeting
of the Philosophy of Science Association, 1978, pp. 478–489.
1138. , Some static and dynamic aspects of robust Bayesian theory, Random Sets:
Theory and Applications (Goutsias, Mahler, and Nguyen, eds.), Springer, 1997,
pp. 385–406.
1139. T. Seidenfeld, M. Schervish, and J. Kadane, Coherent choice functions under uncer-
tainty, Proceedings of ISIPTA’07, 2007.
1140. T. Seidenfeld and L. Wasserman, Dilation for convex sets of probabilities, Annals of
Statistics 21 (1993), 1139–1154.
1141. Teddy Seidenfeld, Statistical evidence and belief functions, PSA: Proceedings of the
Biennial Meeting of the Philosophy of Science Association 1978 (1978), 478–489.
1142. K. Sentz and S. Ferson, Combination of evidence in Dempster-Shafer theory, Tech.
report, SANDIA Tech. Report, SAND2002-0835, April 2002.
1143. Pavel Sevastianov, Numerical methods for interval and fuzzy number comparison
based on the probabilistic approach and Dempster-Shafer theory, Information Sciences
177 (2007), no. 21, 4645–4661.
1144. G. Shafer, A mathematical theory of evidence, Princeton University Press, 1976.
1145. , Jeffrey's rule of conditioning, Philosophy of Science 48 (1981), 337–362.
1146. , Belief functions and parametric models, Journal of the Royal Statistical So-
ciety, Series B 44 (1982), 322–352.
1147. G. Shafer and P. P. Shenoy, Local computation on hypertrees, Working paper No. 201,
School of Business, University of Kansas (1988).
1148. Glenn Shafer, Foundations of probability theory, statistical inference, and statistical
theories of science: Proceedings of an international research colloquium held at the
University of Western Ontario, London, Canada, 10–13 May 1973, Volume II: Foundations
and philosophy of statistical inference, ch. A Theory of Statistical Evidence, pp. 365–
436, Springer Netherlands, Dordrecht, 1976.
1149. Glenn Shafer, A mathematical theory of evidence, Princeton University Press, 1976.
1150. , A theory of statistical evidence, Foundations of Probability Theory, Statistical
Inference, and Statistical Theories of Science (W. L. Harper and C. A. Hooker, eds.),
vol. 2, Reidel, Dordrecht, 1976, with discussion, pp. 365–436.
1151. , Nonadditive probabilities in the work of Bernoulli and Lambert, Arch. History
Exact Sci. 19 (1978), 309–370.
1152. , Allocations of probability, Annals of Probability 7:5 (1979), 827–839.
1153. , Constructive probability, Synthese 48 (1981), 309–370.
1154. , Two theories of probability, Philosophy of Science Association Proceedings
1978 (P. Asquith and I. Hacking, eds.), vol. 2, Philosophy of Science Association, East
Lansing (MI), 1981.
1155. , Belief functions and parametric models, Journal of the Royal Statistical So-
ciety B.44 (1982), 322–352.
1156. , The combination of evidence, Tech. report, School of Business, University of
Kansas, Lawrence, KS, Working Paper 162, 1984.
1157. , Conditional probability, International Statistical Review 53 (1985), 261–277.
1158. , Nonadditive probability, Encyclopedia of Statistical Sciences (Kotz and
Johnson, eds.), vol. 6, Wiley, 1985, pp. 271–276.
1159. , The combination of evidence, International Journal of Intelligent Systems 1
(1986), 155–179.
1160. Glenn Shafer, The combination of evidence, International Journal of Intelligent Sys-
tems 1 (1986), no. 3, 155–179.
1161. Glenn Shafer, Belief functions and possibility measures, Analysis of Fuzzy Informa-
tion 1: Mathematics and logic (Bezdek, ed.), CRC Press, 1987, pp. 51–84.
1162. , Probability judgment in artificial intelligence and expert systems, Statistical
Science 2 (1987), 3–44.
1163. , Perspectives on the theory and practice of belief functions, International Jour-
nal of Approximate Reasoning 4 (1990), 323–362.
1164. , Perspectives on the theory and practice of belief functions, International Jour-
nal of Approximate Reasoning 4 (1990), 323–362.
1165. , Perspectives on the theory and practice of belief functions, International Jour-
nal of Approximate Reasoning 4 (1990), no. 5, 323 – 362.
1166. , A note on Dempster’s Gaussian belief functions, Tech. report, School of Busi-
ness, University of Kansas, Lawrence, KS, 1992.
1167. , Rejoinders to comments on ‘perspectives on the theory and practice of belief
functions’, International Journal of Approximate Reasoning 6 (1992), 445–480.
1168. , Comments on "Constructing a logic of plausible inference: a guide to Cox's
theorem", by Kevin S. Van Horn, Int. J. Approx. Reasoning 35 (2004), no. 1, 97–105.
1169. , Probability judgement in artificial intelligence, CoRR abs/1304.3429 (2013).
1170. , Bayes’s two arguments for the rule of conditioning, Annals of Statistics 10:4
(December 1982), 1075–1089.
1171. Glenn Shafer and R. Logan, Implementing Dempster’s rule for hierarchical evidence,
Artificial Intelligence 33 (1987), 271–298.
1172. Glenn Shafer and Prakash P. Shenoy, Propagating belief functions using local compu-
tations, IEEE Expert 1 (1986), (3), 43–52.
1173. Glenn Shafer, Prakash P. Shenoy, and K. Mellouli, Propagating belief functions in
qualitative Markov trees, International Journal of Approximate Reasoning 1 (1987),
(4), 349–400.
1174. Glenn Shafer and R. Srivastava, The Bayesian and belief-function formalism: A gen-
eral perspective for auditing, Auditing: A Journal of Practice and Theory (1989).
1175. Glenn Shafer and Amos Tversky, Classic works of the Dempster-Shafer theory of be-
lief functions, ch. Languages and Designs for Probability Judgment, pp. 345–374,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
1176. Glenn Shafer and Vladimir Vovk, Probability and finance: It’s only a game!, Wiley,
New York, 2001.
1177. Gregory Shakhnarovich, Paul Viola, and Trevor Darrell, Fast pose estimation with
parameter-sensitive hashing, ICCV ’03: Proceedings of the Ninth IEEE International
Conference on Computer Vision (Washington, DC, USA), IEEE Computer Society,
2003, p. 750.
1178. L. Shapley, A value for n-person games, Contributions to the Theory of Games, vol. II, Princeton University Press, 1953.
1179. L.S. Shapley, Cores of convex games, Int. J. Game Theory 1 (1971), 11–26.
1180. Lokendra Shastri, Evidential reasoning in semantic networks: A formal theory and its
parallel implementation (inheritance, categorization, connectionism, knowledge rep-
resentation), Ph.D. thesis, 1985, AAI8528562.
1181. Lokendra Shastri and Jerome A. Feldman, Evidential reasoning in semantic networks:
A formal theory, Proceedings of the 9th International Joint Conference on Artificial
Intelligence - Volume 1 (San Francisco, CA, USA), IJCAI’85, Morgan Kaufmann
Publishers Inc., 1985, pp. 465–474.
1182. P.P. Shenoy, No double counting semantics for conditional independence, Tech. report,
Working Paper No. 307. School of Business, University of Kansas, Lawrence, KS,
2005.
1183. Prakash P. Shenoy, On Spohn's rule for revision of beliefs, International Journal of
Approximate Reasoning 5 (1991), no. 2, 149–181.
1184. Prakash P. Shenoy, Uncertainty in knowledge bases: 3rd International Conference on
Information Processing and Management of Uncertainty in Knowledge-Based Systems,
IPMU '90, Paris, France, July 2–6, 1990, Proceedings, ch. On Spohn's theory of epistemic
beliefs, pp. 1–13, Springer Berlin Heidelberg, Berlin, Heidelberg, 1991.
1185. Prakash P. Shenoy, Using Dempster-Shafer’s belief function theory in expert systems,
Advances in the Dempster-Shafer Theory of Evidence (R. R. Yager, M. Fedrizzi, and
J. Kacprzyk, eds.), Wiley, New York, 1994, pp. 395–414.
1186. Prakash P. Shenoy and K. Mellouli, Propagation of belief functions: a distributed ap-
proach, Uncertainty in Artificial Intelligence 2 (Lemmer and Kanal, eds.), North Hol-
land, 1988, pp. 325–336.
1187. Prakash P. Shenoy and Glenn Shafer, An axiomatic framework for Bayesian and belief
function propagation, Proceedings of the AAAI Workshop of Uncertainty in Artificial
Intelligence, 1988, pp. 307–314.
1188. , Axioms for probability and belief functions propagation, Uncertainty in Arti-
ficial Intelligence, 4 (R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, eds.),
North Holland, Amsterdam, 1990, pp. 159–198.
1189. Prakash P. Shenoy, Glenn Shafer, and K. Mellouli, Propagation of belief functions: a
distributed approach, Proceedings of the AAAI Workshop of Uncertainty in Artificial
Intelligence, 1986, pp. 149–160.
1190. F. K. J. Sheridan, A survey of techniques for inference under uncertainty, Artificial
Intelligence Review 5 (1991), 89–119.
1191. C. Shi, Y. Cheng, Q. Pan, and Y. Lu, A new method to determine evidence distance,
Proceedings of the 2010 International Conference on Computational Intelligence and
Software Engineering (CiSE), 2010, pp. 1–4.
1192. Margaret F. Shipley, Charlene A. Dykman, and André de Korvin, Project manage-
ment: using fuzzy logic and the Dempster-Shafer theory of evidence to select team
members for the project duration, Proceedings of IEEE, 1999, pp. 640–644.
1193. Margaret F. Shipley and André de Korvin, Rough set theory fuzzy belief functions
related to statistical confidence: application and evaluation for golf course closing,
Stochastic Analysis and Applications 13 (1995), no. 4, 487–502.
1194. H. Sidenbladh and M.J. Black, Learning the statistics of people in images and video,
IJCV 54 (2003), 189–209.
1195. H. Sidenbladh, M.J. Black, and D.J. Fleet, Stochastic tracking of 3D human figures
using 2d image motion, ECCV’00, 2000.
1196. H. Sidenbladh, F. de la Torre, and M.J. Black, A framework for modeling the ap-
pearance of 3D articulated figures, Int. Conference on Automatic Face and Gesture
Recognition, 2000.
1197. Roman Sikorski, Boolean algebras, Springer Verlag, 1964.
1198. M.-A Simard, J. Couture, and E. Bosse, Data fusion of multiple sensors attribute infor-
mation for target identity estimation using a Dempster-Shafer evidential combination
algorithm, Proceedings of the SPIE - Signal and Data Processing of Small Targets
(K. Anderson, P.G.; Warwick, ed.), vol. 2759, Orlando, FL, USA, 9-11 April 1996,
pp. 577–588.
1199. W. R. Simpson and J. W. Sheppard, The application of evidential reasoning in a
portable maintenance aid, Proceedings of the IEEE Systems Readiness Technology
Conference (V. Jorrand, P.; Sgurev, ed.), San Antonio, TX, USA, 17-21 September
1990, pp. 211–214.
1200. Andrzej Skowron and Jerzy Grzymala-Busse, Advances in the Dempster-Shafer theory
of evidence, John Wiley & Sons, Inc., New York, NY, USA, 1994, pp. 193–236.
1201. A. Slobodova, Multivalued extension of conditional belief functions, Qualitative and
quantitative practical reasoning, vol. 1244/1997, Springer Berlin/Heidelberg, 1997,
pp. 568–573.
1202. Anna Slobodova, Conditional belief functions and valuation-based systems, Tech.
report, Institute of Control Theory and Robotics, Slovak Academy of Sciences,
Bratislava, SK, 1994.
1203. , Multivalued extension of conditional belief functions, Proceedings of the In-
ternational Joint Conference on Qualitative and Quantitative Practical Reasoning (EC-
SQARU / FAPR ’97), Bad Honnef, Germany, 9-12 June 1997.
1204. , Multivalued extension of conditional belief functions, Proceedings of the In-
ternational Joint Conference on Qualitative and Quantitative Practical Reasoning (EC-
SQARU / FAPR ’97), Bad Honnef, Germany, 9-12 June 1997.
1205. , Multivalued extension of conditional belief functions, Proceedings of the In-
ternational Joint Conference on Qualitative and Quantitative Practical Reasoning (EC-
SQARU / FAPR ’97), Bad Honnef, Germany, 9-12 June 1997.
1206. , A comment on conditioning in the Dempster-Shafer theory, Proceedings of
the International ICSC Symposia on Intelligent Industrial Automation and Soft Com-
puting (K. Anderson, P.G.; Warwick, ed.), Reading, UK, 26-28 March 1996, pp. 27–
31.
1207. F. Smarandache and J. Dezert, An introduction to the DSm theory for the combination
of paradoxical, uncertain and imprecise sources of information, Proceedings of the
13th International Congress of Cybernetics and Systems, 2005, pp. 6–10.
1208. F. Smarandache, J. Dezert, and J. M. Tacnet, Fusion of sources of evidence with dif-
ferent importances and reliabilities, Information Fusion (FUSION), 2010 13th Con-
ference on, July 2010, pp. 1–8.
1209. P. Smets, Showing why measures of quantified beliefs are belief functions, Intelligent
Systems for Information Processing: From Representations to Applications.
1210. P. Smets, No Dutch book can be built against the TBM even though update is not
obtained by Bayes rule of conditioning, Workshop on probabilistic expert systems,
Societa Italiana di Statistica, Roma, 1993, pp. 181–204.
1211. , Decision making in a context where uncertainty is represented by belief func-
tions, Belief functions in business decisions (2002), 17–61.
1212. , Decision making in the TBM: the necessity of the pignistic transformation,
International Journal of Approximate Reasoning 38 (2005), no. 2, 133–147.
1213. P. Smets and R. Kennes, The transferable belief model, Artificial intelligence 66
(1994), no. 2, 191–234.
1214. P. Smets and B. Ristic, Kalman filter and joint tracking and classification in the TBM
framework, Proceedings of the Seventh International Conference on Information Fu-
sion, vol. 1, Citeseer, 2004, pp. 46–53.
1215. Ph. Smets, Medical diagnosis : Fuzzy sets and degree of belief, Proceedings of MIC’79
(J. Willems, ed.), Wiley, 1979, pp. 185–189.
1216. , The degree of belief in a fuzzy event, Information Sciences 25 (1981), 1–19.
1217. , Medical diagnosis : Fuzzy sets and degrees of belief, Int. J. Fuzzy Sets and
systems 5 (1981), 259–266.
1218. , The combination of evidence in the transferable belief model, IEEE Tr. PAMI
12 (1990), 447–458.
1219. , Varieties of ignorance, Information Sciences 57-58 (1991), 135–144.
1220. , Belief functions: the disjunctive rule of combination and the generalized
Bayesian theorem, International Journal of Approximate reasoning 9 (1993), 1–35.
1221. , The axiomatic justification of the transferable belief model, Tech. report, Uni-
versité Libre de Bruxelles, Technical Report TR/IRIDIA/1995-8.1, 1995.
1222. , Belief functions on real numbers, International Journal of Approximate Rea-
soning 40 (2005), no. 3, 181–223.
1223. Ph. Smets, Decision making in the TBM: the necessity of the pignistic transformation,
International Journal of Approximate Reasoning 38(2) (February 2005), 133–147.
1224. Ph. Smets, The application of the matrix calculus to belief functions, International
Journal of Approximate Reasoning 31(1-2) (October 2002), 1–30.
1225. Philippe Smets, Belief functions : the disjunctive rule of combination and the general-
ized Bayesian theorem, International Journal of Approximate Reasoning 9.
1226. , Theory of evidence and medical diagnostic, Medical Informatics Europe 78
(1978), 285–291.
1227. , Information content of an evidence, International Journal of Man Machine
Studies 19 (1983), 33–43.
1228. , Data fusion in the transferable belief model, Proceedings of the 1984 Amer-
ican Control Conference, 1984, pp. 554–555.
1229. , Bayes’ theorem generalized for belief functions, Proceedings of ECAI-86,
vol. 2, 1986, pp. 169–171.
1230. , Belief functions, Non-Standard Logics for Automated Reasoning (Ph. Smets,
A. Mamdani, D. Dubois, and H. Prade, eds.), Academic Press, London, 1988, pp. 253–
286.
1231. , Belief functions versus probability functions, Uncertainty and Intelligent Sys-
tems (Saitta L. Bouchon B. and Yager R., eds.), Springer Verlag, Berlin, 1988, pp. 17–
24.
1232. , Constructing the pignistic probability function in a context of uncertainty,
Uncertainty in Artificial Intelligence, 5 (M. Henrion, R.D. Shachter, L.N. Kanal, and
J.F. Lemmer, eds.), Elsevier Science Publishers, 1990, pp. 29–39.
1233. , The transferable belief model and possibility theory, Proceedings of
NAFIPS-90 (Kodratoff Y., ed.), 1990, pp. 215–218.
1234. , About updating, Proceedings of the 7th conference on Uncertainty in Arti-
ficial Intelligence (B. D'Ambrosio, Ph. Smets, and P. P. Bonissone, eds.), 1991,
pp. 378–385.
1235. , Patterns of reasoning with belief functions, Journal of Applied Non-Classical
Logic 1:2 (1991), 166–170.
1236. , Probability of provability and belief functions, Logique et Analyse 133-134
(1991), 177–195.
1237. , The transferable belief model and other interpretations of Dempster-Shafer’s
model, Uncertainty in Artificial Intelligence, volume 6 (P.P. Bonissone, M. Henrion,
L.N. Kanal, and J.F. Lemmer, eds.), North-Holland, Amsterdam, 1991, pp. 375–383.
1238. , The nature of the unnormalized beliefs encountered in the transferable belief
model, Proceedings of the 8th Annual Conference on Uncertainty in Artificial Intelli-
gence (UAI-92) (San Mateo, CA), Morgan Kaufmann, 1992, pp. 292–297.
1239. , The nature of the unnormalized beliefs encountered in the transferable be-
lief model, Proceedings of the 8th Conference on Uncertainty in Artificial Intelli-
gence (AI92) (D. Dubois, M. P. Wellman, B. D'Ambrosio, and Ph. Smets, eds.), 1992,
pp. 292–297.
1240. , Resolving misunderstandings about belief functions, International Journal
of Approximate Reasoning 6 (1992), 321–344.
1241. , The transferable belief model and random sets, International Journal of In-
telligent Systems 7 (1992), 37–46.
1242. , The transferable belief model for expert judgments and reliability problems,
Reliability Engineering and System Safety 38 (1992), 59–66.
1243. , Jeffrey’s rule of conditioning generalized to belief functions, Proceedings of
the 9th Conference on Uncertainty in Artificial Intelligence (UAI93) (D. Heckerman
and A. Mamdani, eds.), 1993, pp. 500–505.
1244. , Quantifying beliefs by belief functions : An axiomatic justification, Proceed-
ings of the 13th International Joint Conference on Artificial Intelligence, IJCAI93,
1993, pp. 598–603.
1245. , Belief induced by the knowledge of some probabilities, Proceedings of the
10th Conference on Uncertainty in Artificial Intelligence (AI94) (Lopez de Man-
taras R. Heckerman D., Poole D., ed.), 1994, pp. 523–530.
1246. , What is Dempster-Shafer’s model ?, Advances in the Dempster-Shafer The-
ory of Evidence (Fedrizzi M. Yager R.R. and Kacprzyk J., eds.), Wiley, 1994, pp. 5–34.
1247. , Non standard probabilistic and non probabilistic representations of uncer-
tainty, Advances in Fuzzy Sets Theory and Technology, 3 (Wang P.P., ed.), Duke Uni-
versity, Durham, NC, 1995, pp. 125–154.
1248. , Probability, possibility, belief : which for what ?, Foundations and Applica-
tions of Possibility Theory (G. De Cooman, D. Ruan, and E. E. Kerre, eds.), World Scientific,
Singapore, 1995, pp. 20–40.
1249. Philippe Smets, The α-junctions: Combination operators applicable to belief func-
tions, pp. 131–153, Springer Berlin Heidelberg, Berlin, Heidelberg, 1997.
1250. Philippe Smets, The normative representation of quantified beliefs by belief functions,
Artificial Intelligence 92 (1997), 229–242.
1251. , The transferable belief model for uncertainty representation, 1997.
1252. , The application of the transferable belief model to diagnostic problems, Int.
J. Intelligent Systems 13 (1998), 127–158.
1253. , Numerical representation of uncertainty, Handbook of Defeasible Reasoning
and Uncertainty Management Systems, Vol. 3: Belief Change (D. Gabbay and Ph. Smets,
series eds.; D. Dubois and H. Prade, vol. eds.), Kluwer, Dordrecht, 1998,
pp. 265–309.
1254. , Probability, possibility, belief: Which and where ?, Handbook of Defeasible
Reasoning and Uncertainty Management Systems, Vol. 1: Quantified Representation
of Uncertainty and Imprecision (Gabbay D. and Smets Ph., eds.), Kluwer, Doordrecht,
1998, pp. 1–24.
1255. , The transferable belief model for quantified belief representation, Handbook
of Defeasible Reasoning and Uncertainty Management Systems, Vol. 1: Quantified
Representation of Uncertainty and Imprecision (Gabbay D. and Smets Ph., eds.),
Kluwer, Doordrecht, 1998, pp. 267–301.
1256. , Practical uses of belief functions, Uncertainty in Artificial Intelligence 15
(Laskey K. B. and Prade H., eds.), 1999, pp. 612–621.
1257. Philippe Smets, Practical uses of belief functions, Proceedings of the Fifteenth Con-
ference on Uncertainty in Artificial Intelligence (San Francisco, CA, USA), UAI’99,
Morgan Kaufmann Publishers Inc., 1999, pp. 612–621.
1258. Philippe Smets, Quantified epistemic possibility theory seen as an hyper cautious
transferable belief model, 2000.
1259. , Decision making in a context where uncertainty is represented by belief func-
tions, Belief Functions in Business Decisions (Srivastava R., ed.), Physica-Verlag,
2001, pp. 495–504.
1260. , Analyzing the combination of conflicting belief functions, Information Fusion
8 (2007), no. 4, 387 – 412.
1261. , The α-junctions: the commutative combination operators applicable to belief
functions, Proceedings of the International Joint Conference on Qualitative and Quan-
titative Practical Reasoning (ECSQARU / FAPR '97) (D. Gabbay, R. Kruse, A. Nonnengart,
and H. J. Ohlbach, eds.), Bad Honnef, Germany, 9-12 June 1997, pp. 131–
153.
1262. , Probability of deductibility and belief functions, Proceedings of the European
Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty
(ECSQARU’93) (M. Clark, R. Kruse, and S. Moral, eds.), Granada, Spain, 8-10 Nov.
1993, pp. 332–340.
1263. , Upper and lower probability functions versus belief functions, Proceed-
ings of the International Symposium on Fuzzy Systems and Knowledge Engineering,
Guangzhou, China, 1987, pp. 17–21.
1264. , Applying the transferable belief model to diagnostic problems, Proceedings
of 2nd International Workshop on Intelligent Systems and Soft Computing for Nuclear
Science and Industry (D. Ruan, P. D’hondt, P. Govaerts, and E.E. Kerre, eds.), Mol,
Belgium, 25-27 September 1996, pp. 285–292.
1265. , The canonical decomposition of a weighted belief, Proceedings of the Inter-
national Joint Conference on AI, IJCAI95, Montréal, Canada, 1995, pp. 1896–1901.
1266. , The concept of distinct evidence, Proceedings of the 4th Conference on In-
formation Processing and Management of Uncertainty in Knowledge-Based Systems
(IPMU 92), Palma de Mallorca, 6-10 July 92, pp. 789–794.
1267. , Data fusion in the transferable belief model, Proc. 3rd Intern. Conf. Infor-
mation Fusion, Paris, France, 2000, pp. 21–33.
1268. , Transferable belief model versus Bayesian model, Proceedings of ECAI 1988
(Kodratoff Y., ed.), Pitman, London, 1988, pp. 495–500.
1269. , No Dutch Book can be built against the TBM even though update is not ob-
tained by Bayes rule of conditioning, SIS, Workshop on Probabilistic Expert Systems
(R. Scozzafava, ed.), Roma, Italy, 1993, pp. 181–204.
1270. , Belief functions and generalized Bayes theorem, Proceedings of the Second
IFSA Congress, Tokyo, Japan, 1987, pp. 404–407.
1271. Philippe Smets and Roger Cooke, How to derive belief functions within probabilistic
frameworks?, Proceedings of the International Joint Conference on Qualitative and
Quantitative Practical Reasoning (ECSQARU / FAPR ’97), Bad Honnef, Germany,
9-12 June 1997.
1272. Philippe Smets and Y. T. Hsia, Default reasoning and the transferable belief model,
Uncertainty in Artificial Intelligence 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and
J.F. Lemmer, eds.), Wiley, 1991, pp. 495–504.
1273. Philippe Smets, Y. T. Hsia, Alessandro Saffiotti, R. Kennes, H. Xu, and E. Umkehrer,
The transferable belief model, Symbolic and Quantitative Approaches to Uncertainty
(Kruse R. and Siegel P., eds.), Springer Verlag, Lecture Notes in Computer Science
No. 458, Berlin, 1991, pp. 91–96.
1274. Philippe Smets and Yen-Teh Hsia, Defeasible reasoning with belief functions, Tech.
report, Université Libre de Bruxelles, Technical Report TR/IRIDIA/90-9, 1990.
1275. Philippe Smets and Yen-Teh Hsia, Default reasoning and the transferable belief
model, Proceedings of the Sixth Annual Conference on Uncertainty in Artificial In-
telligence (New York, NY, USA), UAI ’90, Elsevier Science Inc., 1991, pp. 495–504.
1276. Philippe Smets and Robert Kennes, The transferable belief model, Artificial Intelli-
gence 66 (1994), 191–234.
1277. Philippe Smets and R. Kruse, The transferable belief model for belief representation,
Uncertainty Management in information systems: from needs to solutions (Motro A.
and Smets Ph., eds.), Kluwer, Boston, 1997, pp. 343–368.
1278. C. Sminchisescu and B. Triggs, Covariance scaled sampling for monocular 3D body
tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition CVPR’01, Hawaii, December 2001.
1279. Cedric A. B. Smith, Consistency in statistical inference and decision, Journal of the
Royal Statistical Society, Series B 23 (1961), 1–37.
1280. , Personal probability and statistical analysis, Journal of the Royal Statistical
Society, Series A 128 (1965), 469–489.
1281. P. Smith and O.R. Jones, The philosophy of mind: An introduction, Cambridge Uni-
versity Press, 1986.
1282. M. J. Smithson, Ignorance and uncertainty: Emerging paradigms, Springer, New York
(NY), 1989.
1283. Paul Snow, The vulnerability of the transferable belief model to Dutch books, Artificial
Intelligence 105 (1998), 345–354.
1284. Leen-Kit Soh, Costas Tsatsoulis, Todd Bowers, and Andrew Williams, Representing
sea ice knowledge in a Dempster-Shafer belief system, Proceedings of IEEE, 1998,
pp. 2234–2236.
1285. Y. Song, L. Goncalves, E. Di Bernardo, and P. Perona, Monocular perception of biolog-
ical motion - detection and labelling, Int. Conf. on Computer Vision, 1999, pp. 805–
812.
1286. Andrea Sorrentino, Fabio Cuzzolin, and Ruggero Frezza, Using hidden Markov mod-
els and dynamic size functions for gesture recognition, Proceedings of the 8th British
Machine Vision Conference (BMVC97) (Adrian F. Clark, ed.), vol. 2, September
1997, pp. 560–570.
1287. Z. A. Sosnowski and J. S. Walijewski, Generating fuzzy decision rules with the use of
Dempster-Shafer theory, Proceedings of the 13th European Simulation Multiconfer-
ence 1999 (H. Szczerbicka, ed.), vol. 2, Warsaw, Poland, 1-4 June 1999, pp. 419–426.
1288. M. Spies, Conditional events, conditioning, and random sets, IEEE Transactions on
Systems, Man, and Cybernetics 24 (1994), 1755–1763.
1289. R. Spillman, Managing uncertainty with belief functions, AI Expert 5:5 (May 1990),
44–49.
1290. Wolfgang Spohn, Causation in decision, belief change, and statistics: Proceedings
of the irvine conference on probability and causation, ch. Ordinal Conditional Func-
tions: A Dynamic Theory of Epistemic States, pp. 105–134, Springer Netherlands,
Dordrecht, 1988.
1291. , A general non-probabilistic theory of inductive reasoning, Proceedings of
the Fourth Annual Conference on Uncertainty in Artificial Intelligence (Amsterdam,
The Netherlands, The Netherlands), UAI ’88, North-Holland Publishing Co., 1990,
pp. 149–158.
1292. R. P. Srivastava and Glenn Shafer, Integrating statistical and nonstatistical audit ev-
idence using belief functions: a case of variable sampling, International Journal of
Intelligent Systems 9:6 (June 1994), 519–539.
1293. Rajendra P. Srivastava, Alternative form of Dempster's rule for binary variables, Inter-
national Journal of Intelligent Systems 20 (2005), no. 8, 789–797.
1294. T. Starner and A. Pentland, Real-time American Sign Language recognition from video
using HMM, Proc. of ISCV 95, vol. 29, 1997, pp. 213–244.
1295. R. Stein, The Dempster-Shafer theory of evidential reasoning, AI Expert 8:8 (August
1993), 26–31.
1296. R.S. Stephens, Real-time 3D object tracking, Image and Vision Computing 8 (1990),
91–96.
1297. Manfred Stern, Semimodular lattices, Cambridge University Press, 1999.
1298. P. R. Stokke, T. A. Boyce, John D. Lowrance, and William K. Ralston, Evidential
reasoning and project early warning systems, Research and Technology Management
(1994).
1299. , Industrial project monitoring with evidential reasoning, Nordic Advanced
Information Technology Magazine 8 (1994), 18–27.
1300. E. Straszecka, On an application of Dempster-Shafer theory to medical diagnosis sup-
port, Proceedings of the 6th European Congress on Intelligent Techniques and Soft
Computing (EUFIT’98), vol. 3, Aachen, Germany: Verlag Mainz, 1998, pp. 1848–
1852.
1301. Thomas M. Strat, The generation of explanations within evidential reasoning systems,
Proceedings of the Tenth Joint Conference on Artificial Intelligence (Institute of Elec-
trical and Electronical Engineers, eds.), 1987, pp. 1097–1104.
1302. , Making decisions with belief functions, Proceedings of the 5th Workshop on
Uncertainty in AI, 1989, pp. 351–360.
1303. , Decision analysis using belief functions, International Journal of Approxi-
mate Reasoning 4 (1990), 391–417.
1304. , Making decisions with belief functions, Uncertainty in Artificial Intelligence,
5 (L. N. Kanal M. Henrion, R. D. Schachter and J. F. Lemmers, eds.), North Holland,
Amsterdam, 1990.
1305. , Decision analysis using belief functions, Advances in the Dempster-Shafer
Theory of Evidence, Wiley, New York, 1994.
1306. , Continuous belief functions for evidential reasoning, Proceedings of the Na-
tional Conference on Artificial Intelligence (Institute of Electrical and Electronical
Engineers, eds.), August 1984, pp. 308–313.
1307. Thomas M. Strat and John D. Lowrance, Explaining evidential analyses, International
Journal of Approximate Reasoning 3 (1989), no. 4, 299 – 353.
1308. , Explaining evidential analysis, International Journal of Approximate Rea-
soning 3 (1989), 299–353.
1309. R. L. Streit, The moments of matched and mismatched hidden Markov models, IEEE
Trans. on Acoustics, Speech, and Signal Processing Vol. 38(4) (April 1990), 610–622.
1310. Xiaoyan Su, Sankaran Mahadevan, Wenhua Han, and Yong Deng, Combining depen-
dent bodies of evidence, Applied Intelligence 44 (2016), no. 3, 634–644.
1311. J. J. Sudano, Pignistic probability transforms for mixes of low- and high-probability
events, Proceedings of the International Conference on Information Fusion, 2001.
1312. , Inverse pignistic probability transforms, Proceedings of the International
Conference on Information Fusion, 2002.
1313. J.J. Sudano, Pignistic probability transforms for mixes of low- and high-probability
events, Proceedings of the Fourth International Conference on Information Fusion
(ISIF’01), Montreal, Canada, 2001, pp. 23–27.
1314. , Equivalence between belief theories and naive Bayesian fusion for systems
with independent evidential data, Proceedings of the Sixth International Conference
on Information Fusion (ISIF’03), 2003.
1315. Thomas Sudkamp, The consistency of Dempster-Shafer updating, International Jour-
nal of Approximate Reasoning 7 (1992), 19–44.
1316. M. Sugeno, Fuzzy automata and decision processes, ch. Fuzzy measures and fuzzy
integrals: A survey, pp. 89–102, North-Holland, Amsterdam, 1977.
1317. Michio Sugeno, Yasuo Narukawa, and Toshiaki Murofushi, Choquet integral and fuzzy
measures on locally compact space, Fuzzy Sets and Systems 99 (1998), no. 2, 205 –
211.
1318. H. Sun and M. Farooq, Conjunctive and disjunctive combination rules of evidence,
Signal Processing, Sensor Fusion, and Target Recognition XIII (I. Kadar, ed.), Proceedings
of the SPIE, vol. 5429, August 2004, pp. 392–401.
1319. P. Suppes and M. Zanotti, On using random relations to generate upper and lower
probabilities, Synthese 36 (1977), 427–440.
1320. S.Y. Zhang, Q. Pang, and H.C. Zhang, A new kind of combination rule of evidence
theory, Control and Decision 15 (2000), 540–544.
1321. Gábor Szász, Introduction to lattice theory, Academic Press, New York and London,
1963.
1322. R. Szeliski and S.B. Kang, Recovering 3D shape and motion from image streams using
nonlinear least squares, J. Vis. Comm. Im. Repr. 5 (1994), 10–28.
1323. T. Ali, P. Dutta, and H. Boruah, A new combination rule for conflict problem of
Dempster-Shafer evidence theory, International Journal of Energy, Information and
Communications 3.
1324. T. Darrell, G. Gordon, M. Harville, and J. Woodfill, Integrated person tracking using
stereo, color, and pattern detection, CVPR’98, 1998, pp. 601–608.
1325. Hideo Tanaka and Hisao Ishibuchi, Evidence theory of exponential possibility distri-
butions, International Journal of Approximate Reasoning 8 (1993), no. 2, 123 – 140.
1326. Hideo Tanaka, Kazutomi Sugihara, and Yutaka Maeda, Non-additive measures by in-
terval probability functions, Inf. Sci. Inf. Comput. Sci. 164 (2004), no. 1-4, 209–227.
1327. Y. Tang and J. Zheng, Dempster conditioning and conditional independence in evi-
dence theory, AI 2005: Advance in Artificial Intelligence, vol. 3809/2005, Springer
Berlin/Heidelberg, 2005, pp. 822–825.
1328. H. Tao, H.S. Sawhney, and R. Kumar, Dynamic layer representation with applications
to tracking, CVPR’00, vol. 2, 2000, pp. 134–141.
1329. A. Tchamova and J. Dezert, On the behavior of Dempster's rule of combination and
the foundations of Dempster-Shafer theory, 2012 6th IEEE International Conference
Intelligent Systems, Sept 2012, pp. 108–113.
1330. O. Teichmüller, p-Algebren, Deutsche Math. 1 (1936), 362–388.
1331. P. Teller, Conditionalization and observation, Synthese 26 (1973), no. 2, 218–258.
1332. D. Terzopoulos and D. Metaxas, Dynamic 3D models with local and global deforma-
tions: Deformable superquadrics, IEEE Trans. Pattern Analysis and Machine Intelli-
gence 13 (1991), 703–714.
1333. B. Tessem, Interval probability propagation, IJAR 7 (1992), 95–120.
1334. , Approximations for efficient computation in the theory of evidence, Artif.
Intell 61 (1993), no. 2, 315–329.
1335. Bjornar Tessem, Approximations for efficient computation in the theory of evidence,
Artificial Intelligence 61:2 (1993), 315–329.
1336. H. M. Thoma, Belief function computations, Conditional Logic in Expert Systems,
North Holland, 1991, pp. 269–308.
1337. Stelios C. Thomopoulos, Theories in distributed decision fusion: comparison and gen-
eralization, 1991, pp. 623–634.
1338. Sebastian Thrun, Wolfgang Burgard, and Dieter Fox, A probabilistic approach to con-
current mapping and localization for mobile robots, Autonomous Robots 5 (1998),
253–271.
1339. Tai-Peng Tian, Rui Li, and Stan Sclaroff, Articulated pose estimation in a learned
smooth space of feasible solutions, CVPR ’05: Proceedings of the 2005 IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition (CVPR’05) -
Workshops (Washington, DC, USA), IEEE Computer Society, 2005, p. 50.
1340. M. Tonko, K. Schafer, F. Heimes, and H.-H. Nagel, Towards visual servoed manip-
ulation of car engine parts, Proceedings of the IEEE International Conference on
Robotics and Automation ICRA’97, Albuquerque, NM, vol. 4, April 1997, pp. 3166–
3171.
1341. Bruce E. Tonn, An algorithmic approach to combining belief functions, International
Journal of Intelligent Systems 11 (1996), no. 7, 463–476.
1342. Vicenç Torra, A new combination function in evidence theory, International Journal of
Intelligent Systems 10 (1995), no. 12, 1021–1033.
1343. M. Troffaes, Decision making under uncertainty using imprecise probabilities, Inter-
national Journal of Approximate Reasoning 45 (2007), no. 1, 17–29.
1344. Elena Tsiporkova, Bernard De Baets, and Veselka Boeva, Dempster’s rule of condi-
tioning translated into modal logic, Fuzzy Sets and Systems 102 (1999), 317–383.
1345. , Evidence theory in multivalued models of modal logic, Journal of Applica-
tions of Nonclassical Logic (1999).
1346. Elena Tsiporkova, Veselka Boeva, and Bernard De Baets, Dempster-Shafer theory
framed in modal logic, International Journal of Approximate Reasoning 21 (1999),
157–175.
1347. , Dempster-Shafer theory framed in modal logic, International Journal of Ap-
proximate Reasoning 21 (1999), no. 2, 157–175.
1348. Simukai W. Utete, Billur Barshan, and Birsel Ayrulu, Voting as validation in robot
programming, International Journal of Robotics Research 18 (1999), 401–413.
1349. R. Vaillant and O. Faugeras, Using extremal boundaries for 3-d object modeling, IEEE
Trans. PAMI 14 (February 1992), 157–173.
1350. Vakili, Approximation of hints, Tech. report, Institute for Automation and Operation
Research, University of Fribourg, Switzerland, Tech. Report 209, 1993.
1351. P. Valin, P. Djiknavorian, and E. Bosse, A pragmatic approach for the use of Dempster-
Shafer theory in fusing realistic sensor data, Tech. report, DRDC-VALCARTIER-SL-
2010-457, Defence R&D Canada - Valcartier, 2010.
1352. B. L. van der Waerden, Moderne algebra, vol. 1, Springer-Verlag, Berlin, 1937.
1353. P. Vasseur, C. Pegard, E. Mouaddib, and L. Delahoche, Perceptual organization ap-
proach based on Dempster-Shafer theory, Pattern Recognition 32 (1999), 1449–1462.
1354. G. Verghese, K. Gale, and C.R. Dyer, Real-time, parallel motion tracking of three-
dimensional objects from spatiotemporal image sequences, Parallel Algorithms for
Machine Intelligence and Vision (Kumar et al., ed.), Springer-Verlag, 1990.
1355. Christian Viard-Gaudin, Pierre Michel Lallican, Philippe Binter, and Stefan Knerr, The
IRESTE On/Off (IRONOFF) dual handwriting database, International Conference on Docu-
ment Analysis and Recognition 0 (1999), 455–458.
1356. M. Vincze, M. Ayromlou, and W. Kubinger, An integrating framework for robust real-
time 3D object tracking, ICVS’99, 1999, pp. 135–150.
1357. J. von Neumann and O. Morgenstern, Theory of games and economic behavior,
Princeton University Press, 1944.
1358. F. Voorbraak, A computationally efficient approximation of Dempster-Shafer theory,
International Journal on Man-Machine Studies 30 (1989), 525–536.
1359. , On the justification of Dempster’s rule of combination, Artificial Intelligence
48 (1991), 171–197.
1360. Frans Voorbraak, On the justification of Dempster's rule of combination, Artificial In-
telligence 48 (1991), no. 2, 171 – 197.
1361. F. Voorbraak, A computationally efficient approximation of Dempster-Shafer theory,
International Journal on Man-Machine Studies 30 (1989), 525–536.
1362. S. Wachter and H.H. Nagel, Tracking persons in monocular image sequences, Work-
shop on Motion of Non-Rigid and Articulated Objects, Puerto Rico, USA, 1997.
1363. , Tracking persons in monocular image sequences, CVIU 74 (1999), 174–192.
1364. Peter P. Wakker, Dempster belief functions are based on the principle of complete ig-
norance, International Journal of Uncertainty, Fuzziness and Knowledge-Based Sys-
tems 08 (2000), no. 03, 271–284.
1365. , Dempster-belief functions are based on the principle of complete ignorance,
Proceedings of the 1st International Symposium on Imprecise Probabilities and Their
Applications, Ghent, Belgium, 29 June - 2 July 1999, pp. 535–542.
1366. P. Walley, Statistical reasoning with imprecise probabilities, Chapman and Hall, New
York, 1991.
1367. Peter Walley, Coherent lower (and upper) probabilities, Tech. report, University of
Warwick, Coventry (U.K.), Statistics Research Report 22, 1981.
1368. , The elicitation and aggregation of beliefs, Tech. report, University of War-
wick, Coventry (U.K.), 1982, Statistics Research Report 23.
1369. , The elicitation and aggregation of beliefs, Tech. report, University of War-
wick, Coventry (U.K.), Statistics Research Report 23, 1982.
1370. , Belief function representations of statistical evidence, The Annals of Statis-
tics 15 (1987), 1439–1465.
1371. , Statistical reasoning with imprecise probabilities, Chapman and Hall, Lon-
don, 1991.
1372. , Measures of uncertainty in expert systems, Artificial Intelligence 83 (1996),
1–58.
1373. , Imprecise probabilities, The Encyclopedia of Statistical Sciences (C. B.
Read, D. L. Banks, and S. Kotz, eds.), Wiley, New York (NY), 1997.
1374. , Towards a unified theory of imprecise probability, International Journal of
Approximate Reasoning 24 (2000), 125–148.
1375. Peter Walley and T. L. Fine, Towards a frequentist theory of upper and lower proba-
bility, The Annals of Statistics 10 (1982), 741–761.
1376. A. Wallner, Maximal number of vertices of polytopes defined by f-probabilities,
ISIPTA 2005 – Proceedings of the Fourth International Symposium on Imprecise
Probabilities and Their Applications (F. G. Cozman, R. Nau, and T. Seidenfeld, eds.),
SIPTA, 2005, pp. 126–139.
1377. C. C. Wang and H. S. Don, Evidential reasoning using neural networks, Proceedings
of the 1991 IEEE International Joint Conference on Neural Networks, vol. 1, Nov 1991,
pp. 497–502.
1378. Chua-Chin Wang and Hen-Son Don, A continuous belief function model for evidential
reasoning, Proceedings of the Ninth Biennial Conference of the Canadian Society for
Computational Studies of Intelligence (R.F. Glasgow, J.; Hadley, ed.), Vancouver, BC,
Canada, 11-15 May 1992, pp. 113–120.
1379. Chua-Chin Wang and Hon-Son Don, Evidential reasoning using neural networks, Pro-
ceedings of IEEE, 1991, pp. 497–502.
1380. , A geometrical approach to evidential reasoning, Proceedings of IEEE, 1991,
pp. 1847–1852.
1381. , The majority theorem of centralized multiple BAMs networks, Information
Sciences 110 (1998), 179–193.
1382. , A robust continuous model for evidential reasoning, Journal of Intelligent
and Robotic Systems: Theory and Applications 10:2 (June 1994), 147–171.
1383. J. Wang, G. Lorette, and P. Bouthemy, Analysis of human motion: A model-based
approach, Scandinavian Conference on Image Analysis, 1991.
1384. , Human motion analysis with detection of sub-part deformations, SPIE -
Biomedical Image Processing and Three-Dimensional Microscopy, 1992.
1385. P. Wang, The reliable combination rule of evidence in Dempster-Shafer theory, Proceedings
of the 2008 Congress on Image and Signal Processing (CISP '08), vol. 2, May 2008, pp. 166–170.
1386. Pei Wang, A defect in Dempster-Shafer theory, CoRR abs/1302.6849 (2013).
1387. S. Wang and M. Valtorta, On the exponential growth rate of Dempster-Shafer be-
lief functions, Proceedings of the SPIE - Applications of Artificial Intelligence X:
Knowledge-Based Systems, vol. 1707, Orlando, FL, USA, 22-24 April 1992, pp. 15–
24.
1388. Ying-Ming Wang, Jian-Bo Yang, Dong-Ling Xu, and Kwai-Sang Chin, On the com-
bination and normalization of interval-valued belief structures, Information Sciences
177 (2007), no. 5, 1230 – 1247, Including: The 3rd International Workshop on Com-
putational Intelligence in Economics and Finance (CIEF2003).
1389. Z. Wang and G.J. Klir, Fuzzy measure theory, New York: Plenum Press, 1992.
1390. Zhenyuan Wang and George J. Klir, Choquet integrals and natural extensions of lower
probabilities, International Journal of Approximate Reasoning 16 (1997), 137–147.
1391. , Choquet integrals and natural extensions of lower probabilities, International
Journal of Approximate Reasoning 16 (1997), no. 2, 137 – 147.
1392. Chua-Chin Wang and Hon-Son Don, A polar model for evidential reasoning, Infor-
mation Sciences 77:3-4 (March 1994), 195–226.
1393. L. A. Wasserman, Belief functions and statistical inference, Canadian Journal of
Statistics 18 (1990), 183–196.
1394. , Comments on Shafer's 'Perspectives on the theory and practice of belief func-
tions', International Journal of Approximate Reasoning 6 (1992), 367–375.
1395. L.A. Wasserman, Prior envelopes based on belief functions, Annals of Statistics 18
(1990), 454–464.
1396. J. Watada, Y. Kubo, and K. Kuroda, Logical approach to evidential reasoning under
a hierarchical structure, Proceedings of the International Conference on Data and
Knowledge Systems for Manufacturing and Engineering, vol. 1, Hong Kong, 2-4 May
1994, pp. 285–290.
1397. M. Weber, M. Welling, and P. Perona, Unsupervised learning of models for recog-
nition, Proc. of the 6th European Conference on Computer Vision, vol. 1, June/July
2000, pp. 18–32.
1398. , Unsupervised learning of models for recognition, Proc. of the 6th European
Conference on Computer Vision, vol. 1, June/July 2000, pp. 18–32.
1399. K. Weichselberger and S. Pohlmann, A methodology for uncertainty in knowledge-
based systems, Lecture Notes in Artificial Intelligence, vol. 419, Springer, Berlin,
1990.
1400. Kurt Weichselberger, Robust statistics, data analysis, and computer intensive meth-
ods: In honor of Peter Huber's 60th birthday, ch. Interval Probability on Finite Sample
Spaces, pp. 391–409, Springer New York, New York, NY, 1996.
1401. Kurt Weichselberger, The theory of interval-probability as a unifying concept for un-
certainty, International Journal of Approximate Reasoning 24 (2000), no. 2, 149 –
170.
1402. T. Weiler, Approximation of belief functions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 11 (2003), no. 6, 749–777.
1403. M. Wertheimer, Laws of organization in perceptual forms, A Sourcebook of Gestalt
Psychology (W. D. Ellis, ed.), Harcourt, Brace and Company, 1939, pp. 331–363.
1404. L. P. Wesley, Evidential knowledge-based computer vision, Optical Engineering 25
(1986), 363–379.
1405. Leonard P. Wesley, Autonomous locative reasoning: an evidential approach, Proceed-
ings of IEEE, 1993, pp. 700–707.
1406. H. Whitney, On the abstract properties of linear dependence, American Journal of
Mathematics 57 (1935), 509–533.
1407. M.J. Wierman, Measuring conflict in evidence theory, Proceedings of the Joint 9th
IFSA World Congress, Vancouver, BC, Canada, vol. 3, 2001, pp. 1741–1745.
1408. S. T. Wierzchon, A. Pacan, and M. A. Klopotek, An object-oriented representation
framework for hierarchical evidential reasoning, Proceedings of the Fourth Interna-
tional Conference (AIMSA '90) (P. Jorrand and V. Sgurev, eds.), Albena, Bulgaria, 19-22
September 1990, pp. 239–248.
1409. S.T. Wierzchon and M.A. Klopotek, Modified component valuations in valuation
based systems as a way to optimize query processing, Journal of Intelligent Infor-
mation Systems 9 (1997), 157–180.
1410. Elwood Wilkins and Simon H. Lavington, Belief functions and the possible worlds
paradigm, Journal of Logic and Computation 12 (2002), no. 3, 475–495.
1411. G. G. Wilkinson and J. Megier, Evidential reasoning in a pixel classification
hierarchy-a potential method for integrating image classifiers and expert system rules
based on geographic context, International Journal of Remote Sensing 11:10 (October
1990), 1963–1968.
1412. P. M. Williams, On a new theory of epistemic probability, British Journal for the Phi-
losophy of Science 29 (1978), 375–387.
1413. , Discussion of Shafer's paper, Journal of the Royal Statistical Society B 44
(1982), 322–352.
1414. A. D. Wilson and A. F. Bobick, Realtime online adaptive gesture recognition, Tech.
report, M.I.T. Media Laboratory, Tech. Rep. No. 505, 1999.
1415. , Parametric hidden Markov models for gesture recognition, IEEE Transactions
on Pattern Analysis and Machine Intelligence 21 (1999), no. 9, 884–900.
1416. N. Wilson, Justification, computational efficiency and generalisation of the Dempster-
Shafer theory, Tech. report, Research Report no. 15, Department of Computing and
Mathematical Sciences, Oxford Polytechnic, June 1989.
1417. Nic Wilson, Chapter 10: Belief functions algorithms, Algorithms for Uncertainty and
Defeasible Reasoning.
1418. , The combination of belief: when and how fast?, International Journal of Ap-
proximate Reasoning 6 (1992), 377–388.
1419. , How much do you believe?, International Journal of Approximate Reasoning
6 (1992), 345–365.
1420. Nic Wilson, Symbolic and quantitative approaches to reasoning and uncertainty: Eu-
ropean Conference ECSQARU '93, Granada, Spain, November 8–10, 1993, Proceedings,
ch. Default logic and Dempster-Shafer theory, pp. 372–379, Springer Berlin Heidel-
berg, Berlin, Heidelberg, 1993.
1421. Nic Wilson, The assumptions behind Dempster's rule, CoRR abs/1303.1518 (2013).
1422. , Rules, belief functions and default logic, CoRR abs/1304.1134 (2013).
1423. , The representation of prior knowledge in a Dempster-Shafer approach,
TR/Drums Conference, Blanes, 1991.
1424. , Decision making with belief functions and pignistic probabilities, Proceed-
ings of the European Conference on Symbolic and Quantitative Approaches to Rea-
soning and Uncertainty, Granada, 1993, pp. 364–371.
1425. Nic Wilson and S. Moral, Fast Markov chain algorithms for calculating Dempster-
Shafer belief, Proceedings of the 12th European Conference on Artificial Intelligence
(ECAI’96) (W. Wahlster, ed.), Budapest, Hungary, 11-16 Aug. 1996, pp. 672–676.
1426. S. Wong and P. Lingras, Generation of belief functions from qualitative preference
relations, Proceedings of the Third International Conference IPMU, 1990, pp. 427–
429.
1427. S. K. M. Wong and Pawan Lingras, Representation of qualitative user preference by
quantitative belief functions, IEEE Transactions on Knowledge and Data Engineering
6:1 (February 1994), 72–78.
1428. S. K. M. Wong, Pawan Lingras, and Y. Y. Yao, Propagation of preference relations in
qualitative inference networks, Proceedings of the 12th International Joint Conference
on Artificial Intelligence - Volume 2 (San Francisco, CA, USA), IJCAI’91, Morgan
Kaufmann Publishers Inc., 1991, pp. 1204–1209.
1429. S. K. M. Wong, L. S. Wang, and Y. Y. Yao, Interval structure: A framework for rep-
resenting uncertain information, Proceedings of the Eighth International Conference
on Uncertainty in Artificial Intelligence (San Francisco, CA, USA), UAI’92, Morgan
Kaufmann Publishers Inc., 1992, pp. 336–343.
1430. S. K. M. Wong, L. S. Wang, and Y. Y. Yao, Nonnumeric belief structures, Computing
and Information, 1992. Proceedings. ICCI ’92., Fourth International Conference on,
May 1992, pp. 274–277.
1431. S. K. M. Wong, Y. Y. Yao, P. Bollmann, and H. C. Burger, Axiomatization of qualita-
tive belief structure, IEEE Transactions on Systems, Man, and Cybernetics 21 (1990),
726–734.
1432. S.K.M. Wong, Y.Y. Yao, and P. Bollmann, Characterization of comparative belief
structures, International Journal of Man-Machine Studies 37 (1992), no. 1, 123 – 133.
1433. A.D. Worrall, G.D. Sullivan, and K.D. Baker, Pose refinement of active models using
forces in 3D, ECCV’94 (J. Eklundh, ed.), vol. 2, May 1994, pp. 341–352.
1434. C.R. Wren and A.P. Pentland, Dynaman: Recursive modeling of human motion, Tech.
report, TR-415, Medialab, MIT, 1997.
1435. , Dynamic models of human motion, Int. Conf. on Automatic Face and Gesture
Recognition, Nara, Japan, 1998.
1436. , Understanding purposeful human motion, International Workshop on Mod-
eling People at ICCV’99, Corfu, Greece, September 1999.
1437. J.J. Wu, R.E. Rink, T.M. Caelli, and V.G. Gourishankar, Recovery of the 3-d location
and motion of a rigid object through camera image, Int. J. Computer Vision 2 (1989),
373–394.
1438. W. Z. Wu, Y. Leung, and J. S. Mi, On generalized fuzzy belief functions in infinite
spaces, IEEE Transactions on Fuzzy Systems 17 (2009), no. 2, 385–397.
1439. Wei-Zhi Wu, Yee Leung, and Wen-Xiu Zhang, Connections between rough set theory
and Dempster-Shafer theory of evidence, International Journal of General Systems 31
(2002), no. 4, 405–430.
1440. Wei-Zhi Wu, Mei Zhang, Huai-Zu Li, and Ju-Sheng Mi, Knowledge reduction in ran-
dom information systems via Dempster-Shafer theory of evidence, Information Sciences
174 (2005), no. 3–4, 143–164.
1441. Weizhi Wu and Jusheng Mi, Rough sets and knowledge technology: First International
Conference, RSKT 2006, Chongqing, China, July 24-26, 2006, Proceedings, ch. Knowl-
edge Reduction in Incomplete Information Systems Based on Dempster-Shafer The-
ory of Evidence, pp. 254–261, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.
1442. Yong-Ge Wu, Jing-Yu Yang, Ke Liu, and Lei-Jian Liu, On the evidence inference the-
ory, Information Sciences 89 (1996), no. 3, 245–260.
1443. P. Wunsch, S. Winkler, and G. Hirzinger, Real-time pose estimation of 3D objects from
camera images using neural networks, ICRA’97, vol. 3, 1997, pp. 3232–3237.
1444. X. Wu, Q. Ye, and L. Liu, Dempster-Shafer theory of evidence based on improved BP and its
application, Journal of Wuhan University of Technology (2007).
1445. Yan Xia, S.S. Iyengar, and N.E. Brener, An event driven integration reasoning scheme
for handling dynamic threats in an unstructured environment, Artificial Intelligence
95 (1997), 169–186.
1446. Xin Guan, Xiao Yi, You He, and Xiaoming Sun, Efficient fusion approach for conflict-
ing evidence, Journal of Tsinghua University (Science and Technology) 1 (2009).
1447. Guoping Xu, Weifeng Tian, Li Qian, and Xiangfen Zhang, A novel conflict reassign-
ment method based on grey relational analysis (GRA), Pattern Recognition Letters 28
(2007), no. 15, 2080–2087.
1448. H. Xu, An efficient implementation of the belief function propagation, Proc. of the 7th
Uncertainty in Artificial Intelligence (Smets Ph., D'Ambrosio B. D., and Bonissone P.
P., eds.), 1991, pp. 425–432.
1449. , An efficient tool for reasoning with belief functions, Proc. of the 4th Inter-
national Conference on Information Processing and Management of Uncertainty in
Knowledge-Based Systems, 1992, pp. 65–68.
1450. , An efficient tool for reasoning with belief functions uncertainty in intelligent
systems, Advances in the Dempster-Shafer Theory of Evidence (Valverde L. Bouchon-
Meunier B. and Yager R. R., eds.), North-Holland: Elsevier Science, 1993, pp. 215–
224.
1451. , Computing marginals from the marginal representation in Markov trees,
Proc. of the 5th International Conference on Information Processing and Management
of Uncertainty in Knowledge-Based Systems, 1994, pp. 275–280.
1452. , Computing marginals from the marginal representation in Markov trees, Ar-
tificial Intelligence 74 (1995), 177–189.
1453. H. Xu and R. Kennes, Steps towards an efficient implementation of Dempster-
Shafer theory, Advances in the Dempster-Shafer Theory of Evidence (R.R. Yager,
M. Fedrizzi, and J. Kacprzyk, eds.), John Wiley and Sons, Inc., 1994, pp. 153–174.
1454. H. Xu and Philippe Smets, Evidential reasoning with conditional belief functions, Pro-
ceedings of the 10th Uncertainty in Artificial Intelligence (Lopez de Mantaras R. and
Poole D., eds.), 1994, pp. 598–605.
1455. , Generating explanations for evidential reasoning, Proceedings of the 11th
Uncertainty in Artificial Intelligence (Besnard Ph. and Hanks S., eds.), 1995, pp. 574–
581.
1456. , Reasoning in evidential networks with conditional belief functions, Interna-
tional Journal of Approximate Reasoning 14 (1996), 155–185.
1457. , Some strategies for explanations in evidential reasoning, IEEE Transactions
on Systems, Man and Cybernetics 26:5 (1996), 599–607.
1458. Hong Xu, A decision calculus for belief functions in valuation-based systems, Pro-
ceedings of the 8th Uncertainty in Artificial Intelligence (Dubois D., Wellman M. P.,
D'Ambrosio B., and Smets Ph., eds.), 1992, pp. 352–359.
1459. , Valuation-based systems for decision analysis using belief functions, Deci-
sion Support Systems 20 (1997), 165–184.
1460. Hong Xu, Yen-Teh Hsia, and Philippe Smets, The transferable belief model for deci-
sion making in the valuation-based systems, IEEE Transactions on Systems, Man, and
Cybernetics 26A (1996), 698–707.
1461. Hong Xu, Y.T. Hsia, and Philippe Smets, A belief-function based decision support
system, Proceedings of the 9th Uncertainty in Artificial Intelligence (Heckerman D.
and Mamdani A, eds.), 1993, pp. 535–542.
1462. , Transferable belief model for decision making in valuation based systems,
IEEE Transactions on Systems, Man, and Cybernetics 26:6 (1996), 698–707.
1463. Hong Xu and Philippe Smets, Reasoning in evidential networks with conditional belief
functions, International Journal of Approximate Reasoning 14 (1996), no. 2-3, 155–
185.
1464. L. Xu, A. Krzyzak, and C.Y. Suen, Methods of combining multiple classifiers and their
applications to handwriting recognition, IEEE Transactions on Systems, Man, and Cy-
bernetics 22 (1992), no. 3, 418–435.
1465. Y. Guo, G. Xu, and S. Tsuji, Tracking human body motion based on a stick figure model,
Journal of Visual Communication and Image Representation 5 (1994), 1–9.
1466. Y. Hel-Or and M. Werman, Pose estimation by fusing noisy data of different dimensions,
IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1995), 195–201.
1467. Y. M. Xiong and Z. P. Yang, A novel combination method of conflict evidence in multi-
sensor target recognition, Advanced Materials Research 143–144 (2011), 920–924.
1468. Y. Yacoob and M. J. Black, Parameterized modeling and recognition of activities,
Computer Vision and Image Understanding, vol. 73(2), 1999, pp. 232–247.
1469. Y. Yacoob and L. Davis, Learned temporal models of image motion, ICCV’98, 1998,
pp. 446–453.
1470. R. R. Yager, On the Dempster-Shafer framework and new combination rules, Informa-
tion Sciences 41 (1987), 93–138.
1471. R. R. Yager, Machine learning and uncertain reasoning, Academic Press Ltd., Lon-
don, UK, 1990, pp. 291–311.
1472. R. R. Yager, On the aggregation of prioritized belief structures, IEEE Transactions
on Systems, Man, and Cybernetics - Part A: Systems and Humans 26 (1996), no. 6,
708–717.
1473. Ronald R. Yager, Decision making under Dempster-Shafer uncertainties, Tech. report,
Machine Intelligence Institute, Iona College, Tech. Report MII-915.
1474. , Nonmonotonicity and compatibility relations in belief structures.
1475. , On a general class of fuzzy connectives, Fuzzy Sets and Systems 4 (1980),
no. 3, 235 – 242.
1476. , Generalized probabilities of fuzzy events from fuzzy belief structures, Infor-
mation Sciences 28 (1982), no. 1, 45 – 62.
1477. , Entropy and specificity in a mathematical theory of evidence, International
Journal of General Systems 9 (1983), 249–260.
1478. , Hedging in the combination of evidence, Journal of Information and Opti-
mization Sciences 4 (1983), no. 1, 73–81.
1479. , On the relationship of methods of aggregating evidence in expert systems,
Cybernetics and Systems 16 (1985), no. 1, 1–21.
1480. , Arithmetic and other operations on Dempster-Shafer structures, International
Journal of Man-Machine Studies 25 (1986), 357–366.
1481. , The entailment principle for Dempster-Shafer granules, International Journal of
Intelligent Systems 1 (1986), 247–262.
1482. Ronald R. Yager, The entailment principle for Dempster-Shafer granules, Interna-
tional Journal of Intelligent Systems 1 (1986), no. 4, 247–262.
1483. , Toward a general theory of reasoning with uncertainty. Part I: Nonspecificity and
fuzziness, International Journal of Intelligent Systems 1 (1986), no. 1, 45–67.
1484. Ronald R. Yager, Toward a general theory of reasoning with uncertainty. Part II: Prob-
ability, International Journal of Man-Machine Studies 25 (1986), no. 6, 613–631.
1485. , On the Dempster-Shafer framework and new combination rules, Information
Sciences 41 (1987), no. 2, 93 – 137.
1486. Ronald R. Yager, Quasi-associative operations in the combination of evidence,
Kybernetes 16 (1987), no. 1, 37–41.
1487. Ronald R. Yager, On the normalization of fuzzy belief structures, International Journal
of Approximate Reasoning 14 (1996), 127–153.
1488. , Class of fuzzy measures generated from a Dempster-Shafer belief structure,
International Journal of Intelligent Systems 14 (1999), 1239–1247.
1489. Ronald R. Yager, A class of fuzzy measures generated from a Dempster-Shafer belief
structure, International Journal of Intelligent Systems 14 (1999), no. 12, 1239–1247.
1490. Ronald R. Yager, Modeling uncertainty using partial information, Information Sci-
ences 121 (1999), 271–294.
1491. Ronald R. Yager, Dempster-Shafer belief structures with interval valued focal weights,
International Journal of Intelligent Systems 16 (2001), no. 4, 497–512.
1492. Ronald R. Yager, Aggregating non-independent Dempster-Shafer belief structures, Pro-
ceedings of the 12th International Conference on Information Processing and Management
of Uncertainty in Knowledge-Based Systems (IPMU 2008), 2008, pp. 289–297.
1493. , Comparing approximate reasoning and probabilistic reasoning using the
Dempster-Shafer framework, International Journal of Approximate Reasoning 50
(2009), no. 5, 812–821.
1494. Ronald R. Yager, Joint cumulative distribution functions for Dempster-Shafer belief
structures using copulas, Fuzzy Optimization and Decision Making 12 (2013), no. 4,
393–414.
1495. Ronald R. Yager and D. P. Filev, Including probabilistic uncertainty in fuzzy logic
controller modeling using Dempster-Shafer theory, IEEE Transactions on Systems,
Man, and Cybernetics 25:8 (1995), 1221–1230.
1496. Ronald R. Yager and Liping Liu, Classic works of the Dempster-Shafer theory of belief
functions, 1st ed., Springer Publishing Company, Incorporated, 2010.
1497. R.R. Yager, On the Dempster-Shafer framework and new combination rules, Informa-
tion Sciences 41 (1987), 93–138.
1498. A. Ben Yaghlane, T. Denoeux, and K. Mellouli, Coarsening approximations of belief
functions, Proceedings of ECSQARU’2001 (S. Benferhat and P. Besnard, eds.), 2001,
pp. 362–373.
1499. B. Ben Yaghlane and K. Mellouli, Belief function propagation in directed evidential
networks, IPMU, 2006.
1500. B. Ben Yaghlane, Philippe Smets, and K. Mellouli, Independence concepts for be-
lief functions, Proceedings of Information Processing and Management of Uncertainty
(IPMU’2000), 2000.
1501. Koichi Yamada, A new combination of evidence based on compromise, Fuzzy Sets and
Systems 159 (2008), no. 13, 1689 – 1708.
1502. M. Yamamoto and K. Koshikawa, Human motion analysis based on a robot arm
model, CVPR’91, 1991, pp. 664–665.
1503. M. Yamamoto, Y. Ohta, T. Yamagiwa, and K. Yamanaka, Human action tracking
guided by key-frames, Fourth Int. Conf. on Automatic Face and Gesture Recognition,
Grenoble, France, March 2000.
1504. M. Yamamoto, A. Sato, S. Kawada, T. Kondo, and Y. Osaki, Incremental tracking
of human actions from multiple views, Proceedings of the Conference on Computer
Vision and Pattern Recognition CVPR’98, Santa Barbara, CA, June 1998, pp. 2–7.
1505. S. Yamamoto, Y. Mae, Y. Shirai, and J. Miura, Realtime multiple object tracking based
on optical flows, Proc. Robotics and Automation, vol. 3, 1995, pp. 2328–2333.
1506. Jian-Bo Yang and Madan G. Singh, An evidential reasoning approach for multiple-
attribute decision making with uncertainty, IEEE Transactions on Systems, Man, and
Cybernetics 24:1 (January 1994), 1–18.
1507. Jian-Bo Yang and Dong-Ling Xu, Evidential reasoning rule for evidence combination,
Artificial Intelligence 205 (2013), 1 – 29.
1508. Miin-Shen Yang, Tsang-Chih Chen, and Kuo-Lung Wu, Generalized belief function,
plausibility function, and Dempster's combinational rule to fuzzy sets, International
Journal of Intelligent Systems 18 (2003), no. 8, 925–937.
1509. Y. Y. Yao, Two views of the theory of rough sets in finite universes, International Jour-
nal of Approximate Reasoning 15 (1996), 291–317.
1510. , A comparative study of fuzzy sets and rough sets, Information Sciences
109(1-4) (1998), 227–242.
1511. , Granular computing: basic issues and possible solutions, Proceedings of the
5th Joint Conference on Information Sciences, 2000, pp. 186–189.
1512. Y. Y. Yao and P. J. Lingras, Interpretations of belief functions in the theory of rough
sets, Information Sciences 104(1-2) (1998), 81–106.
1513. Yan-Qing Yao, Ju-Sheng Mi, and Zhou-Jun Li, Attribute reduction based on gener-
alized fuzzy evidence theory in fuzzy decision systems, Fuzzy Sets and Systems 170
(2011), no. 1, 64 – 75, Theme: Information processing.
1514. Yiyu (Y. Y.) Yao, Churn-Jung Liau, and Ning Zhong, Foundations of intelligent sys-
tems: 14th International Symposium, ISMIS 2003, Maebashi City, Japan, October 28-31,
2003, Proceedings, ch. Granular Computing Based on Rough Sets, Quotient Space
Theory, and Belief Functions, pp. 152–159, Springer Berlin Heidelberg, Berlin, Hei-
delberg, 2003.
1515. Y.Y. Yao and P.J. Lingras, Interpretations of belief functions in the theory of rough
sets, Information Sciences 104 (1998), no. 1–2, 81–106.
1516. J. Yen, GERTIS: a Dempster-Shafer approach to diagnosing hierarchical hypotheses,
Communications ACM 32 (1989), 573–585.
1517. , Generalizing the Dempster-Shafer theory to fuzzy sets, IEEE Transactions on
Systems, Man, and Cybernetics 20:3 (1990), 559–569.
1518. John Yen, A reasoning model based on an extended Dempster-Shafer theory, Pro-
ceedings of the Fifth AAAI National Conference on Artificial Intelligence, AAAI'86,
AAAI Press, 1986, pp. 125–131.
1519. John Yen, Computing generalized belief functions for continuous fuzzy sets, Interna-
tional Journal of Approximate Reasoning 6 (1992), 1–31.
1520. , Can evidence be combined in the Dempster-Shafer theory, CoRR
abs/1304.2718 (2013).
1521. , Implementing evidential reasoning in expert systems, CoRR abs/1304.2731
(2013).
1522. Lu Yi, Evidential reasoning in a multiple classifier system, Proceedings of the Sixth In-
ternational Conference on Industrial and Engineering Applications of Artificial Intel-
ligence and Expert Systems (IEA/AIE 93) (P.W.H. Chung, G. Lovegrove, and M. Ali,
eds.), Edinburgh, UK, 1-4 June 1993, pp. 476–479.
1523. Deng Yong, Shi WenKang, Zhu ZhenFu, and Liu Qi, Combining belief functions based
on distance of evidence, Decision Support Systems 38 (2004), no. 3, 489 – 493.
1524. Virginia R. Young, Families of update rules for non-additive measures: Applications
in pricing risks, Insurance: Mathematics and Economics 23 (1998), no. 1, 1 – 14.
1525. Virginia R. Young and Shaun S. Wang, Updating non-additive measures with fuzzy
information, Fuzzy Sets and Systems 94 (1998), 355–366.
1526. Chunhai Yu and Fahard Arasta, On conditional belief functions, International Journal
of Approximate Reasoning 10 (1994), 155–172.
1527. C. Yujun, The evidence aggregation method in the theory of evidence, Journal of Xi'an
Jiaotong University 31 (1997), no. 6, 106–110.
1528. Z. Liu, Y. Cheng, Q. Pan, and Z. Miao, Combination of weighted belief functions based
on evidence distance and conflicting belief, Control Theory & Applications 26 (2009).
1529. Z. Liu, Y. Cheng, Q. Pan, and Z. Miao, Weight evidence combination for multi-sensor conflict
information, Chinese Journal of Sensors and Actuators (2009), 366–370.
1530. L. Zadeh, A simple view of the Dempster-Shafer theory of evidence and its implication
for the rule of combination, AI Magazine 7 (1986), no. 2, 85–90.
1531. L. A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems
1 (1978), 3–28.
1532. , On the validity of Dempster's rule of combination of evidence, Tech. report,
Memo No. ERL M79/24, U. of California, Berkeley, 1979.
1533. , A mathematical theory of evidence (book review), AI Magazine 5:3 (1984),
81–83.
1534. L. A. Zadeh, Syllogistic reasoning as a basis for combination of evidence in expert sys-
tems, Proceedings of the 9th International Joint Conference on Artificial Intelligence
- Volume 1 (San Francisco, CA, USA), IJCAI’85, Morgan Kaufmann Publishers Inc.,
1985, pp. 417–419.
1535. L. A. Zadeh, Is probability theory sufficient for dealing with uncertainty in AI: a neg-
ative view, Uncertainty in Artificial Intelligence (L. N. Kanal and J. F. Lemmer, eds.),
vol. 2, North-Holland, Amsterdam, 1986, pp. 103–116.
1536. Lotfi A. Zadeh, A simple view of the Dempster-Shafer theory of evidence and its im-
plications for the rule of combination, AI Magazine 7:2 (1986), 85–90.
1537. Lotfi A. Zadeh, Soft methods in probability, statistics and data analysis, ch. Toward
a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities,
pp. 27–61, Physica-Verlag HD, Heidelberg, 2002.
1538. Lotfi A. Zadeh, Toward a generalized theory of uncertainty (GTU): an outline, Informa-
tion Sciences 172 (2005), no. 1–2, 1–40.
1539. , Toward a generalized theory of uncertainty (GTU): an outline, Information Sci-
ences 172 (2005), no. 1–2, 1–40.
1540. , Generalized theory of uncertainty (GTU): principal concepts and ideas, Com-
putational Statistics & Data Analysis 51 (2006), no. 1, 15–46, The Fuzzy Approach to
Statistical Analysis.
1541. , Generalized theory of uncertainty (GTU): principal concepts and ideas, Com-
putational Statistics & Data Analysis 51 (2006), no. 1, 15–46, The Fuzzy Approach to
Statistical Analysis.
1542. Marco Zaffalon and Enrico Fagiuoli, Tree-based credal networks for classification.
1543. D.K. Zarley, An evidential reasoning system, Tech. report, No.206, University of
Kansas, 1988.
1544. D.K. Zarley, Y.T. Hsia, and Glenn Shafer, Evidential reasoning using DELIEF, Proc.
Seventh National Conference on Artificial Intelligence, vol. 1, 1988, pp. 205–209.
1545. Bernard P. Zeigler, Some properties of modified Dempster-Shafer operators in
rule based inference systems, International Journal of General Systems 14 (1988),
no. 4, 345–356.
1546. J. Zhang and C. Liu, Dempster-Shafer inference with weak beliefs, Statistica Sinica
(2010).
1547. Mei Zhang, Li Da Xu, Wen-Xiu Zhang, and Huai-Zu Li, A rough set approach to
knowledge reduction based on inclusion degree and evidence reasoning theory, Expert
Systems 20 (2003), no. 5, 298–304.
1548. Shan-ying Zhang, Quan Pan, and Hong-cai Zhang, Conflict problem of Dempster-
Shafer evidence theory, Acta Aeronautica et Astronautica Sinica 4 (2001).
1549. J. Zhao, Moving posture reconstruction from perspective projections of jointed figure
motion, PhD dissertation, University of Pennsylvania, 1993.
1550. J.Y. Zheng, Acquiring 3-d models from sequences of contours, IEEE Trans. PAMI 16
(1994), 163–178.
1551. J.Y. Zheng and S. Suezaki, A model based approach in extracting and generating
human motion, International Conference on Pattern Recognition, 1998.
1552. Y. Zheng, X.S. Zhou, B. Georgescu, S.K. Zhou, and D. Comaniciu, Example based
non-rigid shape detection, 2006, pp. IV: 423–436.
1553. Chunlai Zhou, Belief functions on distributive lattices, Proceedings of AAAI 2012,
2012, pp. 1968–1974.
1554. , Belief functions on distributive lattices, Artificial Intelligence 201 (2013), 1
– 31.
1555. Chunlai Zhou, J. Michael Dunn on Information Based Logics, ch. Logical Foundations
of Evidential Reasoning with Contradictory Information, pp. 213–246, Springer Inter-
national Publishing, Cham, 2016.
1556. Hongwei Zhu and Otman Basir, Extended discounting scheme for evidential reasoning
as applied to MS lesion detection, Proceedings of the 7th International Conference on
Information Fusion, FUSION 2004 (Per Svensson and Johan Schubert, eds.), 2004,
pp. 280–287.
1557. Qing Zhu and E. S. Lee, Dempster-Shafer approach in propositional logic, Interna-
tional Journal of Intelligent Systems 8 (1993), no. 3, 341–349.
1558. Yunmin Zhu and X. Rong Li, Extended Dempster-Shafer combination rules based on
random set theory, 2004, pp. 112–120.
1559. L. M. Zouhal and Thierry Denoeux, An adaptive k-nn rule based on Dempster-Shafer
theory, Proceedings of the 6th International Conference on Computer Analysis of Im-
ages and Patterns (CAIP'95) (V. Hlavac and R. Sara, eds.), Prague, Czech Republic, 6-8
Sept. 1995, pp. 310–317.
1560. Lalla Meriem Zouhal and Thierry Denoeux, Evidence-theoretic k-nn rule with pa-
rameter optimization, IEEE Transactions on Systems, Man and Cybernetics Part C:
Applications and Reviews 28 (1998), 263–271.