The Geometry of Uncertainty
Introduction: Theories of Uncertainty
1
1.1 Uncertainty
Uncertainty is of paramount importance in artificial intelligence, applied science,
and many other areas of human endeavour. Whilst each and every one of us possesses some intuitive grasp of what uncertainty is, providing a formal definition can be elusive. Uncertainty can be understood as a lack of information about an issue of
interest for a certain agent (e.g., a human decision maker or a machine), a condition
of limited knowledge in which it is impossible to exactly describe the state of the
world or its future trajectories.
According to Dennis Lindley [?], for instance:
“There are some things that you know to be true, and others that you know to be
false; yet, despite this extensive knowledge that you have, there remain many things
whose truth or falsity is not known to you. We say that you are uncertain about them.
You are uncertain, to varying degrees, about everything in the future; much of the
past is hidden from you; and there is a lot of the present about which you do not
have full information. Uncertainty is everywhere and you cannot escape from it”.
... a fundamental uncertainty about the laws themselves which govern the variations. Following on with our example, suppose the player is presented with ten different doors, each leading to a room containing a roulette wheel modelled by a different probability distribution. They will then be uncertain about the very game they are supposed to play. How will this affect their betting behaviour, for instance?
Uncertainty of the second kind is often called Knightian uncertainty [?], after the Chicago economist Frank Knight, who distinguished 'risk' from 'uncertainty' as follows:
“Uncertainty must be taken in a sense radically distinct from the familiar notion
of risk, from which it has never been properly separated.... The essential fact is that
‘risk’ means in some cases a quantity susceptible of measurement, while at other
times it is something distinctly not of this character; and there are far-reaching and
crucial differences in the bearings of the phenomena depending on which of the two
is really present and operating.... It will appear that a measurable uncertainty, or
‘risk’ proper, as we shall use the term, is so far different from an unmeasurable one
that it is not in effect an uncertainty at all.”
In Knight's terms, 'risk' is what people normally call probability or chance, while
the term ‘uncertainty’ is reserved for second-order uncertainty.
Second-order uncertainty has consequences for human behaviour: people are empirically averse to unpredictable variations (as highlighted by Ellsberg's paradox [?]).
This difference between predictable and unpredictable variation is one of the
fundamental issues in the philosophy of probability, and is sometimes referred to as the distinction between common-cause and special-cause variation. Different interpretations
of probability treat these two aspects of uncertainty in different ways. Economists
John Maynard Keynes [?] and G. L. S. Shackle have also contributed to this debate.
Formally [?], let Ω be the sample space, and let 2Ω ≐ {A ⊆ Ω} denote its power set. A subset F ⊆ 2Ω is called a σ-algebra if it contains Ω itself and is closed under complementation and countable union.
Fig. 1.1. A spinning wheel is a physical mechanism whose outcomes are associated with a
(discrete) probability measure.
A random variable is a mapping X from the sample space Ω to (usually) the real line R: see Figure 1.2 (left) for an illustration of the random variable associated with a die.
The function X : Ω → R is subject to a condition of measurability: roughly speaking, each interval of values of the real line must have a pre-image which belongs to the σ-algebra F, and which therefore has a probability value. In this way we have a means of assigning probability values to sets of real numbers.
Even assuming that (some form of mathematical) probability is inherent to the phys-
ical world, people cannot agree on what it is. Quoting Savage [?]:
likelihood principle. The validity of such an assumption is still hotly debated [].
Given a parametric model {f(.|θ), θ ∈ Θ}, a family of probability distributions of the data given a (possibly vector-valued) parameter θ, the maximum likelihood estimate of θ is defined as:
θ̂MLE ∈ arg max_{θ∈Θ} L(θ; x1, . . . , xn),
where the likelihood of the parameter given the observed data x1, . . . , xn is L(θ; x1, . . . , xn) = f(x1, . . . , xn|θ), the probability of the observed data under parameter value θ.
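As a concrete illustration (not part of the original text), the following minimal Python sketch computes the MLE of a hypothetical Bernoulli sample, both by brute-force maximisation of the likelihood above and via the closed-form estimate k/n:

```python
import numpy as np

# Hypothetical i.i.d. Bernoulli sample (1 = success, 0 = failure).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def likelihood(theta, x):
    """L(theta; x_1,...,x_n) = prod_i f(x_i | theta) for a Bernoulli model."""
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

# Numerical maximisation over a grid of candidate parameter values.
grid = np.linspace(0.001, 0.999, 999)
theta_mle_numeric = grid[np.argmax([likelihood(t, x) for t in grid])]

# Closed-form MLE for the Bernoulli model: the sample mean k/n.
theta_mle_exact = x.mean()

print(theta_mle_numeric, theta_mle_exact)  # both equal to 0.7 for this sample
```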
1.3.4 Propensity
The posterior distribution is then the distribution of the parameter(s) after taking
into account the observed data, as determined by Bayes’ rule:
p(θ|X, α) = p(X|θ) p(θ|α) / p(X|α) ∝ p(X|θ) p(θ|α). (1.2)
The posterior predictive distribution is the distribution of a new data point x̃,
marginalized over the posterior:
p(x̃|X, α) = ∫θ p(x̃|θ) p(θ|X, α) dθ,
amounting to a distribution over possible new data values. The prior predictive dis-
tribution, instead, is the distribution of a new data point marginalized over the prior:
p(x̃|α) = ∫θ p(x̃|θ) p(θ|α) dθ.
The maximum a posteriori (MAP) estimate of the parameter is then:
θ̂MAP(x) = arg maxθ [ p(x|θ) p(θ) / ∫ϑ p(x|ϑ) p(ϑ) dϑ ] = arg maxθ p(x|θ) p(θ).
Note that the MAP and MLE estimates coincide when the prior is uniform. MAP estimation is not very representative of Bayesian methods, as the latter are characterized by the use of full distributions over the parameters to draw inferences. Also, unlike ML estimators, the MAP estimate is not invariant under reparameterisation.
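To make the comparison concrete, here is a small sketch (hypothetical data, with a Beta prior as an illustrative assumption not made in the text) contrasting the MLE with MAP estimates under a uniform Beta(1,1) prior and an informative Beta(5,5) prior; with the uniform prior the two estimates coincide, as noted above.

```python
# Hypothetical Bernoulli data: k successes out of n trials.
n, k = 10, 7

def map_estimate(alpha, beta):
    """Mode of the Beta(alpha + k, beta + n - k) posterior (assumes both posterior parameters > 1)."""
    return (alpha + k - 1) / (alpha + beta + n - 2)

theta_mle = k / n                        # 0.7
theta_map_uniform = map_estimate(1, 1)   # uniform prior: coincides with the MLE (0.7)
theta_map_informative = map_estimate(5, 5)  # informative prior pulls the estimate towards 0.5

print(theta_mle, theta_map_uniform, theta_map_informative)  # 0.7, 0.7, ~0.611
```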
Summarising, in frequentist inference unknown parameters are often, but not al-
ways, treated as having fixed but unknown values that are not capable of being
treated as random variates. Bayesian inference allows, instead, probabilities to be
associated with unknown parameters. The frequentist approach does not depend on
a subjective prior that may vary from one investigator to another. However, Bayesian
inference (e.g. Bayes’ rule) can be used by frequentists4 .
The issues with Bayesian reasoning Bayesian reasoning is also flawed in a number of ways. It is extremely bad at representing ignorance: Fisher's uninformative priors, the common way of handling ignorance in a Bayesian setting, lead to different results under different reparameterisations of the universe of discourse. Bayes' rule assumes that the new evidence comes in the form of certainty ('A is true'): in the real world, this is often not the case. Finally, model selection is troublesome in Bayesian statistics: whilst one is forced by the mathematical formalism to
pick a prior distribution, there is no clear-cut criterion on how to pick it. In the Au-
thor’s view, this is the result of a confusion between the original description of a
person’s subjective system of beliefs and the way it is updated, and the ‘objectivist’
view of Bayesian reasoning as a rigorous procedure for updating probabilities when
presented with new information.
Indeed, Bayesian reasoning requires modelling both the data and a prior. Human beings do have 'priors', which is just a word for denoting what they have learned (or think they have learned) about the world throughout their existence. In particular, they have well-sedimented beliefs about the likelihood of various (if not all) events. There is no need to 'pick' a prior, for prior (accumulated) knowledge is indeed there. As soon as we idealise this mechanism because, say, we want a machine to reason in this way, we find ourselves forced to 'pick' a prior for an entity (an algorithm) which does not have any past experience, and has not sedimented any beliefs as a result. Bayesians content themselves with claiming that all will be fine in the end as, asymptotically, the choice of the prior does not matter, as proven by the Bernstein-von Mises theorem [?].
The frequentist approach, for its part, is inherently unable to describe pure data without making additional assumptions about the data-generating process. However, in Nature one cannot 'design' the process which produces the data: data come our way, whether we want them or not. In the frequentist terminology, we cannot set the 'stopping rules' in most applications (think of driverless cars, for instance). Again, the frequentist setting recalls the old image of a scientist 'analysing' (from the Greek terms 'ana' and 'lysis', breaking up) a specific aspect of the world in their constrained laboratory.
Even more strikingly, it is well known that the same data can lead to opposite
conclusions when analysed in a frequentist way. The reason is that different random
experiments can lead to the same data, whereas the parametric model employed (the
family of probability distributions which is assumed to produce the data) is linked
to a specific experiment6 . Apparently, however, frequentists are just fine with this.
“It is sometimes said, in defence of the Bayesian concept, that the choice of
prior distribution is unimportant in practice, because it hardly influences the pos-
terior distribution at all when there are moderate amounts of data. The less said
about this ‘defence’ the better.”
'Uninformative' priors can be dangerous: they can bias the reasoning process so badly that it recovers only asymptotically7.
On the other hand, reasoning with belief functions does not require any prior: belief functions encoding the data are simply combined as they are, with no need for priors. Ignorance is naturally represented in belief function theory by the 'vacuous' belief function, which assigns mass 1 to the whole hypothesis space.
A die (Figure 1.2) is a simple example of a (discrete) random variable. Its probability space is defined on the sample space Ω = {face 1, face 2, . . . , face 6}, whose elements are mapped to the real numbers 1, 2, . . . , 6, respectively (there is no need to consider measurability here).
Now, imagine that faces 1 and 4 are cloaked, and we roll the die. How do we model this new experiment, mathematically? Actually, the probability space has not changed (as the physical die has not been altered, its faces still have the same probabilities). What has changed is the mapping: since we cannot observe the outcome when a cloaked face is shown (we assume that only the top face is observable), both face 1 and face 4 (as elements of Ω) are mapped to the set of possible values {1, 4}. Mathematically, this is called a random set [?, ?, ?], i.e., a set-valued random variable.
7
http://andrewgelman.com/2013/11/21/hidden-dangers-noninformative-priors/
Fig. 1.2. Left: the random variable associated with a die. Right: the random set (set-valued random variable) associated with the cloaked die, in which faces 1 and 4 are not visible.
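A minimal Python sketch (not from the text) of the cloaked die of Figure 1.2 as a set-valued mapping: each outcome of the underlying probability space is mapped to the set of values compatible with what can actually be observed, and the probabilities of the outcomes are transferred to those sets.

```python
from fractions import Fraction

# Underlying probability space: a fair die.
p = {face: Fraction(1, 6) for face in range(1, 7)}

# Set-valued mapping: faces 1 and 4 are cloaked, so observing either of them
# only tells us that the outcome lies in {1, 4}; the other faces map to singletons.
gamma = {face: frozenset({1, 4}) if face in (1, 4) else frozenset({face})
         for face in range(1, 7)}

# The induced mass assignment over sets of outcomes.
mass = {}
for face, subset in gamma.items():
    mass[subset] = mass.get(subset, Fraction(0)) + p[face]

print({tuple(sorted(s)): m for s, m in mass.items()})
# {(1, 4): 1/3, (2,): 1/6, (3,): 1/6, (5,): 1/6, (6,): 1/6}
```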
A more realistic scenario is one in which we roll, say, four dice in such a way that for some of them the top face is occluded, but some of the side faces are still visible, providing information about the outcome. For instance, suppose I can see the top faces of the Red, Green and Purple dice but not that of the Blue die; however, I can see two of the side faces of the Blue die, so that the outcome of Blue is constrained to the set {2, 4, 5, 6}.
This is just an example of a very common situation called missing data: for part of the sample I observe in order to make my inference, the data are partially or totally missing. Missing data appears (or disappears?) everywhere in science and engineering. In computer vision, for instance, this phenomenon is called 'occlusion' and is one of the main nuisance factors in estimation.
The bottom line is that, whenever data are missing, observations are inherently set-valued. Mathematically, we are not sampling a (scalar) random variable but a set-valued random variable, i.e., a random set. If my outcomes are sets, my probability distribution has to be defined over sets.
By contrast, traditional statistical approaches deal with missing data by: deletion (discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results); single imputation (replacing a missing value with another one, e.g. taken from a randomly selected similar record in the same dataset, with the mean of that variable for all other cases, or by means of a stochastic regression model); or multiple imputation (averaging the outcomes across multiple imputed data sets obtained using, for instance, stochastic regression). Multiple imputation involves drawing values of the parameters from a posterior distribution, therefore simulating both the process generating the data and the uncertainty associated with the parameters of the probability distribution of the data.
When using random sets, there is no need for imputation or deletion whatsoever.
All observations are set-valued; some of them just happen to be pointwise. Indeed,
when part of the data used to estimate the desired probability distribution is missing,
the resulting constraint is a credal set [837] of the type associated with a belief
function [?].
Kolmogorov's probability measures, however, are not the only or the most general type of measure available for sets. Under a minimal requirement of monotonicity, any measure can potentially be suitable for describing the probabilities of events: such objects are called capacities. We will study capacities in more detail in Chapter ??.
For the moment, it suffices to note that random sets are capacities, those for which
the numbers assigned to events are given by a probability distribution. As capacities
(and random sets in particular), belief functions therefore allow us to assign mass
directly to propositions.
The current debate on the likelihood of biological life in the universe is an extreme example of inference from very scarce data. How likely is it for a planet to give birth to life forms? Modern analysis of planetary habitability is largely an extrapolation of conditions on Earth and of the characteristics of the Solar System: a weak form of the old anthropic principle, so to speak.
What people seem to do is model perfectly the (presumed) causes of the emergence of life on Earth: the planet needs to circle a G-class star, in the right galactic neighbourhood; it needs to lie in a certain habitable zone around its star; it should have a large moon to deflect hazardous impact events, and so on. The question arises: how much can one learn from a single example? Moreover, how certain can one be about what has been learned from very few examples?
Another example is provided by the field of machine learning, which is about designing algorithms that can learn from what they observe. The problem is that machine learning algorithms are typically trained on ridiculously small amounts of data, compared to the wealth of information truly contained in the real world8. For instance, action recognition tools are trained (and tested) on benchmark datasets that contain, at best, a few tens of thousands of videos – compare that to the billions of videos one can access on YouTube. How can we make sure they learn the right lesson? Should they not aim to work with sets of models rather than precise ones?
assume (‘design’) probability distributions for their p-values, and test their hypothe-
ses on them. By contrast, those who do estimate probability distributions from the
data (the Bayesians) do not think of probabilities as infinite accumulations of evi-
dence, but as degrees of belief, and content themselves with being able to model the
likelihood function of the data.
What is true is that both frequentists and Bayesians seem to be happy with solving
their problems ‘asymptotically’: thanks to the limit properties of maximum like-
lihood estimation, and the Bernstein-von Mises theorem’s guarantees on the limit
behaviour of posterior distributions.
Clearly this does not fit at all with novel applications of AI, for instance, in which
machines need to make decisions on the spot to the best of their abilities.
Logistic regression Actually, frequentists do estimate probabilities from scarce
data when they do stochastic regression.
Logistic regression allows us, given a sample Y = {Y1 , ..., Yn }, X = {x1 , ..., xn }
where Yi ∈ {0, 1} is a binary outcome at time i and xi is the corresponding mea-
surement, to learn the parameters of a conditional probability relation between the
two, of the form:
P(Y = 1|x) = 1 / (1 + e^−(β0 + β1 x)), (1.3)
where β0 and β1 are two scalar parameters. Given a new observation x, (1.3) delivers
the probability of a positive outcome Y = 1.
Logistic regression generalises deterministic linear regression, as it is a function of the linear combination β0 + β1 x. The n trials are assumed independent but not identically distributed, for πi = P(Yi = 1|xi) varies with the index i (i.e., the time instant of collection).
The parameters β0, β1 of the logistic function are estimated by maximising the likelihood of the sample, where the likelihood is given by:
L(β|Y) = ∏_{i=1}^{n} πi^{Yi} (1 − πi)^{1−Yi}.
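The following sketch (not from the text; the data are hypothetical and plain gradient ascent is used in place of the Newton-type solvers a statistics package would normally employ) fits the parameters β0, β1 of (1.3) by maximising the log of the likelihood above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: binary outcomes whose probability of success grows with x.
x = rng.normal(size=200)
true_b0, true_b1 = -0.5, 2.0
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * x)))).astype(float)

def log_likelihood(b0, b1):
    pi = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))   # pi_i = P(Y_i = 1 | x_i)
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

# Gradient ascent on the log-likelihood.
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    pi = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    b0 += lr * np.sum(y - pi)          # d logL / d b0
    b1 += lr * np.sum((y - pi) * x)    # d logL / d b1

print(b0, b1, log_likelihood(b0, b1))  # estimates close to the generating values
```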
normally distributed, we can ask what values of the mean are such that P(−z ≤ Z ≤ z) = 0.95 (for instance). Since Z = (X − µ)/(σ/√n), this yields an interval for µ, e.g. P(X − 0.98 ≤ µ ≤ X + 0.98) = 0.95.
While 'scarce data' denotes situations in which the data are of insufficient quantity, rare events [902] is a term which indicates cases in which the training data are of insufficient quality, in the sense that they do not reflect well enough the underlying distribution. An equivalent term, coined by Nassim Nicholas Taleb, is 'black swan'. It refers to an unpredictable event (also called a 'tail risk') which, once it has occurred, is (wrongly) rationalised in hindsight as being predictable/describable by the existing risk models. Basically, Knightian uncertainty is presumed not to exist, typically with extremely serious consequences. Examples include financial crises and plagues, but also unexpected scientific or societal developments. In the most extreme cases, these events may have never even occurred (this is the case for the question 'will your vote be decisive in the next presidential election?' posed in [?, ?]).
What constitutes a 'rare' event? Clearly, we are only interested in such events because they are not so rare after all. We can say that an event is 'rare' when it covers a region of the hypothesis space which is seldom sampled. Such events rarely take place when a single system is considered, but become a tangible possibility when very many systems are assembled together (as is the case in the real world).
Given the rarity of samples of extreme behaviours (tsunami, meltdowns), scientists
are forced to infer probability distributions for these systems’ behaviour using in-
formation captured in ‘normal’ times (e.g. while a nuclear power plant is working
just fine). Using these distributions to extrapolate results at the ‘tail’ of the curve via
popular statistical procedures (e.g. logistic regression, Section 1.4.7) may then lead
to sharply underestimating the probability of rare events []. In response, Harvard’s
G. King [?] proposed corrections to logistic regression based on oversampling rare
events (represented by 1s) with respect to normal ones (0s). Other people prefer
to drop generative probabilistic models entirely, in favour of discriminative ones
[?, ?]. Once again, we fail to understand the root cause of the problem, namely that
uncertainty affects our very models of uncertainty.
Rather, we should aim at explicitly modelling second-order (Knightian) uncertainties. The most straightforward way of doing this is to consider sets of probability
distributions as models for the problem. Mathematically, belief functions (and their random-set generalisation) indeed amount to (convex) sets of probability distributions – objects which go under the name of credal sets.
When discussing how different frameworks cope with scarce or unusual data, we have always implicitly assumed that information comes in the form of certainty: e.g., I measure a vector x, so that my conditioning event is A = {x} and I can apply Bayes' rule to update my state of the world. Indeed, this is the way Bayes' rule is used by Bayesians to reason (in time) when new evidence becomes available. Frequentists, on the other hand, use it to condition a parametric distribution on the gathered (certain) measurements and generate their p-values (recall Section 1.3.3).
This is quite reasonable, or even correct, in many situations: in science and engineering, measurements, which are assumed to be accurate, flow in as a form of certain evidence, so that one can apply Bayes' rule to condition a parametric model on a time series of measurements x1, ..., xT to construct likelihood functions (or p-values, if you are a frequentist).
Fuzzy data In many real-world problems, though, the information provided cannot be put in such a form. For instance, the concepts themselves may not be well defined, e.g. 'this object is dark' or 'it is somewhat round': in the literature, this is referred to as qualitative data. Qualitative data are common in decision making, in which expert surveys act as sources of evidence, but can hardly be put into the form of measurements taking sharp values.
As we will see in Chapter ??, fuzzy theory [] is able to account for not-well-defined concepts via the notion of graded membership of a set (e.g. by assigning every element of the sample space a certain degree of membership in any given set).
Likelihood data Last but not least, evidence is often directly provided in the form of whole probability distributions. For instance, 'experts' (e.g., medical doctors) tend to express themselves directly in terms of the chances of an event happening (e.g. 'diagnosis A is most likely given the symptoms, otherwise it is either A or B', or 'there is an 80% chance this is a bacterial infection'). If doctors were frequentists, provided with the same data they would probably apply logistic regression and come up with a prediction of the conditional probability P(disease|symptoms): unfortunately, doctors are not statisticians.
In addition, some sensors also provide as output a PDF on the same sample space: think of two separate Kalman filters, one based on colour, the other on motion (optical flow), each providing a Gaussian predictive PDF on the location of a target in an image.
Equation (1.4) is sometimes also called the law of total probability, and obviously generalises Bayesian conditioning (which is obtained when P′(B) = 1 for some B).
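A minimal sketch (not in the text) of Jeffrey's updating rule, assuming the standard form referenced by (1.4), P_new(A) = Σ_i P(A|B_i) P′(B_i), on a hypothetical toy partition; setting P′(B) = 1 for one block recovers ordinary Bayesian conditioning, as noted above.

```python
from fractions import Fraction as F

# Hypothetical prior joint distribution over a toy frame: activity (A) x weather (B),
# with the partition B = {sunny, rainy}.
P = {('walk', 'sunny'): F(3, 10), ('walk', 'rainy'): F(1, 10),
     ('stay', 'sunny'): F(2, 10), ('stay', 'rainy'): F(4, 10)}

def conditional(a, b):
    """P(A = a | B = b) computed from the joint distribution."""
    pb = sum(p for (a_, b_), p in P.items() if b_ == b)
    return P[(a, b)] / pb

def jeffrey_update(a, new_pb):
    """P_new(A = a) = sum_b P(A = a | B = b) * P'(B = b)."""
    return sum(conditional(a, b) * q for b, q in new_pb.items())

# Uncertain evidence: we now believe it is sunny with probability 0.8.
print(jeffrey_update('walk', {'sunny': F(8, 10), 'rainy': F(2, 10)}))   # 13/25
# Certain evidence P'(sunny) = 1 recovers Bayes' conditioning P(walk | sunny) = 3/5.
print(jeffrey_update('walk', {'sunny': F(1), 'rainy': F(0)}))           # 3/5
```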
Beyond Jeffrey's rule What if the new probability P′ is defined on the same σ-algebra A? Jeffrey's rule cannot be applied. As we have discussed, this does happen when multiple sensors provide predictive PDFs on the same sample space.
Belief functions deal with uncertain evidence by moving away from the concept of conditioning (e.g., via Bayes' rule) towards that of combining pieces of evidence which simultaneously support multiple propositions to various degrees. While conditioning is an inherently asymmetric operation, in which the current state of the world and the new evidence are represented by a probability distribution and a single event, respectively, combination in belief function reasoning is completely symmetric, as both the current beliefs about the state of the world and the new evidence are represented by belief functions.
Belief functions naturally encode uncertain evidence of the kinds discussed above (vague concepts, unreliable data, likelihoods) just as well as they represent traditional 'certain' events. Vague, 'fuzzy' concepts are represented in the formalism by consonant belief functions, in which the supported events are nested, while unreliable measurements can be naturally portrayed as 'discounted' probabilities (see Section ??).
Now, suppose you have an urn containing 30 red balls and 60 further balls which are either black or yellow, in unknown proportion. Then consider the following gambles:
– f1 : you receive 100 euros if you draw a red (R) ball;
– f2 : you receive 100 euros if you draw a black (B) ball;
– f3 : you receive 100 euros if you draw a red or a yellow (Y) ball;
– f4 : you receive 100 euros if you draw a black or a yellow ball.
In this example Ω = {R, B, Y }, fi : Ω → R and X = R (consequences are
measured in terms of monetary returns). The four acts correspond to the mappings
in the following table:
R B Y
f1 100 0 0
f2 0 100 0
f3 100 0 100
f4 0 100 100
Empirically, it is observed that most people strictly prefer f1 to f2 , while strictly
preferring f4 to f3. Now, pick E = {R, B}. By definition (1.5), f1 = f1{R, B}0, f2 = f2{R, B}0, f3 = f1{R, B}100 and f4 = f2{R, B}100.
Since f1 is strictly preferred to f2, i.e., f1{R, B}0 is preferred to f2{R, B}0, the Sure Thing Principle would imply that f1{R, B}100 is preferred to f2{R, B}100, i.e., that f3 is preferred to f4: the opposite of what is empirically observed.
In conclusion, the Sure Thing Principle is empirically violated: this is what constitutes the so-called Ellsberg paradox.
Aversion to 'uncertainty' The argument has been widely studied in economics and decision making9, and has to do with people's instinctive aversion to (second-order) uncertainty. They favour f1 over f2 because the former ensures a guaranteed 1/3 chance of winning, while the latter is associated with a (balanced) interval of chances between 0 and 2/3. Although the average probability of success is still 1/3, the lower bound is 0, and people tend to find that unacceptable.
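A small numerical sketch (not in the text) of the credal-set reading of the urn: P(R) = 1/3 is fixed, while P(B) ranges over [0, 2/3]. Computing lower and upper expected payoffs for each gamble shows that the observed preferences (f1 over f2, f4 over f3) are consistent with maximising the lower expectation.

```python
import numpy as np

# Payoffs of the four gambles on Omega = {R, B, Y}.
gambles = {'f1': {'R': 100, 'B': 0,   'Y': 0},
           'f2': {'R': 0,   'B': 100, 'Y': 0},
           'f3': {'R': 100, 'B': 0,   'Y': 100},
           'f4': {'R': 0,   'B': 100, 'Y': 100}}

# Credal set: P(R) = 1/3, P(B) = t, P(Y) = 2/3 - t, with t in [0, 2/3].
ts = np.linspace(0.0, 2.0 / 3.0, 1001)

for name, f in gambles.items():
    expectations = f['R'] / 3.0 + f['B'] * ts + f['Y'] * (2.0 / 3.0 - ts)
    print(name, round(expectations.min(), 2), round(expectations.max(), 2))

# f1: [33.33, 33.33]   f2: [0, 66.67]   f3: [33.33, 100]   f4: [66.67, 66.67]
# The lower expectation ranks f1 above f2 and f4 above f3, matching the observed behaviour.
```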
Investors, for instance, are known to favour 'certainty' over uncertainty. This was shown, for instance, in their reaction to 'Brexit', the UK referendum on leaving the European Union10.
Does certainty, in this context, mean a certain outcome of their bets? Certainly not. It means being confident that their models can handle the observed patterns of variation.
Climatic change An emblematic application in which second-order uncertainty is paramount is that of climatic change models. Admittedly, this constitutes an extremely challenging decision-making problem, in which decision makers need to decide whether to invest billions of dollars/euros/pounds in expensive engineering projects to mitigate the effects of climate change, knowing that the outcomes of their decisions will be known only in twenty to thirty years' time.
Rather surprisingly, the mainstream approach to climatic change is not about explicitly modelling uncertainty at all: the onus is really on developing ever more complex dynamical models of the environment and validating their predictions. This is all the more surprising as it is well known that even deterministic (but nonlinear) models tend to display chaotic behaviour, which induces uncertainty in the predictions of their future state whenever the initial conditions are not known with certainty.
Climatic change, in particular, requires making predictions very far off in the future: as dynamical models are obviously much simplified versions of the world, they become more and more inaccurate as time passes.
What are the challenges of modelling statistical uncertainty explicitly in this context? First of all, the lack of priors (ouch, Bayesians) for the climate space, whose points are very long vectors whose components are linked by complex dependencies. Data are also relatively scarce, especially as we go back in time: as we have just seen, scarcity is a source of Knightian uncertainty, as it puts constraints on our ability to estimate probability distributions.
Finally, hypothesis testing cannot really be used either (too bad, frequentists): this is clearly not a designed experiment, where one can make sensible assumptions about the underlying data-generating mechanism.
9
http://www.econ.ucla.edu/workingpapers/wp362.pdf
10
http://www.wsj.com/articles/global-investors-wake-up-to-brexit-threat-1466080015
Similar reflections have led numerous scientists to recognise the need for a coherent
mathematical theory of uncertainty able to tackle all these aspects. Both alternatives
to and extensions of classical probability theory have been proposed, starting from
De Finetti’s pioneering work on subjective probability [478].
Formalisms include possibility-fuzzy set theory [1531, 412], probability intervals
[592], credal sets, monotone capacities [1390], random sets [984] and imprecise
probability theory [1371]. New original foundations of subjective probability in be-
havioral terms [1374] or by means of game theory [1176] have been brought for-
ward.
Also referred to as imprecise probabilities (as most of them comprise classical probabilities as a special case), they form in fact, as we will see in more detail in Chapter 5, an entire hierarchy of encapsulated formalisms.
The theory of belief functions or ‘theory of evidence’ is one of the most popular
such formalisms for a mathematics of uncertainty.
The notion of belief function originally derives from a series of seminal works [336, 344, 345] by Arthur Dempster on upper and lower probabilities induced by multi-valued mappings. Given a probability distribution p on a given domain, and a one-to-many map x ↦ Γ(x) to another domain, the original probability induces a probability distribution on the power set of the bottom domain [336], i.e., a 'random set' [925, 984]. The term 'belief function' was coined when Glenn Shafer [1149, ?] reinterpreted Dempster's lower probabilities as degrees of belief supported by the available evidence.
Shafer called his 1976 proposal 'A mathematical theory of evidence' [1149], while the mathematical objects it deals with are called 'belief functions'. Where do these names come from, and what interpretation of probability (in its wider sense) do they entail?
Indeed, belief theory is a theory of epistemic probability: it is about probabilities as a mathematical representation of knowledge (never mind whether a human's knowledge or a machine's). Belief is often defined as the state of mind in which a person thinks something to be the case, with or without there being empirical evidence in support. Knowledge is a rather more controversial notion, for it is regarded by some as the part of belief that is true, while others consider it to be that part of belief which is justified to be true. Epistemology is the branch of philosophy concerned with the theory of knowledge. Epistemic probability is the study of probability as a representation of knowledge.
The theory of evidence is also, as the name itself suggests, a theory of eviden-
tial probability: one in which the probabilities representing knowledge are induced
(‘elicited’) by the available evidence. In probabilistic logic [], statements such as
‘hypothesis H is probably true’ are interpreted to mean that the empirical evidence
E supports hypothesis H to a high degree – this degree of support is called the epis-
temic probability of H given E.
As a matter of fact, Pearl [] and others have supported a view of belief functions as probabilities on the logical causes of a certain proposition (the so-called probability of provability interpretation), closely related to modal logic []. To be fair, this connection to evidence has often been overlooked in much of the subsequent work.
In conclusion, the rationale for belief function theory can be summarised as follows: there exists evidence in the form of probabilities, which supports degrees of belief on the matter at hand. The space where the (probabilistic) evidence lives is different from the hypothesis space (where belief measures are defined). The two spaces are linked by a one-to-many map, yielding a mathematical object known as a random set [].
The remaining chapter, Chapter 9, starts extending the applicability of the geometric approach to other uncertainty measures, focussing in particular on possibility measures (consonant belief functions) and the related notion of consistent belief function.
Part III is concerned with the interplay of uncertainty measures of different kinds, and with the geometry of their relationship.
Chapters 10 and 11 study the problem of transforming a belief function into a classical probability measure. In particular, Chapter 10 introduces the affine family of probability transformations, those which commute with affine combination in the belief space.
Chapter 11 focusses instead on the epistemic family of transforms, relative belief and relative plausibility; it studies their dual properties with respect to Dempster's sum, and describes their geometry on both the probability simplex and the belief space.
Chapter 12 extends the analysis to the consonant approximation problem, the prob-
lem of finding the possibility measure which best approximates a given belief func-
tion. In particular, approximations induced by classical Lp norms are derived, and
compared with classical outer consonant approximations.
Chapter 13 concludes Part III by describing Lp consistent approximations in both
the mass and the belief space.
Part I
Theories of uncertainty
Shafer’s belief functions
2
The theory of evidence [1149] was introduced in the Seventies by Glenn Shafer
as a way of representing epistemic knowledge, starting from a sequence of semi-
nal works ([336], [344], [345]) by Arthur Dempster, Shafer’s advisor [349]. In this
formalism the best representation of chance is a belief function (b.f.) rather than a
classical probability distribution. Belief functions assign probability values to sets
of outcomes, rather than single events: their appeal rests on their ability to naturally
encode evidence in favor of propositions.
The theory embraces the familiar idea of assigning numbers between 0 and 1 to
measure degrees of support but, rather than focusing on how these numbers are de-
termined, it concerns itself with the mechanisms driving the combination of degrees
of belief.
The formalism provides indeed a simple method for merging the evidence carried
by a number of distinct sources (called Dempster’s rule [580]), with no need for
any prior distributions [1423]. In this sense, according to Shafer, it can be seen
as a theory of probable reasoning. The existence of different levels of granularity
in knowledge representation is formalized via the concept of family of compatible
frames.
As we recall in this Chapter, the Bayesian framework (see Chapter ??, Section
1.3.5) is actually contained in the theory of evidence as a special case, since:
1. Bayesian functions form a special class of belief functions, and
2. Bayes’ rule is a special case of Dempster’s rule of combination.
In the following we will neglect most of the emphasis Shafer put on the notion of
‘weight of evidence’, which in our view is not strictly necessary to the comprehen-
sion of what follows.
The quantity m(A) is called the basic probability number or 'mass' [797, 796] assigned to A, and measures the belief committed exactly to A ∈ 2Θ. The elements of the power set 2Θ associated with non-zero values of m are called the focal elements of m, and their union is called its core:
Cm ≐ ∪_{A⊆Θ: m(A)≠0} A. (2.1)
Now suppose that empirical evidence is available so that a basic probability assign-
ment can be introduced over a specific FOD Θ.
Definition 3. The belief function associated with a basic probability assignment
m : 2Θ → [0, 1] is the set function b : 2Θ → [0, 1] defined as:
b(A) = Σ_{B⊆A} m(B). (2.2)
1
For a note about the intuitionistic origin of this denomination see Rosenthal, Quantales
and their applications [1095].
The domain Θ on which a belief function is defined is usually interpreted as the set
of possible answers to a given problem, exactly one of which is the correct one. For
each subset (‘event’) A ⊂ Θ the quantity b(A) takes on the meaning of degree of
belief that the truth lies in A, and represents the total belief committed to a set of
possible outcomes A by the available evidence m.
Example: the Ming vase. A simple example (from [1149]) can clarify the notion
of degree of belief. We are looking at a vase that is represented as a product of
the Ming dynasty, and we are wondering whether the vase is genuine. If we call
θ1 the possibility that the vase is original, and θ2 the possibility that it is indeed
counterfeited, then
Θ = {θ1 , θ2 }
is the set of possible outcomes, and
∅, Θ, {θ1 }, {θ2 }
is the (power) set of all its subsets. A belief function b over Θ will represent the
degree of belief that the vase is genuine as b({θ1 }), and the degree of belief the
vase is a fake as b({θ2 }) (note we refer to the subsets {θ1 } and {θ2 }). Axiom 3
of Definition 25 poses a simple constraint over these degrees of belief, namely:
b({θ1 }) + b({θ2 }) ≤ 1. The belief value of the whole outcome space Θ, therefore,
represents evidence that cannot be committed to any of the two precise answers θ1
and θ2 and is therefore an indication of the level of uncertainty about the problem.
As the Ming vase example illustrates, belief functions readily lend themselves to the representation of ignorance, in the form of the mass assigned to the whole set of outcomes (the frame of discernment, FOD). Indeed, the simplest belief function assigns all the basic probability to the whole frame Θ, and is called the vacuous belief function.
Bayesian theory, in comparison, has trouble with the whole idea of encoding ignorance, for it cannot distinguish between 'lack of belief' in a certain event A (1 − b(A) in our notation) and 'disbelief' (the belief in the negated event Ā = Θ \ A). This is due to the additivity constraint: P(A) + P(Ā) = 1.
The Bayesian way of representing the complete absence of evidence is to assign an equal degree of belief to every outcome in Θ. As we will see in Section 2.7.2 of this Chapter, this generates incompatible results when considering different descriptions of the same problem at different levels of granularity.
Moebius inversion formula Given a belief function b, there exists a unique basic probability assignment which induces it. The latter can be recovered by means of the Moebius inversion formula2:
m(A) = Σ_{B⊆A} (−1)^{|A\B|} b(B). (2.3)
Expression (2.3) establishes a 1-1 correspondence between the two set functions m and b [537].
2
See [1297] for an explanation in terms of the theory of monotone functions over partially ordered sets.
Other expressions of the evidence generating a given belief function b are what can be called the degree of doubt of an event A, d(A) ≐ b(Ā), and, more importantly, the upper probability of A:
pl(A) ≐ 1 − d(A) = 1 − b(Ā), (2.4)
as opposed to the lower probability of A, i.e., its belief value b(A). The quantity pl(A) expresses the 'plausibility' of a proposition A or, in other words, the amount of evidence not against A [260]. Once again, the plausibility function pl : 2Θ → [0, 1] conveys the same information as b, and can be expressed as:
pl(A) = Σ_{B∩A≠∅} m(B) ≥ b(A).
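A minimal Python sketch (not from the text, with hypothetical mass values on a three-element frame) implementing Definition 3, the plausibility function above, and the Moebius inversion (2.3):

```python
from itertools import chain, combinations

THETA = frozenset({'t1', 't2', 't3'})

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# Hypothetical basic probability assignment (focal elements and their masses).
m = {frozenset({'t1'}): 0.5, frozenset({'t1', 't2'}): 0.3, THETA: 0.2}

def bel(A):
    """b(A) = sum of the masses of the subsets of A (Equation 2.2)."""
    return sum(v for B, v in m.items() if B <= A)

def pl(A):
    """pl(A) = sum of the masses of the sets intersecting A = 1 - b(complement of A)."""
    return sum(v for B, v in m.items() if B & A)

def moebius(A):
    """Recover m(A) from the belief values via Moebius inversion (Equation 2.3)."""
    return sum((-1) ** len(A - B) * bel(B) for B in powerset(A))

A = frozenset({'t1', 't2'})
print(bel(A), pl(A), 1 - bel(THETA - A))                   # 0.8, 1.0, 1.0
print([round(moebius(B), 10) for B in powerset(THETA)])    # recovers the masses
```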
Fig. 2.1. An example of (consonant, see Section 2.8) belief function on a frame of discern-
ment Θ = {θ1 , θ2 , θ3 } of cardinality 3, with focal elements B2 = {θ1 } ⊂ B1 = {θ1 , θ2 }.
accounts for the mass that might be assigned to some element of A0 , and measures
the evidence not surely against it.
Confirming what was said when discussing the superadditivity axiom (3.6), in the theory of evidence a (finite) probability function is simply a belief function satisfying the additivity rule for disjoint sets.
Definition 4. A Bayesian belief function b : 2Θ → [0, 1] meets the additivity condi-
tion:
b(A) + b(Ā) = 1
whenever A ⊆ Θ.
Obviously, as it meets the axioms of Definition 25, a Bayesian belief function is indeed a belief function. It can be proved that [1149]:
Proposition 1. A belief function b : 2Θ → [0, 1] is Bayesian if and only if there exists p : Θ → [0, 1] such that Σ_{θ∈Θ} p(θ) = 1 and:
b(A) = Σ_{θ∈A} p(θ) ∀A ⊆ Θ.
2.2.1 Definition
Figure 2.2 pictorially expresses Dempster’s algorithm for computing the basic
probability assignment of the combination b1 ⊕ b2 of two belief functions. Let a unit
square represent the total, unitary probability mass one can assign to subsets of Θ,
and associate horizontal and vertical strips with the focal elements A1 , ..., Ak and
B1 , ..., Bl of b1 and b2 , respectively. If their width is equal to their mass value, then
their area is also equal to their own mass m(Ai ), m(Bj ). The area of the intersection
of the strips related to any two focal elements Ai and Bj is then equal to the product
m(Ai ) · m(Bj ), and is committed to the intersection event Ai ∩ Bj . As more than
one such rectangle can end up being assigned to the same subset A (as different
pairs of focal elements can have the same intersection) we need to sum up all these
contributions, obtaining:
m_{b1⊕b2}(A) ∝ Σ_{i,j: Ai∩Bj=A} m1(Ai) m2(Bj).
Finally, as some of these intersections may be empty, we need to discard the quantity
Σ_{i,j: Ai∩Bj=∅} m1(Ai) m2(Bj),
which is done by normalisation.
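A sketch (not from the text, written for finite frames and hypothetical mass functions) of the combination procedure just described: masses of intersecting pairs of focal elements are multiplied, contributions with the same intersection are summed, and the mass committed to the empty set is discarded by normalisation.

```python
def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass assignments given as dicts {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                conflict += mA * mB   # mass committed to the empty set
    if conflict >= 1.0:
        raise ValueError('totally conflicting evidence: the combination does not exist')
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

# Hypothetical example on a frame of three outcomes.
THETA = frozenset({'t1', 't2', 't3'})
m1 = {frozenset({'t1', 't2'}): 0.6, THETA: 0.4}
m2 = {frozenset({'t2', 't3'}): 0.7, THETA: 0.3}
m12, k = dempster_combine(m1, m2)
print({tuple(sorted(A)): round(v, 3) for A, v in m12.items()}, 'conflict =', k)
```

Applied to the alibi example later in this Chapter (masses 1/10 on {I} and 9/10 on {G}, with the remainder on the whole frame), the same function returns m({G}) = 81/91 and m({I}) = 1/91, matching the values quoted there.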
Fig. 2.2. Graphical representation of Dempster’s rule of combination: the sides of the square
are divided into strips associated with the focal elements Ai and Bj of the belief functions
b1 , b2 to combine.
Fig. 2.3. Example of Dempster’s sum. The belief functions b1 with focal elements A1 , A2
and b2 with f.e.s B1 , B2 (left) are combinable via Dempster’s rule. This yields a new belief
function b1 ⊕ b2 (right) with focal elements X1 and X2 .
The normalization constant in (2.6) measures the level of conflict between the two
input belief functions, for it represents the amount of evidence they attribute to con-
tradictory (i.e., disjoint) subsets.
Definition 6. We call weight of conflict K(b1, b2) between two belief functions b1 and b2 the logarithm of the normalisation constant in their Dempster's combination:
K = log [ 1 / (1 − Σ_{i,j: Ai∩Bj=∅} m1(Ai) m2(Bj)) ].
Dempster's rule describes the way the assimilation of new evidence b′ changes our beliefs as previously encoded by a belief function b, determining new belief values given by b ⊕ b′(A) for all events A. In this formalism, a new body of evidence is not constrained to be in the form of a single proposition A known with certainty, as happens in Bayesian theory.
Yet, the incorporation of new certainties is permitted as a special case. In fact, this special kind of evidence is represented by belief functions of the form:
b′(A) = 1 if B ⊂ A, 0 if B ⊄ A,
where B is the proposition known with certainty. Such a belief function is combinable with the original b.f. b as long as b(B̄) < 1, and the result has the form:
pl(A|B) = pl(A ∩ B) / pl(B). (2.7)
Expression (2.7) strongly reminds us of Bayes' rule of conditioning (1.1) – Shafer calls it Dempster's rule of conditioning.
Dempster's rule (2.6) is clearly symmetric in the roles assigned to the two pieces of evidence b and b′ (due to the commutativity of set-theoretical intersection). In Bayesian theory, instead, we are constrained to represent the new evidence as a true proposition, and to condition a Bayesian prior probability on that proposition. There is no obvious symmetry but, even more importantly, we are forced to assume that the consequence of any new piece of evidence is to support a single proposition with certainty!
As our intuition would suggest, the combined evidence supports A ∩ B with degree
σ1 σ2 .
When the two propositions have an empty intersection, A ∩ B = ∅, we say instead that the evidence is conflicting. In this situation the two bodies of evidence counteract each other's effect.
The following example is also taken from [1149].
Example: the alibi A criminal defendant has an alibi: a close friend swears that
the defendant was visiting his house at the time of the crime. This friend has a good
reputation: suppose this commits a degree of support of 1/10 to the innocence of the
defendant (I). On the other side, there is a strong, actual body of evidence providing
a degree of support of 9/10 for his guilt (G).
To formalize this case we can build a frame of discernment Θ = {G, I}, so
that the defendant’s friend provides a simple support function focused on {I} with
bI ({I}) = 1/10, while the hard piece of evidence corresponds to another simple
support function bG focused on {G} with bG ({G}) = 9/10.
Their orthogonal sum b = bI ⊕ bG then yields b({I}) = 1/91 ≈ 0.011 and b({G}) = 81/91 ≈ 0.89, once the conflicting mass (1/10) · (9/10) = 9/100 has been normalised away.
The effect of the testimony has mildly eroded the force of the circumstantial evidence.
In general, belief functions can support more than one proposition at a time.
The next simplest class of b.f.s is that of ‘separable support functions’.
b = b1 ⊕ · · · ⊕ bn ,
Proposition 5. If b is a separable belief function, and A and B are two of its focal
elements with A ∩ B 6= ∅, then A ∩ B is a focal element of b.
The set of focal elements of a separable support function is therefore closed under set-theoretical intersection. Such a b.f. is coherent in the sense that, if it supports two propositions, then it must also support the proposition 'naturally' implied by them, i.e., their intersection.
Proposition 5 gives us a simple method to check whether a given belief function is
indeed a separable support function.
Since a separable support function can support pairs of disjoint subsets, it flags the
existence of what we can call ‘internal’ conflict.
Definition 9. The weight of internal conflict Wb for a separable support function b
is defined as:
– 0 if b is a simple support function;
– inf K(b1 , ..., bn ) for the various possible decompositions of b into simple support
functions b = b1 ⊕ · · · ⊕ bn if b is not simple.
It is easy to see (see [1149] again) that Wb = K(b1 , ..., bn ) where b1 ⊕ · · · ⊕ bn is
the canonical decomposition of b.
One appealing idea in the theory of evidence is the simple, sensible claim that our knowledge of any given problem is inherently imperfect and imprecise. As a consequence, new evidence may allow us to make decisions on more detailed decision spaces (represented by frames of discernment). All these frames need to be 'compatible' with each other, in a sense that we will make precise in the following.
One frame can certainly be assumed to be compatible with another if it can be obtained by introducing new distinctions, i.e., by analyzing or splitting some of its possible outcomes into finer ones. This idea is embodied by the notion of refining.
Definition 10. Given two frames of discernment Θ and Ω, a map ρ : 2Θ → 2Ω is
said to be a refining if it satisfies the following conditions:
1. ρ({θ}) 6= ∅ ∀θ ∈ Θ;
2. ρ({θ}) ∩ ρ({θ0 }) = ∅ if θ 6= θ0 ;
3. ∪θ∈Θ ρ({θ}) = Ω.
In other words, a refining maps the coarser frame Θ to a disjoint partition of the
finer one Ω (see Figure 2.4).
The finer frame is called a refinement of the first one, and we call Θ a coarsening
of Ω. Both frames represent sets of admissible answers to a given decision problem
(see Chapter ?? as well) – the finer one is nevertheless a more detailed description,
obtained by splitting each possible answer θ ∈ Θ in the original frame. The image
ρ(A) of a subset A of Θ consists of all the outcomes in Ω that are obtained by
splitting an element of A.
Proposition 6 lists some of the properties of refinings [1149].
Roughly speaking, the inner reduction ρ̲(A) is the largest subset of Θ that implies A ⊂ Ω, while the outer reduction ρ̄(A) is the smallest subset of Θ that is implied by A. As a matter of fact:
Proposition 7. Suppose ρ : 2Θ → 2Ω is a refining, A ⊂ Ω and B ⊂ Θ, and let ρ̄ and ρ̲ be the related outer and inner reductions. Then ρ(B) ⊂ A iff B ⊂ ρ̲(A), and A ⊂ ρ(B) iff ρ̄(A) ⊂ B.
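The following Python sketch (not in the text) illustrates Definition 10 and Proposition 7 on a two-element coarse frame refined into three outcomes, loosely inspired by the Sirius example later in this Chapter. The inner and outer reductions are implemented according to their standard definitions, ρ̲(A) = {θ : ρ({θ}) ⊆ A} and ρ̄(A) = {θ : ρ({θ}) ∩ A ≠ ∅}, which are assumed here since their formal statement falls outside this excerpt.

```python
# Refining from a coarse frame Theta = {life, no_life} to a finer frame
# Omega = {z1 (life), z2 (planets, no life), z3 (no planets)}.
rho = {'life': frozenset({'z1'}),
       'no_life': frozenset({'z2', 'z3'})}

def refine(B):
    """Image rho(B) of a subset B of Theta in the finer frame Omega."""
    return frozenset().union(*(rho[t] for t in B)) if B else frozenset()

def inner_reduction(A):
    """Largest subset of Theta whose refinement is contained in A (i.e., implies A)."""
    return frozenset(t for t in rho if rho[t] <= A)

def outer_reduction(A):
    """Smallest subset of Theta whose refinement contains A (i.e., is implied by A)."""
    return frozenset(t for t in rho if rho[t] & A)

A = frozenset({'z2'})
print(inner_reduction(A))              # frozenset(): no coarse answer implies 'planets, no life'
print(outer_reduction(A))              # frozenset({'no_life'})
print(refine(outer_reduction(A)) >= A) # True: rho(outer_reduction(A)) contains A
```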
Roughly speaking, two frames are compatible if and only if they concern propositions which can both be expressed in terms of propositions of a common, finer frame.
By property (6), each collection of compatible frames has many common refinements. One of these is particularly simple.
Theorem 1. If Θ1, ..., Θn are elements of a family of compatible frames F, then there exists a unique frame Θ ∈ F such that:
1. there exists a refining ρi : 2Θi → 2Θ for all i = 1, ..., n;
2. ∀θ ∈ Θ ∃ θi ∈ Θi for i = 1, ..., n such that
Fig. 2.5. The different digital representations of the same real number r ∈ [0, 1] constitute a
simple example of family of compatible frames.
whenever ∅ ≠ Ai ⊂ Θi for i = 1, ..., n.
Equivalently, condition (2.12) can be expressed as follows:
– if Ai ⊂ Θi for i = 1, ..., n and ρ1(A1) ∩ · · · ∩ ρn−1(An−1) ⊂ ρn(An), then An = Θn or one of the first n − 1 subsets Ai is empty.
The notion of independence of frames is illustrated in Figure 2.6.
In particular, it is easy to see that if ∃ j ∈ {1, ..., n} such that Θj is a coarsening of some other frame Θi, with |Θj| > 1, then {Θ1, ..., Θn} are not independent. Mathematically, families of compatible frames are collections of Boolean subalgebras of their common refinement [1197], as Equation (2.12) is nothing but the independence condition for the associated Boolean subalgebras3.
3
The following material comes from [1197].
Definition 15. A Boolean algebra is a non-empty set U provided with three internal oper-
ations
∩ : U × U → U, (A, B) ↦ A ∩ B;   ∪ : U × U → U, (A, B) ↦ A ∪ B;   ¬ : U → U, A ↦ ¬A,
called respectively meet, join and complement, characterized by the following properties:
A ∪ B = B ∪ A,                     A ∩ B = B ∩ A;
A ∪ (B ∪ C) = (A ∪ B) ∪ C,         A ∩ (B ∩ C) = (A ∩ B) ∩ C;
(A ∩ B) ∪ B = B,                   (A ∪ B) ∩ B = B;
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),   A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C);
(A ∩ ¬A) ∪ B = B,                  (A ∪ ¬A) ∩ B = B.
As a special case, the collection (2S , ⊂) of all the subsets of a given set S is a Boolean
algebra.
– b is a support function;
– C has a positive basic probability number, m(C) > 0.
Since there exist belief functions whose core has mass zero, Proposition 9 tells us that not all belief functions are support functions (see Section 2.7).
There are occasions in which the impact of a body of evidence on a frame Θ is fully discerned by one of its coarsenings Ω, i.e., no proposition discerned by Θ receives greater support than what is implied by propositions discerned by Ω.
Definition 19. A belief function b : 2Θ → [0, 1] on Θ is the vacuous extension of a second belief function b′ : 2Ω → [0, 1], where Ω is a coarsening of Θ, whenever:
We say that b is 'carried' by the coarsening Ω. We will make use of this all-important notion in our treatment of two computer vision problems in Part III, Chapter ?? and Chapter ??.
In his 1976 essay [1149], Glenn Shafer distinguishes between a 'subjective' and an 'evidential' vocabulary, keeping distinct objects with the same mathematical description but different philosophical interpretations.
Each body of evidence E supporting a belief function b (see [1149]) simultane-
ously affects the whole family F of compatible frames of discernment the domain
of b belongs to, determining a support function over every element of F. We say
that E determines a family of compatible support functions {s_E^Θ, Θ ∈ F}.
The complexity of this family depends on the following property.
Definition 20. The evidence E affects F sharply if there exists a frame Ω ∈ F that carries s_E^Θ for every Θ ∈ F that is a refinement of Ω. Such a frame Ω is said to exhaust the impact of E on F.
(s1 ⊕ s2)|2Ω,
while its application on the coarser frame Ω, after computing the restrictions of s1 and s2 to it, yields:
(s1|2Ω) ⊕ (s2|2Ω).
In general, the outcomes of these two combination strategies will be different. Nevertheless, a condition can be imposed on the refining linking Ω to Θ which guarantees their equivalence.
Proposition 10. Assume that s1 and s2 are support functions over a frame Θ, their
Dempster’s combination s1 ⊕ s2 exists, ρ̄ : 2Θ → 2Ω is an outer reduction, and
Definition 21. Suppose F is a family of compatible frames, {s_E1^Θ, Θ ∈ F} is the family of support functions determined by a body of evidence E1, and {s_E2^Θ, Θ ∈ F} is the family of support functions determined by a second body of evidence E2.
Then, a particular frame Ω ∈ F is said to discern the relevant interaction of E1 and E2 if:
ρ̄(A ∩ B) = ρ̄(A) ∩ ρ̄(B)
whenever Θ is a refinement of Ω, where ρ̄ : 2Θ → 2Ω is the associated outer reduction, A is a focal element of s_E1^Θ and B is a focal element of s_E2^Θ.
actual, finite evidence. On the other hand, statistical inference already teaches us
that chances can be evaluated only after infinitely many repetitions of independent
random experiments.4
for all θ ∈ Θ, where pl_s is the plausibility function for s and the constant c does not depend on θ.
Proposition 15. (Bayes' theorem) Suppose b0 and s are a Bayesian belief function and a support function on the same frame Θ, respectively. Suppose l : Θ → [0, ∞) expresses the relative plausibilities of singletons under s, and suppose also that their Dempster's sum b′ = s ⊕ b0 exists. Then b′ is Bayesian, and
b′({θ}) = K · b0({θ}) l(θ) ∀θ ∈ Θ,
where
K = [ Σ_{θ∈Θ} b0({θ}) l(θ) ]^−1.
This implies that the combination of a Bayesian b.f. with a support function requires
nothing more than the latter’s relative plausibilities of singletons.
It is interesting to note that relative plausibilities of singletons behave multiplicatively under combination:
Proposition 16. If s1 , ..., sn are combinable support functions, and li represents
the relative plausibilities of singletons under si for i = 1, ..., n, then l1 · l2 · · · · · ln
expresses the relative plausibilities of singletons under s1 ⊕ · · · ⊕ sn .
This provides a simple algorithm for combining any number of support functions with a Bayesian b.f.
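A minimal numerical sketch (with hypothetical values, not from the text) of the algorithm implied by Propositions 15 and 16: to combine a Bayesian belief function with any number of support functions, it suffices to multiply its probabilities by the relative plausibilities of the singletons and renormalise.

```python
THETA = ['t1', 't2', 't3']

# Bayesian belief function (a probability distribution on THETA), hypothetical values.
prior = {'t1': 0.5, 't2': 0.3, 't3': 0.2}

# Relative plausibilities of singletons under two support functions s1, s2 (hypothetical).
l1 = {'t1': 1.0, 't2': 0.4, 't3': 0.1}
l2 = {'t1': 0.2, 't2': 1.0, 't3': 0.5}

# By Proposition 16, relative plausibilities multiply under combination ...
l = {t: l1[t] * l2[t] for t in THETA}

# ... and by Proposition 15 the combination with the Bayesian b.f. is again Bayesian.
unnormalised = {t: prior[t] * l[t] for t in THETA}
K = 1.0 / sum(unnormalised.values())
posterior = {t: K * v for t, v in unnormalised.items()}
print(posterior)  # {'t1': ~0.435, 't2': ~0.522, 't3': ~0.043}
```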
4
Using the notion of weight of evidence Shafer gives a formal explanation of this intuitive
observation by showing that a Bayesian b.f. indicates an infinite amount of evidence in
favor of each possibility in its core [1149].
Example: Sirius’ planets A team of scientists wonder whether there is life around
Sirius. Since they do not have any evidence concerning this question, they adopt a
vacuous belief function to represent their ignorance on the frame
Θ = {θ1 , θ2 },
where θ1 , θ2 are the answers “there is life” and “there is no life”. They can also
consider the question in the context of a more refined set of possibilities. For exam-
ple, our scientists may raise the question of whether there even exist planets around
Sirius. In this case the set of possibilities becomes
Ω = {ζ1 , ζ2 , ζ3 },
where ζ1 , ζ2 , ζ3 are respectively the possibility that there is life around Sirius, that
there are planets but no life, and there are no planets at all. Obviously, in an eviden-
tial setup our ignorance still needs to be represented by a vacuous belief function,
which is exactly the vacuous extension of the vacuous b.f. previously defined on Θ.
From a Bayesian point of view, instead, it is difficult to assign consistent degrees of belief over Ω and Θ when both are meant to symbolize the lack of evidence. Indeed, on Θ a uniform prior yields p({θ1}) = p({θ2}) = 1/2, while on Ω the same choice will
yield p0 ({ζ1 }) = p0 ({ζ2 }) = p0 ({ζ3 }) = 1/3. Ω and Θ are obviously compatible
(as the former is a refinement of the latter): the vacuous extension of p onto Ω
produces a Bayesian distribution
p({ζ1 }) = 1/3, p({ζ1 , ζ2 }) = 2/3
which is inconsistent with p0 !
Definition 24. A belief function is said to be consonant if its focal elements A1 , ..., Am
are nested: A1 ⊂ A2 ⊂ · · · ⊂ Am .
The following Proposition illustrates some of their properties.
Proposition 17. If b is a belief function with upper probability function pl, then the
following conditions are equivalent:
1. b is consonant;
2. b(A ∩ B) = min(b(A), b(B)) for every A, B ⊂ Θ;
3. pl(A ∪ B) = max(pl(A), pl(B)) for every A, B ⊂ Θ;
4. pl(A) = maxθ∈A pl({θ}) for all non-empty A ⊂ Θ;
5. there exists a positive integer n and a collection of simple support functions
s1 , ..., sn such that b = s1 ⊕ · · · ⊕ sn and the focus of si is contained in the
focus of sj whenever i < j.
Consonant b.f.s represent collections of pieces of evidence all pointing towards the
same direction. Moreover,
Proposition 18. Suppose s1 , ..., sn are non-vacuous simple support functions with
foci Cs1 , ..., Csn respectively, and b = s1 ⊕ · · · ⊕ sn is consonant. If Cb denotes the
core of b, then all the sets Csi ∩ Cb , i = 1, ..., n are nested.
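A short numerical check (not in the text) of conditions (2) and (3) of Proposition 17 for a consonant belief function with nested focal elements {θ1} ⊂ {θ1, θ2} ⊂ Θ, as in Figure 2.1, and hypothetical masses:

```python
from itertools import chain, combinations

THETA = frozenset({'t1', 't2', 't3'})
subsets = [frozenset(c) for c in
           chain.from_iterable(combinations(sorted(THETA), r) for r in range(len(THETA) + 1))]

# Consonant mass assignment: nested focal elements {t1} in {t1, t2} in THETA.
m = {frozenset({'t1'}): 0.5, frozenset({'t1', 't2'}): 0.3, THETA: 0.2}

def bel(A):
    return sum(v for B, v in m.items() if B <= A)

def pl(A):
    return sum(v for B, v in m.items() if B & A)

# Condition (2): b(A ∩ B) = min(b(A), b(B)); condition (3): pl(A ∪ B) = max(pl(A), pl(B)).
ok2 = all(abs(bel(A & B) - min(bel(A), bel(B))) < 1e-12 for A in subsets for B in subsets)
ok3 = all(abs(pl(A | B) - max(pl(A), pl(B))) < 1e-12 for A in subsets for B in subsets)
print(ok2, ok3)   # True, True
```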
By condition (2) of Proposition 17 we have that:
Fig. 3.1. Mathematical formulations to the theory of belief functions are grouped in this
diagram in terms of their similarity as mathematical frameworks, and as epistemic approaches
to uncertainty representation.
Chapter outline
In the first part of the Chapter we provide an overview of the main mathematical
formulations and semantic interpretations of the theory of belief functions (Section
3.1).
We start from Dempster’s original proposal in terms of upper and lower probabilities
induced by multi-valued mappings (3.1.1), also termed ‘compatibility relations’,
and explain the origins of Dempster’s rule of combination in this context. We then
recall belief functions’ alternative axiomatic definition as generalised, non-additive
measures (Section 3.1.2), a mathematical formulation closely related to the notion
of ‘inner measure’ (Section 3.1.3). In a robust statistical perspective, belief functions
can also be interpreted as (a specific class of) convex sets of probability measures
(Section 3.1.4), but the most general mathematical framework with the potential
to extend their definition to continuous domains is arguably that of random sets
(Section 3.1.5).
Other interpretations have been proposed over the years, among which Zadeh's 'simple view' as necessity measures associated with second-order relations seems to be the most widely cited (Section 3.1.6).
The second part is devoted to the long-standing debate on the epistemic nature
of belief functions and their ‘correct’ interpretation (Section 3.2).
Starting from a brief description of Shafer’s evolving position on the matter and the
early support received by his mathematical theory of evidence (Section 3.2.1), we
focus on the specific scientific debate on the relationship between evidential and
Bayesian approaches to reasoning under uncertainty (Section 3.2.2).
After summarising Judea Pearl's and others' criticisms of belief theory (Section 3.2.3), we argue that most misunderstandings are due to confusion between the various interpretations of these objects (Section 3.2.4), and summarise the main rebuttals and formal justifications which have been advanced in response (Section 3.2.5).
Finally, in the third part (Section 3.3) we review the manifold frameworks which have been proposed in the last fifty years based on (at least the most fundamental notions of) the theory of belief functions, and the various generalisations and extensions brought forward by a number of authors. Starting with the most widely applied approach, Philippe Smets' Transferable Belief Model (Section 3.3.1), we recall the main notions of Kohlas and Monney's theory of hints (Section 3.3.2) and of the Dezert-Smarandache Theory (Section 3.3.3). We give quite some space to Dempster and
Liu’s Gaussian belief functions (Section 3.3.4), Ivan Kramosil’s probabilistic inter-
pretation (Section 3.3.5) and Hummel and Landy’s statistics of experts’ opinions
(Section 3.3.6). We review a number of interval or credal extensions of the notion of
basic probability assignment in Section 3.3.7, including Denoeux’s Imprecise Belief
Structures. Finally, a rather comprehensive review of less well-known approaches
(including Lowrance’s evidential reasoning and Grabisch’s belief functions on lat-
tices) concludes the Chapter.
3.1 The multiple semantics of belief functions
The notion of belief function [1153, 1162] originally derives from a series of Demp-
ster’s works on upper and lower probabilities induced by multi-valued mappings,
introduced in [336], [344] and [345].
Multi-valued mappings Indeed, the idea that intervals rather than probability val-
ues should be used to model degrees of belief had been suggested and investigated
by earlier researchers [480, 526, 764, 763, 1280]. Dempster, however, defined upper
and lower probability values in terms of statistics of set-valued functions defined
over a measure space. In [336] he gave examples of the use of upper and lower
probabilities in terms of finite populations with discrete univariate observable char-
acteristics.
Shafer later reformulated Dempster’s work by identifying his upper and lower prob-
abilities with epistemic probabilities or ‘degrees of belief’, i.e., the quantitative as-
sessments of one’s belief in a given fact or proposition. The following sketch of the
nature of belief functions is abstracted from [1163]: another analysis on the relation
between belief functions and upper and lower probabilities is developed in [1263].
Let us consider a problem in which we have probabilities (coming from arbi-
trary sources, for instance subjective judgement or objective measurements) for a
question Q1 and we want to derive degrees of belief for a related question Q2 . For
example, Q1 could be the judgement on the reliability of a witness, and Q2 the
decision about the truth of the reported fact. In general, each question will have a
number of possible answers, only one of them being correct.
Let us call Ω and Θ the sets of possible answers to Q1 and Q2 respectively. So,
given a probability measure P on Ω we want to derive a degree of belief b(A) that
A ⊂ Θ contains the correct response to Q2 (see Figure 3.2).
If we call Γ (ω) the subset of answers to Q2 compatible with ω ∈ Ω, each
element ω tells us that the answer to Q2 is somewhere in A whenever
Γ (ω) ⊂ A.
The degree of belief b(A) of an event A ⊂ Θ is then the total probability (in Ω) of
all the answers ω to Q1 that satisfy the above condition, namely:
$$ b : 2^\Theta \to [0, 1], \qquad A \subseteq \Theta \mapsto b(A) = \sum_{\omega \in \Omega \,:\, \Gamma(\omega) \subseteq A} P(\omega). $$
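To make the construction concrete, here is a minimal Python sketch of a belief function induced by a multi-valued mapping, using a hypothetical witness example with made-up probabilities (the names P, Gamma and belief are illustrative, not part of any established library):

# Minimal sketch: the belief function induced by a multi-valued mapping.
# Q1: reliability of a witness; Q2: truth of the reported fact (illustrative values).
P = {'reliable': 0.8, 'unreliable': 0.2}              # probabilities on Omega
Gamma = {                                             # Gamma(omega): answers to Q2 compatible with omega
    'reliable': {'fact_true'},                        # a reliable witness implies the reported fact is true
    'unreliable': {'fact_true', 'fact_false'},        # an unreliable witness tells us nothing
}

def belief(A):
    """b(A): total probability of the outcomes omega whose image Gamma(omega) is contained in A."""
    return sum(p for omega, p in P.items() if Gamma[omega] and Gamma[omega] <= set(A))

print(belief({'fact_true'}))                # 0.8
print(belief({'fact_true', 'fact_false'}))  # 1.0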
Dempster's combination arises in this setting when we consider two multi-valued mappings Γ1 : Ω1 → 2Θ and Γ2 : Ω2 → 2Θ induced by two independent bodies of evidence.
Formally, we need to find a new probability space (Ω, P) and a multi-valued map from Ω to Θ. The independence assumption allows us to build the product space (Ω1 × Ω2, P1 × P2): two outcomes ω1 ∈ Ω1 and ω2 ∈ Ω2 then tell us that the answer to Q2 is somewhere in Γ1(ω1) ∩ Γ2(ω2). When this intersection is empty the
two pieces of evidence are in contradiction. We then need to condition the product
measure P1 × P2 over the set of non-empty intersections
obtaining:
$$ \Omega = \Big\{ (\omega_1, \omega_2) \in \Omega_1 \times \Omega_2 \,\Big|\, \Gamma_1(\omega_1) \cap \Gamma_2(\omega_2) \neq \emptyset \Big\}, \qquad P = P_1 \times P_2 \big|_\Omega, \qquad \Gamma(\omega_1, \omega_2) = \Gamma_1(\omega_1) \cap \Gamma_2(\omega_2). \tag{3.5} $$
It is easy to see that the new belief function Bel is linked to the pair of belief
functions being combined by Dempster’s rule, as defined in (2.6).
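The product-space construction (3.5) can be sketched in a few lines of Python; the example below combines two hypothetical independent witnesses and recovers the masses that Dempster's rule would assign (all numbers are illustrative):

# Sketch of Dempster's rule via the conditioned product space of (3.5).
def combine(P1, Gamma1, P2, Gamma2):
    """Keep the non-conflicting pairs, renormalise, and intersect the images."""
    pairs = [(w1, w2) for w1 in P1 for w2 in P2 if Gamma1[w1] & Gamma2[w2]]
    norm = sum(P1[w1] * P2[w2] for w1, w2 in pairs)          # probability of the non-empty intersections
    P = {(w1, w2): P1[w1] * P2[w2] / norm for w1, w2 in pairs}
    Gamma = {(w1, w2): Gamma1[w1] & Gamma2[w2] for w1, w2 in pairs}
    return P, Gamma

# two hypothetical independent witnesses about Theta = {'t', 'f'}
P1, G1 = {'r': 0.8, 'u': 0.2}, {'r': {'t'}, 'u': {'t', 'f'}}
P2, G2 = {'r': 0.6, 'u': 0.4}, {'r': {'f'}, 'u': {'t', 'f'}}
P, G = combine(P1, G1, P2, G2)
mass = {}
for w, pr in P.items():
    A = frozenset(G[w])
    mass[A] = mass.get(A, 0.0) + pr
print(mass)   # the masses Dempster's rule would assign to {'t'}, {'f'} and {'t','f'}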
The combination of compatibility relations defined by (3.4) can be called the 'optimistic' one, as the beliefs (not the doubts) of different subjects (experts) are combined. Indeed, this is a hidden assumption of Dempster's combination rule (3.5), as important as the assumption of independence of the sources.
An alternative approach, supported by Kramosil [768, 775], is based on the dual idea that doubts are shared instead, so that an outcome θ ∈ Θ is ruled out only if it is considered incompatible by all the experts separately, namely whenever
θ ∉ Γ1(ω1) ∪ Γ2(ω2).
This results in what Smets calls the disjunctive rule of combination in his Transferable Belief Model (Section 4.2.2).
If we relax Kolmogorov's additivity axiom to allow the function p to meet additivity only as a lower bound, and restrict ourselves to finite sets, we obtain what Shafer [1149] called a belief function (see Definition 3).
Definition 25. Suppose Θ is a finite set, and let 2Θ = {A ⊆ Θ} denote the set of
all subsets of Θ. A belief function (b.f.) on Θ is a function b : 2Θ → [0, 1] from the
power set 2Θ to the real interval [0, 1] such that:
– b(∅) = 0;
– b(Θ) = 1;
– for every positive integer n and for every collection A1, ..., An ∈ 2Θ:
$$ b(A_1 \cup \cdots \cup A_n) \;\ge\; \sum_i b(A_i) - \sum_{i<j} b(A_i \cap A_j) + \cdots + (-1)^{n+1} b(A_1 \cap \cdots \cap A_n). \tag{3.6} $$
It can be proven that [1149]:
Proposition 19. Definitions 25 and 3 are equivalent formulations of the notion of
belief function.
Condition (3.6), called superadditivity, obviously generalizes Kolmogorov’s addi-
tivity (Definition 1). Belief functions can then be seen as generalizations of the
familiar notion of (discrete) probability measure.
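As a quick sanity check of Definition 25, the following sketch computes b from a toy mass assignment and verifies the superadditivity inequality (3.6) for n = 2 over all pairs of events (the mass values are made up):

from itertools import combinations

# Sketch: checking condition (3.6) for n = 2 on a toy basic probability assignment.
Theta = frozenset({'x', 'y', 'z'})
m = {frozenset({'x'}): 0.5, frozenset({'y', 'z'}): 0.3, Theta: 0.2}   # hypothetical b.p.a.

def powerset(S):
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def bel(A):
    return sum(v for B, v in m.items() if B <= A)

for A in powerset(Theta):
    for B in powerset(Theta):
        assert bel(A | B) >= bel(A) + bel(B) - bel(A & B) - 1e-12
print("superadditivity (n = 2) holds on all pairs of events")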
$$ Q : \mathcal{F} \to [0, 1], \qquad C \cap (E \times \Theta) \mapsto P(E), $$
$$ Q_* : 2^C \to [0, 1], \qquad A \subseteq C \mapsto Q_*(A) = \max\big\{ P(E) \,\big|\, E \subseteq \Omega,\ C \cap (E \times \Theta) \subseteq A \big\}, $$
i.e., the classical definition (??) of the belief value of A induced by a multi-valued
mapping Γ . This connection between inner measures and belief functions appeared
in the literature in the second half of the Eighties ([1099, 1098], [466]).
The interpretation of belief values as lower bounds to the true, unknown probability value of an event generates, in turn, an additional angle on the nature of belief functions [809]. Belief functions admit the following order relation,
$$ b \le b' \quad \equiv \quad b(A) \le b'(A) \;\; \forall A \subseteq \Theta, \tag{3.9} $$
called weak inclusion: a b.f. b is weakly included in b' whenever its belief values are dominated by those of b' on all the events of Θ.
A probability distribution P which weakly includes a belief function b (P(A) ≥ b(A) ∀A) is said to be consistent with b [801]. Each belief function b is then the lower envelope of the set of probabilities consistent with it,
$$ \mathcal{P}[b] = \big\{ P : P(A) \ge b(A) \;\; \forall A \subseteq \Theta \big\}, \tag{3.10} $$
i.e., the set of probability measures whose values dominate those of b on all events A. Accordingly, the theory of evidence is seen by some authors as a special case of robust statistics [1138]. This position has been heavily criticised over the years.
Convex sets of probabilities are often called credal sets [838, 1542, 255, 34]. A
number of scholars, as it turns out, have argued in favour of belief representation
in terms of convex sets of probabilities, including Koopman [763], Good [526] and
Smith [1279, 1280].
Of course not all credal sets ‘are’ belief functions. The set (3.10) is a polytope in the
simplex of all probabilities we can define on Θ. Its vertices are all the distributions
pπ induced by any permutation π = {xπ(1) , ..., xπ(|Θ|) } of the singletons of Θ of
the form [168, 246]:
$$ p_\pi[b](x_{\pi(i)}) = \sum_{A \ni x_{\pi(i)},\ A \not\ni x_{\pi(j)} \ \forall j < i} m(A), \tag{3.11} $$
assigning to the singleton element put in position π(i) by the permutation π the mass of all the focal elements containing it, but not containing any element preceding it in the permutation order [1376].
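A small Python sketch of formula (3.11), listing the vertices of the credal set induced by a toy mass assignment (one vertex per permutation of the singletons; all values are illustrative):

from itertools import permutations

# Sketch: vertices of the credal set of a belief function via equation (3.11).
Theta = ['x1', 'x2', 'x3']
m = {frozenset({'x1'}): 0.4, frozenset({'x2', 'x3'}): 0.3, frozenset(Theta): 0.3}

def vertex(pi):
    """Each focal element's mass goes to the first of its elements in the permutation order pi."""
    p = {x: 0.0 for x in Theta}
    for A, mass in m.items():
        for x in pi:
            if x in A:
                p[x] += mass
                break
    return p

for pi in permutations(Theta):
    print(pi, vertex(pi))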
A landmark work by Kyburg [802] establishes a number of results relating belief
updating in belief theory and credal sets (Section 3.1.4).
Proposition 20. Closed convex sets of classical probability functions include Shafer’s
probability mass functions as a special case.
Proposition 21. ([802], Theorem 4) The probability intervals resulting from Dempster-
Shafer updating are included in (and may be properly included in) the intervals that
result from applying Bayesian updating to the associated credal set.
Whether this is a good or a bad thing, he argues, depends on the situation. Exam-
ples in which one of the two operators leads to more ‘appealing’ results are provided.
Paul Black [112] emphasised the importance of Kyburg’s result by looking at
simple examples involving Bernoulli trials, and showing that many convex sets of
probability distributions generate the same belief function.
we do not know what is beneath. A cloaked die is a random variable which 'spits out' subsets of possible outcomes: a random set. This approach has been emphasised in particular by Nguyen ([528], [986, 984]) and Hestir [612], and taken up again in [1173].
Consider a multi-valued mapping Γ : Ω → 2Θ . The lower inverse of Γ is
defined as:
$$ \Gamma_* : 2^\Theta \to 2^\Omega, \qquad A \mapsto \Gamma_*(A) = \{ \omega \in \Omega : \Gamma(\omega) \subseteq A,\ \Gamma(\omega) \neq \emptyset \}, \tag{3.12} $$
while its upper inverse is
$$ \Gamma^* : 2^\Theta \to 2^\Omega, \qquad A \mapsto \Gamma^*(A) = \{ \omega \in \Omega : \Gamma(\omega) \cap A \neq \emptyset \}. \tag{3.13} $$
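The two inverses are straightforward to compute on a finite example; the following sketch (with made-up data) also shows that the probabilities of Γ_*(A) and Γ^*(A) are precisely the belief and plausibility of A:

# Sketch of the lower and upper inverses (3.12)-(3.13) of a multi-valued mapping (toy data).
Gamma = {'w1': {'a'}, 'w2': {'a', 'b'}, 'w3': {'b', 'c'}}
P = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}

def lower_inverse(A):
    return {w for w, img in Gamma.items() if img and img <= set(A)}

def upper_inverse(A):
    return {w for w, img in Gamma.items() if img & set(A)}

A = {'a', 'b'}
print(lower_inverse(A), upper_inverse(A))
# the induced belief and plausibility of A are the probabilities of these two sets:
print(sum(P[w] for w in lower_inverse(A)), sum(P[w] for w in upper_inverse(A)))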
$$ \hat{P}[I(B)] = P_*(B) \quad \forall B \in \mathcal{B}, $$
According to Shafer [1169], belief function theory has even older antecedents than
Bayesian theory, as similar arguments appear in the work of George Hooper (1640-
1723) and James Bernoulli (1654-1705), as emphasised in [1160].
In a 1976 paper [1148], Shafer illustrated his rationale for his mathematical
theory of evidence, claiming that the impact of evidence on a proposition may either
support it to various degrees, or cast doubt on it to various possible degrees.
Shafer’s proposal for a new theory of statistical evidence and epistemic proba-
bility received rather positive attention [476, 1412]. Fine [476] commented in his
review that the fact that probability takes its meaning from and is used to describe
phenomena as diverse as propensities for actual behavior (the behavioural inter-
pretation), propositional attitudes of belief (subjective degrees of belief) and experi-
mental outcomes under prescribed conditions of unlinked repetitions (the frequentist
view), has long been the source of much controversy, resulting in a dualistic con-
ception of probability as being jointly epistemic (oriented towards ‘subjective’ belief
assessment) and aleatory (focussed on the ‘objective’ description of the outcomes
of ’random’ experiments) with most of the present-day emphasis on the latter.
Lowrance and Garvey [894] were early supporters of the use of Shafer’s math-
ematical theory of evidence as a framework for evidential reasoning in expert sys-
tems.
Gordon and Shortliffe [529] argued that the advantage of belief function theory
over other previous approaches is its ability to model the narrowing of the hypothesis
set with the accumulation of evidence in expert reasoning. There, experts rely on
evidence which typically supports whole subsets of hypotheses in the hypothesis space at hand.
Strat and Lowrance [1307] focused on the difficulty of generating explanations
for the conclusions drawn by evidential reasoning systems based on belief functions,
and presented a methodology for augmenting an evidential-reasoning system with a
versatile explanation facility.
Curley and Golden [219] constructed a very interesting experiment in legal situ-
ations, in which the belief assessor is interested in judging the degree of support or
justification that the evidence affords hypotheses, to determine (a) if subjects could
be trained in the meanings of belief-function responses; and, (b) once trained, how
they use those belief functions in a legal setting. They found that subjects could use
belief functions, identified limits to belief functions’ descriptive representativeness,
and discovered patterns in the way subjects use belief functions which inform our
understanding of their uses of evidence.
The belief-function analysis differs from a Bayesian probability model in that one does not condition on those parts of the evidence for which no probabilities are specified. The significance of this difference in conditioning assumptions is illustrated with two examples giving rise to identical belief functions but different Bayesian probability distributions.
Lindley [857] brought forward arguments, based on the work of Savage and De Finetti, purporting to show that probability theory is the unique correct description of uncertainty. Other authors have argued that the additivity axiom is too restrictive when one has to deal with uncertainty deriving from partial ignorance [802].
In a paper published in Cognitive Science, Shafer and Tversky [1175] described
and compared the semantics and syntax of the Bayesian and belief function lan-
guages, and investigated designs for probability judgment afforded by the two lan-
guages.
Klopotek and Wierzchoń [] claimed in 2002 that previous attempts to interpret belief functions in terms of probabilities had failed to produce a fully compatible interpretation [?, ?, ?], and proposed in response three models: a 'marginally correct approximation', a 'qualitative' and a 'quantitative' model.
One of the very first counterintuitive aspects of belief functions is their definition: it takes a simple-minded 'direct' sum of the measures of the focal elements, in which the measures of their intersections are ignored. As a consequence, atomic intersections of focal elements (intersections containing no other focal element) have zero belief measure. One may wonder: is the belief function a reasonable and mathematically sound measure? Our answer is yes. Belief functions can be viewed as a generalisation of random variables; however, at first the generalisation sounds too 'naïve': in classical theory, the inner probability is the least upper bound of the direct sum of the probabilities of all masses. Belief functions adopt the same definition, that is, the least upper bound of the direct sum of the basic probabilities of all focal elements. Indeed, the two are the same, except that in classical theory all masses are disjoint, while focal elements may not be. The goal of this paper is to show that, in spite of this, belief functions are sound measures.
We use measure-theoretic methods to describe the relationship between Dempster-Shafer (DS) theory and Bayesian (i.e., probability) theory. Within this framework, we demonstrate the relationships among Shafer's belief and plausibility, Dempster's lower and upper probabilities, and inner and outer measures. Dempster's multivalued mapping is an example of a random set, a generalisation of the concept of a random variable. Dempster's rule of combination is the product measure on the Cartesian product of measure spaces. The independence assumption of Dempster's rule arises from the nature of the problem, in which one has knowledge of the marginal distributions but wants to calculate the joint distribution.
In 'Perspectives on the theory and practice of belief functions' [1165], Shafer reviewed in 1992 the work conducted until then on the interpretation, implementation and mathematical foundations of the theory, placing belief theory within the broader topic of probability, and probability itself within artificial intelligence.
Wasserman [1394], although agreeing with Shafer that there are situations where belief functions are appropriate, raised a number of questions about these objects, motivated by statistical considerations. He argued that the betting paradigm has a status in the foundations of probability of a different nature from that of the canonical examples that belief function theory is built on: in addition to using the betting metaphor to make judgments, we use this analogy at a higher level to judge the theory as a whole. Wasserman's thesis is that a similar argument would make belief functions easier to understand.
Wasserman also questioned the separation of belief from frequency, arguing that it
is a virtue of subjective probability that it contains frequency probability as a special
case, by way of de Finetti’s theory of exchangeability, and mentioned the notion of
asymptotics in belief functions.
Judea Pearl [1022, 1024, 1026, 1020] also contributed to the debate in the early
Nineties. He claimed that belief functions have difficulties representing incomplete
knowledge, in particular knowledge expressed in conditional sentences, and that
encoding if-then rules as belief function expressions (a standard practice at the time)
leads to counterintuitive conclusions. As for the belief function updating process,
he found that the latter violates what he called ‘basic patterns of plausibility’ and
the resulting beliefs cannot serve as a basis for rational decisions. As for evidence
pooling, although belief functions offer in his view a rich language for describing
the evidence, the available combination operators cannot exploit this richness and
are challenged by simpler methods based on likelihood functions.
Detailed answers to several of the criticisms raised by Pearl were provided by
Smets in 1992 [1240], within his transferable belief model interpretation.
In the same year Dubois and Prade tried to clarify some aspects of the theory of
belief functions, addressing most of the questions raised by Pearl in [1022, 1026].
They pointed out that their mathematical model can be useful beyond a theory of
evidence, for the purpose of handling imperfect statistical knowledge. They com-
pared Dempster’s rule of conditioning with upper and lower conditional probabil-
ities, concluding that Dempster’s rule is a form of updating, whereas the second
operation expresses ‘focussing’ (see Section 4.3). Finally, they argued that the con-
cept of focusing models the meaning of uncertain statements in a more natural way
than updating.
Wilson [1419] also responded to Pearl’s criticisms. He noted that Pearl criti-
cised belief functions for not obeying the laws of Bayesian belief, whereas these
laws lead to well-known problems in the face of ignorance, and seem unreasonably
restrictive. He argued that it is not reasonable to expect a measure of belief to obey
Pearl’s sandwich principle, whereas the standard representation of ‘if-then’ rules in
Dempster-Shafer theory, criticised by Pearl, is in his view justified and compares
favorably with a conditional probability representation.
Shafer [1167] addressed Pearl's remarks by arguing that the interpretation of belief functions is controversial because the interpretation of probability is itself controversial.
The fact that belief functions possess multiple interpretations, all of which are sensible from a certain angle, has ignited, especially in the early Nineties, a debate to which many scholars have contributed. Halpern and Fagin, for instance, underlined
in [588] two different views of belief functions, as generalized probabilities (cfr.
Sections 3.1.2 and 3.1.3) and as mathematical representation of evidence (that we
completely neglected in our brief summary of Chapter 2). Their claim is that many
problems about the use of belief functions can be explained as a consequence of a
confusion of these two interpretations. As an example, they cite comments by Pearl
[1024, 1025] and others that belief theory leads to incorrect or counterintuitive an-
swers in a number of situations.
Philippe Smets was particularly active in this debate. In [1244] he gave an axiomatic justification of the use of belief functions to quantify partial beliefs, while in [1240] he rebutted Pearl's criticisms [1024] by accurately distinguishing the different epistemic interpretations of the theory of evidence (echoing Halpern et al. in [588]), focusing in particular on his transferable belief model (Section 3.3.1).
In 1992, Halpern and Fagin [589] argued that there are at least two different ways
of understanding belief functions: as generalised probability functions (technically,
as inner measures induced by a probability function), or as a way of representing
evidence which, in turn, can be understood as a mapping from probability functions
to probability functions.
Under the first interpretation, they argue, it makes sense to think of updating a be-
lief function because of its nature of generalised probability. If we think of belief
functions as a mathematical representation of evidence, instead, using combination
rules to merge two belief functions is the natural thing to do. Problems that have
been pointed out with the belief function approach, therefore, can be explained as a
consequence of confounding these two semantics.
theory (especially in what was then the situation for decision making) and its potential usefulness. They also considered methodological criticisms of the approach, focusing primarily on the alleged counterintuitive nature of Dempster's combination formula, showing that such results stem from its misapplication.
Philippe Smets [1250] wrote extensively on the axiomatic justification of the use of belief functions [1244]. Essentially, he postulated that degrees of belief are quantified by a function onto [0,1] which gives the same degree of belief to subsets that represent the same propositions according to an agent's evidential corpus. The impact of coarsening and refining a frame of discernment is derived, as is the conditioning process. A closure axiom is proposed, asserting that any measure of belief can be derived from other measures of belief defined on less specific frames.
In [8, 10], we present an axiomatic justification for the fact that quantified beliefs should be represented by belief functions. We show that the mathematical function that can represent quantified beliefs should be a Choquet capacity monotone of order 2. In order to show that it must be monotone of infinite order, and thus a belief function, we propose several extra rationality requirements. One of them is based on the negation of a belief function, a concept introduced by Dubois and Prade [2].
Shafer himself [4] produced a very extensive, and somewhat frustrated, docu-
ment addressing the various contributions to the discussion on the interpretation of
belief functions. The main argument is that disagreement on the ‘correct’ meaning
of belief functions boils down, fundamentally, to a lack of consensus on how to
interpret probability, as belief functions are built on probability. In response, he il-
lustrated his own constructive interpretation of probability, probability bounds and
belief functions, and related it to the views and concerns of Pearl, Smets, Ruspini and others, making use of the main canonical examples for belief functions, namely the partially reliable witness and its generalisation, the randomly coded message.
Neapolitan [981] further elaborated on Shafer’s defense in two ways: (1) by
showing that belief functions, as Shafer intends them to be interpreted, use proba-
bility theory in the same way as the traditional statistical tool of significance testing; and (2) by describing a problem for which the application of belief functions yields a
meaningful solution, while a Bayesian analysis does not.
3.3 Frameworks
A number of researchers have proposed their own variations on the theme of belief functions, with various degrees of success. Arguably the frameworks with the highest impact are Smets' Transferable Belief Model (TBM) (Section 3.3.1), Kohlas and Monney's Theory of Hints (Section 3.3.2), and the so-called Dezert-Smarandache Theory (DSmT, Section 3.3.3).
Other significant frameworks, notable for their mathematical interest or foundational contribution, include Dempster and Liu's Gaussian belief functions (Section 3.3.4), Ivan Kramosil's probabilistic analysis (Section 3.3.5), Lowrance and Strat's approach (Section 3.3.8) and Grabisch's lattice-theoretical formulation (Section 3.3.8). A number of scientists have analysed the extension of the theory of evidence to interval or credal belief structures (Section 3.3.7).
We conclude the section by surveying in less detail other proposed frameworks
in Section 3.3.8.
In his seminal 1990 work [1218], Philippe Smets introduced the Transferable Belief Model (TBM) as a framework, based on the mathematics of belief functions, for the quantification of a rational agent's degrees of belief [1251]. The TBM is by far the approach to the theory of evidence which has achieved the largest diffusion and impact.
Its philosophy, however, is very different from Dempster’s (and Shafer’s) original
probabilistic semantics for belief functions, as it cuts all ties with the notion of an
underlying probability measure to employ belief functions directly to represent an
agent’s belief.
As usual here we have room only for a brief introduction to the principles of
the TBM. Aspects of the framework which concern evidence combination ([1225],
Section ??), probability transformation (Section 4.4.2), conditioning (Section 4.3.2)
and decision making (Section 4.5.1) are discussed in more detail in Chapter 4. In
[1276, 1221] (but also [1255] and [1277]) the reader will find an extensive explana-
tion of the features of the transferable belief model. An interesting criticism of the
TBM in terms of Dutch books is conducted by Snow in [1283].
As far as applications are concerned, the transferable belief model has been
employed to solve a variety of problems, including data fusion [1267], diagnostics
[1252] and reliability issues [1242]. In [418] Dubois et al. used the TBM approach
on an illustrative example: the assessment of the value of a candidate.
Credal and pignistic levels In the TBM, beliefs are represented at two distinct
levels:
1. a credal level (from the Latin word credo for ‘I believe’) where the agent’s
degrees of beliefs in a phenomenon are maintained as belief functions;
2. a pignistic level (from Latin pignus, ‘betting’) where decisions are made,
through an appropriate probability function, called the pignistic function (a small numerical sketch is given right after this list):
$$ BetP(x) = \sum_{B \subseteq \Theta:\, x \in B} \frac{m(B)}{|B|}. \tag{3.14} $$
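A minimal sketch of the pignistic transform (3.14): the mass of each focal element is shared equally among its elements (the mass values below are illustrative):

# Sketch of the pignistic transform: split each focal element's mass equally among its elements.
m = {frozenset({'a'}): 0.5, frozenset({'a', 'b'}): 0.3, frozenset({'a', 'b', 'c'}): 0.2}

def betp(m):
    p = {}
    for B, mass in m.items():
        for x in B:
            p[x] = p.get(x, 0.0) + mass / len(B)
    return p

print(betp(m))   # a: 0.5 + 0.15 + 0.0667, b: 0.15 + 0.0667, c: 0.0667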
The credal level At the credal level, each agent is characterised by an ‘evidential
corpus’, a collection of pieces of evidence they have collected in their past. This ev-
idential corpus has an effect on the frame of discernment associated with a certain
problem (e.g., who is the culprit of a certain murder). As in logic-based approaches
to belief theory (compare Chapter 5, Section 5.6), a frame of discernment Θ is a
collection of possible worlds ('interpretations', in the logical language), determined by the problem at hand, and a (logical) proposition is mapped to the subset of possible worlds in which it holds true.
The basic assumption postulated by the TBM is that the impact of a piece of evi-
dence on an agent’s degrees of belief consists in an allocation of parts of an initial
unitary amount of belief among all the propositions in the frame of discernment.
The mass m(A) is the part of the agent's belief that supports A, i.e. that the 'actual world' θ ∈ Θ is in A ⊂ Θ, and which, due to lack of information, does not support any strict subproposition of A.
As in Shafer's work, then, each piece of evidence directly supports a proposition. To underline the lack of any underlying probabilistic model, Smets uses the terminology 'basic belief assignment', rather than Shafer's 'basic probability assignment', to refer to mass assignments. Note that Smets does not claim that every form of uncertainty should be quantified by belief functions.
The pignistic level At the pignistic level, the pignistic function is axiomatically
derived by Smets in [1232], as the only transformation which meets a number of
rationality requirements:
– the probability value BetP (x) of x only depends on the mass of propositions
containing x;
– BetP (x) is a continuous function of m(B), for each B containing x;
– BetP is invariant to permutations of the elements of the frame Θ;
– the pignistic probability does not depend on propositions not supported by the
belief function which encodes the agent’s beliefs.
Note that, over time, Smets justified the pignistic transform first in terms of the Principle of Insufficient Reason, and later as the only (sic) transform satisfying a linear-
ity constraint. However, as we show in Chapter 10, the pignistic transform is not the
only probability approximation to commute with convex combination.
Decisions are finally made in a utility theory setting, in which Savage’s axioms hold
[1113] (cfr. Section 4.5.1).
An interesting point Smets makes is that, although beliefs are necessary ingredients for our decisions, beliefs can nonetheless be entertained without being manifested in terms of actual behaviour [1281]. In his example, one may have some beliefs about the status of a traffic light in a particular road in Brussels even though they are not in Brussels at the moment and they do not intend to make a decision based on this belief. This is in stark contrast with behavioural interpreta-
tions of probability, of which Walley’s imprecise probability theory is a significant
example (as we will see in Section 5.1).
We will also see in Chapter 4, Section 4.4.2 that the pignistic transform is only
one of many possible probability transforms, each of which comes with a different
rationale.
Unnormalised belief functions Within the TBM, positive basic belief values
can be assigned to the empty set itself, generating unnormalized belief functions
[1239, 1238]. Unnormalised belief functions are indeed a sensible representation
under an ‘open-world’ assumption that the hypothesis set (frame) itself is not known
with certainty (in opposition to the ‘closed-world’ situation in which all alternative
hypotheses are perfectly known).
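A minimal sketch of the unnormalised (conjunctive) combination used in the TBM, in which the conflicting mass is retained on the empty set rather than redistributed (the input masses are made up):

# Sketch of the TBM's unnormalised conjunctive combination (open-world assumption).
def conjunctive(m1, m2):
    m = {}
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B                       # possibly the empty set, which keeps the conflicting mass
            m[C] = m.get(C, 0.0) + a * b
    return m

m1 = {frozenset({'t'}): 0.8, frozenset({'t', 'f'}): 0.2}
m2 = {frozenset({'f'}): 0.6, frozenset({'t', 'f'}): 0.4}
print(conjunctive(m1, m2))    # the mass on frozenset() measures how much the two sources conflict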
Kohlas and Monney introduced a notion of belief functions on real numbers in-
spired by Dempster’s multi-valued setting. Indeed, some of the relations introduced
in [1222] and [756, 745] had already been introduced in [344]. This idea has de-
veloped into their mathematical theory of hints [744, 753, 747, 959] (see [745] as
introduction, and the monograph [756] for a detailed exposition). Hints ([753]) are inherently imprecise and uncertain bodies of information, which do not point to precise answers but are used to judge hypotheses, leading to support and plausibility
functions similar to those introduced by Shafer. The following introduction to the
theory of hints is abstracted from [374].
Functional models Functional models [756] describe the process by which a datum
x is generated from a parameter θ and some random element ω. The set of possible
values of the data x is denoted by X, whereas the domain of the parameter θ is
denoted by Θ and the domain of the random element ω is denoted by Ω.
The data generation process is specified by a function
f : Θ × Ω → X.
If θ is the correct value of the parameter and the random element ω occurs, then
the data x is uniquely determined by x = f (θ, ω). The function f together with a
probability measure p : Ω → [0, 1] constitute a functional model for a statistical
experiment E.
A functional model induces a parametric family of probability distributions (statis-
tical specifications) on the sample space X, which is usually assumed a priori in
modelling statistical experiments, via:
$$ p_\theta(x) = \sum_{\omega \,:\, x = f(\theta, \omega)} p(\omega). \tag{3.15} $$
Note that different functional models may induce the same statistical specifications, i.e., they contain more information than the families of probability distributions (3.15).
Assumption-based reasoning Consider an experiment E represented by a func-
tional model x = f (θ, ω) with given probabilities p(ω) for the random elements.
Suppose that the observed outcome of the experiment is x. What can be inferred
about the value of the unknown parameter θ?
The basic idea of assumption-based reasoning is to assume that a random ele-
ment ω generated the data, and then determine the consequences of this assumption
on the parameter. The observation x induces an event in Ω, namely:
$$ v_x = \{ \omega \in \Omega \mid \exists\, \theta \in \Theta : x = f(\theta, \omega) \}. $$
Since we know that $v_x \subseteq \Omega$ has happened, in a Bayesian setting we need to condition the initial probabilities $p(\omega)$ with respect to $v_x$, obtaining $p'(\omega) = p(\omega)/P(v_x)$, whose probability measure is trivially $P'(A) = \sum_{\omega \in A} p'(\omega)$.
Note that it is unknown what element ω ∈ vx has actually generated the ob-
servation. Assuming ω was the cause, the possible values for the parameter θ are
obviously restricted to the set:
Tx (ω) = {θ ∈ Θ|x = f (θ, ω)}.
Summarising, an observation x in a functional model f, p generates a structure
$$ H_x = (v_x, P', T_x, \Theta), \tag{3.16} $$
which Kohlas and Monney call a hint.
A theory of hints In general, if Θ denotes the set of possible answers to a question
of interest, then a hint on Θ is a quadruple of the form H = (Ω, P, Γ, Θ), where Ω
is a set of assumptions, P is a probability measure on Ω reflecting the probability
of the different assumptions, and Γ is a mapping between the assumptions and the
power set of Θ, Γ : Ω → 2Θ. If assumption ω is correct, then the answer is certainly within Γ(ω).
Note that the mathematical setting of hints is identical to Dempster’s multivalued
mapping framework described in Section 3.1.1. Degrees of support and plausibility
can then be computed as in (3.1), (3.2).
What is (arguably) different is the interpretation of degrees of support/plausibility.
While Dempster interpreted them as lower and upper bounds to the amount of prob-
ability assigned to A, Kohlas and Monney do not assume that there exists an un-
known true probability of A, but rather adopt Pearl’s probability of provability in-
terpretation [1023] (see Section 5.6.7), related to the classical AI paradigm called
‘truth-maintenance systems’ [811]. These systems contain a symbolic mechanism
for identifying the set of assumptions needed to create a proof of a hypothesis A,
so that when probabilities are assigned to the assumptions support and plausibility
functions can be obtained. In the theory of hints, these assumptions form an ar-
gument for the hypothesis A, and their probability is the weight assigned to each
argument.
Functional models and hints Consider again a functional model f (ω, θ) and an
observed datum x. Given a hint (3.16), any hypothesis H ⊆ Θ regarding the correct
value of the parameter can then be evaluated with respect to it.
The arguments for the validity of H are the chance elements in the set u_x(H) = {ω ∈ v_x : T_x(ω) ⊆ H}, with degree of support P'(u_x(H)); the assumptions compatible with H form instead the set {ω ∈ v_x : T_x(ω) ∩ H ≠ ∅}, whose probability under P' yields the degree of plausibility of H.
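The whole assumption-based pipeline can be sketched in a few lines: given a discrete, hypothetical functional model and an observation, we build v_x, the conditioned probabilities and the sets T_x(ω), and evaluate the degrees of support and plausibility of a hypothesis H. All names and values below are illustrative:

# Sketch of assumption-based reasoning with a discrete functional model x = f(theta, omega).
Theta = {0.2, 0.5, 0.8}                     # candidate parameter values
Omega = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}   # random elements and their probabilities p(omega)

def f(theta, omega):                        # hypothetical data-generating function
    return 1 if (theta >= 0.5) == (omega != 'w3') else 0

x = 1                                       # observed datum
v_x = {w for w in Omega if any(f(t, w) == x for t in Theta)}   # assumptions that can explain x
Z = sum(Omega[w] for w in v_x)
P1 = {w: Omega[w] / Z for w in v_x}                            # conditioned probabilities p'(omega)
T_x = {w: {t for t in Theta if f(t, w) == x} for w in v_x}     # parameters compatible with (x, omega)

H = {0.5, 0.8}                                                 # hypothesis on the parameter
support = sum(P1[w] for w in v_x if T_x[w] <= H)
plausibility = sum(P1[w] for w in v_x if T_x[w] & H)
print(support, plausibility)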
Conversely, the concept of hint can also be used to represent the functional model itself as well as the observed data. Therefore, not only the result of the inference can be expressed
in terms of hints, but also the experiment that is used to make the inference can be
expressed with hints.
A functional model (??) can be represented by the hint:
Hf = (Ω, P, Γf , X × Θ)
where
Γf (ω) = {(x, θ) ∈ X × Θ|x = f (θ, ω)},
while an observation x can be represented by the hint
Ox = ({v}, P, Γ, X × Θ)
where v is the assumption stating that x has been observed, which is true with prob-
ability P ({v}) = 1, and Γ (v) = {x} × Θ.
This equation is justified by the fact that no restriction can be imposed on Θ when
the observed value x is the only piece of information that is being considered. The
hints Hf and Ox represent two pieces of information that can be put together in
order to determine the information on the parameter that can be derived from the
model and the data, resulting in the combined hint Hf ⊕ Ox .
By marginalising to Θ we obtain the desired information on the value of θ: it is easy
to show that the result is (3.16).
By extension, the hint derived from a series of n repeated experiments E_i under functional models f_i, i = 1, ..., n, whose outcomes are x_1, ..., x_n, can be written as the combination (H_{f_1} ⊕ O_{x_1}) ⊕ · · · ⊕ (H_{f_n} ⊕ O_{x_n}), marginalised to Θ.
The basis of DSmT [394, 395, 397, 845] is the rejection of the law of the excluded middle, since for many problems (especially in sensor fusion)
the nature of hypotheses themselves (i.e., the elements of the frame of discernment)
is known only vaguely. As a result a ‘precise’ description of the set of possible
outcomes is difficult to obtain, so that the exclusive elements θ cannot be properly
identified or separated.
For a frame Θ = {θ1, θ2, θ3}, for instance, the hyper-power set DΘ contains
∅, θ1, . . . , θ3, θ1 ∪ θ2, . . . , θ2 ∪ θ3, . . . , θi ∩ θj, . . . , θ1 ∪ θ2 ∪ θ3,
and therefore has cardinality 19. In general its cardinality follows the sequence of Dedekind numbers1: 1, 2, 5, 19, 167, 7580, . . . Note that the classical complement is not contemplated, as DSmT rejects the law of the excluded middle.
A hyper-power set is also called a free model in DSmT. If the problem at hand allows some constraints to be enforced on the hypotheses which form the frame (e.g. the non-existence or the disjointness of some elements of DΘ), we obtain a so-called hybrid model. The most restrictive hybrid model is the usual frame of discernment of Shafer's formulation, in which all hypotheses are disjoint.
For all A ∈ DΘ it still holds that Bel(A) ≤ Pl(A); however, in the free model (the whole hyper-power set) we have Pl(A) = 1 for all A ∈ DΘ, A ≠ ∅.
1 Sloane, N. J. A., The On-line Encyclopedia of Integer Sequences, 2003 (Sequence No. A014466), http://www.research.att.com/njas/sequences/.
Rules of combination Under the free DSm model, the rule of combination becomes
$$ m(C) = \sum_{A, B \in D^\Theta,\ A \cap B = C} m_1(A)\, m_2(B) \qquad \forall C \in D^\Theta, $$
where this time A and B are elements of the hyperpowerset (i.e., conjunctions and
disjunctions of elements of the initial frame). Just like the TBM’s disjunctive rule,
this rule of combination is commutative and associative.
Obviously, the formalism is potentially very computationally expensive. The au-
thors, however, note that in most practical applications bodies of evidence allocate basic belief mass only to a few elements of the hyper-power set.
In the case of a hybrid model things get rather more complicated. The rule of combination becomes
$$ m(A) = \phi(A) \big[ S_1(A) + S_2(A) + S_3(A) \big] $$
(S2 and S3 have similar expressions, which depend on the list of sets forced to be empty by the constraints of the problem).
A generalization of the classic combination rules to DSm hyper-power sets was
also proposed by Daniel in [312]. Daniel has also contributed to the development of
DSmT in [314].
Interestingly, an extension of theory of evidence with non-exclusive elementary
propositions was separately proposed by Horiuchi in 1996 [622].
The notion of Gaussian belief function [864] was proposed by A. Dempster [339,
1166] and formalized by L. Liu in 1996 [868].
Technically, a Gaussian belief function is a Gaussian distribution over the members of the parallel partition of a hyperplane (Figure 3.3). The idea is to encode each proposition (event) as a linear equation, so that all parallel sub-hyperplanes of a given hyperplane are possible focal elements, and a Gaussian belief function is a Gaussian distribution over these sub-hyperplanes. As focal elements (hyperplanes) cannot intersect, the framework is less general than standard belief theory, where focal elements normally have non-empty intersections. However, it is more general than Shafer's original finite formulation, as its focal elements form a continuous domain.
By adapting Dempster’s rule to the continuous case, Liu also derived a rule
of combination and proved its equivalence to Dempster’s geometrical description
[339]. In [869], Liu proposed a join-tree computation scheme for expert systems using Gaussian belief functions, having proved that their rule of combination satisfies the axioms of Shenoy and Shafer [1188].
Fig. 3.3. Graphical representation of the concept of a Gaussian belief function (from [868]).
The framework was applied to portfolio evaluation in [870].
Given a compatibility relation C ⊆ Ω × Θ, its total extension is the relation C* ⊆ P(Ω) × P(Θ) such that (X, Y) ∈ C* iff there exist x ∈ X, y ∈ Y such that (x, y) ∈ C. A partial generalised compatibility relation is then a relation defined on a proper subset of P(Ω) × P(Θ) that can be extended to a total relation.
Signed belief functions A note is also due on the notion of a signed belief function [774], in which the domain of classical belief functions is replaced by a measurable space equipped with a signed measure, i.e. a σ-additive set function which can also take values outside the unit interval, including negative and infinite ones. An assertion analogous to the Jordan decomposition theorem for signed measures is stated and proved [785], according to which each signed belief function, when restricted to its finite values, can be defined by a linear combination of two classical probabilistic belief functions, supposing that the basic set is finite.
A probabilistic analysis of Dempster’s rule is developed [789], and an extension of
Dempster’s rule to signed belief functions is formulated [784].
Relation with Shafer’s ‘coded messages’ Hummel and Landy’s space of Boolean
opinions of experts (Section 3.3.6) is equivalent to Shafer’s coded-message formula-
tion [217]. Moreover, the combination of coded messages, in which a pair of codes
ci , cj is chosen independently with probability pi pj , and the combination of ele-
ments in the space of Boolean opinions coincide.
The authors' point, in introducing the spaces of experts, is that the requirement of (conditional) independence involves not only the choice of the messages, but also an assumption that the message is formed by the intersection of the subsets designated by the constituent messages.
and
$$ \text{either} \quad \sum_{\lambda \in \Lambda} p_\omega(\lambda) = 1 \quad \text{or} \quad p_\omega(\lambda) = 0 \ \ \forall \lambda, \tag{3.17} $$
with the last constraint describing the case in which expert ω has no opinion on the
matter.
This setting generalises Dempster-Shafer theory, in which probabilistic opinions are used only in terms of a test for zero. The indicator functions
$$ x_\omega(\lambda) = \begin{cases} 1 & p_\omega(\lambda) > 0 \\ 0 & p_\omega(\lambda) = 0 \end{cases} \tag{3.18} $$
where the probabilistic opinions pω meet the constraint (3.17), and the following
binary operation is defined:
such that
$$ E = E_1 \times E_2, \qquad \mu(\{(\omega_1, \omega_2)\}) = \mu_1(\{\omega_1\}) \cdot \mu_2(\{\omega_2\}), $$
and
$$ p_{(\omega_1, \omega_2)}(\lambda) = \frac{p_{\omega_1}(\lambda)\, p_{\omega_2}(\lambda)\, k_\lambda^{-1}}{\sum_{\lambda'} p_{\omega_1}(\lambda')\, p_{\omega_2}(\lambda')\, k_{\lambda'}^{-1}}. $$
Belief measures from statistics of experts Let then X be the set of Boolean opin-
ions of experts. We can define:
the intervals [ai , bi ] specifying an IBS are not unique. Upper and lower bounds to m
determine interval ranges for belief and plausibility functions, and also for pignistic
probabilities.
Combination of IBSs can be defined either as:
$$ \mathbf{m}' = \big\{ m = m_1 \circledast m_2 \,\big|\, m_1 \in \mathbf{m}_1,\ m_2 \in \mathbf{m}_2 \big\}, \tag{3.20} $$
Yager’s interval valued focal weights Yager [1491] also considers a similar situa-
tion in which the masses of the focal elements lie in some known interval, allowing
us to model more realistically situations in which the basic probability assignments
cannot be precisely identified.
As he points out, this amounts to uncertainty on the actual belief structure of the pos-
sibilistic type. Measures of plausibility and belief and possible rules of combination
for interval valued belief structures are introduced.
where $p \in [0, \infty]$ and $\|\cdot\|_p$ denotes the classical $L_p$ norm. Two interval-valued b.p.a.s can be combined via (cfr. conjunctive combination):
$$ M(C) = \sum_{A \cap B = C} M(A) \star M(B), $$
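As a rough illustration only: the sketch below implements the displayed combination of interval-valued b.p.a.s under two assumptions the text leaves open, namely that the product M(A) ⋆ M(B) is interval multiplication and that conflicting products (empty intersections) are simply discarded; the original frameworks may handle normalisation differently:

# Sketch of a conjunctive combination of interval-valued b.p.a.s (interval product assumed).
def interval_conjunctive(M1, M2):
    M = {}
    for A, (l1, u1) in M1.items():
        for B, (l2, u2) in M2.items():
            C = A & B
            if not C:
                continue                     # conflicting products are discarded in this sketch
            lo, hi = M.get(C, (0.0, 0.0))
            M[C] = (lo + l1 * l2, hi + u1 * u2)
    return M

M1 = {frozenset({'t'}): (0.5, 0.7), frozenset({'t', 'f'}): (0.3, 0.5)}
M2 = {frozenset({'f'}): (0.2, 0.4), frozenset({'t', 'f'}): (0.6, 0.8)}
print(interval_conjunctive(M1, M2))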
Many other frameworks have been proposed; here we list them, roughly in order of their impact in terms of citations (as of June 2016) [882, 1066, 1444]. For
some of them not many details are available, including Zarley’s evidential reasoning
system [1543], Fister and Mitchell’s ‘entropy based belief body compression’ [481],
Peterson’s ‘Local Dempster Shafer Theory’ [1036], Mahler’s customisation of belief
theory via a priori evidence [907].
For all the others, we provide in the following a brief description.
Lowrance and Strat's framework Lowrance and Garvey's early evidential reasoning framework [503, 896] uses Shafer's belief functions in their original form for encoding evidence.
Their original contribution in [503] was a set of inference rules for computing
belief/plausibility intervals of dependent propositions, from the mass assigned to the focal elements.2
2 Terminology has been changed to that of this book.
Their framework for evidential reasoning systems [896], instead,
focusses more on the issue of specifying a set of distinct frames of discernment,
each of which defines a set of possible world situations, and their interrelationships,
and establishing paths for the bodies of evidence to move through distinct frames
by means of evidential operations, eventually converging on spaces where the target
questions can be answered.
As such, their work is quite related to this book's author's algebraic analysis of families of compatible frames [222, 227, ?], and to his belief modelling regression approach to pose estimation in computer vision [288, 280, ?, ?].
This is done through a compatibility relation, a subset Θ_{A,B} ⊂ Θ_A × Θ_B of the Cartesian product of the two related frames (their common refinement, in Shafer's terminology). A compatibility mapping, taking statements A_k in Θ_A to statements of Θ_B, can then be defined as:
$$ C_{A \mapsto B}(A_k) = \big\{ b_j \in \Theta_B \,\big|\, (a_i, b_j) \in \Theta_{A,B},\ a_i \in A_k \big\}. $$
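A compatibility mapping is easy to implement as a relational join; the sketch below uses a hypothetical relation between two small frames:

# Sketch of a compatibility mapping between two frames (hypothetical relation Theta_AB).
Theta_AB = {('rain', 'wet'), ('rain', 'puddles'), ('sun', 'dry')}   # subset of Theta_A x Theta_B

def project(Ak):
    """C_{A->B}(Ak): the statements of Theta_B compatible with some element of Ak."""
    return {b for a, b in Theta_AB if a in Ak}

print(project({'rain'}))          # {'wet', 'puddles'}
print(project({'rain', 'sun'}))   # {'wet', 'puddles', 'dry'}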
Commonality, possibility and necessity measures with the usual properties can also
be defined.
Interestingly, any capacity on a lattice L is a belief function iff L is linear (in other words, a total order) [65]. The approach has very recently been further extended by C. Zhou [1553, 1554].
‘Improved’ evidence theory In [468] Fan and Zuo (2006) ‘improve’ standard ev-
idence theory by introducing a fuzzy membership function, an importance index,
and a conflict factor to address the issues of scarce and conflicting evidence, and
propose new decision rules.
$$ f(\alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}, $$
where α, β are the parameters specifying the density function and Γ denotes the gamma function, for frames of discernment of arbitrary atomicity in a three-dimensional representation with parameters r, s and a.
A mapping between this three-dimensional representation and belief functions is
then applied as follows:
$$ Bel(A) = \frac{r}{r + s + 2}, \qquad Dis(A) = Bel(A^c) = \frac{s}{r + s + 2}. $$
After this mapping, Dempster's rule and consensus combination can be compared [674].
Each Aωj is called a granule, and the collection {Aωj , j} is the granule set associ-
ated with ω. In rough words, each focal element of standard belief theory is broken
down into a union of disjoint granules, each with an attached (conditional) probabil-
ity. The mass of each focal element A ⊂ Θ in this extended multi-valued framework
can then be expressed as:
$$ m(A) = \sum_{\omega \in \Omega} P(A|\omega)\, P(\omega), $$
i.e. by multiplying the conditional probability expressing the mapping by the prior
probability of the elements of Ω.
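A small sketch of this computation, with made-up priors and granules: each granule A of an element ω contributes P(A|ω)P(ω) to the mass m(A):

# Sketch of a probabilistic multi-valued mapping: m(A) collects P(A|omega) * P(omega)
# over all elements omega having A among their granules (toy numbers).
Omega = {'w1': 0.6, 'w2': 0.4}                      # prior P(omega)
granules = {                                        # conditional probabilities P(A|omega)
    'w1': {frozenset({'a'}): 0.7, frozenset({'b', 'c'}): 0.3},
    'w2': {frozenset({'b', 'c'}): 1.0},
}

mass = {}
for w, cond in granules.items():
    for A, p_cond in cond.items():
        mass[A] = mass.get(A, 0.0) + p_cond * Omega[w]
print(mass)    # {'a'}: 0.42, {'b','c'}: 0.58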
In this context, Dempster's rule is used to combine belief updates rather than absolute beliefs, obtaining results consistent with Bayes' theorem. The combined belief intervals form probability bounds under two conditional independence assumptions that are weaker than those of PROSPECTOR and MYCIN.
A further generalisation of Yen’s probabilistic multi-valued mapping is pre-
sented in [880] in which uncertain relations ω → A between elements ω ∈ Ω
and subsets A of Θ replace the disjoint partitions {Aωi , i} of Θ (see Definition 31,
item 2.) considered by Yen, and mass functions on these uncertain relations gener-
alise conditional probabilities.
Interestingly, evidential mappings that use mass functions to express the uncertain relationships have been separately introduced by Guan and Bell [556].
Belief with Minimum Commitment In [631] a new approach for reasoning with
belief functions, fundamentally unrelated to probabilities and consistent with Shafer
and Tversky’s canonical examples, is proposed. Basically the idea is to treat all
available partial information, in the form of marginal or conditional beliefs, as con-
straints that the overall belief function needs to satisfy. The principle of minimal commitment then prescribes the least committed such belief function (in the usual weak inclusion order (3.9)).
A theory of mass assignments Baldwin [51, 55] proposed a theory of mass assignments for evidential reasoning, which mathematically correspond to Shafer's basic probability assignments but are treated using a different rule of combination, based on an assignment algorithm subject to constraints derived from operations research. An algebra for mass assignments is given, and a conditioning process is proposed which generalises Bayesian updating to the case of updating prior mass assignments with uncertain evidence expressed as another mass assignment.
Relation-based evidential reasoning In [30] the authors argue that the difficult and
ill-understood task of estimating numerical degrees of belief for the propositions to
be used in evidential reasoning (an issue referred to as ‘inference’, see Chapter 4,
Section 4.1) can be avoided by replacing estimations of absolute values with more
defensible assignments of relations. The claim is that it is difficult to justify decisions based on numerical degrees of belief, which leads to a framework based on representing arguments such as 'evidence e supports alternative set A' and their relative strengths, as in 'e1 supports A1 better than e2 supports A2'.
The authors prove that belief functions (in a precise sense) are equivalent to a special
case of the proposed method, in which all arguments are based on only one piece of
evidence.
and no normalization is necessary, because conflict does not arise (under the as-
sumption of independence).
to the true incidence sets. A set of lower bounds is called an interval structure [1430]
if it meets the following axioms:
1. F (∅) = ∅;
2. F (Θ) = W ;
3. F (A ∩ B) = F (A) ∩ F (B);
4. F (A ∪ B) ⊇ F (A) ∪ F (B).
By observing the close relationships between the above qualitative axioms and the
quantitative axioms of belief functions, we may regard the lower bound of an in-
terval structure as non-numeric belief and the upper bound as the corresponding non-numeric plausibility.
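The four axioms are easy to check mechanically; the sketch below tests a candidate lower-bound map F on a toy incidence example (all sets and assignments are illustrative):

from itertools import combinations

# Sketch: checking the four interval-structure axioms for a candidate lower-bound map.
Theta = frozenset({'t1', 't2'})
W = frozenset({'w1', 'w2', 'w3'})
lower = {frozenset(): frozenset(),                       # F(empty) = empty
         frozenset({'t1'}): frozenset({'w1'}),
         frozenset({'t2'}): frozenset({'w2'}),
         Theta: W}                                       # F(Theta) = W

def subsets(S):
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

ok = (lower[frozenset()] == frozenset() and lower[Theta] == W)
for A in subsets(Theta):
    for B in subsets(Theta):
        ok &= lower[A & B] == lower[A] & lower[B]        # axiom 3
        ok &= lower[A | B] >= lower[A] | lower[B]        # axiom 4 (superset test)
print("interval structure:", ok)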
Chapter outline
Each section of this chapter deals with one of the fundamental elements of reasoning with belief functions: the inference problem (Section 4.1); the mathematics of evidence combination (Section 4.2); the notion of a conditional belief function (Section 4.3); the approaches proposed to limit the computational complexity of an approach based on power sets (Section 4.4), in particular those based on local propagation on graphical models (Section 4.4.4); and making decisions under uncertainty with belief functions (Section 4.5).
Section 4.7 illustrates the set of tools currently available, based on belief theory,
which allow working scientists to address classification, estimation and regression
problems, often in connection with machine learning.
The last part of the Chapter is devoted to more advanced topics, such as the formulation of belief functions on arbitrary domains (including the real line, Section 4.6), and the various mathematical facets of belief functions as complex mathematical objects (Section 4.8).
4.1 Inference
Inference is the first step in any estimation/decision problem. In this context, by
inference we mean constructing a belief function from the available evidence. Now,
belief functions can be constructed from both statistical data (quantitative inference)
and experts’ preferences (qualitative inference).
The question of how to transform a set of available data (typically in the form of a series of trials) into a belief function (the inference problem) is crucial for practical statistical inference with belief functions. The data can be of a different nature: statistical [Seidenfeld78], logical, expressed in terms of mere preferences, or subjective. The problem has been studied by scholars of the caliber of Shafer, Seidenfeld, Walley and others, who delivered an array of approaches to it. Unfortunately, different approaches to the inference problem produce different belief functions from the same statistical data.
A very general exposition by Chateauneuf and Vergnaud, providing some foundations for a belief revision process in which both the initial knowledge and the new evidence are belief functions, can be found in [170]. We give here a brief survey of the main proposals on this topic.
Concerning inference from statistical data, the two dominant approaches are Dempster's method based on an auxiliary variable, and Wasserman and Shafer's approach based on the likelihood function. The problem can be posed as follows.
Consider a statistical model
$$ \big\{ f(x; \theta),\ x \in X,\ \theta \in \Theta \big\}, $$
where X is the sample space and Θ is a parameter space. Having observed x, how
do we quantify the uncertainty about the parameter θ, without specifying a prior
probability distribution?
The likelihood-based approach postulates the following requirements:
– Likelihood principle: the desired belief function Bel_Θ(·; x) should be based only on the likelihood function L(θ; x) = f(x; θ);
– Compatibility with Bayesian inference: when a Bayesian prior P_0 is available, combining it with Bel_Θ(·; x) using Dempster's rule should yield the Bayesian posterior:
$$ Bel_\Theta(\cdot; x) \oplus P_0 = P(\cdot|x); $$
– Principle of minimal commitment: among all the belief functions satisfying the previous two requirements, Bel_Θ(·; x) should be the least committed (see the section on the least commitment principle).
These constraints lead us to uniquely identify Bel_Θ(·; x) as the consonant belief function with contour function (plausibility of singletons) equal to the normalised likelihood:
$$ pl(\theta; x) = \frac{L(\theta; x)}{\sup_{\theta' \in \Theta} L(\theta'; x)}. $$
Its plausibility function is
$$ Pl_\Theta(A; x) = \sup_{\theta \in A} pl(\theta; x) = \frac{\sup_{\theta \in A} L(\theta; x)}{\sup_{\theta \in \Theta} L(\theta; x)}, \qquad \forall A \subseteq \Theta, $$
while the corresponding random set is $(\Omega, \mathcal{B}(\Omega), \mu, \Gamma_x)$ with $\Omega = [0, 1]$, $\mu = \mathcal{U}([0, 1])$ and
$$ \Gamma_x(\omega) = \big\{ \theta \in \Theta \,\big|\, pl(\theta; x) \ge \omega \big\}. $$
For instance, for an i.i.d. Bernoulli sample x = (x_1, . . . , x_n) with parameter θ the contour function reads
$$ pl(\theta; x) = \frac{\theta^y (1 - \theta)^{n - y}}{\hat{\theta}^y (1 - \hat{\theta})^{n - y}}, $$
where $y = \sum_{i=1}^n x_i$ and $\hat{\theta} = y/n$ is the maximum likelihood estimate (MLE). As an example, for n = 20 and y = 10 we get the consonant belief function of Figure ??.
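A short sketch of the likelihood-based consonant belief function for this Bernoulli example (n = 20, y = 10): the contour function is the normalised likelihood, and the plausibility of an interval hypothesis is its supremum over the interval, here approximated on a grid:

import numpy as np

# Sketch: likelihood-based contour function and interval plausibility for Bernoulli data.
n, y = 20, 10
theta_hat = y / n

def pl_contour(theta):
    """Contour function: normalised likelihood pl(theta; x)."""
    return (theta ** y * (1 - theta) ** (n - y)) / (theta_hat ** y * (1 - theta_hat) ** (n - y))

def pl_interval(a, b, grid=np.linspace(0, 1, 10001)):
    """Pl([a, b]; x) = sup of the contour function over [a, b] (grid approximation)."""
    sel = grid[(grid >= a) & (grid <= b)]
    return float(pl_contour(sel).max())

print(pl_contour(theta_hat))      # 1.0 at the MLE
print(pl_interval(0.6, 0.8))      # plausibility of the hypothesis theta in [0.6, 0.8]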
Wasserman [] showed that the likelihood-based belief function can indeed be
used to handle partial prior information, and related it to robust Bayesian inference.
Dempster’s auxiliary variable Suppose that the sampling model X ∼ f (x; θ) can
be represented by an “a-equation” of the form
X = a(θ, U ),
where U is an auxiliary random variable with known probability distribution µ, independent of θ.
Fig. 4.1. Plausibility function (left) and cumulative distribution function (right) generated in the Bernoulli sample example (courtesy of Thierry Denoeux).
For instance, denoting by F_θ the cumulative distribution function of X, one can take
$$ X = F_\theta^{-1}(U), \qquad U \sim \mathcal{U}([0, 1]). $$
The equation X = a(θ, U ) defines a multi-valued mapping (or, equivalently, a
“compatibility relation”) as follows:
$$ \Gamma : U \mapsto \Gamma(U) = \big\{ (x, \theta) \in X \times \Theta \,\big|\, x = a(\theta, U) \big\}. $$
Under the usual measurability conditions (see Section 3.1.5), the probability space (U, B(U), μ) and the multi-valued mapping Γ induce a belief function BelΘ×X on X × Θ. Conditioning BelΘ×X (by Dempster's rule) on θ then yields the desired sampling distribution f(·; θ) on X, while conditioning it on X = x gives a belief function BelΘ(·; x) on Θ.
In the Bernoulli example, for instance, the focal elements of BelΘ(·; x) are the random intervals

[U_(y), U_(y+1)],

where U_(i) denotes the i-th order statistic of U1, ..., Un. Quantities such as BelΘ([a, b]; x) or PlΘ([a, b]; x) can then be readily calculated.
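Such quantities can also be approximated by simulation. The sketch below assumes the Bernoulli case just mentioned, in which the focal element is the random interval [U_(y), U_(y+1)] of uniform order statistics; it is only an illustration of the construction, not Dempster's original algorithm:

import numpy as np

rng = np.random.default_rng(0)

def bel_pl_interval(a, b, y, n, n_samples=100_000):
    """Monte-Carlo estimates of Bel([a,b]; x) and Pl([a,b]; x) under Dempster's
    auxiliary-variable model for n Bernoulli trials with y successes."""
    u = np.sort(rng.random((n_samples, n)), axis=1)
    # order statistics, with the conventions U_(0) = 0 and U_(n+1) = 1
    padded = np.concatenate([np.zeros((n_samples, 1)), u, np.ones((n_samples, 1))], axis=1)
    lo, hi = padded[:, y], padded[:, y + 1]
    bel = np.mean((lo >= a) & (hi <= b))     # focal interval contained in [a, b]
    pl = np.mean((hi >= a) & (lo <= b))      # focal interval intersects [a, b]
    return bel, pl

print(bel_pl_interval(0.3, 0.7, y=10, n=20))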
Dempster's model has several nice features: it allows us to quantify the uncertainty on Θ after observing the data, without having to specify a prior distribution on Θ. In addition, whenever a Bayesian prior P0 is available, combining it with BelΘ(·; x) using Dempster's rule yields the Bayesian posterior BelΘ(·; x) ⊕ P0 = P(·|x).
Shafer's later proposals In a later paper [Shafer82], Shafer illustrated three different ways of doing statistical inference in the belief framework, according to the nature of the available evidence. He stressed how the strength of belief calculus really lies in allowing inference under partial knowledge or ignorance, when simple parametric models are not available.
Walley In the late Eighties [1370], Walley characterized the classes of belief and commonality functions for which statistically independent observations can be combined by Dempster's rule, and those for which Dempster's rule is consistent with Bayes' rule.
Other statistical approaches Van den Acker [350] devised a method for representing statistical inference results as belief functions, intended for application in an audit context.
An original paper by Hummel and Landy [635] gave a new interpretation of Dempster's rule of combination in terms of statistics of experts' opinions, combined in a Bayesian fashion.
Liu et al. [863] described an algorithm for inducing implication networks from empirical data samples. The validity of the method was tested by means of several Monte-Carlo simulations: the values in the implication networks were predicted by applying the belief updating scheme and then compared with Pearl's stochastic simulation method, showing that evidential inference has a much lower computational cost.
A number of works have also addressed inference from preferences. Among them are Wong and Lingras' perceptron idea, the so-called Qualitative Discrimination Process, and Ben Yaghlane's constrained optimisation framework.
Wong and Lingras [1427] proposed a method for generating belief functions from a body of qualitative preference relations between propositions; preferences are not needed for all pairs of propositions. Expert opinions are expressed through two binary relations: preference (≻) and indifference (∼). The goal is to build a belief function Bel such that A ≻ B iff Bel(A) > Bel(B), and A ∼ B iff Bel(A) = Bel(B). They proved that such a belief function exists whenever ≻ is a 'weak order' and ∼ is an equivalence relation. Their algorithm can be summarised as follows:
Algorithm
1. consider all propositions that appear in the preference relations as potential focal elements;
2. elimination step: if A ∼ B for some B ⊂ A, then A is not a focal element;
3. a perceptron algorithm is used to generate the mass m by solving the system of remaining equalities and inequalities.
A drawback of this approach is that it arbitrarily selects one solution among the many admissible ones. Also, it does not address possible inconsistencies in the given body of expert preferences.
To address these issues, Ben Yaghlane et al. proposed a constrained optimisation approach which uses preference and indifference relations as in Wong and Lingras' method, obeying the same axioms, but converts them into a constrained optimisation problem: the entropy/uncertainty of the belief function to be generated is maximised (in order to select the least informative belief function) under constraints derived from the input preferences and indifferences, as sketched below.
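As a rough illustration of what such a constrained optimisation problem may look like, the following sketch maximises an uncertainty measure subject to the preference constraints. The choice of the Shannon entropy of the masses as objective, the tolerance ε and the example preference are illustrative assumptions, not necessarily the authors' exact formulation:

import itertools
import numpy as np
from scipy.optimize import minimize

frame = ('a', 'b', 'c')
subsets = [frozenset(s) for r in range(1, len(frame) + 1)
           for s in itertools.combinations(frame, r)]

def bel(masses, event):
    # belief of an event = sum of masses of its subsets
    return sum(m for A, m in zip(subsets, masses) if A <= event)

def neg_entropy(masses):
    m = np.clip(masses, 1e-12, 1.0)
    return np.sum(m * np.log(m))      # minimise negative entropy = maximise uncertainty

preferences = [(frozenset('ab'), frozenset('c'))]   # hypothetical input: {a,b} preferred to {c}
eps = 0.01

constraints = [{'type': 'eq', 'fun': lambda m: np.sum(m) - 1.0}]
for A, B in preferences:
    constraints.append({'type': 'ineq',
                        'fun': lambda m, A=A, B=B: bel(m, A) - bel(m, B) - eps})

x0 = np.full(len(subsets), 1.0 / len(subsets))
res = minimize(neg_entropy, x0, bounds=[(0, 1)] * len(subsets), constraints=constraints)
for A, m in zip(subsets, res.x):
    print(set(A), round(m, 3))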
4.2 Combination
The question of how to update or revise the state of belief represented by a belief
function when new evidence becomes available is also crucial in the theory of evi-
dence. In Bayesian reasoning, this role is performed by Bayes’ rule. In the theory of
belief functions, after an initial proposal by Arthur Dempster, several other aggre-
gation operators have been proposed, leaving the matter still far from settled.
Dempster's rule [522] is not really given a convincing justification in Shafer's seminal book [1149], leaving the reader wondering whether a different rule of combination could be chosen instead [1170, 1530, 464, 300, 1315, 1229]. This question has been posed by several authors (e.g. [1359, 1536, 1159, 1418], among others), most of whom tried to provide axiomatic support for the choice of this mechanism for combining evidence. Smets, for instance, tried [1266] to formalise the concept of the distinct evidence that is combined by Dempster's rule.
Early on, Seidenfeld [1141] objected to the rule of combination in the context
of statistical evidence, suggesting it was inferior to conditionalization.
As the two masses are highly conflicting, normalisation yields the categorical belief function focussed on C: a strong statement that the condition is definitely concussion, although both experts had left it as only a fringe possibility.
Zadeh’s dilemma was discussed in 1983 by Yager [1478], where he suggested a
solution based on the inclusion of a ‘hedging’ element.
In [571] Haenni showed, however, that the counter-intuition in Zadeh’s example is
not a problem with Dempster’s rule, but a problem with Zadeh’s own model, which
the author claimed does not correspond to reality.
First of all, the mass functions in (4.1) are Bayesian (i.e., probability measures): thus, Bayesian reasoning leads to the very same conclusions, and the example would then lead us to reject Bayes' rule as well. Secondly, diseases are never exclusive, so that it may be argued that Zadeh's choice of a frame of discernment is misleading, and the root of the apparent 'paradox'.
Finally, experts are never fully reliable. In the example, they disagree so much that any person would conclude that one of them is simply wrong. This can be addressed by introducing the two product frames

Θ1 = {R1, U1} × Θ,   Θ2 = {R2, U2} × Θ,

and by discounting the reliability of each expert prior to combining their views (i.e., by assigning a certain p(Ri)). The result of such a combination, followed by marginalisation on the original frame Θ, adequately follows intuition, as the sketch below illustrates.
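The effect of discounting on a Zadeh-like example can be illustrated with a small sketch. The numerical masses below are purely illustrative (they are not the exact values of Equation (4.1)), and the discount rate 0.9 is an arbitrary choice:

from itertools import product

THETA = frozenset({'M', 'C', 'T'})

def dempster(m1, m2):
    """Dempster's rule on mass functions represented as {frozenset: mass} dictionaries."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

def discount(m, alpha):
    """Classical discounting: keep mass with rate alpha, move the rest to the whole frame."""
    out = {A: alpha * v for A, v in m.items()}
    out[THETA] = out.get(THETA, 0.0) + (1.0 - alpha)
    return out

m1 = {frozenset({'M'}): 0.99, frozenset({'C'}): 0.01}
m2 = {frozenset({'T'}): 0.99, frozenset({'C'}): 0.01}

print(dempster(m1, m2))                                # all mass ends up on {C}
print(dempster(discount(m1, 0.9), discount(m2, 0.9)))  # discounting restores intuition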
A number of other authors have reasoned about discounting techniques, as we will see in Section ??.
Dubois and Prade's analysis In 1985, Dubois and Prade [421] had already pointed out that, when analyzing the behavior of Dempster's rule of combination, assessing a zero value or a very small value may lead to very different results. Theirs was also a criticism of the idea of having 'certain' evidence ('highly improbable' is not 'impossible'), which led them to something similar to discounting.
In a 1986 note [422], the same authors proved the unicity of Dempster's rule under a certain independence assumption, while stressing the existence of alternative rules corresponding to different assumptions or different types of combination. Eventually (1988), Dubois and Prade came to the conclusion that the justification for the pooling of evidence by Dempster's rule was problematic [440]. As a response, they proposed a new combination rule based on the minimum specificity principle (Section 4.2.2).
Axiomatic justifications Frank Klawonn and Erhard Schwecke (1992) [716] pre-
sented a set of axioms that uniquely determine Dempster’s rule, and which reflect
the intuitive idea of partially moveable evidence masses.
A nice paper by Nic Wilson (1993) [1421] took a similar axiomatic approach to
the combination of belief functions. The following requirements are formulated1 :
Definition 35. A combination rule π : s ↦ π^s : Ω → [0, 1], mapping a collection of combinable random sets s = {(Ωi, Pi, Γi), i ∈ I} to a probability distribution on Ω, is said to respect contradictions if, for any such finite collection and any ω ∈ ×_{i∈I} Ωi, Γ(ω) ≐ ∩_{i∈I} Γi(ωi) = ∅ implies π^s(ω) = 0.
If Γ(ω) = ∅ then ω cannot be true, since that would imply that ∅ is true, and ∅ represents the contradictory proposition. Therefore any sensible combination rule must respect contradictions.
Definition 36. A combination rule π is said to respect zero probabilities if for any
combinable multiple source structure s and ω ∈ Ω s , if Pis (ω) = 0 for some i ∈ ψ s ,
then π s (ω) = 0.
1 Once again, the author's original statements are translated into the more standard terminology used in this book.
If Pis (ω) = 0 for some i then ω is considered impossible (since frames are finite).
Therefore, since ω is the conjunction of the propositions ωi , ω should clearly have
zero probability.
A benefit of this approach is that it makes the independence or irrelevance as-
sumptions explicit.
Absorptive behaviour The issue is still open to debate, and some interesting points about the behaviour of Dempster's rule have been made.
As recently as 2012, Dezert, Tchamova et al. [393] challenged the validity of Dempster-Shafer theory by using an example derived from Zadeh's classical 'paradox' to show that Dempster's rule produces counter-intuitive results. This time, the two doctors generate the following mass assignments over Θ = {M, C, T}:

m1(A) = a for A = {M}, 1 − a for A = {M, C}, 0 otherwise;
m2(A) = b1 for A = {M, C}, b2 for A = Θ, 1 − b1 − b2 for A = {T}.   (4.2)
formulated in the rule. They concluded that, by strictly following what Dempster has
suggested, there should be no counterintuitive results when combining evidence.
Bhattacharya (2000) [95] analysed the ‘non-hierarchical’ aggregation of belief func-
tions, showing that the values of certain functions defined on a family of belief struc-
tures decrease when the latter are combined by Dempster’s rule. Similar results hold
when an arbitrary belief structure is prioritised while computing the combination.
Furthermore, the length of the belief-plausibility interval decreases during a non-
hierarchical aggregation of belief structures.
A method for dispelling the 'absurdities' (apparent paradoxes) of Dempster-Shafer's rule of combination was proposed in [847], based on making all experts make their decision on the same focussed collection.
In [601], Hau et al. demonstrated that Dempster's rule of combination is not robust when combining highly conflicting belief functions, and that Shafer's (1983) discounted belief functions suffer from the same lack of robustness with respect to small perturbations in the discount factor. A modified version of Dempster's rule was proposed to remedy this difficulty.
In [584], a concrete example of the use of Dempster's rule presented by Weichselberger and Pohlmann was discussed, showing how their approach has to be modified to yield an intuitively adequate result.
In [1520] the authors describe a model in which masses are represented as condi-
tional granular distributions. By comparing it with Zadeh’s relational model, they
show how Zadeh’s conjecture on combinability does not affect the applicability of
Dempster’s rule.
Dubois and Prade's rule applies the minimum specificity principle to the case in which the focal elements B, C of two input belief functions do not intersect, assigning their product mass to B ∪ C. As a result:

mD(A) = m∩(A) + Σ_{B∪C=A, B∩C=∅} m1(B) m2(C).   (4.4)

Obviously, the resulting belief function dominates that generated by Yager's rule.
Smets' combination rules in the TBM Just like Dempster, Smets assumes that all the sources to be combined are reliable; conflict is only the result of an incorrectly defined frame of discernment.
Rather than normalising (as in Dempster's rule) or re-assigning the conflicting mass m∩(∅) to other non-empty subsets (as in Yager's and Dubois' proposals), his conjunctive rule of combination leaves the conflicting mass with the empty set:

m(A) = m∩(A) for ∅ ≠ A ⊆ Θ,   m(∅) = m∩(∅),   (4.5)
and thus is applicable to unnormalised belief functions. As Lefevre et al note [819],
a similar idea is also present in [1485], in which a new hypothesis is instead intro-
duced in the existing frame.
This amounts to an open world assumption in which the current frame of discern-
ment only approximately describes the set of possible outcomes (hypotheses).
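The rules discussed so far differ essentially in how the conflicting mass m∩(∅) is handled. The following sketch (with an illustrative frame and illustrative masses) contrasts the unnormalised conjunctive rule, Dempster's normalisation, Yager's transfer of the conflict to Θ, and Dubois and Prade's transfer to unions of disjoint focal elements:

from itertools import product

THETA = frozenset({'a', 'b', 'c'})

def conjunctive(m1, m2):
    """Unnormalised (TBM) conjunctive combination; the conflict stays on the empty set."""
    out = {}
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        key = A & B                         # frozenset() represents the empty set
        out[key] = out.get(key, 0.0) + x * y
    return out

def dempster(m1, m2):
    m = conjunctive(m1, m2)
    k = m.pop(frozenset(), 0.0)
    return {A: v / (1.0 - k) for A, v in m.items()}

def yager(m1, m2):
    m = conjunctive(m1, m2)
    k = m.pop(frozenset(), 0.0)
    m[THETA] = m.get(THETA, 0.0) + k        # conflicting mass goes to the whole frame
    return m

def dubois_prade(m1, m2):
    out = {}
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        key = A & B if A & B else A | B     # disjoint pairs: product mass to the union
        out[key] = out.get(key, 0.0) + x * y
    return out

m1 = {frozenset({'a'}): 0.7, THETA: 0.3}
m2 = {frozenset({'b'}): 0.6, THETA: 0.4}
for rule in (conjunctive, dempster, yager, dubois_prade):
    print(rule.__name__, {tuple(sorted(A)): round(v, 3) for A, v in rule(m1, m2).items()})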
In [918] a mixed conjunctive and disjunctive rule together with a generalization
of conflict repartition rules are presented.
In [719], the fundamental updating process in the transferable belief model is related to the concept of specialization, and can be described by a specialization matrix. There, the degree of belief in the truth of a proposition is a degree of justified support, and the Principle of Minimal Commitment implies that one should never give more support to the truth of a proposition than is justified. The authors show that Dempster's rule of conditioning corresponds essentially to the least committed specialization, and that Dempster's rule of combination results essentially from commutativity requirements. The concept of generalization, dual to that of specialization, is also described.
Denoeux’s cautious and bold rules
Cautious rule Another major alternative to Dempster's rule is the so-called cautious rule of combination [377, 359], based on Smets' canonical decomposition of a non-dogmatic (i.e., such that m(Θ) ≠ 0) belief function into (generalised) simple belief functions, namely the conjunctive combination

m = ∩_{A⊊Θ} m_A^{w(A)},   (4.6)

where m_A^w denotes the simple pseudo belief function3 such that

m_A^w(A) = 1 − w,   m_A^w(Θ) = w,   m_A^w(B) = 0 ∀B ∈ 2^Θ \ {A, Θ},

and the weights w(A) satisfy w(A) ∈ [0, +∞) for all A ⊊ Θ.
3 This is denoted by A^{w(A)} in the author's original papers.
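A sketch of how the cautious rule can be computed is given below: the canonical weights are obtained from the commonality function and, following Denoeux's construction, the combined weight of each subset is taken as the pointwise minimum of the two input weights before re-assembling the result conjunctively. The frame and mass values are illustrative, and the implementation assumes non-dogmatic inputs (m(Θ) > 0), so that all commonalities are strictly positive:

from itertools import combinations, product
from math import log, exp

FRAME = frozenset({'a', 'b', 'c'})

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def commonality(m, A):
    return sum(v for B, v in m.items() if A <= B)

def canonical_weights(m):
    """Weights of Smets' canonical decomposition of a non-dogmatic mass function:
    ln w(A) = - sum_{B >= A} (-1)^{|B|-|A|} ln Q(B), for every strict subset A of the frame."""
    w = {}
    for A in powerset(FRAME):
        if A == FRAME:
            continue
        s = sum((-1) ** (len(B) - len(A)) * log(commonality(m, B))
                for B in powerset(FRAME) if A <= B)
        w[A] = exp(-s)
    return w

def conjunctive(m1, m2):
    out = {}
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        out[A & B] = out.get(A & B, 0.0) + x * y
    return out

def from_weights(w):
    """Rebuild a mass function as the conjunctive combination of the simple BBAs A^{w(A)}."""
    m = {FRAME: 1.0}
    for A, wa in w.items():
        m = conjunctive(m, {A: 1.0 - wa, FRAME: wa})
    return m

def cautious(m1, m2):
    w1, w2 = canonical_weights(m1), canonical_weights(m2)
    return from_weights({A: min(w1[A], w2[A]) for A in w1})

m1 = {frozenset({'a'}): 0.5, frozenset({'a', 'b'}): 0.3, FRAME: 0.2}
m2 = {frozenset({'a'}): 0.4, FRAME: 0.6}
print({tuple(sorted(A)): round(v, 3) for A, v in cautious(m1, m2).items() if v > 1e-9})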
Bold rule Dually, any unnormalised belief function (i.e., one such that m(∅) ≠ 0) can be decomposed as the disjunctive combination

m = ∪_{A≠∅} m_{A,v(A)},   (4.8)

where m_{A,v(A)} is the unnormalised belief function assigning mass v(A) to ∅ and 1 − v(A) to A. Denoeux calls (4.8) the canonical disjunctive decomposition of m.
Let then G_x(m) be the set of basic probability assignments that are x-less committed than m. The bold combination corresponds to the most committed element of the intersection G_v(m1) ∩ G_v(m2).

Definition 38. Let m1 and m2 be two unnormalised4 basic probability assignments. The v-most committed element of G_v(m1) ∩ G_v(m2) exists and is unique, and is defined by the disjunctive weight function v_{1∨2}(A) = v1(A) ∧ v2(A), for all A ≠ ∅.
The fact that the bold disjunctive rule is only applicable to unnormalised belief
functions is a severe restriction, as admitted by the author.
A new combination rule called the cautious-adaptive rule is brought forward in
[470], based on generalized discounting defined for separable basic belief assign-
ments (bbas), to be applied to the source correlation derived from the cautious rule.
The cautious-adaptive rule varies between the conjunctive rule and the cautious one,
depending on the discounting level.
Pichon and Denoeux (2008) [1038] pointed out that the cautious and unnormal-
ized Dempster’s rules can be seen as the least committed members of families of
combination rules based on triangular norms and uninorms, respectively.
In Josang's subjective logic, an opinion about a proposition A is a tuple of belief, disbelief, uncertainty and relative atomicity values, the latter defined as a(A/B) = |A ∩ B| / |B|. Let

o1 = (b1(A), d1(A), u1(A), a1(A)),   o2 = (b2(A), d2(A), u2(A), a2(A))

be the opinions held by two agents about the same proposition/event A. The consensus combination o1 ⊕ o2 is defined as:
o1 ⊕ o2 =
( (b1 u2 + b2 u1)/κ, (d1 u2 + d2 u1)/κ, u1 u2 /κ, (a1 u2 + a2 u1 − (a1 + a2) u1 u2)/(u1 + u2 − 2 u1 u2) )   if κ ≠ 0,
( (γ b1 + b2)/(γ + 1), (γ d1 + d2)/(γ + 1), 0, (γ a1 + a2)/(γ + 1) )   if κ = 0,
   (4.10)
where κ = u1 + u2 − u1 u2, and γ = u1/u2.
The consensus operator is derived from the posterior combination of Beta distributions, and was proven to be commutative and associative, among other properties.
Clearly, (4.10) combines lower and upper probabilities, rather than belief functions
per se.
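A direct transcription of (4.10) is given below; the tie-breaking choices in the degenerate cases (γ = 1 when both opinions are dogmatic, averaging the relative atomicities when both are vacuous) are simple conventions assumed here for illustration only:

def consensus(o1, o2):
    """Josang-style consensus of two opinions o = (b, d, u, a), following (4.10)."""
    b1, d1, u1, a1 = o1
    b2, d2, u2, a2 = o2
    kappa = u1 + u2 - u1 * u2
    if kappa != 0:
        b = (b1 * u2 + b2 * u1) / kappa
        d = (d1 * u2 + d2 * u1) / kappa
        u = (u1 * u2) / kappa
        denom = u1 + u2 - 2 * u1 * u2
        # when both opinions are vacuous the atomicity formula is indeterminate;
        # averaging is an assumed convention
        a = (a1 + a2) / 2 if denom == 0 else (a1 * u2 + a2 * u1 - (a1 + a2) * u1 * u2) / denom
    else:
        gamma = 1.0          # assumed limit ratio u1/u2 for two dogmatic opinions
        b = (gamma * b1 + b2) / (gamma + 1)
        d = (gamma * d1 + d2) / (gamma + 1)
        u = 0.0
        a = (gamma * a1 + a2) / (gamma + 1)
    return b, d, u, a

print(consensus((0.6, 0.1, 0.3, 0.5), (0.2, 0.4, 0.4, 0.5)))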
A ‘cumulative’ rule and an ‘averaging’ rule of belief fusion are presented by
Josang et al in [691] (2010). They represent generalisations of the subjective logic
consensus operator for independent and dependent opinions respectively, and are
applicable to the combination of general basic probability (belief) assignments.
The authors argue that these rules can be directly derived from classical statistical
theory, and produce results in line with human intuition. In particular, the cumula-
tive rule is equivalent to a posteriori updating of Dirichlet distributions, while the
averaging rule is equivalent to averaging the evidence provided by Dirichlet distri-
butions. Both are based on a bijective mapping between Dirichlet distributions and
belief functions, described in [691].
A related family of approaches weights each input mass function by a credibility degree derived from its distance from the others:

Crd(mi) = Sup(mi) / Σ_j Sup(mj),   where   Sup(mi) = Σ_{j≠i} (1 − d(mi, mj)).

The latter can be used to compute a weighted average of the input masses as:

m̃ = Σ_i Crd(mi) · mi.
As in Murphy's approach, one can then use Dempster's rule to combine the resulting weighted average n times, where n is the number of input masses.
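The scheme can be sketched as follows. For simplicity the distance d(mi, mj) is taken here as an L1 distance between mass vectors, whereas the papers above typically rely on Jousselme's distance; the masses are illustrative:

from itertools import product
import numpy as np

def dempster(m1, m2):
    out, conflict = {}, 0.0
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        if A & B:
            out[A & B] = out.get(A & B, 0.0) + x * y
        else:
            conflict += x * y
    return {A: v / (1.0 - conflict) for A, v in out.items()}

def distance(m1, m2):
    """Placeholder evidence distance (half the L1 distance between mass vectors)."""
    keys = set(m1) | set(m2)
    return 0.5 * sum(abs(m1.get(A, 0.0) - m2.get(A, 0.0)) for A in keys)

def credibility_combine(masses):
    """Credibility-weighted average of the inputs, then (n-1)-fold Dempster combination
    of n copies of the average, in the spirit of Murphy's approach."""
    sup = [sum(1.0 - distance(mi, mj) for j, mj in enumerate(masses) if j != i)
           for i, mi in enumerate(masses)]
    crd = np.array(sup) / sum(sup)
    keys = set().union(*masses)
    avg = {A: float(sum(c * m.get(A, 0.0) for c, m in zip(crd, masses))) for A in keys}
    combined = avg
    for _ in range(len(masses) - 1):
        combined = dempster(combined, avg)
    return combined

m1 = {frozenset('a'): 0.8, frozenset('ab'): 0.2}
m2 = {frozenset('a'): 0.6, frozenset('ab'): 0.4}
m3 = {frozenset('b'): 0.9, frozenset('ab'): 0.1}
print({tuple(sorted(A)): round(v, 3) for A, v in credibility_combine([m1, m2, m3]).items()})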
Albeit rather empirical, these methods try to address the crucial issue with
Dempster’s combination (already pointed out in Section 4.2.1), namely that each
piece of evidence has ‘veto’ powers on the possible consensus outcomes. If any
of them gets it wrong, the combined belief function will never give support to the
‘correct’ hypothesis.
Other proposals A number of other combination rules have been proposed over the years [172, 563, 1527, 76].
In [673] Josang and Daniel discussed and compared various strategies for deal-
ing with ‘dogmatic’ beliefs, including Lefevre’s weighting operator (cfr. Section
4.2.3), Josang’s own consensus operator (Section 4.2.2) and Daniel’s MinC ap-
proach.
In [1388], existing approaches to combination were reviewed and critically anal-
ysed by Wang et al (2007). In the authors’ view they either ignore the normaliza-
tion or separate it from the combination process, leading to irrational or suboptimal
interval-valued belief structures. In response, a new ‘logically correct’ approach was
developed, where combination and normalization are optimised together rather than
separately.
In [1479] some alternative methods to Dempster’s rule for combining evidence
were given based on interpreting plausibility and belief as a special case of the
compatibility of a linguistically quantified statement with a data base consisting of
an expert’s fragmented opinion as to the location of a special element.
Yamada (2008) [1501] proposes a new combination model called ‘combination
by compromise’ as a consensus generator.
The focus of [1472] is to provide a procedure for aggregating 'prioritized' belief structures (i.e., ...). An alternative to the normalization step used in Dempster's rule is suggested, inspired by nonmonotonic logics; the authors show how this procedure allows one to make inferences in inheritance networks where the knowledge is in the form of a belief structure.
A generalized evidence combination formula relaxing the requirement of evi-
dence independence is presented by Wu (1996) [1442].
The combination process on non-exhaustive frames of discernment was analysed by Janez and Appriou in [666]. In previous works (in French) [665], the same authors had already presented methods based on a technique called 'deconditioning', which allows the combination of such sources. Additional methods based on the same framework were proposed in [].
In [1507], the concept of a Weighted Belief Distribution (WBD) is proposed and extended to WBDs with Reliability (WBDRs), in order to characterise evidence in a way which complements the Belief Distributions (BDs) of Dempster-Shafer theory. Implementing the orthogonal sum operation on WBDs and WBDRs leads to the establishment of a new 'ER' (evidential reasoning) rule; it is proven that Dempster's rule is a special case of the ER rule when each piece of evidence is fully reliable.
Baldwin [50] described an iterative procedure which generalises Bayes’ method
of updating an a priori assignment over the power set of the frame of discernment
using uncertain evidence.
A new combination rule, named the 'absorptive' method, was proposed in [1320], which exploits conflict information ...
Destercke and Dubois [383, 380] note that, when dependencies between sources
are ill-known, it is sensible to require idempotence from a belief function combina-
tion rule, as this property captures the possible redundancy of dependent sources. In
[380], they study the feasibility of extending the idempotent fusion rule of possibil-
ity theory (the ‘minimum’) to belief functions. However they reach the conclusion
that, unless we accept the idea that the result of the fusion process can be a family
of belief functions, such an extension is not always possible.
In [156], Campos (2003) presents an extension of belief theory that allows the combination of highly conflicting pieces of evidence, avoiding the tendency to reward low-probability but common possible outcomes of otherwise disjoint hypotheses.
The work in [463] focuses on possible modifications of combination rules and evidence sources in the case of highly conflicting evidence; the paper proposes to extract the ...
In [1558], Dempster's combination rule is first formulated within random set theory, in order to clarify its theoretical foundations and provide directions for solving these problems. Under this framework, all possible combination rules are then presented, and combination rules based on correlated sensor confidence degrees (evidence supports) are proposed; an optimal Bayes combination rule is finally given.
In [1467], based on an analysis of existing modified combination algorithms, a new combination method is proposed: a similarity matrix is first calculated, a credit vector is derived from it, and the evidence is finally averaged using the normalised credit vector and combined n − 1 times by Dempster's rule.
In [1529], when several pieces of evidence are combined, their mutual support degrees are calculated from an evidence distance; the eigenvector associated with the maximal eigenvalue of the support degree matrix is taken as a weight vector, from which discount coefficients are obtained and used to modify each piece of evidence prior to combination by Dempster's rule.
Ma et al. [904, 656] note that the combination rules proposed so far in the theory of evidence, and especially Dempster's rule, are symmetric: they rely on the basic assumption that the pieces of evidence being combined play the same role. In revision, instead, the idea is that the prior knowledge of an agent is altered by some input information, so that the change problem is intrinsically asymmetric: assuming that the input information is reliable, it should be retained, whilst the prior information should be changed minimally to that effect. To deal with this issue, the authors define a notion of revision for the theory of evidence which brings together probabilistic and logical views. Several previously proposed revision rules are reviewed, and one of them is advocated as better corresponding to the idea of revision; it is extended to cope with inconsistency between prior and input information, and it reduces to Dempster's rule of combination, just as revision in the sense of Alchourron, Gardenfors and Makinson (AGM) reduces to expansion, when the input is strongly consistent with the prior belief function. The properties of this revision rule are also investigated, and it is shown to generalise Jeffrey's rule of updating, Dempster's rule of conditioning and a form of AGM revision.
In [1342], after introducing an interpretation of the mass function, it is shown that, given two basic probability assignments, Dempster's rule of combination does not build a coherent BPA with respect to that interpretation; a new combination function overcoming this problem is then given, and some of its properties are studied.
A new rule of combination was also proposed in [1323]; the efficiency and validity of the approach were demonstrated via numerical examples and comparisons with other existing methods.
The article [1341] presents an alternative combination method which is capable of handling inconsistent evidence, and relates evidence focusing to the amount of information carried by each piece of evidence.
In [1528], a combination of weighted belief functions is proposed to cope with the counter-intuitive behaviour of Dempster-Shafer theory under highly conflicting evidence. When many pieces of evidence are to be combined, the amount of conflict between them is first evaluated using both an evidence distance and the conflicting belief mass, and each piece of evidence is given a weight coefficient according to its amount of conflict with the others; two different methods are then used to modify the belief function of each piece of evidence on the basis of its weight coefficient, and the modified functions are finally combined by Dempster's rule.
The authors of [1055] note that Dempster's combination may be unusable when the conflict among the pieces of evidence is large or even total, and that Yager's modified combination methods have their own deficiencies. They propose a new combination method which introduces weight factors reflecting the importance of each piece of evidence and re-allots the conflicting mass accordingly, improving the rationality and reliability of the combination.
In [303], a new approach to the combination of belief functions, combination 'per elements', is introduced, with particular attention to the combination and redistribution of contradictory belief masses. Several instances of the method are compared, and the minC and maxC combinations are suggested as alternatives to Dempster's rule of combination.
The thesis [1385] puts forward an improved combination rule for Dempster-Shafer theory based on the reliability of the evidence and on the correlation between pieces of evidence. Two key parameters are introduced: the reliability of each piece of evidence, and the correlation coefficient between pieces of evidence. By weighting the evidence according to its reliability, the negative effect of unreliable evidence is reduced; by decreasing the probability assigned to certainty and increasing that assigned to uncertainty, the effect of correlated evidence on the fusion result is reduced as well, leading to better fusion results.
In [1492], an approach to the aggregation of non-independent belief structures is suggested which makes use of a weighted aggregation whose weights are related to the degree of dependence. The resulting aggregation is non-commutative: the fused value depends on the order in which the pieces of evidence are processed. The authors then consider how best to sequence the evidence, investigating the use of the information content of the fused value as a criterion for selecting the appropriate ordering of the belief structures.
Guan [557] presented a one-step method for combining evidence from different evidential sources, based on Yen's extension of Dempster-Shafer theory via conditional compatibility relations, and proved that it gives the same results as Yen's.
In [307], the principal ideas of the minC combination are recalled, and the mathematical structure of generalised frames of discernment is analysed and formalised. A generalised scheme for computing the minC combination is presented, the redistribution of conflicting belief masses among non-conflicting focal elements is reviewed, and general formulas for the computation of the minC combination are given, followed by some examples and a brief comparison of the minC combination with other combination rules.
Lefevre et al. [819] argue that, since the amount of conflict increases with the number of information sources, a strategy for re-assigning the conflicting mass is, they claim, essential. The family of combination rules they propose distributes the conflicting mass to each proposition A of a set of subsets P = {A} according to a weighting factor w(A, m), where m = {m1, ..., mj, ..., mJ}:

mc(A) = w(A, m) · m∩(∅) if A ∈ P,   mc(A) = 0 otherwise,

under the constraint that the weights are normalised: Σ_{A∈P} w(A, m) = 1.
This (weighted) family subsumes Smets' and Yager's rules, which correspond to P = {∅} and P = {Θ}, respectively. Dempster's rule is recovered for P = 2^Θ \ {∅} with weights

w(A, m) = m∩(A) / (1 − m∩(∅)),   ∀A ∈ 2^Θ \ {∅}.
Dubois and Prade’s operator can also be obtained by appropriately computing the
weight factors.
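A sketch of this parametrised family is given below, showing how Yager's and Dempster's rules are recovered for particular choices of P and w(A, m); the frame and masses are illustrative:

from itertools import product

THETA = frozenset({'a', 'b', 'c'})

def conjunctive(m1, m2):
    out = {}
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        out[A & B] = out.get(A & B, 0.0) + x * y
    return out

def weighted_redistribution(m1, m2, weights):
    """Lefevre-style family: the conflicting mass m_conj(empty) is redistributed over
    the propositions in P = weights.keys() according to normalised weights w(A, m)."""
    m = conjunctive(m1, m2)
    k = m.pop(frozenset(), 0.0)
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    for A, w in weights.items():
        m[A] = m.get(A, 0.0) + w * k
    return m

m1 = {frozenset({'a'}): 0.7, THETA: 0.3}
m2 = {frozenset({'b'}): 0.6, THETA: 0.4}

m_conj = conjunctive(m1, m2)
k = m_conj.pop(frozenset(), 0.0)

# Yager's rule: P = {Theta}, all the conflicting mass goes to the frame
yager = weighted_redistribution(m1, m2, {THETA: 1.0})
# Dempster's rule: P = 2^Theta \ {empty}, with w(A, m) = m_conj(A) / (1 - m_conj(empty))
dempster = weighted_redistribution(m1, m2, {A: v / (1.0 - k) for A, v in m_conj.items()})
print(yager, dempster, sep='\n')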
In addition, the authors proposed in [819] to learn the most appropriate weights for
a specific problem.
Other authors have proposed similar conflict redistribution strategies.
A similar idea is indeed presented in [842], where the conflicting mass is distributed
to every proposition according to its average supported degree.
In [351] a global conflict is first calculated as the weighted average of the local con-
flict. Then, a validity coefficient is defined to show the effect of conflicting evidence
on the results of the combination.
Han et al [593] proposed in 2008 a modified combination rule which is based on
Ambiguity Measure (AM), a recently proposed uncertainty measure for belief func-
tions. Weight factors based on the AM of the bodies of evidence are used to reallo-
cate conflicting mass assignments.
Haenni (2002) [570] criticised Lefevre’s proposal of a parametrised combination
rule.
Lefevre et al further replied to Haenni in [822], from the point of view of the
Transferable Belief Model, as opposed to the probabilistic argumentation systems
(PAS), proposed by Haenni.
Denoeux’s families induced by t-norms and conorms Cautious and bold rules
are shown [377] to be particular members of infinite families of conjunctive and
disjunctive combination rules, based on triangular norms and conorms.
We recall that a t-norm is a commutative and associative binary operator ⊤ on the unit interval satisfying the monotonicity property (x ⊤ y ≤ x ⊤ z whenever y ≤ z) and the boundary condition x ⊤ 1 = x, ∀x ∈ [0, 1]. A t-conorm ⊥ meets the same three basic properties (commutativity, associativity, monotonicity), and differs only by the boundary condition x ⊥ 0 = x. T-norms and t-conorms are usually interpreted, respectively, as generalized conjunction and disjunction operators in fuzzy logic. Denoeux notes that the conjunctive combination is such that:
w^c_{1∩2}(A) = w^c_1(A) · w^c_2(A),   (4.12)

where w^c_i(A) = 1 ∧ w_i(A). New rules for combining nondogmatic belief functions can then be defined by replacing the minimum ∧ in (4.12) by a positive t-norm6:

m1 ⊛_{⊤,⊥} m2 = ∩_{A⊂Θ} m_A^{w1(A) ∗_{⊤,⊥} w2(A)},   (4.13)
Others Denneberg [354] studied three conditioning rules for updating non-additive measures; two of these update rules, Bayes' and Dempster-Shafer's, are extreme cases of a family of update rules [518].
In [1524], the authors introduce a family of update rules more general than that of Gilboa and Schmeidler, and show how to embed the general and Dempster-Shafer update formulas in another family of update rules.
In [1001], distances between fusion operators are measured using a class of random belief functions; by means of a similarity analysis, the structure of this family of operators is extracted, for two and three information sources. The conjunctive operator (quick and associative, but very isolated on a large discernment space) and the arithmetic mean are identified as outliers, while the hybrid method and six proportional conflict-redistributing (PCR) rules form a continuum, with the hybrid method shown to be central to the family of fusion methods.
Rather than introducing yet another rule, the authors of [721] propose to use existing ones as part of a hierarchical and conditional combination scheme. The sources are represented by mass functions which are analysed and labelled with respect to unreliability and imprecision; this conditional step divides the problem into specific sub-problems, in each of which the number of constraints is reduced and an appropriate rule is selected and applied. Two functions are thus obtained and analysed, allowing another rule to be chosen for a second (and final) fusion level. This approach provides a fast and robust way of combining disrupted sources using the contextual information brought in by a particle filter.
In [150], on the basis of random set theory, a unified formulation of combination rules is presented which can describe most classical combination rules besides Dempster's, and which attempts to provide original ideas for constructing more applicable and effective combination rules. Using this formulation, a new combination rule is constructed to overcome a class of counterintuitive phenomena pointed out by Lotfi Zadeh.
As we learned from the debate on Dempster's rule (Section ??), most criticisms of the original combination operator focus on its behaviour in situations in which the various pieces of evidence are highly conflicting.
Several solutions have been proposed: the TBM solution, in which masses are not renormalised and the conflict is stored in the mass assigned to the empty set; Yager's solution [], in which the conflict is transferred to the whole frame; and Dubois and Prade's solution [], in which the masses resulting from pairs of conflicting focal elements are transferred to the union of these subsets.
The jungle of combination rules proposed as a result of the conflict problem was discussed by Smets (2007) [1260]. There he examined the nature of such combinations (conjunctive versus disjunctive, revision versus updating, static versus dynamic data fusion), argued in favour of normalization, examined the possible origins of the conflict, asked whether a combination is justified, and analyzed many of the proposed solutions.
The most relevant work on the issue of conflict is probably [974]. There, Murphy (2000) presented the problem of failing to balance multiple pieces of evidence, illustrated the proposed solutions and described their limitations. Of the proposed methods, averaging best solves the normalization problem, but it offers neither convergence towards certainty nor a probabilistic basis; to achieve convergence, Murphy suggested incorporating the average belief into the combination rule.
Liu (2006) [874] provided a formal definition of when two basic belief assignments are in conflict, using both a quantitative measure of the mass assigned by the combined belief to the empty set before normalization and the distance between the betting commitments of the two beliefs. She argued that only when both measures are high is it safe to say that the evidence is in conflict.
In [736], a new definition of consistency is introduced and applied to the theory of evidence.
The authors of [914] propose alternative measures of conflict based on the distance between belief functions. These measures are further used for an a-posteriori estimation of the relative reliability of the sources of information which does not need any training or prior knowledge.
In [821], combination operators are proposed which allow an arbitrary redistribution of the conflicting mass over the propositions.
In [315, 305], alternative ways of distributing the contradiction among the non-empty subsets of the frame of discernment are studied. A new approach to understanding contradictions is employed, and an original notion of potential contradiction is introduced; a method for the associative combination of generalised belief functions, the minC combination, is presented together with its derivation, as part of the new approach.
In [958] the idea is that each piece of evidence is discounted in proportion to
the degree that it contributes to the conflict. Discounting is performed in a sequence
of incremental steps, with conflict updated at each step, until the overall conflict is
brought down exactly to a predefined acceptable level.
A related line of work concerns the fusion of intelligence reports. The methods proposed there are used when intelligence reports arrive which concern different events that should be handled independently, and it is not known a priori to which event each report relates. Clustering is run as a back-end process to partition the intelligence into subsets representing the events while, in parallel, a fast classification runs as a front-end process in order to route each newly arriving report to the correct information fusion process.
The contribution [316] deals with conflicts of belief functions. Internal conflicts of belief functions and conflicts between belief functions are described and analysed, and differences between belief functions are distinguished from conflicts between them. Three new approaches to conflict are presented (combinational, plausibility and comparative), and compared with Liu's interpretation of conflict.
In [1446], a method was developed for dealing with seriously conflicting evidence in cases where the result of Dempster-Shafer combination cannot identify the actual conditions. The method exploits the advantages of Dempster-Shafer theory together with an additive strategy: the conflicting evidence is first verified, and the additive strategy is then used to modify its properties.
In [915], it is argued that the mass assigned to the empty set by the conjunctive combination rule, generally regarded as conflict, is not really a measure of conflict. The authors recall some of the conflict measures that have been proposed, show counter-intuitive examples for them, and define a new conflict measure based on a set of expected properties, built from a distance-based conflict measure weighted by a degree of inclusion introduced in the same paper.
In [379], it is noted that the problem of measuring the conflict between two bodies of evidence represented by belief functions has recently attracted renewed interest, and that in most related works Dempster's rule plays a central role. The authors propose to study the notion of conflict from a different perspective: they start by examining consistency and conflict on sets, extract from this setting the basic properties that measures of consistency and conflict should have, and then extend this basic scheme to belief functions in different ways. In particular, no a priori assumption is made about the (in)dependence of the sources; such assumptions are only considered as possible additional information.
In [872], it is recalled that two new approaches to measuring the conflict among belief functions were proposed in [JGB01, Liu06]: the former provides a distance-based method to quantify how close a pair of beliefs is, while the latter deploys a pair of values to reveal the degree of conflict between two belief functions. In possibility theory, on the other hand, this is done by measuring the degree of inconsistency of the merged information; however, this measure is not sufficient when pairs of uncertain pieces of information have the same degree of inconsistency, and no other alternatives exist that can further differentiate them, except an initiative based on coherence intervals ([HL05a, HL05b]). The authors investigate how the two new approaches developed in Dempster-Shafer theory can be used to measure the conflict among possibilistic uncertain information.
In [678], the limitations of the combination rule introduced by Zhang are first brought out. The authors then focus on two other rules: the disjunctive rule of combination proposed by Dubois and Prade (which, incidentally, also appeared in Hau and Kashyap's work) and Yager's rule. Even though these rules are robust, it is shown that in some cases they treat the evidence asymmetrically and give counter-intuitive results; a combination rule which does not have these drawbacks is then proposed.
The work [497] presents an improved Dempster-Shafer algorithm, which verifies and modifies the conflicting evidence.
In [824], it is recalled that the conjunctive combination enjoys interesting properties, such as commutativity and associativity; however, it is characterised by having the empty set (also called the conflict) as an absorbing element. When a significant number of conjunctive combinations are applied, the mass assigned to the conflict tends to 1, making it impossible to distinguish between genuine problems arising during the fusion and the effect of the absorbing power of the empty set. The objective of the paper is then to define a formalism which preserves the initial role of the conflict as an alarm signal announcing some disagreement between the sources. More precisely, it allows one to preserve, after the fusion, only the part of the conflict which reflects the opposition between the belief functions; the approach is based on dissimilarity measures and on a normalisation process between belief functions.
In [382], the authors propose to revisit conflict from a different perspective: they make no a priori assumption about dependencies, and start from the definition of conflicting sets, studying its possible extensions to the framework of belief functions.
In [633], it is argued that defining new conflict coefficients to determine the degree of conflict between two or more pieces of evidence is very important. Evidential sources of information are considered, and the notion of a conflict measure function (CMF) is proposed for selecting useful CMFs in subsequent fusion steps, when sources are available at each time instant: the definition and basic theorems of CMFs are first put forward, and some typical CMFs are then extended and new ones introduced.
The paper [388] compares the expressions obtained from the analysis of a problem involving conflicting evidence when using Dempster's rule of combination and when using conditional probabilities. Several results are obtained showing if and when the two methodologies produce the same results; the role played by the normalising constant is shown to be tied to the prior probability of the hypothesis if equality is to occur, forcing further relationships between the conditional probabilities and the prior. Ways of incorporating prior information into the belief function framework are explored and the results analysed, and a new method for combining conflicting evidence is finally proposed.
In [458], a method is developed for evaluating the reliability of a sensor considered alone, based on finding the discounting factor which minimises the distance between the pignistic probabilities computed from the discounted beliefs and the actual values of the data. A second method assesses the reliability of several sensors which are supposed to work jointly and whose readings are aggregated: in this case, the discounting factors are computed by minimising the distance between the pignistic probabilities computed from the combined discounted belief functions and the actual data values.
In [939], an extension of the discounting operation is proposed, which allows one to use more detailed information regarding the reliability of the source in different contexts, i.e., conditionally on different hypotheses about the variable of interest.
The authors of [1041] make the point that pieces of evidence may have different reliabilities: by weighting each piece of evidence according to its reliability, the effect of unreliable evidence is reduced.
The work [936] presents an objective way of assessing the reliability of a sensor or of an expert expressing its opinion by means of a belief function. Using contextual discounting, labelled data and an error function, the authors generalise an approach proposed by Elouedi, Mellouli and Smets (2004).
The paper [1556] extends a conventional discounting scheme, commonly used in Dempster-Shafer evidential reasoning, to deal with conflict.
In [935], a new interpretation is presented of the de-discounting operation, introduced by Denoeux and Smets as the inverse of the discounting operation. A more general form of reinforcement process, as well as a parameterised family of transformations encompassing all previous schemes, is also introduced.
In [332] the authors propose to estimate discounting factors from the conflict
arising between sources, and from past knowledge about the qualities of these
sources. Under the assumption that conflict is generated by defective sources, an
algorithm is proposed for detecting them and mitigating the problem.
The paper [555] discusses two of the basic operations on evidential functions, the discounting operation and the well-known orthogonal sum. The authors show that discounting does not commute with the orthogonal sum, and derive expressions for the two operations applied to the various evidential functions.
In [913], it is argued that, in a belief function application, the definition of the basic belief assignments and the tasks of reducing the number of focal elements, discounting, combination and decision must be thought of at the same time; these tasks can be seen as a general process of belief transfer. A second aspect of the paper is the introduction of reliability directly into the combination rule, rather than beforehand: whereas, in general, discounting is performed with a discounting factor which acts as a reliability factor for the sources, the authors propose to include in the combination rule an estimate of the reliability based on a local conflict estimation.
The paper [490] investigates the conjunctive combination of belief functions from dependent sources based on the cautious conjunctive rule (CCR). The weight functions in the canonical decomposition of a belief function are divided into two parts, positive and negative weight functions, whose characteristics are described. The positive and negative weight functions of two belief functions are used to construct a new partial ordering between them, which determines the committed relationship between the two belief functions; this differs from the ordering generated by the weight-based partial ordering of the CCR when one or both belief functions are not unnormalised separable. A new rule is then developed which uses the constructed partial ordering to combine belief functions from dependent sources.
4.3 Conditioning
We will first review the most significant contributions to the definition of conditional belief functions; then, we will review the work done on the generalisation of the law of total probability (also called 'Jeffrey's rule') to belief functions; finally, we will summarise the basics of Smets' Generalised Bayes Theorem (GBT).
Lower and upper conditional envelopes Fagin and Halpern [464] proposed an approach based on the credal (robust Bayesian) interpretation of belief functions as lower envelopes of a family of probability distributions:

P[Bel] = { P : P(A) ≥ Bel(A) ∀A ⊆ Ω }.

They defined the conditional belief function associated with Bel as the lower envelope (that is, the infimum) of the family of conditional probability functions P(A|B), where P is consistent with Bel:

Bel(A|B) ≐ inf_{P∈P[Bel]} P(A|B),   Pl(A|B) ≐ sup_{P∈P[Bel]} P(A|B).
Note that the lower/upper envelopes of arbitrary sets of probabilities are not in general belief functions, but these actually are, as Fagin and Halpern have proven. A direct comparison shows that they are quite different from the results of Dempster's conditioning:

Bel⊕(A|B) = (Bel(A ∪ B̄) − Bel(B̄)) / (1 − Bel(B̄)),   Pl⊕(A|B) = Pl(A ∩ B) / Pl(B).

Geometric conditioning, instead, normalises the belief value of A ∩ B:

BelG(A|B) = Bel(A ∩ B) / Bel(B).
Smets' unnormalised conditioning in the TBM, finally, does not normalise the resulting mass (remember that, in the TBM, belief functions which assign mass to ∅ can exist, under the 'open world' assumption). In terms of plausibilities, the rule reads PlU(A|B) = Pl(A ∩ B): in the TBM, the mass m(A) is transferred by conditioning on B to A ∩ B. BelU(·|B) is also the minimal-commitment specialisation of Bel such that Pl(B^c|B) = 0 [718].
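The difference between Dempster's and the credal conditioning can be appreciated on a toy example. The sketch below uses Dempster's conditioning as given above, and the closed forms Bel(A||B) = Bel(A∩B)/(Bel(A∩B)+Pl(Ā∩B)) and Pl(A||B) = Pl(A∩B)/(Pl(A∩B)+Bel(Ā∩B)) for the lower and upper envelopes, a standard result for the Fagin-Halpern conditioning; the frame and masses are illustrative:

def bel(m, A):
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

def dempster_cond(m, A, B):
    """Dempster conditioning: Pl(A|B) = Pl(A n B)/Pl(B), Bel obtained by duality."""
    theta = frozenset().union(*m)
    return 1 - pl(m, (theta - A) & B) / pl(m, B), pl(m, A & B) / pl(m, B)

def credal_cond(m, A, B):
    """Fagin-Halpern lower/upper conditional envelopes (standard closed forms)."""
    theta = frozenset().union(*m)
    nA = theta - A
    lower = bel(m, A & B) / (bel(m, A & B) + pl(m, nA & B))
    upper = pl(m, A & B) / (pl(m, A & B) + bel(m, nA & B))
    return lower, upper

m = {frozenset({'x'}): 0.3, frozenset({'x', 'y'}): 0.4, frozenset({'y', 'z'}): 0.3}
A, B = frozenset({'x'}), frozenset({'x', 'y'})
print('Dempster:', dempster_cond(m, A, B))   # (0.3, 0.7)
print('Credal  :', credal_cond(m, A, B))     # (0.3, 1.0): a wider, less committed interval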
Conditional events as equivalence classes On the other hand, Spies [1288] established a link between conditional events and discrete random sets. Conditional events were defined as sets of equivalent events under the conditioning relation; by applying a multivalued mapping to them (see Section ??), he gave a new definition of conditional belief functions. This yields an intriguing approach to conditioning which lies within the random-set interpretation. Finally, an updating rule (which is equivalent to the law of total probability when all beliefs are probabilities) was introduced.
Namely, let (C, F, P) and Γ : C → 2^Ω be the source probability space and the random set, respectively. The null sets for P(·|A) are the collection of events with zero conditional probability: N(P(·|A)) = {B ∈ A : P(B|A) = 0}. Let △ denote the symmetric difference A△B = (A ∩ B̄) ∪ (Ā ∩ B) of two sets A, B. One can prove that:

Lemma 1. If ∃Z ∈ A s.t. B, C ∈ Z △ N(P(·|A)), then P(B|A) = P(C|A).

Roughly speaking, two events have the same conditional probability whenever both are the symmetric difference of the same event and some null set. We can then define conditional events as the following equivalence classes:
7 Author's notation.
Definition 40. A conditional event [B|A], with A, B ⊆ Ω, is the set of events with the same conditional probability P(B|A).

plX(x1|y) / plX(x2|y) = plX(x1) / plX(x2),   BelX(x1|y) / BelX(x2|y) = BelX(x1) / BelX(x2).
Generalised Likelihood Principle The Likelihood Principle requires the likeli-
hood of an hypothesis given the data to be equal to the conditional probability of the
data given the hypothesis. Namely:
Note that the form of the function is not assumed (it is not necessarily the max, as in the original likelihood principle). Both pl(x|θi) and pl(x̄|θi) are necessary, according to Smets, to account for the non-additivity of belief functions.
The GLP is justified by the two following requirements: (1) pl(x|θ) remains the same on the coarsening of X formed by just x and x̄, and (2) the plausibilities of the θj ∉ θ are irrelevant to the computation of pl(x|θ).
Then, condition (1) of the GLP, plΘ(θ|x) = plX(x|θ), directly yields Smets' Generalised Bayes Theorem:

PlΘ(θ|x) = (1/K) [ 1 − ∏_{θi∈θ} (1 − plX(x|θi)) ],

BelΘ(θ|x) = (1/K) [ ∏_{θi∈θ̄} BelX(x̄|θi) − ∏_{θi∈Θ} BelX(x̄|θi) ],

where K = 1 − ∏_{θi∈Θ} (1 − plX(x|θi)). Formulas for unnormalised belief functions
are also provided.
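The GBT is easy to evaluate once the conditional plausibilities plX(x|θi) are given. The sketch below assumes, for illustration, that X is coarsened to {x, x̄}, so that BelX(x̄|θi) = 1 − plX(x|θi); the numerical values are arbitrary:

from math import prod

def gbt(pl_x_given, A):
    """Smets' Generalised Bayes Theorem: posterior Bel/Pl of a subset A of Theta
    after observing x, from the conditional plausibilities pl_X(x | theta_i)."""
    theta = set(pl_x_given)
    K = 1.0 - prod(1.0 - pl_x_given[t] for t in theta)
    pl_A = (1.0 - prod(1.0 - pl_x_given[t] for t in A)) / K
    bel_A = (prod(1.0 - pl_x_given[t] for t in theta - set(A))
             - prod(1.0 - pl_x_given[t] for t in theta)) / K
    return bel_A, pl_A

# illustrative conditional plausibilities of the observation under each hypothesis
pl_x_given = {'theta1': 0.9, 'theta2': 0.4, 'theta3': 0.1}
print(gbt(pl_x_given, {'theta1'}))            # posterior interval for {theta1}
print(gbt(pl_x_given, {'theta1', 'theta2'}))  # posterior interval for {theta1, theta2}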
It can be proven that there is a unique solution to the above problem, given by Jeffrey's rule:

P''(A) = Σ_{B∈B} P(A|B) P'(B).

The most compelling interpretation of such a scenario is that the initial probability measure stands corrected by the second one on a number of events (but not all). The law of total probability therefore generalises standard conditioning, which is just the special case in which P'(B) = 1 for some B and the sub-algebra B reduces to the single event B.
Spies' solution Spies has proven the existence of a solution to the generalisation of Jeffrey's rule to belief functions, within his conditioning framework (Section 4.3.1). The problem generalises as follows.
Let Π = {B1, ..., Bn} be a disjoint partition of Ω, and suppose that:
– m1, ..., mn are the mass assignments of a collection of conditional belief functions Bel1, ..., Beln on B1, ..., Bn, respectively;
– mB is the mass of an unconditional belief function BelB on the coarsening associated with the partition Π.
Then:
Proposition 24. The belief function Beltot : 2^Ω → [0, 1] with

Beltot(A) = Σ_{C⊆A} ( mB ⊕ m_{B1} ⊕ ... ⊕ m_{Bn} )(C)
is a marginal belief function on Ω, such that Beltot (.|Bi ) = Beli ∀i and the
marginalisation of Beltot to the partition Π coincides with BelB . Furthermore,
if all the belief functions involved are probabilities Beltot reduces to the result of
Jeffrey’s rule of total probability.
The bottom line of Proposition 24 is that, by combining the a-priori mass with all the conditionals, we obtain an admissible marginal which generalises the total probability. Whether this is the only admissible solution to the problem will be discussed in Section ??.
Alternative generalisations of Jeffrey's rule of the following form have also been proposed:

mJG(A) = [ m(A) / Σ_{X∈B(A)} m(X) ] · m'(B(A)),   ∀A ∈ A s.t. Σ_{X∈B(A)} m(X) ≠ 0;

mJD(A) = [ m(A|B(A)) / Σ_{X∈B(A)} m(X|B(A)) ] · m'(B(A)),   ∀A ∈ A s.t. Σ_{X∈B(A)} m(X|B(A)) ≠ 0.
Evidential reasoning using neural networks In [1377], a method is presented for using a neural network to model the learning of evidential reasoning. The belief function associated with a piece of evidence is represented as a probability density function, which can be in continuous or discrete form; the neurons are arranged in a roof-structured network which accepts the quantised belief functions as inputs, while the mutual dependency between two pieces of evidence is used as a further input to the network. This framework can resolve the conflicts resulting either from the mutual dependency among many pieces of evidence or from the structural dependency due to the evidence combination order. Belief conjunction based on the proposed method is presented, followed by an example demonstrating the advantages of the method.
Probability and possibility transforms reduce the number of focal elements to store
to O(N ) by re-distributing the mass assignment of a belief function to size-1 sub-
sets or chains of subsets, respectively. An alternative approach to efficiency can be
sought by re-distributing all the mass to subsets of size up to k, obtaining a k-additive
belief function.
Some approaches to probability transformation were explicitly aimed at reducing the complexity of belief calculus. Tessem [1335], for instance, incorporated only the highest-valued focal elements in his mklx approximation; a similar approach inspired the summarization technique formulated by Lowrance et al. [896].
One approach to efficient belief calculus that has been explored since the late Eighties consists indeed in approximating belief functions by means of appropriate probability measures prior to combining them for making decisions. This is known as the probabilistic transformation problem [e.g., Cobb03]. A number of distinct transformations can be, and have been, introduced, ranging from Voorbraak's plausibility transform and Smets' pignistic transform [Smets05] to more recent proposals by Daniel, Sudano and others []. Different approximations appear to be aimed at different goals, besides that of reducing computational complexity.
In [69], Mathias Bauer reviewed a number of approximation algorithms and described an empirical study of the appropriateness of these procedures in decision-making situations.
Probability transformation The relation between belief and probability in the the-
ory of evidence has been an important subject of study. Given a frame of discern-
ment Θ, let us denote by B the set of all belief functions on Θ, and by P the set of
all probability measures on Θ.
According to [311], we call a probability transform of belief functions an operator pt : B → P, b ↦ pt[b], mapping belief measures onto probability measures in such a way that b(x) ≤ pt[b](x) ≤ plb(x) = 1 − b({x}^c). Note that such a definition only requires the probability resulting from the transform to be compatible with the upper and lower bounds that the original b.f. b enforces on the singletons, and not on all the focal sets as in Equation (3.10). This is a minimal, sensible constraint which does not require probability transforms to adhere to the upper-lower probability semantics of belief functions; as a matter of fact, important such transforms do not.
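For concreteness, here is a minimal sketch of two classical transforms mentioned above: the pignistic transform, BetP(x) = Σ_{A∋x} m(A)/|A|, and Voorbraak's plausibility transform (the normalised plausibility of the singletons). The mass function used is illustrative:

def pignistic(m):
    """Smets' pignistic transform: BetP(x) = sum over focal sets A containing x of m(A)/|A|
    (assuming a normalised mass function, i.e. m(empty) = 0)."""
    theta = frozenset().union(*m)
    return {x: sum(v / len(A) for A, v in m.items() if x in A) for x in theta}

def plausibility_transform(m):
    """Voorbraak's plausibility transform: normalised plausibility of the singletons."""
    theta = frozenset().union(*m)
    pl = {x: sum(v for A, v in m.items() if x in A) for x in theta}
    total = sum(pl.values())
    return {x: p / total for x, p in pl.items()}

m = {frozenset({'a'}): 0.4, frozenset({'a', 'b'}): 0.3, frozenset({'a', 'b', 'c'}): 0.3}
print(pignistic(m))               # {'a': 0.65, 'b': 0.25, 'c': 0.1}
print(plausibility_transform(m))  # normalised singleton plausibilities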
More recently, other proposals have been brought forward by Dezert et al. [391], Burger [?] and Sudano [1314], based on redistribution processes similar to that of the pignistic transform. In addition, two new Bayesian approximations of belief functions have been derived by the author from purely geometric considerations [267], in the context of the geometric approach to the ToE [244], in which belief and probability measures are represented as points of a Cartesian space.
called weak inclusion. It is then possible to introduce the notion of outer consonant
approximations [?] of a belief function b, i.e. those co.b.f.s such that ∀A ⊆ Θ
co(A) ≤ b(A) (or equivalently ∀A ⊆ Θ plco (A) ≥ plb (A)). In other words we seek
co.b.f.s which are less informative than b in the sense specified above.
Outer consonant approximations and their geometry are studied in detail in Chapter
??.
A completely different consonant transformation is proposed within Smets’
Transferable Belief Model [Dubois, Aregui].
Definition 42. The isopignistic approximation of a belief function Bel : 2^Ω → [0, 1] is the unique consonant belief function whose pignistic probability BetP : Ω → [0, 1] coincides with that of Bel.
Its contour function (the plausibility of the singletons) is:
$$ pl_{iso}(x) = \sum_{x' \in \Omega} \min\big\{ BetP(x), BetP(x') \big\}, $$
in which codes are randomly sampled from the “source” probability space, and
the number of times their image implies A ⊆ Ω is counted to provide an estimator
for the desired combination.
The proportion of trials which succeed converges to Bel(A): $E[\bar{T}] = Bel(A)$, $Var[\bar{T}] \leq \frac{1}{4N}$. We say that the algorithm has accuracy k if $3\sigma[\bar{T}] \leq k$. Picking c ∈ C involves m random numbers, so it takes time A · m for some constant A; testing whether $x_j \in \Gamma(c)$ takes less than B · m for some constant B. Therefore, the expected time of the algorithm is
$$ \frac{N}{1-\kappa}\, m\, (A + B|\Omega|), $$
where κ is Shafer's conflict measure (??). The expected time needed to achieve accuracy k turns out to be
$$ \frac{9}{4(1-\kappa)k^2}\, m\, (A + C|\Omega|) $$
for some constant C, and is better in the case of simple support functions.
In conclusion, unless κ is close to 1 (highly conflicting evidence), Dempster's combination is feasible for large values of m (the number of belief functions to combine) and large cardinality of the hypothesis space Ω.
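A minimal sketch of this kind of trial-based estimator, assuming mass functions stored as dictionaries mapping frozensets to masses; the rejection of conflicting codes plays the role of Dempster's normalisation, and all names are illustrative.

```python
import random

def mc_dempster_bel(mass_functions, A, n_trials=10000, seed=0):
    """Monte-Carlo estimate of Bel(A) for the Dempster combination of
    several mass functions (each a dict frozenset -> mass). Trials whose
    sampled focal elements have an empty intersection are rejected."""
    rng = random.Random(seed)
    A = frozenset(A)
    successes, accepted = 0, 0
    for _ in range(n_trials):
        inter = None
        for m in mass_functions:
            focal = rng.choices(list(m.keys()), weights=list(m.values()))[0]
            inter = focal if inter is None else inter & focal
        if inter:                      # non-empty intersection: trial accepted
            accepted += 1
            if inter <= A:             # intersection implies A: trial succeeds
                successes += 1
    return successes / accepted if accepted else float('nan')

# Two simple support functions on {x, y, z}
m1 = {frozenset('xy'): 0.7, frozenset('xyz'): 0.3}
m2 = {frozenset('yz'): 0.6, frozenset('xyz'): 0.4}
print(mc_dempster_bel([m1, m2], 'yz'))
```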
An improved version of the algorithm was proposed by Wilson and Moral for the case in which trials are not independent but form a Markov chain (Markov chain Monte Carlo). This is based on a non-deterministic operator $OPERATION_i$ which changes at most the i-th coordinate $c_0(i)$ of a code $c_0$ to y, with chance $P_i(y)$:
where $BEL^N_K(c_0)$ is the output of Algorithm 4.4.3.
A further step based on importance sampling, in which we pick samples $c_1, ..., c_N$ according to an 'easy to handle' probability distribution P*, was later proposed. Each sample is assigned a weight $w_i = P(c_i)/P^*(c_i)$. If P(c) > 0 implies P*(c) > 0, then the average
$$ \frac{1}{N} \sum_{i : \Gamma(c_i) \subseteq X} w_i $$
is an unbiased estimator of Bel(X). Obviously, we want to use a P* as close as possible to the real distribution. In [] strategies are proposed to compute $P(C) = \sum_c P(c)$.
Resconi et al. achieved a speed-up of the Monte-Carlo method by using a physical model of the belief measure as defined in the ToE. Conversely, in [786] Kramosil adapted the Monte-Carlo estimation method to belief functions.
A similar result holds when the belief functions to combine are dichotomous on elements of a coarsening of Ω.
The computation of a specific plausibility value Pl(A) is therefore linear in the size of Ω (as only elements of A, and not its subsets, are involved). However, the number of events A is still exponential - this is addressed by later authors.
Gordon and Shortliffe’s diagnostic trees Gordon and Shortliffe are interested
in computing degrees of belief only for events forming a hierarchy (diagnostic tree,
see Figure 4.3). This is motivated by the fact that in some applications certain events
are not relevant, e.g. certain classes of diseases in medical diagnosis. Their scheme
Fig. 4.3. An example of Gordon and Shortliffe’s diagnostic tree, from [].
combines simple support functions focused on or against the nodes of the tree, and
Shafer and Logan’s hierarchical evidence In response, Shafer and Logan pro-
posed an exact implementation of linear complexity for the combinition of hierar-
chical evidence of a more general type. Indeed, although evidence in their scheme
is still focussed on nodes of a tree, it produces degrees of belief for a wider collec-
tion of hypotheses, including the plausibility values of the node events. The scheme
operates on local families of hypotheses, formed by a node and its children.
Namely, suppose again that we have a dichotomous belief function $Bel_A$ for every non-terminal node A. Let ϑ be the set of non-terminal nodes; let $Bel_A^{\downarrow} = \oplus\{Bel_B : B < A\}$ (the vacuous b.f. when A is a terminal node); let $Bel_A^{\uparrow} = \oplus\{Bel_B : B \not< A, B \neq A\}$ (vacuous if A = Θ is the root node), and define:
$$ Bel_A^L = Bel_A \oplus Bel_A^{\downarrow}, \qquad Bel_A^U = Bel_A \oplus Bel_A^{\uparrow}. $$
The goal of the scheme is to compute $Bel_T = \oplus\{Bel_A : A \in \vartheta\}$ (note that this is equal to $Bel_A \oplus Bel_A^{\downarrow} \oplus Bel_A^{\uparrow}$ for any node A).
Barnett’s technique can be applied to (1) to further improve efficiency. The al-
gorithm starts from terminal nodes and works its way up the tree until we get:
L L
BelA (A), BelA (Ā) ∀A ∈ ϑ.
In stage 2 we start from the family whose parent is Ω and we work our way down the tree until we obtain:
$$ Bel_A^{\uparrow}(A),\ Bel_A^{\uparrow}(\bar{A}) \qquad \forall A \in \vartheta, $$
similarly for BelT (Ā). Throughout the algorithm, at each node A of the tree twelve
belief values need to be stored.
P ∩ P_1 ∩ ... ∩ P_n ≠ ∅,
i.e., ...
Qualitative Markov trees Given a tree, deleting a node i and all incident edges yields a forest. Let us denote the collection of nodes of the j-th resulting subtree by $\alpha_i(j)$.
Definition 44. A qualitative Markov tree (QMT) is a tree of partitions such that, for every node i, the minimal refinements of the partitions in $\alpha_i(j)$, j = 1, ..., k, are QCI given $\Psi_i$.
A Bayesian causal tree becomes a qualitative Markov tree whenever we asso-
ciate each node B with the partition ΨB associated with the random variable vB .
A QMT remains such if we insert between parent and child a node associated with
their common refinement. Qualitative Markov trees can also be constructed from
diagnostic trees (see ?? for an example extracted from []) - the same interpolation
property holds in this case as well.
Propagation Assume now that each belief function to combine is carried by a par-
tition (node) in a qualitative Markov tree. The bottom line of Shenoy and Shafer’s
propagation scheme is to replace Dempster’s combination over the whole frame Ω
with multiple implementations over the partitions associated with the nodes of a
QMT. In a message-passing style, a “processor” located at each node Ψi combines
belief functions using Ψi as a frame and projects b.f.s to its neighbours.
The operations performed by each processor node can be summarised as follows (see Figure 4.4, right):
1. it sends Bel_i to its neighbours;
2. whenever the node receives a new input, it computes ...;
3. it computes ...
Fig. 4.4. Left: a qualitative Markov tree constructed from a diagnostic tree. Right: graphical
representation of a local processor’s operations in the Shenoy-Shafer architecture.
Fast division architecture Markov trees and clique trees are the alternative representations of valuation networks and belief networks used by local computation techniques for efficient reasoning ([1452]). Bissig, Kohlas and Lehmann proposed an architecture called the Fast-Division architecture ([103]) for the computation of Dempster's rule, which has the advantage, with respect to the Shenoy-Shafer and Lauritzen-Spiegelhalter architectures, of guaranteeing that the intermediate results are themselves belief functions. Each of these architectures has a Markov tree as its underlying computational structure.
Cano's directed acyclic networks When the evidence is arranged in a complete directed acyclic graph, it is possible to formulate algorithms with lower computational complexity ([85]).
Shafer and Shenoy’s valuation networks
Ordered valuation algebras Haenni (2003) [576] brought forward a generic approach to approximate inference based on the concept of valuation algebras. Convenient resource-bounded anytime algorithms are presented, in which the maximal computation time is determined by the user.
networks (Cano et al); evidential networks with conditional belief functions (Xu
and Smets); a graphical representation of valuation-based systems (VBS), called
valuation networks (Shenoy); and Ben Yaghlane and Mellouli’s Directed Evidential
Networks.
Evidential networks with conditional belief functions In [1456], Xu and Smets used conditional belief functions (à la Dempster) to represent relations between variables in evidential networks, and presented a propagation algorithm for such networks. ENCs contain a directed acyclic graph with conditional beliefs defined in a different manner from conditional probabilities in Bayesian networks (BNs), as edges represent the existence of a conditional belief function, while no form of independence is assumed. Also, ENCs were initially defined only for binary (conditional) relationships.
Directed Evidential Networks Ben Yaghlane and Mellouli later generalised ENCs to any number of nodes, proposing their Directed Evidential Networks (DEVNs). These are directed acyclic graphs (DAGs) in which directed arcs describe the conditional dependence relations expressed by conditional BFs for each node given its parents. New observations introduced into the network are represented by belief functions allocated to some of the nodes.
Given n BFs Bel_1, ..., Bel_n over X_1, ..., X_n, the goal is to compute the marginal on X_i of their joint belief function. DEVNs use the generalised Bayesian theorem (GBT, see Section 4.3.2) to compute the posterior Bel(x|y) given the conditional Bel(y|x). The marginal is computed for each node by combining all the messages received from its neighbours with its own prior belief:
$$ Bel_X = Bel_X^0 \oplus Bel_{Y \to X}, \qquad Bel_{Y \to X}(x) = \sum_{y \subseteq \Theta_Y} m^0(y)\, Bel(x|y). $$
Let ⪰ be a preference relation on F, such that f ⪰ g means that f is at least as desirable as g. Savage (1954) showed that ⪰ satisfies a number of sensible rationality requirements iff there exist a probability measure P on Ω and a utility function u : X → R such that:
$$ \forall f, g \in F, \quad f \succeq g \;\Leftrightarrow\; E_P(u \circ f) \geq E_P(u \circ g), $$
where $E_P$ denotes the expectation w.r.t. P. Moreover, P is unique, and u is unique up to a positive affine transformation. Does such a result imply that basing decisions on belief functions is irrational?
The answer is no, and indeed several authors have proposed decision making
frameworks under belief function uncertainty based on (generalisations of) utility
theory.
Strat’s decision framework Perhaps the first one who noted the lack in Shafer’s
theory of belief functions of a formal procedure for making decision was Strat in
[1305]. He proposed a simple assumption that disambiguates decision problems rep-
resented as b.f.s, maintaining the separation between evidence carrying information
about the decision problem, and assumptions that has to be made to disambiguate
the choices. He also showed how to generalize the methodology for decision analy-
sis employed in probabilistic reasoning to the use of belief functions, allowing their
use within the framework of decision trees.
Strat’s decision apparatus is based on computing intervals of expected values,
and assumes that the decision frame Ω is itself a set of scalar values (e.g. dollar
values, see Figure 4.5). In other words, it does not distinguish between utilities and
elements of Ω (returns), so that an interval of expected values can be computed:
E(Ω) = [E∗ (Ω), E ∗ (Ω)], where
. X . X
E∗ (Ω) = inf(A)m(A), E ∗ (Ω) = sup(A)m(A).
A⊆Ω A⊆Ω
He argues that this is not good enough to make a decision - for instance, should we
pay a 6$ ticket when the expected interval is [5$, 8$]?
In response, Strat identifies the probability ρ that the value assigned to the hidden
sector is the one the player would choose (1 − ρ is the probability that the sector is
chosen by the carnival hawker). Then [1305]:
Proposition 25. The expected value of the mass function of the wheel is $E(\Omega) = E_*(\Omega) + \rho\,(E^*(\Omega) - E_*(\Omega))$.
To decide whether to play the game we only need to assess ρ. Basically, this amounts
to a specific probability transform (like the pignistic one) - Lesh, 1986 had also
proposed a similar approach.
Schubert [1121] subsequently studied the influence of the ρ parameter in Strat’s
decision apparatus.
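A minimal sketch of Strat's expected-value interval and of the interpolation in Proposition 25, under the assumption that focal elements are finite sets of scalar returns (so that inf and sup reduce to min and max); all names and the toy wheel are illustrative.

```python
def strat_expected_interval(m):
    """Lower and upper expected values of a mass function whose focal
    elements are sets of scalar returns (Strat's setting)."""
    lower = sum(mass * min(A) for A, mass in m.items())
    upper = sum(mass * max(A) for A, mass in m.items())
    return lower, upper

def strat_expected_value(m, rho):
    """Strat's point estimate: interpolate the interval with the probability
    rho that the hidden sector is the one the player would choose."""
    lower, upper = strat_expected_interval(m)
    return lower + rho * (upper - lower)

# Focal elements are sets of dollar returns of the wheel.
m = {frozenset({1, 5}): 0.4, frozenset({5, 8}): 0.4, frozenset({10}): 0.2}
print(strat_expected_interval(m), strat_expected_value(m, rho=0.5))
```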
Decision making in the TBM In the TBM, decision making is done by maximising
the expected utility of actions based on the pignistic transform. The set of possible
actions F and the set Ω of possible outcomes are distinct, and the utility function
is defined on F × Ω. In [] Smets proved the necessity of the pignistic transform by
maximizing the expected utility:
$$ E[u] = \sum_{\omega \in \Omega} u(f, \omega)\, Pign(\omega). $$
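A minimal sketch of TBM-style decision making: the pignistic transform shares each focal element's mass equally among its elements, and the expected utility of an action is computed with respect to it. The action names and utilities are invented for illustration.

```python
def pignistic(m):
    """Pignistic transform: each focal element's mass is shared equally
    among its singletons."""
    betp = {}
    for A, mass in m.items():
        for omega in A:
            betp[omega] = betp.get(omega, 0.0) + mass / len(A)
    return betp

def pignistic_expected_utility(m, utility, action):
    """Expected utility of `action` under the pignistic probability;
    `utility(action, outcome)` is a user-supplied function."""
    betp = pignistic(m)
    return sum(p * utility(action, omega) for omega, p in betp.items())

m = {frozenset('ab'): 0.6, frozenset('abc'): 0.4}
u = lambda f, w: {'a': 1.0, 'b': 0.5, 'c': 0.0}[w] if f == 'act1' else 0.4
print(pignistic_expected_utility(m, u, 'act1'),
      pignistic_expected_utility(m, u, 'act2'))
```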
Elouedi, Smets et al. ([460], [459]) adapted the decision tree technique to the
presence of uncertainty about the class value, that is represented by a belief function.
A decision system based on the Transferable Belief Model was developed [1462]
and applied to a waste disposal problem by Xu et al.
Both Xu and Yang [1506] propose a decision calculus in the framework of val-
uation based systems [1459] and show that decision problems can be solved using
local computations.
A classical example of how Knightian uncertainty empirically affects human
decision making is provided by Ellsberg’s paradox [].
Given a belief function Bel on Ω and a utility function u, this theorem supports
making decisions based on the Choquet integral of u with respect to Bel or P l.
Upper and lower expected utilities For finite Ω, it can be shown that:
$$ C_{Bel}(u \circ f) = \sum_{B \subseteq \Omega} m(B) \min_{\omega \in B} u(f(\omega)), \qquad C_{Pl}(u \circ f) = \sum_{B \subseteq \Omega} m(B) \max_{\omega \in B} u(f(\omega)). $$
Let P(Bel) as usual be the set of probability measures P compatible with Bel, i.e., such that Bel ≤ P. Then, it follows that:
$$ \underline{E}(u \circ f) \doteq \inf_{P \in \mathcal{P}(Bel)} E_P(u \circ f) = C_{Bel}(u \circ f), \qquad \overline{E}(u \circ f) \doteq \sup_{P \in \mathcal{P}(Bel)} E_P(u \circ f) = C_{Pl}(u \circ f). $$
Decision criteria For each act f we have two expected utilities $\underline{E}(f)$ and $\overline{E}(f)$. How do we make a decision? Various decision criteria can be formulated by comparing these intervals (a computational sketch is given after the list):
1. f ⪰ g iff $\underline{E}(u \circ f) \geq \overline{E}(u \circ g)$ (conservative strategy, i.e., interval dominance);
2. f ⪰ g iff $\underline{E}(u \circ f) \geq \underline{E}(u \circ g)$ (pessimistic strategy);
3. f ⪰ g iff $\overline{E}(u \circ f) \geq \overline{E}(u \circ g)$ (optimistic strategy);
4. f ⪰ g iff
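A minimal sketch of the lower and upper (Choquet) expected utilities and of the conservative criterion, assuming a finite frame and dictionary-based mass functions; the example acts and utilities are invented.

```python
def lower_upper_expected_utility(m, u_of_outcome):
    """Choquet lower and upper expected utilities of an act, given the
    utilities u_of_outcome[omega] it yields in each outcome."""
    lower = sum(mass * min(u_of_outcome[w] for w in A) for A, mass in m.items())
    upper = sum(mass * max(u_of_outcome[w] for w in A) for A, mass in m.items())
    return lower, upper

def interval_dominates(m, u_f, u_g):
    """Conservative (interval dominance) criterion: f is preferred to g when
    the lower expectation of f is at least the upper expectation of g."""
    lo_f, _ = lower_upper_expected_utility(m, u_f)
    _, up_g = lower_upper_expected_utility(m, u_g)
    return lo_f >= up_g

m = {frozenset('ab'): 0.5, frozenset('bc'): 0.3, frozenset('abc'): 0.2}
u_f = {'a': 0.9, 'b': 0.8, 'c': 0.7}   # act f
u_g = {'a': 0.2, 'b': 0.3, 'c': 0.4}   # act g
print(lower_upper_expected_utility(m, u_f), interval_dominates(m, u_f, u_g))
```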
Expected utility interval decision rule In [10] a non-ad hoc decision rule based on
the expected utility interval is proposed. The authors study the effect of redistribut-
ing the confidence levels after getting rid of propositions to reduce computational
complexity. The eliminated confidence levels can in particular be assigned to igno-
rance, or uniformly added to the remaining propositions and to ignorance.
A work by Beynon et al. [92] explores the potential of the theory of evidence as an alternative approach to multicriteria decision modeling.
A number of decision rules not based on the application of utility theory to the result of a probability transform have also been proposed, for instance by Troffaes. Most of these proposals are based on order relations between uncertainty measures [Denoeux], in particular the least commitment principle, the analogue of maximum entropy in belief function theory.
Since the late Seventies, the need for a general formulation of the theory of evidence on continuous domains has been recognized. Indeed, the original formulation of the theory of evidence summarized in Chapter 2 was inherently linked to finite frames of discernment. Numerous proposals have since been brought forward in order to extend the notion of belief function to infinite hypothesis sets. Among them: Shafer's allocations of probability [], Nguyen's random sets [], Strat and Smets' random closed intervals [], XXX's generalised evidence theory [] and Kroupa's belief functions on MV algebras [].
The first attempt (1979) is due to Shafer himself, and goes under the name of al-
locations of probabilities ([1152]). Shafer proved that every belief function can be
represented as an allocation of probability, i.e. a ∩-homomorphism into a positive
and completely additive probability algebra, deduced from the integral represen-
tation due to Choquet. For every belief function Bel defined on a class of events
E ⊆ 2Ω there exists a complete Boolean algebra M, a positive measure µ and an
allocation of probability ρ between E and M such that Bel = µ ◦ ρ.
Two regularity conditions for a belief function over an infinite domain are con-
sidered: continuity and condensability .
Canonical continuous extensions of belief functions defined on “multiplicative
subclasses” E to an arbitrary power set can then be introduced by allocation of
Indeed there are many such extensions, of which Bel is the minimal one.
The proof is based on the existence of an allocation for the desired extension.
Note the similarity with the superadditivity axiom - the notion is also related to
that of inner measure (Section 3.1.3), which provides approximate belief values for
subsets outside the initial sigma-algebra.
What about evidence combination? The condensability property ensures that the Boolean algebra M represents intersection properly for arbitrary (not just finite) collections B of subsets:
$$ \rho\Big(\bigcap \mathcal{B}\Big) = \bigwedge_{B \in \mathcal{B}} \rho(B) \qquad \forall \mathcal{B} \subseteq 2^\Omega. $$
Almost at the same time, Strat [Strat84] and Smets had the idea of making the prob-
lem tractable via the standard methods of calculus by allowing only focal elements
which are closed intervals of the real line.
Fig. 4.6. Strat’s representation of belief functions on intervals - finite case (from [?]). Left:
frame of discernment for unit-length sub-intervals of [0, 4]. Right: how to compute belief and
plausibility values for a sub-interval [a, b] (Strat’s notation).
Strat’s initial idea is very simple. Take a real interval I and split it into N bits.
Define as frame of discernment the set of possible intervals with such extreme
points: [0, 1), [0, 2), [1, 4], etcetera. A belief function there has therefore ∼ N²/2
possible focal elements, so that its mass function lives on a discrete triangle (see
Figure 4.6-left), and one can compute belief and plausibility values simply by inte-
gration (right).
This idea trivially generalises to arbitrary sub-intervals of I, as in Figure 4.7.
Fig. 4.7. Strat's representation in the case of arbitrary sub-intervals [a, b].
Belief values of the combination of two such belief functions can be computed as:
$$ Bel_1 \oplus Bel_2([a, b]) = \frac{1}{K} \int_0^a \!\! \int_b^N \Big[ m_1(x, b)\, m_2(a, y) + m_2(x, b)\, m_1(a, y) + m_1(a, b)\, m_2(x, y) + m_2(a, b)\, m_1(x, y) \Big]\, dy\, dx, $$
and can be easily extended to the real line, by considering belief functions defined on the Borel σ-algebra of subsets of R generated by the collection I of closed intervals. The theory also provides a way of building a continuous belief function from a pignistic density, by applying the least commitment principle and assuming a unimodal pignistic PDF, namely:
$$ Bel(s) = -(s - \bar{s})\, \frac{dBet(s)}{ds}, $$
where $\bar{s}$ is such that $Bet(s) = Bet(\bar{s})$. For example, a normally distributed pignistic function $Bet(x) = \mathcal{N}(x; \mu, \sigma)$ generates a continuous belief function of the form $Bel(y) = \frac{2y}{\sqrt{2\pi}}\, e^{-y^2}$, where $y = (x - \mu)/\sigma$.
[Figure: a random closed interval - a probability space (C, A, P) is mapped by Γ onto closed intervals [U(c), V(c)] of the real line.]
– a fuzzy set on the real line induces a mapping to a collection of nested intervals, parameterised by the level c (Figure 4.9-left);
– a p-box, i.e., a pair of upper and lower bounds on a cumulative distribution function (see Section 5.8), also induces a family of intervals (Figure 4.9-right).
[Figure 4.9: the families of intervals Γ(c) = [U(c), V(c)] induced by a fuzzy set (left) and by a p-box (right).]
The approach based on Borel sets of the real line seems to have proved more fertile than more general approaches such as random sets or allocations of probability. Generalizations of combination and conditioning rules follow quite naturally [Smets]; inference with predictive belief functions on real numbers has been proposed [Denoeux]; and the calculation of the pignistic probability for continuous b.f.s is straightforward, allowing TBM-style decision making with continuous BFs.
An interesting open problem within the Borel formulation of continuous b.f.s is therefore the generalization of other probability transforms to the continuous case. The extension of the author's geometric approach to random closed intervals has been recently initiated by Kroupa et al [Kroupa10].
4.6.5 MV algebras
A new, interesting approach studies belief functions in a more general setting than that of Boolean algebras of events, inspired by generalizations of classical probability towards "many-valued" events, such as those resulting from formulas in Lukasiewicz infinite-valued logic.
$1 = \neg 0$, $f \odot g = \neg(\neg f \oplus \neg g)$, $f \leq g$ if $\neg f \oplus g = 1$.
Definition 50. A state is a mapping s : M → [0, 1] such that s(1) = 1 and $s(f \oplus g) = s(f) + s(g)$ whenever $f \odot g = 0$.
Belief functions on MV algebras Now, consider the MV-algebra $[0, 1]^{\mathcal{P}(X)}$ of all functions $\mathcal{P}(X) \to [0, 1]$, where X is finite. Let $\rho : [0, 1]^X \to [0, 1]^{\mathcal{P}(X)}$ be defined as:
$$ \rho(f)(B) = \begin{cases} \min\{f(x) : x \in B\} & B \neq \emptyset, \\ 1 & \text{otherwise.} \end{cases} $$
Definition 51. $Bel : [0, 1]^X \to [0, 1]$ is a belief function on $[0, 1]^X$ if there is a state s on the MV-algebra $[0, 1]^{\mathcal{P}(X)}$ such that $s(1_\emptyset) = 0$ and $Bel(f) = s(\rho(f))$, for every $f \in [0, 1]^X$. The state s is called a state assignment.
Fig. 4.10. Relationships between classical belief functions on P(X) and belief functions on
[0, 1]X (from [?]).
All standard properties of classical b.f.s are met (e.g. superadditivity). In addi-
tion, the set of belief functions on [0, 1]X is a simplex whose extreme points corre-
spond to the generalisation of categorical b.f.s (see Chapter ??).
4.7.1 Classification
Fig. 4.11. Left: classification is about finding out the class label of a test point “?” given the
information provided by a training set whose elements are labelled as belonging to specific
classes. Middle: principle of the k-nearest neighbour (k-NN) classifier. Right: evidential k-
NN classifier.
Evidential K-NN Let Nk (x) ⊂ L denote the set of the k nearest neighbors of
x in L, based on some appropriate distance measure d. Each xi ∈ Nk (x) can be
considered as a piece of evidence regarding the class of x represented by a mass
function mi on Ω:
The strength of this evidence decreases with the distance $d_i$ between x and $x_i$: ϕ is a decreasing function such that $\lim_{d\to+\infty} \varphi(d) = 0$. Evidence is then pooled as:
$$ m = \bigoplus_{x_i \in N_k(x)} m_i. $$
The function ϕ can be fixed heuristically or selected from a family $\{\varphi_\theta \,|\, \theta \in \Theta\}$ using, e.g., cross-validation. Finally, the class with the highest plausibility is selected.
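A minimal sketch of the evidential k-NN rule for scalar features and two classes, assuming the common construction in which each neighbour assigns mass ϕ(d) to its own class and the remainder to the whole frame, with the illustrative choice ϕ(d) = exp(−γd²) and a plain (unoptimised) Dempster combination; all names and parameters are assumptions.

```python
import math

def dempster(m1, m2):
    """Unsophisticated Dempster combination of two mass dictionaries."""
    out, conflict = {}, 0.0
    for A, v1 in m1.items():
        for B, v2 in m2.items():
            C = A & B
            if C:
                out[C] = out.get(C, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    return {A: v / (1.0 - conflict) for A, v in out.items()}

def evidential_knn(x, training_set, k=3, gamma=1.0, classes=('A', 'B')):
    """Each of the k nearest neighbours (xi, yi) contributes a mass function
    with mass phi(d) on its class and the rest on the frame; the pooled mass
    is obtained by Dempster's rule and the most plausible class returned."""
    frame = frozenset(classes)
    neighbours = sorted(training_set, key=lambda p: abs(p[0] - x))[:k]
    m = {frame: 1.0}                                  # vacuous starting point
    for xi, yi in neighbours:
        phi = math.exp(-gamma * (x - xi) ** 2)
        m = dempster(m, {frozenset({yi}): phi, frame: 1.0 - phi})
    pl = {c: sum(v for A, v in m.items() if c in A) for c in classes}
    return max(pl, key=pl.get)

training = [(0.1, 'A'), (0.3, 'A'), (0.9, 'B'), (1.1, 'B')]
print(evidential_knn(0.2, training), evidential_knn(1.0, training))
```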
Evidential k-NN rule for partially supervised data In some applications, training
instances are labeled by experts or indirect methods, without the use of ground truth.
As the class labels themselves of the training data are uncertain we have a partially
supervised learning problem. The training set can be formalised as:
L = {(xi , mi ), i = 1, . . . , n},
where xi is the attribute vector for instance i, and mi is a mass function representing
uncertain expert knowledge about the class yi of instance i.
Special cases are:
– mi ({ωk }) = 1 for all i (supervised learning);
– mi (Ω) = 1 for all i (unsupervised learning).
The evidential k-NN rule can easily be adapted to handle such uncertain learning data (Figure 4.11-right). Each mass function $m_i$ is first "discounted" (see Section ??) by a rate depending on the distance $d_i$. The k discounted mass functions $m_i'$ are then combined:
$$ m = \bigoplus_{x_i \in N_k(x)} m_i'. $$
4.7.2 Clustering
Fig. 4.12. Pairwise preferences in the example from Tritchler & Lockwood.
Assuming the existence of a unique consensus linear ordering L∗ and seeing the
expert assessments as sources of information, what can we say about L∗ ?
Formalisation In this problem the frame of discernment is the set L of linear orders over O. Each pairwise comparison $(o_i, o_j)$ yields a pairwise mass function $m^{\Theta_{ij}}$ on a coarsening $\Theta_{ij} = \{o_i \succ o_j, o_j \succ o_i\}$ with:
$$ m^{\Theta_{ij}}(o_i \succ o_j) = \alpha_{ij}, \qquad m^{\Theta_{ij}}(o_j \succ o_i) = \beta_{ij}, \qquad m^{\Theta_{ij}}(\Theta_{ij}) = 1 - \alpha_{ij} - \beta_{ij}. $$
The mass assignment $m^{\Theta_{ij}}$ may come from a single expert (e.g., an evidential classifier) or from the combination of the evaluations of several experts.
Let $\mathbb{L}_{ij} = \{L \in \mathcal{L} \,|\, (o_i, o_j) \in L\}$. Vacuously extending $m^{\Theta_{ij}}$ to $\mathcal{L}$ yields
$$ m^{\Theta_{ij}\uparrow\mathcal{L}}(\mathbb{L}_{ij}) = \alpha_{ij}, \qquad m^{\Theta_{ij}\uparrow\mathcal{L}}(\overline{\mathbb{L}}_{ij}) = \beta_{ij}, \qquad m^{\Theta_{ij}\uparrow\mathcal{L}}(\mathcal{L}) = 1 - \alpha_{ij} - \beta_{ij}. $$
Subsequently combining the pairwise mass functions using Dempster's rule produces:
$$ m^{\mathcal{L}} = \bigoplus_{i<j} m^{\Theta_{ij}\uparrow\mathcal{L}}. $$
The plausibility of the combination $m^{\mathcal{L}}$ is:
$$ pl(L) = \frac{1}{1-\kappa} \prod_{i<j} (1 - \beta_{ij})^{\ell_{ij}} (1 - \alpha_{ij})^{1-\ell_{ij}}, $$
where $\ell_{ij} = 1$ if $(o_i, o_j) \in L$, 0 otherwise (an algorithm for computing the degree of conflict κ has been given by [Tritchler & Lockwood, 1991]).
The logarithm of pl(L) can be maximized by solving the following binary integer programming problem:
$$ \max_{\ell_{ij} \in \{0,1\}} \sum_{i<j} \ell_{ij} \ln\frac{1-\beta_{ij}}{1-\alpha_{ij}} $$
subject to:
$$ \ell_{ij} + \ell_{jk} - 1 \leq \ell_{ik}, \quad \forall i < j < k \qquad (1) $$
$$ \ell_{ik} \leq \ell_{ij} + \ell_{jk}, \quad \forall i < j < k \qquad (2) $$
Constraint (1) ensures that $\ell_{ij} = 1$ and $\ell_{jk} = 1 \Rightarrow \ell_{ik} = 1$, while (2) ensures that $\ell_{ij} = 0$ and $\ell_{jk} = 0 \Rightarrow \ell_{ik} = 0$.
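In practice one would solve the binary integer program above; the following toy sketch instead searches over all permutations (feasible only for a handful of objects) for the order maximising the same objective, as a simple way of checking the formulation. The pairwise masses are invented.

```python
import math
from itertools import permutations

def most_plausible_order(objects, alpha, beta):
    """Brute-force search for the linear order maximising
    sum_{i<j} l_ij * ln((1 - beta_ij) / (1 - alpha_ij)),
    where l_ij = 1 iff o_i precedes o_j; alpha[(i, j)] and beta[(i, j)]
    are the pairwise masses in favour of o_i > o_j and o_j > o_i."""
    def score(order):
        pos = {o: r for r, o in enumerate(order)}
        s = 0.0
        for (i, j), a in alpha.items():
            l_ij = 1.0 if pos[i] < pos[j] else 0.0
            s += l_ij * math.log((1.0 - beta[(i, j)]) / (1.0 - a))
        return s
    return max(permutations(objects), key=score)

objs = ['o1', 'o2', 'o3']
alpha = {('o1', 'o2'): 0.6, ('o1', 'o3'): 0.5, ('o2', 'o3'): 0.7}
beta  = {('o1', 'o2'): 0.2, ('o1', 'o3'): 0.3, ('o2', 'o3'): 0.1}
print(most_plausible_order(objs, alpha, beta))
```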
In conclusion, belief calculus allows us to model uncertainty in paired comparisons. The most plausible linear order can be computed efficiently using a binary linear programming approach. This approach has been applied to label ranking, in which the task is to learn a "ranker" mapping p-dimensional feature vectors x, describing an agent, to a linear order over a finite set of alternatives, describing the agent's preferences [Denœux and Masson, 2012]. As described in Section X, the method can easily be extended to the elicitation of belief functions from preference relations [Denœux and Masson, AOR 195(1):135-161, 2012].
4.7.4 Regression
In [956], the analysis of classical linear regression models according to the ideas and principles of the Dempster-Shafer Theory of Evidence is presented. Assumption-based reasoning plays a central role in the analysis, and the Theory of Hints is used
4.7.5 Estimation
4.7.6 Optimisation
In [1072], solution approaches to belief linear programming (BLP) are proposed. A BLP problem is an uncertain linear program in which uncertainty is expressed by belief functions; the theory of belief functions provides an uncertainty measure that takes into account ignorance about the occurrence of the single states of nature, as is the case in many decision situations arising in medical diagnosis, mechanical design optimization and investigation problems. The authors extend stochastic programming approaches, namely the chance-constrained approach and the recourse approach, to obtain a certainty-equivalent program, and present a generic solution strategy for the resulting certainty equivalent.
4.8 Advances
Belief functions are rather complex mathematical objects - thus, they possess links
with a number of fields of (applied) mathematics, on one side, and lead to interesting
generalisations of standard results of classical probability (e.g. Bayes’ theorem, total
probability), on the other.
Indeed, many new results have been achieved recently, proving that the discipline is alive and evolving towards maturity. It is useful to briefly mention some remarkable results concerning the major open problems of the field, in order to appreciate the place of the work developed in Part II as well.
The work of Roesmer ([1089]) deserves a note for its original connection be-
tween nonstandard analysis and theory of evidence.
Given an ordering of the subsets of Ω, mass, belief, and plausibility functions can be represented as vectors, which we can denote by m, bel and pl. Various operations with belief functions can then be expressed via linear algebra operators acting on vectors and matrices.
We can define the negation of a mass vector m as $\bar{m}(A) = m(\bar{A})$. Smets has shown that $\bar{m} = J\, m$, where J is the matrix with 1s on its antidiagonal. Given a mass vector m, the vector bel of belief values turns out to be bel = BfrM m, where BfrM is the transformation matrix such that BfrM(A, B) = 1 iff B ⊆ A, and 0 otherwise.
Notably, such transformation matrices can be built recursively, as
$$ BfrM_{i+1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes BfrM_i, \qquad MfrB = BfrM^{-1}, \quad QfrM = J\, BfrM\, J, \quad MfrQ = J\, BfrM^{-1} J. $$
The vectors associated with normalised BFs and plausibilities can be computed as: Bel = b − b(∅)1, pl = 1 − Jb.
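A minimal sketch of this matrix calculus, assuming the usual binary ordering of subsets: the transformation matrix is built via the Kronecker recursion above and applied to a mass vector to obtain the vector of belief values.

```python
import numpy as np

def belief_from_mass_matrix(n):
    """Build the 2^n x 2^n matrix BfrM via Kronecker products, so that
    bel = BfrM @ m when subsets follow the binary ordering; entry (A, B)
    equals 1 iff B is a subset of A."""
    M = np.array([[1.0]])
    block = np.array([[1.0, 0.0], [1.0, 1.0]])
    for _ in range(n):
        M = np.kron(block, M)
    return M

# Mass vector on Omega = {a, b}, subset order: emptyset, {a}, {b}, {a, b}
m = np.array([0.0, 0.3, 0.2, 0.5])
BfrM = belief_from_mass_matrix(2)
bel = BfrM @ m
print(bel)    # belief values of emptyset, {a}, {b}, {a, b}
```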
Fig. 4.13. Detail of the FMT when Ω = {a, b, c}. The symbols a, ab, etcetera, denote m(a),
m(a, b) and so on.
Fig. 4.14. .
A number of norms have been introduced for belief functions, with the goal of ...
For example, generalizations to belief functions have been proposed of the classical Kullback-Leibler divergence of two probability distributions P, Q,
$$ D_{KL}(P\|Q) = \int_{-\infty}^{+\infty} p(x) \log\frac{p(x)}{q(x)}\, dx, $$
of measures based on information theory such as fidelity, and of entropy-based norms [Jousselme IJAR'11]. Many others have been proposed [?, ?, ?, ?]. Any exhaustive analysis would be a huge task, although Jousselme et al. have managed to compile a very nice survey on the topic [].
Figure 4.15, extracted from [], summarises the main families of distances and
dissimilarities that have been proposed in the last twenty years or so.
Fig. 4.15. Some significant dissimilarity measures among belief functions proposed in the
last fifty years (from []).
Jousselme’s distance The most popular and cited measure of dissimilarity was pro-
posed by Jousselme et al [] as a “measure of performance” of algorithms (e.g. object
identification) in which successive evidence combination leads to convergence to the
“true” solution.
It is based on the geometric representation m of mass functions m, and reads
as:
Fig. 4.16. Empirical testing led Jousselme et al [] to the detection of four separate families of
dissimilarity measures.
$$ d_J(m_1, m_2) \doteq \sqrt{\frac{1}{2} (m_1 - m_2)^T D\, (m_1 - m_2)}, $$
where $D(A, B) = \frac{|A \cap B|}{|A \cup B|}$ for all A, B ∈ 2^Θ. Jousselme's distance so defined: (1) is positive definite, and thus defines a metric distance; (2) takes into account the similarity among subsets (focal elements); (3) is such that D(A, B) < D(A, C) whenever C is "closer" to A than B is.
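A minimal sketch of Jousselme's distance for dictionary-based mass functions, assuming a given ordering of the focal elements and the convention D(∅, ∅) = 1.

```python
import numpy as np

def jousselme_distance(m1, m2, focal_order):
    """Jousselme's distance between two mass functions, with D(A, B) given
    by the Jaccard index |A ∩ B| / |A ∪ B|."""
    n = len(focal_order)
    D = np.zeros((n, n))
    for i, A in enumerate(focal_order):
        for j, B in enumerate(focal_order):
            union = len(A | B)
            D[i, j] = len(A & B) / union if union else 1.0
    diff = np.array([m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal_order])
    return float(np.sqrt(0.5 * diff @ D @ diff))

order = [frozenset('a'), frozenset('b'), frozenset('ab')]
m1 = {frozenset('a'): 0.6, frozenset('ab'): 0.4}
m2 = {frozenset('b'): 0.6, frozenset('ab'): 0.4}
print(jousselme_distance(m1, m2, order))
```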
$$ G(A, B) = \frac{(|A| - 1)(|B| - 1)}{(|\Theta| - 1)^2}; $$
(note that the L1 distance was earlier introduced by Klir and Harmanec []);
– Fixen and Mahler’s Bayesian Percent Attribute Miss (BPAM): induced by the
inner product
m01 P m2 ,
p(A∩B)
where P (A, B) = p(A)p(B) and p is an a-priori probability on Θ;
– Zouhal and Denoeux’s inner product of pignistic functions [];
– Dempster’s conflict κ and Ristic’s closely related “additive global dissimilarity
measure”: − log(1 − κ);
√ √
– “fidelity” or Bhattacharia coefficient: m1 T W m2 ;
– the family of information-based distances:
Various measures of uncertainty for belief functions have been proposed - consult, for instance, the 1990s survey by Nikhil Pal [].
Some of them are directly inspired by Shannon's entropy of probability measures, $H(p) = -\sum_x p(x) \log p(x)$. Yager's entropy measure [] is a direct generalisation of Shannon's entropy in which probabilities are replaced by plausibilities:
$$ E(m) = -\sum_{A \in \mathcal{F}} m(A) \log Pl(A). $$
The confusion measure $C(m) = -\sum_{A \in \mathcal{F}} m(A) \log Bel(A)$ is the dual measure, in which belief values replace probabilities in the classical entropy.
A different class of measures is designed to capture the specificity of belief measures, such as:
$$ N(m) = \sum_{A \in \mathcal{F}} \frac{m(A)}{|A|}. $$
This measures the dispersion of the pieces of evidence generating a belief function,
and is rather clearly related to the pignistic function.
Klir has proposed a different non-specificity measure (later extended by Dubois and Prade):
$$ I(m) = \sum_{A \in \mathcal{F}} m(A) \log |A|. $$
Composite measures such as Lamata and Moral's E(m) + I(m) try to capture both entropy and specificity. E(m), however, was criticised by Klir and Ramer because it expresses conflict as A ∩ B = ∅ rather than as B ⊄ A; C(m), instead, was criticised because it does not measure to what extent two focal elements disagree (i.e., the size of A ∩ B).
Klir and Ramer then proposed a global uncertainty measure defined as D(m) + I(m), where
$$ D(m) = -\sum_{A \in \mathcal{F}} m(A) \log\left[ \sum_{B \in \mathcal{F}} m(B) \frac{|A \cap B|}{|B|} \right]. $$
Pal [] later argued that none of them is really satisfactory: none of the composite measures has a unique maximum; there is no sound rationale for simply adding conflict and non-specificity measures together to get a "total" one; and, finally, some are computationally very expensive.
In opposition, Harmanec’s Aggregated Uncertainty (AU) is defined as the max-
imal Shannon entropy of all consistent probabilities, obviously in the credal set in-
terpretation of b.f.s. As the author proved [], it is the minimal measure meeting a set
of rationality requirements including: symmetry, continuity, expansibility, subaddi-
tivity, additivity, monotonicity, normalisation. AU itself was once againa criticised
by Klir and Smith for being insensitive to arguably significant changes in evidence,
and replaced by a linear combination of AU and nonspecificity I(m). This is obvio-
suly still characterised by high computational complexity: in response, Jousselme et
al, 2006 brought forward their Ambiguity Measure (AM), as the classical entropy of
the pignistic function.
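A minimal sketch of three of the measures above (Yager's entropy, the non-specificity I(m) and the ambiguity measure AM) for a dictionary-based mass function; the use of natural logarithms and the example mass function are assumptions.

```python
import math

def yager_entropy(m):
    """E(m) = - sum_A m(A) log Pl(A)."""
    def pl(A):
        return sum(v for B, v in m.items() if A & B)
    return -sum(v * math.log(pl(A)) for A, v in m.items() if v > 0)

def nonspecificity(m):
    """I(m) = sum_A m(A) log |A| (Klir / Dubois-Prade)."""
    return sum(v * math.log(len(A)) for A, v in m.items() if v > 0)

def ambiguity_measure(m):
    """AM: Shannon entropy of the pignistic probability."""
    betp = {}
    for A, v in m.items():
        for w in A:
            betp[w] = betp.get(w, 0.0) + v / len(A)
    return -sum(p * math.log(p) for p in betp.values() if p > 0)

m = {frozenset('a'): 0.5, frozenset('ab'): 0.3, frozenset('abc'): 0.2}
print(yager_entropy(m), nonspecificity(m), ambiguity_measure(m))
```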
Indeed, one can prove that compatible frames and vector subspaces share the alge-
braic structure of semi-modular lattice.
In a family of frames we can define the following order relation:
together with its dual ≤∗. Both (F, ≤) and (F, ≤∗) are lattices.
In particular, for sets of compatible frames $\Theta_1, ..., \Theta_n$ these relations read as:
$$ \{\Theta_1, ..., \Theta_n\} \in I_1^* \;\Leftrightarrow\; \Theta_j \oplus \bigotimes_{i \neq j} \Theta_i \neq \Theta_j \quad \forall j = 1, ..., n, $$
$$ \{\Theta_1, ..., \Theta_n\} \in I_2^* \;\Leftrightarrow\; \Theta_j \oplus \bigotimes_{i=1}^{j-1} \Theta_i = 0_{\mathcal{F}} \quad \forall j = 2, ..., n, $$
$$ \{\Theta_1, ..., \Theta_n\} \in I_3^* \;\Leftrightarrow\; \Big|\bigotimes_{i=1}^{n} \Theta_i\Big| - 1 = \sum_{i=1}^{n} \big(|\Theta_i| - 1\big). $$
Notably, relation $I_3^*$ is equivalent to saying that the dimension of the probability polytope for the minimal refinement is the sum of the dimensions of the polytopes associated with the individual frames.
The relationship between these lattice-theoretic forms of independence and in-
dependence of frames is summarised by the following diagram:
In the upper semimodular case IF is mutually exclusive with all lattice-theoretic
relations I1 , I2 , I3 . In the lower semimodular case IF is a stronger condition than
both I1∗ and I2∗ . IF is mutually exclusive with the third independence relation (a
form of matroidal [] independence).
9 x "covers" y (x ⋗ y) if x ≥ y and there is no intermediate element in the chain linking them.
This analysis hints at the possibility that independence of sources may be ex-
plained algebraically. Although families of frames and projective geometries share
the same kind of lattice structure, independence of sources is not a form of lattice-
theoretic independence, nor a form of matroidal independence, but they are related
in a rather complex way.
A possible algebraic solution to the conflict problem (see Section 4.2.5) by
means of a “generalised Gram-Schmidt” can also be outlined.
Starting from a set of belief functions $Bel_i : 2^{\Theta_i} \to [0, 1]$ defined over $\Theta_1, \cdots, \Theta_n$, we seek a new collection of independent frames of the same family such that:
$$ \Theta_1 \otimes \cdots \otimes \Theta_n = \Theta_1' \otimes \cdots \otimes \Theta_m'. $$
Then, we project the n original b.f.s $Bel_1, ..., Bel_n$ onto the new set of frames, achieving a set of surely combinable belief functions $Bel_1', ..., Bel_m'$ equivalent (in some meaningful sense) to the initial collection of bodies of evidence.
The question of how to define an inverse operation to Dempster's combination rule for basic probability assignments and belief functions has a natural motivation and an intuitive interpretation. If Dempster's rule reflects a modification of one's system of degrees of belief when the subject in question becomes familiar with the degrees of belief of another subject and accepts the arguments on which these degrees are based, the inverse operation would enable us to erase the impact of this modification, and to return to one's original degrees of belief, supposing that the reliability of the second subject is put into doubt.
Within the algebraic framework this inversion problem was solved by Ph. Smets in [1265].
SMETS’ SOLUTION
Kramosil proposed a solution to the inversion problem within the measure-
theoretic approach ([790]).
Fig. 5.1. The major approaches to uncertainty theories surveyed in this Chapter are arranged
into a hierarchy, in which less general frameworks are at the bottom and more general ones
at the top. A link between them indicates that the top formalism comprises the bottom one as
a special case.
Figure 5.1 arranges all these theories into a hierarchy, according to their generality. Similar diagrams, for a smaller subset of methodologies, appear in [381] and [728], among others.
Chapter outline
most extensive effort into a general theory of imprecise probabilities, whose generality is comparable with that of the theory based on arbitrary closed and convex sets of probability distributions, and which is formalized in terms of lower and upper previsions [211].
A lower probability [477] $\underline{P}$ is a function from 2^Θ, the power set of Θ, to the unit interval [0, 1]. With any lower probability $\underline{P}$ is associated a dual upper probability function $\overline{P}$, defined for any A ⊆ Θ as $\overline{P}(A) = 1 - \underline{P}(A^c)$, where $A^c$ is the complement of A. With any lower probability $\underline{P}$ we can associate a (closed, convex) set
$$ \mathcal{P}(\underline{P}) = \Big\{ p : p(A) \geq \underline{P}(A), \ \forall A \subseteq \Theta \Big\}. \qquad (5.1) $$
Definition 55. A lower probability $\underline{P}$ is called 'tight' ('coherent' in Walley's terminology) if:
or, equivalently:
$$ \sup_{\theta \in \Theta}\left[ \sum_{i=1}^{n} \xi_{E_i}(\theta) - m\, \xi_{E_0}(\theta) \right] \geq \sum_{i=1}^{n} \underline{P}(E_i) - m\, \underline{P}(E_0). \qquad (5.3) $$
Consistency means that the lower bound constraints $\underline{P}(A)$ can indeed be satisfied by some probability measure, while tightness indicates that $\underline{P}$ is the lower envelope, on subsets, of $\mathcal{P}(\underline{P})$. Any coherent lower probability is monotone and superadditive.
The concepts of avoiding sure loss and coherence are also applicable to any func-
tional defined on a class of bounded functions on Θ (gambles). According to this
point of view, a lower probability is a functional defined on the class of all charac-
teristic (indicator) functions of sets.
The behavioural rationale for general imprecise probability theory derives from equating 'belief' with 'inclination to act'. An agent believes in a certain outcome to the extent that it is willing to accept a gamble on that outcome. A gamble is a decision which generates different utilities in different states (outcomes) of the world. The following outline is abstracted from [?].
Definition 56. Let Ω be the set of possible outcomes ω. A gamble is a bounded
real-valued function on Ω: X : Ω → R, ω 7→ X(ω).
Clearly the notion of gamble is very close to that of utility (see Section ??). Note
that gambles are not constrained to be normalised or non-negative. Whether one is
willing to accept a gamble depends on their belief on the outcome.
Let us denote an agent’s set of desirable gambles by D ⊆ L(Ω), where L(Ω)
is the set of all bounded real valued functions on Ω. Since whether a gamble is
desirable depends on the agent’s belief on the outcome, D can be used as a model
of the agent’s uncertainty about the problem.
Definition 57. A set D of desirable gambles is coherent iff:
1. 0 (the constant gamble X(ω) = 0 for all ω) 6∈ D;
2. if X > 0 (i.e., X(ω) > 0 for all ω) then X ∈ D;
3. if X, Y ∈ D, then X + Y ∈ D;
4. if X ∈ D and λ > 0 then λX ∈ D.
As a consequence, if X ∈ D and Y > X then Y ∈ D. In other words, a coherent set of desirable gambles is a convex cone (it is closed under addition and under multiplication by positive scalars).
Now, suppose the agent buys a gamble X for a price µ. This yields a new gamble X − µ.
Definition 58. The lower prevision P (X) of a gamble X:
.
P (X) = sup{µ : X − µ ∈ D}
is the supremum acceptable price for buying X.
In the same way, selling a gamble X for a price µ yields a new gamble µ − X.
Definition 59. The upper prevision P (X) of a gamble X:
.
P (X) = inf{µ : µ − X ∈ D}
is the supremum acceptable price for selling X.
Fig. 5.2. Interpretation of lower, upper and precise previsions in terms of the acceptability of gambles (transactions).
and upper previsions can be defined; for any price in the interval $[\underline{P}(X), \overline{P}(X)]$ we remain undecided.
If the first condition is not met, there exists a positive combination of gambles, each individually desirable to the agent, which is not desirable to them. One consequence of avoiding sure loss is that $\underline{P}(A) \leq \overline{P}(A)$. A consequence of coherence is that lower previsions are superadditive: $\underline{P}(A) + \underline{P}(B) \leq \underline{P}(A \cup B)$ for $A \cap B = \emptyset$.
A precise prevision P is coherent iff: (i) P(λX + µY) = λP(X) + µP(Y); (ii) if X > 0 then P(X) ≥ 0; (iii) P(Ω) = 1; it then coincides with de Finetti's notion of coherent prevision.
Special cases of coherent lower/upper previsions include probability measures,
de Finetti previsions, 2-monotone capacities, Choquet capacities, possibility/necessity
measures, belief/plausibility measures, random sets but also probability boxes,
(lower and upper envelopes of) credal sets, and robust Bayesian models.
c ∈ {−∞, +∞} ∀f ∈ L
(5.4)
is called the natural extension of $\underline{P}$.
When $\underline{P}$ is a classical ('precise') probability the natural extension agrees with the expectation. Also, $\underline{E}(\xi_A) = \underline{P}(A)$ for all A iff $\underline{P}$ is coherent.
Natural extension and Choquet integrals of belief measures Both the Choquet
integral (4.20) with respect to monotone set functions (such as belief functions) and
the natural extensions of lower probabilities are generalizations of the Lebesgue
integral with respect to σ-additive measures. Wang and Klir [1390] investigated the
relations between Choquet integrals, natural extension and belief measures, showing
that the Choquet integral with respect to a belief measure is always greater than or
equal to the corresponding natural extension.
More precisely, the Choquet integral $\int f\, d\underline{P}$ for all $f \in \mathcal{L}$ is a nonlinear functional on $\mathcal{L}$, and $(X, \mathcal{L}, \int f\, d\underline{P})$ is a lower prevision [1371]. It can be proven that the latter is coherent when $\underline{P}$ is a belief measure, and:
Proposition 27. $\underline{E}(f) \leq \int f\, d\underline{P}$ for any $f \in \mathcal{L}$ whenever $\underline{P}$ is a belief measure.
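A toy numerical check of this relationship, assuming the standard fact that the extreme points of the credal set of a belief function are obtained by reassigning each focal element's mass to a single one of its elements; for a belief measure the minimum of the expectation over these allocations reproduces the Choquet integral. Names and data are illustrative.

```python
from itertools import product

def choquet_integral(m, f):
    """Choquet integral of f w.r.t. the belief measure with mass m:
    sum_A m(A) * min_{w in A} f(w)."""
    return sum(v * min(f[w] for w in A) for A, v in m.items())

def lower_expectation(m, f):
    """Minimum expectation of f over the allocation vertices of the
    credal set: each focal element's mass goes to one of its elements."""
    focals = list(m.items())
    best = float('inf')
    for choice in product(*[sorted(A) for A, _ in focals]):
        p = {}
        for (A, v), w in zip(focals, choice):
            p[w] = p.get(w, 0.0) + v
        best = min(best, sum(p[w] * f[w] for w in p))
    return best

m = {frozenset('ab'): 0.5, frozenset('bc'): 0.3, frozenset('abc'): 0.2}
f = {'a': 1.0, 'b': 0.4, 'c': 0.0}
print(choquet_integral(m, f), lower_expectation(m, f))  # the two coincide
```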
Conceptual autonomy of belief functions Baroni and Vicig [62] claim that the
answers to the questions .. tend to exclude the existence of intuitively appreciable
relationships between belief functions and coherent lower probabilities, confirming
the conceptual autonomy of belief functions with respect to imprecise probability.
When Θ is finite the last two requirements are trivially satisfied and can be disre-
garded. Monotone decreasing measures can be obtained by replacing ≤ with ≥ in
condition 2.
just as for belief functions (2.3). For infinitely monotone capacities, as we know, the
Moebius inverse (the basic probability assignment) is non-negative.
Klir et al. published an excellent discussion [731] on the relations between belief
and possibility theory [325, 814], and examined different methods for constructing
fuzzy measures in the context of expert systems.
The product of capacities representing belief functions is studied in [609]. The re-
sult ([609], Equation (12)) is nothing but the unnormalised Dempster combination
(or, equivalently, a disjunctive combination in which mass zero is assigned to the
empty set), and is proved to satisfy a linearity property (commutativity with convex
combination).
In [1489], Yager analysed a class of fuzzy measures generated by a belief measure,
seen as providing partial information about an underlying fuzzy measure. An entire
class of such fuzzy measures exists - the notion of entropy of a fuzzy measure is
used to select significant representatives from this class.
Given gλ (x) for all x ∈ Θ and λ, the values gλ (A) of the λ-measure for all subsets
A ∈ 2Θ are then determined by (5.6).
The following three cases must be distinguished:
1. if $\sum_x g_\lambda(x) < 1$, then $g_\lambda$ is a lower probability and, thus, a superadditive measure; λ is determined by the root of Equation (5.7) in the interval (0, ∞), which is unique;
2. if $\sum_x g_\lambda(x) = 1$, then $g_\lambda$ is a probability measure, and λ = 0 is the only root of Equation (5.7);
3. if $\sum_x g_\lambda(x) > 1$, then $g_\lambda$ is an upper probability and, hence, a subadditive measure; λ is determined by the root of (5.7) in the interval (−1, 0), which is unique.
Finally, as shown in [1389], lower and upper probabilities based on λ-measures are special belief and plausibility measures, respectively.
Proposition 29. [320] The lower and upper probability measures associated with
a feasible (‘reachable’) set of probability intervals are Choquet capacities of order
2, namely:
Probability intervals [218, 1429, 1326] were introduced as a tool for uncertain rea-
soning in [320, 964], where combination and marginalization of intervals were stud-
ied in detail. The authors also studied the specific constraints such intervals ought
to satisfy in order to be consistent and tight.
As pointed out for instance in [642], probability intervals typically arise through
measurement errors. As a matter of fact, measurements can be inherently of interval
nature (due to the finite resolution of the instruments). In that case the probability
interval of interest is the class of probability measures consistent with the measured
interval.
A set of constraints of the form (16.1) also determines a credal set: credal sets
generated by probability intervals are a sub-class of all credal sets generated by
lower and upper probabilities [1263]. Their vertices can be computed as in [320], p.
174.
A set of probability intervals may be such that some combinations of values
taken from the intervals do not correspond to any probability distribution function,
indicating that the intervals are unnecessarily broad.
Definition 63. A set of probability intervals is called feasible if and only if for each x ∈ Θ and every value v(x) ∈ [l(x), u(x)] there exists a probability distribution function p : Θ → [0, 1] for which p(x) = v(x).
If P(l, u) is not feasible, it can be converted into a set of feasible intervals via:
$$ l'(x) = \max\Big\{ l(x),\ 1 - \sum_{y \neq x} u(y) \Big\}, \qquad u'(x) = \min\Big\{ u(x),\ 1 - \sum_{y \neq x} l(y) \Big\}. $$
In a similar way, given a set of bounds P(l, u) we can obtain lower and upper probability values on any subset A ⊆ Θ via the following simple formulas:
$$ \underline{P}(A) = \max\Big\{ \sum_{x \in A} l(x),\ 1 - \sum_{x \notin A} u(x) \Big\}, \qquad \overline{P}(A) = \min\Big\{ \sum_{x \in A} u(x),\ 1 - \sum_{x \notin A} l(x) \Big\}. \qquad (5.10) $$
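A minimal sketch of these two operations (the feasibility correction and Equation (5.10)) for probability intervals stored as dictionaries of lower and upper bounds; the numbers are invented.

```python
def make_feasible(l, u):
    """Tighten a set of probability intervals so that every value in each
    interval is attained by some probability distribution."""
    total_u, total_l = sum(u.values()), sum(l.values())
    l2 = {x: max(l[x], 1.0 - (total_u - u[x])) for x in l}
    u2 = {x: min(u[x], 1.0 - (total_l - l[x])) for x in u}
    return l2, u2

def lower_upper_probability(l, u, A):
    """Lower and upper probabilities of an event A from interval bounds,
    as in Equation (5.10)."""
    rest = [x for x in l if x not in A]
    P_low = max(sum(l[x] for x in A), 1.0 - sum(u[x] for x in rest))
    P_up  = min(sum(u[x] for x in A), 1.0 - sum(l[x] for x in rest))
    return P_low, P_up

l = {'a': 0.1, 'b': 0.2, 'c': 0.3}
u = {'a': 0.4, 'b': 0.5, 'c': 0.6}
l, u = make_feasible(l, u)
print(lower_upper_probability(l, u, {'a', 'b'}))
```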
A generalised Bayesian inference framework based on interval probabilities is pro-
posed in [1].
Belief functions are also associated with a set of lower and upper probability
constraints of the form (16.1): they correspond therefore to a special class of interval
probability systems, associated with credal sets of a specific form.
The opposite problem of finding, given an arbitrary set of probability intervals, a be-
lief function such that (5.11) is met can only be solved whenever ([320], Proposition
14):
$$ \sum_x l(x) \leq 1, \qquad \sum_{y \neq x} l(y) + u(x) \leq 1 \ \ \forall x \in \Theta, \qquad \sum_x l(x) + \sum_x u(x) \geq 2. $$
In that case several pairs (b, pl) exist which satisfy (5.11): Lemmer and Kyburg
[828] have proposed an algorithm for selecting one. When proper and reachable
sets are considered, the first two conditions are trivially met.
The opposite question, namely approximating an arbitrary probability interval
with a pair belief/plausibility is also considered in [320] - it turns out that such
approximations only have focal elements of size less than or equal to 2 (Proposition
16).
where D is the data (evidence) and P r a prior on space of (first order) probability
distributions p.
Bounding probability is different from the approach of second-order or twodi-
mensional probability (e.g., Hoffman and Hammonds 1994; Cullen and Frey 1999)
in which uncertainty about probabilities is itself modeled with probability.
for every family of subsets {A_i | A_i ∈ 2^Θ, i ∈ I}, where I is an arbitrary index set.
1 http://www.scholarpedia.org/article/Possibility_theory
where M is the collection of all fuzzy subsets of Ω, A and X are two such fuzzy
subsets, m is a mass function defined this time on the collection of fuzzy subsets
on Ω (rather than the collection of crisp subsets, or power set), and I(A ⊆ X) is a
measure of how much the fuzzy set A is included in the fuzzy set X.
Indeed, defining the notion of inclusion for fuzzy sets is not trivial - various
measures of inclusion can and have been proposed.
Just as a fuzzy set is completely determined by its membership function, different measures of inclusion between fuzzy sets are associated with a function I : X × Y → [0, 1], from which one can get [1438]:
$$ I(A, B) = \bigwedge_{x \in \Theta} I\big(A(x), B(x)\big). $$
Among the most popular we can cite Lukasiewicz's inclusion, I(x, y) = min{1, 1 − x + y}, proposed by Ishizuka [650], and the Kleene-Dienes inclusion, I(x, y) = max{1 − x, y}, supported by Yager [1476].
Although these extensions of belief theory all arrive at frameworks within which both probabilistic and vague information can be handled, they are all restricted to finite frames of discernment. Moreover, it is unclear whether or not the belief and plausibility functions so obtained satisfy superadditivity (respectively, subadditivity) (3.6) in the fuzzy environment. Biacino [96] studied fuzzy belief functions induced by an infinitely monotone inclusion and proved that they are indeed lower probabilities.
In response, Wu et al [1438] have recently developed a theory of fuzzy belief func-
tions on infinite spaces.
Numerous fuzzy extensions of belief theory have indeed been proposed [326, 439,
1508, 649]. Constraints on belief functions imposed by fuzzy random variables have
been studied in [1092, 767]. Fuzzy evidence theory was used for decision making
in [1513].
Lucas’ fuzzy-valued measure In [161] Lucas and Araabi proposed their own gen-
eralization of the Dempster-Shafer theory [1517] to a fuzzy valued measure.
Yager’s work In [1495], Ronald Yager [1481, 1488] and D. Filev proposed a
combined fuzzy-evidential framework for fuzzy modeling. In another work [1487],
Yager investigated the issue of normalization (i.e., the assignment of non-zero val-
ues to empty sets as a consequence of the combination of evidence) in the fuzzy
Dempster-Shafer theory of evidence, proposing in response a technique called
‘smooth normalization’.
5.6 Logic
Many generalizations of classical logic in which propositions are assigned proba-
bility values [?] rather than truth values (0 or 1) have been proposed in the past2 .
As belief functions naturally generalize probability measures, it is quite natural to
define non-classical logic frameworks in which propositions are assigned belief val-
ues, rather than probability values.
This approach has been brought forward in particular by Ruspini [1100, 1099],
Saffiotti [1104], Josang [?], Haenni [572], and others.
In propositional logic, propositions or formulas are either true or false, i.e., their
truth value is either 0 or 1 [922]. Formally, an interpretation or model of a for-
mula φ is a valuation function mapping φ to the truth value ‘true’ (1). Each formula
can therefore be associated with the set of interpretations or models (or ‘hyper-
interpretations’ [1108]) under which its truth value is 1. If we define a frame of
discernment formed by all possible interpretations, each formula φ is associated
with the subset A(φ) of this frame which collects all its interpretations.
If the available evidence allows us to define a belief function (or 'bf-interpretation' [1108]) on this frame of possible interpretations, each formula A(φ) ⊆ Θ is then naturally assigned a degree of belief b(A(φ)) between 0 and 1 [1104, 572], measuring the total amount of evidence supporting the proposition 'φ is true'.
Alessandro Saffiotti, in particular, built in 1992 a hybrid logic attaching belief
values to the classical first-order logic, which he called belief functions logic (BFL)
2 https://en.wikipedia.org/wiki/Probabilistic_logic
[1108], giving new perspectives on the role of Dempster’s rule. Many formal prop-
erties of first-order logic directly generalise to BFL.
Formally, BFL works with formulas of the form F : [a, b], where F is a sentence of a first-order language and 0 ≤ a ≤ b ≤ 1. Roughly speaking, a is the degree of belief that F is true, and (1 − b) the degree of belief that F is false.
Definition 65. A belief function b is a bf-model of a belief function (bf-) formula
F : [a, b] iff b(F ) ≥ a and b(F ) ≤ b.
[465]
[1422, 1420]
It is worth mentioning the work of Resconi, Harmanec et al. [1081, 595, 598, 1083,
596], who proposed the semantics of propositional modal logic as unifying frame-
work for various uncertainty theories, such as fuzzy set, possibility and evidential
theory, and established an interpretation of belief measures on infinite sets. This
work is closely related to that of Ruspini [1100, 1099], which is based on a form
of epistemic logic. Harmanec et al., however, use a more general system of modal
logic and also address the completeness of the interpretation. Ruspini’s approach,
instead, is a generalization of the method proposed by Carnap [?] for the develop-
ment of logical foundations of probability theory.
Modal logic is a type of formal logic primarily developed in the 1960s that
extends classical propositional logic to include operators expressing modality3 .
Modalities are formalised via modal operators. In particular, modalities of truth include possibility ('It is possible that p', ◇p) and necessity ('It is necessary that p', □p). These notions are often expressed using the idea of possible worlds: necessary propositions are those which are true in all possible worlds, whereas possible propositions are those which are true in at least one possible world.
Formally, the language of modal logic consists of a set of atomic propositions, the logical connectives ¬, ∨, ∧, →, ↔, and the modal operators ◇ (possibility) and □ (necessity). Sentences or propositions of the language are of the following form:
1. atomic propositions;
2. if p and q are propositions, so are ¬p, p ∧ q, p ∨ q, p → q, p ↔ q, ◇p, and □p.
A standard model of modal logic is a triplet M = hW, R, V i, where W denotes
a set of possible worlds, R is a binary relation on W called accessibility relation
(e.g. world v is accessible from world w when wRv), and V is the value assignment
function V (w, p) ∈ {T, F }, whose output is the truth value of proposition p in world
w. The accessibility relation expresses the fact that some things may be possible
in one world and impossible from the standpoint of another. Different restrictions
on the accessibility relation yield different classes of standard models. A standard
model M is called a T-model if R is reflexive.
The notation $\|p\|^M$ denotes the truth set of a proposition p (what we called above a 'hyper-interpretation'), i.e. the set of all worlds in which p is true:
$$ \|p\|^M = \big\{ w \in W : V(w, p) = T \big\}. \qquad (5.13) $$
Proposition 33. [595] A finite T-model M = ⟨W, R, V⟩ that satisfies SVA induces a basic probability assignment $m^M$ on $2^\Theta$, defined by:
$$ m^M(A) = \frac{\big|\, \|E_A\|^M \big|}{|W|}, \qquad \text{where} \quad E_A = e_A \wedge \bigwedge_{B \subset A} (\neg e_B). $$
Proposition 34. [595] The modal logic interpretation of basic probability assign-
ments introduced in Proposition 33 is complete, i.e. for every rational-valued basic
probability assignment m on 2Θ , there exists a finite T-model M satisfying SVA such
that mM = m.
Within this approach, Besnard and Kohlas [89] model reasoning by consequence
relations in the sense of Tarski, showing that it is possible to construct evidence the-
ory on top of the very general logics defined by these consequence relations. Support
functions can be derived which are, as usual, set functions, monotone of infinite or-
der. Furthermore, plausibility functions can also be defined. However, as negation
need not be defined in these general logics, the usual duality relations between sup-
port and plausibility functions of Dempster-Shafer theory do not hold in general.
A great deal of other logic-based frameworks have been proposed [1097, 49, 1100,
1555, 1101, 80, 1054, 328] [548, 1011, 624, 590, 586, 581] [87, 1534, 32, 639, 591]
[569, 979, 970, 1557, 1110, 8] [93, 875, 78, 614, 180, 1410] [187, 190, 189].
The relationship between belief functions and rough set algebras was studied by Yao
and Lingras [1512]. Indeed, some very highly cited papers focus on this topic [].
In a Pawlak rough set algebra, the qualities of the lower and upper approximations of a subset A ⊆ Θ are defined as:
$$ \underline{q}(A) \doteq \frac{|\underline{apr}(A)|}{|\Theta|}, \qquad \overline{q}(A) \doteq \frac{|\overline{apr}(A)|}{|\Theta|}. $$
Clearly, the qualities of the lower/upper approximations measure the fraction of definable elements, over all those definable on Θ, involved in the approximation. This recalls Harmanec's modal logic interpretation of belief functions (5.14). Indeed, it can be proven that Pawlak's rough set algebra corresponds to the modal logic S5 [1509], in which the lower and upper approximation operators correspond to the necessity and possibility operators. Furthermore:
Proposition 35. The quality of the lower approximation $\underline{q}$ is a belief function, with basic probability assignment $m(E) = \frac{|E|}{|\Theta|}$ for all E ∈ Θ/R, and 0 otherwise.
One issue with this interpretation is that belief and plausibility values obviously need to be rational numbers. Therefore, given an arbitrary belief function Bel, it may not be possible to build a rough set algebra such that $\underline{q}(A) = Bel(A)$.
The following result establishes a sufficient condition under which this is possible.
Proposition 36. Suppose Bel is a belief function on Θ with mass m such that:
1. the set of focal elements of Bel is a partition of Θ;
2. m(A) = |A|/|Θ| for every focal element A of Bel.
Then there exists a rough set algebra such that q(A) = Bel(A).
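The correspondence stated in Propositions 35 and 36 is easy to probe numerically. The following minimal Python sketch (the frame, the partition and the tolerance are illustrative choices, not taken from the text) builds a Pawlak approximation space and checks that the quality of the lower approximation coincides with the belief function whose masses are |E|/|Θ| on the equivalence classes.

from itertools import chain, combinations

# Check q(A) = Bel(A) when the focal elements are the classes of Theta/R
# and m(E) = |E|/|Theta| on each class (illustrative frame and partition).
Theta = frozenset(range(6))
classes = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]   # Theta/R

def powerset(s):
    s = list(s)
    return (frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

def q_lower(A):
    # quality of the lower approximation: definable elements contained in A
    apr = frozenset().union(*(E for E in classes if E <= A))
    return len(apr) / len(Theta)

def bel(A):
    # belief function with mass |E|/|Theta| on each equivalence class
    return sum(len(E) / len(Theta) for E in classes if E <= A)

assert all(abs(q_lower(A) - bel(A)) < 1e-12 for A in powerset(Theta))
print("q(A) = Bel(A) for all", 2 ** len(Theta), "events A")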
A more general condition can be established via inner and outer measures
(Chapter 4, Section 3.1.3).
Given a σ-algebra F of subsets of Θ, one can construct a rough set algebra such
that F = σ(Θ/R). Suppose P is a probability on F - then it can be extended to 2Θ
using inner and outer measures as follows:
P_*(A) = sup{ P(X) : X ∈ σ(Θ/R), X ⊆ A } = P(apr(A)),
P^*(A) = inf{ P(X) : X ∈ σ(Θ/R), X ⊇ A } = P(apr̄(A)).
Pawlak calls these the 'rough probabilities' of A - in fact, they are a pair of belief and plausibility functions!
Note that the set of focal elements Θ/R is a partition of the universe (frame) Θ.
Therefore, a Pawlak rough set algebra can only interpret belief functions whose focal elements form a partition of the frame of discernment. Nevertheless, further generalisations via serial rough algebras and interval algebras can be achieved [1512].
P-boxes and random sets/belief functions are very closely related. Indeed, every pair of belief/plausibility functions Bel, Pl defined on the real line ℝ (a random set) generates a unique p-box, whose CDFs are all those consistent with the evidence generating the belief function: namely, all CDFs lying between the lower bound F̲(x) = Bel((−∞, x]) and the upper bound F̄(x) = Pl((−∞, x]), where
F̄^{-1}(α) ≐ inf{ x : F̄(x) ≥ α },    F̲^{-1}(α) ≐ inf{ x : F̲(x) ≥ α }
are the 'quasi-inverses' of the upper and lower CDFs F̄ and F̲, respectively.
In an infinite random set, belief and plausibility values are computed via the following integrals:

Bel(A) = ∫_{ω∈Ω} I[Γ(ω) ⊆ A] dP(ω),    Pl(A) = ∫_{ω∈Ω} I[Γ(ω) ∩ A ≠ ∅] dP(ω),   (5.17)
where Γ : Ω → 2Θ is the multi-valued mapping generating the random set (see
Chapter 3, Section 3.1.5). This is not trivial at all - however, we can use the p-box
representation of infinite random sets (5.15), with set of focal elements (5.16), to
compute approximations of similar integrals [25]. The idea is to index each of its
focal elements by a number α ∈ [0, 1].
Consider then the unique p-box (5.15) associated with the random set Bel. If
there exists a cumulative distribution function Fα for α over [0, 1] we can draw
values of α at random from it, obtaining sample focal elements of the underlying
random set (Figure 5.3). We can then compute the belief and plausibility integrals
(5.17) by adding the mass of the sample intervals.
Fig. 5.3. A p-box amounts to a multi-valued mapping associating values α ∈ [0, 1] with
closed intervals γ of R, i.e., focal elements of the underlying random set [25].
The joint focal element can be represented either by the hypercube γ = ×di=1 γi ⊆
X (Figure 5.4-left) or by the point α = [α1 , ..., αd ] ∈ (0, 1]d (Figure 5.4-right).
Fig. 5.4. X representation (left) and α representation (right) of the focal elements sampled
from a p-box [25].
If all input random sets are independent, these integrals decompose into a series of
d nested integrals (see [25], Equation (36)).
Alvarez [25] has proposed the following Monte-Carlo approach to their calculation. For j = 1, ..., n:
1. randomly extract a sample α^j from the copula C;
2. form the corresponding focal element A^j = ×_{i=1,...,d} γ_i^j;
3. assign to it mass m(A^j) = 1/n.
It can be proven that such an approximation converges as n → +∞ almost surely
to the actual random set.
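A minimal sketch of the sampling scheme just described, for a single real-valued random set (d = 1) under the independence copula, is given below. The two CDFs bounding the p-box, the bisection-based quasi-inverse and the test interval are illustrative assumptions of this example, not part of Alvarez's formulation.

import random
from statistics import NormalDist

# Sample alpha uniformly, map it to the focal interval
# [F_upper^{-1}(alpha), F_lower^{-1}(alpha)], and average indicators to
# approximate the belief/plausibility integrals (5.17).
F_lower = NormalDist(mu=1.0, sigma=1.0).cdf    # lower CDF (pointwise smaller)
F_upper = NormalDist(mu=-1.0, sigma=1.0).cdf   # upper CDF

def quasi_inverse(F, alpha, lo=-20.0, hi=20.0, iters=60):
    # inf{x : F(x) >= alpha}, found by bisection
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if F(mid) >= alpha else (mid, hi)
    return hi

def estimate(A_lo, A_hi, n=20000, seed=0):
    rng = random.Random(seed)
    bel = pl = 0
    for _ in range(n):
        alpha = rng.random()                      # draw from the uniform copula
        g_lo = quasi_inverse(F_upper, alpha)      # focal interval gamma(alpha)
        g_hi = quasi_inverse(F_lower, alpha)
        bel += (A_lo <= g_lo and g_hi <= A_hi)    # Gamma(alpha) contained in A
        pl += (g_lo <= A_hi and A_lo <= g_hi)     # Gamma(alpha) intersects A
    return bel / n, pl / n

print(estimate(-1.0, 1.0))    # Bel([-1,1]) <= Pl([-1,1])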
While generalising p-boxes, these objects are a special case of random sets (∞-
monotone capacities) and thus a special case of probability intervals (Figure 5.5).
Fig. 5.5. Generalised p-boxes in the (partial) hierarchy of uncertainty measures [381].
Definition 69. An epistemic state is said to be consistent if the following five axioms
are satisfied:
1. for any proposition A, exactly one of the following conditions holds: (i) A is
believed; (ii) A is disbelieved; (iii) A is neither believed nor disbelieved;
2. ΘX is (always) believed;
3. A is believed if and only if Ac is disbelieved;
4. if A is believed and B ⊇ A, then B is believed;
5.9.3 α-conditionalisation
Spohn proposed the following rule for modifying a disbelief function in light of new
information.
for all θ ∈ Θg .
As stated by Shenoy [1184], Spohn’s theory of epistemic beliefs shares the essential
abstract features of probability theory and of Dempster-Shafer theory, in particular:
(1) a functional representation of knowledge (or beliefs), (2) a rule of marginaliza-
tion, and (3) a rule of combination.
Shenoy [1184] goes on to show that disbelief functions can also be propagated via
local computations as shown in Chapter 4, Section 4.4.4 for belief functions.
Fig. 5.6. Quantisation (left) versus granularisation (right) of variable ‘Age’ (from [1538]).
embraces all possible mixtures and therefore accommodates most theories of uncertainty.
Secondly, bivalence is abandoned throughout GTU, and the foundation of GTU
is shifted from bivalent logic to fuzzy logic. As a consequence, in GTU everything is
or is allowed to be a matter of degree or, equivalently, fuzzy. All variables are, or are
allowed to be ‘granular’, with a granule being ‘a clump of values ... which are drawn
together by indistinguishability, equivalence, similarity, proximity or functionality’
(see Figure 5.6).
Thirdly, one of the principal objectives of GTU is the capability to operate on
information described in natural language. As a result, a generalized constraint lan-
guage (GCL) is defined as the set of all generalized constraints together with the
rules governing syntax, semantics and generation. Examples of elements of GCL
are: (X is small) is likely; ((X, Y ) isp A) ∧ (X is B), where ‘isp’ denotes a proba-
bilistic constraint, ‘is’ denotes a possibilistic constraint, and ∧ denotes conjunction.
Finally, in GTU computation/deduction is treated as an instance of question-
answering. Given a system of propositions described in a natural language p, and a
query q, likewise expressed in a natural language, GTU performs generalized con-
straint propagation governed by deduction rules that, in Zadeh’s words, ‘drawn from
the Computation/Deduction module. The Computation/Deduction module com-
prises a collection of agent-controlled modules and submodules, each of which con-
tains protoformal deduction rules drawn from various fields and various modalities
of generalized constraints’.
In the author’s view, generality is achieved by GTU in a rather nomenclative
way, which explains the complexity and lack of naturalness of the formalism.
for all Cartesian products of events from individual uncertain spaces (Θk , Fk , Mk ).
However, the product axiom was introduced by Liu only in 2009 [2] (much after his introduction of uncertainty theory in 2002 [3]). Also, the extension of uncertain measures to arbitrary subsets of a product algebra is rather cumbersome and unjustified (see Equation (1.10) in [860], or Figure 5.7, extracted from Figure 1.1 in [860]). More generally, the justification the author provides for his choice of axioms is somewhat lacking.
Based on such measures a straightforward generalisation of random variables
can then be defined (‘uncertain variables’), as measurable (in the usual sense) func-
tions from an uncertainty space (Θ, F, M) to the set of real numbers.
A set ξ_1, ..., ξ_n of uncertain variables is said to be independent if:

M( ⋂_{i=1}^{n} {ξ_i ∈ B_i} ) = min_i M({ξ_i ∈ B_i}).
Fig. 5.7. Extension of rectangles to product algebras in Liu's uncertainty theory (from [860]). The uncertain measure of Λ (the disk) is the size of its inscribed rectangle Λ_1 × Λ_2, if the latter is greater than 0.5. Otherwise, if the size of the inscribed rectangle of Λ^c is greater than 0.5, then M(Λ^c) is set to that size and M(Λ) = 1 − M(Λ^c). If neither Λ nor Λ^c has an inscribed rectangle of size greater than 0.5, then we set M(Λ) = 0.5.
every uncertain variable and are subject to rather complex and seemingly arbitrary
axioms (see [2], Definition 4), e.g.:
sup_{x∈B} λ(x) + ∫_B ρ(x) dx ≥ 0.5   and/or   sup_{x∈B^c} λ(x) + ∫_{B^c} ρ(x) dx ≥ 0.5,   (5.18)

where λ and ρ are nonnegative functions on the real line, and B is any Borel set of real numbers. Equation (5.18) echoes the extension method of Figure 5.7. While
an uncertain entropy and an uncertain calculus are built by Liu on this basis, the
general lack of rigour and convincing justification for a number of elements of the
theory leaves the author of this book quite unimpressed with this work. No mention
of belief functions or other well-established alternative representations of subjective
probabilities is made in [860].
5.12.3 Others
Granular computing Y.Y. Yao [1511] surveyed granular computing (GrC), in-
tended as a set of theories and techniques which make use of granules, i.e., groups
or clusters of concepts. In [1511] the author discussed basic issues of GrC, focussing
in particular on the construction of and computation with granules. A set-theoretic
model of granular computing was proposed, based on the notion of power algebras.
of intelligent behavior and are precursors to more complex reasoning. These con-
siderations lead to an evidential framework for representing conceptual knowledge,
wherein the principle of maximum entropy is applied to deal with uncertainty and
incompleteness. It is demonstrated that the proposed framework offers a uniform
treatment of inheritance and categorization, and can be encoded as an interpreter-
free, connectionist network.
In [1181] the author proposes an evidence combination rule which is incremental, commutative and associative, and hence shares most of the attractive features of Dempster's rule, while being 'demonstrably better' (in the author's words) than Dempster's rule in the context considered there.
Groen’s extension of Bayesian theory Groen and Mosleh (2004) [545] have pro-
posed an extension of Bayesian theory based on a view of inference according to
which observations are used to rule out possible valuations of the variables. The
extension is different from probabilistic approaches such as Jeffrey’s rule (see Sec-
tion 4.3.3), in which certainty in a single proposition A is replaced by a probability
on a disjoint partition of the universe, and Cheeseman’s rule of distributed mean-
ing [179], while non-probabilistic analogues are found in evidence and possibility
theory.
Inferential models In [919], Martin and Liu presented a new framework for proba-
bilistic statistical inference without priors, alternative to Fisher’s fiducial inference,
belief function theory and Bayesian inference with default priors, based on infer-
ential models (IMs). The framework provides data-driven probabilistic measures of
uncertainty about an unknown parameter, and does so with an automatic long-run
frequency calibration. The approach identifies an unobservable auxiliary variable,
associated with observable data and unknown parameter, and predicts it using a ran-
dom set before conditioning on data.
Padovitz’s unifying model In 2006 Padovitz et al. [1005] proposed a novel ap-
proach for representing and reasoning about context in the presence of uncer-
tainty, based on multi-attribute utility theory as the means to integrate heuristics
about the relative importance, inaccuracy and characteristics of sensory informa-
tion. The authors qualitatively and quantitatively compare their reasoning approach
with Dempster-Shafer’s sensor data fusion.
Preference relations In [1428] Wong, Lingras and Yao argue that preference re-
lations can provide a more realistic model of random phenomena than quantitative
probability or belief functions. In order to use preference relations for reasoning un-
der uncertainty, it is necessary to perform sequential and parallel combinations of
propagated information in a qualitative inference network, which are discussed in
[1428].
From a more general point of view, the notion of representing uncertainty mea-
sures such as belief functions [1380] and probability distributions as points of a
certain space [110, 111, 279, 265, 960] can be appealing, as it provides a picture in
which different forms of uncertainty descriptions are unified in a single geometric
framework, in which distances can be measured, approximations sought, and decompositions easily calculated.
It is worth mentioning the work of P. Black, who devoted his doctoral thesis to the
study of the geometry of belief functions and other monotone capacities [110]. An
abstract of his results can be found in [111], where he uses shapes of geometric
loci to give a direct visualization of the distinct classes of monotone capacities. In
particular a number of results about lengths of edges of convex sets representing
monotone capacities are given, together with their ‘size’ meant as the sum of those
lengths.
Black’s work amounts therefore to a geometric analysis of belief functions as
special types of credal sets in the probability simplex.
By contrast, in this Chapter we introduce a geometric approach to the theory of
evidence, in which belief measures and the corresponding basic probability assign-
ments are represented by points in a (convex) belief space, immersed in a Cartesian
space.
Chapter Outline
A central role is played by the notion of belief space B, introduced in Section 6.1, as
the space of all the belief functions one can define on a given frame of discernment.
In Section 6.2 we characterize the relation between the focal elements of a belief
function and the convex closure operator in the belief space. In particular, we show
that every belief function can be uniquely decomposed as a convex combination
of ‘basis’ or ‘categorical’ belief functions, giving B the form of a simplex, i.e., the
convex closure of a set of affinely independent points.
In Section ??, instead, the Moebius inversion lemma (2.3) is exploited to investigate
the symmetries of the belief space. With the aid of some combinatorial results, a
recursive bundle structure of B is proved and an interpretation of its components
(bases and fibers) in term of important classes of belief functions is provided.
For instance, if the frame of discernment has cardinality three, Θ = {x, y, z}, each
such vector has the form:
v = [ v_{{x}}, v_{{y}}, v_{{z}}, v_{{x,y}}, v_{{x,z}}, v_{{y,z}}, v_Θ ]′.

Σ_{A⊆Θ} b(A) = Σ_{i=1}^{f} a_i · m(A_i),

where a_i = 2^{|Θ\A_i|} is the number of subsets of Θ containing the focal element A_i, so that

Σ_{A⊆Θ} b(A) = Σ_{i=1}^{f} m(A_i) 2^{|Θ\A_i|} ≤ 2^{|Θ|−1} Σ_{i=1}^{f} m(A_i) = 2^{|Θ|−1} · 1 = 2^{|Θ|−1},
where the equality holds iff |Ai | = 1 for every focal element of b, i.e., b is Bayesian.
It is important to point out that P does not, in general, fill the limit simplex L. Similarly, the belief space does not necessarily coincide with the entire region bounded by L.
The L1 distance (7.20) between a belief function and any Bayesian belief func-
tion p dominating it is not a function of p, and depends only on b. A probability
distribution satisfying the hypothesis of Theorem 8 is said to be consistent with b
[801]. Ha et al. [567] proved that the set P[b] of probability measures consistent
with a given belief function b can be expressed (in the probability simplex P) as the
sum of the probability simplexes associated with its focal elements Ai , i = 1, ..., k,
weighted by the corresponding masses:
P[b] = Σ_{i=1}^{k} m(A_i) conv(A_i).
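Ha et al.'s representation can be probed numerically: picking an arbitrary point in each simplex conv(A_i) and mixing those points with weights m(A_i) must always produce a probability dominating b. The frame and masses in the sketch below are illustrative choices, not taken from the text.

import random

# Mix a random vertex-weighted point of each conv(A_i) with weights m(A_i)
# and check that the resulting probability dominates b on every event.
Theta = ('x', 'y', 'z')
m = {frozenset({'x'}): 0.3, frozenset({'x', 'y'}): 0.5, frozenset(Theta): 0.2}

def bel(A, m):
    return sum(v for B, v in m.items() if B <= A)

def random_consistent_probability(m, rng):
    p = {th: 0.0 for th in Theta}
    for A, mass in m.items():
        w = [rng.random() for _ in A]                   # random vertex weights
        for th, wi in zip(sorted(A), w):
            p[th] += mass * wi / sum(w)                 # a point of m(A) * conv(A)
    return p

rng = random.Random(42)
events = [frozenset(s) for s in [{'x'}, {'y'}, {'z'}, {'x', 'y'}, {'x', 'z'}, {'y', 'z'}, set(Theta)]]
for _ in range(1000):
    p = random_consistent_probability(m, rng)
    assert all(sum(p[th] for th in A) >= bel(A, m) - 1e-12 for A in events)
print("all sampled probabilities dominate b, as predicted")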
These preliminary results suggest that the belief space may have the form of a sim-
plex. To proceed in our analysis we need to resort to the axioms of basic probability
assignments (Definition 2).
Given a belief function b, the corresponding basic probability assignment can be
found by applying the Moebius inversion lemma (2.3), which we recall here:
m(A) = Σ_{B⊆A} (−1)^{|A\B|} b(B).   (6.4)
We can exploit it to determine whether a point b ∈ ℝ^{|2^Θ|} indeed corresponds to a belief function, by simply computing the related b.p.a. and checking whether the resulting m meets the axioms b.p.a.s must obey.
The normalization constraint Σ_{A⊆Θ} m(A) = 1 trivially translates into B ⊆ {b : b(Θ) = 1}. The positivity condition is more interesting, for it implies an inequality which echoes the third axiom of belief functions (cf. Definition 25 or [1149], page 5):

b(A) − Σ_{B⊂A, |B|=|A|−1} b(B) + ··· + (−1)^{|A\B|} Σ_{B⊂A, |B|=k} b(B) + ··· + (−1)^{|A|−1} Σ_{θ∈A} b({θ}) ≥ 0     ∀A ⊆ Θ.   (6.5)
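The membership test just outlined is straightforward to implement. The sketch below (frame and candidate vector are illustrative) recovers the Moebius inverse (6.4) of a candidate set function and checks normalization and non-negativity.

from itertools import chain, combinations

# Recover the b.p.a. via (6.4) and test the axioms of basic probability assignments.
def powerset(frame):
    s = list(frame)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def moebius_inverse(b, frame):
    return {A: sum((-1) ** len(A - B) * b.get(B, 0.0) for B in powerset(A)) for A in powerset(frame)}

def is_belief_function(b, frame, tol=1e-12):
    m = moebius_inverse(b, frame)
    return (abs(b.get(frozenset(frame), 0.0) - 1.0) < tol      # normalization b(Theta) = 1
            and all(v >= -tol for v in m.values()))            # positivity of the b.p.a.

frame = {'x', 'y', 'z'}
b = {frozenset({'x'}): 0.2, frozenset({'y'}): 0.0, frozenset({'z'}): 0.0,
     frozenset({'x', 'y'}): 0.5, frozenset({'x', 'z'}): 0.2, frozenset({'y', 'z'}): 0.3,
     frozenset(frame): 1.0}
print(is_belief_function(b, frame))   # True: m(x)=0.2, m(xy)=0.3, m(yz)=0.3, m(Theta)=0.2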
Example: ternary frame Let us see how these constraints act on the belief space
in the case of a ternary frame Θ = {θ1 , θ2 , θ3 }. After introducing the notation
Note that b(Θ) is not needed as a coordinate, for it can be recovered by normal-
ization. By combining the last equation in (6.6) with the others, it follows that the
belief space B is the set of points [x, y, z, u, v, w]0 of R6 such that:
0 ≤ x + y + z ≤ 1, 0 ≤ u + v + w ≤ 2.
After defining k ≐ x + y + z, it necessarily follows that points of B ought to meet:
Now, all the positivity constraints of Equation (6.5) (which determine the shape of
the belief space B) are of the form:
Σ_{i∈G_1} x_i ≥ Σ_{j∈G_2} x_j,
where G1 and G2 are two disjoint sets of coordinates, as the above example and
Equation (6.6) confirm. It immediately follows that:
Proof. Let us consider two points of the belief space b0 , b1 ∈ B (two belief func-
tions) and prove that all the points bα of the segment b0 + α(b1 − b0 ), 0 ≤ α ≤ 1,
belong to B. Since b0 , b1 belong to B:
Σ_{i∈G_1} x_i^0 ≥ Σ_{j∈G_2} x_j^0,     Σ_{i∈G_1} x_i^1 ≥ Σ_{j∈G_2} x_j^1,
where x_i^0, x_i^1 are the i-th coordinates in ℝ^{2^{|Θ|}} of b_0, b_1, respectively. Hence, for every point b_α with coordinates x_i^α we have that:
Σ_{i∈G_1} x_i^α = Σ_{i∈G_1} [ x_i^0 + α(x_i^1 − x_i^0) ] = (1 − α) Σ_{i∈G_1} x_i^0 + α Σ_{i∈G_1} x_i^1
≥ (1 − α) Σ_{j∈G_2} x_j^0 + α Σ_{j∈G_2} x_j^1 = Σ_{j∈G_2} [ x_j^0 + α(x_j^1 − x_j^0) ] = Σ_{j∈G_2} x_j^α,

so that b_α belongs to B as well.
In the ternary example 6.1.3, the system of equations (6.6) exhibits a natural symme-
try which reflects the intuitive partition of the variables in two sets, each associated
with subsets of Θ of the same cardinality, respectively {x, y, z} ∼ |A| = 1 and
{u, v, w} ∼ |A| = 2.
It is easy to see that the symmetry group of B (i.e., the group of transformations
which leave the belief space unchanged) is the permutation group S3 , acting onto
{x, y, z} × {u, v, w} via the correspondence:
x ↔ w, y ↔ v, z ↔ u.
This observation can be extended to the general case of a finite n-dimensional frame
Θ = {θ1 , · · · , θn }. Let us adopt here for sake of simplicity the following notation:
x_i x_j ··· x_k ≐ b({θ_i, θ_j, ..., θ_k}).
The symmetry of the belief space in the general case is described by the following
logic expression:
⋁_{1≤i,j≤n} ⋀_{k=1}^{n−1} ⋀_{{i_1,...,i_{k−1}} ⊂ {1,...,n}\{i,j}} ( x_i x_{i_1} ··· x_{i_{k−1}} ↔ x_j x_{i_1} ··· x_{i_{k−1}} ),

where ⋁ (⋀) denotes the logical 'or' ('and'), while ↔ indicates the permutation of pairs of coordinates.
To see this, let us rewrite the Moebius constraints using the above notation:

x_{i_1} ··· x_{i_k} ≥ Σ_{l=1}^{k−1} Σ_{{j_1,...,j_l} ⊂ {i_1,...,i_k}} (−1)^{k−l+1} x_{j_1} ··· x_{j_l}.
Focussing on the right-hand side of the inequality, it is clear that only a permutation between coordinates associated with subsets of the same cardinality may leave the inequality unaltered.
Given the triangular form of the system of inequalities (the first group concerning
variables of size 1, the second one variables of size 1 and 2, and so on), permuta-
tions of size-k variables are bound to be induced by permutations of variables of
smaller size. Hence, the symmetries of B are determined by permutations of single-
tons. Each such swap xi ↔ xj determines in turn a number of permutations of the
coordinates related to subsets containing θi and θj .
The resulting symmetry V_k induced by x_i ↔ x_j for the k-th group of constraints is then: for all {i_1, ..., i_{k−1}} ⊂ {1, ..., n} \ {i, j},
(x_i ↔ x_j) ∧ ··· ∧ (x_i x_{i_1} ··· x_{i_{k−1}} ↔ x_j x_{i_1} ··· x_{i_{k−1}}).
Since V_k is obviously implied by V_{k+1}, and V_n is always trivial (as a simple check confirms), the overall symmetry induced by a permutation of singletons is determined by V_{n−1}; by considering all the possible permutations x_i ↔ x_j we obtain the expression above, as desired.
In other words, the symmetries of B are determined by the action of the per-
mutation group Sn on the collection of cardinality-1 variables, and the action of Sn
naturally induced on higher-size variables by set-theoretical membership:
s ∈ S_n : P_k(Θ) → P_k(Θ),
x_{i_1} ··· x_{i_k} ↦ x_{s(i_1)} ··· x_{s(i_k)},   (6.7)
where Pk (Θ) is the collection of the size-k subsets of Θ.
It is not difficult to recognize in (6.7) the symmetry properties of a simplex, i.e., the convex closure of a collection v_0, v_1, ..., v_k of k + 1 affinely independent² points (vertices) of ℝ^m.
² The points v_0, v_1, ..., v_k are said to be affinely independent iff v_1 − v_0, ..., v_k − v_0 are linearly independent.
³ Here and in the rest of the Chapter we will denote both a belief function and the vector of ℝ^{N−2} representing it by b. This should not lead to confusion.
But
b(A) = Σ_{B⊆A, B∈E_b} m(B)   for all ∅ ≠ A ⊊ Θ,
i.e.,
b = Σ_{B∈E_b} m(B) b_B.
Fig. 6.1. The belief space B2 for a binary frame is a triangle in R2 whose vertices are the
categorical belief functions bx , by , bΘ focused on {x}, {y} and Θ, respectively.
Obviously a Bayesian belief function (a finite probability) is a b.f. whose focal elements are all singletons, i.e., contained in the collection {{x_1}, ..., {x_n}}. The following is then an immediate consequence of Theorem 11.
Corollary 4. The region of the belief space corresponding to probability functions is the part of its border determined by all simple probabilities, i.e. the simplex
P = Cl(b_x, x ∈ Θ).
P is then an (n − 1)-dimensional face of B (whose dimension is instead N − 2 = 2^n − 2, as it has 2^n − 1 vertices).
Some one-dimensional faces of the belief space have also an intuitive meaning
in terms of belief. Consider the segments Cl(b_Θ, b_A) joining the vacuous belief function b_Θ (m_{b_Θ}(Θ) = 1, m_{b_Θ}(B) = 0 ∀B ≠ Θ) with the basis b.f. b_A (??). Points of Cl(b_Θ, b_A) can be written as convex combinations b = α b_A + (1 − α) b_Θ. Since convex combinations in B correspond to convex combinations of b.p.a.s, such a belief function b has b.p.a. m_b(A) = α, m_b(Θ) = 1 − α, i.e. b is a simple support function focused on A (Chapter 2). The union of these segments over all events A, S = ⋃_{A⊂Θ} Cl(b_Θ, b_A), is the region of simple support belief functions on Θ. In the binary case (Figure ??-right) simple support functions focused on {x} lie on the horizontal segment Cl(b_Θ, b_x), while simple support b.f.s focused on {y} form the vertical segment Cl(b_Θ, b_y).
After giving an informal presentation of the way the b.p.a. mechanism induces
a recursive decomposition of B we will analyze the simple case study of a ternary
frame (6.3.1) to get an intuition on how to prove our conjecture on the bundle struc-
ture of the belief space in the general case, and give the formal definition of smooth
fiber bundle (6.3.2). After noticing that points of RN −2 outside the belief space can
be also seen as (normalized) sum functions (Section 6.3.3), we will proceed to prove
the recursive bundle structure of the space of all sum functions (Section 6.4). As B
is immersed in this Cartesian space, it inherits a "pseudo" bundle structure (6.4.2) in which bases and fibers are no longer vector spaces but simplices in their own right
(Section 6.4.3), and possess meanings in terms of i-additive belief functions.
Let us then first consider the structure of the belief space for a frame of cardinality
n = 3: Θ = {x, y, z}, according to the principle of assigning mass recursively to
subsets of increasing size. In this case each BF b is represented by the vector:
Fig. 6.2. Bundle structure of the belief space in the case of ternary frames Θ3 = {x, y, z}.
and a number of fibers F(d), each passing through a point d of the base;
– points on the base are parameterized by the masses assigned to singletons, d = [m_b(A), |A| = 1]′, while points on the fibers have as coordinates the mass values assigned to higher-size events, [m_b(A), 1 < |A| < n]′;
φ_α : π^{−1}(U_α) → U_α × F,
e ↦ (φ′_α(e), φ″_α(e))   (6.16)

T^{αβ} = (T^{βα})^{−1},   T^{αβ} T^{βγ} T^{γα} = 1.   (6.18)
As the belief space does not exhaust the whole RN −2 it is natural to wonder whether
arbitrary points of RN −2 , possibly ‘outside’ B, have any meaningful interpretation
in this framework [265]. In fact, each vector v = [v_A, ∅ ⊊ A ⊆ Θ]′ ∈ ℝ^{N−1} can be thought of as a set function ς : 2^Θ \ ∅ → ℝ s.t. ς(A) = v_A. By applying the Möbius transformation (2.3) to such functions ς we obtain another set function m_ς : 2^Θ \ ∅ → ℝ such that ς(A) = Σ_{B⊆A} m_ς(B). In other words, each vector ς of ℝ^{N−1} can be thought of as a sum function. However, contrarily to basic probability assignments, the Möbius inverses m_ς of generic sum functions ς ∈ ℝ^{N−1} are not guaranteed to meet the non-negativity constraint m_ς(A) ≥ 0 ∀A ⊆ Θ.
Now, the section {v ∈ ℝ^{N−1} : v_Θ = 1} of ℝ^{N−1} corresponds to the constraint ς(Θ) = 1. Therefore, all the points of this section are sum functions meeting the normalization axiom Σ_{A⊆Θ} m_ς(A) = 1, or normalized sum functions (n.s.f.s). Nor-
normalization axiom A⊂Θ mς (A) = 1 or normalized sum functions (n.s.f.s). Nor-
malized sum functions are the natural extensions of belief functions in our geometric
framework.
We can now reinterpret our analysis of the ternary case by means of the formal def-
inition of smooth fiber bundle. The belief space B3 can be in fact equipped with
a base (6.12), and a projection (6.13) from the total space R6 to the base, which
generates fibers of the form (6.14). However, the original definition of fiber bundle
requires the involved spaces to be manifolds, while the ternary case suggests we
have here to deal with simplices.
We can notice though how the idea of recursively assigning mass to subsets of in-
creasing size does not necessarily require the mass itself to be positive. In other
words, this procedure can be in fact applied to normalized sum functions, yielding a
classical fiber bundle structure for the space S = RN −2 of all NSFs on Θ, in which
all the involved bases and fibers are linear spaces. We will see in the following what
happens when considering proper belief functions.
Theorem 12. The space S = RN −2 of all the sum functions ς with domain on a
finite frame Θ of cardinality |Θ| = n has a recursive fiber bundle structure, i.e.,
there exists a sequence of smooth fiber bundles
ξ_i = { F_S^{(i−1)}, D_S^{(i)}, F_S^{(i)}, π_i },   i = 1, ..., n − 1,

where

dim F_S^{(i−1)} = Σ_{k=i,...,n−1} \binom{n}{k} = |{ A ⊂ Θ : i ≤ |A| < n }|,

each point ς^{i−1} of F_S^{(i−1)} can be written as

ς^{i−1} = [ ς^{i−1}(A), A ⊂ Θ, i ≤ |A| < n ]′,

and the smooth direct product coordinates (6.16) at the i-th bundle level are

φ′(ς^{i−1}) = { ς^{i−1}(A), |A| = i },   φ″(ς^{i−1}) = { ς^{i−1}(A), i < |A| < n }.

The projection map π_i of the i-th bundle level is a full-rank differentiable application

π_i : F_S^{(i−1)} → D_S^{(i)},
ς^{i−1} ↦ π_i[ς^{i−1}].
Bases and fibers are simply geometric counterparts of the mass assignment mechanism. Having assigned a certain amount of mass to subsets of size smaller than i, the fraction of mass attributed to size-i subsets determines a point on a linear space, D_S^{(i)}. For each point of D_S^{(i)} the remaining mass can "float" among the higher-size subsets, describing again a vector space F_S^{(i)}.
As we have seen in the ternary example of Section 6.3.1, as the belief space is a
simplex immersed in S = RN −2 , the fibers of RN −2 do intersect the space of belief
functions too. B then inherits some sort of bundle structure from the Cartesian space
in which it is immersed. The belief space can also be recursively decomposed into
fibers associated with events A of the same size. As one can easily conjecture, the
intersections of the fibers of RN −2 with the simplex B are themselves simplices:
bases and fibers in the case of the belief space are therefore polytopes instead of
linear spaces. Due to the decomposition of RN −2 into basis and fibers, we can ap-
ply the non-negativity and normalization constraints which distinguish belief func-
tions from NSFs separately at each level, eliminating at each step the fibers passing
through points of the base that do not meet these conditions.
We first need a simple combinatorial result.
Lemma 4. Σ_{|A|=i} b(A) ≤ 1 + Σ_{m=1}^{i−1} (−1)^{i−(m+1)} \binom{n−(m+1)}{i−m} Σ_{|B|=m} b(B), and the upper bound is reached when Σ_{|A|=i} m_b(A) = 1 − Σ_{|A|<i} m_b(A).
The bottom line of Lemma 4 is that, given a mass assignment for events of size 1, ..., i − 1, the upper bound for Σ_{|A|=i} b(A) is obtained by assigning all the remaining mass to the collection of size-i subsets.
Theorem 13. The belief space B ⊂ S = RN −2 inherits by intersection with the
recursive bundle structure of S a “convex”-bundle decomposition. Each i-th level
“fiber” can be expressed as
F_B^{(i−1)}(d^1, ..., d^{i−1}) = { b ∈ B : V_i ∧ ··· ∧ V_{n−1}(d^1, ..., d^{i−1}) },   (6.20)

and depends on the mass assigned to lower-size subsets, d^m = [m_b(A), |A| = m]′, m = 1, ..., i − 1. The corresponding "base" D_B^{(i)}(d^1, ..., d^{i−1}) is expressed in terms of basic probability assignments as the collection of BFs b ∈ F_B^{(i−1)}(d^1, ..., d^{i−1}) such that

m_b(A) = 0   ∀A : i < |A| < n,
m_b(A) ≥ 0   ∀A : |A| = i,
Σ_{|A|=i} m_b(A) ≤ 1 − Σ_{|A|<i} m_b(A).   (6.22)
Simplicial and bundle structure coexist in the space of belief functions, both of them consequences of the interpretation of belief functions as sum functions, and of the basic probability assignment machinery. It is then natural to conjecture that bases and fibers of B are simplices as well. Let us denote by P^{(i)} and O^{(i)} the collections of belief functions on the fiber F_B^{(i−1)}(d^1, ..., d^{i−1}) assigning all the remaining basic probability 1 − k to subsets of size i or to Θ, respectively.
As the simplicial coordinates of a BF in B are given by its basic probability assignment (??), each belief function b ∈ F_B^{(i−1)}(d^1, ..., d^{i−1}) on such a fiber can be written as:

b = Σ_{A⊆Θ} m_b(A) b_A = Σ_{|A|<i} m_A b_A + Σ_{|A|≥i} m_b(A) b_A
  = (k/k) Σ_{|A|<i} m_A b_A + ((1 − k)/(1 − k)) Σ_{|A|≥i} m_b(A) b_A
  = k · ( 1 / Σ_{|A|<i} m_A ) Σ_{|A|<i} m_A b_A + (1 − k) · ( 1 / Σ_{|A|≥i} m_b(A) ) Σ_{|A|≥i} m_b(A) b_A.
We can therefore define two new belief functions b′ and b″ associated with any b ∈ F_B^{(i−1)}(d^1, ..., d^{i−1}), with basic probability assignments

m_{b′}(A) ≐ m_A / Σ_{|B|<i} m_B   for |A| < i,   m_{b′}(A) = 0   for |A| ≥ i;
m_{b″}(A) ≐ m_b(A) / Σ_{|B|≥i} m_b(B)   for |A| ≥ i,   m_{b″}(A) = 0   for |A| < i.

Both b′ and b″ are indeed admissible BFs, b′ assigning non-zero mass to subsets of size smaller than i only, b″ assigning mass to subsets of size i or higher.
However, b′ is the same for all the BFs on the fiber F_B^{(i−1)}(d^1, ..., d^{i−1}), as it is determined by the mass assignment (8.5). The other component b″ is instead free to vary in Cl(b_A : |A| ≥ i). Hence, we get the following convex expressions for F_B^{(i−1)}, P^{(i)} and O^{(i)} (neglecting for sake of simplicity the dependence on d^1, ..., d^{i−1} or, equivalently, on b′):

F_B^{(i−1)} = { b = k b′ + (1 − k) b″, b″ ∈ Cl(b_A, |A| ≥ i) } = k b′ + (1 − k) Cl(b_A, |A| ≥ i),
P^{(i)} = k b′ + (1 − k) Cl(b_A : |A| = i),
O^{(i)} = k b′ + (1 − k) b_Θ.
(6.24)

By definition, the i-th base D_B^{(i)} is the collection of BFs such that
Appendix: proofs
Proof of Theorem 12
Proof. the bottom line of the proof is that the mass associated with a sum function
can be recursively assigned to subsets of increasing size. We prove Theorem 12 by
induction.
First level of the bundle structure. As we mentioned above, each normalized
sum function ς ∈ RN −2 is uniquely associated with a mass function mς through the
inversion lemma. To define a base space of the first level, we set to zero the mass of
all events of size 1 < |A| < n. This determines a linear space DS ⊂ S = RN −2
defined by the system of linear equations
D_S ≐ { ς ∈ ℝ^{N−2} : m_ς(A) = Σ_{B⊆A} (−1)^{|A\B|} ς(B) = 0, 1 < |A| < n }.
The second step is to specify a projection map between the total space S = ℝ^{N−2} and the base D_S. The Moebius inversion lemma (2.3) indeed induces a projection map from S to D_S,

π : S = ℝ^{N−2} → D_S ⊂ ℝ^{N−2},
ς ↦ π[ς],

mapping each NSF ς ∈ ℝ^{N−2} to a point π[ς] of the base space D:
Finally, to define a bundle structure we need to describe the fibers of the total space
S = RN −2 , i.e., the vector subspaces of RN −2 which project onto a given point
d ∈ D of the base.
Each point d ∈ D is of course associated with the linear space of all the NSFs ς ∈ ℝ^{N−2} whose projection π[ς] on D is d:

F_S(d) ≐ { ς ∈ S : π[ς] = d ∈ D }.

It is easy to see that, as d varies on the base space D, the linear spaces we obtain are all diffeomorphic to F ≐ ℝ^{N−2−n}.
According to Definition 74 this defines a bundle structure, since:
– E ≐ S = ℝ^{N−2} is a smooth manifold, in particular a linear space;
– B ≐ D_S, the base space, is a smooth (linear) manifold;
– F = F_S, the fiber, is a smooth manifold, again a linear space.
Finally, the projection π : S = RN −2 → DS is differentiable (as it is a linear
function of the coordinates ς(A) of ς) and has full rank n in every point ς ∈ RN −2 .
This is easy to see when representing π as a matrix (since ς is a vector, a linear function of ς can always be thought of as a matrix):

π[ς] = Πς,

where

Π = [ I_n | 0_{n×(N−2−n)} ],

i.e., the n × (N − 2) matrix whose first n columns form the identity and whose remaining columns are zero,
according to Equation (6.25), and the rows of Π are obviously linearly independent.
As mentioned above the bundle structure (Definition 74, (6)) is trivial, since DS
is linear and can be covered by a single coordinate system (6.16). The direct product
coordinates are
φ : S = ℝ^{N−2} → D_S × F_S,
ς ↦ (π[ς], f[ς]),

where the coordinates of ς on the fiber F_S are the mass values it assigns to higher-size events:

f[ς] = [ m_ς(A), 1 < |A| < n ]′.
Bundle structure of level i.
By induction, let us suppose that S admits a recursive bundle structure for all sizes from 1 to i − 1, characterized according to the hypotheses, and prove that F_S^{(i−1)} can in turn be decomposed in the same way into a linear base space and a collection of diffeomorphic fibers. By inductive hypothesis F_S^{(i−1)} has dimension N − 2 − Σ_{k=1}^{i−1} \binom{n}{k} and each point ς^{i−1} ∈ F_S^{(i−1)} has coordinates³
We can then apply the constraint ς^{i−1}(A) = 0, i < |A| < n, which identifies the linear variety

D_S^{(i)} ≐ { ς^{i−1} ∈ F_S^{(i−1)} : ς^{i−1}(A) = 0, i < |A| < n }   (6.26)

embedded in F_S^{(i−1)}, of dimension \binom{n}{i} (the number of size-i subsets of Θ).
The projection map (6.19) induces in F_S^{(i−1)} fibers of the form
³ The quantity ς^{i−1}(A) is in fact the mass m_ς(A) the original NSF ς attaches to A, but this is irrelevant for the purpose of the decomposition.
F_S^{(i)} ≐ { ς^{i−1} ∈ F_S^{(i−1)} : π_i[ς^{i−1}] = const },

which are also linear manifolds, and induce in turn a trivial bundle structure in F_S^{(i−1)}:

φ : F_S^{(i−1)} → D_S^{(i)} × F_S^{(i)},
ς^{i−1} ↦ (φ′(ς^{i−1}), φ″(ς^{i−1})),

with φ′(ς^{i−1}) = π_i[ς^{i−1}] = [ ς^{i−1}(A), |A| = i ]′.
Again, the map (6.19) is differentiable and has full rank, for its \binom{n}{i} rows are independent.
The decomposition ends when dim F_S^{(n)} = 0, and all fibers reduce to points of S.
Proof of Lemma 4
Σ_{|A|=i} b(A) = Σ_{|A|=i} Σ_{B⊆A} m_b(B) = Σ_{m=1}^{i} \binom{n−m}{i−m} Σ_{|B|=m} m_b(B)
= Σ_{|B|=i} m_b(B) + Σ_{m=1}^{i−1} \binom{n−m}{i−m} Σ_{|B|=m} m_b(B)   (6.27)
≤ 1 − Σ_{|B|<i} m_b(B) + Σ_{m=1}^{i−1} \binom{n−m}{i−m} Σ_{|B|=m} m_b(B),

since Σ_{|B|=i} m_b(B) ≤ 1 − Σ_{|B|<i} m_b(B) by normalization. By Möbius inversion (2.3):
Σ_{|A|<i} m_b(A) = Σ_{|A|<i} Σ_{B⊆A} (−1)^{|A\B|} b(B) = Σ_{m=1}^{i−1} Σ_{l=1}^{m} (−1)^{m−l} \binom{n−l}{m−l} Σ_{|B|=l} b(B),   (6.28)

for, again, \binom{n−l}{m−l} is the number of subsets of size m containing a fixed set B, |B| = l, in a frame with n elements. The roles of the indices m and l can be exchanged, obtaining:

Σ_{|B|<i} m_b(B) = Σ_{l=1}^{i−1} Σ_{|B|=l} b(B) · Σ_{m=l}^{i−1} (−1)^{m−l} \binom{n−l}{m−l}.   (6.29)
Now, a well-known combinatorial identity ([?], volume 3, Equation (1.9)) states that, for i − (l + 1) ≥ 1:

Σ_{m=l}^{i−1} (−1)^{m−l} \binom{n−l}{m−l} = (−1)^{i−(l+1)} \binom{n−(l+1)}{i−(l+1)},   (6.30)

as it is easy to verify that \binom{n−l}{m−l} \binom{n−m}{i−m} = \binom{i−l}{m−l} \binom{n−l}{i−l}.
By applying (6.30) again to the last equality, we get:

Σ_{m=1}^{i−1} \binom{n−m}{i−m} Σ_{|B|=m} m_b(B) = Σ_{l=1}^{i−1} (−1)^{i−(l+1)} \binom{n−l}{i−l} Σ_{|B|=l} b(B).   (6.32)
Proof of Theorem 13
acts only on the base D_S^{(1)}, yielding a new set

D_B^{(1)} = { b ∈ B : m_b(A) = 0 for 1 < |A| < n,  m_b(A) ≥ 0 for |A| = 1,  Σ_{|A|=1} m_b(A) ≤ 1 }.
The second part of the Chapter, instead, is dedicated to the analysis of the “point-
wise” behavior of Dempster’s rule. We first discuss (Section 7.6) a toy problem, the
geometry of ⊕ in the binary belief space B2 , to gather useful intuition about the
general case. We observe that Dempster’s rule exhibits a rather elegant behavior
when applied to collections of belief functions assigning the same mass k to a fixed
subset A (constant mass loci), which turn out to be affine subspaces of normalized
sum functions. As a consequence, their images under the mapping b ⊕ (.) can be
derived by applying the commutativity results of Section 7.4.
Perhaps the most striking result of our geometric analysis of Dempster’s rule states
that for each subset A the resulting mapped affine spaces have a common inter-
section for all k ∈ [0, 1], a geometric entity which is therefore characteristic of the
belief function b being combined. We call the latter the A-th focus of the conditional
subspace hbi. In Section 7.7 we formally prove the existence and study the geometry
of such foci. This eventually leads us to an interesting algorithm for the geometric
construction of the orthogonal sum of two belief functions.
The material presented in this Chapter is a re-elaboration of results first published in [265]. All proofs have been collected in an Appendix at the end of the Chapter.
where m_{ς_1} and m_{ς_2} denote the Moebius transforms of the two n.s.f.s ς_1, ς_2, respec-
tively.
Note that in the case of normalised sum functions the normalization factor
∆(ς1 , ς2 ) can be zero even in the presence of non-empty intersections between focal
elements of ς1 , ς2 . This becomes clear as soon as we rewrite it in the form:
Δ(ς_1, ς_2) = Σ_{C≠∅} Σ_{A,B⊆Θ: A∩B=C} m_{ς_1}(A) m_{ς_2}(B),
since there can exist non-zero products mς1 (A) · mς2 (B) whose overall sum is zero
(being mς1 (A), mς2 (B) arbitrary real numbers).
Example. A simple example can be useful to grasp this point more easily. Con-
sider a sum function ς1 with focal elements A1 , A2 , A3 and masses m1 (A1 ) =
1, m1 (A2 ) = −1, m1 (A3 ) = 1 such that A2 ⊆ A1 , as in Figure 7.1. If we com-
bine ς1 with a new n.s.f. ς2 with a single focal element B: m2 (B) = 1 (which,
incidentally, is a belief function), we can see that even if A_1 ∩ B ≐ D ≠ ∅ and A_2 ∩ B = D ≠ ∅, the denominator of Equation (2.6) becomes 1 · (−1) + 1 · 1 = 0 and the two functions turn out not to be combinable.
Fig. 7.1. Example of a pair of non combinable normalised sum functions whose focal ele-
ments have nevertheless non-empty intersections.
We can then proceed to show how Dempster’s rule applies to affine combinations
of pseudo belief functions, and to convex closures of (proper) belief functions in
particular. We first consider the issue of combinability.
Proof. By definition (Equation (7.1)) two n.s.f.s ς and τ are combinable iff
Σ_{A∩B≠∅} m_ς(A) m_τ(B) ≠ 0.

If τ = Σ_i α_i ς_i is an affine combination, its Moebius transform is m_τ = Σ_i α_i m_{ς_i}, and the combinability condition becomes, as desired:

Σ_{A∩B≠∅} m_ς(A) Σ_i α_i m_{ς_i}(B) = Σ_i α_i Σ_{A∩B≠∅} m_ς(A) m_{ς_i}(B) = Σ_i α_i Δ_i ≠ 0.
A couple of remarks. If Δ_i = 0 for all i then Σ_i α_i Δ_i = 0, so that if ς is not combinable with any ς_i then the combination ς ⊕ Σ_i α_i ς_i does not exist, in accordance with our intuition. On the other hand, even if all the n.s.f.s ς_i are combinable with ς, there is always a choice of the coefficients α_i of the affine combination such that Σ_i α_i Δ_i = 0, so that ς is still not combinable with the affine combination.
This remains true when considering affine combinations of belief functions (for which Δ_i > 0 ∀i).
where

N_i(C) ≐ Σ_{B∩A=C} m_{ς_i}(B) m_ς(A).
Theorem 15. Consider a collection {ς, ς_1, ..., ς_n} of normalized sum functions such that Σ_i α_i = 1 and Σ_i α_i Δ_i ≠ 0, i.e., the n.s.f. ς is combinable with the affine combination Σ_i α_i ς_i.
If ς_i is combinable with ς for each i = 1, ..., n (and in particular, when all the normalised sum functions involved {ς, ς_1, ..., ς_n} = {b, b_1, ···, b_n} are belief functions), then ς ⊕ Σ_i α_i ς_i is still an affine combination of the partial sums ς ⊕ ς_i:

ς ⊕ Σ_i α_i ς_i = Σ_i β_i (ς ⊕ ς_i),   (7.2)
i.e., b is combinable with the affine combination if and only if it is combinable with at least one of the belief functions b_i. This is due to the fact that if α_i Δ_i > 0 for some i then Σ_i α_i Δ_i > 0.
Theorem 15 then specializes in the following way.
Theorem 16. The orthogonal sum b ⊕ b′ of two belief functions can be expressed as a convex combination of the results b ⊕ b_A of Bayes' conditioning of b with respect to all the focal elements of b′, namely:

b ⊕ b′ = Σ_{A∈E_{b′}} [ m_{b′}(A) pl_b(A) / Σ_{B∈E_{b′}} m_{b′}(B) pl_b(B) ] · (b ⊕ b_A),   (7.5)
Proof. We know from Chapter 6 that any belief function b′ ∈ B can be written as a convex sum of the categorical b.f.s b_A (Equation (6.8)). We can therefore apply Corollary 5 to Equation (6.8), obtaining:

b ⊕ b′ = b ⊕ Σ_{A∈E_{b′}} m_{b′}(A) b_A = Σ_{A∈E_{b′}} μ(A) (b ⊕ b_A),   μ(A) = m_{b′}(A) Δ_A / Σ_{B∈E_{b′}} m_{b′}(B) Δ_B.
We can simplify Equation (7.5) after realizing that some of the partial combinations b ⊕ b_A may in fact coincide. Since b ⊕ b_A = b ⊕ b_B iff A ∩ C_b = B ∩ C_b, we can write:

b ⊕ b′ = Σ_{A = A′∩C_b, A′∈E_{b′}} (b ⊕ b_A) · [ Σ_{B∩C_b=A, B∈E_{b′}} m_{b′}(B) pl_b(B) ] / [ Σ_{B∈E_{b′}} m_{b′}(B) pl_b(B) ].   (7.6)
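Equation (7.5) can be checked numerically on a small frame. The sketch below (with arbitrary illustrative masses) compares Dempster's sum computed directly with the convex combination of the conditionings b ⊕ b_A weighted by m_{b′}(A) pl_b(A).

from itertools import chain, combinations

# Compare the direct Dempster combination with the weighted conditionings of (7.5).
def subsets(frame):
    s = list(frame)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))]

def dempster(m1, m2):
    raw = {}
    for A, v1 in m1.items():
        for B, v2 in m2.items():
            C = A & B
            if C:
                raw[C] = raw.get(C, 0.0) + v1 * v2
    norm = sum(raw.values())
    return {C: v / norm for C, v in raw.items()}

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

frame = frozenset({'x', 'y', 'z'})
m_b = {frozenset({'x'}): 0.4, frozenset({'x', 'y'}): 0.3, frame: 0.3}
m_b2 = {frozenset({'y'}): 0.5, frozenset({'y', 'z'}): 0.2, frame: 0.3}

direct = dempster(m_b, m_b2)

# right-hand side of (7.5): conditionings b ⊕ b_A, A a focal element of b'
weights = {A: m_b2[A] * pl(m_b, A) for A in m_b2}
total = sum(weights.values())
combo = {}
for A, w in weights.items():
    cond = dempster(m_b, {A: 1.0})            # b ⊕ b_A (Dempster conditioning on A)
    for C, v in cond.items():
        combo[C] = combo.get(C, 0.0) + (w / total) * v

assert all(abs(direct.get(C, 0.0) - combo.get(C, 0.0)) < 1e-12 for C in subsets(frame))
print("Equation (7.5) verified numerically")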
7.4 Commutativity
In Chapter 6 we have seen that the basic probability assignment mechanism is rep-
resented in the belief space framework by the convex closure operator. Theorem 15
in fact treats in full generality affine combinations of points, for they prove to be
more significant in the perspective of a geometric description of the rule of combi-
nation. The next natural step, therefore, is to analyse Dempster’s combinations of
affine closures, i.e., sets of affine combinations of points.
Let us denote by v(ς1 , ..., ςn ) the affine subspace generated by a collection of
normalized sum functions {ς1 , ..., ςn }:
v(ς_1, ..., ς_n) ≐ { ς : ς = Σ_{i=1}^{n} α_i ς_i, Σ_i α_i = 1 }.
Theorem 17. Consider a collection of pseudo belief functions {ς, ς1 , ..., ςn } defined
on the same frame of discernment. If ς_i is combinable with ς (Δ_i ≠ 0) for all i, then:
More precisely,
where {bi1 , · · · , bim }, m ≤ n, are all the belief functions combinable with b in the
collection {b1 , ..., bn }.
Theorem 17 states that Dempster’s rule maps affine spaces to affine spaces, but for
a lower dimensional subspace. From its proof (see Chapter Appendix), the affine
coordinates {αi } of a point τ ∈ v(ς1 , ..., ςn ) correspond to the affine coordinates
{βi } of the sum ς ⊕ τ ∈ v(ς ⊕ ς1 , ..., ς ⊕ ςn ) through the following equation:
α_i = (β_i / Δ_i) · 1 / ( Σ_j β_j / Δ_j ).   (7.8)
Hence, the values of the affine coordinates βi of v(ς ⊕ ς1 , ..., ς ⊕ ςn ) which are not
associated with affine coordinates of v(ς1 , ..., ςn ) are given by Equation (7.24):
Σ_i β_i / Δ_i = 0.   (7.9)
If the map ς ⊕ (.) is injective then the points of the subspace M(ς, ς1 , ..., ςn ) asso-
ciated with the affine coordinates βi meeting (7.9) are not images through ς ⊕ (.) of
any points of v(ς1 , ..., ςn ): we call them missing points.
However, if the map is not injective, points in the original affine space with admissible coordinates can be mapped onto M(ς, ς_1, ..., ς_n). In other words, missing coordinates do not necessarily determine missing points.
If we restrict our attention to convex combinations only (αi ≥ 0 ∀i) of belief
functions (∆i ≥ 0), Theorem 17 implies that
Corollary 6. Cl and ⊕ commute, i.e. if b is combinable with bi ∀i = 1, ..., n, then
Even when all the pseudo belief functions ςi of Theorem 17 are combinable with
ς, the affine space v(ς1 , ..., ςn ) generated by them includes an affine subspace of
non-combinable functions, namely those meeting the following constraint:
Σ_i α_i Δ_i = 0,   (7.10)
The results of this Section greatly simplify when we consider unnormalized belief
functions (u.b.f.s), i.e., belief functions assigning non-zero mass to the empty set
too. Unnormalized belief functions are obtained by relaxing the constraint m(∅) = 0
in Definition 2. The meaning of the basic probability value of ∅ has been studied by
Smets [1239], as a measure of the internal conflict present in a b.p.a. m. It is easy to
see that, for Dempster’s sum of two belief functions b1 and b2 , we get mb1 ⊕b2 (∅) =
1 − ∆(b1 , b2 ) with ∆(b1 , b2 ) as above.
Fig. 7.2. The dual role of non-combinable and missing points in Theorem 17, and their
relation with the infinite points of the associated affine spaces.
Clearly, Dempster’s rule can be naturally modified to cope with such functions.
Equation (2.6) simplifies in the following way: if mb1 , mb2 are the b.p.a.s of two
unnormalized b.f.s, their Dempster combination becomes:

m_{b_1⊕b_2}(C) = Σ_{i,j: A_i∩B_j=C} m_{b_1}(A_i) m_{b_2}(B_j).   (7.11)

This new operator gets the name of unnormalized rule of conditioning, and was introduced by Smets within his Transferable Belief Model [1218] (cf. Chapter ??, Section 3.3.1).
Obviously enough, unnormalized belief functions are always combinable through
(7.11). If we still denote by ⊕ the unnormalized conditioning operator, given a col-
lection of u.b.f.s b̃, b̃1 , ..., b̃n , we get that
m_{b̃ ⊕ Σ_i α_i b̃_i}(C) = Σ_{B∩A=C} m_{Σ_i α_i b̃_i}(B) m_{b̃}(A) = Σ_{B∩A=C} [ Σ_i α_i m_{b̃_i}(B) ] m_{b̃}(A)
= Σ_i α_i Σ_{B∩A=C} m_{b̃_i}(B) m_{b̃}(A) = Σ_i α_i m_{b̃⊕b̃_i}(C).
Proposition 38. If b̃, b̃1 , ..., b̃n are unnormalized belief functions defined on the
same frame of discernment, then:
b̃ ⊕ Σ_i α_i b̃_i = Σ_i α_i (b̃ ⊕ b̃_i),

whenever Σ_i α_i = 1, α_i ≥ 0 ∀i.
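Proposition 38 amounts to the bilinearity of the unnormalized operator (7.11) in the two mass assignments, which the following short sketch (with illustrative masses) verifies numerically.

# The unnormalized combination (7.11) commutes with convex combinations of masses.
def conjunctive(m1, m2):
    out = {}
    for A, v1 in m1.items():
        for B, v2 in m2.items():
            C = A & B                                   # the empty set is allowed here
            out[C] = out.get(C, 0.0) + v1 * v2
    return out

def mix(alpha, m1, m2):
    keys = set(m1) | set(m2)
    return {A: alpha * m1.get(A, 0.0) + (1 - alpha) * m2.get(A, 0.0) for A in keys}

frame = frozenset({'x', 'y'})
m  = {frozenset({'x'}): 0.6, frame: 0.4}
m1 = {frozenset({'y'}): 0.7, frame: 0.3}
m2 = {frozenset({'x'}): 0.2, frozenset({'y'}): 0.5, frame: 0.3}

alpha = 0.35
lhs = conjunctive(m, mix(alpha, m1, m2))
rhs = mix(alpha, conjunctive(m, m1), conjunctive(m, m2))
assert all(abs(lhs.get(A, 0.0) - rhs.get(A, 0.0)) < 1e-12 for A in set(lhs) | set(rhs))
print("unnormalized combination commutes with convex combinations")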
Clearly Theorem 16 also simplifies, for the coefficients of a convex combination are
preserved under (7.11). Namely
b̃′ = Σ_{A∈E_{b̃′}} m_{b̃′}(A) b_A   ⇒   b̃ ⊕ b̃′ = Σ_{A∈E_{b̃′}} m_{b̃′}(A) (b̃ ⊕ b_A) = Σ_{A∈E_{b̃′}} m_{b̃′}(A) b̃|A.
The commutativity results of Section 7.4 remain valid too. Indeed, Proposition 38
implies that:
b̃ ⊕ Cl(b̃_1, ..., b̃_n) = b̃ ⊕ { Σ_i α_i b̃_i : Σ_i α_i = 1, α_i ≥ 0 }
= { b̃ ⊕ Σ_i α_i b̃_i : Σ_i α_i = 1, α_i ≥ 0 }
= { Σ_i α_i (b̃ ⊕ b̃_i) : Σ_i α_i = 1, α_i ≥ 0 } = Cl(b̃ ⊕ b̃_1, ..., b̃ ⊕ b̃_n)
for any collection of u.b.f.s {b̃, b̃1 , · · · , b̃n }, since their combinability is always guar-
anteed.
The commutativity results we proved in Section 7.2 are rather powerful, as they
specify how the rule of combination works when applied to entire regions of the
Cartesian space, and in particular to affine closures of normalized sum functions.
Since the belief space itself is a convex region of RN , it is easy to realize that these
results can help us draw a picture of the “global” behavior of ⊕ within our geometric
approach to the theory of evidence.
Definition 75. Given a belief function b ∈ B, we call conditional subspace ⟨b⟩ the set of all Dempster's combinations of b with any other combinable belief function on the same frame, namely:

⟨b⟩ ≐ { b ⊕ b′ : b′ ∈ B s.t. ∃ b ⊕ b′ }.   (7.12)
Roughly speaking, ⟨b⟩ is the set of possible "futures" of b under the assumption that new evidence is combined with b via Dempster's rule.
Since not all belief functions are combinable with a given b, we need to understand the geometric structure of such combinable b.f.s. Let us call the compatible subspace C(b) associated with a belief function b the collection of all the b.f.s with focal elements included in the core of b:

C(b) ≐ { b′ : C_{b′} ⊆ C_b }.
The conditional subspace associated with b is nothing but the result of combining b
with its compatible subspace.
Fig. 7.3. Conditional and compatible subspaces for a belief function b in the binary belief space B_2. The coordinate axes measure the belief values of {x} and {y}, respectively. The vertices of ⟨b⟩ are b, b_x and b_y, since b ⊕ b_x = b_x ∀b ≠ b_y, and b ⊕ b_y = b_y ∀b ≠ b_x.
Proof. Let us denote by E_{b′} = {A_i} and E_b = {B_j} the focal elements of two belief functions b′ and b defined on the same frame, where b′ is combinable with b. Obviously B_j ∩ A_i = (B_j ∩ C_b) ∩ A_i = B_j ∩ (A_i ∩ C_b). Therefore, once we define a new b.f. b″ with focal elements {A_j, j = 1, ..., m} ≐ {A_i ∩ C_b, i = 1, ..., n} and basic probability assignment

m_{b″}(A_j) = Σ_{i: A_i∩C_b = A_j} m_{b′}(A_i),
we have that b ⊕ b′ = b ⊕ b″. In other words, any point of ⟨b⟩ is a point of b ⊕ C(b). The reverse implication is trivial.
Finally, Theorem 11 ensures that C(b) = Cl(b_A, A ⊆ C_b), so that Corollary 6 eventually yields the desired expression for ⟨b⟩ (b_A being combinable with b for all A ⊆ C_b).
Figure 7.3 illustrates the form of the conditional subspaces in the belief space
related to the simplest, binary frame.
The original belief function b is always a vertex of its own conditional subspace ⟨b⟩, as the result of the combination of b with the b.f. focused on its core: b ⊕ b_{C_b} = b. In addition, the conditional subspace is a subset of the compatible one, ⟨b⟩ ⊆ C(b), since if b″ = b ⊕ b′ for some b′ ∈ C(b) then C_{b″} ⊆ C_b, i.e., b″ is combinable with b as well.
since all u.b.f.s are combinable with any arbitrary u.b.f. b̃. The idea of compatible
subspace retains its validity, though, as the empty set is a subset of the core of any
u.b.f. Note that in this case, however, if C_{b̃} ∩ C_{b̃′} = ∅ then the combination b̃ ⊕ b̃′ reduces to the single point b_∅.
The proof of Theorem 18 still works for u.b.f.s too, so that we can write
⟨b̃⟩ = b̃ ⊕ C(b̃).
Therefore:

b ⊕ b_A = ( 1 / pl_b(A) ) Σ_{B⊆Θ} v_B ( pl_b(A) − pl_b(A \ B) ),   (7.13)

having denoted as usual by v_B the B-th axis of the orthonormal reference frame in ℝ^{2^{|Θ|}−2} with respect to which we measure belief coordinates. Notice that pl_b(A) ≠ 0 for every A ⊆ C_b.
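Equation (7.13) says that the belief values of the conditioning b ⊕ b_A can be read off the plausibility of b. The sketch below (frame, masses and conditioning event are illustrative) checks the identity (b ⊕ b_A)(B) = (pl_b(A) − pl_b(A \ B)) / pl_b(A) against a direct computation.

from itertools import chain, combinations

# Compare the direct conditioning b ⊕ b_A with the plausibility expression (7.13).
def subsets(frame):
    s = list(frame)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))]

def pl(m, A):
    return sum(v for B, v in m.items() if B & A)

def bel_of_conditioning(m, A, B):
    # Dempster's combination of m with the categorical b.f. on A, evaluated at B
    raw = {}
    for F, v in m.items():
        C = F & A
        if C:
            raw[C] = raw.get(C, 0.0) + v
    norm = sum(raw.values())
    return sum(v for C, v in raw.items() if C <= B) / norm

frame = frozenset({'x', 'y', 'z', 'w'})
m = {frozenset({'x'}): 0.2, frozenset({'x', 'y'}): 0.3, frozenset({'y', 'z'}): 0.1, frame: 0.4}
A = frozenset({'x', 'y', 'w'})

for B in subsets(frame):
    lhs = bel_of_conditioning(m, A, B)
    rhs = (pl(m, A) - pl(m, A - B)) / pl(m, A)
    assert abs(lhs - rhs) < 1e-12
print("Equation (7.13) verified for all events B")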
Fig. 7.4. One of the sets E involved in the computation of the vertices of ⟨b⟩.
where 1 denotes the vector of ℝ^{2^{|Θ|}−2} whose components are all equal to 1.
Any belief function b can be decomposed as an affine combination of its own Demp-
ster’s combinations with all the categorical belief functions that agree with it (up to
a constant mb (Cb ) that measures the uncertainty of the model). The coefficients of
this decomposition are nothing but the plausibilities of the events A ⊆ Cb given the
evidence represented by b.
which incidentally is the limit of b ⊕ b0 for l → ±∞ (we omit the details). This is
true for every k ∈ [0, 1], as shown in Figure 7.5.
Fig. 7.5. The x-focus F_x of a conditional subspace ⟨b_1⟩ in the binary belief space for m_1(Θ) ≠ 0. The white circle placed at F_x indicates that the latter is a missing point for each of the lines representing images of constant mass loci.
Simple manipulations of Equation (7.15) can help us realize that all the collections of Dempster's sums b ⊕ b′ (where b′ is an n.s.f.) with k = const have a common intersection at the point (7.16), located outside the belief space. The same is true for the sets {b ⊕ b′ : l = const}, which each lie on a distinct line passing through a twin point:

F_y(b) = ( −m_b(Θ)/m_b(y), 1 ).

We call F_x(b), F_y(b) the foci of the conditional subspace ⟨b⟩.
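The existence of the focus F_x(b) is easy to observe numerically. In the sketch below (the masses of b are illustrative choices), the images of several constant-mass loci {m_{b′}(x) = k} under b ⊕ (·) are sampled and shown to lie on lines that all pass through (1, −m_b(Θ)/m_b(x)).

# For several values of k, the images of {b' : m_b'(x) = k} under b ⊕ (.)
# are collinear with F_x(b) = (1, -m_b(Theta)/m_b(x)).
def dempster2(bx, by, kx, ky):
    # Dempster combination on Theta = {x, y}; returns the belief coordinates (bel(x), bel(y))
    btheta = 1 - bx - by
    norm = 1 - bx * ky - by * kx
    return ((bx * (1 - ky) + btheta * kx) / norm,
            (by * (1 - kx) + btheta * ky) / norm)

def collinear(p, q, r, tol=1e-9):
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) < tol

mx, my = 0.5, 0.3                        # m_b(x), m_b(y); m_b(Theta) = 0.2
Fx = (1.0, -(1 - mx - my) / mx)          # the x-focus of <b>

for k in (0.0, 0.2, 0.5, 0.8):
    pts = [dempster2(mx, my, k, l) for l in (0.0, 0.3, 0.6, 1.0 - k)]
    assert all(collinear(pts[0], pts[1], p) for p in pts[2:] + [Fx])
print("all constant-mass images pass through F_x(b) =", Fx)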
Note that Fx (b) can be located by intersecting the two lines for k = 0 and
k = 1. It is also worth noticing that the shape of these loci is in agreement with the
prediction of Theorem 17 for b1 = kbx , b2 = kbx + (1 − k)by . Indeed, according
to Equation (7.7) the missing points are supposed to have coordinates:
( Δ_1 / (Δ_1 − Δ_2) ) b ⊕ b_1 − ( Δ_2 / (Δ_1 − Δ_2) ) b ⊕ b_2
= [ (1 − k + k(1 − m_b(y))) / (m_b(x)(1 − k)) ] · b ⊕ (k b_x)
+ [ ((m_b(x) − 1)(1 − k) − k(1 − m_b(y))) / (m_b(x)(1 − k)) ] · b ⊕ [ k b_x + (1 − k) b_y ],
which coincide with those of Fx (b) (as it is easy to check). It is quite interesting
to note that the intersection takes place exactly where the images of the lines {k =
const} do not exist.
If m_b(Θ_2) = 0 the situation is slightly different. The combination locus turns out to be v(b_x, b_y) \ {b_x} for every k ∈ [0, 1) (note that in this case F_x(b) = (1, −m_b(Θ)/m_b(x)) = (1, 0) = b_x). If k = 1, instead, Equations (7.15) yield

b ⊕ [ 1 · b_x + l · b_y ] = b_x   for l ≠ 1,   while the combination does not exist for l = 1.
Incidentally, in this case the missing coordinate l = 1 (see Section 7.4) does not
correspond to an actual missing point. The situation is represented in Figure 7.6.
Fig. 7.6. The x-focus of a conditional subspace in the binary belief space for m_b(Θ_2) = 0 (b ∈ P). For each value of k in [0, 1) the image of the locus k = const through the map b ⊕ (.) coincides with the line spanned by P, with missing point b_x. The value of the parameter l of this line is shown for some relevant points. For k = 1 the locus reduces to the point b_x for all values of l.
It is interesting to note that, when m_b(Θ_2) ≠ 0, for all b′ = [k, l]′ ∈ B_2 the sum b ⊕ b′ is uniquely determined by the intersection of the following lines:

l_x ≐ b ⊕ {b″ : m_{b″}(x) = k},   l_y ≐ b ⊕ {b″ : m_{b″}(y) = l}.

These lines are in turn determined by the related focus plus an additional point. We can for instance choose their intersections with the probabilistic subspace P:

p_x ≐ l_x ∩ P,   p_y ≐ l_y ∩ P,

for p_x = b ⊕ p′_x and p_y = b ⊕ p′_y, where p′_x, p′_y are the unique probabilities with m(x) = k and m(y) = l, respectively.
This suggests a geometrical construction for the orthogonal sum of a pair of belief functions b, b′ in B_2:
Algorithm.
1. compute the foci F_x(b), F_y(b) of the conditional subspace ⟨b⟩;
2. project b′ onto P along the orthogonal directions, obtaining p′_x and p′_y;
3. combine b with p′_x and p′_y to get p_x and p_y;
4. draw the lines p_x F_x(b) and p_y F_y(b): their intersection is the desired orthogonal sum b ⊕ b′.
The construction is illustrated in Figure 7.7.
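The four steps above can also be traced in code. The following sketch (with illustrative masses, and assuming m_b(Θ) ≠ 0 so that both foci are well defined) carries out the construction on a binary frame and compares its output with the direct Dempster combination.

# Geometric construction of b ⊕ b' on Theta = {x, y}, checked against Dempster's rule.
def dempster2(bx, by, kx, ky):
    btheta = 1 - bx - by
    norm = 1 - bx * ky - by * kx
    return ((bx * (1 - ky) + btheta * kx) / norm,
            (by * (1 - kx) + btheta * ky) / norm)

def line_intersection(p1, p2, q1, q2):
    # intersection of the line through p1, p2 with the line through q1, q2
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

mx, my = 0.5, 0.3              # belief function b, with m_b(Theta) = 0.2
k, l = 0.2, 0.5                # belief function b' to be combined with b

# 1. foci of the conditional subspace <b>
Fx = (1.0, -(1 - mx - my) / mx)
Fy = (-(1 - mx - my) / my, 1.0)
# 2. projections of b' onto P: the probabilities with m(x) = k and m(y) = l
px_prime, py_prime = (k, 1 - k), (1 - l, l)
# 3. combine b with the two projections
px = dempster2(mx, my, *px_prime)
py = dempster2(mx, my, *py_prime)
# 4. intersect the lines px-Fx and py-Fy
constructed = line_intersection(px, Fx, py, Fy)

direct = dempster2(mx, my, k, l)
print(constructed, direct)     # the two points coincide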
By the way, the notion of focus makes sense even for A = Θ_2. For instance, in the case m_b(Θ_2) ≠ 0, a few calculations yield the following coordinates for F_{Θ_2}(b):

F_{Θ_2}(b) = ( (1 − m_b(y)) / (m_b(x) − m_b(y)),  (1 − m_b(x)) / (m_b(y) − m_b(x)) ),   (7.17)

which turns out to belong to v(P). Anyway, the point F_{Θ_2}(b) plays no role in the geometric construction of Dempster's rule, and we will not mention it in the following.
It is instead important to realize that the algorithm does not work when mb (Θ2 ) =
0, since in this case the intersection of the lines
Our study of the binary belief space suggests how to proceed in the general case.
We first need to describe precisely the shape of constant mass loci, and the way Dempster's rule acts on them. After proving the existence of the intersections of their images and understanding their geometry, we will finally be able to formulate a geometric construction for the orthogonal sum in a generic belief space. Let us introduce the notations

H^k_A ≐ {b ∈ B : m_b(A) = k}, k ∈ [0, 1];    ℋ^k_A ≐ {ς : m_ς(A) = k}, k ∈ ℝ,

for constant mass loci related to belief functions and normalized sum functions, respectively. Their dimension is of course dim(B) − 1, which for B_2 becomes dim(H^k_A) = 4 − 2 − 1 = 1, so that any constant mass locus is a line.
Theorem 20.

ℋ^k_A = v( k b_A + γ_B b_B : B ⊆ Θ, B ≠ A ),   k ∈ ℝ,   (7.18)

with α_B ≥ 0 ∀B ≠ A. Trivially,

b = k b_A + (1 − k) Σ_{B≠A} α_B b_B,   Σ_{B≠A} α_B = 1,   α_B ≥ 0 ∀B ≠ A,

so that, after denoting α_B ≐ β_B γ_B for B ≠ A, Θ, and α_Θ ≐ 1 − k − Σ_{B≠A,Θ} β_B γ_B, we find Equation (7.18) again.
Having expressed constant mass loci as affine closures, we can exploit the commutativity property to compute their images through Dempster's rule. Since ⟨b⟩ = b ⊕ C(b), our intuition suggests that we should only consider constant mass loci related to subsets of C_b. In fact, given a belief function b′ with basic probability assignment m_{b′}, it is clear that, by definition:

b′ = ⋂_{A⊆Θ} H^{m_{b′}(A)}_A,

for there cannot be several distinct normalized sum functions with the same mass assignment. From Theorem 18 we know that b ⊕ b′ = b ⊕ b″, where b″ = b′ ⊕ b_{C_b} is a new belief function with basic probability assignment m_{b″}(A) = Σ_{B: B∩C_b=A} m_{b′}(B).
After introducing the notations

H^k_A(b) ≐ { b′ ∈ B : b′ ∈ C(b), m_{b′}(A) = k } = H^k_A ∩ C(b),
ℋ^k_A(b) ≐ { ς ∈ v(C(b)) : m_ς(A) = k } = ℋ^k_A ∩ v(C(b)),

since b″ ∈ H^{m_{b″}(A)}_A(b) for all A ⊆ C_b by definition, and the intersection is unique in v(C(b)). We are then only interested in computing loci of the type
b ⊕ H^k_A(b) = b ⊕ Cl( k b_A + (1 − k) b_B : B ⊆ C_b, B ≠ A ),

by Equation (7.19), since B ⊆ C_b implies B ≠ A. The set trivially coincides with the whole conditional subspace.
If, on the other hand, A ∩ C_b ≠ ∅, the image of H^k_A becomes

b ⊕ H^k_A = Cl( b ⊕ b_A, b ⊕ [ k b_A + (1 − k) b_B ] : B ⊆ C_b, B ≠ A ).

As we mentioned before, we have no interest in the focus F_{C_b}(b). Hence, in the following discussion we will assume A ≠ C_b.
The toy problem of Section 7.6.1 also suggests that an analysis based exclusively on belief functions could lead to wrong conclusions. In that case, in fact, since $v(b \oplus H_A^1(b))$ reduces to the single point $b \oplus b_A$, we would only be able to compute the intersection
$$\bigcap_{k \in [0,1)} v\big(b \oplus H_A^k(b)\big),$$
which is in general different from the actual focus $F_A(b)$. Consider for instance the case $m_b(\Theta_2) = 0$ in the binary belief space.
Our conjecture about the existence of the foci is indeed supported by a rigorous analysis based on the affine methods we introduced in Section 7.2 and the results of Section 7.6.3. We first note that when $k \in [0,1)$ Theorem 20 yields (since we can choose $\gamma_B = 1-k$ for all $B \neq A$):
$$b \oplus \mathcal{H}_A^k(b) = b \oplus v\big(k\, b_A + (1-k)\, b_B,\ B \subset C_b,\ B \neq A\big).$$
As the generators of $\mathcal{H}_A^k(b)$ are all combinable with $b$, we can apply Theorem 17 and get:
$$v\big(b \oplus \mathcal{H}_A^k(b)\big) = v\big(b \oplus [k\, b_A + (1-k)\, b_B] : B \subseteq C_b,\ B \neq A\big) = v\big(b \oplus H_A^k(b)\big). \tag{7.20}$$
Let us then take a first step towards a proof of existence of FA (b).
Theorem 21. For all $A \subseteq C_b$ the family of affine spaces $\{v(b \oplus H_A^k(b)) : 0 \leq k < 1\}$ has a non-empty common intersection $F_A^0(b) \doteq \bigcap_{k \in [0,1)} v(b \oplus H_A^k(b))$, and
$$F_A^0(b) \supset v\big(\varsigma_B \,\big|\, B \subseteq C_b,\ B \neq A\big),$$
where
$$\varsigma_B = \frac{1}{1 - pl_b(B)}\, b + \frac{pl_b(B)}{pl_b(B) - 1}\, b \oplus b_B. \tag{7.21}$$
The proof of Theorem 21 can easily be modified to cope with the case $A = C_b$. System (7.29) is still valid, so we just need to modify the last part of the proof by replacing $A = C_b$ with another arbitrary subset $C \subsetneq C_b$. This yields a family of generators for $F_{C_b}^0(b)$, whose shape
so that, in turn:
$$F_A(b) = \bigcap_{k \in [0,1]} v\big(b \oplus H_A^k(b)\big) = v\big(b \oplus H_A^1(b)\big) \cap \bigcap_{k \in [0,1)} v\big(b \oplus H_A^k(b)\big)$$
$$= v\big(b \oplus H_A^0(b)\big) \cap v\big(b \oplus H_A^1(b)\big) \cap \bigcap_{k \in [0,1)} v\big(b \oplus H_A^k(b)\big) = v\big(b \oplus H_A^0(b)\big) \cap v\big(b \oplus H_A^1(b)\big).$$
In other words,
Corollary 7. Given a belief function b, the A-th focus of its conditional subspace
hbi is the affine subspace
generated by the collection of points (7.21). It is natural to call them focal points of
the conditional subspace hbi.
Note that the coefficient of b in Equation (7.21) is non-negative, while the coefficient
of b ⊕ bB is non-positive. Hence, focal points cannot be internal points of the belief
space, i.e, they are not admissible belief functions. Nevertheless, they possess a very
intuitive meaning in terms of mass assignment, namely:
$$\varsigma_B = \lim_{k \to +\infty} b \oplus (1-k)\, b_B.$$
Indeed,
This again confirms what we have seen in the binary case, where the focus Fx (b)
turned out to be located in correspondence to the missing points of the images v(b ⊕
[kbx + by ], b ⊕ [kbx + bΘ ]) (see Figures 7.5, 7.6).
7.7.2 Algorithm
Now,
$$b \oplus b'' \in \bigcap_{A \subset C_b} b \oplus H_A^{m_{b''}(A)}(b) = \Big\{ b''' \in \mathcal{C}(b) : \forall A \subset C_b\ \exists\, b'_A \text{ with } m_{b'_A}(A) = m_{b''}(A) \text{ such that } b''' = b \oplus b'_A \Big\}.$$
If, in addition, the map $b \oplus (\cdot)$ is injective (i.e., if $\dim\langle b \rangle = \dim \mathcal{C}(b)$), this intersection is unique, as there can be only one such $b'' = b'_A$ for all $A$. In other words, $\oplus$ and $\cap$ commute, and we can write:
$$b \oplus b'' = \bigcap_{A \subset C_b} b \oplus H_A^{m_{b''}(A)}(b) = \bigcap_{A \subset C_b} v\big(b \oplus H_A^{m_{b''}(A)}(b)\big) = \bigcap_{A \subset C_b} v\Big( b \oplus \big[m_{b''}(A)\, b_A + (1 - m_{b''}(A))\, b_B\big] : B \subset C_b,\ B \neq A \Big),$$
by Equation (7.20).
At this point the geometric algorithm for the orthogonal sum $b \oplus b'$ is easily outlined. We just need one last result.
Theorem 24.
$$v\big(b \oplus H_A^k(b)\big) = v\big(F_A(b),\ b \oplus k\, b_A\big).$$
The proof is valid for $k < 1$, since for $k = 1$ the combination is trivial, but it can easily be modified to cope with unitary masses.
Algorithm.
1. First, all the foci $\{F_A(b),\ A \subseteq C_b\}$ of the subspace $\langle b \rangle$ conditioned by the first belief function $b$ are computed, by calculating the corresponding focal points (7.21);
2. then, an additional point $b \oplus m_{b''}(A)\, b_A$ is detected for each $A \subseteq C_b$, selecting the subspace
$$v\big(b \oplus H_A^{m_{b''}(A)}(b)\big) = v\Big(b \oplus \big[m_{b''}(A)\, b_A + (1 - m_{b''}(A))\, b_B\big] : B \subset C_b,\ B \neq A\Big);$$
3. all these subspaces are intersected, eventually yielding the desired combination $b \oplus b' = b \oplus b''$.
It is interesting to note that the focal points $\varsigma_B$ have to be computed just once, as trivial functions of the upper probabilities $pl_b(B)$ for all $B \subseteq C_b$. In fact, each focus is nothing more than a particular selection of $2^{|C_b|} - 3$ focal points out of a collection of $2^{|C_b|} - 2$. Different groups of points are selected for each focus, with no need for further calculations.
Without discussing the computational complexity of the algorithm, we just point out that the computation of $\varsigma_B$ involves only Bayes' conditioning (as in $b|B = b \oplus b_B$) rather than the more general Dempster sum. There is hence no need for any multiplication of probability assignments.
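As a companion to this remark, the sketch below (our own, not the book's code; the mass-dictionary representation and the example masses are illustrative assumptions) computes a focal point $\varsigma_B$ of Equation (7.21) using nothing more than plausibilities and Bayes' conditioning, as vectors of belief values over the proper non-empty subsets.

```python
from itertools import combinations

def belief(m, A):
    """b(A) = sum of the masses of the subsets of A."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A, frame):
    """pl_b(A) = 1 - b(A^c)."""
    return 1.0 - belief(m, frame - A)

def condition(m, B):
    """Bayes (Dempster) conditioning b|B = b + b_B: mass of C moves to C ∩ B."""
    out = {}
    for C, v in m.items():
        D = C & B
        if D:
            out[D] = out.get(D, 0.0) + v
    norm = sum(out.values())
    return {D: v / norm for D, v in out.items()}

def belief_vector(m, frame):
    """Coordinates b(A) over all proper non-empty subsets A, in a fixed order."""
    subsets = [frozenset(c) for r in range(1, len(frame))
               for c in combinations(sorted(frame), r)]
    return subsets, [belief(m, A) for A in subsets]

def focal_point(m, B, frame):
    """ς_B = b/(1 - pl_b(B)) + pl_b(B)/(pl_b(B) - 1) * (b ⊕ b_B), Eq. (7.21)."""
    pl = plausibility(m, B, frame)
    subsets, b_vec = belief_vector(m, frame)
    _, cond_vec = belief_vector(condition(m, B), frame)
    return subsets, [bi / (1 - pl) + pl / (pl - 1) * ci
                     for bi, ci in zip(b_vec, cond_vec)]

frame = frozenset({'x', 'y', 'z'})
m = {frozenset({'x'}): 0.4, frozenset({'x', 'y'}): 0.3, frame: 0.3}
print(focal_point(m, frozenset({'y'}), frame))
```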
x = b|ς, ∀ ς : x ∈ hςi.
Along this line we can generalize the notion of consistent probabilities, defining a
probability measure p to be consistent with a conditional b.f. x = b|ς when it domi-
nates x according to the oblique coordinates associated to the conditional subspace
hςi.
The set of probabilities consistent with a conditional b.f. $b|\varsigma$ then becomes
$$\tilde{P}(b) \doteq \big\{ p : \tilde{p}(A) \geq \tilde{b}(A)\ \forall A \big\}.$$
$$\tilde{P}(b \oplus b') = b \oplus \tilde{P}(b').$$
Appendix: proofs
Proof of Theorem 14
Proof of Lemma 6
By Lemma 5, $\sum_i \alpha_i \varsigma_i$ is combinable with $\varsigma$. Hence, remembering that
$$m_{\sum_i \alpha_i \varsigma_i}(B) = \sum_{X \subset B} (-1)^{|B \setminus X|} \sum_i \alpha_i \varsigma_i(X) = \sum_i \alpha_i \sum_{X \subset B} (-1)^{|B \setminus X|} \varsigma_i(X) = \sum_i \alpha_i\, m_{\varsigma_i}(B)$$
Proof of Theorem 17
by Lemma 5. If $\{\varsigma_1, \ldots, \varsigma_n\}$ are all combinable n.s.f.s or belief functions, Theorem 15 applies and we can write
$$\varsigma \oplus v(\varsigma_1, \ldots, \varsigma_n) = \Big\{ \sum_i \beta_i \cdot \varsigma \oplus \varsigma_i,\ \ \beta_i = \frac{\alpha_i \Delta_i}{\sum_j \alpha_j \Delta_j} \ :\ \sum_i \alpha_i = 1,\ \sum_i \alpha_i \Delta_i \neq 0 \Big\}.$$
$$\beta_j = \frac{\alpha_{i_j} \Delta_{i_j}}{\sum_{j=1}^{m} \alpha_{i_j} \Delta_{i_j}} \qquad \forall j = 1, \ldots, m, \tag{7.23}$$
which is always possible when $n > m$, since they do not play any role in the other two constraints. If instead $n = m$ (i.e., when considering only combinable functions), we have no choice but to normalize the coefficients $\alpha_{i_j} = \alpha_j$, obtaining in conclusion
$$\alpha'_j = \frac{\alpha_j}{\sum_{i=1}^n \alpha_i} = \frac{\beta_j / \Delta_j}{\sum_{i=1}^n (\beta_i / \Delta_i)}, \qquad j = 1, \ldots, n.$$
However, this is clearly impossible iff $\sum_{i=1}^n \beta_i/\Delta_i = 0$, which is equivalent to:
$$\sum_{i=1}^{n-1} \beta_i\, \frac{\Delta_i - \Delta_n}{\Delta_i} = 1, \qquad \sum_{i=1}^{n} \beta_i = 1, \tag{7.24}$$
$$\frac{\Delta_j}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_j - \frac{\Delta_n}{\Delta_j - \Delta_n}\, v\big(\varsigma \oplus \varsigma_i \,\big|\, i : \Delta_i = \Delta_n\big) = v\Big( \frac{\Delta_j}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_j - \frac{\Delta_n}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_i \ \Big|\ i : \Delta_i = \Delta_n \Big).$$
Since the general solution of system (7.24) is an arbitrary affine combination of the basis ones, the region of the points in $v(\varsigma \oplus \varsigma_1, \ldots, \varsigma \oplus \varsigma_n)$ that correspond to "forbidden" coordinates $\{\beta_i\}$ such that $\sum_i \beta_i/\Delta_i = 0$ is finally:
$$v\Big( \frac{\Delta_j}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_j - \frac{\Delta_n}{\Delta_j - \Delta_n}\, \varsigma \oplus \varsigma_i \ \Big|\ \forall j : \Delta_j \neq \Delta_n,\ \forall i : \Delta_i = \Delta_n \Big).$$
Proof of Theorem 19
where $\mathbf{1}$ is the $(2^n - 2)$-dimensional vector whose entries are all equal to 1. Also
$$\sum_{A \subset C_b} (-1)^{|A|}\, pl_b(A) = \sum_{A \subset C_b} (-1)^{|A|}\, (1 - b(A^c)) = \sum_{A \subset C_b} (-1)^{|A|} - \sum_{A \subset C_b} (-1)^{|A|}\, b(A^c), \tag{7.27}$$
where
$$\sum_{A \subset C_b} (-1)^{|A|} = \sum_{|A|=k=0}^{|C_b|} (-1)^k\, 1^{|C_b|-k} \binom{|C_b|}{k} = 0$$
by the Newton expansion of the power $(-1+1)^{|C_b|}$. Concerning the second addend of Equation (7.27), since
$$b(A^c) = \sum_{B \subset A^c,\ B \subset \Theta} m_b(B) = \sum_{B \subset A^c,\ B \subset C_b} m_b(B) = \sum_{B \subset C_b \setminus A} m_b(B) = b(C_b \setminus A)$$
$$A \setminus B = (C + X) \setminus B = C.$$
Hence, if we fix $C \doteq A \setminus B$ and let $X$ vary, we get, for all $B \subset \Theta$:
$$\sum_{A \subset C_b} (-1)^{|A|}\, pl_b(A \setminus B) = \sum_{C \subset C_b \setminus B} pl_b(C) \sum_{X \subset B \cap C_b} (-1)^{|C + X|} = \sum_{C \subset C_b \setminus B} pl_b(C) \sum_{|X|=k=0}^{|B \cap C_b|} (-1)^{|C|+|X|} \binom{|B \cap C_b|}{|X|}$$
$$= \sum_{C \subset C_b \setminus B} (-1)^{|C|}\, pl_b(C) \sum_{k=0}^{|B \cap C_b|} (-1)^{k} \binom{|B \cap C_b|}{k} = 0.$$
Proof of Theorem 21
By Equation (7.20), if $k \neq 1$,
$$v\big(b \oplus H_A^k(b)\big) = \sum_{B \subset C_b, B \neq A} \alpha_B\ b \oplus [k\, b_A + (1-k)\, b_B], \qquad \sum_{B \subset C_b, B \neq A} \alpha_B = 1. \tag{7.28}$$
Therefore Corollary 5 yields, after defining $\Delta_B^k \doteq k\, pl_b(A) + (1-k)\, pl_b(B)$:
$$= \sum_{B \subset C_b, B \neq A} \alpha_B \Big[ \frac{k\, pl_b(A)}{\Delta_B^k}\, b \oplus b_A + \frac{(1-k)\, pl_b(B)}{\Delta_B^k}\, b \oplus b_B \Big] = b \oplus b_A \sum_{B \subset C_b, B \neq A} \frac{\alpha_B\, k\, pl_b(A)}{\Delta_B^k} + \sum_{B \subset C_b, B \neq A} \frac{\alpha_B\, pl_b(B)(1-k)}{\Delta_B^k}\, b \oplus b_B.$$
From the second and third equations we have that $\alpha'_B = \frac{x_B}{1-k}$ for all $B \subset C_b$, $B \neq A$, for some $x_B \in \mathbb{R}$. By replacing this expression in the normalization constraint we get
$$\sum_{B \subset C_b, B \neq A} x_B \big[k\, pl_b(A) + (1-k)\, pl_b(B)\big] = k\, pl_b(A) \sum_{B \subset C_b, B \neq A} x_B + (1-k) \sum_{B \subset C_b, B \neq A} x_B\, pl_b(B) = 1 - k,$$
so that the real vector $[x_B,\ B \subset C_b,\ B \neq A]'$ has to be a solution of the system
$$\sum_{B \subset C_b, B \neq A} x_B\, pl_b(B) = 1, \qquad \sum_{B \subset C_b, B \neq A} x_B = 0,$$
since $pl_b(C_b) - 1 = 0$. When $A \neq C_b$, $x_{C_b}$ appears in the second equation only. The admissible solutions of the system are then all the affine combinations of the following basis solutions:
$$x_{\bar{B}} = \frac{1}{pl_b(\bar{B}) - 1}, \qquad x_{C_b} = -x_{\bar{B}} = \frac{1}{1 - pl_b(\bar{B})}, \qquad x_B = 0\ \ \forall B \subseteq C_b,\ B \neq A, \bar{B}.$$
Each basis solution generates the following values of the coefficients $\{\alpha_B\}$:
$$\alpha_{\bar{B}} = \frac{x_{\bar{B}}\, \Delta_{\bar{B}}^k}{1-k}, \qquad \alpha_{C_b} = \frac{x_{C_b}\, \Delta_{C_b}^k}{1-k}, \qquad \alpha_B = 0\ \ \forall B \subseteq C_b,\ B \neq A, \bar{B},$$
in turn associated via Equation (7.28) with the point
$$b \oplus b_A\, k\, pl_b(A) \Big[\frac{\alpha_{\bar{B}}}{\Delta_{\bar{B}}^k} + \frac{\alpha_{C_b}}{\Delta_{C_b}^k}\Big] + b \oplus b_{\bar{B}}\, (1-k)\, pl_b(\bar{B})\, \frac{\alpha_{\bar{B}}}{\Delta_{\bar{B}}^k} + b\, (1-k)\, \frac{\alpha_{C_b}}{\Delta_{C_b}^k}$$
$$= b \oplus b_A\, \frac{k\, pl_b(A)}{1-k}\, (x_{\bar{B}} + x_{C_b}) + b \oplus b_{\bar{B}}\, x_{\bar{B}}\, pl_b(\bar{B}) + b\, x_{C_b} = \frac{pl_b(\bar{B})\, b \oplus b_{\bar{B}}}{pl_b(\bar{B}) - 1} + \frac{b}{1 - pl_b(\bar{B})}.$$
The affine subspace generated by all these points then belongs to $\bigcap_{k \in [0,1)} v(b \oplus H_A^k(b))$, even if it is not guaranteed to exhaust the whole $F_A^0(b)$.
Proof of Theorem 22
We first need to compute the explicit form of $v(b \oplus H_A^1(b))$. After recalling that
$$\mathcal{H}_A^1(b) = v\big(b_A + \gamma_B\, b_B : B \subset C_b,\ B \neq A\big),$$
we can notice that for any $B$ there exists a value of $\gamma_B$ such that the point $b_A + \gamma_B b_B$ is combinable with $b$, i.e., $\Delta_B = pl_b(A) + \gamma_B\, pl_b(B) - \gamma_B \neq 0$. Since $b_A$, $b_B$ and $b_\Theta$ are belief functions, Theorem 15 applies and we get:
A sum function $\varsigma$ belonging to both subspaces must then meet the following pair of constraints:
$$\varsigma = \sum_{B \subset C_b, B \neq A} \alpha_B\ b \oplus b_B, \qquad \sum_B \alpha_B = 1,$$
($\varsigma \in v(b \oplus H_A^0(b))$); and
$$\varsigma = \beta_{C_b}\, b \oplus b_A + \sum_{B \subseteq C_b, B \neq A} \beta_B\, \frac{1}{2}\Big[ b \oplus b_A + \frac{pl_b(B)}{pl_b(B) - 1}\, b \oplus b_B + \frac{b}{1 - pl_b(B)} \Big]$$
$$= b \oplus b_A \Big[ \beta_{C_b} + \sum_{B \subseteq C_b, B \neq A} \frac{\beta_B}{2} \Big] + \frac{b}{2} \sum_{B \subseteq C_b, B \neq A} \frac{\beta_B}{1 - pl_b(B)} + \sum_{B \subseteq C_b, B \neq A} \frac{\beta_B\, pl_b(B)}{2(pl_b(B) - 1)}\, b \oplus b_B$$
($\varsigma \in b \oplus \mathcal{H}_A^1(b)$), where $\sum_{B \subset C_b, B \neq A} \beta_B = 1$. By comparison we get:
$$\beta_{C_b} + \frac{1}{2} \sum_{B \subseteq C_b, B \neq A} \beta_B = 0, \qquad \alpha_B = \frac{1}{2}\, \beta_B\, \frac{pl_b(B)}{pl_b(B) - 1},\ \ B \subseteq C_b,\ B \neq A, \qquad \alpha_{C_b} = \frac{1}{2} \sum_{B \subseteq C_b, B \neq A} \frac{\beta_B}{1 - pl_b(B)}. \tag{7.30}$$
In conclusion, the points of the intersection $v(b \oplus H_A^0(b)) \cap v(b \oplus H_A^1(b))$ are associated with affine coordinates $\{\alpha_B\}$ of $v(b \oplus H_A^0(b))$ satisfying the constraints
$$\sum_{B \subset C_b, B \neq A} \alpha_B = 1, \qquad \sum_{B \subset C_b, B \neq A} \frac{\alpha_B}{pl_b(B)} = 0.$$
To recover the actual shape of this subspace we just need to take their difference, obtaining
$$\sum_{B \subseteq C_b, B \neq A} \alpha_B\, \frac{pl_b(B) - 1}{pl_b(B)} = 1, \qquad \alpha_{C_b} = 1 - \sum_{B \subseteq C_b, B \neq A} \alpha_B, \tag{7.31}$$
which is automatically satisfied by the solutions of the last system. If we choose, for any $\bar{B} \subseteq C_b$, $\bar{B} \neq A$, the following basis solution of system (7.31):
$$\alpha_{\bar{B}} = \frac{pl_b(\bar{B})}{pl_b(\bar{B}) - 1}, \qquad \alpha_B = 0\ \ \forall B \subseteq C_b,\ B \neq A, \bar{B}, \qquad \alpha_{C_b} = \frac{1}{1 - pl_b(\bar{B})},$$
we get a set of generators $\varsigma_B$ for $v(b \oplus H_A^0(b)) \cap v(b \oplus H_A^1(b))$, with $\varsigma_B$ given by Equation (7.21).
Proof of Theorem 24
so that, when $\beta + \beta_{\bar{B}} = 1$, the coefficient of $b$ vanishes and we obtain the point (7.32).
Three equivalent models
8

Plausibility
$$pl_b : 2^\Theta \to [0,1], \qquad pl_b(A) = 1 - b(A^c) = \sum_{B \cap A \neq \emptyset} m_b(B).$$
Chapter outline
First we introduce the notions of basic plausibility (Section 8.1) and commonal-
ity (8.2) assignments as the Moebius transforms of plausibility and commonality
functions, respectively.
Later we show that the geometric approach to uncertainty can be extended to
plausibility (Section 8.3) and commonality (Section 8.4) functions, in such a way
that the simplicial structure of the related spaces can be recovered as a function of
their Moebius transforms. We then show (Section 8.5) that the equivalence of the
proposed alternative formulations of the ToE is reflected by the congruence of the
corresponding simplices in the geometric framework. The point-wise geometry of
the triplet (b, plb , Qb ) in terms of the rigid transformation mapping them onto each
other, as a geometric nexus between the proposed models, is discussed in Section
8.6.
We summarise and comment these results in Section 8.7.
so that
$$pl_b(A) = \sum_{B \subseteq A} \mu_b(B). \tag{8.2}$$
Theorem 26. Given a belief function $b$ with basic probability assignment $m_b$, the corresponding basic plausibility assignment can be expressed in terms of $m_b$ as follows:
$$\mu_b(A) = \begin{cases} (-1)^{|A|+1} \displaystyle\sum_{C \supseteq A} m_b(C) & A \neq \emptyset, \\ 0 & A = \emptyset. \end{cases} \tag{8.3}$$
since
$$-\sum_{\emptyset \subsetneq A \subseteq C} (-1)^{|A|} = -\big(0 - (-1)^0\big) = 1.$$
This confirms that b.pl.a.s meet the normalization constraint but not the non-negativity one.
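A minimal sketch (our own implementation, not the book's code; the mass assignment used is an arbitrary illustrative choice) of the basic plausibility assignment of Theorem 26, checked against Equation (8.2) and the normalization property just mentioned:

```python
from itertools import combinations

def subsets(frame):
    return [frozenset(c) for r in range(len(frame) + 1)
            for c in combinations(sorted(frame), r)]

def plausibility(m, A):
    """pl_b(A) = sum of the masses of the subsets intersecting A."""
    return sum(v for B, v in m.items() if B & A)

def basic_plausibility(m, frame):
    """μ_b(A) = (-1)^{|A|+1} Σ_{C ⊇ A} m_b(C) for A ≠ ∅, and μ_b(∅) = 0 (Eq. (8.3))."""
    return {A: 0.0 if not A
            else (-1) ** (len(A) + 1) * sum(v for C, v in m.items() if C >= A)
            for A in subsets(frame)}

frame = frozenset({'x', 'y', 'z'})
m = {frozenset({'x'}): 0.5, frozenset({'x', 'y'}): 0.2, frame: 0.3}
mu = basic_plausibility(m, frame)

# Eq. (8.2): pl_b(A) = Σ_{B ⊆ A} μ_b(B); the b.pl.a. also sums to 1.
for A in subsets(frame):
    assert abs(sum(mu[B] for B in subsets(frame) if B <= A)
               - plausibility(m, A)) < 1e-9
print(sum(mu.values()))   # 1.0, although some μ_b(A) are negative
```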
Basic probability and plausibility assignments are linked by a rather elegant relation.
Theorem 27. Given a belief function $b : 2^\Theta \to [0,1]$, for each element $x \in \Theta$ of the frame of discernment the sum of the basic plausibility assignments of all the events containing $x$ equals its basic probability assignment:
$$\sum_{A \supseteq \{x\}} \mu_b(A) = m_b(x). \tag{8.5}$$
Proof.
$$\sum_{A \supseteq \{x\}} \mu_b(A) = \sum_{A \supseteq \{x\}} (-1)^{|A|+1} \sum_{B \supseteq A} m_b(B) = -\sum_{B \supseteq \{x\}} m_b(B) \sum_{\{x\} \subseteq A \subseteq B} (-1)^{|A|},$$
where, by Newton's binomial $\sum_{k=0}^n 1^{n-k} (-1)^k = 0$,
$$\sum_{\{x\} \subseteq A \subseteq B} (-1)^{|A|} = \begin{cases} 0 & B \neq \{x\}, \\ -1 & B = \{x\}. \end{cases}$$
It is natural to call the quantity (8.6) the basic commonality assignment (or b.comm.a.) associated with a belief function $b$. To arrive at its explicit form we just need to substitute the definition of $Q_b(A)$ into (8.6). We obtain:
$$q_b(B) = \sum_{\emptyset \subseteq A \subseteq B} (-1)^{|B \setminus A|} \sum_{C \supseteq A} m_b(C) = \sum_{\emptyset \subsetneq A \subseteq B} (-1)^{|B \setminus A|} \sum_{C \supseteq A} m_b(C) + (-1)^{|B| - |\emptyset|} \sum_{C \supseteq \emptyset} m_b(C)$$
$$= \sum_{B \cap C \neq \emptyset} m_b(C) \sum_{\emptyset \subsetneq A \subseteq B \cap C} (-1)^{|B \setminus A|} + (-1)^{|B|}.$$
In other words, whereas belief functions are normalized sum functions (n.s.f.s) with non-negative Moebius inverse, and plausibility functions are normalized sum functions, commonality functions are unnormalized sum functions. Going back to the example of Section 8.1.1, the b.comm.a. associated with $m_b(x) = 1/3$, $m_b(\Theta) = 2/3$ is given by Equation (8.7), so that
$$\sum_{\emptyset \subseteq B \subseteq \Theta} q_b(B) = 1 - 1/3 = 2/3 = m_b(\Theta) = Q_b(\Theta).$$
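The following sketch (ours, not the book's code) reproduces this running example numerically, computing the commonality function $Q_b(A) = \sum_{C \supseteq A} m_b(C)$ and its Moebius inverse $q_b$:

```python
from itertools import combinations

def subsets(frame):
    return [frozenset(c) for r in range(len(frame) + 1)
            for c in combinations(sorted(frame), r)]

def commonality(m, A):
    """Q_b(A) = Σ_{C ⊇ A} m_b(C)."""
    return sum(v for C, v in m.items() if C >= A)

def basic_commonality(m, frame):
    """q_b(B) = Σ_{A ⊆ B} (-1)^{|B \\ A|} Q_b(A), the Moebius inverse of Q_b."""
    return {B: sum((-1) ** len(B - A) * commonality(m, A)
                   for A in subsets(frame) if A <= B)
            for B in subsets(frame)}

frame = frozenset({'x', 'y'})
m = {frozenset({'x'}): 1/3, frame: 2/3}   # the example above
q = basic_commonality(m, frame)
print(sum(q.values()))          # 2/3 = m_b(Θ) = Q_b(Θ): no normalization
print(commonality(m, frame))    # 2/3
```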
$$\{b_A \in \mathbb{R}^N,\ \emptyset \subseteq A \subseteq \Theta\},$$
this time including a new vector $b_\emptyset \doteq [1\ 0\ \cdots\ 0]'$. Note also that in this case $b_\Theta = [0\ \cdots\ 0\ 1]'$. The space of unnormalized b.f.s is again a simplex in $\mathbb{R}^N$, namely
$$\mathcal{B}^U = Cl(b_A,\ \emptyset \subseteq A \subseteq \Theta).$$
Indeed, as is the case for belief functions, plausibility functions are completely specified by their $N - 2$ plausibility values $\{pl_b(A),\ \emptyset \subsetneq A \subsetneq \Theta\}$, and can also be represented as vectors of $\mathbb{R}^{N-2}$. We can therefore associate a pair of belief and plausibility functions $b$, $pl_b$ with the following vectors, which we still denote by $b$ and $pl_b$:
$$b = \sum_{\emptyset \subsetneq A \subseteq \Theta} b(A)\, x_A, \qquad pl_b = \sum_{\emptyset \subsetneq A \subsetneq \Theta} pl_b(A)\, x_A, \tag{8.8}$$
where we used the definition (8.1) of the basic plausibility assignment and inverted the roles of $A$ and $B$ for the sake of homogeneity of notation.
Incidentally, as $b_\Theta = [0, \cdots, 0]' = \mathbf{0}$ is the origin of $\mathbb{R}^{N-2}$, we can also write
$$pl_b = \sum_{\emptyset \subsetneq A \subseteq \Theta} \mu_b(A)\, b_A.$$
Note that
$$pl_x = -(-1)^{|\{x\}|}\, b_x = b_x \qquad \forall x \in \Theta,$$
so that $\mathcal{B} \cap \mathcal{PL} \supset \mathcal{P}$.
The vertices of the plausibility space have a natural interpretation.
Theorem 29. The vertex plA of the plausibility space is the plausibility vector as-
sociated with the categorical belief function bA : plA = plbA .
When considering the case of unnormalized belief functions, whose role is so
important in the Transferable Belief Model, it is easy to see that Theorems 26 and
29 fully retain their validity. In the case of Theorem 28, however, as in general $m_b(\emptyset) \neq 0$, we need to modify Equation (8.18) by adding a term related to the empty set. This yields
$$pl_b = \sum_{\emptyset \subsetneq C \subseteq \Theta} m_b(C)\, pl_C + m_b(\emptyset)\, pl_\emptyset,$$
where $pl_C$, $C \neq \emptyset$, is still given by Equation (8.11), and $pl_\emptyset = \mathbf{0}$ is the origin of $\mathbb{R}^N$. Note that even in the case of unnormalized belief functions (Equation (8.11)) the empty set is not considered, for $\mu(\emptyset) = 0$.
Figure 8.1 shows the geometry of belief and plausibility spaces in the familiar case
study of a binary frame Θ2 = {x, y}, where belief and plausibility vectors are points
of a plane R2 with coordinates
respectively. They form two simplices (in this special case, two triangles)
Fig. 8.1. Geometry of belief and plausibility spaces in the binary case. Belief B and plau-
sibility PL spaces are congruent and lie in symmetric locations with respect to the axis of
symmetry formed by the probability simplex P.
which are symmetric with respect to the probability simplex P (in this case a seg-
ment) and congruent, so that they can be moved onto each other by means of a rigid
transformation. In this simple case such transformation is just a reflection through
the Bayesian segment P.
From Figure 8.1 it is clear that each pair of belief/plausibility functions (b, plb )
determines a line a(b, plb ) which is orthogonal to P, on which they lay on symmetric
positions on the two sides of the Bayesian segment.
Just as before, we can use Lemma 8 to change the reference frame and get the coordinates of $Q_b$ with respect to the base $\{b_A,\ \emptyset \subseteq A \subseteq \Theta\}$ formed by all the categorical u.b.f.s. We get:
$$Q_b = \sum_{\emptyset \subseteq A \subseteq \Theta} Q_b(A) \sum_{B \supseteq A} b_B\, (-1)^{|B \setminus A|} = \sum_{\emptyset \subseteq B \subseteq \Theta} b_B \sum_{A \subseteq B} (-1)^{|B \setminus A|}\, Q_b(A) = \sum_{\emptyset \subseteq B \subseteq \Theta} q_b(B)\, b_B,$$
where
$$Q_A \doteq \sum_{\emptyset \subseteq B \subseteq A^c} (-1)^{|B|}\, b_B \tag{8.12}$$
is the $A$-th vertex of the commonality space. The latter is hence given by
$$\mathcal{Q} = Cl(Q_A,\ \emptyset \subseteq A \subseteq \Theta).$$
Again, $Q_A$ is the commonality function associated with the categorical belief function $b_A$, i.e.,
$$Q_{b_A} = \sum_{\emptyset \subseteq B \subseteq \Theta} q_{b_A}(B)\, b_B.$$
[Figure 8.2: the binary commonality space $\mathcal{Q}_2$, plotted in the coordinates $(Q_b(x), Q_b(y), Q_b(\Theta))$, with vertices $Q_x = [1\ 0\ 0]'$, $Q_y = [0\ 1\ 0]'$ and $Q_\Theta = [1\ 1\ 1]'$.]
$$Q_b(\emptyset) = 1, \qquad Q_b(x) = \sum_{A \supseteq \{x\}} m_b(A) = pl_b(x), \qquad Q_b(\Theta) = m_b(\Theta), \qquad Q_b(y) = \sum_{A \supseteq \{y\}} m_b(A) = pl_b(y).$$
The commonality space $\mathcal{Q}_2$ can then be drawn (if we neglect the coordinate $Q_b(\emptyset)$, which is constant for all $b$) as in Figure 8.2.
The vertices of $\mathcal{Q}_2$ are, according to Equation (8.12):
$$Q_\emptyset = \sum_{\emptyset \subseteq B \subseteq \Theta} (-1)^{|B|} b_B = b_\emptyset + b_\Theta - b_x - b_y = [1,1,1,1]' + [0,0,0,1]' - [0,1,0,1]' - [0,0,1,1]' = [1,0,0,0]' = Q_{b_\emptyset},$$
$$Q_x = \sum_{\emptyset \subseteq B \subseteq \{y\}} (-1)^{|B|} b_B = b_\emptyset - b_y = [1,1,1,1]' - [0,0,1,1]' = [1,1,0,0]' = Q_{b_x},$$
$$Q_y = \sum_{\emptyset \subseteq B \subseteq \{x\}} (-1)^{|B|} b_B = b_\emptyset - b_x = [1,1,1,1]' - [0,1,0,1]' = [1,0,1,0]' = Q_{b_y}.$$
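These vertex vectors can be reproduced directly from the unnormalized categorical belief functions, as in the following sketch (ours, not part of the original text), which uses the same coordinate order $(\emptyset, \{x\}, \{y\}, \Theta)$:

```python
def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]

# Unnormalized categorical belief functions b_A(C) = 1 iff C ⊇ A,
# as vectors over the coordinates (∅, {x}, {y}, Θ).
b_empty = [1, 1, 1, 1]
b_x     = [0, 1, 0, 1]
b_y     = [0, 0, 1, 1]
b_theta = [0, 0, 0, 1]

Q_empty = sub(sub(add(b_empty, b_theta), b_x), b_y)  # Σ_{B ⊆ Θ}  (-1)^{|B|} b_B
Q_x     = sub(b_empty, b_y)                          # Σ_{B ⊆ {y}} (-1)^{|B|} b_B
Q_y     = sub(b_empty, b_x)                          # Σ_{B ⊆ {x}} (-1)^{|B|} b_B
print(Q_empty, Q_x, Q_y)   # [1, 0, 0, 0] [1, 1, 0, 0] [1, 0, 1, 0]
```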
We have seen that in the case of a binary frame of discernment, B and PL are con-
gruent, i.e. they can be superposed by means of a rigid transformation (see Section
8.3.3). Indeed the congruence of belief, plausibility and commonality spaces is a
general property.
Theorem 30. The corresponding 1-dim faces Cl(bA , bB ), Cl(plA , plB ) of belief
and plausibility spaces are congruent, namely
Therefore, for all $p$:
$$\sum_{C \subseteq \Theta} |pl_B(C) - pl_A(C)|^p = \sum_{C \subseteq \Theta} |b_A(C^c) - b_B(C^c)|^p = \sum_{D \subseteq \Theta} |b_A(D) - b_B(D)|^p.$$
Notice that the proof of Theorem 30 holds no matter whether the pair (∅, ∅c =
Θ) is considered or not (i.e., it is immaterial whether classical or unnormalised
belief functions are considered).
A straightforward consequence is that:
Corollary 8. B and PL are congruent; B U and PLU are congruent.
as their corresponding 1-dimensional faces have the same length. This is due to the generalization of a well-known theorem of Euclid's, which states that triangles whose sides have the same length are congruent. It is worth noticing that, although this holds for simplices (generalized triangles), the same is not true for polytopes in general, i.e., convex closures of a number of vertices greater than $n + 1$, where $n$ is the dimension of the Cartesian space in which they are defined (think, for instance, of a square and a rhombus, both with sides of length 1).
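A quick numerical check of Theorem 30 on a ternary frame (a sketch of ours, not the book's code) compares the pairwise $l_p$ distances between categorical belief vectors and between the corresponding categorical plausibility vectors:

```python
from itertools import combinations

frame = frozenset({'x', 'y', 'z'})
coords = [frozenset(c) for r in range(1, len(frame))
          for c in combinations(sorted(frame), r)]   # proper non-empty subsets

def cat_belief(A):          # b_A(C) = 1 iff C ⊇ A
    return [1.0 if C >= A else 0.0 for C in coords]

def cat_plausibility(A):    # pl_{b_A}(C) = 1 iff C ∩ A ≠ ∅
    return [1.0 if C & A else 0.0 for C in coords]

def lp(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

for A in coords:
    for B in coords:
        for p in (1, 2, 3):
            assert abs(lp(cat_plausibility(A), cat_plausibility(B), p)
                       - lp(cat_belief(A), cat_belief(B), p)) < 1e-9
print("corresponding 1-dimensional faces of B and PL have equal length")
```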
$$\mathcal{Q}^U = Cl(Q_A,\ \emptyset \subseteq A \subseteq \Theta),$$
while
$$Q_A = \sum_{\emptyset \subseteq B \subseteq A^c} (-1)^{|B|}\, b_B = \sum_{\emptyset \subsetneq B \subseteq A^c} (-1)^{|B|}\, b_B + b_\emptyset = -pl_{A^c} + b_\emptyset. \tag{8.14}$$
Theorem 31. The 1-dimensional faces Cl(QB , QA ) and Cl(plB c , plAc ) of the
commonality and the plausibility space, respectively, are congruent. Namely:
kQB − QA kp = kplB c − plAc kp .
Proof. Since $Q_A = b_\emptyset - pl_{A^c}$, then
$$Q_A - Q_B = b_\emptyset - pl_{A^c} - b_\emptyset + pl_{B^c} = pl_{B^c} - pl_{A^c},$$
and the two faces are trivially congruent.
Therefore, the following map between vertices of PLU and QU
QA 7→ plAc (8.15)
maps 1-dimensional faces of the commonality space to congruent faces of the plau-
sibility space Cl(QA , QB ) 7→ Cl(plAc , plB c ). Therefore, the two simplices are
congruent.
However, (8.15) clearly acts as a 1-1 application of unnormalized categorical
commonality and plausibility functions (as the complement of ∅ is Θ, so that QΘ 7→
pl∅ ). Therefore we can only claim that:
Let us get back to the binary example: $\Theta_2 = \{x, y\}$. It is easy to see from Figures 8.1 and 8.2 that $\mathcal{PL}_2$ and $\mathcal{Q}_2$ are not congruent in the case of normalized belief functions, as $\mathcal{Q}_2$ is an equilateral triangle with sides of length $\sqrt{2}$, while $\mathcal{PL}_2$ has two sides of length 1.
In the unnormalized case instead, recalling Equation (8.13), we have:
In the case of belief and plausibility spaces (in the standard, normalized case) the rigid transformation is obviously encoded by Equation (3.13): $pl_b(A) = 1 - b(A^c)$. Since $pl_b = \sum_{\emptyset \subsetneq A \subseteq \Theta} pl_b(A)\, x_A$, Equation (3.13) implies the following relation:
$$pl_b = \mathbf{1} - b^c,$$
where $b^c$ is the unique belief function whose belief values are the same as $b$'s on the complement of each event $A$: $b^c(A) = b(A^c)$.
As in the normalized case $\mathbf{1} = pl_\Theta$ and $\mathbf{0} = b_\Theta$, the above relation reads as
$$pl_b = \mathbf{1} - b^c = \mathbf{0} + \mathbf{1} - b^c = b_\Theta + pl_\Theta - b^c.$$
As a consequence, the segments $Cl(b_\Theta, pl_\Theta)$ and $Cl(b^c, pl_b)$ have the same centre of mass, for
$$\frac{pl_b + b^c}{2} = \frac{b_\Theta + pl_\Theta}{2}.$$
In other words:
In other words:
Theorem 32. The plausibility vector plb associated with a belief function b is the
reflection in RN −2 through the segment Cl(bΘ , plΘ ) = Cl(0, 1) of the “comple-
ment” belief function bc .
Geometrically, bc is obtained from b by means of another reflection (by swap-
ping the coordinates associated with the reference axes xA and xAc ), so that the
desired rigid transformation is completely determined.
Figure 8.3 illustrates the nature of the transformation, and its instantiation in the
binary case for normalized belief functions.
In the case of unnormalized belief functions (b∅ = 1, pl∅ = 0) we have
plb = pl∅ + b∅ − bc ,
i.e., plb is the reflection of bc through the segment Cl(b∅ , pl∅ ) = Cl(0, 1).
Fig. 8.3. The pointwise rigid transformation mapping b onto plb in the normalized case. In
the binary case the middle point of the segment Cl(0, 1) is the mean probability P.
The form of the desired point-wise transformation is also quite simple in the case of the pair $(\mathcal{PL}^U, \mathcal{Q}^U)$. We can indeed use Equation (8.14), getting:
$$Q_b = \sum_{\emptyset \subseteq A \subseteq \Theta} m_b(A)\, Q_A = \sum_{\emptyset \subseteq A \subseteq \Theta} m_b(A)\, (b_\emptyset - pl_{A^c}) = b_\emptyset - \sum_{\emptyset \subseteq A \subseteq \Theta} m_b(A)\, pl_{A^c} = b_\emptyset - pl_{b^{m^c}},$$
where $b^{m^c}$ is the unique belief function whose b.p.a. is $m_{b^{m^c}}(A) = m_b(A^c)$. But then, since $pl_\emptyset = \mathbf{0} = [0, \cdots, 0]'$ for unnormalized belief functions (remember the binary example), we can rewrite the above equation as
$$Q_b = pl_\emptyset + b_\emptyset - pl_{b^{m^c}}.$$
In conclusion:
Theorem 33. The commonality vector associated with a belief function $b$ is the reflection in $\mathbb{R}^N$ through the segment $Cl(pl_\emptyset, b_\emptyset) = Cl(\mathbf{0}, \mathbf{1})$ of the plausibility vector $pl_{b^{m^c}}$ associated with the belief function $b^{m^c}$.
In this case, however, $b^{m^c}$ is obtained from $b$ by swapping the coordinates with respect to the base $\{b_A,\ \emptyset \subseteq A \subseteq \Theta\}$. A pictorial representation for the binary case (similar to Figure 8.3) is more difficult here, as $\mathbb{R}^4$ is involved.
It is natural to stress the analogy between the two rigid transformations
$$\tau_{\mathcal{B}^U \mathcal{PL}^U} : \mathcal{B}^U \to \mathcal{PL}^U, \qquad \tau_{\mathcal{PL}^U \mathcal{Q}^U} : \mathcal{PL}^U \to \mathcal{Q}^U,$$
mapping an unnormalized belief function onto the corresponding plausibility, and an unnormalized pl.f. onto the corresponding commonality function, respectively:
$$\tau_{\mathcal{B}^U \mathcal{PL}^U} : b \ \xrightarrow{\ b(A) \mapsto b(A^c)\ }\ b^c \ \xrightarrow{\ \text{refl. through } Cl(\mathbf{0},\mathbf{1})\ }\ pl_b,$$
$$\tau_{\mathcal{PL}^U \mathcal{Q}^U} : pl_b \ \xrightarrow{\ m_b(A) \mapsto m_b(A^c)\ }\ b^{m^c} \ \xrightarrow{\ \text{refl. through } Cl(\mathbf{0},\mathbf{1})\ }\ Q_b.$$
They are both the composition of two reflections: a swap of the axes of the coordinate frame $\{x_A, A \subset \Theta\}$ (respectively $\{b_A, A \subset \Theta\}$) induced by set-theoretic complementation, plus a reflection with respect to the centre of the segment $Cl(\mathbf{0}, \mathbf{1})$.
8.7 Comments
Although they are equivalent mathematical representations of the same evidence,
belief, plausibility, and commonality functions form a hierarchy of sum functions
meeting increasingly more constraints. Indeed, their Moebius transforms meet both
normalization and positivity constraints in the case of basic probability assignments,
just the normalization constraint for basic plausibility assignments, or none of them
(in the case of basic commonality assignments). This is summarised in Table 8.1.
Nevertheless, all these functions possess a similar simplicial geometry, which is
reflected in the congruence of the associated spaces and by the point-wise geometry
of the triplet (b, plb , Qb ).
We will see in Part ?? that the alternative models introduced here, and in par-
ticular the notion of basic plausibility assignment, can be put to good use in the
probability transformation problem.
Table 8.1. Combinatorial properties of the Moebius transforms of belief, plausibility and
commonality functions.
Appendix: proofs
Proof of Theorem 26
for B c ⊇ C, B ⊆ A is equivalent to B ⊆ C c , B ⊆ A ≡ B ⊆ (A ∩ C c ).
Let us now analyse the following function of $C$:
$$f(C) \doteq \sum_{B \subseteq A \cap C^c} (-1)^{|A \setminus B|}.$$
since $B \subseteq D \subseteq A$ and $|A| - |B| = |A| - |D| + |D| - |B|$. But then
$$f(C) = (-1)^{|A| - |D|} \sum_{B \subseteq D} (-1)^{|D| - |B|} = 0,$$
given that $\sum_{B \subseteq D} (-1)^{|D| - |B|} = 0$ by Newton's binomial formula again.
In conclusion, $f(C) = 0$ if $C^c \cap A \neq \emptyset$, and $f(C) = (-1)^{|A|}$ if $C^c \cap A = \emptyset$. We can then rewrite (8.17) as
$$-\sum_{C \subseteq \Theta} m_b(C)\, f(C) = -\sum_{C : C^c \cap A \neq \emptyset} m_b(C) \cdot 0 - \sum_{C : C^c \cap A = \emptyset} m_b(C) \cdot (-1)^{|A|} = (-1)^{|A|+1} \sum_{C : C^c \cap A = \emptyset} m_b(C) = (-1)^{|A|+1} \sum_{C \supseteq A} m_b(C).$$
Proof of Lemma 8
We first need to recall that a categorical belief function can be expressed as
$$b_A = \sum_{C \supseteq A} x_C.$$
Proof of Theorem 28
The latter is indeed a convex combination, since basic probability assignments are non-negative (with $m_b(\emptyset) = 0$) and have unitary sum. It follows that
$$\mathcal{PL} = \{pl_b,\ b \in \mathcal{B}\} = \Big\{ \sum_{\emptyset \subsetneq C \subseteq \Theta} m_b(C)\, pl_C,\ \ \sum_C m_b(C) = 1,\ m_b(C) \geq 0\ \forall C \subseteq \Theta \Big\} = Cl(pl_A,\ \emptyset \subsetneq A \subseteq \Theta),$$
Proof of Theorem 29
Now, if $A \cap C = \emptyset$ there are no addends in the above sum, which vanishes. Otherwise, by Newton's binomial formula (8.4), we have
$$pl_A(C) = -\Big\{ [1 + (-1)]^{|A \cap C|} - (-1)^0 \Big\} = 1.$$
The geometry of possibility
9
Chapter outline
We first recall in Section 9.1 the relationship between consonant belief functions and
necessity measures. We then move on to study the geometry of the space of conso-
nant belief functions, or consonant subspace CO (Section 9.2). After observing the
correspondence between co.b.f.s and maximal chains of events, we look for useful
insights by studying the case of ternary frames, which leads us to prove that the con-
sonant subspace has the form of a simplicial complex [441], a structured collection
of simplices. In Section 9.3 we investigate the convex geometry of the components
of CO in more detail, proving that they are all congruent to each other, and can be
decomposed into faces which are right triangles.
In the second half of the Chapter we introduce the notion of consistent belief
function as the natural generalization in the context of belief theory of consistent
knowledge bases in classical logic (Section 9.4). In Section 9.5, following the intu-
ition provided by the simple case of binary frames, we prove that the set of consistent
b.f.s (just like consonant b.f.s do) form a simplicial complex in the space of all belief
functions, and that the maximal simplices of such a complex are all congruent with
each other. Finally, in Section 9.6 we show that each belief function can be decom-
posed into a number of consistent components living the consistent complex, rather
closely related to the pignistic transformation [1424, 1276].
To improve the readability of the paper several major proofs are collected in an
Appendix.
Proposition 39. Suppose b1 , ..., bn are non-vacuous simple support functions with
foci Cb1 , ..., Cbn respectively, and b = b1 ⊕ · · · ⊕ bn is consonant. If C denotes the
core of b, then the sets Cbi ∩ C are nested.
i.e., either b(A) = 0 or b(Ā) = 0 for every A ⊆ Θ. This result and Proposition
13 explain why we said in Chapter 2 that consonant and quasi-support functions
represent opposite sides of the class of belief functions.
As we recalled in Chapter 5, Section 5.5.1:
Fig. 9.1. The belief space B for a binary frame is a triangle in R2 whose vertices are the basis
belief functions focused on {x}, {y} and Θ, (bx , by , bΘ ) respectively. The probability region
is the segment Cl(bx , by ), while consonant and consistent belief functions are constrained to
belong to the union of the two segments CS x = COx = Cl(bΘ , bx ) and CS y = COy =
Cl(bΘ , by ).
and the consonant subspace CO3 , also part of the boundary of B3 , is given by the
union of the maximal simplices listed above.
Fig. 9.2. The simplicial complex CO3 of all the consonant belief functions for a ternary
frame Θ3 . The complex is composed by n! = 3! = 6 maximal simplicial components of
dimension n − 1 = 2, each vertex of P3 being shared by (n − 1)! = 2! = 2 of them. The
region is connected, and is part of the boundary ∂B3 of the belief space B3 .
Fig. 9.3. Intersection of simplices in a complex. Only the right-hand pair of triangles meets
condition 2. of the definition of simplicial complex (Definition 78).
associated with chains A = {A1 , ..., An1 } and B = {B1 , ..., Bn2 }, respectively.
As the vectors {bA , ∅ ( A ( Θ} are linearly independent in RN −2 , no linear
combination of the vectors bBi ’s can yield an element of span(bA1 , ..., bAn1 ), unless
some of those vectors coincide. The desired intersection is therefore:
where
{Cij , j = 1, ..., k} = C = A ∩ B,
with k < n1 , n2 . But then C is a subchain of both A and B, so that (9.1) is a face of
both Cl(bA1 , ..., bAn1 ) and Cl(bB1 , ..., bBn2 ).
As Figure 9.2 shows, the probability simplex P and the maximal simplices of
CO have the same dimension, and are both part of the boundary ∂B of the belief
space.
Theorem 35. All the maximal simplices of the consonant subspace are congruent.
Proof. To obtain a proof for the general case we need to find a 1–1 map between the 1-dimensional sides of any two maximal simplices. Let $\mathcal{A} = \{A_1 \subset \cdots \subset A_i \subset \cdots \subset A_n = \Theta\}$, $\mathcal{B} = \{B_1 \subset \cdots \subset B_i \subset \cdots \subset B_n = \Theta\}$ be the associated maximal chains.
The trick consists in associating pairs of events with the same cardinality:
Indeed, the categorical b.f. $b_{A_i}$ is such that $b_{A_i}(B) = 1$ when $B \supseteq A_i$, $b_{A_i}(B) = 0$ otherwise. On the other hand, $b_{A_j}(B) = 1$ when $B \supseteq A_j \supset A_i$, $b_{A_j}(B) = 0$ otherwise, since $A_j \supset A_i$ by hypothesis. Hence
so that
$$\|b_{A_i} - b_{A_j}\|_2 = \sqrt{|\{B \subseteq \Theta : B \supseteq A_i,\ B \not\supseteq A_j\}|} = \sqrt{|A_j \setminus A_i|}.$$
But this is true for each similar pair in any other maximal chain, so that
It is easy to see that the components of $\mathcal{CO}$ are not congruent with $\mathcal{P}$, even though they both have dimension $n - 1$. In the binary case, for instance,
$$\mathcal{P} = Cl(b_x, b_y), \qquad \|\mathcal{P}\| = \|b_y - b_x\| = \sqrt{2}.$$
An analysis of the norm of the difference of two categorical belief functions can
provide us with additional information about the nature and structure of the maximal
simplices of the consonant subspace.
We know from [258] that in $\mathbb{R}^{N-2}$ each triangle $Cl(b_\Theta, b_B, b_A)$ with $\Theta \supsetneq B \supsetneq A$ is a right triangle, with right angle $\widehat{b_\Theta b_B b_A}$.
Indeed, we can prove here a much more general result.
All triangles $Cl(b_{A_i}, b_{A_j}, b_{A_k})$ in $\mathcal{CO}$ such that $A_i \supsetneq A_j \supsetneq A_k$ are right triangles. But as each maximal simplicial component $\mathcal{CO}_\mathcal{C}$ of the consonant complex has vertices associated with the elements $A_1 \subsetneq \cdots \subsetneq A_n$ of a maximal chain, any three of them will also form a chain. Hence all 2-dimensional faces of any maximal component of $\mathcal{CO}$ are right triangles. All its 3-dimensional faces (tetrahedra) have right triangles as faces (Figure 9.4), and so on.
Fig. 9.4. All the tetrahedra $Cl(b_{A_i}, b_{A_j}, b_{A_k}, b_{A_l})$ formed by vertices of a maximal simplex of the consonant subspace, $A_i \subsetneq A_j \subsetneq A_k \subsetneq A_l$, have right triangles as faces.
This parallelism with classical logic reminds us of the fact that belief functions are
also collections of disparate pieces of evidence, incorporated in time as they become
available. As a result, each belief function is likely to contain self-contradictory
information, which is in turn associated with a degree of “internal” conflict. As we
have seen in Chapter ??, conflict and combinability play a central role in the theory
of evidence [1470, 1216, 1063], and have been recently subject to novel analyses
[871, 637, 884].
In propositional logic, propositions or formulas are either true or false, i.e., their
truth value is either 0 or 1 [922]. Formally, an interpretation or model of a formula
φ is a valuation function mapping φ to the truth value “true” (1). Each formula can
therefore be associated with the set of interpretations or models under which its truth
value is 1. If we define a frame of discernment collecting all the possible interpreta-
tions, each formula φ is associated with the subset A(φ) of this frame which collects
all its interpretations.
A straightforward extension of classical logic consists in assigning a probability value to such sets of interpretations, i.e., to each formula. If, however, the available
evidence allows us to define a belief function on the frame of possible interpreta-
tions, each formula A(φ) ⊆ Θ is then naturally assigned a degree of belief b(A(φ))
between 0 and 1 [1104, 572], measuring the total amount of evidence supporting the
proposition “φ is true”.
A belief function can therefore be seen in this context as the generalization of a
knowledge base [1104, 572], i.e., a set of propositions together with their non-zero
belief values: b = {A ⊆ Θ : b(A) 6= 0}.
$$b \vdash B \ \Leftrightarrow\ A \subseteq B\ \ \forall A : b(A) \neq 0. \tag{9.2}$$
$$b \vdash B \ \Leftrightarrow\ b(B) \neq 0. \tag{9.3}$$
Whichever way we choose to define implication, we can define the class of consistent belief functions as the set of b.f.s which cannot imply contradictory propositions.
Definition 79. A belief function $b$ is consistent if there exists no proposition $A$ such that both $A$ and its negation $A^c$ are implied by $b$.
When adopting the implication relation (9.2), it is trivial to verify that
$$A \subseteq B\ \ \forall A : b(A) \neq 0 \quad \Leftrightarrow \quad \bigcap_{b(A) \neq 0} A \subseteq B.$$
No matter which definition of implication we adopt, the class of consistent belief functions corresponds to the set of b.f.s whose core is not empty.
Definition 80. A belief function is said to be consistent if its core is non-empty:
$$\mathcal{C}_b \doteq \bigcap_{A : m_b(A) \neq 0} A \neq \emptyset.$$
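The check suggested by Definition 80 is immediate to carry out; the following sketch (ours, not the book's code, with arbitrary illustrative masses) computes the core as the intersection of the focal elements:

```python
from functools import reduce

def core(m):
    """C_b = intersection of the focal elements {A : m_b(A) ≠ 0}."""
    focal = [A for A, v in m.items() if v > 0]
    return reduce(lambda X, Y: X & Y, focal)

consistent   = {frozenset({'x'}): 0.4, frozenset({'x', 'y'}): 0.6}
inconsistent = {frozenset({'x'}): 0.4, frozenset({'y', 'z'}): 0.6}

print(core(consistent))     # frozenset({'x'}): non-empty core, hence consistent
print(core(inconsistent))   # frozenset(): empty core, hence not consistent
```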
Indeed we can prove that, under either definition (9.2) or definition (9.3) of the
implication b ` B, Definitions 79 and 80 are equivalent.
Theorem 37. A belief function b : 2Θ → [0, 1] has non-empty core if and only
if there do not exist two complementary propositions A, Ac ⊆ Θ which are both
implied by b in the sense (9.2).
Theorem 38. A belief function b : 2Θ → [0, 1] has non-empty core if and only if
there do not exist two complementary propositions A, Ac ⊆ Θ which both enjoy
non-zero support from b, b(A) 6= 0, b(Ac ) 6= 0 (i.e., they are implied by b in the
sense (9.3)).
$$\nexists\, A, A^c : b(A) \neq 0,\ b(A^c) \neq 0.$$
$$A \cap A^c \supseteq \mathcal{C}_b \neq \emptyset$$
arbitrary belief function into |Θ| “consistent components”, each of them living in a
maximal simplex of the consistent complex. Such components can thus be seen as
natural consistent transformations with different cores of an arbitrary belief func-
tion. The topic of how to approximate a generic b.f. by means of a consistent one
will be discussed in greater detail in Chapter 13.
Fig. 9.5. In the binary case consistent belief functions are constrained to belong to the union
of the two segments CS x = Cl(bΘ , bx ) and CS y = Cl(bΘ , by ) (cfr. the behaviour of conso-
nant b.f.s in Figure 9.1).
Just as in the consonant case, Theorem 11 implies that all the b.f.s whose focal elements belong to such a collection, no matter the actual values of their basic probability assignment, form a simplex $Cl(b_{A_1}, \ldots, b_{A_m})$. Such a collection is "maximal" when it is not possible to add another focal element $A_{m+1}$ such that $\bigcap_{i=1}^{m+1} A_i \neq \emptyset$.
It is easy to see that collections of events with non-empty intersection are maximal iff they have the form $\{A \subseteq \Theta : A \ni x\}$, where $x \in \Theta$ is a singleton. Consequently, the region of consistent belief functions is the union of the following collection of maximal simplices:
$$\mathcal{CS} = \bigcup_{x \in \Theta} Cl(b_A,\ A \ni x). \tag{9.5}$$
There are obviously $n \doteq |\Theta|$ such maximal simplices in $\mathcal{CS}$. Each of them has
$$|\{A : A \ni x\}| = |\{A \subseteq \Theta : A = \{x\} \cup B,\ B \subseteq \{x\}^c\}| = 2^{|\{x\}^c|} = 2^{n-1}$$
vertices, so that their dimension as simplices in the belief space is $2^{n-1} - 1 = \frac{\dim \mathcal{B}}{2}$ (as the dimension of the whole belief space is $\dim \mathcal{B} = 2^n - 2$).
Clearly, $\mathcal{CS}$ is connected, as each maximal simplex is by definition connected and $b_\Theta$ belongs to all maximal simplices.
Just as in the consonant case, the region (9.5) of consistent belief functions is an
instance of simplicial complex (see Definition 78) [441].
Theorem 40. CS is a simplicial complex in the belief space B.
As with consonant belief functions, more can be said about the geometry of the
maximal faces of CS. In Θ = {x, y, z}, for instance, the consistent complex CS is
composed by three maximal simplices of dimension |{A 3 x}|−1 = 3 (cfr. Section
9.3.2):
Cl(bA : A 3 x) = Cl(bx , b{x,y} , b{x,z} , bΘ ),
Cl(bA : A 3 y) = Cl(by , b{x,y} , b{y,z} , bΘ ), (9.6)
Cl(bA : A 3 z) = Cl(bz , b{x,z} , b{y,z} , bΘ ).
Once again the vertices of each pair of such maximal simplices can be put into a 1-1
correspondence. Consider for instance the pair CS x = Cl(bx , b{x,y} , b{x,z} , bΘ ),
CS z = Cl(bz , b{x,z} , b{y,z} , bΘ ). The desired mapping is:
for corresponding segments in the two simplices have the same length. For example (remembering that $b_A(B) = 1$ if $B \supseteq A$ and $b_A(B) = 0$ otherwise), $Cl(b_x, b_\Theta)$ is congruent with $Cl(b_z, b_\Theta)$, as
$$\|b_x - b_\Theta\| = \|b_x\| = \|[1\ 0\ 0\ 1\ 1\ 0]'\| = \sqrt{3} = \|[0\ 0\ 1\ 0\ 1\ 1]'\| = \|b_z - b_\Theta\| = \|b_z\|.$$
Theorem 41. All maximal simplices of the consistent complex are congruent.
$$b^x \doteq \frac{1}{BetP[b](x)} \sum_{A \ni x} \frac{m_b(A)}{|A|}\, b_A, \qquad x \in \Theta. \tag{9.7}$$
Obviously, if $b \in \mathcal{P}$ then $b^x = b_x$ for all $x \in \Theta$.
Equation (9.7) draws a bridge between the notions of belief, probability and possibility, by associating each belief function with its "natural" probabilistic (the pignistic function) and possibilistic (the quantities $b^x$) components.
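A small sketch of this decomposition (our own; representing each consistent component $b^x$ by its mass assignment and using an arbitrary illustrative b.p.a.) computes the pignistic values and the components suggested by Equation (9.7):

```python
def pignistic(m, frame):
    """BetP[b](x) = Σ_{A ∋ x} m_b(A) / |A|."""
    return {x: sum(v / len(A) for A, v in m.items() if x in A)
            for x in sorted(frame)}

def consistent_component(m, x, betp):
    """b^x: masses m_b(A)/|A| for A ∋ x, rescaled by 1/BetP[b](x)."""
    return {A: v / (len(A) * betp[x]) for A, v in m.items() if x in A}

frame = {'x', 'y', 'z'}
m = {frozenset({'x'}): 0.2, frozenset({'x', 'y'}): 0.5, frozenset(frame): 0.3}

betp = pignistic(m, frame)
print(betp)                                 # the convex coordinates of b in P^b
print(consistent_component(m, 'x', betp))   # a consistent b.f. whose core contains x
```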
Fig. 9.6. Pictorial representation of the role of the pignistic values BetP[b](x) for a belief function and the related pignistic function. Both $b$ and BetP[b] live in a simplex ($\mathcal{P}^b = Cl(b^x, x \in \Theta)$ and $\mathcal{P} = Cl(b_x, x \in \Theta)$, respectively) on which they possess the same convex coordinates $\{BetP[b](x)\}$. The vertices $b^x$, $x \in \Theta$ of the simplex $\mathcal{P}^b$ can be interpreted as consistent components of the belief function $b$ on the simplicial complex of consistent belief functions $\mathcal{CS}$.
where d is any function of the convex coordinates of b in P b (as they coincide with
the pignistic values for both b and BetP [b]).
The consistent approximation will be analysed in more detail in Chapter 12.
Appendix: proofs
Proof of Theorem 40
Cl(bA : A ⊇ C1 ) ∩ Cl(bA : A ⊇ C2 ).
Now, each convex closure of points $b_1, \ldots, b_m$ in a Cartesian space is trivially included in the affine space they generate:
$$Cl(b_1, \ldots, b_m) \subsetneq a(b_1, \ldots, b_m) \doteq \Big\{ b : b = \alpha_1 b_1 + \cdots + \alpha_m b_m,\ \sum_i \alpha_i = 1 \Big\}$$
(we just need to relax the positivity constraint on the coefficients $\alpha_i$). But the categorical belief functions $\{b_A : \emptyset \subsetneq A \subsetneq \Theta\}$ are linearly independent (as is straightforward to check), so that $a(b_A, A \in L_1) \cap a(b_A, A \in L_2) \neq \emptyset$ (where $L_1$, $L_2$ are lists of subsets of $\Theta$) if and only if $L_1 \cap L_2 \neq \emptyset$. Here $L_1 = \{A \subseteq \Theta : A \supseteq C_1\}$, $L_2 = \{A \subseteq \Theta : A \supseteq C_2\}$, so that the condition is
$$\{A \subseteq \Theta : A \supseteq C_1\} \cap \{A \subseteq \Theta : A \supseteq C_2\} = \{A \subseteq \Theta : A \supseteq C_1 \cup C_2\} \neq \emptyset.$$
Proof of Theorem 41
We need to find a 1–1 map between the vertices of any two maximal simplices $Cl(b_A, A \ni x)$, $Cl(b_A : A \ni y)$ of $\mathcal{CS}$ such that corresponding sides are congruent. We first rewrite the related collections of events as
$$\{A \subseteq \Theta : A \ni x\} = \{A \subseteq \Theta : A = B \cup \{x\},\ B \subseteq \{x\}^c\}, \qquad \{A \subseteq \Theta : A \ni y\} = \{A \subseteq \Theta : A = B \cup \{y\},\ B \subseteq \{y\}^c\}. \tag{9.8}$$
Let us then define the following map between events of the two collections (9.8):
$$\{A \subseteq \Theta,\ A \ni x\} \to \{A \subseteq \Theta,\ A \ni y\}, \qquad A = B \cup \{x\} \mapsto A' = B' \cup \{y\}, \tag{9.9}$$
where
$$B \mapsto B' = B \quad B \subseteq \{x,y\}^c; \qquad B = C \cup \{y\} \mapsto B' = C \cup \{x\} \quad B \not\subseteq \{x,y\}^c. \tag{9.10}$$
We can prove that (9.9) preserves the length of the segments in the corresponding maximal simplices $Cl(b_A, A \ni x)$, $Cl(b_A, A \ni y)$. We first need to find an explicit expression for $\|b_A - b_{A'}\|$, $A, A' \subseteq \Theta$. Again, each categorical b.f. $b_A$ is such that:
and
$$\|b_A - b_{A'}\| = \sqrt{|\{B \supseteq A,\ B \not\supseteq A'\}|} = \sqrt{|A' \setminus A|}.$$
For each pair of vertices A1 = B1 ∪ {x}, A2 = B2 ∪ {x} in the first component
we can distinguish four cases:
1. B1 ⊆ {x, y}c , B2 ⊆ {x, y}c in which case B10 = B1 , B20 = B2 and
But then |A02 \ A01 | = |A2 \ A1 | so that again kbA02 − bA01 k = kbA2 − bA1 k;
3. B1 6⊆ {x, y}c , B1 = C1 ∪ {y} but B2 ⊆ {x, y}c , which by symmetry of the \
operator yields again kbA02 − bA01 k = kbA2 − bA1 k as in point 2);
4. B1 6⊆ {x, y}c , B1 = C1 ∪ {y}, B2 6⊆ {x, y}c , B2 = C2 ∪ {y} in which case
B10 = C1 ∪ {x}, B20 = C2 ∪ {x}, so that:
In all cases kbA02 − bA01 k = kbA2 − bA1 k, for pairs of segments Cl(A1 , A2 ),
Cl(A01 , A02 ) in the two maximal components associated through the mapping (9.10)
introduced above. For the usual generalization of a well known Euclid’s theorem
this implies that the two simplices are congruent.
Part III

The affine family of probability transforms
10
and can be used as a tool for constructing convex sets of probability distributions.
Uncertainty is modeled as sets of probabilities represented as ‘affine trees’, while ac-
tions (modifications of the uncertain state) are defined as tree manipulators. A small
number of properties of the affine operator are also presented. In a later work [566]
they presented the interval generalization of the probability cross-product operator,
called convex-closure (cc) operator. They analyzed the properties of the cc-operator
relative to manipulations of sets of probabilities, and presented interval versions of
Bayesian propagation algorithms based on it. Probability intervals were represented
in a computationally efficient fashion, by means of a data structure called pcc-tree,
in which branches are annotated with intervals, and nodes with convex sets of prob-
abilities.
The topic of this Chapter is somehow related to Ha’s cc operator, as we deal
here with probability transforms which commute (at least under certain conditions)
with affine combination. We call such group of transforms the ‘affine family’, of
which Smets’ pignistic transform (Section 4.4.2) is the foremost representative. We
introduce two new probability transformations of belief functions, both of them de-
rived from purely geometric considerations, that can be grouped with the pignistic
function in the affine family.
Chapter outline
As usual, we first look for insight by considering the simplest case of a binary frame
(Section 10.1). Each belief function b is associated there with three different geo-
metric entities, namely: the simplex of consistent probabilities P[b] = {p ∈ P :
p(A) ≥ b(A) ∀A ⊂ Θ} (see Chapter 4, Section 3.1.4, and [169]); the line (b, plb )
joining b with the related plausibility function plb ; and the orthogonal complement
P ⊥ of the probabilistic subspace P. These in turn determine three different proba-
bilities associated with b, i.e. the barycenter of P[b] or pignistic function BetP [b],
the intersection probability p[b], and the orthogonal projection π[b] of b onto P. In
the binary case all these Bayesian belief functions coincide.
In Section 10.2 we prove that, even though the (‘dual’) line (b, plb ) is always
orthogonal to P, it does not intersect in general the Bayesian simplex. However, it
does intersect the region of Bayesian normalized sum functions (compare Chapter
6, Section 6.3.3), i.e., the generalizations of belief functions obtained by relaxing
the positivity constraint for masses. This intersection yields a Bayesian n.s.f. ς[b].
We later see, in Section 10.3, that ς[b] is in turn associated with a proper Bayesian
belief function p[b], which we call intersection probability. We provide two differ-
ent interpretations of the way this probability distributes the masses of the focal ele-
ments of b to the elements of Θ, both functions of the difference between plausibility
and belief of singletons, and compare the combinatorial and geometric behavior of
p[b] with those of pignistic function and relative plausibility of singletons.
Section 10.4 concerns the study of the orthogonal projection of b onto the prob-
ability simplex P, i.e., the transform (10.1) associated with the classical d = L2
distance. We show that π[b] always exists and is indeed a probability function. After
deriving the condition under which a belief function b is orthogonal to P we give
two equivalent expressions of the orthogonal projection. We see that π[b] can be
reduced to another probability signalling the distance of b from orthogonality, and
that this ‘orthogonality flag’ can be in turn interpreted as the result of a mass redis-
tribution process analogous to that associated with the pignistic transform.
We prove that, just as BetP [b] does, π[b] commutes with the affine combination
operator, and can therefore be expressed as a convex combination of basis pignistic
functions, making orthogonal projection and pignistic function fellow members of
a common ‘affine family’ of probability transformations.
For sake of completeness, the case of unnormalized belief functions (see Section
10.5) is also discussed. We argue that, while the intersection probability p[b] is not
defined for a generic u.b.f. b, the orthogonal projection π[b] does exist and retains
its properties.
Finally, in Section 10.6 more general conditions under which the three affine
transformations coincide are analysed.
Thus, the point of R2 which represents its (dual) plausibility function is simply (see
Figure 10.1 again):
plb = plb (x)bx + plb (y)by .
As it was first noticed in Chapter 8, belief and plausibility spaces lie in symmetric
locations with respect to the Bayesian simplex P = Cl(bx , by ). Furthermore, each
pair of measures (b, plb ) determines a line orthogonal to P, where b and plb lie on
symmetric positions on the two sides of P itself.
Fig. 10.1. In a binary frame $\Theta_2 = \{x, y\}$ a belief function $b$ and the corresponding plausibility function $pl_b$ are always located in symmetric positions with respect to the segment $\mathcal{P}$ of all the probabilities defined on $\Theta_2$. The associated relative plausibility $\widetilde{pl}_b$ and relative belief $\tilde{b}$ of singletons are just the intersections of the probability simplex $\mathcal{P}$ with the line passing through $pl_b$ and $b_\Theta = [0,0]'$ and the line joining $b$ and $b_\Theta$, respectively. The pignistic function, the orthogonal projection and the intersection probability all coincide with the centre of the segment of probabilities $\mathcal{P}[b]$ which dominate $b$ (in red).
Inherently epistemic notions such as ‘consistency’ and ‘linearity’ (one of the ratio-
nality principles behind the pignistic transform [1223]) seem to be related to geo-
metric properties such as orthogonality. It is natural to wonder whether this is true
in general, or is just an artifact of the binary frame.
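In the binary case the coincidence just mentioned is easy to verify numerically. The sketch below (ours, not the book's code) anticipates the closed form of the intersection probability from Section 10.3 and checks that it agrees with the pignistic function on $\Theta_2$ (the mass values are arbitrary, with $m_b(\Theta) > 0$ assumed so that the transform is well defined):

```python
def binary_pignistic(mx, my, mtheta):
    """BetP[b](x) = m_b(x) + m_b(Θ)/2 on the binary frame."""
    return mx + mtheta / 2, my + mtheta / 2

def binary_intersection_probability(mx, my, mtheta):
    """p[b](x) = m_b(x) + β[b](pl_b(x) - m_b(x)); on Θ2, β[b] = 1/2."""
    plx, ply = 1 - my, 1 - mx
    beta = (1 - mx - my) / ((plx - mx) + (ply - my))
    return mx + beta * (plx - mx), my + beta * (ply - my)

b = (0.5, 0.2, 0.3)   # m_b(x), m_b(y), m_b(Θ) — illustrative values
print(binary_pignistic(*b))                  # (0.65, 0.35)
print(binary_intersection_probability(*b))   # the same point of P
```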
Incidentally, neither the relative plausibility (4.16) nor the relative belief (4.18) of singletons (see Chapter 4, Section 4.4.2, or the original paper [1358]) follows the same pattern. We will consider their behaviour separately in Chapter 11.
In the binary case the plane R2 in which both B and PL are embedded is the space
of all the normalized sum functions (n.s.f.s) on Θ2 (compare Section 6.3.3). The
region P 0 of all the Bayesian n.s.f.s one can define on Θ = {x, y} is the line:
$$P' = \big\{ \varsigma \in \mathbb{R}^2 : m_\varsigma(x) + m_\varsigma(y) = 1 \big\} = a(P),$$
We first note that $P'$ can be written as the translated version of a vector space as follows:
$$a(P) = b_x + span(b_y - b_x,\ \forall y \in \Theta,\ y \neq x),$$
where $span(b_y - b_x)$ denotes the vector space generated by the $n - 1$ difference vectors $b_y - b_x$ ($n = |\Theta|$), and $b_x$ is the categorical belief function focussed on $x$ (Section 6.2). Since a categorical b.f. $b_B$ focussed on a specific subset $B$ is such that
$$b_B(A) = \begin{cases} 1 & A \supseteq B, \\ 0 & \text{otherwise}, \end{cases} \tag{10.3}$$
and vice versa. On the other hand, $[b_y - b_x](A) = 0$ implies $A \supseteq \{y\}, A \supseteq \{x\}$ or $A \not\supseteq \{y\}, A \not\supseteq \{x\}$. In the first case $A^c \not\supseteq \{x\}, \{y\}$, in the second one $A^c \supseteq \{x\}, \{y\}$. Regardless, $[b_y - b_x](A^c) = 0$.
Lemma 9 allows us to prove that, just as in the binary case (see chapter Appendix):
Theorem 42. The line connecting plb and b is orthogonal to the affine space gen-
erated by the probabilistic simplex. Namely:
One might be tempted to conclude that, since a(b, plb ) and P are always orthogonal,
their intersection is the orthogonal projection of b onto P as in the binary case.
Unfortunately, this is not the case for in general they do not intersect each other.
As a matter of fact, $b$ and $pl_b$ belong to an $(N - 2) = (2^n - 2)$-dimensional Euclidean space (recall that we neglect the trivially constant components associated with the empty set $\emptyset$ and the entire frame $\Theta$, Chapter 6), while the simplex $\mathcal{P}$ generates a vector space whose dimension is only $n - 1$. If $n = 2$, then $n - 1 = 1$ and $2^n - 2 = 2$, so that $a(P)$ divides the plane into two half-planes, with $b$ on one side and $pl_b$ on the other (see Figure 10.1 again).
Formally, for a point on the line a(b, plb ) to be a probability measure we need
to find a value of α such that b + α(plb − b) ∈ P. Its components are obviously
b(A) + α[plb (A) − b(A)] for any subset A ⊂ Θ, A 6= Θ, ∅. In particular, when
A = {x} is a singleton:
In order for this point to belong to $\mathcal{P}$, it needs to meet the normalization constraint for singletons, namely
$$\sum_{x \in \Theta} b(x) + \alpha \sum_{x \in \Theta} \big[1 - b(x^c) - b(x)\big] = 1.$$
The latter yields a single candidate value $\beta[b]$ for the line coordinate of the desired intersection, more precisely
$$\alpha = \frac{1 - \sum_{x \in \Theta} b(x)}{\sum_{x \in \Theta} \big[1 - b(x^c) - b(x)\big]} \doteq \beta[b]. \tag{10.6}$$
(where $P'$ denotes once again the set of all Bayesian normalized sum functions in $\mathbb{R}^{N-2}$) is a Bayesian n.s.f., but is not guaranteed to be a Bayesian belief function.
For normalized sum functions, the normalization condition $\sum_{x \in \Theta} m_\varsigma(x) = 1$ implies $\sum_{|A|>1} m_\varsigma(A) = 0$, so that $P'$ can be written as
$$P' = \Big\{ \varsigma = \sum_{A \subset \Theta} m_\varsigma(A)\, b_A \in \mathbb{R}^{N-2} : \sum_{|A|=1} m_\varsigma(A) = 1,\ \sum_{|A|>1} m_\varsigma(A) = 0 \Big\}. \tag{10.8}$$
Theorem 43. The coordinates of $\varsigma[b]$ in the reference frame of the categorical Bayesian belief functions $\{b_x, x \in \Theta\}$ can be expressed in terms of the basic probability assignment $m_b$ of $b$ as follows:
$$m_{\varsigma[b]}(x) = m_b(x) + \beta[b] \sum_{A \supsetneq \{x\}} m_b(A), \tag{10.9}$$
where
$$\beta[b] = \frac{1 - \sum_{x \in \Theta} m_b(x)}{\sum_{x \in \Theta} \big( pl_b(x) - m_b(x) \big)} = \frac{\sum_{|B|>1} m_b(B)}{\sum_{|B|>1} m_b(B)\, |B|}. \tag{10.10}$$
Equation (10.9) ensures that $m_{\varsigma[b]}(x)$ is positive for each $x \in \Theta$. A more symmetrical-looking version of (10.9) can be obtained after realising that
$$\frac{\sum_{|B|=1} m_b(B)}{\sum_{|B|=1} m_b(B)\, |B|} = 1,$$
It is easy to prove that the line a(b, plb ) intersects the actual probability simplex
only for 2-additive belief functions (check the chapter’s Appendix as usual).
Theorem 44. The Bayesian normalized sum function $\varsigma[b]$ is a probability measure, $\varsigma[b] \in \mathcal{P}$, if and only if $b$ is 2-additive, i.e., $m_b(A) = 0$ for $|A| > 2$. In the latter case $pl_b$ is the reflection of $b$ through $\mathcal{P}$.
For 2-additive belief functions, $\varsigma[b]$ is nothing but the mean probability function $\frac{b + pl_b}{2}$. In the general case, however, the reflection of $b$ through $\mathcal{P}$ not only does not coincide with $pl_b$, but is not even a plausibility function [271].
Fig. 10.2. The geometry of the line a(b, plb ) and the relative locations of p[b], ς[b] and π[b]
for a frame of discernment of arbitrary size. Each belief function b and the related plausibil-
ity function plb lie on opposite sides of the hyperplane P 0 of all Bayesian normalised sum
functions, which divides the space RN −2 of all n.s.f.s into two halves. The line a(b, plb )
connecting them always intersects P 0 , but not necessarily a(P) (vertical line). This intersec-
tion ς[b] is naturally associated with a probability p[b] (in general distinct from the orthogonal
projection π[b] of b onto P), having the same components in the base {bx , x ∈ Θ} of a(P).
P is a simplex (a segment in the figure) in a(P): π[b] and p[b] are both “true” probabilities.
where $m_{\varsigma[b]}(x)$ is given by Equation (10.9). Trivially, $p[b]$ is a probability measure, since by definition $m_{p[b]}(A) = 0$ for $|A| > 1$, $m_{p[b]}(x) = m_{\varsigma[b]}(x) \geq 0$ for all $x \in \Theta$, and by construction
$$\sum_{x \in \Theta} m_{p[b]}(x) = \sum_{x \in \Theta} m_{\varsigma[b]}(x) = 1,$$
where
$$k_{\tilde{b}} = \sum_{x \in \Theta} m_b(x), \qquad k_{\widetilde{pl}_b} = \sum_{x \in \Theta} pl_b(x) = \sum_{A \subset \Theta} m_b(A)\, |A|$$
are the total mass (belief) of singletons and the total plausibility of singletons, respectively (equivalently, the normalization factors for the relative belief $\tilde{b}$ and the relative plausibility $\widetilde{pl}_b$). Consequently, the intersection probability $p[b]$ can be rewritten as
$$p[b](x) = m_b(x) + (1 - k_{\tilde{b}})\, \frac{pl_b(x) - m_b(x)}{k_{\widetilde{pl}_b} - k_{\tilde{b}}}. \tag{10.13}$$
In conclusion, both the relative plausibility of singletons $\widetilde{pl}_b$ and the intersection probability $p[b]$ belong to the segment $Cl(R[b], \tilde{b})$ joining the relative belief $\tilde{b}$ and the probability flag $R[b]$ (see Figure 10.3). The convex coordinate of $\widetilde{pl}_b$ in $Cl(R[b], \tilde{b})$ (Equation (10.15)) measures the ratio between the total mass and the total plausibility of singletons, while that of $\tilde{b}$ measures the total mass of singletons $k_{\tilde{b}}$.
However, since $k_{\widetilde{pl}_b} = \sum_{A \subset \Theta} m_b(A)\, |A| \geq 1$, we have that $k_{\tilde{b}}/k_{\widetilde{pl}_b} \leq k_{\tilde{b}}$: hence $p[b]$ is closer to $R[b]$ than the relative plausibility function $\widetilde{pl}_b$ (Figure 10.3 again).
Fig. 10.3. Location in the probability simplex $\mathcal{P}$ of the intersection probability $p[b]$ and of the relative plausibility of singletons $\widetilde{pl}_b$ with respect to the non-Bayesianity flag $R[b]$. They both lie on the segment joining $R[b]$ and the relative belief of singletons $\tilde{b}$, but $\widetilde{pl}_b$ is closer to $\tilde{b}$ than $p[b]$.
Obviously, when $k_{\tilde{b}} = 0$ (the relative belief of singletons $\tilde{b}$ does not exist, for $b$ assigns no mass to singletons) the remaining probability approximations coincide: $p[b] = \widetilde{pl}_b = R[b]$, by Equation (10.13).
Meaning of the ratio β[b], and the pignistic function. To shed more light on $p[b]$ and get an alternative interpretation of the intersection probability, it is useful to compare $p[b]$ as expressed in Equation (10.13) with the pignistic function:
$$BetP[b](x) \doteq \sum_{A \supseteq \{x\}} \frac{m_b(A)}{|A|} = m_b(x) + \sum_{A \supseteq \{x\}, A \neq \{x\}} \frac{m_b(A)}{|A|}.$$
We can notice that in $BetP[b]$ the mass of each event $A$ with $|A| > 1$ is considered separately, and its mass $m_b(A)$ is equally shared among the elements of $A$. In $p[b]$, instead, it is the total mass $\sum_{|A|>1} m_b(A) = 1 - k_{\tilde{b}}$ of non-singletons which is considered, and this total mass is distributed proportionally to their non-Bayesian contribution to each element of $\Theta$.
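The two redistribution strategies can be compared side by side on a small example, as in the following sketch (our own code, with an arbitrary illustrative mass assignment), which computes both the pignistic function and the intersection probability on a ternary frame:

```python
def pignistic(m, frame):
    """BetP[b](x) = Σ_{A ∋ x} m_b(A) / |A|."""
    return {x: sum(v / len(A) for A, v in m.items() if x in A)
            for x in sorted(frame)}

def intersection_probability(m, frame):
    """p[b](x) = m_b(x) + β[b](pl_b(x) - m_b(x)), with β[b] as in Eq. (10.10)."""
    mass = {x: sum(v for A, v in m.items() if A == frozenset({x})) for x in frame}
    pl   = {x: sum(v for A, v in m.items() if x in A) for x in frame}
    beta = (1 - sum(mass.values())) / sum(pl[x] - mass[x] for x in frame)
    return {x: mass[x] + beta * (pl[x] - mass[x]) for x in sorted(frame)}

frame = {'x', 'y', 'z'}
m = {frozenset({'x'}): 0.1, frozenset({'z'}): 0.2,
     frozenset({'x', 'y'}): 0.3, frozenset(frame): 0.4}

print(pignistic(m, frame))                 # per-event, equal sharing of each mass
print(intersection_probability(m, frame))  # a different point of P in general
```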
How should $\beta[b]$ be interpreted, then? If we write $p[b](x)$ as
where $\beta_A = const = \beta[b]$ for $p[b]$, while $\beta_A = \beta[b_A]$ in the case of the pignistic function.
Under what conditions do the intersection probability and the pignistic function coincide? A sufficient condition can easily be given for a special class of belief functions.
Theorem 45. Intersection probability and pignistic function coincide for a given belief function $b$ whenever the focal elements of $b$ have size 1 or $k$ only.
Let us briefly discuss these two interpretations of p[b] in a simple example. Consider
a ternary frame Θ = {x, y, z}, and a belief function b with b.p.a.:
Fig. 10.4. Sign of non-zero masses assigned to events by the functions discussed in the exam-
ple. Top left: b.p.a. of the belief function (10.17), with 5 focal elements. Right: the associated
b.pl.a. assigns positive masses to all events of size 1 and 3, negative ones to all events of size
2. This is the case for the mass assignment associated with ς (10.9) too. Bottom: the intersec-
tion probability p[b] (10.12) retains, among the latter, only the masses assigned to singletons.
Figure 10.4 (top) depicts the subsets of $\Theta$ with non-zero b.p.a. (left) and b.pl.a. (right) induced by the belief function (10.17): dashed lines indicate a negative mass. The total mass that (10.17) assigns to singletons is $k_{\tilde{b}} = 0.1 + 0 + 0.2 = 0.3$; therefore, the line coordinate $\beta[b]$ of the intersection $\varsigma[b]$ of the line $a(b, pl_b)$ with $P'$ is equal to
$$\beta[b] = \frac{1 - k_{\tilde{b}}}{m_b(\{x,y\})\, |\{x,y\}| + m_b(\{x,z\})\, |\{x,z\}| + m_b(\Theta)\, |\Theta|} = \frac{0.7}{1.7}.$$
Theorem 46. Given two arbitrary belief functions $b_1$, $b_2$ defined on the same frame of discernment, the intersection probability of their affine combination $\alpha_1 b_1 + \alpha_2 b_2$ is, for any $\alpha_1 \in [0,1]$, $\alpha_2 = 1 - \alpha_1$:
$$p[\alpha_1 b_1 + \alpha_2 b_2] = \widehat{\alpha_1 D_1}\,\big(\alpha_1\, p[b_1] + \alpha_2\, T[b_1, b_2]\big) + \widehat{\alpha_2 D_2}\,\big(\alpha_1\, T[b_1, b_2] + \alpha_2\, p[b_2]\big), \tag{10.18}$$
where $\widehat{\alpha_i D_i} = \frac{\alpha_i D_i}{\alpha_1 D_1 + \alpha_2 D_2}$ and $T[b_1, b_2]$ is the probability with values
$$T[b_1, b_2](x) \doteq \hat{D}_1\, p[b_2, b_1](x) + \hat{D}_2\, p[b_1, b_2](x), \tag{10.19}$$
with $\hat{D}_i \doteq \frac{D_i}{D_1 + D_2}$ and
$$p[b_2, b_1](x) \doteq m_{b_2}(x) + \beta[b_1]\big(pl_{b_2}(x) - m_{b_2}(x)\big), \qquad p[b_1, b_2](x) \doteq m_{b_1}(x) + \beta[b_2]\big(pl_{b_1}(x) - m_{b_1}(x)\big). \tag{10.20}$$
Geometrically, p[α₁b₁ + α₂b₂] can be constructed as in Figure 10.5, as a point of the simplex Cl(T[b₁, b₂], p[b₁], p[b₂]). The point α₁T[b₁, b₂] + α₂p[b₂] is the intersection of the segment Cl(T, p[b₂]) with the line l₂ passing through α₁p[b₁] + α₂p[b₂] and parallel to Cl(T, p[b₁]). Dually, α₂T[b₁, b₂] + α₁p[b₁] is the intersection of the segment Cl(T, p[b₁]) with the line l₁ passing through α₁p[b₁] + α₂p[b₂] and parallel to Cl(T, p[b₂]). Finally, p[α₁b₁ + α₂b₂] is the point of the segment
$$ Cl\big(\alpha_1 T + \alpha_2 p[b_2],\; \alpha_2 T + \alpha_1 p[b_1]\big) $$
with convex coordinate $\widehat{\alpha_1 D_1}$ (or, equivalently, $\widehat{\alpha_2 D_2}$).
Location of T[b₁, b₂] in the binary case. As an example, let us consider the location of T[b₁, b₂] in the binary belief space B₂ (Figure 10.6), where
$$ \beta[b_1] = \beta[b_2] = \frac{m_{b_i}(\Theta)}{2\, m_{b_i}(\Theta)} = \frac12 $$
for all b₁, b₂ ∈ B₂, and p[b] always commutes with the convex closure operator. Accordingly,
$$ T[b_1, b_2](x) = \frac{m_{b_1}(\Theta)}{m_{b_1}(\Theta) + m_{b_2}(\Theta)}\Big[ m_{b_2}(x) + \frac{m_{b_2}(\Theta)}{2} \Big] + \frac{m_{b_2}(\Theta)}{m_{b_1}(\Theta) + m_{b_2}(\Theta)}\Big[ m_{b_1}(x) + \frac{m_{b_1}(\Theta)}{2} \Big], $$
i.e.,
$$ T[b_1, b_2] = \frac{m_{b_1}(\Theta)}{m_{b_1}(\Theta) + m_{b_2}(\Theta)}\, p[b_2] + \frac{m_{b_2}(\Theta)}{m_{b_1}(\Theta) + m_{b_2}(\Theta)}\, p[b_1]. $$
Looking at Figure 10.6, simple trigonometric considerations show that the segment Cl(p[bᵢ], T[b₁, b₂]) has length $\frac{m_{b_i}(\Theta)}{\sqrt2\,\tan\varphi}$, where φ is the angle between the segments Cl(bᵢ, T) and Cl(p[bᵢ], T). T[b₁, b₂] is then the unique point of P such that the angles $\widehat{b_1\, T\, p[b_1]}$ and $\widehat{b_2\, T\, p[b_2]}$ coincide, i.e., T is the intersection of P with the line passing through bᵢ and the reflection of bⱼ through P. As this reflection (in B₂) is nothing but $pl_{b_j}$,
$$ T[b_1, b_2] = Cl(b_1, pl_{b_2}) \cap \mathcal P = Cl(b_2, pl_{b_1}) \cap \mathcal P. $$
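The binary closed form above can be checked numerically. In the sketch below the mass values are arbitrary, and the weights $\hat D_i$ are taken proportional to $m_{b_i}(\Theta)$, an assumption consistent with the convex combination just derived.

```python
# Numerical check of the binary-case expression for T[b1, b2].
# A belief function on {x, y} is encoded as (m(x), m(y), m(Theta)).
def p_int(m):
    """Intersection probability on a binary frame, where beta[b] = 1/2."""
    mx, my, mT = m
    return (mx + mT / 2, my + mT / 2)

def T(m1, m2):
    """T[b1, b2] from (10.19)-(10.20) with beta = 1/2 and weights ~ m_{b_i}(Theta)."""
    w1 = m1[2] / (m1[2] + m2[2])                      # hat D_1
    w2 = m2[2] / (m1[2] + m2[2])                      # hat D_2
    p21 = (m2[0] + m2[2] / 2, m2[1] + m2[2] / 2)      # p[b2, b1]
    p12 = (m1[0] + m1[2] / 2, m1[1] + m1[2] / 2)      # p[b1, b2]
    return tuple(w1 * a + w2 * b for a, b in zip(p21, p12))

m1, m2 = (0.2, 0.3, 0.5), (0.6, 0.1, 0.3)             # illustrative masses
lhs = T(m1, m2)
w1, w2 = m1[2] / (m1[2] + m2[2]), m2[2] / (m1[2] + m2[2])
rhs = tuple(w1 * a + w2 * b for a, b in zip(p_int(m2), p_int(m1)))
print(lhs, rhs)    # the two expressions coincide
```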
Fig. 10.5. Behaviour of the intersection probability p[b] under affine combination. α₂T + α₁p[b₁] and α₁T + α₂p[b₂] lie at inverted locations on the segments joining T[b₁, b₂] with p[b₁] and p[b₂], respectively: αᵢp[bᵢ] + αⱼT is the intersection of the line Cl(T, p[bᵢ]) with the parallel to Cl(T, p[bⱼ]) passing through α₁p[b₁] + α₂p[b₂]. The quantity p[α₁b₁ + α₂b₂] is finally the point of the segment joining them with convex coordinate $\widehat{\alpha_i D_i}$.
Fig. 10.6. Location of the probability function T [b1 , b2 ] in the binary belief space B2 .
Theorem 47 states the conditions under which p[b] and convex closure (Cl) commute. Geometrically, only when the two lines l₁, l₂ of Figure 10.5 are parallel to a(p[b₁], p[b₂]) (i.e., T[b₁, b₂] ∈ Cl(p[b₁], p[b₂]); compare above) does the desired quantity p[α₁b₁ + α₂b₂] belong to Cl(p[b₁], p[b₂]) (i.e., it is also a convex combination of p[b₁] and p[b₂]).
Theorem 47 reflects the two complementary interpretations of p[b] we gave in terms of β[b] and R[b] (Equations (10.13) and (10.16)): if β[b₁] = β[b₂], both belief functions assign to each singleton the same share of their non-Bayesian contribution; if R[b₁] = R[b₂], the non-Bayesian mass $1 - k_{\tilde b}$ is distributed in the same way to the elements of Θ.
A sufficient condition for the commutativity of p[.] and Cl(.) can be obtained via the following decomposition of β[b]:
$$ \beta[b] = \frac{\sum_{|B|>1} m_b(B)}{\sum_{|B|>1} m_b(B)\,|B|} = \frac{\sum_{k=2}^{n}\sum_{|B|=k} m_b(B)}{\sum_{k=2}^{n} k \sum_{|B|=k} m_b(B)} = \frac{\sigma_2 + \cdots + \sigma_n}{2\sigma_2 + \cdots + n\sigma_n}, \qquad (10.21) $$
where $\sigma_k \doteq \sum_{|B|=k} m_b(B)$.
Theorem 48. If the ratio between the total masses of focal elements of different cardinalities is the same for the two belief functions involved, namely
$$ \frac{\sigma_1^l}{\sigma_1^m} = \frac{\sigma_2^l}{\sigma_2^m} \qquad \forall\, l, m \ge 2 \ \text{ s.t. } \ \sigma_1^m, \sigma_2^m \ne 0, \qquad (10.22) $$
then β[b₁] = β[b₂], and the intersection probability commutes with their convex combination.
Theorem 49. The orthogonal projection π[b] of b onto a(P) can be expressed in terms of the basic probability assignment m_b of b in two equivalent forms⁵:
$$ \pi[b](x) = \sum_{A \supseteq \{x\}} m_b(A)\, 2^{1-|A|} + \sum_{A \subseteq \Theta} m_b(A)\, \frac{1 - |A|\,2^{1-|A|}}{n}, \qquad (10.25) $$
$$ \pi[b](x) = \sum_{A \supseteq \{x\}} m_b(A)\, \frac{1 + |A^c|\,2^{1-|A|}}{n} + \sum_{A \not\supseteq \{x\}} m_b(A)\, \frac{1 - |A|\,2^{1-|A|}}{n}. \qquad (10.26) $$
Equation (10.26) shows that π[b] is indeed a probability, since both $1 + |A^c|2^{1-|A|} \ge 0$ and $1 - |A|2^{1-|A|} \ge 0$ for all |A| = 1, ..., n. This is not at all trivial, as π[b] is the projection of b onto the affine space a(P), and could in principle have assigned negative masses to one or more singletons. This makes the orthogonal projection a valid probability transform.
⁵ The proof is valid for A = Θ, ∅ too; see Section 10.5.
Theorem 49 does not provide any clear intuition about the meaning of π[b] in terms of degrees of belief. In fact, if we manipulate Equation (10.26) we can reduce it to a new Bayesian b.f. strictly related to the pignistic function.
Theorem 50. The orthogonal projection of b onto P can be decomposed as:
A compelling link can be drawn between the orthogonal projection and the pignistic function via the orthogonality flag O[b]. Let us introduce the following two belief functions associated with b:
$$ b_{||} \doteq \frac{1}{k_{||}} \sum_{A \subseteq \Theta} \frac{m_b(A)}{|A|}\, b_A, \qquad b_{2||} \doteq \frac{1}{k_{2||}} \sum_{A \subseteq \Theta} \frac{m_b(A)}{2^{|A|}}\, b_A, $$
where $k_{||}$ and $k_{2||}$ are the normalisation factors needed to make them admissible.
Theorem 51. O[b] is the relative plausibility of singletons of $b_{2||}$; BetP[b] is the relative plausibility of singletons of $b_{||}$.
Fig. 10.7. Redistribution processes associated with the pignistic transformation and the orthogonal projection. In the pignistic transformation (top), the mass of each focal element A is distributed among its elements. In the orthogonal projection (bottom), instead (through the orthogonality flag), the mass of each focal element A is divided among all its subsets B ⊆ A. In both cases, the related relative plausibility of singletons yields a Bayesian belief function.
The two functions $b_{||}$ and $b_{2||}$ represent two different processes acting on b (see Figure 10.7). The first one equally redistributes the mass of each focal element among its singletons (yielding directly the Bayesian belief function BetP[b]). The second one equally redistributes the b.p.a. of each focal element A among its subsets B ⊆ A (∅ and A included). In this second case we get an unnormalised [1238] b.f. $b_U$ with
$$ m_{b_U}(A) = \sum_{B \supseteq A} \frac{m_b(B)}{2^{|B|}}, $$
Example. Let us consider again, as an example, the belief function b on the ternary frame Θ = {x, y, z} considered in Section 10.3.2. To compute the orthogonality flag O[b] we need to apply the redistribution process of Figure 10.7 (bottom) to each focal element of b. In this case their masses are divided among their subsets as follows:
$$ m(\{x,y\}) = 0.3 \ \mapsto\ m'(\{x,y\}) = m'(x) = m'(y) = m'(\emptyset) = 0.3/4 = 0.075, $$
$$ m(\{x,z\}) = 0.1 \ \mapsto\ m'(\{x,z\}) = m'(x) = m'(z) = m'(\emptyset) = 0.1/4 = 0.025, $$
and analogously for the remaining focal elements. The resulting singleton masses sum to the normalisation factor $k_O[b] = m_{b_U}(x) + m_{b_U}(y) + m_{b_U}(z) = 0.4625$. After normalisation we get $O[b] = [0.405, 0.243, 0.351]'$. The orthogonal projection π[b] is finally the convex combination of O[b] and $P = [1/3, 1/3, 1/3]'$ with coefficient $k_O[b]$:
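The redistribution process and the resulting orthogonal projection can be reproduced with the short sketch below. Since the full b.p.a. (10.17) is not restated here, the masses are an assumption consistent with the figures quoted in this example ($k_{\tilde b}=0.3$, $\beta[b]=0.7/1.7$, $k_O[b]=0.4625$); π[b] is computed directly from Equation (10.25) rather than through the convex combination.

```python
# Assumed b.p.a. on Theta = {x, y, z}, consistent with the numbers in the example.
m = {
    frozenset('x'): 0.1, frozenset('y'): 0.0, frozenset('z'): 0.2,
    frozenset('xy'): 0.3, frozenset('xz'): 0.1, frozenset('xyz'): 0.3,
}
Theta = ['x', 'y', 'z']
n = len(Theta)

# Figure 10.7 (bottom): each focal element B spreads its mass over its 2^|B| subsets,
# so a singleton {w} receives m(B)/2^|B| from every B containing w.
m_bU = {w: sum(v / 2**len(B) for B, v in m.items() if w in B) for w in Theta}
k_O = sum(m_bU.values())                       # 0.4625, as in the text
O = {w: m_bU[w] / k_O for w in Theta}          # ~ [0.405, 0.243, 0.351]

# Orthogonal projection, directly from Equation (10.25)
pi = {w: sum(v * 2**(1 - len(A)) for A, v in m.items() if w in A)
        + sum(v * (1 - len(A) * 2**(1 - len(A))) for A, v in m.items()) / n
      for w in Theta}

print(k_O, O)
print(pi, sum(pi.values()))                    # a valid probability distribution
```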
This property can be used to find an alternative expression of the orthogonal pro-
jection as a convex combination of the pignistic functions associated with all the
categorical belief functions.
Lemma 11. The orthogonal projection of a categorical belief function bA is:
Fig. 10.8. The orthogonal projection π[b] and the pignistic function BetP[b] both lie in the simplex whose vertices are the categorical pignistic functions, i.e., the uniform probabilities with support on a single event A. However, as the convex coordinates of π[b] are weighted by a factor $k_O[b_A] = |A|\,2^{1-|A|}$, the orthogonal projection is relatively closer to the vertices associated with lower-size events.
Fig. 10.9. Orthogonal projection and pignistic function for the belief function (10.30) on the
ternary frame Θ3 = {x, y, z}.
As recalled in Chapter 4, there are situations ('open world' scenarios) in which it makes sense to work
with unnormalized belief functions (u.b.f.) [1238], namely belief functions admit-
ting non-zero support mb (∅) 6= 0 for the empty set [1224]. The mass mb (∅) of
the empty set is an indicator of the amount of internal conflict carried by a belief
function b, but can also be interpreted as the chance that the existing frame of dis-
cernment does not exhaust all the possible outcomes of the problem.
Unnormalized b.f.s are naturally associated with vectors with N = 2|Θ| coordi-
nates. A coordinate frame of basis u.b.f.s can be defined as follows:
{bA ∈ RN , ∅ ⊆ A ⊆ Θ},
.
this time including a vector b∅ = [1 0 · · · 0]0 . Note also that in this case bΘ =
[0 · · · 0 1]0 is not the null vector.
It is natural to wonder whether the above definitions and properties of p[b] and π[b] retain their validity. Let us consider again the binary case. We now have to use four coordinates, associated with all the subsets of Θ: ∅, {x}, {y} and Θ itself. Remember that for unnormalised belief functions
$$ b(A) = \sum_{\emptyset \subsetneq B \subseteq A} m_b(B), \qquad A \ne \emptyset, $$
i.e., the contribution of the empty set is not considered when computing the belief value of an event A ≠ ∅.⁶ The four-dimensional vectors corresponding to the basis belief and plausibility functions are therefore:
$$ b_\emptyset = [1,0,0,0]', \quad pl_\emptyset = [0,0,0,0]', \qquad b_x = [0,1,0,1]', \quad pl_x = [0,1,0,1]' = b_x, $$
$$ b_y = [0,0,1,1]', \quad pl_y = [0,0,1,1]' = b_y, \qquad b_\Theta = [0,0,0,1]', \quad pl_\Theta = [0,1,1,1]'. $$
A striking difference from the 'classical' case is that $b(\Theta) = 1 - m_b(\emptyset) = pl_b(\Theta)$, which implies that neither the belief nor the plausibility space is, in general, a subset of the section $\{v \in \mathbb R^N : v_\Theta = 1\}$ of $\mathbb R^N$. In other words, u.b.f.s and u.pl.f.s are not normalised sum functions (n.s.f.s, Section 6.3.3). As a consequence, the line $a(b, pl_b)$ is not guaranteed to intersect the affine space P′ of the Bayesian n.s.f.s.
Consider, for instance, the line connecting $b_\emptyset$ and $pl_\emptyset$ in the binary case:
$$ \alpha\, b_\emptyset + (1 - \alpha)\, pl_\emptyset = \alpha\,[1, 0, 0, 0]', \qquad \alpha \in \mathbb R. $$
No value of α yields a vector of P′, as the Θ-component of $\alpha\,[1,0,0,0]'$ is always zero. Simple calculations show that, in fact, $a(b, pl_b) \cap P' \ne \emptyset$ iff b(∅) = 0 (i.e., b is 'classical') or (trivially) b ∈ P. This is true in the general case.
⁶ In the unnormalised case the notation b is usually reserved for implicability functions, while belief functions are denoted by Bel [1223]. In this book, however, as the notation Bel would be impractical when used for vectors, we denote both belief measures and their vectors by b.
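The following minimal sketch illustrates the observation above on a binary frame: every point of the line $a(b, pl_b)$ of an unnormalised belief function has the same Θ-component, $1 - m_b(\emptyset)$, so the line never meets P′ when $m_b(\emptyset) > 0$. The mass values are arbitrary, and the convention that the ∅-coordinate of the belief vector equals $m_b(\emptyset)$ is assumed from the basis vector $b_\emptyset = [1,0,0,0]'$.

```python
# Unnormalised belief function on {x, y}; coordinates ordered as (empty, x, y, Theta).
m = {'empty': 0.2, 'x': 0.3, 'y': 0.1, 'Theta': 0.4}

# Belief vector (empty-set mass not counted in b(A) for A != empty) and plausibility vector.
b  = [m['empty'], m['x'], m['y'], m['x'] + m['y'] + m['Theta']]
pl = [0.0, m['x'] + m['Theta'], m['y'] + m['Theta'], 1 - m['empty']]

# Every point of the line a(b, pl_b) has Theta-component 1 - m(empty) = 0.8 != 1:
for alpha in (-1.0, 0.0, 0.3, 1.0, 2.5):
    point = [alpha * bb + (1 - alpha) * pp for bb, pp in zip(b, pl)]
    print(alpha, point[3])   # constant, so the line never meets the section {v_Theta = 1}
```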
Proposition 40. The intersection probability is well defined for classical belief functions only.
It is nevertheless interesting to note that the orthogonality results of Section 10.2.1 remain valid, since Lemma 9 does not involve the empty set, while the proof of Theorem 42 holds for the components A = ∅, Θ too (as $(b_y - b_x)(A) = 0$ for A = ∅, Θ). Therefore:
Proposition 41. The dual line $a(b, pl_b)$ is orthogonal to P for every unnormalised belief function b, although $\varsigma[b] = a(b, pl_b) \cap P'$ exists if and only if b is a classical belief function.
Analogously, the orthogonality condition (10.24) is not affected by the mass of the empty set. The orthogonal projection π[b] of a u.b.f. b is then well defined (check the proof of Theorem 49), and it is still given by Equations (10.25), (10.26), with the caveat that the summations on the right-hand side include ∅ as well:
$$ \pi[b](x) = \sum_{A \supseteq \{x\}} m_b(A)\, 2^{1-|A|} + \sum_{\emptyset \subseteq A \subseteq \Theta} m_b(A)\, \frac{1 - |A|\,2^{1-|A|}}{n}, $$
$$ \pi[b](x) = \sum_{A \supseteq \{x\}} m_b(A)\, \frac{1 + |A^c|\,2^{1-|A|}}{n} + \sum_{\emptyset \subseteq A \not\supseteq \{x\}} m_b(A)\, \frac{1 - |A|\,2^{1-|A|}}{n}. $$
This is the case for binary frames, in which all belief functions meet the conditions
of both Proposition 42 and Proposition 43. As a result, p[b] = BetP [b] = π[b] for
all the b.f.s defined on Θ = {x, y} (see Figure 8.1 again).
More stringent conditions can however be formulated in terms of equal distribu-
tion of masses among focal elements.
Theorem 54. If a belief function b is such that its mass is equally distributed among
focal elements of the same size, namely ∀k = 2, ..., n:
then its pignistic and intersection probabilities coincide: BetP [b] = p[b].
Condition (10.32) is sufficient to guarantee the equality of intersection probabil-
ity and orthogonal projection too.
Theorem 55. If a belief function b meets condition (10.32) (i.e., its mass is equally
distributed among focal elements of the same size) then the related orthogonal pro-
jection and intersection probability coincide.
In the special case of a ternary frame, π[b] = BetP[b] [267], so that checking whether p[b] = BetP[b] is equivalent to checking the analogous condition for the pignistic function. One can prove that [267]:
Proposition 44. For belief functions b defined on a ternary frame, the Lp distance ‖p[b] − BetP[b]‖ₚ between the intersection probability and the pignistic function in the probability simplex has three maxima, corresponding to the three b.f.s with basic probability assignments
$$ m_{b_1} = [0, 0, 0, 3 - \sqrt6, 0, 0, \sqrt6 - 2]', \quad m_{b_2} = [0, 0, 0, 0, 3 - \sqrt6, 0, \sqrt6 - 2]', \quad m_{b_3} = [0, 0, 0, 0, 0, 3 - \sqrt6, \sqrt6 - 2]'. $$
Proposition 44 opens the way to a more complete quantitative analysis of the differ-
ences between the intersection probability and the other Bayesian transforms of the
same family.
Proof of Theorem 42
Having denoted, as usual, by $x_A$ the A-th axis of the orthonormal reference frame $\{x_A : \emptyset \subsetneq A \subsetneq \Theta\}$ in $\mathbb R^{N-2}$ (see Chapter 6, Section 6.1), we can write the difference $pl_b - b$ as
$$ pl_b - b = \sum_{\emptyset \subsetneq A \subsetneq \Theta} \big[pl_b(A) - b(A)\big]\, x_A, $$
where
$$ [pl_b - b](A^c) = pl_b(A^c) - b(A^c) = 1 - b(A) - b(A^c) = 1 - b(A^c) - b(A) = pl_b(A) - b(A) = [pl_b - b](A). \qquad (10.33) $$
The scalar product ⟨·, ·⟩ between the vector $pl_b - b$ and any arbitrary basis vector $b_y - b_x$ of a(P) is therefore
$$ \langle pl_b - b,\; b_y - b_x \rangle = \sum_{\emptyset \subsetneq A \subsetneq \Theta} [pl_b - b](A) \cdot [b_y - b_x](A), $$
Proof of Theorem 43
The numerator of Equation (10.6) is trivially $\sum_{|B|>1} m(B)$. On the other hand,
$$ 1 - b(x^c) - b(x) = \sum_{B \subseteq \Theta} m_b(B) - \sum_{B \subseteq x^c} m_b(B) - m_b(x) = \sum_{B \supseteq \{x\},\, B \ne \{x\}} m_b(B), $$
yielding (10.10). Equation (10.9) comes directly from (10.5) once we recall that $b(x) = m_b(x)$, $\varsigma(x) = m_\varsigma(x)$ for all x ∈ Θ.
Proof of Theorem 44
By definition (10.7), ς[b] reads, in terms of the reference frame $\{b_A, A \subseteq \Theta\}$, as
$$ \sum_{A \subseteq \Theta} m_b(A)\, b_A + \beta[b]\Big( \sum_{A \subseteq \Theta} \mu_b(A)\, b_A - \sum_{A \subseteq \Theta} m_b(A)\, b_A \Big) = \sum_{A \subseteq \Theta} b_A \Big( m_b(A) + \beta[b]\big(\mu_b(A) - m_b(A)\big) \Big), $$
since $\mu_b(\cdot)$ is the Moebius inverse of $pl_b(\cdot)$. For ς[b] to be a Bayesian belief function, accordingly, all the components related to non-singleton subsets need to be zero:
$$ m_b(A) + \beta[b]\big(\mu_b(A) - m_b(A)\big) = 0 \qquad \forall A : |A| > 1. $$
After recalling the expression (10.10) of β[b], this condition reduces to
$$ \mu_b(A) \sum_{|B|>1} m_b(B) + m_b(A) \sum_{|B|>1} m_b(B)\,(|B| - 1) = 0 \qquad \forall A : |A| > 1, \qquad (10.34) $$
or, equivalently,
$$ \big[m_b(A) + \mu_b(A)\big]\, M_1[b] + m_b(A)\, M_2[b] = 0 \qquad \forall A : |A| > 1, \qquad (10.35) $$
after defining $M_1[b] \doteq \sum_{|B|>1} m_b(B)$ and $M_2[b] \doteq \sum_{|B|>2} m_b(B)\,(|B| - 2)$. Clearly, in that case (since $m_b(B) = 0$ for all $B \supsetneq A$, |B| > 2) $\mu_b(A) + m_b(A) = 0$, and the constraint is again met.
Finally, as the coordinate β[b] of ς[b] on the line $a(b, pl_b)$ can be rewritten as a function of $M_1[b]$ and $M_2[b]$ as
$$ \beta[b] = \frac{M_1[b]}{M_2[b] + 2 M_1[b]}, \qquad (10.36) $$
if $M_2 = 0$ then β[b] = 1/2 and $\varsigma[b] = \frac{b + pl_b}{2}$.
Proof of Theorem 46
Proof of Theorem 47
$$ T[b_1, b_2]\,(D_1 + D_2) = p[b_1]\, D_2 + p[b_2]\, D_1 \quad \equiv \quad T[b_1, b_2] = \hat D_1\, p[b_2] + \hat D_2\, p[b_1], $$
as $\frac{\alpha_1 \alpha_2}{\alpha_1 D_1 + \alpha_2 D_2}$ is always non-zero in non-trivial cases. This is equivalent to (after replacing the expressions (10.16) for p[b] and (10.19) for T[b₁, b₂])
Obviously this is true iff β[b₁] = β[b₂] or the second factor is zero, i.e.
Proof of Theorem 48
$$ \big(2\sigma_2^2 + \cdots + n\sigma_2^n\big)\big(\sigma_1^2 + \cdots + \sigma_1^n\big) = \big(2\sigma_1^2 + \cdots + n\sigma_1^n\big)\big(\sigma_2^2 + \cdots + \sigma_2^n\big). $$
Let us assume that there exists a cardinality k such that $\sigma_1^k \ne 0 \ne \sigma_2^k$. We can then divide the two sides by $\sigma_1^k$ and $\sigma_2^k$, obtaining
$$ \Big(2\frac{\sigma_2^2}{\sigma_2^k} + \cdots + k + \cdots + n\frac{\sigma_2^n}{\sigma_2^k}\Big)\Big(\frac{\sigma_1^2}{\sigma_1^k} + \cdots + 1 + \cdots + \frac{\sigma_1^n}{\sigma_1^k}\Big) = \Big(2\frac{\sigma_1^2}{\sigma_1^k} + \cdots + k + \cdots + n\frac{\sigma_1^n}{\sigma_1^k}\Big)\Big(\frac{\sigma_2^2}{\sigma_2^k} + \cdots + 1 + \cdots + \frac{\sigma_2^n}{\sigma_2^k}\Big). $$
Therefore, if $\sigma_1^j/\sigma_1^k = \sigma_2^j/\sigma_2^k$ for all j ≠ k, the condition β[b₁] = β[b₂] is met. But this is equivalent to (10.22).
Proof of Lemma 10
When the vector v in (10.23) is a belief function ($v_A = b(A)$) we have that
$$ \sum_{A \supseteq \{y\},\, A \not\supseteq \{x\}} b(A) = \sum_{A \supseteq \{y\},\, A \not\supseteq \{x\}} \sum_{B \subseteq A} m_b(B) = \sum_{B \subseteq \{x\}^c} m_b(B)\, 2^{\,n - 1 - |B \cup \{y\}|}. $$
Now, the sets B ⊆ {x, y}ᶜ appear in both summations with the same coefficient (since |B ∪ {x}| = |B ∪ {y}| = |B| + 1). After simplifying the common factor $2^{n-2}$ we get (10.24).
Proof of Theorem 49
Finding the orthogonal projection π[b] of b onto a(P) is equivalent to imposing the condition $\langle \pi[b] - b, b_y - b_x\rangle = 0$ for all y ≠ x. Replacing the masses of π − b, namely $\pi(x) - m_b(x)$ for x ∈ Θ and $-m_b(A)$ for |A| > 1, into Equation (10.24) yields, after extracting the singletons x from the summation, the following system of equations:
$$ \pi(y) = \pi(x) + \sum_{A \supseteq \{y\},\, A \not\supseteq \{x\},\, |A|>1} m_b(A)\, 2^{1-|A|} + m_b(y) - m_b(x) - \sum_{A \supseteq \{x\},\, A \not\supseteq \{y\},\, |A|>1} m_b(A)\, 2^{1-|A|} \quad \forall y \ne x, \qquad \sum_{y \in \Theta} \pi(y) = 1. \qquad (10.39) $$
Summing the first equation over all y ≠ x and using the normalisation constraint, the second addendum simplifies to
$$ \sum_{y \ne x}\ \sum_{A \supseteq \{y\},\, A \not\supseteq \{x\},\, |A|>1} m_b(A)\, 2^{1-|A|} = \sum_{A \not\supseteq \{x\},\, |A|>1} m_b(A)\, |A|\, 2^{1-|A|}, $$
as all the events A not containing x do contain some y ≠ x, and are counted |A| times (i.e., once for each element they contain). As for the last addendum, instead,
$$ \sum_{y \ne x}\ \sum_{A \supsetneq \{x\},\, A \not\supseteq \{y\}} m_b(A)\, 2^{1-|A|} = \sum_{A \supseteq \{x\},\, 1 < |A| < n} m_b(A)\, 2^{1-|A|}\,(n - |A|) = \sum_{A \supsetneq \{x\}} m_b(A)\, 2^{1-|A|}\,(n - |A|). $$
Proof of Theorem 50
i.e., $k_O[b]$ is the normalisation factor for $\bar O[b]$, the function (10.27) is a Bayesian belief function, and we can write, as desired (since $P(x) = 1/n$),
$$ \pi[b] = (1 - k_O[b])\, P + k_O[b]\, O[b]. $$
Proof of Theorem 51
$$ pl_{b_{||}}(x) = \sum_{A \supseteq \{x\}} m_{b_{||}}(A) = \frac{1}{k_{||}} \sum_{A \supseteq \{x\}} \frac{m_b(A)}{|A|} = \frac{1}{k_{||}}\, BetP[b](x), $$
and since $\sum_x BetP[b](x) = 1$, $\widetilde{pl}_{b_{||}}(x) = BetP[b](x)$.
Proof of Theorem 52
Recalling that $k_O[b] = \sum_{A \subseteq \Theta} m_b(A)\,|A|\,2^{1-|A|}$ and $\bar O[b](x) = \sum_{A \supseteq \{x\}} m_b(A)\, 2^{1-|A|}$, we have
$$ k_O[\alpha_1 b_1 + \alpha_2 b_2] = \sum_{A \subseteq \Theta} \big(\alpha_1 m_{b_1}(A) + \alpha_2 m_{b_2}(A)\big)\,|A|\,2^{1-|A|} = \alpha_1 k_O[b_1] + \alpha_2 k_O[b_2], $$
$$ \bar O[\alpha_1 b_1 + \alpha_2 b_2](x) = \sum_{A \supseteq \{x\}} \big(\alpha_1 m_{b_1}(A) + \alpha_2 m_{b_2}(A)\big)\, 2^{1-|A|} = \alpha_1 \bar O[b_1](x) + \alpha_2 \bar O[b_2](x), $$
Proof of Theorem 53
By Theorem 52,
$$ \pi[b] = \pi\Big[ \sum_{A \subseteq \Theta} m_b(A)\, b_A \Big] = \sum_{A \subseteq \Theta} m_b(A)\, \pi[b_A] $$
Proof of Theorem 54
Proof of Theorem 55
The orthogonal projection of a belief function b onto the probability simplex P has the following expression ([267]; Equation (10.26)):
$$ \pi[b](x) = \sum_{A \supseteq \{x\}} m_b(A)\, \frac{1 + |A^c|\,2^{1-|A|}}{n} + \sum_{A \not\supseteq \{x\}} m_b(A)\, \frac{1 - |A|\,2^{1-|A|}}{n}, $$
where again $\sum_{A \supseteq \{x\},\, |A|=k} m_b(A) = \sigma^k\, k/n$, while
$$ \sum_{A \not\supseteq \{x\},\, |A|=k} m_b(A) = \sigma^k\, \frac{\binom{n-1}{k}}{\binom{n}{k}} = \sigma^k\, \frac{(n-1)!}{k!\,(n-k-1)!}\cdot\frac{k!\,(n-k)!}{n!} = \sigma^k\, \frac{n-k}{n}, $$
i.e., the value (10.41) of the intersection probability under the same assumptions.
11
The epistemic family of probability transforms
We have seen in Chapter 4 that a decision-based approach to probability transformation is the foundation of Smets' 'Transferable Belief Model' [1231, 1276]. In the TBM, Smets abandons all notions of multivalued mappings to define belief directly in terms of basic belief assignments (the 'credal' level), while decisions are made via the pignistic probability (4.15),
$$ BetP[b](x) = \sum_{A \supseteq \{x\}} \frac{m_b(A)}{|A|}, $$
combination, and are therefore considered by some scholars to be more consistent with the original Dempster-Shafer framework. We show that the relative plausibility and relative belief transforms belong to this group, which we call the 'epistemic' family of probability transforms.
Chapter content
$$ \begin{array}{ccc} b & \leftrightarrow & pl_b \\ \widetilde{pl}_b & \leftrightarrow & \tilde b \\ b \oplus p = \widetilde{pl}_b \oplus p \ \ \forall p & \leftrightarrow & pl_b \oplus p = \tilde b \oplus p \ \ \forall p \\ \widetilde{pl}[b_1 \oplus b_2] = \widetilde{pl}[b_1] \oplus \widetilde{pl}[b_2] & \leftrightarrow & \tilde b[pl_{b_1} \oplus pl_{b_2}] = \tilde b[pl_{b_1}] \oplus \tilde b[pl_{b_2}]. \end{array} $$
stressing the issue of its applicability (Section 11.4). Even though this situation is 'singular' (in the sense that it excludes most belief and probability measures, Section 11.4.1), in practice the situation in which the mass of all singletons is nil is not so uncommon. However, in Section 11.4.2 we point out that relative belief is only one member of a class of relative mass transformations, which can be interpreted as low-cost proxies for both the plausibility and pignistic transforms (Section 11.4.3). We discuss their applicability as approximate transformations in two significant scenarios.
The second part of the Chapter is devoted to the study of the geometry of epis-
temic transforms, in both the space of all pseudo belief functions (Section 11.5), in
which the belief space is embedded, and the probability simplex (Section 11.6).
Indeed, the geometry of relative belief and plausibility can be reduced to that of
two specific pseudo belief functions called ‘plausibility of singletons’ (11.14) and
‘belief of singletons’ (11.16), which are introduced in Sections (11.5.1) and (11.5.2)
respectively. Their geometry can be described in terms of three planes (11.5.3) and
angles (11.5.4) in the belief space. Such angles are, in turn, related to a probability
distribution which measures the relative uncertainty on the probabilities of single-
tons determined by b, and can be considered the third member of the epistemic family of transformations. As b̃ does not exist when $\sum_x m_b(x) = 0$, this singular case needs to be discussed separately (Section 11.5.5).
Several examples illustrate the relation between the geometry of the involved func-
tions and their properties in terms of degrees of belief.
As probability transforms map belief functions onto probability distributions, it
makes sense to study their behavior in the simplex of all probabilities as well. We
will get some insight on this in Section 11.6, at least in the case study of a frame of
size 3.
Finally, as a step towards a complete understanding of the probability transfor-
mation problem, we discuss (Section 11.7) what we learned about the relationship
between the affine and epistemic families of probability transformations. Inspired
by the binary case study, we provide sufficient conditions under which all trans-
forms coincide, in terms of equal distribution of masses and equal contribution to
the plausibility of the singletons.
As we well know by now, the original semantics of belief functions derive from
Dempster's analysis of the effect of multivalued mappings Γ : Ω → 2^Θ, x ∈ Ω ↦ Γ(x) ⊆ Θ, acting on evidence available in the form of a probability distribution on the 'top' domain Ω and mapping it onto the 'bottom' decision set Θ (Section ??). As such, belief values are probabilities of events implying other events.
In some of his papers [346], however, Dempster himself claimed that the mass
mb (A) associated with a non-singleton event A ⊆ Θ could be understood as a ‘float-
ing probability mass’ which could not be attached to any particular singleton event
x ∈ A because of the lack of precision of the (multivalued) operator that quantifies our knowledge via the mass function. This has given rise to a popular but controversial interpretation of belief functions as coherent sets of probabilities determined by
sets of lower and upper bounds on their probability values (Section 3.1.4).
As Shafer admits in [?], there is a sense in which a single belief function can indeed
be interpreted as a consistent system of probability bounds. However, the issue with
of mass from focal elements to singletons, as the basic probability of the same higher-cardinality event is assigned to different singletons;
– the obtained plausibility values pl_b(x) are nevertheless normalised to yield a formally admissible probability distribution.
Similarly, for the relative belief of singletons (4.18):
– for each singleton x ∈ Θ a mass reassignment strategy is selected in which only the mass of {x} itself is re-assigned to x, yielding {b(x) = m_b(x), x ∈ Θ};
– once again, this scenario does not correspond to a single valid redistribution process, as the mass of the higher-size focal elements is not assigned to any singleton;
– the obtained values b(x) are nevertheless normalised to produce a valid probability (see the sketch below).
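The sketch below illustrates the two normalisation-based constructions just described on a small frame. The mass values are illustrative only and not taken from the text.

```python
# Relative plausibility and relative belief of singletons for an illustrative b.p.a.
m = {
    frozenset({'x'}): 0.2, frozenset({'y'}): 0.1,
    frozenset({'x', 'y'}): 0.4, frozenset({'x', 'y', 'z'}): 0.3,
}
Theta = {'x', 'y', 'z'}

pl = {w: sum(v for A, v in m.items() if w in A) for w in Theta}   # pl_b(x)
bel = {w: m.get(frozenset({w}), 0.0) for w in Theta}              # m_b(x)

# Both transforms normalise the singleton values above.
k_pl, k_b = sum(pl.values()), sum(bel.values())
pl_tilde = {w: pl[w] / k_pl for w in Theta}
b_tilde = {w: bel[w] / k_b for w in Theta} if k_b > 0 else None   # undefined if no singleton mass

print(pl_tilde)
print(b_tilde)
```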
The fact that both such probability transforms come from jointly assuming a
number of incompatible redistribution processes is reflected by the fact that the re-
sulting probability distributions are not guaranteed to belong to the set of probabili-
ties (3.10) consistent with b.
Theorem 56. The relative belief of singletons of a belief function b is not always
consistent with b.
Theorem 57. The relative plausibility of singletons of a belief function b is not al-
ways consistent with b.
Clearly this is not a good representative of the set of probabilities consistent with the
above belief function, as it does not contemplate at all the chance the heirs x2 , ..., xn
have to gain a remarkable amount of money.
Indeed, according to Theorem 56, (11.3) is not at all consistent with (11.2).
The relative belief of singletons meets similar dual properties. Their study, however, requires extending the analysis to normalised sum functions (also called 'pseudo belief functions'; cf. Section 6.3.3).
³ The original statements from [196] have been reformulated according to the notation used in this book.
A direct consequence of the duality between belief and plausibility measures is the existence of a striking symmetry between the (relative) plausibility and belief transforms. A formal proof of this symmetry is based on the following interesting property of the basic plausibility assignment $\mu_b$ (8.1) [260].
Lemma 12. $\sum_{A \supseteq \{x\}} \mu_b(A) = m_b(x)$.
Theorem 58. Given a pair of belief/plausibility functions b, $pl_b$ : 2^Θ → [0, 1], the relative belief transform of the belief function b coincides with the plausibility transform of the associated plausibility function $pl_b$ (interpreted as a pseudo belief function):
$$ \tilde b[b] = \widetilde{pl}[pl_b]. $$
The symmetry between the relative plausibility and the relative belief of singletons is broken by the fact that the latter is not defined for belief functions with no singleton focal sets. Since b̃ is itself an instance of a relative plausibility (of a plausibility function $pl_b$), and $\widetilde{pl}_b$ always exists, this fact seems to contradict Theorem 58.
This seeming paradox can be explained by the combinatorial nature of belief, plausibility and commonality functions. As we proved in Chapter 8 [260], while belief measures are sum functions of the form $b(A) = \sum_{B \subseteq A} m(B)$ whose Moebius transform m is both normalised and non-negative, plausibility measures are sum functions whose Moebius transform μ is not necessarily non-negative (commonality functions are not even normalised). As a consequence, the quantity
$$ \sum_x pl_{pl_b}(x) = \sum_x \sum_{A \supseteq \{x\}} \mu_b(A) = \sum_{A \subseteq \Theta} \mu_b(A)\,|A| $$
can be equal to zero, in which case $\widetilde{pl}_{pl_b} = \tilde b$ does not exist.
The duality between b̃ and $\widetilde{pl}_b$ (albeit to some extent imperfect) extends to the two transformations' behaviour with respect to Dempster's rule of combination (2.6).
We have seen in Chapter 7, Section 7.1, that the orthogonal sum can be naturally extended to a pair ς₁, ς₂ of pseudo belief functions (p.b.f.s) [265], by simply applying (2.6) to their Moebius inverses $m_{\varsigma_1}$, $m_{\varsigma_2}$.
Proposition 47. Dempster's rule, defined as in Equation (2.6), when applied to a pair of pseudo belief functions ς₁, ς₂, yields again a pseudo belief function.
We still denote the orthogonal sum of two p.b.f.s ς₁, ς₂ by ς₁ ⊕ ς₂. As plausibility functions are pseudo b.f.s, Dempster's rule can then be formally applied to them too. It is convenient to introduce a dual form of the relative belief operator, mapping a plausibility function to the corresponding relative belief of singletons, b̃ : PL → P, $pl_b \mapsto \tilde b[pl_b]$, where
$$ \tilde b[pl_b](x) \doteq \frac{m_b(x)}{\sum_{y \in \Theta} m_b(y)} \qquad \forall x \in \Theta \qquad (11.5) $$
is defined, as usual, for b.f.s b such that $\sum_y m_b(y) \ne 0$. Indeed, as b and $pl_b$ are in 1-1 correspondence, we can indifferently define an operator mapping a belief function b to its relative belief b̃, or mapping the unique plausibility function $pl_b$ associated with b to b̃.
The following commutativity result follows, as the dual of point 1) of Proposition 45.
Theorem 59. The relative belief operator commutes with Dempster's combination of plausibility functions:
$$ \tilde b[pl_{b_1} \oplus pl_{b_2}] = \tilde b[pl_{b_1}] \oplus \tilde b[pl_{b_2}]. $$
To check the validity of Theorems 59 and 60, let us analyse the two series of probability measures $(\tilde b[pl_b])^n$ and $\tilde b[(pl_b)^n]$.
By applying Dempster's rule to the b.pl.a. (11.8) ($pl_{b^2} = pl_b \oplus pl_b$), we get a new b.pl.a. $\mu_b^2$ with values $\mu_b^2(x) = 4/7$, $\mu_b^2(y) = 8/7$, $\mu_b^2(z) = 4/7$, $\mu_b^2(w) = -1/7$, $\mu_b^2(\{x,y\}) = -4/7$, $\mu_b^2(\{y,z\}) = -4/7$ (see Figure 11.1).
Fig. 11.1. Intersection of focal elements in Dempster's combination of the b.pl.a. (11.8) with itself. Non-zero-mass events for each addendum μ₁ = μ₂ = μ_b correspond to rows/columns of the table, each entry of the table hosting the related intersection.
To compute the corresponding relative belief $\tilde b[pl_{b^2}]$ we first need the plausibility values
$$ pl_{b^2}(\{x,y,z\}) = \mu_b^2(x) + \mu_b^2(y) + \mu_b^2(z) + \mu_b^2(\{x,y\}) + \mu_b^2(\{y,z\}) = 8/7, \qquad pl_{b^2}(\{x,y,w\}) = pl_{b^2}(\{x,z,w\}) = pl_{b^2}(\{y,z,w\}) = 1, $$
which imply (as, by definition, $pl_b(A) \doteq 1 - b(A^c)$): $b^2(w) = -1/7$, $b^2(z) = b^2(y) = b^2(x) = 0$. Therefore $\tilde b[pl_{b^2}] = [0, 0, 0, 1]'$ (representing probability distributions as vectors of the form [p(x), p(y), p(z), p(w)]').
Theorem 59 is confirmed: by (11.7) ({w} being the only singleton with non-zero mass), b̃ = [0, 0, 0, 1]', so that b̃ ⊕ b̃ = [0, 0, 0, 1]' and b̃[.] commutes with $pl_b\,\oplus$.
By combining $pl_{b^2}$ with $pl_b$ one more time we get the b.pl.a.
$$ \mu_b^3(x) = 16/31, \quad \mu_b^3(y) = 32/31, \quad \mu_b^3(z) = 16/31, \quad \mu_b^3(w) = -1/31, \quad \mu_b^3(\{x,y\}) = -16/31, \quad \mu_b^3(\{y,z\}) = -16/31, $$
which corresponds to $pl_{b^3}(\{x,y,z\}) = 32/31$, $pl_{b^3}(\{x,y,w\}) = pl_{b^3}(\{x,z,w\}) = pl_{b^3}(\{y,z,w\}) = 1$. Therefore $b^3(w) = -1/31$, $b^3(z) = b^3(y) = b^3(x) = 0$, and $\tilde b[pl_{b^3}] = [0, 0, 0, 1]'$, which again is equal to b̃ ⊕ b̃ ⊕ b̃, as Theorem 59 guarantees.
The series of basic plausibility assignments $(\mu_b)^n$ clearly converges to
$$ \mu_b^n(x) \to 1/2^+, \quad \mu_b^n(y) \to 1^+, \quad \mu_b^n(z) \to 1/2^+, \quad \mu_b^n(w) \to 0^-, \quad \mu_b^n(\{x,y\}) \to -1/2^-, \quad \mu_b^n(\{y,z\}) \to -1/2^-, $$
associated with the following plausibility values: $\lim_{n\to\infty} pl_{b^n}(\{x,y,z\}) = 1^+$, $pl_{b^n}(\{x,y,w\}) = pl_{b^n}(\{x,z,w\}) = pl_{b^n}(\{y,z,w\}) = 1$ for all n ≥ 1. These correspond to the following values of the belief of singletons: $\lim_{n\to\infty} b^n(w) = 0^-$, $b^n(z) = b^n(y) = b^n(x) = 0$ for all n ≥ 1, so that
$$ \lim_{n\to\infty} \tilde b[pl_{b^n}](w) = \lim_{n\to\infty} \frac{b^n(w)}{b^n(w)} = 1, \qquad \lim_{n\to\infty} \tilde b[pl_{b^n}](x) = \lim_{n\to\infty} \tilde b[pl_{b^n}](y) = \lim_{n\to\infty} \tilde b[pl_{b^n}](z) = \lim_{n\to\infty} \frac{0}{b^n(w)} = 0. $$
A dual of the representation theorem (Proposition 46) for the relative belief transform can also be proven, once we recall the following result on Dempster's sums of affine combinations [265] (cf. Chapter 7, Theorem 15).
Proposition 48. The orthogonal sum $b \oplus \sum_i \alpha_i b_i$, $\sum_i \alpha_i = 1$, of a b.f. b with any⁴ affine combination of belief functions is itself an affine combination of the partial sums b ⊕ bᵢ:
$$ b \oplus \sum_i \alpha_i b_i = \sum_i \gamma_i\, (b \oplus b_i), \qquad (11.9) $$
where $\gamma_i = \frac{\alpha_i\, k(b, b_i)}{\sum_j \alpha_j\, k(b, b_j)}$ and $k(b, b_i)$ is the normalisation factor of the partial Dempster sum b ⊕ bᵢ.
Again, the duality between b̃ and $\widetilde{pl}_b$ suggests that the relative belief of singletons represents the associated plausibility function $pl_b$, rather than the corresponding belief function b: b̃ ⊕ p ≠ b ⊕ p.
Theorem 61. The relative belief of singletons b̃ perfectly represents the corresponding plausibility function $pl_b$ when combined with any probability through the (extended) Dempster's rule:
$$ \tilde b \oplus p = pl_b \oplus p $$
for all Bayesian belief functions p ∈ P.
Theorem 61 can be obtained from Proposition 46 by replacing b with $pl_b$ and $\widetilde{pl}_b$ with b̃, in virtue of their duality.
The main reason for this is that the plausibility function of a sum of two belief functions is not the sum of the associated plausibilities:
$$ \begin{array}{ccc} b & \leftrightarrow & pl_b \\ \widetilde{pl}_b & \leftrightarrow & \tilde b \\ b \oplus p = \widetilde{pl}_b \oplus p \ \ \forall p & \leftrightarrow & pl_b \oplus p = \tilde b \oplus p \ \ \forall p \\ \widetilde{pl}[b_1 \oplus b_2] = \widetilde{pl}[b_1] \oplus \widetilde{pl}[b_2] & \leftrightarrow & \tilde b[pl_{b_1} \oplus pl_{b_2}] = \tilde b[pl_{b_1}] \oplus \tilde b[pl_{b_2}] \\ b \oplus b = b \ \Rightarrow\ \widetilde{pl}[b] \oplus \widetilde{pl}[b] = \widetilde{pl}[b] & \leftrightarrow & pl_b \oplus pl_b = pl_b \ \Rightarrow\ \tilde b[pl_b] \oplus \tilde b[pl_b] = \tilde b[pl_b]. \end{array} $$
Note that, just as Voorbraak’s and Cobb’s results are not valid for all pseudo belief
functions but only for proper b.f.s., the above dual results do not hold for all pseudo
belief functions either, but only for those p.b.f.s which are plausibility functions.
These results bring about a classification of all probability transformations in
two families related to Dempster’s sum and affine combination, respectively.
The notion that there exist two distinct families of probability transformations, each
determined by the operator they commute with, was already implicitly present in
the literature. Smets' linearity axiom [1276], which lies at the foundation of the pignistic transform, obviously corresponds (even though expressed in a somewhat different language) to commutativity with the affine combination of belief functions. To address the criticism this axiom was subject to, Smets later introduced a formal justification based on an expected-utility argument in the presence of conditional evidence [1223]. On the other hand, Cobb and Shenoy argued in favour of commutativity with respect to Dempster's rule, on the basis that the Dempster-Shafer theory of evidence is a coherent framework of which Dempster's rule is an integral part, and that a Dempster-compatible transformation can provide a useful probabilistic semantics for belief functions.
Incidentally, there seems to be a flaw in Smets' argument that the pignistic transform is uniquely determined as the probability transformation which commutes with affine combination: in [267] and Chapter 10 we indeed proved that the orthogonal transform (Section 10.4) also enjoys the same property. Analogously, we have shown here that the plausibility transform is not unique as a probability transformation which commutes with ⊕ (even though, in this latter case, the transformation is applied to different objects).
where
$$ \beta_1 = \frac{\alpha\, k_{pl_1}}{\alpha\, k_{pl_1} + (1-\alpha)\, k_{pl_2}}, \qquad \beta_2 = \frac{(1-\alpha)\, k_{pl_2}}{\alpha\, k_{pl_1} + (1-\alpha)\, k_{pl_2}}. $$
It follows that:
Theorem 62. The relative plausibility operator commutes with convex closure in the belief space: $\widetilde{pl}[Cl(b_1, ..., b_k)] = Cl(\widetilde{pl}[b_1], ..., \widetilde{pl}[b_k])$.
The behaviour of the plausibility transform, in this respect, is similar to that of Dempster's rule (Theorem 6, [265]), supporting the argument that the plausibility transform is indeed naturally associated with the D-S framework.
Let us first consider the set of belief functions for which a relative belief of singletons does not exist. In the binary case Θ = {x, y}, the existence constraint (11.1) implies that the only belief function which does not admit a relative belief of singletons is the vacuous one $b_\Theta$: $m_{b_\Theta}(\Theta) = 1$. Indeed, for the vacuous belief function $m_{b_\Theta}(x) = m_{b_\Theta}(y) = 0$, so that $\sum_x m_{b_\Theta}(x) = 0$ and $\tilde b_\Theta$ does not exist. Symmetrically, the pseudo b.f. $\varsigma = pl_{b_\Theta}$ (for which $pl_{b_\Theta}(x) = pl_{b_\Theta}(y) = 1$) is such that $pl_{pl_{b_\Theta}} = b_\Theta$, so that $\widetilde{pl}_{pl_{b_\Theta}}$ does not exist either.
Figure 11.2 (left) illustrates the geometry of the relative belief operator in the binary case; the dual singular points $b_\Theta$ and $\varsigma = pl_{b_\Theta}$ are highlighted.
Fig. 11.2. Left: the location of the relative belief of singletons $\tilde b = \Big[\frac{m_b(x)}{m_b(x)+m_b(y)}, \frac{m_b(y)}{m_b(x)+m_b(y)}\Big]'$ associated with an arbitrary belief function b on {x, y} is shown. The singular points $b_\Theta = [0,0]'$ and $pl_{b_\Theta} = [1,1]'$ are marked by small circles. Right: the images, under the pignistic function and the relative plausibility, of the subset of belief functions $\{b : \sum_x m_b(x) = 0\}$ span only a proper subset of the probability simplex. This region is shown here in the ternary case Θ = {x, y, z} (the triangle delimited by dashed lines).
The analysis of the binary case shows that the set of belief functions for which
b̃ does not exist is a lower-dimensional subset of the belief space B. To support
this point, we determine here the region spanned by the most common probability
transformations: the plausibility and the pignistic transforms.
Theorem 62 proves that the plausibility transform commutes with convex closure.
As (by Proposition 48, [267]) the pignistic transform (4.15) commutes with affine
combination, we have that BetP also commutes with Cl. To determine the image under both probability transforms of any convex set Cl(b₁, ..., b_k) of belief functions, it is then sufficient to compute the images of its vertices.
The space of all belief functions $\mathcal B \doteq \{b : 2^\Theta \to [0, 1]\}$, in particular, is the convex closure of all the categorical b.f.s $b_A$: $\mathcal B = Cl(b_A, A \subseteq \Theta)$ [244] (cf. Theorem 11). The image of a categorical b.f. $b_A$ (a vertex of B) under either the plausibility or the pignistic transform is
$$ \widetilde{pl}_{b_A}(x) = \frac{\sum_{B \supseteq \{x\}} m_{b_A}(B)}{\sum_{B \subseteq \Theta} m_{b_A}(B)\,|B|} = \begin{cases} 1/|A| & x \in A \\ 0 & \text{otherwise} \end{cases} = \sum_{B \supseteq \{x\}} \frac{m_{b_A}(B)}{|B|} = BetP[b_A](x) \doteq \overline{\mathcal P}_A(x). $$
Pignistic and relative plausibility transforms therefore span the whole probability simplex P. Consider, however, the set of ('singular') b.f.s which assign zero mass to singletons. They live in Cl($b_A$, |A| > 1), as they have the form $b = \sum_{|A|>1} m_b(A)\, b_A$, with $m_b(A) \ge 0$, $\sum_{|A|>1} m_b(A) = 1$. The region of P spanned by their probability transforms is therefore
$$ \widetilde{pl}\big[Cl(b_A, |A| > 1)\big] = Cl\big(\widetilde{pl}_{b_A}, |A| > 1\big) = Cl\big(\overline{\mathcal P}_A, |A| > 1\big). $$
If (11.1) is not met, both probability transforms span only a limited region of the probability simplex. In the case of a ternary frame this yields the triangle:
One may argue that although the ‘singular’ case concerns only a small fraction of
all belief and probability measures, in many practical applications there is a bias towards particular models which are the most exposed to the problem.
For example, uncertainty is often represented using a fuzzy membership function
[725]. If the membership function has only a finite number of values, then it is
equivalent to a belief function whose focal sets are linearly ordered under set inclu-
sion A1 ⊆ · · · ⊆ An = Θ, |Ai | = i, or ‘consonant’ belief function (see Chapter 2,
[1149, 411]). In consonant b.f.s at most one focal element A1 is a singleton, hence
most information is stored in the non-singleton focal elements.
This train of thought leads to the realization that the relative belief transform is
merely one representative of an entire family of probability transformations. Indeed,
it can be thought of as the probability transformation which, given a b.f. b:
1. retains the focal elements of size 1 only, yielding an unnormalised belief function;
2. computes (indifferently) the latter's relative plausibility/pignistic transformation:
$$ \tilde b(x) = \frac{\sum_{A \supseteq \{x\},\, |A|=1} m_b(A)}{\sum_y \sum_{A \supseteq \{y\},\, |A|=1} m_b(A)} = \frac{m_b(x)}{k_{m_b}} = \frac{\sum_{A \supseteq \{x\},\, |A|=1} \frac{m_b(A)}{|A|}}{\sum_y \sum_{A \supseteq \{y\},\, |A|=1} \frac{m_b(A)}{|A|}}, $$
i.e., the very same result. The following natural extension of the relative belief operator is then well defined.
Definition 81. Given any belief function b : 2^Θ → [0, 1] with basic probability assignment $m_b$, we call the relative mass transformation of level s the transform $\tilde M_s[b]$ which maps b to the probability distribution (11.10). We denote by $\tilde m_s$ the output of the relative mass transform of level s.
We denote by m̃s the output of the relative mass transform of level s.
This yields the following convex decomposition of the relative plausibility of sin-
gletons into relative mass probabilities m̃s :
P
˜ (x) = Pplb (x) = P s plb (x; s)
X plb (x; s) X plb (x; s) skb,s
pl b = P = P
plb (y) r rkb,r r rkb,r skb,s r rkb,r
Xy s s
= αs m̃s (x),
s
(11.11)
plb (x;s)
as m̃s (x) = skb,s . The coefficients
skb,s X
αs = P ∝ skb,s = plb (y; s)
r rkb,r y
of the convex combination measure for each level s the total plausibility contribution
of the focal elements of size s.
In the case of the pignistic probability, we get
$$ BetP[b](x) = \sum_{A \supseteq \{x\}} \frac{m_b(A)}{|A|} = \sum_s \frac{1}{s} \sum_{A \supseteq \{x\},\, |A|=s} m_b(A) = \sum_s \frac{1}{s}\, pl_b(x; s) = \sum_s k_{b,s}\, \frac{pl_b(x; s)}{s\, k_{b,s}} = \sum_s k_{b,s}\, \tilde m_s(x), \qquad (11.12) $$
with coefficients $\beta_s = k_{b,s}$ measuring, for each level s, the mass contribution of the focal elements of size s.
to be stored, while all the others can be dropped without further processing. We can think of two natural criteria for such an approximation of $\widetilde{pl}$, BetP via the relative mass transforms:
– (C1) we retain the component s whose coefficient $\alpha_s$ (respectively $\beta_s$) is the largest in the convex decomposition (11.11)/(11.12);
– (C2) we retain the component associated with the minimal-size focal elements.
Clearly, the second criterion delivers the classical relative belief transform whenever $\sum_x m_b(x) \ne 0$. When the mass of singletons is nil, instead, (C2) amounts to a natural extension of the relative belief operator:
$$ \tilde b^{ext}(x) \doteq \frac{\sum_{A \supseteq \{x\}:\, |A| = s_{\min}} m_b(A)}{s_{\min} \sum_{A \subseteq \Theta:\, |A| = s_{\min}} m_b(A)}, \qquad (11.13) $$
where $s_{\min}$ denotes the minimal size of the focal elements of b.
The two approximation criteria favour different aspects of the original belief function. (C1) focuses on the strength of the evidence carried by focal elements of equal size. Note that the optimal (C1) approximations of the plausibility and pignistic transforms are in principle distinct:
$$ \hat s[\widetilde{pl}] = \arg\max_s\ s\, k_{b,s}, \qquad \hat s[BetP] = \arg\max_s\ k_{b,s}. $$
The optimal approximation for the pignistic probability will not necessarily be the best approximation of the relative plausibility of singletons as well. (C2) favours instead the precision of the pieces of evidence involved. Let us compare these two approaches in two simple scenarios.
Fig. 11.3. Left: the original belief function in Scenario 1. Right: corresponding profile of both
relative plausibility of singletons and pignistic probability.
Now, according to criterion (C1), the best approximation (among all the relative mass transforms) of both $\widetilde{pl}_b$ and BetP[b] is obtained by selecting the focal element of size n, i.e., Θ, as the greatest contributor to both the convex sums (11.11) and (11.12). However, it is easy to see that this yields as an approximation the uniform probability p(w) = 1/n, which is the least informative probability distribution. In particular, the fact that the available evidence supports to a limited extent the singletons x, y and z is completely discarded, and no decision is possible.
If, on the other hand, we operate according to criterion (C2), we end up selecting the size-2 focal elements A and B. The resulting approximation is
$$ \tilde m_2(x) \propto m_b(A), \quad \tilde m_2(y) \propto m_b(A) + m_b(B), \quad \tilde m_2(z) \propto m_b(B), \quad \tilde m_2(w) = 0 \ \ \forall w \ne x, y, z. $$
This mass assignment has the same profile as that of $\widetilde{pl}_b$ or BetP[b] (Figure 11.3, right): any decision made according to $\tilde m_2$ will correspond to that made on the basis of $\widetilde{pl}_b$ or BetP[b]. In a decision-making sense, therefore, $\tilde m_2 = \tilde b^{ext}$ is the most correct approximation of both the plausibility and pignistic transforms: we end up making the same decisions, at a (in general) much lower computational cost.
Scenario 2. Consider now a second scenario, involving a belief function with only two focal elements A and B, with |A| > |B| and $m_b(A) \gg m_b(B)$ (Figure 11.4, left). Both the relative plausibility and the pignistic probability have the following values:
$$ \widetilde{pl}_b(w) = BetP(w) \propto m_b(A) \ \ w \in A, \qquad \widetilde{pl}_b(w) = BetP(w) \propto m_b(B) \ \ w \in B. $$
Fig. 11.4. Left: the b.f. of the second scenario. Right: corresponding profile of both the relative plausibility of singletons and the pignistic probability.
decision alternatives, and it is quite difficult to say which one makes more sense.
Should we privilege precision or evidence support?
Some insight on this issue comes from recalling that higher-size focal elements are
expression of ‘epistemic’ uncertainty (in Smets’ terminology), as they come from
missing data/lack of information on the problem at hand. Besides, by their own
nature they allow a lower resolution for decision-making purposes (in the second scenario above, if we trust (C1) we are left uncertain about which of |A| outcomes to pick, while if we adopt (C2) the uncertainty is restricted to |B| outcomes).
In conclusion, it is not irrational, in the case of conflicting evidence, to judge larger-size focal elements 'less reliable' (as carriers of greater ignorance) than more focused focal elements. This leads to a preference for approximation criterion (C2), which ultimately supports the case for the relative belief operator and its natural extension (11.13).
Let us call the plausibility of singletons the pseudo belief function $\overline{pl}_b : 2^\Theta \to [0, 1]$ with Moebius inverse $m_{\overline{pl}_b} : 2^\Theta \to \mathbb R$ given by
$$ m_{\overline{pl}_b}(x) = pl_b(x) \ \ \forall x \in \Theta, \qquad m_{\overline{pl}_b}(\Theta) = 1 - \sum_x pl_b(x) = 1 - k_{pl_b}, \qquad m_{\overline{pl}_b}(A) = 0 \ \ \forall A \subseteq \Theta : |A| \ne 1, n. $$
Then, as $1 - k_{pl_b} \le 0$, $\overline{pl}_b$ is a pseudo belief function (Section 6.1). Note that $\overline{pl}_b$ is instead not a plausibility function. In the belief space, $\overline{pl}_b$ is represented by the vector
$$ \overline{pl}_b = \sum_{x \in \Theta} pl_b(x)\, b_x + (1 - k_{pl_b})\, b_\Theta = \sum_{x \in \Theta} pl_b(x)\, b_x. \qquad (11.14) $$
The geometry of $\widetilde{pl}_b$ depends on that of $\overline{pl}_b$ through Theorem 63.⁵ In the binary case $\overline{pl}_b = pl_b$, and we go back to the situation of Figure 10.1.
⁵ This result, at least in the binary case, appeared in [308] too.
Analogously to what was done for the plausibility of singletons, we can define the belief function ('belief of singletons') $\bar b : 2^\Theta \to [0, 1]$ with basic probability assignment
$$ m_{\bar b}(x) = m_b(x), \qquad m_{\bar b}(\Theta) = 1 - k_{m_b}, \qquad m_{\bar b}(A) = 0 \ \ \forall A \subseteq \Theta : |A| \ne 1, n, $$
where the scalar quantity $k_{m_b} = \sum_{x \in \Theta} m_b(x)$ measures the total mass of singletons. The belief of singletons assigns to Θ all the mass that b gives to non-singletons. In the belief space, $\bar b$ is represented by the vector
$$ \bar b = \sum_{x \in \Theta} m_b(x)\, b_x + (1 - k_{m_b})\, b_\Theta = \sum_{x \in \Theta} m_b(x)\, b_x. \qquad (11.16) $$
The geometry of the relative plausibility and belief of singletons can therefore be reduced to that of $\overline{pl}_b$ and $\bar b$.
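A minimal sketch of the two pseudo belief functions just defined is given below, for an illustrative b.p.a. on a ternary frame (the mass values are not from the text).

```python
# Moebius inverses of the plausibility of singletons and the belief of singletons.
m = {
    frozenset({'x'}): 0.2, frozenset({'y'}): 0.1,
    frozenset({'x', 'y'}): 0.4, frozenset({'x', 'y', 'z'}): 0.3,
}
Theta = frozenset({'x', 'y', 'z'})

pl = {w: sum(v for A, v in m.items() if w in A) for w in Theta}
k_pl = sum(pl.values())                                   # >= 1 in general
k_m = sum(v for A, v in m.items() if len(A) == 1)

# Mass on each singleton, remainder (possibly negative for pl-bar) on Theta.
m_pl_bar = {frozenset({w}): pl[w] for w in Theta}
m_pl_bar[Theta] = 1 - k_pl                                # <= 0: a pseudo belief function
m_b_bar = {frozenset({w}): m.get(frozenset({w}), 0.0) for w in Theta}
m_b_bar[Theta] = 1 - k_m                                  # >= 0: a genuine belief function

print(m_pl_bar)
print(m_b_bar)
```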
As we know, a belief function b and the corresponding plausibility function $pl_b$ have the same coordinates with respect to the vertices $b_A$, $pl_A$ of the belief and the plausibility space, respectively:
$$ b = \sum_{\emptyset \ne A \subseteq \Theta} m_b(A)\, b_A \quad \leftrightarrow \quad pl_b = \sum_{\emptyset \ne A \subseteq \Theta} m_b(A)\, pl_A. $$
Just as the latter form a pair of 'dual' vectors in the respective spaces, the plausibility $\overline{pl}_b$ and belief $\bar b$ of singletons have duals (which we can denote by $\widehat{pl}_b$ and $\hat b$), characterised by having the same coordinates in the plausibility space: $\bar b \leftrightarrow \hat b$, $\overline{pl}_b \leftrightarrow \widehat{pl}_b$. They can be written as
$$ \hat b = \sum_{x \in \Theta} m_b(x)\, pl_x + (1 - k_{m_b})\, pl_\Theta = \bar b + (1 - k_{m_b})\, pl_\Theta, \qquad \widehat{pl}_b = \sum_{x \in \Theta} pl_b(x)\, pl_x + (1 - k_{pl_b})\, pl_\Theta = \overline{pl}_b + (1 - k_{pl_b})\, pl_\Theta. \qquad (11.18) $$
If $k_{m_b} \ne 0$, the geometry of the relative plausibility and belief of singletons can therefore be described in terms of the three planes
$$ a\big(\overline{pl}_b,\, p[b],\, \widehat{pl}_b\big), \qquad a\big(b_\Theta,\, \widetilde{pl}_b,\, pl_\Theta\big), \qquad a\big(b_\Theta,\, \tilde b,\, pl_\Theta\big) $$
(see Figure 11.5), where $\tilde b = \bar b / k_{m_b}$ is the relative belief of singletons. Namely:
Fig. 11.5. Planes and angles describing the geometry of the relative plausibility and belief of singletons, in terms of the plausibility of singletons $\overline{pl}_b$ and the belief of singletons $\bar b$. Geometrically, two lines or three points are sufficient to uniquely determine a plane passing through them. The two lines $a(\bar b, \overline{pl}_b)$ and $a(\hat b, \widehat{pl}_b)$ uniquely determine a plane $a(\bar b, p[b], \hat b)$. Two other planes are uniquely determined by the origins of the belief ($b_\Theta$) and plausibility ($pl_\Theta$) spaces, together with either the relative plausibility of singletons $\widetilde{pl}_b$ or the relative belief of singletons $\tilde b$: $a(b_\Theta, \widetilde{pl}_b, pl_\Theta)$ (top of the diagram) and $a(b_\Theta, \tilde b, pl_\Theta)$ (bottom), respectively. The angles φ₁[b], φ₂[b], φ₃[b] are all independent, as the value of each of them reflects a different property of the original belief function b. The original belief b and plausibility $pl_b$ functions do not appear here for the sake of simplicity; they play a role only through the related plausibility of singletons (11.14) and belief of singletons (11.16).
2. Furthermore, by definition,
$$ \widetilde{pl}_b - b_\Theta = \big(\overline{pl}_b - b_\Theta\big)/k_{pl_b}, \qquad (11.20) $$
$$ \widetilde{pl}_b - pl_\Theta = \big(\widehat{pl}_b - pl_\Theta\big)/k_{pl_b}. \qquad (11.21) $$
By comparing (11.20) and (11.21) we realise that $\widetilde{pl}_b$ has the same affine coordinate on the two lines $a(b_\Theta, \overline{pl}_b)$ and $a(pl_\Theta, \widehat{pl}_b)$, which intersect exactly in $\widetilde{pl}_b$. The functions $b_\Theta$, $pl_\Theta$, $\widetilde{pl}_b$, $\overline{pl}_b$ and $\widehat{pl}_b$ therefore determine another plane, which we can denote by $a(b_\Theta, \widetilde{pl}_b, pl_\Theta)$.
$$ \varphi_1[b] = \widehat{\widetilde{pl}_b\ p[b]\ \overline{pl}_b}, \qquad \varphi_2[b] = \widehat{\bar b\ p[b]\ \widetilde{pl}_b}, \qquad \varphi_3[b] = \widehat{\tilde b\ b_\Theta\ \widetilde{pl}_b} \qquad (11.22) $$
(cf. Figure 11.5 again). Such angles are all independent, and each of them has a distinct interpretation in terms of degrees of belief, as different values of theirs reflect different properties of the belief function b and the associated probability transformations.
Orthogonality condition for φ₁[b]. We know that the dual line $a(b, pl_b)$ is always orthogonal to P (Section 10.2). The line $a(\bar b, \overline{pl}_b)$, though, is not in general orthogonal to the probabilistic subspace. Formally, the simplex P = Cl($b_x$, x ∈ Θ) determines an affine (or vector) space a(P) = a($b_x$, x ∈ Θ). A set of generators for a(P) is formed by the n − 1 vectors $b_y - b_x$, for all y ∈ Θ, y ≠ x, after picking an arbitrary element x ∈ Θ as a reference. The non-orthogonality of $a(\bar b, \overline{pl}_b)$ and a(P) can therefore be expressed by saying that, for at least one of these basis vectors, the scalar product ⟨·⟩ with the difference vector $\overline{pl}_b - \bar b$ (which generates the line $a(\bar b, \overline{pl}_b)$) is non-zero:
$$ \exists\, y \ne x \in \Theta \ \ \text{s.t.} \ \ \big\langle \overline{pl}_b - \bar b,\; b_y - b_x \big\rangle \ne 0. \qquad (11.23) $$
Recall that φ₁[b], as defined in (11.22), is the angle between $a(\bar b, \overline{pl}_b)$ and the specific line $a(\tilde b, \widetilde{pl}_b)$ lying on the probabilistic subspace.
The value R[b](x) indicates how much the uncertainty $pl_b(x) - m_b(x)$ about the probability value of x 'weighs' on the total uncertainty about the probabilities of the singletons. It is then natural to call it the relative uncertainty on the probabilities of singletons. When b is Bayesian, R[b] does not exist.
Corollary 13. The line $a(\bar b, \overline{pl}_b)$ is orthogonal to P iff the relative uncertainty on the probabilities of singletons is the uniform probability: R[b](x) = 1/|Θ| for all x ∈ Θ.
If this holds, the evidence carried by b yields the same uncertainty about the probability value of each singleton. By the definition (10.12) of p[b], this means that the intersection probability re-assigns the mass originally given by b to non-singletons to each singleton on an equal basis.
$$ \cos\big(\pi - \varphi_2[b]\big) = 1 - \frac{\langle \mathbf 1, R[b]\rangle}{\|R[b]\|^2}, \qquad (11.25) $$
where ⟨1, R[b]⟩ denotes the usual scalar product between the unit vector 1 = [1, ..., 1]' and the vector R[b] ∈ $\mathbb R^{N-2}$.
Example. Let us see this by comparing the situations of the two-element and three-element frames. If Θ = {x, y}, we have that $pl_b(x) - m_b(x) = m_b(\Theta) = pl_b(y) - m_b(y)$, and the relative uncertainty function is
$$ R[b] = \frac12 b_x + \frac12 b_y = \overline{\mathcal P} \quad \forall b $$
(where $\overline{\mathcal P}$ denotes the uniform probability on Θ; see Figure 11.6), and $R[b] = \frac12 \mathbf 1 = \frac12 pl_\Theta$. In the binary case, the angle φ₂[b] is zero for all belief functions. As we have learned, $\bar b = b = \widehat{pl}_b$ and $\overline{pl}_b = pl_b = \hat b$, and the geometry of the epistemic family is planar.
On the other hand, if Θ = {x, y, z}, not even the vacuous belief function $b_\Theta$ meets condition 2. In that case $R[b_\Theta] = \overline{\mathcal P} = \frac13 b_x + \frac13 b_y + \frac13 b_z$, and R is still the uniform probability. But $\langle R[b_\Theta], \mathbf 1\rangle = 3$, while
$$ \big\langle R[b_\Theta], R[b_\Theta]\big\rangle = \big\langle \overline{\mathcal P}, \overline{\mathcal P}\big\rangle = \Big\langle \Big[\tfrac13, \tfrac13, \tfrac13, \tfrac23, \tfrac23, \tfrac23\Big]', \Big[\tfrac13, \tfrac13, \tfrac13, \tfrac23, \tfrac23, \tfrac23\Big]'\Big\rangle = \frac{15}{9}. $$
Unifying condition for the epistemic family. The angle φ₃[b] is related to the condition under which the relative plausibility of singletons and the relative belief of singletons coincide. As a matter of fact, the angle is nil iff $\tilde b = \widetilde{pl}_b$, which is equivalent to:
Again, this necessary and sufficient condition for φ₃[b] = 0 can be expressed in terms of the relative uncertainty on the probabilities of singletons, as
i.e., $R[b] = \tilde b$, with R[b] 'squashing' $\widetilde{pl}_b$ onto $\tilde b$ from the outside. In this case the quantities $pl_b$, $\overline{pl}_b$, $\widetilde{pl}_b$, p[b], b, $\bar b$, $\tilde b$ all lie in the same plane.
Fig. 11.6. The angle φ₂[b] is nil for all belief functions on the size-two frame Θ = {x, y}, as R[b] = [1/2, 1/2]' is parallel to $pl_\Theta = \mathbf 1$ for all b.
As a matter of fact, the belief of singletons $\bar b$ still exists even in this case and, by Equation (11.16), $\bar b = b_\Theta$, while $\hat b = pl_\Theta$ by duality. Recall the description in terms of planes we gave in Section 11.5.3. In this case the first two planes coincide,
$$ a(\bar b, p[b], \hat b) = a\big(a(\hat b, \widehat{pl}_b),\, a(\bar b, \overline{pl}_b)\big) = a\big(a(pl_\Theta, \widehat{pl}_b),\, a(b_\Theta, \overline{pl}_b)\big) = a(b_\Theta, \widetilde{pl}_b, pl_\Theta), $$
while the third one, $a(b_\Theta, \tilde b, pl_\Theta)$, simply does not exist. The geometry of the epistemic family reduces to a planar one (see Figure 11.7), which depends only on the angle φ₂[b]. It is remarkable that, in this case,
$$ p[b](x) = m_b(x) + \frac{1 - k_{m_b}}{k_{pl_b} - k_{m_b}}\big(pl_b(x) - m_b(x)\big) = \frac{1}{k_{pl_b}}\, pl_b(x) = \widetilde{pl}_b(x). $$
Theorem 68. If a belief function b does not admit a relative belief of singletons (as b assigns zero mass to all the singletons), then its relative plausibility of singletons and its intersection probability coincide.
Also, in this case the relative uncertainty on the probabilities of singletons coincides with the relative plausibility of singletons too: $R[b] = \widetilde{pl}_b = p[b]$ (see (11.24)).
Fig. 11.7. The planar geometry of the epistemic family in the singular case: $p[b] = \widetilde{pl}_b = R[b]$, $\bar b = b_\Theta$ and $\hat b = pl_\Theta$, with φ₂[b] the only remaining angle.
ing, however, to see how they behave as probability distributions in the probability simplex. We can observe, for instance, that
$$ R[b] = \tilde b + \frac{k_{pl_b}}{k_{pl_b} - k_{m_b}}\big(\widetilde{pl}_b - \tilde b\big). \qquad (11.27) $$
Let us study the situation in a simple example. Consider a belief function b₁ with b.p.a.
$$ m_{b_1}(x) = 0.5, \quad m_{b_1}(y) = 0.1, \quad m_{b_1}(\{x,y\}) = 0.3, \quad m_{b_1}(\{y,z\}) = 0.1. $$
Its relative uncertainty is therefore R[b₁](x) = 3/8, R[b₁](y) = 1/2, R[b₁](z) = 1/8. R[b₁] is plotted as a point of the probability simplex P = Cl($b_x$, $b_y$, $b_z$) in Figure 11.8. Its distance from the uniform probability $\overline{\mathcal P} = [1/3, 1/3, 1/3]'$ in P is
$$ \|\overline{\mathcal P} - R[b_1]\| = \Big[\sum_x \big(1/3 - R[b_1](x)\big)^2\Big]^{1/2} = \Big[\Big(\frac13 - \frac38\Big)^2 + \Big(\frac13 - \frac12\Big)^2 + \Big(\frac13 - \frac18\Big)^2\Big]^{1/2} \approx 0.27. $$
The related intersection probability (as $k_{m_{b_1}} = 0.6$, $k_{pl_{b_1}} = 0.8 + 0.5 + 0.1 = 1.4$, $\beta[b_1] = (1 - 0.6)/(1.4 - 0.6) = 1/2$),
$$ p[b_1](x) = 0.5 + \tfrac12\, 0.3 = 0.65, \qquad p[b_1](y) = 0.1 + \tfrac12\, 0.4 = 0.3, \qquad p[b_1](z) = 0 + \tfrac12\, 0.1 = 0.05, $$
is plotted as a square (the second from the left) on the dotted triangle of Figure 11.8.
A larger uncertainty about the probabilities of singletons is associated with b₂,
$$ m_{b_2}(x) = 0.5, \quad m_{b_2}(y) = 0.1, \quad m_{b_2}(z) = 0, \quad m_{b_2}(\{x,y\}) = 0.4, $$
in which all the higher-size mass is assigned to a single focal element {x, y}. In that case $pl_{b_2}(x) - m_{b_2}(x) = 0.4$, $pl_{b_2}(y) - m_{b_2}(y) = 0.4$, $pl_{b_2}(z) - m_{b_2}(z) = 0$, so that the relative uncertainty on the probabilities of singletons is R[b₂](x) = 1/2, R[b₂](y) = 1/2, R[b₂](z) = 0, with a Euclidean distance from $\overline{\mathcal P}$ equal to $d_2 = [(1/6)^2 + (1/6)^2 + (1/3)^2]^{1/2} = 0.408$. The corresponding intersection probability (as β[b₂] = (1 − 0.6)/0.8 is still 1/2) is the first square from the left on the dotted triangle above:
$$ p[b_2](x) = 0.5 + \tfrac12\, 0.4 = 0.7, \qquad p[b_2](y) = 0.1 + \tfrac12\, 0.4 = 0.3, \qquad p[b_2](z) = 0. $$
If we spread the mass of the non-singletons over two focal elements, we get a third belief function b₃,
$$ m_{b_3}(x) = 0.5, \quad m_{b_3}(y) = 0.1, \quad m_{b_3}(\{x,y\}) = 0.2, \quad m_{b_3}(\{y,z\}) = 0.2, $$
with $pl_{b_3}(x) - m_{b_3}(x) = 0.2$, $pl_{b_3}(y) - m_{b_3}(y) = 0.4$, $pl_{b_3}(z) - m_{b_3}(z) = 0.2$, which corresponds to R[b₃](x) = 1/4, R[b₃](y) = 1/2, R[b₃](z) = 1/4 and a distance from $\overline{\mathcal P}$ of 0.2041. The intersection probability assumes the values $p[b_3](x) = 0.5 + \tfrac12\,0.2 = 0.6$, $p[b_3](y) = 0.1 + \tfrac12\,0.4 = 0.3$, $p[b_3](z) = 0 + \tfrac12\,0.2 = 0.1$.
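The quantities quoted for b₁, b₂ and b₃ can be reproduced with the short sketch below (the distances from the uniform probability come out as approximately 0.27, 0.408 and 0.204).

```python
# Relative uncertainty R[b], intersection probability p[b] and distance of R[b]
# from the uniform probability, for the three belief functions of the example.
import math

def analyse(masses, Theta=('x', 'y', 'z')):
    m = {frozenset(k): v for k, v in masses.items()}
    pl = {w: sum(v for A, v in m.items() if w in A) for w in Theta}
    sing = {w: m.get(frozenset({w}), 0.0) for w in Theta}
    unc = {w: pl[w] - sing[w] for w in Theta}                    # pl_b(x) - m_b(x)
    R = {w: unc[w] / sum(unc.values()) for w in Theta}           # relative uncertainty
    beta = (1 - sum(sing.values())) / sum(unc.values())
    p = {w: sing[w] + beta * unc[w] for w in Theta}              # intersection probability
    dist = math.sqrt(sum((1 / 3 - R[w]) ** 2 for w in Theta))    # distance of R from uniform
    return R, p, dist

print(analyse({('x',): 0.5, ('y',): 0.1, ('x', 'y'): 0.3, ('y', 'z'): 0.1}))   # b1
print(analyse({('x',): 0.5, ('y',): 0.1, ('x', 'y'): 0.4}))                     # b2
print(analyse({('x',): 0.5, ('y',): 0.1, ('x', 'y'): 0.2, ('y', 'z'): 0.2}))    # b3
```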
Assigning a certain mass to the singletons determines a set of belief functions compatible with such a probability assignment. In our example, b₁, b₂ and b₃ all belong to the following such set:
$$ \Big\{ b :\ m_b(x) = 0.5,\ m_b(y) = 0.1,\ m_b(z) = 0,\ \sum_{|A|>1} m_b(A) = 0.4 \Big\}. \qquad (11.28) $$
Fig. 11.8. Locations of the members of the epistemic family in the probability simplex P = Cl($b_x$, $b_y$, $b_z$) for a three-element frame Θ = {x, y, z}. The relative uncertainty on the probabilities of singletons R[b], the relative plausibility of singletons $\widetilde{pl}_b$ and the intersection probability p[b] for the family of belief functions defined by the mass assignment (11.28) lie on the dashed, solid and dotted triangles, respectively. The locations of R[b₁], R[b₂], R[b₃] for the three belief functions b₁, b₂ and b₃ discussed in the example are shown. The relative plausibilities of singletons and the intersection probabilities for the same b.f.s appear on the corresponding triangles in the same order. The relative belief of singletons $\tilde b$ lies on the bottom-left square for all the belief functions of the considered family (11.28).
Let us now turn to the singular case. For each belief function b such that $m_b(x) = m_b(y) = m_b(z) = 0$, the plausibilities of the singletons of a size-3 frame are
$$ pl_b(x) = m_b(\{x,y\}) + m_b(\{x,z\}) + m_b(\Theta) = 1 - m_b(\{y,z\}), $$
$$ pl_b(y) = m_b(\{x,y\}) + m_b(\{y,z\}) + m_b(\Theta) = 1 - m_b(\{x,z\}), $$
$$ pl_b(z) = m_b(\{x,z\}) + m_b(\{y,z\}) + m_b(\Theta) = 1 - m_b(\{x,y\}). $$
Furthermore, by hypothesis $pl_b(w) - m_b(w) = pl_b(w)$ for all w ∈ Θ, so that
$$ \sum_w \big(pl_b(w) - m_b(w)\big) = pl_b(x) + pl_b(y) + pl_b(z) = 2\big(m_b(\{x,y\}) + m_b(\{x,z\}) + m_b(\{y,z\})\big) + 3\, m_b(\Theta) = 2 + m_b(\Theta), $$
and we get
$$ \beta[b] = \frac{1 - \sum_w m_b(w)}{\sum_w \big(pl_b(w) - m_b(w)\big)} = \frac{1}{\sum_w pl_b(w)} = \frac{1}{2 + m_b(\Theta)}. $$
Therefore:
R[b](x) = (pl_b(x) − m_b(x)) / Σ_w (pl_b(w) − m_b(w)) = (1 − m_b({y,z})) / (2 + m_b(Θ)),
R[b](y) = (1 − m_b({x,z})) / (2 + m_b(Θ)),   R[b](z) = (1 − m_b({x,y})) / (2 + m_b(Θ));
p[b](x) = m_b(x) + β[b](pl_b(x) − m_b(x)) = β[b] pl_b(x) = (1 − m_b({y,z})) / (2 + m_b(Θ)),
p[b](y) = (1 − m_b({x,z})) / (2 + m_b(Θ)),   p[b](z) = (1 − m_b({x,y})) / (2 + m_b(Θ));
p̃l_b(x) = pl_b(x) / Σ_w pl_b(w) = (1 − m_b({y,z})) / (2 + m_b(Θ)),
p̃l_b(y) = (1 − m_b({x,z})) / (2 + m_b(Θ)),   p̃l_b(z) = (1 − m_b({x,y})) / (2 + m_b(Θ)).
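A quick numerical sanity check of the singular case is sketched below (assumed illustrative code, not the book's own software): on a ternary frame with no mass on the singletons, R[b], p[b] and p̃l_b all reduce to (1 − m_b({x}^c))/(2 + m_b(Θ)).

```python
# Singular case on Theta = {x, y, z}: all three epistemic transforms coincide.
THETA = ('x', 'y', 'z')

def pl(m, x):                       # plausibility of the singleton {x}
    return sum(v for A, v in m.items() if x in A)

def mass(m, x):                     # mass of the singleton {x}
    return m.get(frozenset({x}), 0.0)

def R(m):                           # relative uncertainty of singletons
    d = {x: pl(m, x) - mass(m, x) for x in THETA}
    return {x: v / sum(d.values()) for x, v in d.items()}

def p_int(m):                       # intersection probability
    k_m, k_pl = sum(mass(m, x) for x in THETA), sum(pl(m, x) for x in THETA)
    beta = (1 - k_m) / (k_pl - k_m)
    return {x: mass(m, x) + beta * (pl(m, x) - mass(m, x)) for x in THETA}

def rel_pl(m):                      # relative plausibility of singletons
    k_pl = sum(pl(m, x) for x in THETA)
    return {x: pl(m, x) / k_pl for x in THETA}

m = {frozenset('xy'): 0.2, frozenset('xz'): 0.3,
     frozenset('yz'): 0.1, frozenset(THETA): 0.4}   # singular: no singleton mass

for x in THETA:
    comp = frozenset(THETA) - {x}
    predicted = (1 - m[comp]) / (2 + m[frozenset(THETA)])
    assert abs(R(m)[x] - predicted) < 1e-12
    assert abs(p_int(m)[x] - predicted) < 1e-12
    assert abs(rel_pl(m)[x] - predicted) < 1e-12
```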
Fig. 11.9. Simplices spanned by R[b] = p[b] = p̃l_b and BetP[b] = π[b] in the probability simplex for the cardinality-3 frame in the singular case m_b(x) = m_b(y) = m_b(z) = 0, for different values of m_b(Θ). The triangle spanned by R[b] = p[b] = p̃l_b (solid lines) coincides with that spanned by BetP[b] = π[b] for all b such that m_b(Θ) = 0. For m_b(Θ) = 1/2, R[b] = p[b] = p̃l_b spans the triangle Cl(p′1, p′2, p′3) (dotted lines), while BetP[b] = π[b] spans the triangle Cl(p″1, p″2, p″3) (dashed lines). For m_b(Θ) = 1 both groups of transformations reduce to a single point, P.
As a reference, for mb (Θ) = 0 the latter is the triangle delimited by the points
p1 , p2 , p3 in Figure 11.9 (solid line). For mb (Θ) = 1 we get a single point: P (the
central black square in the Figure). For m_b(Θ) = 1/2, instead, (11.29) yields
Cl(p′1, p′2, p′3) = Cl([2/5, 2/5, 1/5]′, [2/5, 1/5, 2/5]′, [1/5, 2/5, 2/5]′)
(the dotted triangle in Figure 11.9). For comparison, let us compute the values of Smets' pignistic probability (which in the 3-element case coincides with the orthogonal projection [267]; see Section 10.4.5). We get:
BetP[b](x) = (m_b({x,y}) + m_b({x,z}))/2 + m_b(Θ)/3,
BetP[b](y) = (m_b({x,y}) + m_b({y,z}))/2 + m_b(Θ)/3,
BetP[b](z) = (m_b({x,z}) + m_b({y,z}))/2 + m_b(Θ)/3.
Thus, the simplices spanned by the pignistic function for the same sample values of
mb (Θ) are (Figure 11.9 again): mb (Θ) = 1 → P; mb (Θ) = 0 → Cl(p1 , p2 , p3 );
m_b(Θ) = 1/2 → Cl(p″1, p″2, p″3), where
p″1 = [5/12, 5/12, 1/6]′,   p″2 = [5/12, 1/6, 5/12]′,   p″3 = [1/6, 5/12, 5/12]′
(the vertices of the dashed triangle in the figure). The behavior of the two families
of probability transformations is rather similar, at least in the singular case. In both
cases approximations are allowed to span only a proper subset of the probability
simplex P, stressing the pathological situation of the singular case itself.
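The vertices quoted above can be reproduced numerically. The sketch below (an illustrative assumption, not taken from the book) places the whole non-Θ mass 1/2 on {x, y} and recovers the vertex p′_1 from the relative plausibility of singletons and p″_1 from the pignistic transform.

```python
# Vertices of the two singular-case simplices for m_b(Theta) = 1/2.
THETA = ('x', 'y', 'z')

def betp(m):        # pignistic transform: split each mass equally among elements
    out = {x: 0.0 for x in THETA}
    for A, v in m.items():
        for x in A:
            out[x] += v / len(A)
    return out

def rel_pl(m):      # relative plausibility of singletons (= R[b] = p[b] here)
    pl = {x: sum(v for A, v in m.items() if x in A) for x in THETA}
    tot = sum(pl.values())
    return {x: pl[x] / tot for x in THETA}

m = {frozenset('xy'): 0.5, frozenset(THETA): 0.5}   # singular b.f., m(Theta) = 1/2
print(rel_pl(m))   # x: 0.4,    y: 0.4,    z: 0.2      -> vertex p'_1
print(betp(m))     # x: 0.4167, y: 0.4167, z: 0.1667   -> vertex p''_1 = [5/12, 5/12, 1/6]
```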
Let us first focus on functions of the affine family. In particular, let us consider the
orthogonal projection (10.26) of b onto P [267]
π[b](x) − BetP[b](x) = Σ_{A⊆Θ} m_b(A) (1 − |A| 2^{1−|A|}) / n − Σ_{A⊇{x}} m_b(A) (1 − |A| 2^{1−|A|}) / |A|.    (11.30)
Σ_{A⊋{x}} m_b(A) = Σ_{k=2}^{n} Σ_{|A|=k, A⊃{x}} m_b(A) = m_b(Θ) + Σ_{k=2}^{n−1} pl_b(·; k),    (11.33)
We can summarise our findings by stating that, if focal elements of the same
size equally contribute to the plausibility of each singleton (plb (x; k) = const) the
following consequences on the relation between all probability transformations and
their geometry hold, as a function of the range of values of |A| = k for which the
hypothesis is true:
∀k = 1, ..., n:   b ⊥ P,   p̃l_b = b̃ = R[b] = P = BetP[b] = p[b] = π[b].
Less binding conditions may be harder to formulate; we plan to study them in the near future.
Proof of Theorem 57
Let us pick for simplicity a frame of discernment with just three elements: Θ =
{x1 , x2 , x3 }, and the following b.p.a.:
m_b({x_i}^c) = k/3   ∀i = 1, 2, 3,      m_b({x1, x2}^c) = m_b({x3}) = 1 − k.
In this case, the plausibility of {x1, x2} is obviously pl_b({x1, x2}) = 1 − (1 − k) = k, while the plausibilities of the singletons are pl_b(x1) = pl_b(x2) = 2k/3, pl_b(x3) = 1 − k/3. Therefore Σ_{x∈Θ} pl_b(x) = 1 + k and the relative plausibility values are:
p̃l_b(x1) = p̃l_b(x2) = (2k/3) / (1 + k),   p̃l_b(x3) = (1 − k/3) / (1 + k).
For p̃l_b to be consistent with b we would need
p̃l_b({x1, x2}) = p̃l_b(x1) + p̃l_b(x2) = (4k/3) / (1 + k) ≤ pl_b({x1, x2}) = k,
which happens if and only if k ≥ 1/3. Therefore, for k < 1/3, p̃l_b ∉ P[b].
Proof of Theorem 58
Proof of Theorem 59
Proof of Theorem 60
Let us consider the quantity (b̃[pl_b])^∞ = lim_{n→∞} (b̃[pl_b])^n on the right-hand side. Since (b̃[pl_b])^n(x) = K (b(x))^n (where K is a constant independent of x), and x is the unique most believed state, it follows that:
Proof of Theorem 61
where
ν(A) = μ_b(A) k(p, b_A) / Σ_{B⊆Θ} μ_b(B) k(p, b_B),      p ⊕ b_A = Σ_{x∈A} p(x) b_x / k(p, b_A),
with k(p, b_A) = Σ_{x∈A} p(x).
By replacing these expressions into (11.38) we get:
pl_b ⊕ p = [ Σ_{A⊆Θ} μ_b(A) Σ_{x∈A} p(x) b_x ] / [ Σ_{B⊆Θ} μ_b(B) Σ_{y∈B} p(y) ] = [ Σ_{x∈Θ} p(x) Σ_{A⊇{x}} μ_b(A) b_x ] / [ Σ_{y∈Θ} p(y) Σ_{B⊇{y}} μ_b(B) ] = [ Σ_{x∈Θ} p(x) m_b(x) b_x ] / [ Σ_{y∈Θ} p(y) m_b(y) ],
once again by Lemma 12. But this is exactly b̃ ⊕ p, as a direct application of Demp-
ster’s rule (2.6) shows.
Proof of Lemma 13
We first need to analyze the behavior of the plausibility transform with respect to
affine combination of belief functions. By definition, the plausibility values of the
affine combination αb1 + (1 − α)b2 are:
pl[αb1 + (1 − α)b2](x) = Σ_{A⊇{x}} m_{αb1+(1−α)b2}(A) = Σ_{A⊇{x}} [α m1(A) + (1 − α) m2(A)] = α Σ_{A⊇{x}} m1(A) + (1 − α) Σ_{A⊇{x}} m2(A) = α pl1(x) + (1 − α) pl2(x).
Hence, after denoting by k_{pl_i} = Σ_{y∈Θ} pl_i(y) the total plausibility of the singletons with respect to b_i, the values of the relative plausibility of singletons can be computed as p̃l[αb1 + (1 − α)b2](x) =
Proof of Theorem 62
The proof follows the structure of that of Theorem 3 and Corollary 3 in [265], on
the commutativity of Dempster’s rule and convex closure.
Formally, we need to prove that:
1. whenever b = Σ_k α_k b_k, with α_k ≥ 0 and Σ_k α_k = 1, we have that p̃l[b] = Σ_k β_k p̃l[b_k] for some convex coefficients β_k;
2. whenever p ∈ Cl(p̃l[b_k], k) (i.e., p = Σ_k β_k p̃l[b_k] with β_k ≥ 0, Σ_k β_k = 1), there exists a set of convex coefficients α_k ≥ 0, Σ_k α_k = 1 such that p = p̃l[Σ_k α_k b_k].
Now, condition 1 follows directly from Lemma 13. Condition 2, instead, amounts to proving that there exist α_k ≥ 0, Σ_k α_k = 1 such that:
β_k = α_k k_{pl_k} / Σ_j α_j k_{pl_j}   ∀k,    (11.39)
which is equivalent to
α_k = (β_k / k_{pl_k}) · Σ_j α_j k_{pl_j} ∝ β_k / k_{pl_k}   ∀k,
as Σ_j α_j k_{pl_j} does not depend on k. If we pick α_k = β_k / k_{pl_k} the system (11.39) is met; by further normalization we obtain the desired convex coefficients.
Proof of Theorem 64
By Equation (11.18), b̂ − b = (1 − k_{m_b}) pl_Θ and p̂l_b − pl_b = (1 − k_{pl_b}) pl_Θ. Hence:
β[b](p̂l_b − b̂) + b̂ = β[b][ pl_b + (1 − k_{pl_b}) pl_Θ − b − (1 − k_{m_b}) pl_Θ ] + b + (1 − k_{m_b}) pl_Θ
= β[b][ pl_b − b + (k_{m_b} − k_{pl_b}) pl_Θ ] + b + (1 − k_{m_b}) pl_Θ
= b + β[b](pl_b − b) + pl_Θ [ β[b](k_{m_b} − k_{pl_b}) + 1 − k_{m_b} ].
Proof of Theorem 65
Proof of Corollary 13
As a matter of fact:
Σ_{x∈Θ} Σ_{A⊃{x}, A≠{x}} m_b(A) = Σ_{x∈Θ} (pl_b(x) − m_b(x)) = k_{pl_b} − k_{m_b}.
Replacing this in (11.24) yields R[b] = Σ_{x∈Θ} (1/n) b_x.
Proof of Theorem 66
p̂l_b − p[b] = (p̂l_b − pl_b) + (pl_b − p[b])
= [ pl_b + (1 − k_{pl_b}) pl_Θ − pl_b ] + (k_{pl_b} − 1) R[b]    (11.41)
= (1 − k_{pl_b}) pl_Θ + (k_{pl_b} − 1) R[b] = (k_{pl_b} − 1)(R[b] − pl_Θ).
But now
cos(π − φ2) = ⟨p̂l_b − p[b], pl_b − p[b]⟩ / ( ‖p̂l_b − p[b]‖ ‖pl_b − p[b]‖ ),
where
‖p̂l_b − p[b]‖ = [ ⟨p̂l_b − p[b], p̂l_b − p[b]⟩ ]^{1/2} = (k_{pl_b} − 1) [ ⟨R[b] − pl_Θ, R[b] − pl_Θ⟩ ]^{1/2} = (k_{pl_b} − 1) [ ⟨R[b], R[b]⟩ + ⟨pl_Θ, pl_Θ⟩ − 2⟨R[b], pl_Θ⟩ ]^{1/2}.
We can further simplify this expression by noticing that for all probabilities p ∈ P we have ⟨1, p⟩ = 2^{|{x}^c|} − 1 = 2^{n−1} − 1, while ⟨1, 1⟩ = 2^n − 2, so that ⟨1, 1⟩ − 2⟨p, 1⟩ = 0; since R[b] is a probability, we get (11.25).
Proof of Theorem 67
We make use of (11.42). As φ2[b] = 0 iff cos(π − φ2[b]) = −1, the desired condition is:
−1 = ( ‖R[b]‖² − ⟨1, R[b]⟩ ) / ( ‖R[b]‖ [ ‖R[b]‖² + ⟨1, 1⟩ − 2⟨R[b], 1⟩ ]^{1/2} ),
i.e., after squaring both numerator and denominator:
‖R[b]‖² ( ‖R[b]‖² + ⟨1, 1⟩ − 2⟨R[b], 1⟩ ) = ‖R[b]‖⁴ + ⟨1, R[b]⟩² − 2⟨1, R[b]⟩ ‖R[b]‖².
After erasing the common terms we get that φ2[b] is nil if and only if:
⟨1, R[b]⟩² = ‖R[b]‖² ⟨1, 1⟩.    (11.43)
Condition (11.43) has the form
⟨A, B⟩² = ‖A‖² ‖B‖² cos²(ÂB) = ‖A‖² ‖B‖².
Proof of Lemma 14
Using the form (10.26) of the orthogonal projection we get:
π[b](x) − BetP[b](x) = Σ_{A⊇{x}} m_b(A) [ (1 + |A^c| 2^{1−|A|}) / n − 1/|A| ] + Σ_{A⊉{x}} m_b(A) (1 − |A| 2^{1−|A|}) / n,
but
(1 + |A^c| 2^{1−|A|}) / n − 1/|A| = [ |A| + |A|(n − |A|) 2^{1−|A|} − n ] / (n |A|) = (|A| − n)(1 − |A| 2^{1−|A|}) / (n |A|) = (1/n − 1/|A|)(1 − |A| 2^{1−|A|}),
so that
π[b](x) − BetP[b](x) = Σ_{A⊇{x}} m_b(A) (1 − |A| 2^{1−|A|}) / n · (1 − n/|A|) + Σ_{A⊉{x}} m_b(A) (1 − |A| 2^{1−|A|}) / n,    (11.44)
or equivalently, Equation (11.30).
Proof of Theorem 69
By Equation (11.30) the condition π[b](x) − BetP[b](x) = 0 for all x ∈ Θ reads as:
Σ_{A⊆Θ} m_b(A) (1 − |A| 2^{1−|A|}) / n = Σ_{A⊇{x}} m_b(A) (1 − |A| 2^{1−|A|}) / |A|   ∀x ∈ Θ,
i.e.,
Σ_{A⊉{x}} m_b(A) (1 − |A| 2^{1−|A|}) / n = Σ_{A⊇{x}} m_b(A) (1 − |A| 2^{1−|A|}) (1/|A| − 1/n),
i.e.,
Σ_{A⊉{x}} m_b(A) (1 − |A| 2^{1−|A|}) / n = Σ_{A⊇{x}} m_b(A) (1 − |A| 2^{1−|A|}) (n − |A|) / (|A| n).
Proof of Corollary 14
which is verified since (n/k) · C(n−1, k−1) = C(n, k).
Finally, let us consider Condition 3. Under (11.32) the system of equations (11.45) reduces to a single equation:
Σ_{k=3}^{n−1} (1 − k 2^{1−k}) (n/k) pl_b(·; k) = Σ_{k=3}^{n−1} (1 − k 2^{1−k}) Σ_{|A|=k} m_b(A).
Proof of Theorem 70
we get
p̃l_b(x) = pl_b(x) / k_{pl_b} = Σ_{k=1}^{n} pl_b(x; k) / Σ_{x∈Θ} Σ_{k=1}^{n} pl_b(x; k).
As we have learned in the last two Chapters, probability transforms are a very well studied topic in belief calculus: they are useful as a means to reduce the computational complexity of the framework (Section 4.4), they allow us to reduce decision making with belief functions to the classical utility theory approach (Section 4.5.1), and they are theoretically interesting for understanding the relationship between Bayesian reasoning and belief theory [197].
Less extensively studied is the problem of mapping a belief function to a possibility measure, namely a function Pos : 2^Θ → [0, 1] on Θ such that Pos(∪_i A_i) = sup_i Pos(A_i) for any family {A_i | A_i ∈ 2^Θ, i ∈ I}, where I is an arbitrary index set.
Their dual ‘necessity’ measures are defined as N ec(A) = 1 − P os(Ac ), and (as we
learned in Chapter 9) have as counterparts in the theory of evidence belief functions
whose focal elements are nested [1149] (‘consonant’ b.f.s).
Approximating a belief function by a necessity measure is then equivalent to map-
ping it to a consonant b.f. [?, 682, 680, 61]. As possibilities are completely deter-
mined by their values on the singletons P os(x), x ∈ Θ, they are less computation-
ally expensive than belief functions (indeed, their complexity is linear in the size of
the frame of discernment, just like standard probabilities’), making the approxima-
tion process interesting for many applications.
Furthermore, just as in the case of Bayesian belief functions, the study of possibility
transforms can shed light on the relation between belief and possibility theory.
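As a concrete illustration of the correspondence just recalled, the following sketch (assumed helper code, with hypothetical names such as `is_consonant`) checks that a mass assignment with nested focal elements induces a maxitive possibility measure through its plausibility.

```python
# Consonant belief functions and possibility measures: Pos(A) = max_{x in A} Pos(x).
from itertools import combinations

def is_consonant(m):
    """Focal elements must form a chain under inclusion."""
    focal = [A for A, v in m.items() if v > 0]
    return all(A <= B or B <= A for A, B in combinations(focal, 2))

def possibility(m, A):
    """Pos(A) = pl_b(A): total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)

# Consonant mass assignment on Theta = {x, y, z}, chain {x} < {x,y} < Theta.
m = {frozenset('x'): 0.5, frozenset('xy'): 0.3, frozenset('xyz'): 0.2}
assert is_consonant(m)

Pos = {x: possibility(m, frozenset({x})) for x in 'xyz'}   # contour function
A = frozenset('yz')
assert abs(possibility(m, A) - max(Pos[x] for x in A)) < 1e-12
print(Pos)   # x: 1.0, y: 0.5, z: 0.2
```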
Dubois and Prade [?], in particular, have extensively worked on consonant approx-
imations of belief functions [682, 680], suggesting the notion of ‘outer consonant
approximation’.
Several partial orderings between belief functions have been introduced [1481,
410], in connection with the so-called ‘least commitment principle’. The latter plays
a similar role in the ToE as the principle of maximum entropy does in Bayesian
theory. It postulates that, given a set of basic probability assignments compatible
with a set of constraints, the most appropriate is the least informative (according to
one of those orderings).
In particular, belief functions admit (among others) the following order relation:
b ≤ b0 ≡ ∀A ⊆ Θ, b(A) ≤ b0 (A), called ‘weak inclusion’. It is then possible
to define the outer consonant approximations [?] of a belief function b as those
co.b.f.s co such that co(A) ≤ b(A) ∀A ⊆ Θ. Dubois and Prade’s work has been
later extended by Baroni [61] to capacities. In [257], the author of this Book has
provided a comprehensive description of the geometry of the set of outer consonant
approximations. For each possible maximal chain A1 ⊂ · · · ⊂ An = Θ of focal
elements, i.e., a collection of nested subsets of all possible cardinalities from 1 to
|Θ|, a maximal outer consonant approximation with mass assignment:
can be singled out. The latter mirrors the behavior of the vertices of the credal set of
probabilities dominating a belief function or a 2-alternating capacity [169, 944].
Another interesting approximation has been studied in the context of Smets’
Transferable Belief Model [1276], where the pignistic transform assumes a central
role for decision making. One can then define an ‘isopignistic’ approximation as the
unique consonant belief function whose pignistic probability is identical to that of
the original belief function [409, 415]. Subsequent work has been conducted along
these lines: for instance, the expression of the isopignistic consonant b.f. associated
with a unimodal probability density has been derived in [1222], while in [38], con-
sonant belief functions are constructed from sample data using confidence sets of
pignistic probabilities.
Chapter’s content
The theme of this Chapter is to conduct an exhaustive, analytical study of all the
consonant approximations of belief functions. In the first part we will understand the
geometry of classical outer consonant approximations, while in the second part we
will characterise geometric consonant approximations which minimise appropriate
distances from the original belief function.
We will focus in particular on approximations induced by minimizing L1 , L2
or L∞ distances, in both the belief and the mass space, and in both representations
of the latter. Even though we believe the resulting consonant approximations are
potentially useful in practical applications, our purpose at this stage is not to empirically compare them with existing approaches such as the isopignistic function and outer approximations, but to initiate a theoretical study of the nature of consonant approximations induced by geometric distance minimization, starting with Lp norms as a stepping stone towards a more extensive line of research. Our purpose
is to point out their semantics in terms of degrees of belief, their mutual relationships
and to analytically compare them with existing approximations. What emerges is a
picture in which belief-, mass-, and pignistic-based approximations form distinct
families of approximations with different semantics.
In some cases, improper partial solutions (in the sense that they potentially in-
clude negative mass assignments) can be generated by the Lp minimization process.
The set of approximations, in other words, may fall partly outside the simplex of
proper consonant belief functions for a given desired chain of focal elements. This
situation is not entirely new, as outer approximations themselves include infinitely
many improper solutions. Nevertheless, only the subset of acceptable solutions is
retained. In the case of the present work, the set of all (admissible and not) solutions
is typically much simpler to describe geometrically, in terms of simplices or poly-
topes. Computing the set of proper approximations in all cases requires significant
further effort, which for reasons of clarity and length we reserve for the near future.
Additionally, in this Chapter only ‘normalized’ belief functions (i.e., b.f.s whose
mass of the empty set is nil) are considered. Unnormalized b.f.s, however, play an
important role in the TBM [1238] as the mass of the empty set is an indicator of
conflicting evidence. The analysis of the unnormalized case is also left to future
work for lack of sufficient space here.
We will show that outer consonant approximations form a convex subset of the con-
sonant complex, for every choice of the desired maximal chain C = A1 ⊂ · · · ⊂ An
of focal elements Ai . In particular the set of outer consonant approximations with
chain C, OC [b], is a polytope whose vertices are indexed by all the functions reassigning the mass of each focal element to elements of the chain containing it ('assign-
ment functions’). In particular, the maximal outer approximation is the vertex of this
polytope associated with the permutation of singletons which produces the desired
maximal chain C.
Two sets of results are reported instead for geometric approximations.
As it turns out, partial approximations in the mass space M amount to redis-
tributing in various ways the mass of focal elements outside the desired maximal
chain to elements of the chain itself (compare [256]). In the (N − 1)-dimensional
representation, the L1 (partial) consonant approximations are such that their mass
values are greater than those of the original belief function on the desired maximal
chain. They form a simplex which is entirely admissible, and whose vertices are
obtained by re-assigning all the mass originally outside the desired maximal chain
C to a single focal element of the chain itself. The barycenter of this simplex is the
L2 partial approximation, which redistributes the mass outside the chain to all the
elements of C on an equal basis. The simplex of L1 , M approximations, in addition,
exhibits interesting relations with outer consonant approximations.
When the partial L∞ approximation is unique, it coincides with the L2 approxima-
tion and the barycenter of the set of L1 approximations, and it is obviously admis-
sible. When it is not unique, it is a simplex whose vertices assign to each element
of the chain (but one) the maximal mass outside the chain: this set is in general not
entirely admissible.
The L1 and L2 partial approximations calculated when adopting a (N − 2) section
of M coincide. For each possible neglected component Ā ∈ C they describe all the
vertices of the simplex of L1 , M partial approximations in the (N − 1)-dimensional
representation. In each such section, the L∞ partial approximations form instead a
(partly admissible) region whose size is determined by the largest mass outside the
desired maximal chain.
Finally, the global approximations in the L1 , L2 , L∞ cases span the simplicial com-
ponents of CO whose chains minimize the sum of mass, sum of square masses, and
maximal mass outside the desired maximal chain, respectively.
In the belief space B, all Lp approximations amount to picking different repre-
sentatives from the n lists of belief values:
L_i = { b(A) : A ⊇ A_i, A ⊉ A_{i+1} },   ∀i = 1, ..., n.
Belief functions are defined on a partially ordered set, the power set {A ⊆ Θ}, of
which a maximal chain is a maximal totally ordered subset. Therefore, given two
elements of the chain Ai ⊂ Ai+1 , there are a number of ‘intermediate’ focal ele-
ments A which contain the former but not the latter. This list is uniquely determined
by the desired chain.
Indeed, all partial Lp approximations in the belief space have mass m0 (Ai ) =
f (Li )−f (Li−1 ), where f is a simple function of the belief values in the list, such as
max, average, or median. Classical maximal outer and ‘contour-based’ approxima-
tions can also be expressed in the same way. As they would all reduce to the maximal
outer approximation (12.1) if the power set were totally ordered, all these consonant
approximations can be considered as generalizations of the latter. Sufficient con-
ditions on their admissibility can be given in terms of the (partial) plausibility values
of the singletons.
As for global approximations, in the L∞ case they fall on the component(s) associ-
ated with the maximal plausibility singleton(s). In the other two cases they are, for
now, of more difficult interpretation.
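The 'lists of belief values' mechanism described above can be prototyped in a few lines. The sketch below is an illustrative assumption (helper names such as `consonant_from_lists` are ad hoc): it builds the lists L_i for a chosen chain and sets m′(A_i) = f(L_i) − f(L_{i−1}); choosing f = min appears to reproduce the maximal outer approximation, f = max the contour-based one, and f = average a barycentric variant (compare the admissibility discussion in Section 12.5).

```python
# Consonant approximations in the belief space via representatives of the lists L_i.
from itertools import combinations

THETA = frozenset('xyz')

def subsets(S):
    s = list(S)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def consonant_from_lists(m, C, f):
    """C: maximal chain A_1 < ... < A_n = Theta; returns [m'(A_1), ..., m'(A_n)]."""
    prev, out = 0.0, []
    for i, Ai in enumerate(C):
        nxt = C[i + 1] if i + 1 < len(C) else None
        Li = [belief(m, A) for A in subsets(THETA)
              if Ai <= A and (nxt is None or not (nxt <= A))]
        out.append(f(Li) - prev)
        prev = f(Li)
    return out

m = {frozenset('x'): 0.3, frozenset('y'): 0.2, frozenset('z'): 0.1,
     frozenset('xy'): 0.2, frozenset(THETA): 0.2}
C = [frozenset('x'), frozenset('xy'), THETA]
print(consonant_from_lists(m, C, min))                        # [0.3, 0.4, 0.3]
print(consonant_from_lists(m, C, max))                        # [0.4, 0.3, 0.3]
print(consonant_from_lists(m, C, lambda L: sum(L) / len(L)))  # [0.35, 0.35, 0.3]
```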
Chapter outline
Table 12.1 illustrates the behavior of the different geometric consonant approximations explored here, in terms of multiplicity/admissibility/global solutions.
First (Section 12.1.1) we study the geometry of the polytope of outer consonant
approximations and its vertices.
We then provide the necessary background on the geometric representation of
belief and mass and the geometric approach to the approximation problem (12.2).
After going through the case study of the binary frame (Section 12.3), we ap-
proach the problem in the mass space (Section 12.4). We: analytically compute the
approximations induced by L1 , L2 and L∞ (12.4.1) norms; discuss their interpre-
tation in terms of mass re-assignment and the relationship between the results in
the mass space versus those on its sections (12.4.2); analyze the computability and
admissibility of global approximations (12.4.3); study the relation of the obtained
approximations with classical outer consonant approximations (12.4.4); and finally,
illustrate the results in the significant ternary case (12.4.5).
In the last part of the Chapter we analyse the Lp approximation problem in
the belief space (Section 12.5). Again, we compute the approximations induced by
L1 (12.5.1), L2 (12.5.2) and L∞ (12.5.3) norms, respectively; we propose a com-
prehensive view of all approximations in the belief space via lists of belief values
determined by the desired maximal chain (Section 12.5.4), and draw some compara-
tive conclusions on the behavior of geometric approximations in the belief and mass
space (12.5.6).
To improve readability, as usual all proofs are collected in an Appendix to be
found at the end of the Chapter.
                 multiplicity of partial sol.    admissibility of partial sol.    global solution(s)
L1, M            simplex                         entirely                         arg min_C Σ_{B∉C} m_b(B)
L2, M            point, barycenter of L1, M      yes                              arg min_C Σ_{B∉C} (m_b(B))²
L∞, M            point / simplex                 yes / partial                    arg min_C Σ_{B∉C} m_b(B) / arg min_C max_{B∉C} m_b(B)
L1, M \ Ā        point, vertex of L1, M          yes                              as in L1, M
L2, M \ Ā        point, as in L1, M \ Ā          yes                              as in L2, M
L∞, M \ Ā        polytope                        not entirely                     arg min_C max_{B∉C} m_b(B)
L1, B            polytope                        depends on pl_b(x_i)             not easy to interpret
L2, B            point                           depends on pl_b(x_i)             not known
L∞, B            polytope                        depends on pl_b(x_i)             arg max_C pl(A_1)
Table 12.1. Properties of the geometric consonant approximations studied in the second part
of this Chapter, in terms of multiplicity and admissibility of partial solutions, and the related
global solutions.
With the purpose of finding outer approximations which are maximal with respect to
the weak inclusion relation (4.19) Dubois and Prade have introduced two different
families of approximations.
A first group of approximations is obtained by considering all possible permutations ρ of the elements {x1, ..., xn} of the frame of discernment Θ: {x_ρ(1), ..., x_ρ(n)}.
The following family of nested sets can then be built:
{ S_1^ρ = {x_ρ(1)}, S_2^ρ = {x_ρ(1), x_ρ(2)}, ..., S_n^ρ = {x_ρ(1), ..., x_ρ(n)} }.
Analogously, we can consider all the permutations ρ of the focal elements {E1 , ..., Ek }
of b, {Eρ(1) , ..., Eρ(k) }, and introduce the following family of sets:
{ S_1^ρ = E_ρ(1), S_2^ρ = E_ρ(1) ∪ E_ρ(2), ..., S_k^ρ = E_ρ(1) ∪ · · · ∪ E_ρ(k) }.
In general, approximations of the second family (12.3) are generated by the first
family (12.2) too [?, 61].
In the binary belief space B2 the set O[b] of all the outer consonant approximations
of b is depicted in Figure 12.1-left (dashed lines). It is the intersection of the region
of the points b0 such that ∀A ⊆ Θ b0 (A) ≤ b(A), and the complex CO = COx ∪COy
of consonant b.f.s (cfr. Chapter 9, Figure 9.1). Among them, the co.b.f.s generated
by the 6 = 3! possible permutations of the three focal elements {x}, {y}, {x, y} of
b (12.3) correspond to the points cρ1 , ..., cρ6 in Figure 12.1, namely the orthogonal
projections of b onto COx , COy , respectively, plus the vacuous belief function bΘ =
0.
Let us denote by OC [b] the intersection of the set O[b] of all outer consonant ap-
proximations with the component COC of the consonant complex, with C a maximal
chain of 2Θ . We can notice that, for each maximal chain C:
1. OC [b] is convex (in this case C = {x, Θ} or {y, Θ});
2. OC [b] is in fact a polytope, i.e. the convex closure of a number of vertices: in
particular a segment in the binary case (Ox,Θ [b] or Oy,Θ [b]);
3. the maximal (with respect to weak inclusion (4.19)) outer approximation of b is
one of the vertices of this polytope OC [b] (coρ , Equation (12.2)), that associated
with the permutation ρ of singletons which generates the chain.
In the binary case there are just two such permutations, ρ1 = {x, y} and ρ2 =
{y, x}, which generate the chains {x, Θ} and {y, Θ}, respectively.
We will prove that all these properties hold in the general case as well.
12.1.3 Convexity
A more cogent statement on the shape of O[b] can be proven by means of the follow-
ing result on the basic probability assignment of consonant belief functions weakly
included in b.
Lemma 15. Consider a belief function b with basic probability assignment mb . A
consonant belief function co is weakly included in b, i.e., co(A) ≤ b(A) for all A ⊆ Θ,
if and only if there is a choice of coefficients {α_A^B, B ⊆ Θ, A ⊇ B} satisfying:
∀B ⊆ Θ, ∀A ⊇ B:  0 ≤ α_A^B ≤ 1;      ∀B ⊆ Θ:  Σ_{A⊇B} α_A^B = 1.    (12.4)
Fig. 12.2. The convex combination of two belief functions weakly included in b is still weakly included in b: this does not hold for affine combinations (dashed line).
Lemma 15 states that the b.p.a. of any outer consonant approximation of b is ob-
tained by re-assigning the mass of each f.e. A of b to some B ⊇ A.
We will extensively use this result in what follows.
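The re-assignment reading of Lemma 15 is easy to exercise numerically. The sketch below is an assumption (ad-hoc code, not the book's): it splits the mass of every focal element A randomly among the chain elements containing A and checks that the result is always weakly included in b.

```python
# Lemma 15 in action: mass re-assignment onto a chain yields an outer approximation.
from itertools import combinations
import random

THETA = frozenset('xyz')

def subsets(S):
    s = list(S)
    return [frozenset(c) for r in range(1, len(s) + 1) for c in combinations(s, r)]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def random_outer_approx(m, C):
    """Split each mass m_b(A) randomly among the chain elements containing A."""
    out = {B: 0.0 for B in C}
    for A, v in m.items():
        hosts = [B for B in C if A <= B]          # supersets of A in the chain
        w = [random.random() for _ in hosts]
        for B, wi in zip(hosts, w):
            out[B] += v * wi / sum(w)
    return out

m = {frozenset('x'): 0.3, frozenset('y'): 0.5,
     frozenset('xy'): 0.1, frozenset(THETA): 0.1}
C = [frozenset('x'), frozenset('xy'), THETA]
co = random_outer_approx(m, C)
assert all(belief(co, A) <= belief(m, A) + 1e-12 for A in subsets(THETA))
```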
Let us call C = {B1 , ..., Bn } (|Bi | = i) the chain of focal elements of a consonant
belief function weakly included in b.
It is natural to conjecture that, for each maximal simplex COC of CO associated
with a maximal chain C, OC [b] is the convex closure of the co.b.f.s oB [b] with b.p.a.:
m_{o_B[b]}(B_i) = Σ_{A⊆Θ: B(A)=B_i} m_b(A),    (12.6)
where B is an 'assignment function'
B : 2^Θ → C,   A ↦ B(A) ⊇ A,    (12.7)
which maps each subset A to one of the focal elements of the chain C = {B_1 ⊂ ... ⊂ B_n} which contains it.
Theorem 72. For each simplicial component CO_C of the consonant space associated with any maximal chain of focal elements C = {B_1, ..., B_n}, the set of outer consonant approximations of an arbitrary belief function b is the convex closure of the points o_B[b] (12.6) associated with all the assignment functions B (12.7).
We can prove instead that the outer approximation (12.2) obtained by permuting
the singletons of Θ as in Section 12.1.1 is an actual vertex of OC [b]. More pre-
cisely, all possible permutations of the elements of Θ generate exactly n! differ-
ent outer approximations of b, each of which lies on a single simplicial compo-
nent of the consonant complex. Each such permutation ρ generates a maximal chain
Cρ = {S1ρ , ..., Snρ } of focal elements so that the corresponding belief function will
lie on COCρ .
Theorem 73. The outer consonant approximation coρ (12.2) generated by a per-
mutation ρ of the singleton elements of Θ is a vertex of OCρ [b].
Corollary 17. The maximal outer consonant approximation with maximal chain C
of a belief function b is the vertex (12.2) of OCρ [b] associated with the permutation
ρ of the singletons which generates C = Cρ .
By definition (12.2) coρ assigns the mass mb (A) of each focal element A to the
smallest element of the chain containing A. By Lemma 15 each outer consonant
approximation of b with chain C, co ∈ OCρ [b], is the result of re-distributing the
mass of each focal element A to all its supersets in the chain {Bi ⊇ A, Bi ∈ C}.
But then each such co is weakly included in co_ρ, since its b.p.a. can be obtained by re-distributing the mass of the minimal superset B_j, where j = min{i : A ⊆ B_i}, to all supersets of A. Hence, co_ρ is the maximal outer approximation with chain C_ρ.
for all finite set systems A such that ∩A∈A A ∈ S. If this property holds for arbitrary
A and S is closed under arbitrary intersection, then β is called a necessity measure.
Any necessity measure is a lower chain measure, but the converse does not hold.
However, the class of necessity measures coincides with the class of lower chain
measures if Θ is finite.
As consonant belief functions are necessity measures on finite domains, they are
trivially also lower chain measures and vice-versa.
Now, let b be a belief function and C a maximal chain in 2Θ . Then we can build
a chain measure (consonant b.f.) associated with b as:
namely:
B_1 = ({x}, {x,y}, Θ, {x,y}, Θ, Θ, Θ)′;
B_2 = ({x}, {x,y}, Θ, Θ, Θ, Θ, Θ)′;
B_3 = ({x}, Θ, Θ, {x,y}, Θ, Θ, Θ)′;
B_4 = ({x}, Θ, Θ, Θ, Θ, Θ, Θ)′;
B_5 = ({x,y}, {x,y}, Θ, {x,y}, Θ, Θ, Θ)′;
B_6 = ({x,y}, {x,y}, Θ, Θ, Θ, Θ, Θ)′;
B_7 = ({x,y}, Θ, Θ, {x,y}, Θ, Θ, Θ)′;
B_8 = ({x,y}, Θ, Θ, Θ, Θ, Θ, Θ)′;
B_9 = (Θ, {x,y}, Θ, {x,y}, Θ, Θ, Θ)′;
B_10 = (Θ, {x,y}, Θ, Θ, Θ, Θ, Θ)′;
B_11 = (Θ, Θ, Θ, {x,y}, Θ, Θ, Θ)′;
B_12 = (Θ, Θ, Θ, Θ, Θ, Θ, Θ)′.
They correspond to the following co.b.f.s with b.p.a. [m({x}), m({x, y}), m(Θ)]′:
Figure 12.3-left shows the resulting polytope OC [b] for a belief function
mb (x) = 0.3, mb (y) = 0.5, mb ({x, y}) = 0.1, mb (Θ) = 0.1, (12.11)
in the component COC = Cl(bx , b{x,y} , bΘ ) of the consonant complex (black tri-
angle in the figure). The polytope OC [b] is plotted in red, together with all the 12
points (12.10) (red squares). Many of them lie on a side of the polytope. However,
the point obtained by permutation of singletons (12.2) is an actual vertex (red star):
it is the first item oB1 of the list (12.10).
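This example can be enumerated by machine. The sketch below is an assumption (not the book's code): for the chain C = {{x}, {x,y}, Θ} it lists every assignment function B (each non-empty subset mapped to a chain element containing it) and computes the corresponding candidate vertices o_B[b] (12.6) for the mass assignment (12.11).

```python
# Enumerating the 12 assignment functions and the candidate vertices (12.10).
from itertools import combinations, product

THETA = frozenset('xyz')
C = [frozenset('x'), frozenset('xy'), THETA]
subsets = [frozenset(c) for r in range(1, 4) for c in combinations('xyz', r)]

m = {frozenset('x'): 0.3, frozenset('y'): 0.5,
     frozenset('xy'): 0.1, frozenset(THETA): 0.1}            # Equation (12.11)

# For each subset, the admissible images are the chain elements containing it.
choices = [[B for B in C if A <= B] for A in subsets]
vertices = set()
for images in product(*choices):                             # all assignment functions
    co = {B: 0.0 for B in C}
    for A, B in zip(subsets, images):
        co[B] += m.get(A, 0.0)                               # Equation (12.6)
    vertices.add(tuple(round(co[B], 3) for B in C))

print(len(list(product(*choices))), 'assignment functions')  # 12
print(sorted(vertices))       # distinct candidate points; (0.3, 0.6, 0.1) is o_B1
```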
Fig. 12.3. Not all the points (12.6) associated with assignment functions are actual ver-
tices of OC [b]. Here the polytope OC [b] of outer consonant approximations with C =
{{x}, {x, y}, Θ} for the belief function (12.11) on Θ = {x, y, z}, is plotted in red, together
with all the 12 points (12.10) (red squares). Many of them lie on a side of the polytope. How-
ever, the point obtained by permutation of singletons (12.2) is an actual vertex (red star). The
minimal and maximal outer approximations with respect to weak inclusion are oB12 and oB1 ,
respectively.
It is interesting to point out that the points (12.10) are ordered with respect to
weak inclusion (we just need to apply its definition, or the re-distribution property of
Lemma 15). The result is summarized in the graph of Figure 12.4. We can appreciate
that the vertex oB1 generated by singleton permutation is indeed the maximal outer
approximation of b, as stated by Corollary 17.
Fig. 12.4. Partial order of the points (12.10) with respect to weak inclusion. For sake of
simplicity we denote by Bi the co.b.f. oBi associated with the assignment function Bi . An
arrow from Bi to Bj stands for oBj ≤ oBi .
(compare Equation (6.8) and Theorem 10), where mA is the vector of mass values
of the categorical belief function b_A: m_A(A) = 1, m_A(B) = 0 ∀B ≠ A. Note that in R^{N−1}, m_Θ = [0, ..., 0, 1]′ and cannot be neglected.
Since the mass of any focal element Ā is uniquely determined by all the other
masses in virtue of the normalization constraint, we can also choose to represent
b.p.a.s as vectors of RN −2 of the form:
m_b = Σ_{∅⊊A⊆Θ, A≠Ā} m_b(A) m_A,    (12.13)
Binary example In the case of a binary frame Θ = {x, y}, since mb (x) ≥ 0,
mb (y) ≥ 0, and mb (x) + mb (y) ≤ 1 we can easily infer that the set B2 = M2 of
all the possible basic probability assignments on Θ2 can be depicted as the triangle
in the Cartesian plane of Figure 12.5, whose vertices are the vectors
which correspond respectively to the vacuous belief function bΘ , the Bayesian b.f.
bx with mbx (x) = 1, and the Bayesian b.f. by with mby (y) = 1.
Fig. 12.5. The mass space M2 for a binary frame is a triangle in R2 whose vertices are
the mass vectors associated with the categorical belief functions focused on {x}, {y} and Θ:
mx , my , mΘ . The belief space B2 coincides with M2 when Θ = {x, y}. Consonant b.f.s
live in the union of the segments CO{x,Θ} = Cl(mx , mΘ ) and CO{y,Θ} = Cl(my , mΘ ).
The unique L1 = L2 consonant approximation (circle) and the set of L∞ consonant approx-
imations (dashed segment) on CO{x,Θ} are shown.
The region P2 of all Bayesian belief functions on Θ2 is the diagonal line segment
Cl(mx , my ) = Cl(bx , by ). On Θ2 = {x, y} consonant belief functions can have
as chain of focal elements either {{x}, Θ2 } or {{y}, Θ2 }. Therefore, they live in
the union of two segments (see Figure 12.5):
Analogously, the region COM of consonant belief functions in the mass space M
is the simplicial complex:
CO_M = ∪_{C = {A_1 ⊂ ··· ⊂ A_n}} Cl(m_{A_1}, ..., m_{A_n}).
Fig. 12.6. In order to minimize the distance of a mass vector from a consonant simplicial
complex, we need to find all the partial solutions (12.17) on all the maximal simplices which
form the complex (empty circles), and compare these partial solutions to select a global one
(black circle).
Choice of norm Consonant belief functions are the counterparts of necessity mea-
sures in the theory of evidence, so that their plausibility functions are possibility
measures, which in turn are inherently related to L∞ as P os(A) = maxx∈A P os(x)
(cfr. Section 9.1). It makes therefore sense to conjecture that a consonant transfor-
mation obtained by picking as distance function d in (12.14) one of the classical Lp
norms would be meaningful.
For vectors m_b, m_{b′} ∈ M representing the basic probability assignments of two belief functions b, b′, they read as:
‖m_b − m_{b′}‖_{L1} = Σ_{∅⊊B⊆Θ} |m_b(B) − m_{b′}(B)|;
‖m_b − m_{b′}‖_{L2} = [ Σ_{∅⊊B⊆Θ} (m_b(B) − m_{b′}(B))² ]^{1/2};    (12.15)
‖m_b − m_{b′}‖_{L∞} = max_{∅⊊B⊆Θ} |m_b(B) − m_{b′}(B)|.
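A direct transcription of (12.15) as code is straightforward; the sketch below is illustrative (hypothetical helper names) and treats mass functions as dictionaries indexed by non-empty subsets of Θ.

```python
# The three Lp distances between mass vectors, as in Equation (12.15).
from itertools import combinations
from math import sqrt

def nonempty_subsets(theta):
    s = sorted(theta)
    return [frozenset(c) for r in range(1, len(s) + 1) for c in combinations(s, r)]

def lp_distances(m1, m2, theta):
    diffs = [m1.get(A, 0.0) - m2.get(A, 0.0) for A in nonempty_subsets(theta)]
    return (sum(abs(d) for d in diffs),            # L1
            sqrt(sum(d * d for d in diffs)),       # L2
            max(abs(d) for d in diffs))            # L-infinity

theta = {'x', 'y', 'z'}
m_b  = {frozenset('x'): 0.3, frozenset('y'): 0.5,
        frozenset('xy'): 0.1, frozenset(theta): 0.1}
m_co = {frozenset('x'): 0.3, frozenset('xy'): 0.6,
        frozenset(theta): 0.1}                     # a consonant b.p.a.
print(lp_distances(m_b, m_co, theta))              # (1.0, 0.7071..., 0.5)
```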
The L2 Bayesian approximation of b,
p_{L2}[b] = arg min_{p∈P} ‖b − p‖_{L2} = [ m_b(x) + m_b(Θ)/2, m_b(y) + m_b(Θ)/2 ]′,
is the orthogonal projection π[b] of b onto P [267], and coincides with the pignistic function BetP[b] [1231, 1276, 199] only in the binary case (Section 10.4). The L∞ norm yields the same Bayesian approximation:
p_{L∞}[b] = arg min_{p∈P} ‖b − p‖_{L∞} = arg min_{p∈P} max{ |b(x) − p(x)|, |b(y) − p(y)| } = arg min_{p∈P} max{ |m_b(x) − p(x)|, |m_b(y) − p(y)| } = [ m_b(x) + m_b(Θ)/2, m_b(y) + m_b(Θ)/2 ]′ = p_{L2}[b] = π[b],
while the L1 norm has as solution the entire set of probabilities 'consistent' with b [801, 437]:
{ p ∈ P : p(A) ≥ b(A) ∀A ⊆ Θ } = P[b].    (12.18)
As illustrated in Figure 12.6, in the consonant case we need to find a partial ap-
proximation on each component of the consonant complex, to later select a global
approximation among the resulting partial solutions. We get for L2 :
co_{L2}[b] = arg min_{co∈CO} ‖b − co‖_{L2} = [m_b(x), 0]′ if m_b(x) ≥ m_b(y),   [0, m_b(y)]′ if m_b(x) ≤ m_b(y),    (12.19)
while
‖b − co‖_{L1} = |m_b(x) − m_co(x)| + |m_b(y) − m_co(y)| = |m_b(x) − m_co(x)| + m_b(y)
for co ∈ CO_x. This is minimal for m_co(x) = m_b(x) (m_co(y) = 0 by definition). Analogously for the component CO_y.
Its minimum arg min_{co∈CO_x} ‖b − co‖_{L∞} corresponds to all the consonant belief functions such that |m_b(x) − m_co(x)| ≤ m_b(y), i.e.:
{ co ∈ CO_x : max{0, m_b(x) − m_b(y)} ≤ m_co(x) ≤ m_b(x) + m_b(y) }.
An analogous result holds for the CO_y component. We can thus write arg min_{co∈CO_2} ‖b − co‖_{L∞} = CO[b] =
{ co ∈ CO_x : m_b(x) − m_b(y) ≤ m_co(x) ≤ m_b(x) + m_b(y) }   if m_b(x) ≥ m_b(y),
{ co ∈ CO_y : m_b(y) − m_b(x) ≤ m_co(y) ≤ m_b(y) + m_b(x) }   if m_b(y) ≥ m_b(x).
we can recognize the dual role of the norms L∞ and L1 in the two problems (at least
in the binary case). It is natural to call the set CO[b] the collection of consonant
belief functions compatible with b.
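A compact sketch of the binary case (an assumption, not the book's code): the L1 = L2 solution keeps the larger singleton mass on the corresponding consonant component, while the L∞ solutions form the whole segment CO[b] of compatible consonant belief functions.

```python
# Binary-frame consonant approximations, following the closed forms above.
def consonant_approx_binary(mx, my):
    """Masses m_b(x), m_b(y) on Theta = {x, y}; m_b(Theta) = 1 - mx - my."""
    if mx >= my:        # project onto the component CO_x (chain {x} < Theta)
        point = ('x', mx)                                  # L1 = L2 solution
        segment = ('x', max(0.0, mx - my), mx + my)        # L-infinity solutions
    else:               # project onto CO_y
        point = ('y', my)
        segment = ('y', max(0.0, my - mx), my + mx)
    return point, segment

point, segment = consonant_approx_binary(0.5, 0.2)
print(point)     # ('x', 0.5): co with m(x) = 0.5, m(Theta) = 0.5
print(segment)   # ('x', 0.3, 0.7): all co with 0.3 <= m(x) <= 0.7
```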
Fig. 12.7. Characterization of compatible consonant belief functions in terms of the ref-
erence frame (X, Y ) formed by the probability line and the line P ⊥ orthogonal to P in
P = [1/2, 1/2]0 .
in the reference frame (X, Y) with origin O = P = [1/2, 1/2]′. The coordinates of a belief function b in this reference frame can be computed through simple trigonometric arguments, and are given by
X(b) = m_b(Θ) / √2,     Y(b) = (m_b(x) − m_b(y)) / √2.
More interestingly, X(b) is the L2 distance of b from the Bayesian region, while
Y (b) is the distance between b and the orthogonal complement of P in P:
Furthermore, P ⊥ (or better its segment Cl(bΘ , P) joining bΘ and P) is the set
of belief functions in which the mass is equally distributed among events of the
same size ({x} and {y} in the binary case). This link between orthogonality and
equidistribution is true in the general case too (recall Theorem 70 [283]).
In conclusion, at least in the binary case the consonant belief functions compat-
ible with b ∈ B2 are those which are simultaneously less Bayesian and less equally
distributed than b. The question of the existence of a set of compatible consonant be-
lief functions in the general case is something which we plan to explore in upcoming
work.
One can observe that, since (12.22) coincides with (12.20) (factoring out the miss-
ing component Ā) minimizing the Lp norm of the difference vector in a (N − 2)-
dimensional section of the mass space which leaves out a focal element outside
the desired maximal chain yields the same results as in the complete mass space1 .
Therefore, in what follows we only consider consonant approximations in (N − 2)-
dimensional sections obtained by excluding a component associated with an element
Ā ∈ C of the desired maximal chain.
In the following we denote by COCM\Ā,Lp [mb ] (uppercase) the set of partial
Lp approximations of b with maximal chain C in the section of the mass space
which excludes Ā ∈ C. We drop the superscript C for global solutions, drop \Ā
for solutions in the complete mass space, and use coCM\Ā,Lp [mb ] (lowercase) for
pointwise solutions and the barycenters of sets of solutions.
¹ The absence of the missing component m_b(Ā) in (12.22) implies in fact a small difference when it comes to the L∞ approximation: Theorem 77 and Equation (12.26), concerning the vertices of the polytope of L∞ approximations, remain valid as long as we replace max_{B∉C} m_b(B) with max_{B∉C, B≠Ā} m_b(B).
Theorem 75. Given a belief function b : 2Θ → [0, 1] with basic probability assign-
ment mb , the partial L1 consonant approximations of b with maximal chain of focal
elements C in the complete mass space M is the set of co.b.f.s co with chain C such
that mco (A) ≥ mb (A) ∀A ∈ C. They form a simplex:
Theorem 76. Given a belief function b : 2Θ → [0, 1] with basic probability assign-
ment mb , the partial L2 consonant approximation of b with maximal chain of focal
elements C in the complete mass space M has mass assignment (12.25):
i.e., the union of the partial solutions associated with maximal chains of focal ele-
ments which minimise the sum of square masses outside the chain.
The partial L2 consonant approximation of b in the section of the mass space
with missing component Ā ∈ C is unique, and coincides with the L1 partial conso-
nant approximation in the same section (12.24):
The L2 global approximations in the section form the union of the related partial approximations associated with the chains arg min_C Σ_{B∉C} (m_b(B))².
Note that global solutions in the L1 and L2 cases fall in general onto different sim-
plicial components of CO.
L∞ approximation
Theorem 77. Given a belief function b : 2Θ → [0, 1] with basic probability as-
signment mb , the partial L∞ consonant approximations of b with maximal chain of
focal elements C in the complete mass space M form a simplex:
When the opposite is true, the sought partial L∞ consonant approximation re-
duces to a single consonant belief function, the barycenter of the above simplex,
located on the partial L2 approximation (and barycenter of the L1 partial approxi-
mations) (12.25).
The related global approximations of b are associated with the optimal chain(s)
(12.28).
We can observe that, for each desired maximal chain of focal elements C:
1. the L1 partial approximations of b are those consonant b.f.s whose mass assign-
ment dominates that of b over all the elements of the chain;
2. this set is a fully admissible simplex, whose vertices are obtained by re-
assigning all the mass outside the desired chain to a single focal element of
the chain itself (see (12.24));
3. its barycenter coincides with the L2 partial approximation with the same chain,
which redistributes the original mass of focal elements outside the chain to all
the elements of the chain on an equal basis (12.25);
– the latter forms a generalized rectangle in the mass space M, whose size is deter-
mined by the largest mass outside the desired maximal chain.
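The mass-space picture summarised in the observations above can be sketched as follows (illustrative code under the stated redistribution interpretation; names such as `l1_vertices_and_l2` are ad hoc): the vertices of the L1 simplex reassign all the mass outside the chain to a single chain element, and their barycenter spreads it equally.

```python
# Partial L1 / L2 consonant approximations in the mass space for a chosen chain.
THETA = frozenset('xyz')

def l1_vertices_and_l2(m, C):
    outside = sum(v for A, v in m.items() if A not in C)
    base = {A: m.get(A, 0.0) for A in C}
    vertices = []
    for A in C:                                   # one vertex per chain element
        v = dict(base)
        v[A] += outside
        vertices.append(v)
    l2 = {A: base[A] + outside / len(C) for A in C}   # barycenter
    return vertices, l2

m = {frozenset('x'): 0.2, frozenset('y'): 0.3, frozenset('xz'): 0.1,
     frozenset('xy'): 0.2, frozenset(THETA): 0.2}
C = [frozenset('x'), frozenset('xy'), THETA]
verts, l2 = l1_vertices_and_l2(m, C)
print([[round(v[A], 3) for A in C] for v in verts])
# [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
print([round(l2[A], 3) for A in C])               # [0.333, 0.333, 0.333]
```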
Fig. 12.8. Graphical representation of the relationships between the different (partial) Lp
consonant approximations with desired maximal chain C, in the related simplex COCM of
the consonant complex CO. Approximations in the full mass space M and approximations
computed in a (N − 2)-dimensional section with missing component Ā ∈ C are compared.
In the latter case, the special case Ā = Θ is highlighted.
select the component Ā to neglect, the latter simplex (and its barycenter) seem to
play a privileged role.
L∞ approximations in any such section are not entirely admissible, and do not show a particular relation with the simplex of L1, M solutions. The relation between the L∞ partial solutions in the full mass space and those computed in its sections M \ Ā remains to be determined.
Theorem 78. Given a belief function b : 2Θ → [0, 1] with b.p.a. mb and a maximal
chain of focal elements C, the partial L∞ consonant approximations of b in the
complete mass space COCM,L∞ [mb ] are not necessarily partial L∞ approximations
COCM\Ā,L∞ [mb ] in the section excluding Ā. However, for all Ā ∈ C the two sets of
approximations share the vertex (12.26).
As far as global solutions are concerned, we can observe the following facts:
– in the L1 case, in both the (N − 1)- and (N − 2)-dimensional representations,
the optimal chain(s) are:
arg min_C Σ_{B∉C} m_b(B) = arg max_C Σ_{A∈C} m_b(A);
unless the approximation is unique in the full mass space, in which case the opti-
mal chains behave as in the L1 case.
This behavior compares unfavorably with that of two other natural consonant ap-
proximations.
Definition 83. Given a belief function b : 2Θ → [0, 1], its isopignistic conso-
nant approximation [415] is defined as the unique consonant b.f. coiso [b] such that
BetP [coiso [b]] = BetP [b]. Its contour function is:
pl_{co_iso[b]}(x) = Σ_{x′∈Θ} min{ BetP[b](x), BetP[b](x′) }.    (12.33)
It is well known that, given the contour function plb of a consistent belief function
b : 2Θ → [0, 1] (such that maxx plb (x) = 1) we can obtain the unique consonant
b.f. which has plb as contour function via the following formulae:
m_co(A_i) = pl_b(x_i) − pl_b(x_{i+1})   for i = 1, ..., n − 1,
m_co(A_n) = pl_b(x_n).    (12.34)
Definition 84. Given a belief function b : 2Θ → [0, 1], its contour-based consonant
approximation with maximal chain of focal elements C = {A1 ⊂ · · · ⊂ An } has
mass assignment:
m_{co_con[b]}(A_i) = 1 − pl_b(x_2)   for i = 1,
m_{co_con[b]}(A_i) = pl_b(x_i) − pl_b(x_{i+1})   for i = 2, ..., n − 1,    (12.36)
m_{co_con[b]}(A_i) = pl_b(x_n)   for i = n,
where x_i ≐ A_i \ A_{i−1} for all i = 1, ..., n.
This approximation uses the (unnormalized) contour function of an arbitrary b.f. b as if it were a possibility distribution, by replacing the plausibility of the maximal element with 1 and applying the mapping (12.34).
In order to guarantee their admissibility, both the isopignistic and the contour-
based approximations require sorting (respectively) the pignistic and the plausibil-
ity values of the singletons (an operation whose complexity is O(n log n)). On top
of that, though, one must add the complexity of actually computing the values of BetP[b](x) (respectively, pl_b(x)) from a mass vector, which requires n scans (one for each singleton x), with an overall complexity of O(n · 2^n).
An interesting relationship between outer consonant and L1 consonant approxi-
mation in the mass space M can also be pointed out.
Theorem 79. Given a belief function b : 2Θ → [0, 1], the set of partial L1 conso-
nant approximations COCM,L1 [mb ] with maximal chain of focal elements C in the
complete mass space and the set OC C [b] of its partial outer consonant approxima-
tions with the same chain have non-empty intersection. This intersection contains
at least the convex closure of the candidate vertices of OC C [b] whose assignment
functions are such that B(Ai ) = Ai for all i = 1, ..., n.
Proof. Clearly if B(Ai ) = Ai for all i = 1, ..., n, then the mass mb (Ai ) is re-
assigned to Ai itself for each element Ai of the chain. Hence mco (Ai ) ≥ mb (Ai ),
and the co.b.f. belongs to COCM,L1 [mb ] (see Equation (12.30)).
In particular, both co^C_max[b] (12.2) and
co^C_{M\Θ,L1/2}[m_b]:   m_co(A) = m_b(A) for A ∈ C, A ≠ Θ;   m_co(Θ) = m_b(Θ) + Σ_{B∉C} m_b(B),    (12.37)
belong to this intersection.
Notice that, for the sake of simplicity, only the Lp approximations in the section with Ā = Θ are shown. The example confirms the general picture of their relationships given in Figure 12.8.
According to the formulae at page 8 of [250] (see also Section 12.1.8), the set
of outer consonant approximations of (12.38) with chain {{x}, {x, y}, Θ} is the
convex closure of the points:
These points are plotted in Figure 12.9 as empty squares. We can observe that, as
proven by Theorem 79, both coCmax [b] (12.2) and coCM\Θ,L1 /2 [mb ] (12.37) belong
to the intersection of (partial) outer and L1 , M consonant approximations.
The example also suggests that (partial) outer consonant approximations are
included in L∞ consonant approximations, calculated by neglecting the component
Ā = Θ. However, this is not so as attested by the binary case Θ = {x, y}, for
which the L∞ , M \ Θ solutions satisfy, for the maximal chain C = {{x} ⊂ Θ}:
mb (x) − mb (y) ≤ mco (x) ≤ mb (x) + mb (y), while the outer approximations are
such that 0 ≤ mco (x) ≤ mb (x).
As for isopignistic and contour-based approximations, they coincide in this case
with the vectors:
m_iso = [0.15, 0.1, 0.75]′,
m_con = [1 − pl_b(y), pl_b(y) − pl_b(z), pl_b(z)]′ = [0.7, −0.2, 0.5]′.
The pignistic values of the elements in this example are BetP [b](x) = 0.45,
BetP [b](y) = 0.3, BetP [b](z) = 0.25 so that the chain associated with the
isopignistic approximation is indeed {{x}, {x, y}, Θ}. Notice though that ‘pseudo’
isopignistic approximations can be computed for all chains via Equation (12.35),
none of which will be admissible. The contour-based approximation is not admissi-
ble in this case, as singletons have a different plausibility ordering.
Fig. 12.9. The simplex COC in the mass space of consonant belief functions with maximal
chain C = {{x} ⊂ {x, y} ⊂ Θ} defined on Θ = {x, y, z}, and the Lp partial consonant
approximations in M of the belief function with basic probabilities (12.38). The L2 , M
approximation is plotted as a red square, as the barycenter of both the sets of L1 , M (blue
triangle) and L∞ , M (green triangle) approximations. The maximal outer approximation
is denoted by a yellow square, the contour-based approximation is a vertex of the triangle
L∞ , M. The various Lp approximations are also depicted for the section M \ Θ of the mass
space: the unique L1 /L2 approximation is a vertex of L1 , M, while the polytope of L∞
approximations in the section is depicted in light green. The related set OC C [b] of partial
outer consonant approximations (12.39) is also shown for comparison (light yellow), while
the isopignistic function is represented by a star.
values determined by the desired maximal chain, and through the latter to other
natural approximations.
We first need to make explicit the analytical form of the difference vector b − co
between the original b.f. b and the desired approximation co.
b − co = Σ_{A⊉A_1} b(A) x_A + Σ_{i=1}^{n−1} Σ_{A⊇A_i, A⊉A_{i+1}} x_A [ γ(A_i) + b(A) − Σ_{j=1}^{i} m_b(A_j) ],    (12.40)
where
γ(A) = Σ_{B⊆A, B∈C} (m_b(B) − m_co(B)),
and {x_A, ∅ ≠ A ⊊ Θ} is the usual orthonormal reference frame in the belief space B (Section 6.1).
12.5.1 L1 approximation
In particular, b_{n−1} = γ^{n−1}_{int1} = γ^{n−1}_{int2} = b(A_{n−1}).
Note that, even though the approximation is computed in B, we present the result
in terms of mass assignments as they are simpler and easier to interpret. The same
holds for the other Lp approximations in B.
Due to the partially ordered nature of 2^Θ, the innermost values of the above lists (12.42) cannot be identified analytically in full generality (even though they can easily be computed numerically). Nevertheless, the partial L1 approximations in B can be analytically derived in some cases. By (12.41), the barycenter of the set of partial L1 consonant approximations in B has mass vector:
m_{co^C_{B,L1}[b]} = [ (γ^1_{int1} + γ^1_{int2})/2,  (γ^2_{int1} + γ^2_{int2})/2 − (γ^1_{int1} + γ^1_{int2})/2,  · · · ,  1 − b(A_{n−1}) ]′.    (12.43)
The global L1 approximation(s) can be easily derived from the expression of the
norm of the difference vector (see proof of Theorem 81, Equation (12.76)).
Theorem 82. Given a belief function b : 2Θ → [0, 1], its global L1 consonant
approximations COB,L1 [b] in B live in the collection of partial such approximations
associated with maximal chain(s) which maximize the cumulative lower halves of
the lists of belief values Li (12.42):
arg max_C Σ_i Σ_{b(A)∈L_i, b(A)≤γ^i_{int1}} b(A).    (12.44)
as bΘ = 0 is the origin of the Cartesian space in B, and bAj −bΘ for j = 1, ..., n−1
are the generators of the component COCB .
Using once again expression (12.40), the orthogonality conditions (12.45) translate into the following linear system of equations:
Σ_{A∉C} m_b(A) ⟨b_A, b_{A_j}⟩ + Σ_{A∈C, A≠Θ} (m_b(A) − m_co(A)) ⟨b_A, b_{A_j}⟩ = 0,    (12.46)
where L_0 ≐ {0}, and ave(L_i) is the average of the list of belief values L_i (12.42):
ave(L_i) = (1 / 2^{|A^c_{i+1}|}) Σ_{A⊇A_i, A⊉A_{i+1}} b(A).    (12.48)
² The computation of the global L2 approximation(s) is rather involved. We plan to solve this issue in the near future.
12.5.3 L∞ approximation
m_{co^C_{B,L∞}[b]}(A_i) = ( b(A_1) + b({x_2}^c) ) / 2   for i = 1,
m_{co^C_{B,L∞}[b]}(A_i) = ( b(A_i) − b(A_{i−1}) ) / 2 + ( pl_b({x_i}) − pl_b({x_{i+1}}) ) / 2   for i = 2, ..., n − 1,    (12.50)
m_{co^C_{B,L∞}[b]}(A_i) = 1 − b(A_{n−1})   for i = n.
Note that, since b(A^c_1) = 1 − pl_b(A_1) = 1 − pl_b(x_1), the size of the polytope (12.49) of partial L∞ approximations of b is a function of the plausibility of the innermost desired focal element only. As expected, it reduces to zero only when b is a consistent belief function (see Section 9.4) and A_1 = {x_1} has plausibility 1.
A straightforward interpretation of the barycenter of the partial L∞ approximations in B in terms of degrees of belief is possible when we notice that, for all i = 1, ..., n:
m_co(A_i) = ( m_{co^C_max[b]}(A_i) + m_{co_con[b]}(A_i) ) / 2
(recall Equations (12.2) and (12.36)), i.e., (12.50) is the average of the maximal outer consonant approximation and what we called the 'contour-based' consonant approximation (Definition 84).
arg min_C b(A^c_1) = arg min_C (1 − pl_b(A_1)) = arg max_C pl_b(A_1).
Theorem 85. Given a belief function b : 2Θ → [0, 1], the set of global L∞ conso-
nant approximations of b in the belief space is the collection of partial approxima-
tions associated with maximal chains whose smallest focal element is the maximal
plausibility singleton:
CO_{B,L∞}[b] = ∪_{C : A_1 = arg max_x pl_b(x)} CO^C_{B,L∞}[b].
for i = n − 1, n;
– in particular, coCB,L1 [b] = coCB,L2 [b] = coCB,L∞ [b] whenever |Θ| = n ≤ 3;
– all the point-wise approximations in (12.51) coincide on the last component:
m_{co^C_max[b]}(A_n) = m_{co^C_con[b]}(A_n) = m_{co^C_{B,L1}[b]}(A_n) = m_{co^C_{B,L2}[b]}(A_n) = m_{co^C_{B,L∞}[b]}(A_n) = 1 − b(A_{n−1}).
Admissibility As is clear from the table of Equation (12.51), all the Lp approximations in the belief space are differences of vectors of all positive values; indeed, differences of shifted versions of the same positive vector. As the vectors [ (int1(L_i) + int2(L_i))/2, i = 1, ..., n ]′, [ (max(L_i) + min(L_i))/2, i = 1, ..., n ]′ and [ ave(L_i), i = 1, ..., n ]′ are not guaranteed to be monotonically increasing for an arbitrary maximal chain C, none of the related partial approximations is guaranteed to be entirely admissible. However, sufficient conditions under which they are admissible can be worked out by studying the structure of the lists of belief values (12.42).
Let us first consider co_max and co_con. As min(L_{i−1}) = b(A_{i−1}) ≤ b(A_i) = min(L_i), the maximal partial outer approximation is admissible for all maximal chains C. As for the contour-based approximation, max(L_i) = b(A_i ∪ A^c_{i+1}) = b({x_{i+1}}^c) = 1 − pl_b(x_{i+1}), while max(L_{i−1}) = 1 − pl_b(x_i), so that max(L_i) − max(L_{i−1}) = pl_b(x_i) − pl_b(x_{i+1}), which is guaranteed to be non-negative if the chain C is generated by singletons sorted by their plausibility values. Thus, as:
(where plAi+2 (xi ) measures the plausibility of xi given Ai+2 ), then both the partial
L2 consonant approximation and the barycenter of the L1 consonant approxima-
tions in the belief space with maximal chain C are admissible.
Fig. 12.10. Comparison between Lp partial consonant approximations in the mass M and
belief B spaces for the belief function with basic probabilities (12.38) on Θ = {x, y, z}. The
L2 , B approximation is plotted as a red square, as the barycenter of both the sets of L1 , B
(blue segment) and L∞ , B (green quadrangle) approximations. Contour-based and maximal
outer approximations are in this example the extreme of the segment L1 , B (blue squares).
The polytope of partial outer consonant approximations (yellow), the isopignistic approxima-
tion (star) and the various Lp partial approximations in M (in gray levels) are also drawn.
approximations in the belief and in the mass space as vectors of mass values. When
(see Figure 12.10). Note that this set is not entirely admissible, not even in this
ternary example.
The partial L2 approximation in B is, by (12.51), unique, with mass vector:
m_{co_{B,L2}[b]} = m_{co_{B,L∞}[b]} = [ (b(x) + b(x, z))/2,  b(x, y) − (b(x) + b(x, z))/2,  1 − b(x, y) ]′,    (12.53)
and coincides with the barycenter of the set of partial L∞ approximations (note that this is not so in the general case).
As for the full set of partial L∞ approximations, this has vertices (12.49):
[ (b(x) + b(x, z))/2 − b(y, z),  b(x, y) − (b(x) + b(x, z))/2,  1 − b(x, y) + b(y, z) ]′;
[ (b(x) + b(x, z))/2 − b(y, z),  b(x, y) − (b(x) + b(x, z))/2 + 2 b(y, z),  1 − b(x, y) − b(y, z) ]′;
[ (b(x) + b(x, z))/2 + b(y, z),  b(x, y) − (b(x) + b(x, z))/2 − 2 b(y, z),  1 − b(x, y) + b(y, z) ]′;
[ (b(x) + b(x, z))/2 + b(y, z),  b(x, y) − (b(x) + b(x, z))/2,  1 − b(x, y) − b(y, z) ]′,
which, as expected, are not all admissible (see Figure 12.10 again).
The example hints at the possibility that the contour-based approximation and/or
the L2 , L∞ barycenter approximations in the belief space be related to the set of L1
approximations in the full mass space: this deserves further analysis. On the other
hand, we know that the maximal partial outer approximation (12.2) is not in general
a vertex of the polygon of L1 partial approximations in B, unlike what the ternary
example (for which int1 (L1 ) = b(x)) suggests.
but related ways the classical approach incarnated by the maximal outer approxima-
tion (12.2). The latter, together with the contour-based approximation (12.36), therefore forms a different, coherent family of consonant approximations.
As for the isopignistic approximation, it seems to be completely unrelated to ap-
proximations in both the mass and the belief space, as it naturally fits in the context
of the Transferable Belief Model and the use of the pignistic function.
It will be interesting, in this respect, to study the property of geometric con-
sonant approximations with respect to other major probability transforms, such as
orthogonal projection, intersection probability, and relative plausibility and belief of singletons (since they seem to be related to the plausibilities of the singletons).
Isopignistic, mass-space and belief-space consonant approximations form three dis-
tinct families of approximations, with fundamentally different rationales: which ap-
proach to use will therefore vary according to the chosen framework, and the prob-
lem at hand.
Appendix
Proof of Lemma 15
(1) Sufficiency. If Equation (12.5) holds for all focal elements A ⊆ Θ, then:
co(A) = Σ_{X⊆A} m_co(X) = Σ_{X⊆A} Σ_{B⊆X} α^B_X m_b(B) = Σ_{B⊆A} m_b(B) Σ_{B⊆X⊆A} α^B_X,
where, by Condition (12.4), Σ_{B⊆X⊆A} α^B_X ≤ Σ_{X⊇B} α^B_X = 1. Therefore:
co(A) ≤ Σ_{B⊆A} m_b(B) = b(A),
. B
Let us introduce the notation αiB = αB i
for sake of simplicity. For each i we can
sum up the first i equations of system (12.54) and obtain the equivalent system of
equations: X
co(Bi ) = βiB mb (B) ∀i = 1, · · · , n (12.55)
B⊆Bi
12.5 Consonant approximation in the belief space 425
Pi
as co(Bi ) = j=1 mco (Bj ) for co is consonant. For all B ⊆ Θ the coefficients
. Pi
βiB = j=1 αjB need to satisfy:
Let us call 1.,2.,3.,4. the above constraints on the free variables {βiB , B ⊆ Bi−1 }.
– 1. and 2. are trivially compatible;
– 1. is compatible with 3. as replacing βiB = βi−1
B
into 1. yields (due to the i − 1-th
equation of the system):
X X
βiB mb (B) = B
βi−1 mb (B) = co(Bi−1 ) ≤ co(Bi );
B⊆Bi−1 B⊆Bi−1
Proof of Theorem 72
We then need to write (12.61) as a convex combination of the moB [b] (Bi ), i.e.:
X X X X X
αB oB [b](Bi ) = αB mb (X) = mb (X) αB .
B B X⊆Bi :B(X)=Bi X⊆Bi B(X)=Bi
Using the normalization constraint the system of equations (12.62) reduces to:
X
A
αB i
= αB ∀i = 1, ..., n − 1; ∀A ⊆ Bi . (12.63)
B(A)=Bi
We can show that each equation in the reduced system (12.63) involves at least one
variable αB which is not present in any other equation.
Formally, the set of assignment functions which meet the constraint of equation
A, Bi but not all others is not empty:
428 12 Consonant approximation
^ ^
B : (B(A) = Bi ) (B(A) 6= Bj ) (B(A0 ) 6= Bj ) 6= ∅.
∀j=1,...,n−1 ∀A0 6=A
j6=i ∀j=1,...,n−1
(12.64)
But the assignment functions B such that B(A) = Bi and ∀A0 6= A B(A0 ) = Θ all
meet condition (12.64). Indeed they obviously meet B(A) 6= Bj for all j 6= i while
clearly for all A0 ⊆ Θ B(A0 ) = Θ 6= Bj , as j < n so that Bj 6= Θ.
A non-negative solution of (12.63) (and hence of (12.62)) can be obtained by
setting for each equation one of the variables αB to be equal to the left hand side
A
αB i
, and all the others to zero.
Proof of Theorem 73
where
.
j = max{ji1 , ..., jim }
maps each event A to the smallest Siρ in the chain which contains A: j = min{i :
A ⊆ Siρ }. Therefore it generates a co.b.f. with b.p.a. (12.2), i.e. coρ .
2. In order for coρ to be an actual vertex, we need to ensure that it cannot be
written as a convex combination of the other (pseudo) vertices oB [b]:
X X
coρ = αB oB [b], αB = 1, ∀B 6= Bρ αB ≥ 0.
B6=Bρ B6=Bρ
P
As moB (Bi ) = A:B(A)=Bi mb (A) the above condition reads as:
X X X
mb (A) αB = mb (A) ∀Bi ∈ C.
A⊆Bi B:B(A)=Bi A⊆Bi :Bρ (A)=Bi
P
For i = 1 the condition is mb (B1 ) B:B(B1 )=B1 B = mb (B1 ), namely:
α
12.5 Consonant approximation in the belief space 429
X X
αB = 1, αB = 0.
B:B(B1 )=B1 B:B(B1 )6=B1
which implies αB = 0 for all the assignment functions B such that B(B2 \ B1 ) 6=
B2 or B(B2 ) 6= B2 . The only non-zero coefficients can then be the αB such that
B(B1 ) = B1 , B(B2 \ B1 ) = B2 , B(B2 ) = B2 .
By induction we get that αB = 0 for all B 6= Bρ .
Proof of Theorem 74
Let us denote as usual by {B1 , ..., Bn } the elements of the maximal chain C. By
definition the masses coρ assigns to the elements of the chain are:
X
mcoρ (Bi ) = mb (B),
B⊆Bi ,B6⊂Bi−1
so that the belief value of coρ on an arbitrary event A ⊆ Θ can be written as:
X X X
coρ (A) = mcoρ (Bi ) = mb (B)
Bi ⊆A,Bi ∈C Bi ⊆A B⊆Bi ,B6⊂Bi−1
X
= mb (B) = b(BiA ),
B⊆BiA
where BiA is the largest element of the chain included in A. But then, as the ele-
ments B1 ⊂ · · · ⊂ Bn of the chain are nested and any belief function b is monotone:
Proof of Theorem 75
X X
we have that: β(Θ) = − mb (B) − β(A). Therefore, the above norm
B6∈C A∈C,A6=Θ
reads as: kmb − mco kL1 =
X X X X
= − mb (B) − β(A) + |β(A)| + mb (B). (12.65)
B6∈C A∈C,A6=Θ A∈C,A6=Θ B6∈C
Fig. 12.11. The minima of a function of the form (12.66) with two variables x1 , x2 form the
triangle x1 ≤ 0, x2 ≤ 0, x1 + x2 ≥ −k.
This reads, in terms of the mass assignment mco of the desired consonant approxi-
mation, as:
12.5 Consonant approximation in the belief space 431
mco
X (A) ≥ mb (A) X ∀A ∈ C, A 6= Θ,
m b (A) − mco (A) ≥ − m b (B). (12.68)
A∈C,A6=Θ B6∈C
i.e., mco (Θ) ≥ mb (Θ). Therefore the partial L1 approximations in M are those
consonant b.f.s co s.t. mco (A) ≥ mb (A) ∀A ∈ C. The vertices of the set of par-
tial approximations (12.67) (see Figure 12.11) are given by the vectors of variables
{βĀ , Ā ∈ C} such that: βĀ (Ā) = mb (B), for βĀ (A) = 0 for A 6= Ā whenever
Ā 6= Θ, while βΘ = 0. Immediately, in terms of masses the vertices of the set of
partial L1 approximations have b.p.a. (12.24) and barycenter (12.25).
To find the global L1 consonant approximation(s) over the whole consonant
complex, we need to locate the component COCM at minimal L1 distance from mb .
C
P partial approximations (12.68) onto COM have L1 distance from mb equal
All the
to 2 B6∈C mb (B). Therefore, the minimal distance component(s) of the complex
are those whose maximal chains originally have maximal mass with respect to mb .
RN −2 representation Consider now the difference vector (12.21). Its L1 norm is:
X X
kmb − mco kL1 = |mb (A) − mco (A)| + mb (B),
A∈C,A6=Ā B6∈C
Proof of Theorem 76
we obtain (12.25).
To find the global L2 approximation(s), we need to compute the L2 distance of
mb from the closest such partial solution. We have:
X
kmb − mco k2L2 = (mb (A) − mco (A))2
A⊆Θ P 2
X B6∈C mb (B) X
= + (mb (B))2
n
A∈C B6∈C
P 2
B6∈C mb (B)
X
= + (mb (B))2 ,
n
B6∈C
2
P
which is once again minimized by the maximal chain(s) arg minC B6∈C (mb (B)) .
12.5 Consonant approximation in the belief space 433
Proof of Theorem 77
Such a function has two possible behaviors in terms of its minimal points in the
plane x1 , x2 .
Case 1. If k1 ≤ 3k2 its contour function has the form rendered in Figure 12.12-
left. The set of minimal points is given by xi ≥ −k2 , x1 + x2 ≤ k2 − k1 . In
the generalP case of an arbitrary number m − 1 of variables x1 , ..., xm−1 such that
xi ≥ −k2 , i xi ≤ k2 − k1 , the set of minimal points is a simplex with m vertices:
each vertex v i is such that
or, in terms of their b.p.a.s, (12.26). Its barycenter has mass assignment:
X X X
mĀ
L∞ [mb ](A) n · mb (A) + mb (B) mb (B)
Ā∈C B6∈C B6∈C
= = mb (A) + ,
n n n
for all A ∈ C, i.e., the L2 partial approximation (12.25). The corresponding minimal
L∞ norm of the difference vector is, according to (12.69), equal to maxB6∈C mb (B).
Fig. 12.12. Left: contour function (level sets) and minimal points (white triangle) of a func-
tion of the form (12.70), when k1 ≤ 3k2 . In the example k2 = 0.4 and k1 = 0.5. Right:
contour function and minimal point of a function of the form (12.70), when k1 > 3k2 . In this
example k2 = 0.1 and k1 = 0.5.
Case 2. In the second case k1 > 3k2 , i.e., for the norm (12.69),
1 X
max mb (B) < mb (B),
B6∈C n
B6∈C
[(−1/m)k1 , · · · , (−1/m)k1 ]0 ,
In terms of basic probability assignments, this yields (12.25) (the mass of Θ is ob-
tained by normalization). The corresponding minimal L∞ norm of the difference
vector is n1 B6∈C mb (B).
P
12.5 Consonant approximation in the belief space 435
i.e., in the mass coordinates mco , (12.29). According to (12.71) the corresponding
minimal L∞ norm is: maxB6∈C mb (B). Clearly, the vertices of the set (12.72) are
all the vectors of β variables such that β(A) = +/ − maxB6∈C mb (B) for all A ∈ C,
A 6= Ā. Its barycenter is given by β(A) = 0 for all A ∈ C, A 6= Ā, i.e., (12.24).
Proof of Theorem 78
C
By (12.26) the vertex mĀ
L∞ [mb ] of CO M,L∞ [mb ] meets the constraints (12.29) for
COM\Ā,L∞ [mb ]. As for the other vertices of COCM,L∞ [mb ] (12.26), let us check
C
the conditions on
. X
∆= mb (B) − n max mb (B)
B6∈C
B6∈C
X 1
n max mb (B) < mb (B) ≡ max mb (B) < ,
B6∈C B6∈C n
B6∈C
Proof of Lemma 16
In the belief space the original belief function b and the desired consonant approxi-
mation co are written as:
X X X
b= b(A)xA , co = mco (B) xA .
∅(A(Θ A⊇A1 B⊆A,B∈C
All the terms in (12.74) associated with subsets A ⊇ Ai , A 6⊃ Ai+1 depend on the
same auxiliary variable γ(Ai ), while the difference in the component xΘ is trivially
1 − 1 = 0. Therefore, we obtain (12.40).
Proof of Theorem 80
To understand the relationship between the sets COCM,L∞ [mb ] and OC C [b], let us
rewrite the system of constraints for L∞ approximations in M under condition
(12.27) as:
Indeed, this is true even if mass redistribution does take place within the chain.
Suppose that some mass mb (A), A ∈ C is reassigned to some other A0 ∈ C. By
the first constraint in (12.75), this is allowed only if mb (A) ≤ maxB6∈C mb (B).
Therefore the mass of just one outside focal element can still be reassigned to A,
while now none can be reassigned to A0 . In both cases, since the number of elements
outside the chain m = 2n − 1 − n is greater than n (unless n ≤ 2) the second
equation of (12.75) implies:
Proof of Theorem 81
After recalling the expression (12.40) of the difference vector b − co in the belief
space, the latter’s L1 norm reads as:
n−1
X X
i
X
X
kb − cokL1 = γ(Ai ) + b(A) −
mb (Aj ) + |b(A)|.
i=1 A⊇Ai ,A6⊃Ai+1 j=1 A6⊃A1
(12.76)
The norm (12.76) can be decomposed into a number of summations which depend
on a single auxiliary variable γ(Ai ). Such components are of the form |x + x1 | +
... + |x + xn |, with an even number of ‘nodes’ −xi .
Let us consider the simple function of Figure 12.13-left: it is easy to see that
similar functions are minimized by the interval of values comprised between their
two innermost nodes, i.e., in the case of norm (12.76):
i
X i
X
i i
mb (Aj ) − γint1 ≤ γ(Ai ) ≤ mb (Aj ) − γint2 ∀i = 1, ..., n − 1. (12.77)
j=1 j=1
n−1 n−1
while mco (An−1 ) = b(An−1 ), as by definition (12.42) γint1 = γint2 = b(An−1 ).
This is a set of constraints of the form l1 ≤ x ≤ u1 , l2 ≤ x + y ≤ u2 ,
l3 ≤ x + y + z ≤ u3 , also expressed as l1 ≤ x ≤ u1 , l2 − x ≤ y ≤ u2 − x,
l3 − (x + y) ≤ z ≤ u3 − (x + y). This is a polytope whose 2n−2 vertices are
obtained by assigning to x, x + y, x + y + z and so on either their lower or their
upper bound. For the specific set (12.78) this yields exactly (12.41).
438 12 Consonant approximation
Fig. 12.13. Left: minimising the L1 distance from the consonant subspace involves functions
such as the one depicted above, |x + 1| + |x + 3| + |x + 7| + |x + 8|, which is minimised
by 3 ≤ x ≤ 7. Right: minimising the L∞ distance from the consonant subspace involves
functions of the form max{|x + x1 |, ..., |x + xn |} (in bold).
Proof of Theorem 82
The minimal value of a function of the form |x + x1 | + ... + |x + xn | is:
X X
xi − xi .
i≥int2 i≤int1
In the case of the L1 norm (12.76), such minimal attained value is:
X X
b(A) − b(A),
A:A⊇Ai ,A6⊃Ai+1 ,b(A)≥γint2 A:A⊇Ai ,A6⊃Ai+1 ,b(A)≤γint1
Pi
since in the difference the addenda j=1 mb (Aj ) disappear.
Overall the minimal L1 norm is:
n−2
!
X X X X
b(A) − b(A) + b(A)
i=1 A:A⊇Ai ,A6⊃Ai+1 , A:A⊇Ai ,A6⊃Ai+1 , A6⊃A1
b(A)≥γint2 b(A)≤γint1
X n−2
X X
= b(A) − 2 b(A),
∅(A(Θ,A6=An−1 i=1 A:A⊇Ai ,A6⊃Ai+1 ,
b(A)≤γint1
Proof of Theorem 83
By replacing the hypothesized solution (12.47) for the L2 approximation in B into
the system of constraints (12.46) we get, for all j = 1, ..., n − 1:
X
mb (A)hbA , bAj i −ave(Ln−1 )hbAn−1 , bAn−1 i+
A(Θ
n−2
X
− ave(Li ) hbAi , bAj i − hbAi+1 , bAj i = 0,
i=1
12.5 Consonant approximation in the belief space 439
where
{A : C ⊆ A ( Θ, A ⊇ Aj } = {A : A ⊇ (C ∪ Aj ), A 6= Θ}
c
= 2|(C∪Aj ) | − 1 = hbC , bAj i.
Therefore, summarizing:
n−1
X X X
b(A) = mb (C)hbC , bAj i.
i=j A⊇Ai ,A6⊃{xi+1 } C(Θ
Proof of Theorem 84
Given the expression (12.40) for the difference vector of interest in the belief space,
we can compute the explicit form of its L∞ norm as: kb − cok∞ =
Xi
X
= max max max γ(Ai ) + b(A) − mb (Aj ), max
mb (B)
i A⊇Ai ,A6⊃Ai+1 A6⊃A1
j=1 B⊆A
Xi
mb (Aj ), b(Ac1 ) ,
= max max max γ(Ai ) + b(A) −
i A⊇Ai ,A6⊃Ai+1
j=1
P (12.80)
c
as maxA6⊃A1 B⊆A mb (B) = b(A1 ). Now, (12.80) can be minimized separately
for each i = 1, ..., n − 1. Clearly, the minimum is attained when the variable ele-
ments in (12.80) are not greater than the constant element b(Ac1 ):
i
X
c
max γ(Ai ) + b(A) − m b (A j ≤ b(A1 ).
) (12.81)
A⊇Ai ,A6⊃Ai+1
j=1
The left hand side of (12.81) is a function of the form max |x + x1 |, ..., |x +
xn | (see Figure 12.13-right). Such functions are minimized by x = − xmin +x2
max
(see Figure 12.13-right again). In the case of (12.81), such minimum and maximum
offset values are, respectively,
i
X
i
γmin = b(Ai ) − mb (Aj ),
j=1
i
X i
X
i
γmax = b({xi+1 }c ) − mb (Aj ) = b(Ai + Aci+1 ) − mb (Aj ),
j=1 j=1
i.e.:
i i
γmin + γmax γ i + γmax
i
− − b(Ac1 ) ≤ γ(Ai ) ≤ − min + b(Ac1 ) ∀i = 1, ..., n − 1.
2 2
In terms of mass assignments, this is equivalent to:
i
b(Ai ) + b({xi+1 }c ) X b(Ai ) + b({xi+1 }c )
−b(Ac1 ) + ≤ mco (Ai ) ≤ b(Ac1 ) + .
2 j=1
2
(12.82)
12.5 Consonant approximation in the belief space 441
Proof of Theorem 86
whereas Li−1 contains 2 · |Li | elements and can then be written as the union of two
lists of |Li | elements:
= ave(Li−1 ).
As for the barycenter L1 approximation, obviously int1 (Li ) ≥ b(Ai + B) for all
442 12 Consonant approximation
B s.t. b(Ai + B) ≤ int1 (Li ) so that int1 (Li ) ≥ b(Ai + xi+1 + B) for all B
s.t. b(Ai + B) ≤ int1 (Li ) as well. Hence int1 (Li ) ≥ int1 (Li−1 ) since int1 (Li )
dominates at least half the elements of Li−1 .
In the same way, int2 (Li ) ≥ b(Ai + B) for all B s.t. b(Ai + B) ≤ int2 (Li ) so that
int2 (Li ) ≥ b(Ai + xi+1 + B) for the same Bs, hence int2 (Li ) ≥ int2 (Li−1 ) since
int2 (Li ) dominates at least |Li−1 |/2 + 2 elements of Li−1 .
Summarising, if C = {A1 ⊂ · · · ⊂ An }, with Ai = {x1 , ..., xi } is such that
(12.83) holds for all i, then:
As we know belief functions are complex objects, in which different and sometimes
contradictory bodies of evidence may coexist, as they mathematically describe the
fusion of possibly conflicting expert opinions and/or imprecise/ corrupted measure-
ments, etcetera. As a consequence, making decisions based on such objects can be
misleading. As we discussed in the second part of Chapter 9, this is a well known
problem in classical logics, where the application of inference rules to inconsis-
tent knowledge bases (sets of propositions) may lead to incompatible conclusions
[1012]. We have also seen that belief functions can be interpreted as generalisations
of knowledge bases in which a belief value, rather than a truth one, is attributed to
each formula (interpreted as the set of worlds for which that formula is true).
We have also identified consistent belief functions, belief functions whose focal ele-
ments have non-empty intersection, as the natural counterparts of consistent knowl-
edge bases in belief theory.
Analogously to consistent knowledge bases, consistent belief functions are char-
acterized by null internal conflict. It may be therefore be desirable to transform a
generic belief function to a consistent one prior to making a decision, or picking a
course of action. This is all the more valuable as several important operators used to
update or elicit evidence represented as belief measures, like Dempster’s sum [343]
and disjunctive combination [1225] (cfr. Section 4.2), do not preserve consistency.
We have seen in this Book how the transformation problem is spelled out in the prob-
abilistic [311, 267] (Chapters 10 and 11) and possibilistic [?] (Chapter 12) case. As
we argued for probability transforms, consistent transformations can be defined as
the solutions to a minimization problem of the form:
443
444 13 Consistent approximation
where b is the original belief function, dist an appropriate distance measure between
belief functions, and CS denotes the collection of all consistent b.f.s. We call (13.1)
the consistent transformation problem. Once again, by plugging in different distance
functions in (13.1) we get different consistent transformations. We refer to Section
4.8.2 for a review of dissimilarity measures for belief functions.
As we did for consonant belief functions in Chapter 12, in this Chapter we focus
on what happens when applying the classical Lp norms to the consistent approxi-
mation problem. Indeed the L∞ norm, in particular, is closely related to consistent
belief functions, as the region of consistent b.f.s can be expressed as
n o
CS = b : max plb (x) = 1 ,
x∈Θ
i.e., the set of b.f.s for which the L∞ norm of the ‘contour function’ plb (x) is
equal to 1. In addition, consistent belief functions relate to possibility distributions,
and possibility measures P os are inherently associated with L∞ as P os(A) =
maxx∈A P os(x).
Chapter content
Chapter outline
We briefly recall in Section 13.1 how to solve the transformation problem separately
for each maximal simplex of the consistent complex. We then proceed to solve the
L1 -, L2 - and L∞ -consistent approximation problems in full generality, in both the
mass (Section 13.2) and the belief (Section 13.3) space representations. In Section
13.4 we compare and interpret the outcomes of Lp approximations in the two frame-
works, with the help of the ternary example.
csxLp [b] = arg min x kb − cskLp , csxLp [mb ] = arg min km − mcs kLp
cs∈CS B mcs ∈CS x
M
(13.2)
in the belief/mass space, respectively. Then, the distance of b from all such partial
solutions needs to be assessed to select a global, optimal approximation. As a matter
of fact, an analysis of the outcomes of Lp consistent approximation in the case of
a binary frame has already been run in Section 12.3 (as if |Θ| = 2 consonant and
consistent belief functions coincide).
446 13 Consistent approximation
13.2.1 L1 approximation
.
Let us first tackle the L1 case. After introducing the auxiliary variables β(B) =
mb (B) − mcs (B) we can write the L1 norm of the difference vector as:
X X
kmb − mcs kM L1 = |β(B)| + |mb (B)|, (13.5)
B⊇{x},B6=Θ B6⊃{x}
The mass of all the subsets not in the desired principal ultrafilter {B ⊇ {x}} is
simply reassigned to Θ.
13.2 Consistent approximation in M 447
13.2.2 L∞ approximation
namely:
mb (B) − max mb (C) ≤ mcsxL [mb ] (B) ≤ mb (B) + max mb (C). (13.7)
C6⊃{x} ∞ ,M C6⊃{x}
Clearly this set of solutions can also include pseudo belief functions.
Global approximation. Once again, the global L∞ consistent approximation
in M coincides with the partial approximation (13.7) at minimal distance from the
original b.p.a. mb .
The partial approximation focussed on x has distance max mb (B) from mb . The
B6⊃{x}
global L∞ approximation mcsL∞ ,M [mb ] is therefore the (union of the) partial ap-
proximation(s) associated with the singleton(s) such that:
13.2.3 L2 approximation
which is minimized by the same singleton(s). Note that, even though (in the N − 2
representation) the partial L1 and L2 approximations coincide, the global approxi-
mations in general may fall on different components of the consonant complex.
We have seen that in the mass space (at least in its N −2 representation, Theorem 89)
the L1 and L2 approximations coincide. This is true in the belief space in the general
case as well. We will gather some intuition on the general solution by considering
first the slightly more complex case of a ternary frame: Θ = {x, y, z}.
We will use the notation:
X X
cs = mcs (B)bB , b = mb (B)bB .
B⊇{x} B(Θ
hb − cs, bB i = 0
∀B : {x} ⊆ B ( Θ.
X X
As b − cs = (mb (A) − mcs (A))bA = β(A)bA the condition becomes:
A(Θ A(Θ
X X
β(A)hbA , bB i + mb (A)hbA , bB i = 0 ∀B : {x} ⊆ B ( Θ.
A⊇{x} A6⊃{x}
(13.12)
450 13 Consistent approximation
Linear system for L1 In the L1 case, the minimization problem to solve is:
X X X
arg min mb (B) − mcs (B)
α
B⊆A
A⊇{x}
X X
B⊆A,B⊇{x}
X
= arg min β(B) + mb (B) ,
β
A⊇{x} B⊆A,B⊇{x} B⊆A,B6⊃{x}
Linear transformation in the ternary case An interesting fact emerges when com-
paring the linear systems for L1 and L2 in the ternary case Θ = {x, y, x}:
3β(x) + β(x, y) + β(x, z)+
β(x) = 0
+mb (y) + mb (z) = 0
β(x) + β(x, y) + mb (y) = 0 (13.14)
β(x) + β(x, y) + mb (y) = 0
β(x) + β(x, z) + mb (z) = 0.
β(x) + β(x, z) + mb (z) = 0
The solution is the same for both, as the second linear system can be obtained from
the first one by a simple linear transformation of rows (we just need to substitute the
first equation e1 of the first system with the difference: e1 7→ e1 − e2 − e3 ).
Linear transformation in the general case This holds in the general case, too.
Lemma 17.
X
|B\A| 1C⊆A
hbB , bC i(−1) =
0 otherwise.
B⊇A
Corollary 18. The linear system (13.12) can be reduced to the system (13.13)
through the following linear transformation of rows:
X
rowA 7→ rowB (−1)|B\A| . (13.15)
B⊇A
mcsxL [b] (A) = mcsxL [b] (A) = mb (A) − β(A) = mb (A) + mb (A \ {x})
1 2
Example The partial consistent approximation with core {x} of an example belief
function defined on a frame Θ = {x, y, z, w} is illustrated in Figure 13.1.
Fig. 13.1. A belief function on Θ = {x, y, z, w} (left) and its partial L1 /L2 consistent
approximation in B with core {x} (right).
The b.f. with focal elements {y}, {y, z}, and {x, z, w} is transformed by the map-
ping:
452 13 Consistent approximation
and the global approximation falls on the component of the consistent complex as-
sociated with the element of maximal plausibility. Unfortunately, in the case of an
arbitrary frame Θ (13.16) is not necessarily the maximal plausibility element:
X
arg min b(A) 6= arg max plb (x),
x∈Θ x∈Θ
A⊆{x}c
Once again, in the binary case the condition of Theorem 92 reads as:
X
x̂ = arg min (b(A))2 = arg min(mb ({x}c ))2 = arg max plb (x)
x x x
A⊆{x}c
and the global approximation for L2 also falls on the component of the consistent
complex associated with the element of maximal plausibility, while this is not gen-
erally true for an arbitrary frame.
As observed in the binary case, for each component CS x of the consistent com-
plex the set of partial L∞ -approximations form a polytope whose center of mass is
exactly equal to the partial L1 /L2 approximation.
Theorem 93. Given an arbitrary belief function b : 2Θ → [0, 1] and an element
x ∈ Θ of its frame of discernment, its L∞ partial consistent approximation with
core containing x in the belief space CSLx ∞ ,B [mb ] is determined by the following
system of constraints:
X X
−b(xc ) − mb (B) ≤ γ(A) ≤ b(xc ) − mb (B), (13.17)
B⊆A,B6⊃{x} B⊆A,B6⊃{x}
where
. X X
γ(A) = β(B) = mb (B) − mcs (B) . (13.18)
B⊆A,B⊇{x} B⊆A,B⊇{x}
– the set of partial L∞ solutions form a polytope on each component of the consis-
tent complex, whose center of mass lies on the partial L1 /L2 approximation;
– the global L∞ solutions fall on the component(s) associated with the maximal
plausibility element(s), and their center of mass, when such element is unique, is
the consistent transformation focused on the maximal plausibility singleton [?].
Approximations in both mass and belief space reassign the total mass b(xc )
outside the filter focussed on x, although in different ways. However, mass space-
consistent approximations do so either on an equal basis, or by favouring no particu-
lar focal element in the filter (i.e., by reassigning the entire mass to Θ). They do not
distinguish focal elements in virtue of their set-theoretic relationships with subsets
B 6⊃ x outside the filter.
In contrast, approximations in the belief space do so according to the focussed con-
sistent transformation principle.
It can be useful to illustrate the different approximations in the toy case of a ternary
frame, Θ = {x, y, z}, for sake of completeness. Assuming we want the consistent
approximation to focus on x, Figure 13.2 illustrates the different partial consistent
approximations in the simplex Cl(mx , mx,y , mx,z , mΘ ) of consistent belief func-
tions focussed on x in a ternary frame, for the belief function with masses:
Fig. 13.2. The simplex (solid black tetrahedron) Cl(mx , mx,y , mx,z , mΘ ) of consistent be-
lief functions focussed on x on the ternary frame Θ = {x, y, z}, and the associated Lp partial
consistent approximations for the example belief function (13.19).
Chapter appendix
Proof of Theorem 89
i.e., mcs (x) = mb (x) + (1 − plb (x))/2|Θ|−1 , as there are 2|Θ|−1 subsets in the
ultrafilter containing x.
By replacing the value of mcs (x) into the first equation we get (13.11).
Proof of Corollary 18
Proof of Lemma 17
Therefore:
X X c
hbB , bC i(−1)|B\A| = (2|(B∪C) | − 1)(−1)|B\A|
B⊆A B⊆A
X c X
= 2|(B∪C) | (−1)|B\A| − (−1)|B\A| (13.20)
B⊆A B⊆A
X c
= 2|(B∪C) | (−1)|B\A|
B⊆A
|B\A|
X X c
as (−1)|B\A| = 1|A |−k
(−1)k = 0 by Newton’s binomial (8.4).
B⊆A k=0
As both B ⊇ A and C ⊇ A the set B can be decomposed into the disjoint sum
B = A + B 0 + B 00 , where ∅ ⊆ B 0 ⊆ C \ A, ∅ ⊆ B 00 ⊆ (C ∪ A)c .
The quantity (13.20) can then be written as:
X X c 00 0 00
2|(A∪C)| −|B | (−1)|B |+|B | =
∅⊆B 0 ⊆C\A ∅⊆B 00 ⊆(C∪A)c
0 00 c
−|B 00 |
X X
= (−1)|B |
(−1)|B | 2|(A∪C)| ,
∅⊆B 0 ⊆C\A ∅⊆B 00 ⊆(C∪A)c
where
00 c
−|B 00 | c c
X
(−1)|B | 2|(A∪C)| = [2 + (−1)]|(A∪C)| = 1|(A∪C)| = 1,
∅⊆B 00 ⊆(C∪A)c
Proof of Theorem 91
The L1 distance between the partial approximation and b can be easily computed
as: kb − csxL1 [b]kL1 =
X
= |b(A) − csxL1 [b](A)|
A⊆Θ
X X X
= b(A) −
|b(A) − 0| + mcs (B)
A6⊃{x}
X X A⊇{x}
X B⊆A,B⊇{x}
X
= b(A) + mb (B) − (mb (B) + mb (B \ {x}))
A6⊃{x} A⊇{x} B⊆A B⊆A,B⊇{x}
X X X X
= b(A) + m (B) − mb (B \ {x})
b
A6⊃{x} A⊇{x} B⊆A,B6⊃{x} B⊆A,B⊇{x}
X X X X
= b(A) + mb (C) − mb (C)
A6⊃{x} A⊇{x} C⊆A\{x} C⊆A\{x}
X X
= b(A) = b(A).
A6⊃{x} A⊆{x}c
Proof of Theorem 92
The L2 distance between the partial approximation and b can be computed as: kb −
csxL2 [b]k2 =
X XX X 2
= (b(A) − csxL2 [b](A))2 = mb (B) − mcs (B)
A⊆Θ A⊆Θ B⊆A B⊆A,B⊇{x}
XX X X 2
= mb (B) − mb (B) − mb (B \ {x})
A⊆Θ B⊆A B⊆A,B⊇{x} B⊆A,B⊇{x}
X X X X 2
2
= (b(A)) + mb (B) − mb (B \ {x})
A6⊃{x} A⊇{x} B⊆A,B6⊃{x} B⊆A,B⊇{x}
X X X X 2
2
= (b(A)) + mb (C) − mb (C)
A6⊃{x} A⊇{x} C⊆A\{x} C⊆A\{x}
X X
so that kb − csxL2 [b]k2 = (b(A))2 = (b(A))2 .
A6⊃{x} A⊆{x}c
Proof of Theorem 93
The quantity maxA(Θ has as lower limit the value associated with the largest norm
which does not depend on mcs (.), i.e.:
X X
mcs (B) ≥ b({x}c ).
max mb (B) −
A(Θ
B⊆A B⊆A,B⊇{x}
In the above constraint only the expressions associated with A ⊇ {x} contain vari-
able terms β(B). Therefore, the desired optimal values are such that:
X X
mb (B) ≤ b({x}c )
β(B) + {x} ⊆ A ( Θ.
B⊆A,B⊇{x} B⊆A,B6⊃{x}
(13.21)
After introducing the change of variables (13.18), system (13.21) reduces to:
X
mb (B) ≤ b({x}c )
γ(A) +
{x} ⊆ A ( Θ
B⊆A,B6⊃{x}
Proof of Corollary 20
The centre of mass of the set (13.17) of solutions to the L∞ consistent approxima-
tion problem is given by:
X
γ(A) = − mb (B), {x} ⊆ A ( Θ;
B⊆A,B6⊃{x}
But this is exactly the linear system (13.13) which determines the L1 /L2 consistent
approximation csxL1/2 [b] of b onto CS x .
Part IV
463
15
Geometric conditioning
465
466 15 Geometric conditioning
of this Book has computed some conditional belief functions generated by minimis-
ing Lp norms in the ‘mass space’ (cfr. Chapter 12, Section 15.5), where b.f.s are
represented by the vectors of their basic probabilities.
Chapter content
In this Chapter we explore the geometric conditioning approach in both the mass
space M and the belief space B, in which belief functions are represented by the
vectors of their belief values b(A) (Chapter 6). We adopt once again distance mea-
sures d of the classical Lp family, as a first step towards a complete analysis of the
geometric approach to conditioning. We show that geometric conditional b.f.s in B
are more complex than in the mass space, less naive objects whose interpretation in
terms of degrees of belief is however less natural.
Conditioning in the belief space Conditional belief functions in the belief space
seem to have rather less straightforward interpretations than the corresponding
quantities in the mass space. The barycenter of the set of L∞ conditional belief
functions can be interpreted as follows: the mass of all the subsets whose intersec-
tion with A is C ( A is re-assigned by the conditioning process half to C, and half
to A itself. While in the M case the barycenter of L1 conditional b.f.s is obtained
by reassigning the mass of all B 6⊂ A to each B ( A on equal grounds, for the
barycenter of L∞ conditional b.f.s in B normalization is achieved by adding or sub-
tracting their masses according to the cardinality of C (even or odd). As a result, the
15.1 Conditioning in belief calculus: a concrete scenario 467
obtained mass function is not necessarily non-negative: again, such version of ge-
ometrical conditioning may generated pseudo belief functions. Furthermore, while
being quite similar to it, the L2 conditional belief function in B is distinct from the
barycenter of the L∞ conditional b.f.s.
In the L1 case, not only the resulting conditional pseudo belief functions are not
guaranteed to be proper belief functions, but it appears difficult to find simple inter-
pretations for these results in terms of degrees of belief.
A number of interesting cross relations between conditional belief functions of
the two representation domains appear to exist from an empirical comparison, and
remain to be investigated further.
Chapter outline
The data association problem is one of the more intensively studied computer vi-
sion applications for its important role in the implementation of automated defense
systems, and its connections to the classical field of ‘structure from motion’, i.e., the
reconstruction of a rigid scene from a sequence of images.
A number of feature points moving in the 3D space are followed by one or more
cameras and appear in an image sequence as ‘unlabeled’ points (i.e. we do not know
the correspondences between points appearing in two consecutive frames). A typical
example consists of a set of markers set at fixed positions on a moving articulated
468 15 Geometric conditioning
body (e.g., a human body): in order to reconstruct the trajectory of the cloud of
markers (or of the underlying body) we need to associate feature points belonging
to pairs of consecutive images, Ik and Ik+1 .
A classical approach to the data association problem called joint probabilistic
data association filter (JPDAF) [58], is based on tuning a number of Kalman filters
(each associated with a single feature point), whose aim is to predict the future po-
sition of each target, in order to produce the most probable labeling of the cloud of
points in the next image.
Unfortunately, the JPDAF method suffers from a number of drawbacks: for exam-
ple, when several features converge to a small region (‘coalescence’ [116]) the al-
gorithm cannot tell them apart. Several techniques have been proposed to overcome
this sort of problems [648].
However, assume that the feature points represent fixed locations {Mi , i =
1, ..., M } on an articulated body, and that we know the rigid motion constraints
between pairs of markers. This is equivalent to possessing a topological model of
the articulated body, represented by an undirected graph whose edges correspond
to rigid motion constraints. We can then exploit such a-priori information to solve
the association task in critical situations where several points fall into the validation
region of a single filter.
A topological model of the body to track, for instance, can provide:
– a prediction constraint, encoding the likelihood of a measurement mki at time k
of being associated with a measurement mk−1 i of the previous image;
– an occlusion constraint, expressing the chance that a given marker of the model
is occluded in the current image;
– a metric constraint, representing the knowledge of the lengths of the links, which
can be learned from the history of the past associations;
– a rigid motion constraint on pairs of markers.
Belief calculus provides a coherent framework in which to combine all these sources
of information, and cope with possible conflicts. Indeed all these constraints can be
expressed as belief functions over a suitable frame of discernment, namely set of
possible associations mi ↔ mj between feature points.
By reflecting on the nature of the above constraints we can note that the information
carried by predictions of filters and occlusions inherently concerns associations be-
tween feature points belonging to consecutive images, while other conditions (such
as the metric constraint) can be expressed instantaneously in the frame of the cur-
rent time-k associations. Finally, a number of bodies of evidence depend on the
model-measurement associations mk−1 i ↔ Mj at the previous time step. This is the
case of belief functions encoding the information carried by the motion of the body,
expression of rigid motion constraints.
We can then introduce the frame of discernment of past model-to-feature asso-
ciations:
15.1 Conditioning in belief calculus: a concrete scenario 469
Fig. 15.1. Rigid motion constraints in the data association problem involve the combination
of a set of conditional belief functions in each partition of the joint association space in a
single total function.
k−1 .
ΘM = {mk−1
i ↔ Mj , ∀i = 1, ..., nk−1 ∀j = 1, ..., M, }
where nk is the number of feature points {mki } appearing in image Ik (see Figure
15.1.
All the available pieces of evidence can be combined on the ‘minimal refinement’
k−1
(see [1149] of [?] of all these frames, the product association frame ΘM ⊗ Θkk−1 .
k
The result is later projected onto the current association set ΘM in order to yield the
best current estimate.
Crucially, rigid motion constraints can be expressed in a conditional way only:
hence, the computation of a belief estimate of the desired current model-measurement
associations (15.2) involves combining conditional belief functions defined over the
product association frame.
The purpose of this Chapter is to study how such conditional belief functions can be
induced by minimizing geometric distances between belief measures.
470 15 Geometric conditioning
The same is true in the belief space, where (the vector a associated with) each b.f. a
assigning mass to focal elements included in A only is decomposable as:
X
a= a(B)bB .
∅(B⊆A
.
These vectors live in a simplex BA = Cl(bB , ∅ ( B ⊆ A). We call MA and BA
the conditioning simplices in the mass and the belief space, respectively.
Definition 85. Given a belief function b : 2Θ → [0, 1], we call geometric condi-
tional belief function induced by a distance function d in M (B) the belief func-
tion(s) bd,M (.|A) (bd,B (.|A)) on Θ which minimize(s) the distance d(mb , MA )
(d(b, BA )) between the mass (belief) vector representing b and the conditioning
simplex associated with A in M (B).
Using the expression (12.15) of the L1 norm in the mass space M, (15.4) becomes:
15.3 Geometric conditional belief functions in M 471
X
arg min kmb − ma kL1 = arg min |mb (B) − ma (B)|.
ma ∈MA ma ∈MA
∅(B⊆Θ
.
where β(B) = mb (B) − ma (B).
Theorem 95. Given a b.f. b : 2Θ → [0, 1] and an arbitrary non-empty focal element
∅ ( A ⊆ Θ, the set of L1 conditional belief functions bL1 ,M (.|A) with respect to A
in M is the simplex
ma (X) = mb (X) ∀∅ ( X ( A, X 6= B.
(15.7)
It is important to notice that all the vertices of the L1 conditional simplex fall inside
MA proper (as the mass assignment (15.7) is non-negative for all subsets X). A
priori, some of them could have belonged to the linear space generated by MA
but outside the simplex MA (i.e., some of the solutions ma (B) could have been
negative). This is indeed the case for geometrical belief functions induced by other
norms, as we will see in the following.
472 15 Geometric conditioning
Let us now compute the analytical form of the L2 conditional belief function(s) in
the mass space. We make use of the form (15.5) of the difference vector mb − ma ,
where again ma is an arbitrary vector of the conditional simplex MA . As usual,
rather than minimising the norm of the difference we seek the point of conditioning
simplex such that the difference vector is orthogonal to all the generators of a(MA ).
1 X plb (Ac )
mL2 ,M (B|A) = mb (B) + mb (B) = mb (B) + |A| .
2|A| − 1 B6⊂A 2 − 1 (15.8)
According to Equation (15.8) the L2 conditional belief function is unique, and corre-
sponds to the mass function which redistributes the mass the original belief function
assigns to focal elements not included in A to each and all the subsets of A in an
equal, even way.
L2 and L1 conditional belief functions in M exhibit a strong relationship.
Theorem 97. Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty
focal element ∅ ( A ⊆ Θ, the L2 conditional belief function bL2 ,M (.|A) with
respect to A in M is the center of mass of the simplex ML1 ,A [b] of L1 conditional
belief functions with respect to A in M.
Proof. By definition the center of mass of ML1 ,A [b], whose vertices are given by
(15.7), is the vector
1 X
|A|
m[b]|B
L1 A
2 −1
∅(B⊆A
1 h
|A|
i
whose entry B is given by mb (B)(2 − 1) + (1 − b(A)) , i.e., (15.8).
2|A| − 1
Similarly, we can use Equation (15.5) to minimize the L∞ distance between the
original mass vector mb and the conditioning subspace MA .
Theorem 98. Given a belief function b : 2Θ → [0, 1] with b.p.a. mb , and an ar-
bitrary non-empty focal element ∅ ( A ⊆ Θ, the set of L∞ conditional belief
functions mL∞ ,M (.|A) with respect to A in M forms the simplex:
with vertices
15.3 Geometric conditional belief functions in M 473
m[b]|B̄
(
L∞ (B|A) = mb (B) + max mb (C) ∀B ⊆ A, B 6= B̄
C6⊂A
|A|
m[b]|B̄
P
L∞ (B̄|A) = mb (B̄) + C6⊂A mb (C) − (2 − 2) maxC6⊂A mb (C),
(15.9)
whenever:
1 X
max mb (C) ≥ |A| mb (C). (15.10)
C6⊂A 2 − 1 C6⊂A
whenever:
1 X
max mb (C) < mb (C). (15.11)
C6⊂A 2|A|− 1 C6⊂A
The latter is the barycenter of the simplex of L∞ conditional b.f.s in the former case,
and coincides with the L2 conditional belief function (15.8).
If |A| = 2, A = {x, y}, the conditional simplex is 2-dimensional, with three vertices
mx , my and mx,y . For a b.f. b on Θ = {x, y, z} Theorem 94 states that the vertices
of the simplex ML1 ,A of L1 conditional belief functions in M are:
{x}
m[b]|L1 {x, y} = [mb (x) + plb (z), mb (y), mb (x, y) ]0 ,
{y}
m[b]|L1 {x, y} = [mb (x), mb (y) + plb (z), mb (x, y) ]0 ,
{x,y}
m[b]|L1 {x, y} = [mb (x), mb (y), mb (x, y) + plb (z) ]0 .
Figure 15.2 shows such simplex in the case of a belief function b on the ternary
frame Θ = {x, y, z} and basic probability assignment
Fig. 15.2. The simplex (solid red triangle) of L1 conditional belief functions in M associated
with the belief function with mass assignment (15.12) in Θ = {x, y, z}. The related unique
L2 conditional belief function in M is also plotted as a red square. It coincides with the center
of mass of the L1 set. The set of L∞ conditional (pseudo) belief functions is also depicted
(green triangle).
We hence fall under condition (15.10), and there is a whole simplex of L∞ con-
ditional belief function (in M). According to Equation (15.9) such simplex has
2|A| − 1 = 3 vertices, namely (taking into account the nil masses in (15.12)):
{x}
m[b]|L∞ ,M {x, y} = [mb (x) − mb (x, z), mb (y) + mb (x, z), mb (x, z) ]0 ,
{y}
m[b]|L∞ ,M {x, y} = [mb (x) + mb (x, z), mb (y) − mb (x, z), mb (x, z) ]0 ,
{x,y}
m[b]|L∞ ,M {x, y} = [mb (x) + mb (x, z), mb (y) + mb (x, z), −mb (x, z) ]0 .
(15.14)
We can notice that the set of L∞ conditional (pseudo) b.f.s is not entirely admis-
sible, but its admissible part contains the set of L1 conditional b.f.s, which amounts
therefore a more ‘conservative’ approach to conditioning. Indeed, the latter is the
triangle inscribed in the former, determined by its median points. Note also that
both the L1 and L∞ simplices have the same barycenter in the L2 conditional b.f.
(15.13).
15.3 Geometric conditional belief functions in M 475
Just as we did in Section 12.5.6 for consonant approximations in the mass space, we
can provide an interesting interpretation of geometric conditional belief functions in
the mass space in the framework of the ‘imaging’ approach [1032].
Suppose we briefly glimpse at a transparent urn filled with black or white balls,
and are asked to assign a probability value to the possible ‘configurations’ of the
urn. Suppose also that we are given three options: 30 black balls and 30 white balls
(state a); 30 black balls and 20 white balls (state b); 20 black balls and 20 white
balls (state c). Hence, Θ = {a, b, c}. Since the observation only gave us the vague
impression of having seen approximately the same number of black and white balls,
we would probably deem the states a and c equally likely, but at the same time we
would tend to deem the event ‘a or c’ twice as likely as the state b. Hence, we assign
probability 1/3 to each of the states. Now, we are told that state c is false. How do
we revise the probabilities of the two remaining states a and b?
Lewis [841] argued that, upon observing that a certain state x ∈ Θ is impossible,
476 15 Geometric conditioning
15.4.1 L2 conditioning in B
We start with the L2 norm, as this seems to have a more straightforward interpreta-
tion in the belief space.
Theorem 99. Given a belief function b : 2Θ → [0, 1] with b.p.a. mb , and an arbi-
trary non-empty focal element ∅ ( A ⊆ Θ, the L2 conditional b.f. bL2 ,B (.|A) with
respect to A in the belief space B is unique, and has basic probability assignment:
X X
mL2 ,B (C|A) = mb (C) + mb (B ∪ C)2−|B| + (−1)|C|+1 mb (B)2−|B|
B⊆Ac B⊆Ac
(15.16)
for each proper subset ∅ ( C ( A of the event A.
Example: the ternary frame In the ternary case the unique L2 mass space- condi-
tional belief function has b.p.a. ma such that:
mb (z) + mb (x, z)
ma (x) = mb (x) + ,
2
mb (z) + mb (y, z)
ma (y) = mb (y) + , (15.17)
2
mb (x, z) + mb (y, z)
ma (x, y) = mb (x, y) + mb (Θ) + .
2
At a first glance, each focal element B ⊆ A seems to be assigned to a fraction of
the original mass mb (X) of all focal elements X of b such that X ⊆ B ∪ Ac . This
contribution seems proportional to the size of X ∩ Ac , i.e., how much the focal
element of b falls outside the conditioning event A.
Notice that Dempster’s conditioning b⊕ (.|A) = b ⊕ bA yields in this case:
L2 conditioning in the belief space differs from its ‘sister’ operation in the mass
space (Theorem 96) in that it makes use of the set-theoretic relations between focal
elements, just as Dempster’s rule does. However, contrarily to Dempster’s condi-
tioning, it does not apply any normalization, as even subsets of Ac ({z} in this case)
contribute as addenda to the mass of the resulting conditional belief function.
Interpretation As for the general case (15.16), we can notice that the (unique) L2
conditional belief function in the belief space is not guaranteed to be a proper belief
function, as some masses can be negative, due to the addendum
X
(−1)|C|+1 mb (B)2−|B| .
B⊆Ac
478 15 Geometric conditioning
The quantity shows, however, an interesting connection with the redistribution pro-
cess associated with the orthogonal projection π[b] of a belief function onto the
probability simplex ([267], Section 10.4), in which the mass of each subset A is
re-distributed among all its subsets B ⊆ A on an equal basis.
Here (15.16), the mass of each focal element not included in A is also broken
into 2|B| parts, equal to the number of its subsets. Only one such part is re-attributed
to C = B ∩ A, while the rest is re-distributed to A itself.
15.4.2 L1 conditioning in B
To discuss L1 conditioning in the belief space we need to write explicitly the differ-
ence vector b − a.
Lemma 19. The L1 norm of the difference vector b − a can be written as
X
kb − akL1 = γ(B ∩ A) + b(B) − b(B ∩ A)
∅(B∩A(A
so that the L1 conditional belief functions in B are the solutions of the following
minimization problem:
X
arg minγ kb − akL1 = arg min γ(B ∩ A) + b(B) − b(B ∩ A),
γ
∅(B∩A(A
P
where β(B) = mb (B) − ma (B) and γ(B) = C⊆B β(B).
As we also noticed in the L1 minimization problem in the mass space, each group of
addenda which depend on the same variable γ(X), ∅ ( X ( A, can be minimized
separately. Therefore, the set of L1 conditional belief functions in the belief space
B is determined by the following minimization problem:
X
arg min γ(X) + b(B) − b(X) ∀∅ ( X ( A. (15.18)
γ(X)
B:B∩A=X
The functions appearing in (15.18) are of the form |x+k1 |+...+|x+km |, where m
is even. Such functions are minimized by the interval determined by the two central
‘nodes’ −kint1 ≤ −kint2 (see Figure 15.3 for an example, and compare the proof
of Theorem 81, Chapter 12).
In the case of system (15.18) this yields:
X X
b(X) − b(Bint 1
) ≤ γ(X) ≤ b(X) − b(Bint2
), (15.19)
X X
where Bint 1
and Bint 2
are the central, median values of the collection {b(B), B ∩
A = X}. Unfortunately, it is not possible, in general, to determine the median
values of such a collection of belief values, as belief functions are defined on a
partially (rather than totally) ordered set (the power set 2Θ ).
15.4 Geometric conditioning in the belief space 479
The special case |Ac | = 1 This is possible, however, in the special case in which
|Ac | = 1 (i.e., the conditioning event is of cardinality n − 1). In this case:
X
Bint1
= b(X + Ac ), X
Bint2
= b(X),
It is not difficult to see that, in the variables {β(X)}, the solution reads as:
X
b(X) − b(X + Ac ) ≤ β(X) ≤ − b(B) − b(B + Ac ) ,
∅(B(X
15.4.3 L∞ conditioning in B
Let us finally approach the problem of finding L∞ conditional belief functions given
an event A, starting with the ternary case study.
The ternary case In the ternary case, kb − akL∞ = max∅(B(Θ |b(B) − a(B)| =
n
max |b(x) − a(x)|, |b(y) − a(y)|, |b(z)|, |b(x, y) − a(x, y)|, |b(x, z) − a(x, z)|,
o n
|b(y, z) − a(y, z)| = max |mb (x) − ma (x)|, |mb (y) − ma (y)|, |mb (z)|,
|mb (x) + mb (y) + mb (x, y) − ma (x) − ma (y) − ma (x, y)|, |mb (x)+o
+mb (z) + mb (x, z) − ma (x)|, |mb (y) + mb (z) + mb (y, z) − ma (y)| =
n
max |β(x)|, |β(y)|, mb (z), 1 − b(x, y), |β(x) + mb (z) + mb (x, z)|,
o
|β(y) + mb (z) + mb (y, z)| ,
On the left hand side we have functions of the form max{|x|, |x + k|}. The interval
of values in which such a function is below a certain threshold k 0 ≥ k is [−k 0 , k 0 −k].
This yields:
b(x, y) − 1 ≤ β(x) ≤ 1 − b(x, y) − (mb (z) + mb (x, z))
(15.21)
b(x, y) − 1 ≤ β(y) ≤ 1 − b(x, y) − (mb (z) + mb (y, z)).
The solution in the masses of the sought L∞ conditional b.f. reads as:
mb (x) − mb (y, z) − mb (Θ) ≤ ma (x) ≤ 1 − (mb (y) + mb (x, y))
(15.22)
mb (y) − mb (x, z) − mb (Θ) ≤ ma (y) ≤ 1 − (mb (x) + mb (x, y)).
Its barycenter is clearly given by:
ma (x) = mb (x) + mb (z)+m 2
b (x,z)
ma (y) = mb (y) + mb (z)+m 2
b (y,z)
mb (x,z)+mb (y,z)
ma (x, y) = 1 − ma (x) − ma (y) = mb (x, y) + mb (Θ) + 2
(15.23)
i.e., the L2 conditional belief function (15.17) as computed in the ternary case.
15.4 Geometric conditioning in the belief space 481
Lemma 20 can be used to prove the following form of the set of L∞ conditional
belief functions in B.
Theorem 101. Given a belief function b : 2Θ → [0, 1] and an arbitrary non-empty
focal element ∅ ( A ⊆ Θ, the set of L∞ conditional belief functions bL∞ ,B (.|A)
with respect to A in B is the set of b.f.s with focal elements in {X ⊆ A} which meet
the following constraints for all ∅ ( X ⊆ A:
X
mb (X) + mb (C) + (2|X| − 1)(1 − b(A)) ≤ ma (X) ≤ mb (X)
C∩Ac 6=∅,∅⊆C∩A⊆X X X
+(2|X| − 1)(1 − b(A)) − mb (C) − (−1)|X| mb (B).
C∩Ac 6=∅,∅⊆C∩A(X B⊆Ac
(15.26)
This result appears of rather difficult interpretation in terms of mass allocation. Nev-
ertheless, the ternary example we will see in Section 15.5.2 seems to suggest that
this set, or at least its admissible part, has some nice properties worth to explore.
For instance, its barycenter has a much simpler form.
3. the L∞ conditional b.f. either coincides with the L2 one, or forms a simplex
obtained by assigning the maximal mass outside A (rather than the sum of such
masses plb (Ac )) to all subsets of A (but one) indifferently.
L1 and L2 conditioning are closely related in the mass space, and have a compelling
interpretation in terms of general imaging [1032, 502].
The L2 and L∞ conditional b.f.s just computed in the belief space are instead:
X X
mL2 ,B (B|A) = mb (B) + mb (B + C)2−|C| + (−1)|B|+1 mb (C)2−|C|
C⊆Ac C⊆Ac
1 X 1
mL∞ ,B (B|A) = mb (B) + mb (B + C) + (−1)|B|+1 b(Ac ).
2 c
2
∅(C⊆A
As for the L2 case, the result makes a lot of sense in the ternary case, but it is difficult
to interpret in its general form (above). It seems to be related to the process of mass
redistribution among all subsets, as it happens with the (L2 induced) orthogonal
projection of a belief function onto the probability simplex. In both expressions
above we can note that normalization is achieved by alternatively subtracting and
summing a quantity, rather than via a ratio or, as in Equation (15.29), by reassigning
the mass of all B 6⊂ A to each B ( A on equal grounds.
We can interpret the barycenter of the set of L∞ conditional belief functions
as follows: the mass of all the subsets whose intersection with A is C ( A is re-
assigned by the conditioning process half to C, and half to A itself. In the case of
C = A itself, by normalization, all the subsets D ⊇ A including A have their whole
mass re-assigned to A, consistently with the above interpretation. The mass b(Ac )
of the subsets which have no relation with the conditioning event A is used to guar-
antee the normalization of the resulting mass distribution. As a result, the obtained
mass function is not necessarily non-negative: again, such version of geometrical
conditioning may generated pseudo belief functions.
The L1 case is also intriguing, as in that case it appears impossible to obtain a
general analytic expression, whereas in the special cases in which this is possible
the result has potentially interesting interpretations, as confirmed by the empirical
comparison of Section 15.5.2.
Generally speaking, though, Lp conditional belief functions in the belief space
seem to have rather less straightforward interpretations than the corresponding
quantities in the mass space.
Fig. 15.4. The simplex (red triangle) of L1 , M conditional belief functions associated with
the belief function with mass assignment (15.12) in Θ = {x, y, z}, with conditioning event
A = {x, y}. The related L2 , M conditional belief function is plotted as a red square, and
coincides with the center of mass of the L1 set. The set of L∞ , M conditional belief functions
is represented as the green triangle containing L2 , M. The set of L∞ , B conditional b.f.s is
drawn as a yellow rectangle, and also falls partly outside the conditioning simplex (black
triangle). The set of L1 conditional b.f.s in B is a (light blue) line segment with barycenter
in the L2 conditional b.f. (black square). In the ternary case L2 , B is the barycenter of this
rectangle. Interesting cross - relations between conditional functions in M and B seem to
emerge which are not clearly reflected by their analytical expressions computed here.
15.6 An outline of future research 485
Additional constraints may have to be imposed in order to obtain a unique result, for
instance commutativity with affine combination (or linearity, in Smets’ terminology
[1221]).
In the near future we plan to explore the world of combination rules induced
by conditioning rules, starting from the different geometrical conditional processes
introduced here.
486 15 Geometric conditioning
Appendix
Proof of Lemma 18
By definition:
X X
mb − ma = mb (B)mB − ma (B)mB .
∅(B⊆Θ ∅(B⊆A
.
The change of variables β(B) = mb (B) − ma (B) further yields:
X X
mb − ma = β(B)mB + mb (B)mB . (15.30)
∅(B⊆A B6⊂A
Observe that the variables {β(B), ∅ ( B ⊆ A} are not all independent. Indeed:
X X X
β(B) = mb (B) − ma (B) = b(A) − 1
∅(B⊆A ∅(B⊆A ∅(B⊆A
P
as ∅(B⊆A ma (B) = 1 by definition, since ma ∈ MA . As a consequence, in
optimization problem (15.4) only 2|A| − 2 variables are independent (as ∅ is not
included), while: X
β(A) = b(A) − 1 − β(B).
∅(B(A
Proof of Theorem 94
The minima of the L1 norm of the difference vector are given by the set of con-
straints:
X≤0
β(B) ∀∅ ( B ( A
β(B) ≥ b(A) − 1. (15.31)
∅(B(A
Proof of Theorem 95
It is easy to see that, by Equation (15.31), the 2|A| − 2 vertices of the simplex of L1
conditional belief functions in M (denoted by m[b]|B L1 A, where ∅ ( B ⊆ A) are
determined by the following solutions:
m[b]|AL1 A : β(X) = 0 ∀∅ ( X ( A,
β(B) = b(A) − 1,
m[b]|B
L1 A : ∀∅ ( B ( A.
β(X) = 0 ∀∅ ( X ( A, X 6= B
Proof of Theorem 96
with all zero entries but entry B (equal to 1) and entry A (equal to -1). Making use
of Equation (15.30), condition hmb − ma , mB − mA i = 0 assumes then a very
simple form X
β(B) − b(A) + 1 + β(X) = 0
∅(X(A,X6=B
where d is the number of rows (or columns) of A. It is easy to see that A−1 1 =
1 |A|
d+1 1, where in our case d = 2 − 2.
The solution to (15.32) is then, in matrix form:
1
β = A−1 1 · (b(A) − 1) = 1(b(A) − 1)
2|A| − 1
488 15 Geometric conditioning
Proof of Theorem 98
As X
X X
b(A) − 1 − β(B) =
mb (B) + β(B),
∅(B(A B6⊂A ∅(B(A
1
P
The corresponding minimal L∞ norm of the difference vector is: 2|A| −1 C6⊂A mb (C).
Proof of Theorem 99
hb − a, bC − bA i = 0 ∀∅ ( C ( A
To prove Theorem 99 we just need to replace the above expression into the system
of constraints (15.38). We obtain, for all ∅ ( C ( A:
X h c c
i Xh X
mb (B) 2|(B∪C) | − 2|(B∪A) | + − mb (X ∪ B)2−|X| +
B6⊂A B(Aih X⊆Ac i
X c c
|B| −|X|
+(−1) mb (X)2 2|(B∪C) | − 2|A | = 0.
X⊆Ac
As c c
2−|X| (2|(Y ∪C) | − 2|A | ) = 2n−|Y ∪C|−|X| − 2n−|A|−|X|
c c
= 2|[(Y ∪C)+X] | − 2|(A+X) | ,
the system further simplifies as:
X c c
mb (X + Y ) 2|[(Y ∪C)+X] | − 2|(A+X) | +
∅(X⊆Ac
∅⊆Y ⊆A
X c c
(−1)|Y | mb (X) − mb (X + Y ) 2|[(Y ∪C)+X] | − 2|(A+X) | = 0.
∅(X⊆Ac
∅(Y (A
15.6 An outline of future research 491
After separating in the first sum the contributions of Y = ∅ and Y = A, noting that
A ∪ C = A as C ⊂ A, and splitting the second one into a part which depends on
mb (X) and one which depends on mb (X + Y ), the system of constraints becomes,
again for all ∅ ( C ( A:
X X c c
X
mb (X + Y ) 2|[(Y ∪C)+X] | − 2|(A+X) | + mb (X)·
∅(X⊆A c ∅(Y (A ∅(X⊆Ac
|(X+C)c | |(X+A)c | |(X+A)c | |(X+A)c |
X
· 2 −2 + mb (X + A) 2 −2
c
∅(X⊆A
|[(Y ∪C)+X]c | c
X X
+ (−1)|Y | mb (X) 2 − 2|(A+X) |
∅(X⊆Ac ∅(Y (A
X X c c
+ −mb (X + Y ) 2|[(Y ∪C)+X] | − 2|(A+X) | = 0
∅(X⊆Ac ∅(Y (A
Y = (Y ∩ C) + (Y \ C)
where
|A\C| |A\C|
X |A \ C| |Y \C| −|Y \C| 1
(−1) 2 = −1+ = −2−|A\C|
|Y \ C| 2
|Y \C|
again by Newton’s binomial. By replacing this result in cascade into (15.40) and
(15.39) we have that the system of constraints is always met as it reduces to the
equality 0 = 0.
Proof of Lemma 19
where
15.6 An outline of future research 493
X X X
γ(A) = β(C) = mb (C) − ma (C) = b(A) − 1.
C⊆A C⊆A C⊆A
Thus, the first and the third addenda above are constant, and since
X
mb (C) = b(B) − b(B ∩ A)
C⊆B,C6⊂A
we obtain, as desired:
X X
arg min kb − akL1 = arg min γ(B ∩ A) + mb (C).
γ γ
∅(B∩A(A C⊆B,C6⊂A
Proof of Lemma 20
The variable term in (15.41) can be decomposed into collections of terms which
depend on the same individual variable γ(X):
X
max γ(A ∩ B) +
mb (C)
B:B∩A6=∅,A
C⊆B,C6⊂A
X X
= max max c γ(X) + mb (Z + W ),
∅(X(A ∅⊆Y ⊆A
∅(Z⊆Y ∅⊆W ⊆X
494 15 Geometric conditioning
since when γ ∗ (X) ≥ 0 the argument to maximize is non-negative, and its max-
imum is trivially achieved by Y = Ac . Hence, all the
X
γ ∗ (X) : γ ∗ (X) ≤ 1 − b(A) − mb (C) (15.43)
C∩Ac 6=∅,C∩A⊆X
are optimal.
2. If γ ∗ (X) < 0 the maximum in (15.42) can be achieved by either Y = Ac or
Y = ∅, and we are left with the two corresponding terms in the max:
X
∗
∗ ∗
γ (X) : max c γ (X) +
mb (C), −γ (X) ≤ 1 − b(A).
∅⊆Y ⊆A
C∩Ac 6=∅,C∩A⊆X
(15.44)
Now, either
∗ X
mb (C) ≥ −γ ∗ (X)
γ (X) +
C∩Ac 6=∅,C∩A⊆X
or viceversa. In the first case, since the argument of the absolute value has to be
non-negative:
1 X
γ ∗ (X) ≥ − mb (C).
2 c
C∩A 6=∅,C∩A⊆X
in turn trivially true for γ ∗ (X) < 0 and mb (C) ≥ 0 for all C. Therefore, all
1 X
0 ≥ γ ∗ (X) ≥ − mb (C) (15.45)
2
C∩Ac 6=∅,C∩A⊆X
i.e., γ ∗ (X) ≤ − 21
P
C∩Ac 6=∅,C∩A⊆X mb (C). Optimality is met for
−γ ∗ (X) ≤ 1 − b(A) ≡ γ ∗ (X) ≥ b(A) − 1,
which is satisfied for all
1 X
b(A) − 1 ≤ γ ∗ (X) ≤ − mb (C). (15.46)
2
C∩Ac 6=∅,C∩A⊆X
Following Lemma 20 it is not difficult to see by induction that in the original aux-
iliary variables {β(X)} the set of L∞ conditional b.f.s in B is determined by the
following constraints:
X
−K(X) + (−1)|X| mb (C) ≤ β(X)
C∩Ac 6=∅,C∩A⊆X X
≤ K(X) − mb (C)
C∩Ac 6=∅,C∩A⊆X
(15.47)
where we have defined:
X
K(X) = (2|X| − 1)(1 − b(A)) − mb (C).
C∩Ac 6=∅,∅⊆C∩A(X
The proof is by substitution. In the {β(B)} variables the thesis reads as:
1 X h i
β(C) = (−1)|C| mb (B) − mb (B ∪ C) . (15.48)
2
∅(B⊆Ac
by Newton’s binomial:
1 X h i 1 X
(−1)|C| mb (B) − mb (B ∪ C) + mb (C)
2 2
∅(C⊆X C∩Ac 6=∅,C∩A⊆X
∅(B⊆Ac
1 X X 1 X 1 X
= mb (B) (−1)|C| − mb (B ∪ C) + mb (C)
2 2 2
∅(B⊆Ac ∅(C⊆X ∅(B⊆Ac C∩Ac 6=∅
∅(C⊆X C∩A⊆X
1 X 1 X 1 X
=− mb (B) − mb (B ∪ C) + mb (C)
2 c
2 c
2
∅(B⊆A ∅(B⊆A C∩Ac 6=∅
∅(C⊆X C∩A⊆X
1 X X 1 X
=− mb (B ∪ C) + mb (C)
2 c
2 c
∅(B⊆A ∅⊆C⊆X C∩A 6=∅,C∩A⊆X
1 X 1 X
=− mb (C) + mb (C) = 0
2 c
2 c
C∩A 6=∅,C∩A⊆X C∩A 6=∅,C∩A⊆X
where Ω is the collection of all the possible outcomes ω, F is the set of possible
actions f , and the utility function is defined on F × Ω.
As we know, besides satisfying a number of sensible rationality principles, this prob-
ability transform has a nice geometric interpretation in the probability simplex as the
barycenter of the credal set of probability measures consistent with b:
$$\mathcal{P}[b] \doteq \big\{\, p\in\mathcal{P} : p(A)\ge b(A)\ \ \forall A\subseteq\Theta \,\big\}.$$
Betting and credal semantics therefore seem to be connected, at least in the case of the pignistic transform. Unfortunately, while their geometry in the belief space is well understood, a credal semantics is still lacking for most of the transforms we studied in the last part of the Book.
We address this issue here in the framework of probability intervals [1333, 320] (Section 5.3), which we briefly recall below.
A set of probability intervals or interval probability system is a system of constraints
Probability intervals have been introduced as a tool for uncertain reasoning in [320],
where combination and marginalization of intervals were studied in detail and spe-
cific constraints for such intervals to be consistent and tight were given.
A typical way in which probability intervals arise is through measurement errors, for measurements can be inherently of an interval nature (due to the finite resolution of the instruments) [642]. In such a case the probability interval of interest is the class of probability measures consistent with the measured interval.
A set of constraints of the form (16.1) determines a convex set of probabilities or 'credal set' [838]. The lower and upper probabilities (cfr. Section 5.1) determined by P(l, u) on any event A ⊆ Θ can easily be obtained from the lower and upper bounds (l, u) as follows:

$$\underline{P}(A) = \max\Big\{\sum_{x\in A} l(x),\ 1-\sum_{x\not\in A} u(x)\Big\},\qquad \overline{P}(A) = \min\Big\{\sum_{x\in A} u(x),\ 1-\sum_{x\not\in A} l(x)\Big\}. \qquad (16.2)$$
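To make Equation (16.2) concrete, the following Python sketch (not taken from the book; the frame, bounds and event are illustrative assumptions) computes the lower and upper probabilities induced by an interval probability system (l, u) on a finite frame.

def lower_prob(event, l, u, frame):
    # P_low(A) = max( sum_{x in A} l(x), 1 - sum_{x not in A} u(x) )
    return max(sum(l[x] for x in event),
               1 - sum(u[x] for x in frame if x not in event))

def upper_prob(event, l, u, frame):
    # P_up(A) = min( sum_{x in A} u(x), 1 - sum_{x not in A} l(x) )
    return min(sum(u[x] for x in event),
               1 - sum(l[x] for x in frame if x not in event))

frame = ["x", "y", "z"]
l = {"x": 0.2, "y": 0.1, "z": 0.3}    # hypothetical lower bounds l(x)
u = {"x": 0.5, "y": 0.4, "z": 0.6}    # hypothetical upper bounds u(x)
print(lower_prob({"x", "y"}, l, u, frame), upper_prob({"x", "y"}, l, u, frame))

For the event A = {x, y} this returns 0.4 and 0.7, the tighter of the two bounds in each case.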
Making decisions based on credal sets is not trivial, for the natural extensions of the classical expected utility rule lead to multiple potentially optimal decisions [1343]. Alternatively, similarly to what was done for belief functions, we can seek a single probability measure to represent the credal set associated with a set of probability intervals.
Fig. 16.1. The focus of a pair of simplices is, in non-pathological situations, the unique inter-
section of the lines joining their corresponding vertices.
Chapter content
As we show here, the credal set associated with a probability interval possesses an
interesting structure, as it can be decomposed into a pair of simplices.
Indeed, the probabilities consistent with a certain interval system (16.1) lie in the intersection of two simplices: a 'lower simplex' $T^1[b]$ determined by the lower bound $b(x)\le p(x)$, and an 'upper simplex' $T^{n-1}[b]$ determined by the upper constraint $p(x)\le pl_b(x)$:

$$T^1[b] \doteq \big\{\, p : p(x)\ge b(x)\ \forall x\in\Theta \,\big\},\qquad T^{n-1}[b] \doteq \big\{\, p : p(x)\le pl_b(x)\ \forall x\in\Theta \,\big\}.$$
Chapter outline
We start by proving that the credal set associated with a system of probability in-
tervals can be decomposed in terms of a pair of upper and lower simplices (Section
16.1). We point out that the intersection probability, although originally defined for
belief functions (Chapter 10), is closely linked to the notion of interval probabil-
ity system and can be seen as the natural representative of the associated credal set
(Section 16.2).
Drawing inspiration from the analysis of the ternary case (Section 16.3), we
prove in Section 16.4 that all the considered probability transformations (relative
belief and plausibility of singletons, intersection probability) are geometrically the
foci of different pairs of simplices, and discuss the meaning of the mapping asso-
ciated with a focus in terms of mass assignment. We prove that upper and lower
simplices can themselves be interpreted as the sets of probabilities consistent with
belief and plausibility of singletons.
The conclusions of this analysis are used in Section 16.5 to prospect alternative
decision frameworks based on the introduced credal interpretations of upper and
lower probability constraints and the associated probability transformations.
In Section 16.6 preliminary results are discussed which show that relative belief and
plausibility play an interesting role in determining the safest betting strategy in an
adversarial game scenario in which the decision maker has to minimize their max-
imal loss/maximize their minimal return, in a modified Wald approach to decision
making.
where $\mathcal{P}^i[b]$ is the set of probabilities meeting the lower probability constraint for size-i events:

$$\mathcal{P}^i[b] \doteq \big\{\, p\in\mathcal{P} : p(A)\ge b(A)\ \ \forall A: |A|=i \,\big\}.$$
Note that for i = n the constraint is trivially met by all distributions: P n [b] = P.
Lower and upper simplices A simple and elegant geometric description can be given if we consider instead the credal sets

$$T^i[b] \doteq \big\{\, p\in\mathcal{P}' : p(A)\ge b(A)\ \ \forall A: |A|=i \,\big\}$$

and the set $T^{n-1}[b]$ of pseudo-probabilities which meet the analogous constraint on events of size n − 1:

$$
\begin{aligned}
T^{n-1}[b] &\doteq \big\{\, p\in\mathcal{P}' : p(A)\ge b(A)\ \forall A: |A|=n-1 \,\big\} \\
&= \big\{\, p\in\mathcal{P}' : p(\{x\}^c)\ge b(\{x\}^c)\ \forall x\in\Theta \,\big\} \qquad (16.5)\\
&= \big\{\, p\in\mathcal{P}' : p(x)\le pl_b(x)\ \forall x\in\Theta \,\big\},
\end{aligned}
$$

i.e., the set of pseudo-probabilities which meet the upper bound for the elements x of Θ.
Dually, the upper simplex $T^{n-1}[b]$ reads as the convex closure of the vertices

$$t^{n-1}_x[b] = \sum_{y\neq x} pl_b(y)\, b_y + \Big(1-\sum_{y\neq x} pl_b(y)\Big) b_x. \qquad (16.9)$$
the total mass and plausibility of singletons, respectively. By Equation (16.7), each vertex $t^1_x[b]$ of the lower simplex is a probability that adds the mass $1-k_b$ of non-singletons to the mass of the element x, leaving all the others unchanged. As $m_{t^1_x[b]}(z)\ge 0$ for all z ∈ Θ and all x (all the $t^1_x[b]$ are actual probabilities), we have that
of the lower and upper simplices associated with its lower and upper bound constraints, where

$$T[l] = \big\{\, p\in\mathcal{P}' : p(x)\ge l(x)\ \forall x\in\Theta \,\big\},\qquad T[u] = \big\{\, p\in\mathcal{P}' : p(x)\le u(x)\ \forall x\in\Theta \,\big\}.$$
In particular, when lower and upper bounds are those enforced by a pair of belief
and plausibility measures on the singletons, l(x) = b(x) and u(x) = plb (x):
Fig. 16.2. An illustration of the notion of intersection probability for an interval probability
system (16.1).
Definition 86. The intersection probability p[(l, u)] : Θ → [0, 1] associated with
the interval probability system (16.1) is the probability measure:
p[(l, u)](x) = β[(l, u)]u(x) + (1 − β[(l, u)])l(x), (16.12)
with β[(l, u)] given by Equation (16.11).
The ratio β[(l, u)] (16.11) measures the fraction of each probability interval which
we need to add to the lower bound l(x) to obtain a valid probability function (sum-
ming to one).
It is easy to see that when (l, u) are a pair of belief/plausibility measures (b, plb ), we
obtain the intersection probability we defined for belief functions (Section 10.3). Al-
though originally defined by geometric means, the intersection probability is really
‘the’ rational probability transform for general interval probability systems.
As was the case for p[b], p[(l, u)] can also be written as

$$p[(l,u)](x) = l(x) + \Big(1-\sum_x l(x)\Big)\, R[(l,u)](x), \qquad (16.13)$$

where

$$R[(l,u)](x) \doteq \frac{u(x)-l(x)}{\sum_{y\in\Theta}\big(u(y)-l(y)\big)} = \frac{\Delta(x)}{\sum_{y\in\Theta}\Delta(y)}, \qquad (16.14)$$
Δ(x) measures the width of the probability interval for x, and R[(l, u)] : Θ → [0, 1] measures how much the uncertainty on the probability value of each singleton 'weighs' on the total width of the interval system (16.1). We will therefore call it the relative uncertainty on singletons. We can then say that p[(l, u)] distributes the mass (1 − Σ_x l(x)), which is necessary to obtain a valid probability, to each singleton x ∈ Θ according to the relative uncertainty R[(l, u)](x) it carries in the given interval.
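As a small illustration of Equations (16.13)–(16.14), the Python sketch below (the bounds are illustrative assumptions, not values from the book) redistributes the missing mass 1 − Σ_x l(x) according to the relative uncertainty of each singleton.

def intersection_probability(l, u):
    delta = {x: u[x] - l[x] for x in l}                  # interval widths Delta(x)
    missing = 1.0 - sum(l.values())                      # mass needed to reach a total of one
    R = {x: delta[x] / sum(delta.values()) for x in l}   # relative uncertainty (16.14)
    return {x: l[x] + missing * R[x] for x in l}         # Equation (16.13)

l = {"x": 0.2, "y": 0.1, "z": 0.3}
u = {"x": 0.5, "y": 0.4, "z": 0.6}
p = intersection_probability(l, u)
print(p, sum(p.values()))   # the values sum to 1, as required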
Proposition 49. Given a belief function b : 2Θ → [0, 1], the simplex P[b] of the
probability measures consistent with b is the polytope:
where ρ is any permutation $\{x_{\rho(1)}, ..., x_{\rho(n)}\}$ of the singletons of Θ, and the vertex $p^\rho[b]$ is the Bayesian b.f. such that

$$p^\rho[b](x_{\rho(i)}) = \sum_{A\ni x_{\rho(i)},\ A\not\ni x_{\rho(j)}\ \forall j<i} m_b(A). \qquad (16.16)$$
ρ1 = (x, y, z):  p^{ρ1}[b](x) = .4, p^{ρ1}[b](y) = .3, p^{ρ1}[b](z) = .3;
ρ2 = (x, z, y):  p^{ρ2}[b](x) = .4, p^{ρ2}[b](y) = .1, p^{ρ2}[b](z) = .5;
ρ3 = (y, x, z):  p^{ρ3}[b](x) = .2, p^{ρ3}[b](y) = .5, p^{ρ3}[b](z) = .3;   (16.17)
ρ4 = (z, x, y):  p^{ρ4}[b](x) = .3, p^{ρ4}[b](y) = .1, p^{ρ4}[b](z) = .6;
ρ5 = (z, y, x):  p^{ρ5}[b](x) = .2, p^{ρ5}[b](y) = .2, p^{ρ5}[b](z) = .6;
(as the permutations (y, x, z) and (y, z, x) yield the same probability distribution).
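The following Python sketch reproduces the mechanism of Proposition 49/(16.16): for each permutation ρ, the vertex assigns to x_ρ(i) the mass of every focal element containing x_ρ(i) but none of the earlier elements. The mass assignment below is an assumption chosen for illustration only (it is consistent with the vertex values in (16.17), but is not quoted from the book).

from itertools import permutations

masses = {frozenset("x"): 0.2, frozenset("y"): 0.1, frozenset("z"): 0.3,
          frozenset("xy"): 0.1, frozenset("yz"): 0.2, frozenset("xyz"): 0.1}

def vertex(rho, masses):
    # p_rho[b](x_rho(i)) = sum of m_b(A) over A containing x_rho(i) and no earlier element
    p = {}
    for i, x in enumerate(rho):
        earlier = set(rho[:i])
        p[x] = sum(m for A, m in masses.items() if x in A and not (A & earlier))
    return p

for rho in permutations("xyz"):
    print(rho, vertex(rho, masses))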
We can notice a number of interesting facts:
1. P[b] (the polygon delimited by the red squares) is the intersection of the two
triangles (2-dimensional simplices) T 1 [b] and T 2 [b];
2. the relative belief of singletons,
$$\tilde{b}(x) = \frac{.2}{.6} = \frac{1}{3},\qquad \tilde{b}(y) = \frac{.1}{.6} = \frac{1}{6},\qquad \tilde{b}(z) = \frac{.3}{.6} = \frac{1}{2},$$
is the intersection of the lines joining the corresponding vertices of the probability simplex P and the lower simplex T 1 [b];
3. the relative plausibility of singletons,
is the intersection of the lines joining the corresponding vertices of P and upper
simplex T 2 [b];
Fig. 16.3. The simplex of probabilities consistent with the belief function (16.15) defined on
{x, y, z} is shown. Its vertices (red squares) are given by (16.17). Intersection probability, rel-
ative belief and plausibility of singletons are the foci of the pairs of simplices {T 1 [b], T 2 [b]},
{T 1 [b], P} and {P, T 2 [b]}, respectively. In the ternary case T 1 [b] and T 2 [b] are normal tri-
angles. Geometrically, their focus is the intersection of the lines joining their corresponding
vertices (dashed lines for {T 1 [b], P},{P, T 2 [b]}; solid lines for {T 1 [b], T 2 [b]}).
4. the intersection probability p[b] is the unique intersection of the lines joining the corresponding vertices of the upper simplex T 2 [b] and the lower simplex T 1 [b].
Point 1. can be explained by noticing that in the ternary case, by Equation (16.3),
P[b] = T 1 [b] ∩ T 2 [b].
Although Figure 16.3 suggests that $\tilde{b}$, $\widetilde{pl}_b$ and p[b] might be consistent with b, this is a mere artifact of the ternary case, for we proved in Theorem 56 that neither the relative belief of singletons nor the relative plausibility of singletons necessarily belongs to the credal set P[b].
Indeed, the point of this Chapter is that these epistemic transforms $\tilde{b}$, $\widetilde{pl}_b$, p[b] are consistent with the interval probability P[b, pl_b] associated with b:

$$\tilde{b},\ \widetilde{pl}_b,\ p[b]\ \in\ \mathcal{P}[b, pl_b] = T^1[b]\cap T^{n-1}[b].$$
Their geometric behavior as described by points 2., 3. and 4. holds in the general
case, as we will see in Section 16.4.
Such a point always exists. As a matter of fact, condition (16.18) can be written as

$$\sum_{i=1}^n \alpha_i\,(s_i - t_i) = 0.$$
Proof. We just need to replace mb (x) with plb (x) in the proof of Theorem 105.
It is interesting to note that the affine coordinate of both belief and plausibility of
singletons as foci on the respective intersecting lines (16.19) has a meaning in terms
of degrees of belief.
Theorem 107. The affine coordinate of $\tilde{b}$ as the focus of $\{\mathcal{P}, T^1[b]\}$ on the corresponding intersecting lines is the reciprocal $\frac{1}{k_b}$ of the total mass of singletons.
Theorem 108. The affine coordinate of $\widetilde{pl}_b$ as the focus of $\{\mathcal{P}, T^{n-1}[b]\}$ on the corresponding intersecting lines is the reciprocal $\frac{1}{k_{pl_b}}$ of the total plausibility of singletons.
Theorem 110. The coordinate of the intersection probability as focus of the pair
{T 1 [b], T n−1 [b]} on the corresponding intersecting lines coincides with the ratio
β[b] (10.10).
The fraction α = β[b] of the width of the probability interval that generates the
intersection probability can be read in the probability simplex as its coordinate on
any of the lines determining the focus of {T 1 [b], T n−1 [b]}.
It is quite straightforward to notice that the geometric notion of a focus turns out to possess a simple semantics in terms of probability constraints. Selecting the focus of two simplices representing two different constraints (i.e., the point with the same convex coordinates in the two simplices) means adopting the single probability distribution which meets both constraints in exactly the same way.
If we assume homogeneous behavior in the two sets of constraints {p(x) ≥
b(x) ∀x}, {p(x) ≤ plb (x) ∀x} as a rationality principle for the probability transfor-
mation of an interval probability system, then the intersection probability necessar-
ily follows as the unique solution to the problem.
Clearly the focus is the (unique) fixed point of this transformation: FS,T (f (S, T )) =
f (S, T ). Each Bayesian transformation in 1-1 correspondence with a pair of sim-
plices (relative plausibility, relative belief, and intersection probability) determines
therefore a mapping of probabilities to probabilities.
The mapping (16.20) induced by the relative belief of singletons is actually quite interesting. Any probability distribution $p = \sum_x p(x)\, b_x$ is mapped by $F_{\mathcal{P},T^1[b]}$ to the probability distribution

$$
\begin{aligned}
F_{\mathcal{P},T^1[b]}(p) &= \sum_x p(x)\, t^1_x[b] = \sum_{x\in\Theta} p(x)\Big[\sum_{y\neq x} m_b(y)\, b_y + \Big(1-\sum_{y\neq x} m_b(y)\Big) b_x\Big] \\
&= \sum_{x\in\Theta} b_x\Big[\Big(1-\sum_{y\neq x} m_b(y)\Big) p(x) + m_b(x)\big(1-p(x)\big)\Big] \qquad (16.21)\\
&= \sum_{x\in\Theta} b_x\Big[p(x) - p(x)\sum_{y\in\Theta} m_b(y) + m_b(x)\Big] = \sum_{x\in\Theta} b_x\Big[m_b(x) + p(x)(1-k_b)\Big],
\end{aligned}
$$

the probability obtained by adding to the belief value of each singleton x a fraction p(x) of the mass (1 − k_b) of non-singletons. In particular, (16.21) maps the relative uncertainty of singletons R[b] to the intersection probability p[b]:

$$F_{\mathcal{P},T^1[b]}(R[b]) = \sum_{x\in\Theta} b_x\Big[m_b(x) + R[b](x)(1-k_b)\Big] = \sum_{x\in\Theta} b_x\, p[b](x) = p[b].$$
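A quick numerical check of the mapping (16.21), and of the fact that it sends R[b] to p[b], is sketched below in Python; the singleton masses and plausibilities are illustrative assumptions.

def F_lower(p, m):
    # F_{P,T^1[b]}(p)(x) = m_b(x) + p(x) * (1 - k_b), as in (16.21)
    k_b = sum(m.values())
    return {x: m[x] + p[x] * (1 - k_b) for x in m}

def relative_uncertainty(m, pl):
    # R[b](x) = (pl_b(x) - m_b(x)) / (k_pl - k_b)
    k_b, k_pl = sum(m.values()), sum(pl.values())
    return {x: (pl[x] - m[x]) / (k_pl - k_b) for x in m}

m  = {"x": 0.2, "y": 0.1, "z": 0.3}    # belief of singletons (hypothetical)
pl = {"x": 0.4, "y": 0.5, "z": 0.6}    # plausibility of singletons (hypothetical)
print(F_lower(relative_uncertainty(m, pl), m))   # coincides with p[b], and sums to 1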
Relative belief and plausibility are then the foci associated with lower T 1 [b] and
upper T n−1 [b] simplices, the incarnations of lower and upper constraints on single-
tons. We can close the circle opened by the analogy with the pignistic transforma-
tion by showing that those two simplices can in fact also be interpreted as the sets of
probabilities consistent with the plausibility (11.14) and belief (11.16) of singletons,
respectively (cfr. Chapter 11).
Indeed, the set of pseudo-probabilities consistent with a pseudo belief function ς can be defined as

$$\mathcal{P}[\varsigma] \doteq \big\{\, p\in\mathcal{P}' : p(A)\ge \varsigma(A)\ \forall A\subseteq\Theta \,\big\},$$
just as we did for ‘standard’ belief functions. We can then prove the following result.
Theorem 111. The simplex $T^1[b] = \mathcal{P}^1[b]$ of the lower probability constraint for singletons (16.4) is the set of probabilities consistent with the belief of singletons $\bar{b}$:
$$T^1[b] = \mathcal{P}[\bar{b}].$$
The simplex $T^{n-1}[b]$ of the upper probability constraint for singletons (16.5) is the set of pseudo-probabilities consistent with the plausibility of singletons $\overline{pl}_b$:
$$T^{n-1}[b] = \mathcal{P}[\overline{pl}_b].$$
Proof. As the pignistic function is the center of mass of the simplex of consistent probabilities, and the upper and lower simplices are the sets of probabilities consistent with $\bar{b}$ and $\overline{pl}_b$, respectively (by Theorem 111), the thesis follows.
Another corollary stems from the fact that the pignistic function and affine combination commute:
$$BetP[\alpha_1 b_1 + \alpha_2 b_2] = \alpha_1\, BetP[b_1] + \alpha_2\, BetP[b_2]$$
whenever α1 + α2 = 1.
Corollary 23. The intersection probability is the convex combination of the barycen-
ters of the lower and upper simplices, with coefficient (10.10):
have a common denominator, in the sense that they can all be linked to different
(credal) sets of probabilities, in this way extending the classical interpretation of the
pignistic transformation as barycenter of the polygon of consistent probabilities.
As P[b] is the credal set associated with a belief function b, the upper and lower simplices geometrically embody the probability interval associated with b:

$$\mathcal{P}[b, pl_b] = \big\{\, p\in\mathcal{P} : b(x)\le p(x)\le pl_b(x)\ \ \forall x\in\Theta \,\big\}.$$
By applying the notion of focus to all the possible pairs of simplices in the triad
{P, T 1 [b], T n−1 [b]} we obtain in turn all the different Bayesian transformations
considered here:
$$
\begin{aligned}
\{\mathcal{P}, T^1[b]\}:&\quad f(\mathcal{P}, T^1[b]) = \tilde{b},\\
\{\mathcal{P}, T^{n-1}[b]\}:&\quad f(\mathcal{P}, T^{n-1}[b]) = \widetilde{pl}_b, \qquad (16.23)\\
\{T^1[b], T^{n-1}[b]\}:&\quad f(T^1[b], T^{n-1}[b]) = p[b].
\end{aligned}
$$
Their coordinates as foci encode major features of the underlying belief function:
the total mass it assigns to singletons, their total plausibility, and the fraction β of
the related probability interval which yields the intersection probability.
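For reference, the three transforms listed in (16.23) can be computed directly from the singleton masses and plausibilities, as in the following Python sketch (numerical values are again illustrative assumptions).

def transforms(m, pl):
    k_b, k_pl = sum(m.values()), sum(pl.values())
    beta = (1 - k_b) / (k_pl - k_b)                        # ratio beta[b] (10.10)
    rel_belief       = {x: m[x] / k_b for x in m}          # focus of {P, T^1[b]}
    rel_plausibility = {x: pl[x] / k_pl for x in pl}       # focus of {P, T^{n-1}[b]}
    intersection     = {x: m[x] + beta * (pl[x] - m[x]) for x in m}   # focus of {T^1[b], T^{n-1}[b]}
    return rel_belief, rel_plausibility, intersection

print(transforms({"x": 0.2, "y": 0.1, "z": 0.3},
                 {"x": 0.4, "y": 0.5, "z": 0.6}))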
The credal interpretation of upper, lower and interval probability constraints on singletons lays, in perspective, the foundations for the formulation of TBM-like frameworks for such systems.
We can think of the TBM as a pair {P[b], BetP[b]} formed by a credal set linked to each belief function b (in this case the polytope of consistent probabilities) and a probability transformation (the pignistic function). As the barycenter of a simplex is a special case of focus, the pignistic transformation is just another probability transformation induced by the focus of two simplices.
The results of this Chapter therefore suggest similar frameworks:

$$\Big\{\{\mathcal{P}, T^1[b]\},\ \tilde{b}\Big\},\qquad \Big\{\{\mathcal{P}, T^{n-1}[b]\},\ \widetilde{pl}_b\Big\},\qquad \Big\{\{T^1[b], T^{n-1}[b]\},\ p[b]\Big\},$$
Consider instead the following game theory scenario, inspired by Strat’s expected
utility approach to decision making with belief functions [1303, 1121] (Section
4.5.1).
In a country fair, people are asked to bet on one of the possible outcomes of a
spinning carnival wheel. Suppose the outcomes are {♣, ♦, ♥, ♠}, and that they each
have the same utility (return) to the player. This is equivalent to a lottery (probability
distribution), in which each outcome has a probability proportional to the area of
the corresponding sectors on the wheel. However, the fair manager decides to make
the game more interesting by covering part of the wheel. Players are still asked
to bet on a single outcome, knowing that the manager is allowed to rearrange the
hidden sector of the wheel as he pleases (see Figure 16.5). Clearly, this situation
Fig. 16.5. The modified carnival wheel, in which part of the spinning wheel is cloaked.
can be described by a belief function, in particular one in which the fraction of area
associated with the hidden sector is assigned as mass to the whole decision space
{♣, ♦, ♥, ♠}. If additional (partial) information is provided, for instance that ♦
cannot appear in the hidden sector, different belief functions must be chosen instead.
Regardless of the particular belief function b (set of probabilities) at hand, the rule
allowing the manager to pick an arbitrary distribution of outcomes in the hidden
section mathematically translates into allowing him/her to choose any probability
distribution p ∈ P[b] consistent with b in order to damage the player. Supposing the
aim of the player is to maximize their minimal chance of winning the bet, which
outcome (singleton) should they pick?
Hence $x_{\mathrm{maximin}} \doteq \arg\max_{x\in\Theta} b(x)$ is the outcome which maximizes such minimal support. In the example of Figure 16.5, as ♣ is the outcome which occupies the largest share of the visible part of the wheel, the safest bet (the one which guarantees the maximal chance in the worst case) is indeed ♣. In a more formal language, ♣ is the singleton with the largest belief value. Now, if we normalize to compute the r.b.s., this outcome is obviously conserved:

$$\arg\max_{x\in\Theta}\tilde{b}(x) = \arg\max_{x\in\Theta}\frac{b(x)}{k_b} = \arg\max_{x\in\Theta} b(x) = x_{\mathrm{maximin}}.$$
While in classical utility theory the decision maker has to select the best ‘lottery’
(probability distribution) in order to maximize the expected utility, here the ‘lottery’
is chosen by their opponent (given the available partial evidence), and the decision
maker is left with betting on the safest strategy (element of Θ).
Relative belief and plausibility of singletons play then a crucial role in determining
the safest betting strategy in an adversarial scenario in which the decision maker has
to minimize their maximal loss/maximize their minimal return.
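A toy version of the carnival-wheel decision can be sketched in Python as follows; the sector fractions are hypothetical, and the cloaked fraction plays the role of the mass assigned to the whole frame.

visible = {"clubs": 0.30, "diamonds": 0.15, "hearts": 0.20, "spades": 0.10}
hidden = 1 - sum(visible.values())           # mass of the cloaked sector, assigned to the whole frame

# For a singleton x, Bel({x}) is its visible share; Pl({x}) is its visible share plus the hidden mass.
maximin_bet = max(visible, key=visible.get)  # the outcome with the largest belief value
print(maximin_bet, "with hidden mass", round(hidden, 2))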
Appendix: proofs
Proof of Theorem 103
Proof. Let us suppose, against the thesis, that there exists an affine decomposition of one of the points, say $t^1_x[b]$, in terms of the others:

$$t^1_x[b] = \sum_{z\neq x}\alpha_z\, t^1_z[b],\qquad \alpha_z\ge 0\ \ \forall z\neq x,\qquad \sum_{z\neq x}\alpha_z = 1.$$

This is the case if and only if $\sum_{z\neq x}\alpha_z\, b_z = b_x$. But this is impossible, as the categorical probabilities $b_x$ are trivially affinely independent.
Proof of Theorem 103. Let us detail the proof for $T^1[b]$. We need to show that:
1. all the points which belong to $Cl(t^1_x[b], x\in\Theta)$ satisfy $p(x)\ge m_b(x)$ too;
2. all the points which do not belong to the above polytope do not meet the constraint either.
Concerning item 1., as

$$t^1_x[b](y) = \begin{cases} m_b(y) & x\neq y,\\[4pt] 1-\displaystyle\sum_{z\neq y} m_b(z) = m_b(y) + 1 - k_b & x = y,\end{cases}$$
We need to prove that $\tilde{b}$ has the same simplicial coordinates in $\mathcal{P}$ and $T^1[b]$. By definition (4.18), $\tilde{b}$ can be expressed in terms of the vertices of the probability simplex $\mathcal{P}$ as

$$\tilde{b} = \sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, b_x.$$

We then need to prove that $\tilde{b}$ can be written as the same affine combination

$$\tilde{b} = \sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, t^1_x[b]$$
In the case of the pair $\{\mathcal{P}, T^1[b]\}$ we can compute the (affine) line coordinate α of $\tilde{b} = f(\mathcal{P}, T^1[b])$ by imposing condition (16.24). The latter assumes the following form (with $s_i = b_x$, $t_i = t^1_x[b]$):

$$
\begin{aligned}
\sum_{x\in\Theta}\frac{m_b(x)}{k_b}\, b_x &= t^1_x[b] + \alpha\big(b_x - t^1_x[b]\big) = (1-\alpha)\, t^1_x[b] + \alpha\, b_x \\
&= (1-\alpha)\Big[\sum_{y\neq x} m_b(y)\, b_y + \big(1-k_b+m_b(x)\big) b_x\Big] + \alpha\, b_x \\
&= b_x\Big[(1-\alpha)\big(1-k_b+m_b(x)\big) + \alpha\Big] + \sum_{y\neq x} m_b(y)(1-\alpha)\, b_y,
\end{aligned}
$$

and for $1-\alpha = \frac{1}{k_b}$, i.e., $\alpha = \frac{k_b-1}{k_b}$, the condition is met.
Again, we can compute the line coordinate α of $\widetilde{pl}_b = f(\mathcal{P}, T^{n-1}[b])$ by imposing condition (16.24). The latter assumes the form (with $s_i = b_x$, $t_i = t^{n-1}_x[b]$):

$$
\begin{aligned}
\sum_{x\in\Theta}\frac{pl_b(x)}{k_{pl_b}}\, b_x &= t^{n-1}_x[b] + \alpha\big(b_x - t^{n-1}_x[b]\big) = (1-\alpha)\, t^{n-1}_x[b] + \alpha\, b_x \\
&= (1-\alpha)\Big[\sum_{y\neq x} pl_b(y)\, b_y + \big(1-k_{pl_b}+pl_b(x)\big) b_x\Big] + \alpha\, b_x \\
&= b_x\Big[(1-\alpha)\big(1-k_{pl_b}+pl_b(x)\big) + \alpha\Big] + \sum_{y\neq x} pl_b(y)(1-\alpha)\, b_y.
\end{aligned}
$$

For $1-\alpha = \frac{1}{k_{pl_b}}$, i.e., $\alpha = \frac{k_{pl_b}-1}{k_{pl_b}}$, the condition is met.
Again, we need to impose condition (16.24) on the pair $\{T^1[b], T^{n-1}[b]\}$, or

$$p[b] = t^1_x[b] + \alpha\big(t^{n-1}_x[b] - t^1_x[b]\big) = (1-\alpha)\, t^1_x[b] + \alpha\, t^{n-1}_x[b]$$

for all the elements x ∈ Θ of the frame, α being some constant. This is equivalent to (after replacing the expressions (16.7), (16.9) of $t^1_x[b]$ and $t^{n-1}_x[b]$):

$$
\begin{aligned}
\sum_{x\in\Theta} b_x\Big[m_b(x) + \beta[b]\big(pl_b(x) - m_b(x)\big)\Big]
&= (1-\alpha)\Big[\sum_{y\in\Theta} m_b(y)\, b_y + (1-k_b)\, b_x\Big] + \alpha\Big[\sum_{y\in\Theta} pl_b(y)\, b_y + (1-k_{pl_b})\, b_x\Big] \\
&= b_x\Big[(1-\alpha)(1-k_b) + (1-\alpha)\, m_b(x) + \alpha\, pl_b(x) + \alpha(1-k_{pl_b})\Big] + \sum_{y\neq x} b_y\Big[(1-\alpha)\, m_b(y) + \alpha\, pl_b(y)\Big] \\
&= b_x\Big\{(1-k_b) + m_b(x) + \alpha\Big[pl_b(x) + (1-k_{pl_b}) - m_b(x) - (1-k_b)\Big]\Big\} + \sum_{y\neq x} b_y\Big[m_b(y) + \alpha\big(pl_b(y) - m_b(y)\big)\Big].
\end{aligned}
$$

If we set $\alpha = \beta[b] = \frac{1-k_b}{k_{pl_b}-k_b}$ we get for the coefficient of $b_x$ in the above expression (i.e., the probability value of x):

$$
\frac{1-k_b}{k_{pl_b}-k_b}\Big[pl_b(x) + (1-k_{pl_b}) - m_b(x) - (1-k_b)\Big] + (1-k_b) + m_b(x)
= \beta[b]\big[pl_b(x) - m_b(x)\big] + (1-k_b) + m_b(x) - (1-k_b) = p[b](x).
$$
For each belief function b, the vertices of the consistent polytope P[b] are generated
by a permutation ρ of the elements of Θ (16.16). This is true for the b.f. b̄ too, i.e.,
the vertices of P[b̄] are also generated by permutations of singletons.
In this case, however:
– given such a permutation ρ = (xρ(1) , ..., xρ(n) ) the mass of Θ (the only non-
singleton focal element of b̄) is assigned according to the mechanism of Propo-
sition 49 to xρ(1) , while all the other elements receive only their original mass
mb (xρ(j) ), j > 1;
– therefore all the permutations ρ putting the same element in the first place yield
the same vertex of P[b̄];
– hence there are just n such vertices, one for each choice of the first element
xρ(1) = x;
– but this vertex, a probability distribution, has mass values (simplicial coordinates in P)
$$m(x) = m_b(x) + (1-k_b),\qquad m(y) = m_b(y)\quad \forall y\neq x,$$
as (1 − kb ) is the mass b̄ assigns to Θ;
– the latter clearly corresponds to t1x [b] (16.7).
A similar proof holds for the case of $\overline{pl}_b$, as Proposition 1 remains valid for pseudo b.f.s too.
Proof of Corollary 23
By Equation (11.17), the intersection probability p[b] lies on the line joining $\overline{pl}_b$ and $\bar{b}$, with coordinate β[b]:

$$p[b] = \beta[b]\,\overline{pl}_b + (1-\beta[b])\,\bar{b}.$$

The thesis then follows by Corollary 22.
Part V
As we have seen in this Book, the theory of belief functions is a modeling lan-
guage for representing and combining elementary items of evidence, which do not
necessarily come in the form of sharp statements, with the goal of maintaining a
mathematical representation of our beliefs about those aspects of the world which
we are unable to predict with reasonable certainty.
While arguably a more appropriate mathematical description of uncertainty than classical probability theory, the theory of evidence is relatively simple to implement, and it does not require us to abandon the notion of an event, as is instead the case with, for instance, Walley's imprecise probability theory. It is grounded in the beautiful mathematics
of random sets, which constitute the natural continuous extension of belief func-
tions, and exhibits strong relationships with many other theories of uncertainty. As
mathematical objects, belief functions have interesting properties in terms of their
geometry, algebra, and combinatorics. This Book was, in particular, dedicated to
the geometric approach to belief and other uncertainty measures proposed by the
Author.
Despite initial objections about the computational complexity of a naive implementation of the theory of evidence, evidential reasoning can actually be implemented
on large sample spaces and in situations involving the combination of numerous
pieces of evidence. Elementary items of evidence often induce simple belief func-
tions, which can be combined very efficiently with complexity O(n + 1). We do not
need to assign mass to all subsets, but we need to be allowed to do so when neces-
sary (e.g. in case of missing data) – this directly implies a random set description.
Most relevantly, the most plausible hypothesis can be found without computing the
whole combined belief function. At any rate, Monte-Carlo approximations can be
easily implemented when the explicit result of the combination is required. Last but
not least, local propagation schemes allow for the parallelisation of belief function
reasoning just as it happens with Bayesian networks.
As we saw in Chapter 4, statistical evidence can be represented in belief theory
in several ways:
– by likelihood-based belief functions, in a way that generalises both likelihood-
based and Bayesian inference;
– via Dempster’s inference approach, which makes use of auxiliary variables;
– in the framework of the Generalised Bayesian Theorem proposed by Smets.
Decision-making strategies based on intervals of expected utilities can be formulated, which produce decisions that are more cautious than traditional ones, and are able to explain the empirical aversion to second-order uncertainty highlighted in Ellsberg's paradox.
The extension of the theory, originally formulated for finite sample spaces, to continuous domains can be tackled via the Borel interval representation initially brought forward by Strat and Smets, in case the analysis is restricted to intervals of real values. In the more general case of arbitrary subsets of the real domain, the theory of random sets is the natural mathematical framework to adopt.
An array of estimation, classification and regression tools based on the theory of belief functions is already available, and more can be envisaged.
Open issues
As we have had the chance to appreciate, a number of important issues remain open.
For instance, the correct epistemic interpretation of belief function theory should
be clarified once and for all: we argue here that belief measures should be seen as
random variables for set-valued observations (recall the random die example of
Chapter 1).
What is the most appropriate mechanism for evidence combination is also still de-
bated. The reason is that the choice seems to depend on meta-information on the
reliability and independence of the sources involved which is hardly accessible. As
we argue here (and we have hinted at in Chapter 4), working with intervals of belief
functions may be the way forward, as this acknowledges the meta-uncertainty on
the nature of the sources generating the evidence.
The same holds for conditioning, as we showed.
Finally, the theory of belief functions on Borel intervals of the real line is rather
elegant, but if we want to achieve full generality the way forward is grounding the
theory into the mathematics of random sets.
A research programme
We think it appropriate, then, to conclude this Book by outlining what in our view is the research agenda for the future development of random set and belief function theory. For obvious reasons we will only touch upon a few of the most interesting developments,
without being able to go beyond a certain level of detail. However, we hope this will
stimulate the reader to pursue some of the research directions and contribute to the
further development of the theory in the near future.
Although random set theory as a mathematical formalism is quite well devel-
oped, thanks in particular to the work of Ilya Molchanov [], a theory of statistical
inference with random sets is not yet in sight.
In Section 17.1 of this final Chapter we briefly touch upon, in particular, the following points:
– the notion of generalised lower and upper likelihoods (Section 17.1.1), to go
beyond inference with belief functions which takes classical likelihood at face
value;
– the formulation of a framework for logistic regression with belief functions,
which makes use of these generalised lower and upper likelihoods (Section
17.1.2);
– fiducial inference with belief functions is also possible (Section 17.1.3), as pro-
posed by .. and Gong [].
– the generalisation of the classical total probability theorem for random sets
(17.1.4), starting with belief functions [?];
– the generalisation of classical limit theorems (central limit theorem, law of large
numbers) to the case of random sets (17.1.5): this allows us, for instance, a rigor-
ous definition of Gaussian random sets and belief functions (17.1.5);
– the introduction of parametric models based on random sets (Section 17.1.6)
will allow us to perform robust hypothesis testing (Section 17.1.6), thus laying
the foundations for a theory of frequentist inference with random sets (Section
17.1.6);
– the development of a theory of random variables and processes in which the un-
derlying probability space is replaced by a random set space (Section 17.1.7):
in particular, this requires the generalisation of the notion of Radon-Nikodym
derivative to belief measures (17.1.7).
The geometric approach to uncertainty is also open to a number of further de-
velopments (Section 17.2), including:
– the geometry of combination rules other than Dempster’s (Section 17.2.1), and
the associated conditioning operators (17.2.2);
– the possibility of conducting inference in a geometric fashion, by finding a com-
mon representation for both belief measures and the data that drives the inference
(17.2.3);
– the geometry of continuous extension of belief functions needs to be explored
(Section 17.2.4): starting from the geometry of belief functions on Borel intervals
to later tackle the general random set representation;
– in Chapters ?? we provided a first extension to possibility theory; a geometric analysis of other uncertainty measures is in order (Section 17.2.5), including major ones such as capacities and gambles;
Belief likelihood function of repeated trials What can we say about the belief
likelihood function of a series of trials? Note that this is defined on arbitrary subsets
A of X1 × · · · × Xn , where Xi denotes the space of quantities that can be observed
at time i. A series of sharp observations is then a tuple x = (x1 , ..., xn ) ∈ X1 ×
· · · × Xn .
Definition 88. The value of the belief likelihood function on an arbitrary subset A of $X_1\times\cdots\times X_n$ is

$$Bel_{X_1\times\cdots\times X_n}(A|\theta) \doteq \Big[Bel_{X_1}^{\uparrow\times_i X_i}\ \circledast\ \cdots\ \circledast\ Bel_{X_n}^{\uparrow\times_i X_i}\Big](A|\theta), \qquad (17.1)$$

where $Bel_{X_j}^{\uparrow\times_i X_i}$ is the vacuous extension of $Bel_{X_j}$ to the Cartesian product $X_1\times\cdots\times X_n$ where the observed tuples live, and $\circledast$ is an arbitrary combination rule.
Can we express a belief likelihood value (17.1) as a function of the belief values of the individual trials? The answer is yes: if we merely wish to compute likelihood values of tuples of individual outcomes $x = (x_1, ..., x_n)\in X_1\times\cdots\times X_n$, rather than sets of outcomes, the following decomposition holds.
Theorem 112. When using either the conjunctive rule ∩ or Dempster's rule ⊕ as the combination rule in the definition of the belief likelihood function, the following decomposition holds:

$$Bel_{X_1\times\cdots\times X_n}(\{(x_1,...,x_n)\}|\theta) = \prod_{i=1}^n Bel_{X_i}(x_i),\qquad Pl_{X_1\times\cdots\times X_n}(\{(x_1,...,x_n)\}|\theta) = \prod_{i=1}^n Pl_{X_i}(x_i). \qquad (17.2)$$
Proof.
Definition 89. We call the quantities

$$\underline{L}(\mathbf{x} = \{x_1,...,x_n\}) \doteq Bel_{X_1\times\cdots\times X_n}(\{(x_1,...,x_n)\}|\theta),\qquad \overline{L}(\mathbf{x} = \{x_1,...,x_n\}) \doteq Pl_{X_1\times\cdots\times X_n}(\{(x_1,...,x_n)\}|\theta) \qquad (17.3)$$

the lower likelihood and the upper likelihood, respectively, of the sample x.
Bernoulli trials example Consider once again the Bernoulli trials example, in which the single-outcome space is binary: $X_i = X = \{H, T\}$. We know that, under the assumptions of conditional independence and equidistribution, the traditional likelihood for a series of Bernoulli trials reads as $p^k(1-p)^{n-k}$, where p = P(H), k is the number of successes and n the total number of trials.
Let us then compute the belief likelihood function for a series of Bernoulli trials, under the similar assumption that the belief functions $Bel_{X_i} = Bel_X$, i = 1, ..., n, coincide, with $Bel_X$ parameterised by p = m(H), q = m(T) (with p + q ≤ 1 this time). We seek the belief function on X = {H, T} which best describes the observed sample, i.e., the optimal values of the two parameters p and q.
Under this equidistribution assumption, applying Theorem 112 yields the following expressions for the lower and upper likelihoods of the sample x = {x1, ..., xn}, respectively:
$$\underline{L}(\{x_1,...,x_n\}) = Bel_X(\{x_1\})\cdots Bel_X(\{x_n\}) = p^k q^{n-k},\qquad \overline{L}(\{x_1,...,x_n\}) = Pl_X(\{x_1\})\cdots Pl_X(\{x_n\}) = (1-q)^k(1-p)^{n-k}. \qquad (17.4)$$
After normalisation, these can be seen as probability distribution functions (PDFs)
over the (belief) space B of all belief functions definable on X (compare Chapter 6).
Fig. 17.1. Lower (left) and upper (right) likelihood functions plotted over the space of belief
functions defined on the frame X = {H, T }, parameterised by p = m(H) (X axis) and
q = m(T ) (Y axis), for the case of k = 6 successes over n = 10 trials.
Figure 17.1 plots both the lower and upper likelihoods (17.4) for the case of k = 6 successes over n = 10 trials.
Note that the lower likelihood (left) subsumes the traditional likelihood $p^k(1-p)^{n-k}$ as its section for p + q = 1. Indeed, the maximum of the lower likelihood is the traditional ML estimate p = k/n, q = 1 − p. This makes sense, for the lower likelihood is highest for the most committed belief functions (i.e., for probability measures).
The upper likelihood (right) has a unique maximum in p = q = 0: this is the vacuous belief function on {H, T}, with m({H, T}) = 1.
The interval of belief functions joining $\max\underline{L}$ with $\max\overline{L}$ is the set of belief functions such that $p/q = k/(n-k)$, i.e., those which preserve the ratio between the observed empirical counts. Once again, the maths leads us to think in terms of intervals of belief functions, rather than individual ones.
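The behaviour of the two likelihoods in (17.4) can be checked numerically with the short Python sketch below (a grid search over the admissible pairs p + q ≤ 1; NumPy is assumed available).

import numpy as np

k, n = 6, 10
lower_L = lambda p, q: p**k * q**(n - k)              # Bel-based likelihood in (17.4)
upper_L = lambda p, q: (1 - q)**k * (1 - p)**(n - k)  # Pl-based likelihood in (17.4)

best_low, best_up = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
for p in np.linspace(0, 1, 101):
    for q in np.linspace(0, 1, 101):
        if p + q > 1:
            continue
        best_low = max(best_low, (lower_L(p, q), p, q))
        best_up = max(best_up, (upper_L(p, q), p, q))

print("argmax of lower likelihood (p, q):", best_low[1:])   # close to (k/n, 1 - k/n)
print("argmax of upper likelihood (p, q):", best_up[1:])    # (0, 0), the vacuous belief function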
Bernoulli trials are central in statistics: generalising their likelihood, as we just did,
allows us to represent uncertainty in a number of regression problems.
For instance, in logistic regression (recall Chapter 1, Section 1.4.7),

$$\pi_i = P(Y_i=1|x_i) = \frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}},\qquad 1-\pi_i = P(Y_i=0|x_i) = \frac{e^{-(\beta_0+\beta_1 x_i)}}{1+e^{-(\beta_0+\beta_1 x_i)}}, \qquad (17.5)$$

the two scalar parameters β0, β1 are estimated by maximising the likelihood of the sample, where the likelihood function is

$$L(\beta_0,\beta_1|Y) = \prod_{i=1}^n \pi_i^{Y_i}(1-\pi_i)^{1-Y_i}.$$

The problem is, how do we generalise the logit link between the observations x and the outputs y? Just assuming (17.5) does not yield any analytical dependency for $q_i$. In other words, we seek a logit-type analytical mapping between observations and belief functions over a binary frame.
A first, simple proposal may consist of just adding a parameter β2 such that the following relationship holds:

$$q_i = m(Y_i = 0|x_i) = \beta_2\,\frac{e^{-(\beta_0+\beta_1 x_i)}}{1+e^{-(\beta_0+\beta_1 x_i)}}. \qquad (17.6)$$
We can then seek lower and upper optimal estimates for the parameter vector β = [β0, β1, β2]. Plugging these optimal parameters into (17.5), (17.6) will then yield an upper and a lower family of conditional belief functions given x (once again, an interval of belief functions):

$$\underline{Bel}_X(\cdot\,|\,\underline{\beta}, x),\qquad \overline{Bel}_X(\cdot\,|\,\overline{\beta}, x).$$
An analysis of the validity of such a straightforward extension of the logit map-
ping, and the exploration of alternative ways of generalising it are research questions
potentially very interesting to pursue.
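The following Python sketch illustrates one way such an extension could be set up. The assumptions here are mine, not the book's: the mass of Y_i = 1 keeps the logistic form π_i of (17.5), the mass of Y_i = 0 follows (17.6) with 0 ≤ β2 ≤ 1, the leftover mass goes to the whole frame {0, 1}, and the lower/upper likelihoods of the observed labels are formed as in Theorem 112. The data are toy values and SciPy is assumed available.

import numpy as np
from scipy.optimize import minimize

x = np.array([0.5, 1.2, 1.9, 2.7, 3.3, 4.1])   # hypothetical inputs
y = np.array([0,   0,   1,   0,   1,   1  ])   # hypothetical binary labels

def masses(beta):
    b0, b1, b2 = beta
    pi = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # Equation (17.5)
    q = b2 * (1.0 - pi)                        # Equation (17.6)
    return pi, q                               # m(Y=1|x), m(Y=0|x); the rest goes to {0,1}

def neg_log_lower(beta):
    pi, q = masses(beta)
    bel = np.where(y == 1, pi, q)              # Bel of the observed label
    return -np.sum(np.log(bel + 1e-12))

def neg_log_upper(beta):
    pi, q = masses(beta)
    pl = np.where(y == 1, 1.0 - q, 1.0 - pi)   # Pl of the observed label
    return -np.sum(np.log(pl + 1e-12))

bounds = [(-10, 10), (-10, 10), (0, 1)]
beta_lower = minimize(neg_log_lower, [0.0, 0.0, 0.5], bounds=bounds).x
beta_upper = minimize(neg_log_upper, [0.0, 0.0, 0.5], bounds=bounds).x
print(beta_lower, beta_upper)

Whether this particular completion of the logit mapping is the right one is, of course, part of the open question raised above.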
Spies (Section 4.3.3) and others have posed themselves the problem of generalising the law of total probability

$$P(A) = \sum_{i=1}^N P(A|B_i)\, P(B_i),$$

where {B1, ..., BN} is a disjoint partition of the sample space, to the case of belief functions. They mostly did so from the angle of producing a generalisation of Jeffrey's combination rule – nevertheless, the question goes rather beyond their original
intentions, as it involves understanding the space of solutions to the generalised total
probability problem.
The problem of generalising the total probability theorem to belief functions can
be posed as follows (Figure 17.2).
Theorem 113. (Total belief theorem) Suppose Θ and Ω are two frames of dis-
cernment, and ρ : 2Ω → 2Θ the unique refining between them. Let b0 be a belief
function defined over Ω = {ω1 , ..., ω|Ω| }. Suppose there exists a collection of be-
lief functions bi : 2Πi → [0, 1], where Π = {Π1 , ..., Π|Ω| }, Πi = ρ({ωi }), is the
partition of Θ induced by its coarsening Ω.
Then, there exists a belief function b : 2Θ → [0, 1] such that:
1. b0 is the restriction of b to Ω, b0 = b|Ω (Equation (2.11), Chapter 2);
2. b ⊕ bΠi = bi ∀i = 1, ..., |Ω|, where bΠi is the categorical belief function with
b.p.a. mΠi (Πi ) = 1, mΠi (B) = 0 for all B 6= Πi .
It can be proven that any solution to the total belief problem must have focal
elements which obey the following structure [?].
Proposition 50. Each focal element $e_k$ of a total belief function b meeting the requirements of Theorem 113 is the union of exactly one focal element of each of the conditional belief functions whose domain $\Pi_i$ is a subset of $\rho(E_k)$, where $E_k$ is the smallest focal element of the a-priori belief function $b_0$ such that $e_k\subset\rho(E_k)$. Namely:
Fig. 17.2. Pictorial representation of the total belief theorem hypotheses (Theorem 113).
$$e_k = \bigcup_{i:\ \Pi_i\subset\rho(E_k)} e_i^{j_i}, \qquad (17.8)$$

where $e_i^{j_i}\in\mathcal{E}_{b_i}$ for all i, and $\mathcal{E}_{b_i}$ denotes the list of focal elements of $b_i$.
If we enforce the a-priori function b0 to have only disjoint focal elements (i.e.,
b0 to be the vacuous extension of a Bayesian function defined on some coarsening
of Ω), we have what we call the restricted total belief theorem.
In this special case it suffices to solve the |Eb0 | sub-problems obtained by consid-
ering each focal element E of b0 separately, and then combine the resulting partial
solutions by simply weighing the resulting basic probability assignments using the
a-priori mass mb0 (E), to obtain a fully normalized total belief function.
For each individual focal element of b0 the task of finding a suitable solution to
the total belief problem translates into a linear algebra problem.
A candidate solution to the subproblem of the restricted total belief problem associated with $E\in\mathcal{E}_{b_0}$ is the solution to a linear system with $n_{\min} = \sum_{i=1,...,N}(n_i - 1) + 1$ equations and $n_{\max} = \prod_i n_i$ unknowns:

$$A x = b, \qquad (17.9)$$

where each column of A is associated with an admissible (i.e., meeting the structure of Lemma ??) focal element $e_j$ of the candidate total belief function, $x = [m_b(e_1),\cdots,m_b(e_n)]$ and $n = n_{\min}$ is the number of equalities generated by the N conditional constraints.
Since the rows of the solution system (??) are linearly independent, any system of equations obtained by selecting $n_{\min}$ columns from A has a unique solution. A minimal solution to the restricted total belief problem (??) (i.e., a solution with the minimum number of focal elements) is then uniquely determined by the solution of a system of equations obtained by selecting $n_{\min}$ columns from the $n_{\max}$ columns of A.
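As a sanity check on these counts, for the problem of Figure 17.3 (N = 2 conditional belief functions with n1 = 3 and n2 = 2 focal elements) a short computation gives n_min = 4 equations, n_max = 6 candidate columns and C(6, 4) = 15 candidate minimal solution systems:

from math import comb, prod

n = [3, 2]                              # n_i: number of focal elements of each conditional b.f.
n_min = sum(ni - 1 for ni in n) + 1     # equations (columns of a minimal system)
n_max = prod(n)                         # admissible focal elements (candidate columns)
print(n_min, n_max, comb(n_max, n_min)) # 4 6 15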
Theorem 114. Column substitutions of the class T reduce the absolute value of the
most negative solution component.
One can then use Theorem 114 to prove that there always exists a selection of columns of A (focal elements of the total belief function) such that the resulting square linear system has a positive vector as its solution. This can be done in a constructive way, by applying a transformation of the type (17.10) recursively to the column associated with the most negative component, to obtain a path in the solution space which eventually leads to the desired solution.
The following sketch of an existence proof for the restricted total belief theorem exploits the effects on the solution components of column substitutions of type T:
1. at each column substitution the most negative solution component decreases by
Theorem 114;
2. if we keep substituting the most negative variable we keep obtaining distinct
linear systems, for at each step the transformed column is assigned a positive
solution component and therefore, if we follow the proposed procedure, cannot
be changed back to a negative one by applying transformations of class T ;
3. this implies that there can be no cycles in the associated path in the solution
space;
4. the number $\binom{n_{\max}}{n_{\min}}$ of solution systems is obviously finite, hence the procedure must terminate.
Unfortunately, counterexamples show that there are ‘transformable’ columns
(associated with negative solution components) which do not admit a transforma-
tion of the type (17.10). Although they do have companions on every partition Πi ,
such counterexamples do not admit a complete collection of ‘selection’ columns.
One can then arrange the candidate minimal solution systems related to a problem of a given size {ni, i = 1, ..., N} into a solution graph (see Figure 17.3).
Fig. 17.3. The solution graph associated with the restricted total belief problem with N = 2,
n1 = 3 and n2 = 2.
Total probability is only one important result of classical probability theory that
needs to be generalised to the wider setting of random sets.
Central limit theorem In order to properly define a Gaussian belief function, however, we need to generalise the classical central limit theorem to random sets. The old proposal by Dempster and Liu merely transfers normal distributions on the real line by Cartesian product with R^m (cfr. Chapter 3, Section 3.3.4).
Both the central limit theorem and the law(s) of large numbers have already been generalised to imprecise probabilities¹. A central limit theorem for belief functions was recently formulated by Boston University's Larry G. Epstein and Kyoungwon Seo². Xiaomin Shi from Shandong University has separately brought forward a number of central limit theorems for belief measures³.
¹ See 'Introduction to Imprecise Probabilities', http://onlinelibrary.wiley.com/book/10.1002/9781118763117.
² http://people.bu.edu/lepstein/files-research/CLT-Nov17-2011.pdf
³ https://arxiv.org/pdf/1501.00771.pdf
Random sets are mathematical objects detached from any specific interpretation.
Just as probability measures are used by both Bayesians and frequentists for their
analyses, random sets can also be employed in different ways according to the in-
terpretation they are provided with.
In particular, it is natural to think of a generalised frequentist framework in
which random experiments are designed by assuming a specific random set distribu-
tion, rather than a conventional one, in order to better cope with the ever-occurring
set-valued observations.
Fig. 17.4. Describing the family of random sets (right) induced by families of probability
distributions in the source probability space (left) is the first step towards a generalisation of
frequentist inference to random sets.
The question is then: how should the multi-valued mapping Γ which defines a random set be 'designed', or derived from the problem?
For instance, in the cloaked die example (Section 1.4.5) it is the occlusion which
generates the multi-valued mapping and we have no control over it. In other situa-
tions, however, it may make sense to impose a parameterised family of mappings
Γ (.|θ) : Ω → 2Θ
which, given a (fixed) probability on the source space Ω, would yield as a result a
parameterised family of random sets.
The alternative is to fix the multi-valued mapping (e.g., when it is given by
the problem), and model the source probability by a classical parametric model. A
Gaussian or binomial family of source probabilities would then induce a family of
‘Gaussian’ or ‘binomial’ random sets (see Figure 17.4 again).
We know that random sets are set-valued random variables: nevertheless, the ques-
tion stands as to whether one can build random variables on top of random set (be-
lief) spaces, rather than the usual probability space.
Just as in the classical case, we need a mapping from Θ to a measurable space
(e.g. the positive real half line):
f : Θ → R+ = [0, +∞],
where ...
An interesting question is: can we compute a (generalised) PDF for a random set random variable as defined above?
The extension of the Radon-Nikodym derivative for set functions4 was first studied
by Harding et al in 1997. Yann Rebille (2009) has also investigated the problem in
his ‘A Radon-Nikodym derivative for almost subadditive set functions’5 . Graf, on
the other hand, has tackled the problem of defining the RND for capacities, rather
than probability measures []. The following summary of the problem is abstracted
from Molchanov’s ‘Theory of Random Sets’6 .
Assume that the two capacities µ, ν are monotone, subadditive and continuous
from below.
The definition is the same as for standard measures. Only, for standard measures absolute continuity is equivalent to the integral relation $\mu = \int \nu$ – this is no longer true for general capacities.
Definition 92. The pair (µ, ν) has the strong decomposition property if ∀α ≥ 0
there exists a measurable set Aα ∈ F such that
In rough words, the strong decomposition condition states that, for each bound α,
the ‘incremental ratio’ of the two capacities is bounded by α in the sub-power set
capped by some event Aα .
Note that all standard measures meet the strong decomposition property.
A number of problems remain open. The conditions of the theorem (which holds
for general capacities) need to be elaborated for the case of completely alternating
capacities (distributions of random closed sets).
⁴ https://www.math.nmsu.edu/~jharding/
⁵ https://halshs.archives-ouvertes.fr/hal-00441923/document
⁶ http://www.springer.com/jp/book/9781852338923
An intriguing question is whether we can pose the inference problem in this setting
as well. Namely, we seek a geometric representation general enough to encode both
the data driving the inference and the (belief) measures possibly resulting from the
inference, in such a way that the inferred measure minimises some sort of distance
from the empirical data.
This Book has mainly concerned itself with the geometric representation of finite
belief measures. Nevertheless, here we can start providing some insights on how to
extend this approach to belief functions on infinite spaces.
A true geometry of uncertainty will require the ability to manipulate in our geo-
metric language any (or most) forms of uncertainty measures (compare the partial
hierarchy reported in Chapter ??).
Probability and possibility measures are, as we know, special cases of belief functions: therefore, their geometric interpretation does not require any extension of the notion of a belief space (as we extensively learned in Parts II and III of this Book).
Most other uncertainty measures, however, are not special cases of belief functions
– in fact, a number of them are more general than belief functions, such as for in-
stance probability intervals (2-monotone capacities), general monotone capacities,
upper/lower previsions.
Tackling these more general measures requires therefore an extension of the geomet-
ric belief space able to encapsulate the most general such representation. Arguably,
this will lead to a geometric theory of imprecise probabilities, starting from gambles
and sets of desirable gambles.
Geometry of capacities
Geometry of gambles
Representing belief functions as mere vectors of mass or belief values is not entirely satisfactory. Basically, when doing so all vector components are indistinguishable, while they correspond to values assigned to subsets of Θ of different cardinality.
Other geometrical representations of belief functions on finite spaces can nevertheless be imagined, which take into account the qualitative difference between events of different cardinalities.
Capacities as isoperimeters of convex bodies Convex bodies, for instance, are the
subject of a fascinating field of study.
Any convex body in $R^n$ obviously possesses $2^n$ distinct orthogonal projections onto the $2^n$ subspaces generated by all possible subsets of coordinate axes (see Figure 17.5).
Fig. 17.5. Given a convex body K in the Cartesian space Rn , endowed with coordinates
x1 , ..., xn , the function ν assigning to each subset of coordinates S = {xi1 , ..., xim } the
(hyper)-volume ν(S) of the orthogonal projection K|S of K onto the linear subspace gener-
ated by S = {xi1 , ..., xim } is a capacity.
This idea is clearly related to the notion of a Grassmann manifold, i.e., the manifold of all linear subspaces of a given vector space.
It is easy to see that, given a convex body K in the Cartesian space Rn , endowed
with coordinates x1 , ..., xn , the function ν assigning to each subset of coordinates
S = {xi1 , ..., xim } the (hyper)-volume ν(S) of the orthogonal projection K|S of K
onto the linear subspace generated by S = {xi1 , ..., xim } is a capacity.
Under what conditions is this capacity monotone? Under what conditions is this
capacity a belief function (i.e. an infinitely-monotone capacity)?
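For the simplest convex body, an axis-aligned box, the construction is easy to play with: the projection onto the subspace spanned by a subset S of axes has volume equal to the product of the corresponding side lengths. The Python sketch below uses hypothetical side lengths; note that with a side shorter than 1 the resulting set function need not even be monotone, which is precisely why the questions above are non-trivial.

from itertools import combinations
from math import prod

sides = {"x1": 2.0, "x2": 0.5, "x3": 3.0}     # an axis-aligned box in R^3 (hypothetical)

def nu(S):
    # (hyper-)volume of the orthogonal projection of the box onto the axes in S
    return prod(sides[a] for a in S)

for k in range(1, len(sides) + 1):
    for S in combinations(sides, k):
        print(S, nu(S))      # e.g. nu(('x1',)) = 2.0 > nu(('x1', 'x2')) = 1.0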
Convex sets of belief functions, which as we saw pop up all the time when reasoning or making inferences, would also deserve a geometric treatment of their own.
Decision trees A decision tree is a recursive divide and conquer structure, in which
at each step:
1. an attribute is selected to partition the training set in an optimal manner;
2. the current training set is split into training subsets according to the values of
the selected attribute.
A typical attribute-selection criterion is based on the information gain Info(S) − Info_A(S), where information is measured by the classical Shannon entropy

$$\mathrm{Info}(S) \doteq -\sum_{c\in C} p_c \log_2 p_c.$$
The information gain criterion favours attributes with a larger number of values over
those with fewer possible values.
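A minimal Python sketch of this criterion (class counts are illustrative; Info_A(S) is taken here, as in standard C4.5 practice, to be the split-size-weighted entropy of the subsets):

from math import log2

def info(counts):
    # Info(S) = - sum_c p_c log2 p_c over the class proportions
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, split_counts):
    # Info(S) - Info_A(S), with Info_A(S) the weighted entropy after splitting on A
    total = sum(parent_counts)
    weighted = sum(sum(sub) / total * info(sub) for sub in split_counts)
    return info(parent_counts) - weighted

parent = [9, 5]            # e.g. 9 positive and 5 negative training examples
split = [[6, 1], [3, 4]]   # class counts in the two subsets produced by an attribute
print(info(parent), info_gain(parent, split))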
Belief decision trees A belief decision tree [] is composed of the same elements as a traditional decision tree but, at each step, class information on the items of the dataset is expressed by a basic probability assignment (b.p.a.) over the set of possible classes C for each object.
The average such b.p.a. is then computed, and the pignistic probability of the result
is used to compute the entropy InfoA (S). The attribute with the highest gain ratio is
selected, and eventually each leaf is labeled by a b.p.a. expressing a belief about the
actual class of the object, rather than a unique class.
Random forests Random forests [?] are ensembles of decision trees built by random selection of sub-training sets with replacement.
At each step one selects a random subset of features as well as a set of thresholds for these features, and then chooses the single best feature and threshold (using entropy or the Gini impurity $\sum_c p_c(1-p_c)$). The process is repeated until all the trees are fully grown.
are fully grown.
For multi-label problems entropy has to be computed over sets of labels: Varma, for
instance [] assumes independence of individual labels.
Random set random forests The idea behind random set random forests is that the
training data for a given subset provides sample statistics for a random set (a belief
function).
Consequently, a measure of entropy/uncertainty for random sets should be used
to perform the splitting. This can be done efficiently by Monte-Carlo sampling of
subsets (cfr. Chapter 4, Section 4.4.3). As we saw in Chapter 4, Section 4.8.3, a
number of generalisations of entropy to belief functions have been proposed – what
is the most suitable for decision purposes will be an interesting topic of research.
Fig. 17.6. Both the lower and the upper optimal belief functions (17.11) amount to a convex
envelope of logistic functions.
Rougier [?] has very nicely outlined a Bayesian approach to climate modelling
and prediction, in which the predictive distribution for future climate is found by
conditioning future climate on the observed values for historical and current climate.
A number of challenges arise:
– in climate prediction the collection of uncertain quantities for which the climate
scientist must specify prior probabilities can be large;
– specifying a prior distribution over climate vectors is very challenging.
Considering that people spend thousands of hours collecting climate data and con-
structing climate models, it is surprising to know that little attention is devoted to
quantifying our judgements about how the two are related.
In this Section, climate is represented as a vector of measurements y, collected at a given time. Its components include, for instance, the level of CO2 concentration at the various points of a grid.
More precisely, the climate vector y = (y_h, y_f) collects both historical and present (y_h) and future (y_f) climate values. A measurement error e is introduced to take into account errors due to, for instance, a seasick technician or atmospheric turbulence. The actual measurement vector is therefore

$$z \doteq y_h + e.$$
The Bayesian treatment of the problem makes use of a number of assumptions.
For starters:
which requires us to specify a prior distribution for the climate vector y itself.
Climate models The choice of such a prior p(y) is extremely challenging, because
y is such a large collection of quantities, and these component quantities are linked
by complex interdependencies, such as those arising from the laws of nature.
The role of the climate model is then to induce a distribution for the climate itself; it plays the role of a parametric model in statistical inference (Section 1.3.3). Namely, a climate model is a deterministic mapping from a collection of parameters x (equation coefficients, initial conditions, forcing functions) to a vector of measurements (the 'climate'):

$$x \mapsto y = g(x). \qquad (17.13)$$
Prediction via a parametric model The difference between the climate vector and
any model evaluation can be decomposed into two parts:
The first part is a contribution that may be reduced by a better choice of the model g;
the second part is, instead, an irreducible contribution that arises from the model’s
own imperfections.
Note that x∗ is not just a statistical parameter, though, for it relates to physical quan-
tities, so that climate scientists have a clear intuition of its effects. Consequently,
scientists may be able to exploit their expertise to provide a prior p(x∗ ) on the input
parameters.
In this Bayesian framework to climate prediction, two more assumptions are
needed.
Axiom 3 The 'best' input, the discrepancy and the measurement error are mutually (statistically) independent:

$$x^*\ \perp\ \epsilon^*\ \perp\ e.$$

Axiom 4 The model discrepancy $\epsilon^*$ is Gaussian distributed, with mean 0 and covariance $\Sigma_\epsilon$.
Axioms 3 and 4 then allow us to compute the desired climate prior as

$$p(y) = \int \mathcal{N}\big(y - g(x^*)\,\big|\,0,\Sigma_\epsilon\big)\, p(x^*)\, dx^*, \qquad (17.14)$$
which can be plugged into (17.12) to yield a Bayesian prediction of future climate
values.
In practice, as we said, the climate model function g(·) is not known – we only possess a sample of model evaluations {g(x1), ..., g(xn)}. We call model validation the process of tuning the covariances $\Sigma_\epsilon$, $\Sigma_e$ and checking the validity of the Gaussianity assumptions 2 and 4.
This can be done by using (17.12) to predict past/present climates p(z), and applying some hypothesis testing to the result. If the observed value z̃ is in the tail of the distribution, the model parameters (if not the entire set of model assumptions) need to be corrected. As Rougier admits [], responding to bad validation results is not straightforward.
Model calibration Assuming that the model has been validated, it needs to be ‘cal-
ibrated’, i.e., we need to find the desired ‘best’ value x∗ of the model’s parameters.
Indeed, under Axioms 1–4 we can compute the posterior prediction
$$p(y_f | z = \tilde{z}) = \int p(y_f | x^*, z = \tilde{z}) \, p(x^* | z = \tilde{z}) \, dx^*, \qquad (17.15)$$
where $p(y_f | x^*, z = \tilde{z})$ is Gaussian, with a mean which depends on $\tilde{z} - g(x)$.
The posterior prediction (17.15) highlights two routes through which climate data affect future climate predictions:
– by concentrating the distribution p(x∗ |z = z̃) relative to the prior p(x∗ ), depend-
ing on both quantity and quality of the climate data;
– by shifting the mean of p(yf |x∗ , z = z̃) away from g(x), depending on the size
of the difference z̃ − g(x).
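As a minimal illustration of the calibration step, the posterior p(x∗ | z = z̃) can be approximated on a grid by multiplying the prior p(x∗) by a Gaussian likelihood centred on g(x∗). Folding the discrepancy and measurement-error covariances into a single total covariance, and the specific scalar toy prior and model used below, are simplifying assumptions of this sketch.

```python
import numpy as np
from scipy import stats

def calibrate_on_grid(x_grid, prior_pdf, g, z_obs, cov_total):
    """Grid approximation of p(x*|z = z_obs) ∝ p(z = z_obs|x*) p(x*),
    with a Gaussian likelihood N(z_obs | g(x*), cov_total)."""
    likelihood = np.array([stats.multivariate_normal.pdf(z_obs, mean=g(x), cov=cov_total)
                           for x in x_grid])
    posterior = likelihood * np.array([prior_pdf(x) for x in x_grid])
    return posterior / posterior.sum()                 # normalised weights over the grid

# toy usage: scalar parameter, scalar 'historical climate' g(x) = 2x, observed z = 1.5
x_grid = np.linspace(-3.0, 3.0, 201)
post = calibrate_on_grid(x_grid, stats.norm.pdf, lambda x: np.array([2.0 * x]),
                         z_obs=np.array([1.5]), cov_total=np.array([[0.5]]))
print("posterior mean of x*:", float((x_grid * post).sum()))
```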
Role of model evaluations Let us go back to the initial question: what is the probability that a doubling of atmospheric CO2 will raise the global mean temperature by at least 2°C by 2100?
Let Q ⊂ Y be the set of climates y for which the global mean temperature is at least 2°C higher in 2100. The probability of the event of interest can then be computed by integration, as follows:
$$Pr(y_f \in Q \,|\, z = \tilde{z}) = \int f(x^*) \, p(x^* | z = \tilde{z}) \, dx^*.$$
The integral $f(x) = \int_Q N(y_f | \mu(x), \Sigma) \, dy_f$ can be computed directly; the other integral requires numerical integration, e.g.:
– naive Monte Carlo: $\int \simeq \frac{1}{n} \sum_{i=1}^n f(x_i)$, where $x_i \sim p(x^* | z = \tilde{z})$;
– weighted sampling: $\int \simeq \frac{1}{n} \sum_{i=1}^n w_i f(x_i)$, where the samples $x_i \sim p(x^*)$ are weighted by the likelihood $w_i \propto p(z = \tilde{z} | x^* = x_i)$.
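The two estimators can be compared on a deliberately simple scalar toy problem; the prior, likelihood and indicator function f below are illustrative choices rather than quantities appearing in the text, and the weighted estimator is implemented in its self-normalised form.

```python
import numpy as np
from scipy import stats

def naive_mc(f, posterior_sampler, n=10_000, seed=0):
    """Naive Monte Carlo estimate of the integral of f(x) p(x|z) dx,
    assuming we can sample the posterior p(x|z) directly."""
    xs = posterior_sampler(np.random.default_rng(seed), n)
    return float(np.mean([f(x) for x in xs]))

def weighted_sampling(f, prior_sampler, likelihood, n=10_000, seed=0):
    """Importance-sampling estimate: draw x_i from the prior p(x*) and weight each
    sample by the (self-normalised) likelihood w_i proportional to p(z|x*=x_i)."""
    xs = prior_sampler(np.random.default_rng(seed), n)
    w = np.array([likelihood(x) for x in xs])
    w = w / w.sum()
    return float(np.sum(w * np.array([f(x) for x in xs])))

# toy problem: prior N(0,1), observation z = 1 with likelihood N(z|x,1), f = indicator(x > 0.5);
# the exact posterior is N(0.5, 0.5), so both estimates should approach P(X > 0.5) = 0.5
f = lambda x: float(x > 0.5)
print("naive MC:   ", naive_mc(f, lambda rng, n: rng.normal(0.5, np.sqrt(0.5), size=n)))
print("weighted IS:", weighted_sampling(f, lambda rng, n: rng.normal(0.0, 1.0, size=n),
                                        lambda x: stats.norm.pdf(1.0, loc=x, scale=1.0)))
```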
Sophisticated models which take a long time to evaluate may not provide enough samples for the prediction to be statistically significant, albeit they may make the prior p(x∗) and covariance $\Sigma_{\epsilon}$ easier to specify.
Modelling climate with belief functions There are a number of issues with making climate inferences in the Bayesian framework:
– many assumptions are necessary (e.g. Gaussianity), most of them introduced to make calculations practical rather than for any modelling reason;
– although the prior on climates is reduced to a prior on the parameters of a climate model, there is no obvious way of picking p(x∗): it is far easier to say which choices are wrong (e.g. uniform priors);
– significant parameter tuning is required (e.g. for $\Sigma_{\epsilon}$, $\Sigma_e$, etc.).
Much work remains to be done here, but a few landmarks can be set:
– avoid committing to priors p(x∗) on the correct climate model parameters;
– use the climate model as a parametric model to infer either a belief function on the space of climates Y,
– or one on the space of parameters (e.g. covariances) of the distribution on Y.
Such systems are unable to predict how they will behave in a radically new setting (e.g., how does a smart car cope with driving through extreme weather conditions?). Most systems have no way of detecting whether their underlying assumptions have been violated: they will happily continue to predict and act even on inputs that are completely outside the scope of what they have actually learned. It is therefore imperative to ensure that these algorithms behave predictably 'in the wild'.
Here the training data x = [x_1, ..., x_n] are assumed to be drawn from a distribution D, h(x) is the predicted label for input x, and y(x) the actual label.
Definition 93 (Probably Approximately Correct learning). The learning algorithm finds, with probability at least 1 − δ, a model h ∈ H which is approximately correct, i.e., one that makes an error of no more than ε.
The main result of PAC learning is that we can relate the required size N of a training sample to the size of the model space H, namely
$$\log |H| \leq \epsilon N - \log \frac{1}{\delta},$$
so that the minimum number of training examples, given $\epsilon$, $\delta$ and $|H|$, is
$$N \geq \frac{1}{\epsilon} \left( \log |H| + \log \frac{1}{\delta} \right).$$
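As a quick numerical illustration, nothing more than plugging values into the inequality above, the snippet below computes the minimum sample size for a finite hypothesis space; the particular values of ε, δ and |H| are arbitrary.

```python
import math

def pac_sample_size(epsilon, delta, hypothesis_space_size):
    """Minimum N >= (1/eps) * (log|H| + log(1/delta)) guaranteeing, with probability
    at least 1 - delta, an error of at most epsilon for a finite hypothesis space H."""
    return math.ceil((math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon)

# e.g. |H| = 2^20 hypotheses, 5% error, 99% confidence
print(pac_sample_size(epsilon=0.05, delta=0.01, hypothesis_space_size=2 ** 20))
```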
For infinite-dimensional hypothesis spaces H, the role of $\log |H|$ is played by the Vapnik–Chervonenkis (VC) dimension of H. For a support vector machine (SVM) with margin m on data living in a space of dimension D, for instance,
$$VC_{SVM} = \min \left\{ D, \frac{4R^2}{m^2} \right\} + 1,$$
where R is the radius of the smallest hypersphere enclosing all the training data.
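A direct transcription of this formula (with illustrative numbers) shows how a larger margin caps the effective capacity of the classifier regardless of the data dimension D:

```python
def vc_dimension_svm(data_dim, radius, margin):
    """VC dimension bound for a large-margin linear classifier: min{D, 4R^2/m^2} + 1."""
    return min(data_dim, 4.0 * radius ** 2 / margin ** 2) + 1

# e.g. 1000-dimensional data enclosed in a sphere of radius 1, margin 0.2:
print(vc_dimension_svm(data_dim=1000, radius=1.0, margin=0.2))  # min(1000, 100) + 1 = 101
```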
Large margin classifiers As the VC dimension of $H_m$ decreases when m grows, it is desirable to select linear decision boundaries with maximum margin.
References
6. Weighing evidence: The design and comparison of probability thought experiments,
Tech. report, Research paper, 75th anniversary colloquium series Harvard Business
School, 1983.
7. Evidentiary value: philosophical, judicial and psychological aspects of a theory (P. Gärdenfors, B. Hansson, and N. E. Sahlin, eds.), 1988.
8. Nilsson's probabilistic entailment extended to Dempster-Shafer theory, International Journal of Approximate Reasoning 2 (1988), no. 3, 339–340.
9. A. Bottino, A. Laurentini, and P. Zuccone, Towards non-intrusive motion capture, Asian Conf. on Computer Vision, 1998.
10. A. Cheaito, M. Lecours, and E. Bosse, Modified Dempster-Shafer approach using an expected utility interval decision rule, Proc. SPIE 3719, Sensor Fusion: Architectures, Algorithms, and Applications III, vol. 34, 1999.
11. S. Abel, The sum-and-lattice points method based on an evidential reasoning system
applied to the real-time vehicle guidance problem, Uncertainty in Artificial Intelli-
gence 2 (Lemmer and Kanal, eds.), 1988, pp. 365–370.
12. A. Agarwal and B. Triggs, A local basis representation for estimating human pose
from cluttered images, 2006, pp. I:50–59.
13. Ankur Agarwal and Bill Triggs, 3d human pose from silhouettes by relevance vector
regression, cvpr 02 (2004), 882–888.
14. , Learning to track 3d human motion from silhouettes, ICML ’04: Proceed-
ings of the twenty-first international conference on Machine learning (New York, NY,
USA), ACM Press, 2004, p. 2.
15. J. Aggarwal and Q. Cai, Human motion analysis: a review, Computer Vision and Im-
age Understanding 73 (1999).
16. , Human motion analysis: a review, IEEE Proc. Nonrigid and Articulated Mo-
tion Workshop, June 1997, pp. 90–102.
17. J. Aggarwal, Q. Cai, W. Liao, and B. Sabata, Articulated and elastic non-rigid mo-
tion: A review, IEEE Proc. Nonrigid and Articulated Motion Workshop, Austin, Texas,
1994, pp. 2–14.
18. , Nonrigid motion analysis: articulated and elastic motion, CVIU 70 (1998),
142–156.
19. Martin Aigner, Combinatorial theory, Classics in Mathematics, Springer, New York,
1979.
20. J. Aitchinson, Discussion on professor Dempster’s paper, Journal of the Royal Statis-
tical Society B 30 (1968), 234–237.
21. K. Akita, Image sequence analysis of real world human motion, Pattern Recognition
17 (1984), 73–83.
22. R. Almond, Belief function models for simple series and parallel systems, Tech. report,
Department of Statistics, University of Washington, Tech. Report 207, 1991.
23. R. G. Almond, Fusion and propagation of graphical belief models: an implementation
and an example, PhD dissertation, Department of Statistics, Harvard University, 1990.
24. , Graphical belief modeling, Chapman and Hall/CRC, 1995.
25. Diego A. Alvarez, On the calculation of the bounds of probability of events using
infinite random sets, International Journal of Approximate Reasoning 43 (2006), no. 3,
241 – 267.
26. J. Amat, M. Casals, and M. Frigola, Stereoscopic systems for human body tracking in
natural scenes, Int. Workshop on Modeling People at ICCV’99, September 1999.
27. P. An and W. M. Moon, An evidential reasoning structure for integrating geophysical,
geological and remote sensing data, Proceedings of IEEE, 1993, pp. 1359–1361.
28. Z. An, Relative evidential support, PhD dissertation, University of Ulster, 1991.
29. Z. An, D. A. Bell, and J. G. Hughes, Relation-based evidential reasoning, International
Journal of Approximate Reasoning 8 (1993), 231–251.
30. Z. An, D.A. Bell, and J.G. Hughes, Relation-based evidential reasoning, International
Journal of Approximate Reasoning 8 (1993), no. 3, 231 – 251.
31. K.A. Andersen and J.N. Hooker, A linear programming framework for logics of un-
certainty, Decision Support Systems 16 (1996), 39–53.
32. , A linear programming framework for logics of uncertainty, Decision Support
Systems 16 (1996), no. 1, 39 – 53.
33. B. Anrig, R. Haenni, and N. Lehmann, ABEL - a new language for assumption-based
evidential reasoning under uncertainty, Tech. report, Institute of Informatics, Univer-
sity of Fribourg, 1997.
34. A. Antonucci and F. Cuzzolin, Credal sets approximation by lower probabilities: Ap-
plication to credal networks, Proc. of IPMU 2010, 2010.
35. A. Appriou, Knowledge propagation in information fusion processes, Keynote talk,
WTBF’10, Brest, France, March 2010.
36. O. Aran, T. Burger, A. Caplier, and L. Akarun, Sequential Belief-Based Fusion of
Manual and Non-manual Information for Recognizing Isolated Signs, Gesture-Based
Human-Computer Interaction and Simulation (2009), 134–144.
37. Oya Aran, Thomas Burger, Alice Caplier, and Lale Akarun, A belief-based sequen-
tial fusion approach for fusing manual and non-manual signs, Pattern Recognition 42
(2009), no. 5, 812–822.
38. A. Aregui and T. Denoeux, Constructing consonant belief functions from sample data
using confidence sets of pignistic probabilities, International Journal of Approximate
Reasoning 49 (2008), no. 3, 575–594.
39. M. Armstrong and A. Zisserman, Robust object tracking, Proc. ACCV’95, Singapore,
vol. 1, December 1995, pp. 58–62.
40. Krassimir T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets and Systems 20 (1986),
no. 1, 87 – 96.
41. C.I. Attwood, G.D. Sullivan, and K.D. Baker, Model-based recognition of human
posture using single synthetic images, Fifth Alvey Vision Conference, Reading, UK,
1989.
42. T. Augustin, Modeling weak information with generalized basic probability assign-
ments, Data Analysis and Information Systems - Statistical and Conceptual Ap-
proaches (H. H. Bock and W. Polasek, eds.), Springer, 1996, pp. 101–113.
63. Pietro Baroni and Paolo Vicig, Transformations from imprecise to precise probabili-
ties, ECSQARU, 2003, pp. 37–49.
64. J. L. Barron, D. J. Fleet, and S. S. Beauchemin, Performance of optical flow tech-
niques, International Journal of Computer Vision, vol. 12(1), 1994, pp. 43–77.
65. Jean-Pierre Barthélemy, Monotone functions on finite lattices: An ordinal approach to
capacities, belief and necessity functions, pp. 195–208, Physica-Verlag HD, Heidel-
berg, 2000.
66. O. Basir, F. Karray, and Hongwei Zhu, Connectionist-based dempster-shafer eviden-
tial reasoning for data fusion, Trans. Neur. Netw. 16 (2005), no. 6, 1513–1530.
67. D. Batens, C. Mortensen, and G. Priest, Frontiers of paraconsistent logic, Studies in
logic and computation (J.P. Van Bendegem, ed.), vol. 8, Research Studies Press, 2000.
68. M. Bauer, A Dempster-Shafer approach to modeling agent preferences for plan recog-
nition, User Modeling and User-Adapted Interaction 5:3-4 (1995), 317–348.
69. , Approximation algorithms and decision making in the Dempster-Shafer the-
ory of evidence – An empirical study, International Journal of Approximate Reasoning
17 (1997), 217–237.
70. , Approximations for decision making in the Dempster-Shafer theory of evi-
dence, Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence
(F. Horvitz, E.; Jensen, ed.), Portland, OR, USA, 1-4 August 1996, pp. 73–80.
71. A. Baumberg and D. Hogg, Learning flexible models from image sequences,
ECCV’94, Stockholm (J. Eklundh, ed.), vol. 800, 1994, pp. 299–308.
72. P. Beardsley, P. Torr, and A. Zisserman, 3D model acquisition from extended image
sequences, Proc. of ECCV’96, Cambridge, UK, vol. 2, April 1996, pp. 683–695.
73. D. A. Bell and J. W. Guan, Discounting and combination operations in evidential
reasoning, Uncertainty in Artificial Intelligence. Proceedings of the Ninth Conference
(1993) (A. Heckerman, D.; Mamdani, ed.), Washington, DC, USA, 9-11 July 1993,
pp. 477–484.
74. D. A. Bell, J. W. Guan, and G. M. Shapcott, Using the Dempster-Shafer orthogonal
sum for reasoning which involves space, Kybernetes 27:5 (1998), 511–526.
75. D.A. Bell, J.W. Guan, and Suk Kyoon Lee, Generalized union and project operations
for pooling uncertain and imprecise information, Data and Knowledge Engineering
18 (1996), 89–117.
76. , Generalized union and project operations for pooling uncertain and impre-
cise information, Data Knowledge Engineering 18 (1996), no. 2, 89 – 117.
77. Yakov Ben-Haim, Info-gap decision theory (second edition), second edition ed., Aca-
demic Press, Oxford, 2006.
78. Kent Bendall, Belief-theoretic formal semantics for first-order logic and probability,
Journal of Philosophical Logic 8 (1979), no. 1, 375–397.
79. A. Bendjebbour and W. Pieczynski, Unsupervised image segmentation using
Dempster-Shafer fusion in a Markov fields context, Proceedings of the Interna-
tional Conference on Multisource-Multisensor Information Fusion (FUSION’98)
(R. Hamid, A. Zhu, and D. Zhu, eds.), vol. 2, Las Vegas, NV, USA, 6-9 July 1998,
pp. 595–600.
80. S. Benferhat, A. Saffiotti, and Ph. Smets, Belief functions and default reasoning, Procs.
of the 11th Conf. on Uncertainty in AI. Montreal, Canada, 1995, pp. 19–26.
81. S. Benferhat, Alessandro Saffiotti, and Philippe Smets, Belief functions and de-
fault reasonings, Tech. report, Universite’ Libre de Bruxelles, Technical Report
TR/IRIDIA/95-5, 1995.
82. R. J. Beran, On distribution-free statistical inference with upper and lower probabili-
ties, Annals of Mathematical Statistics 42 (1971), 157–168.
83. Berger, Robust bayesian analysis: Sensitivity to the prior, Journal of Statistical Plan-
ning and Inference 25 (1990), 303–328.
84. N. Bergman and A. Doucet, Markov chain monte carlo data association for target
tracking, IEEE Int. Conference on Acoustics, Speech and Signal Processing, 2000.
85. Ulla Bergsten and Johan Schubert, Dempster’s rule for evidence ordered in a complete
directed acyclic graph, International Journal of Approximate Reasoning 9 (1993), 37–
73.
86. Ulla Bergsten, Johan Schubert, and P. Svensson, Applying data mining and ma-
chine learning techniques to submarine intelligence analysis, Proceedings of the
Third International Conference on Knowledge Discovery and Data Mining (KDD’97)
(D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy, eds.), Newport Beach,
USA, 14-17 August 1997, pp. 127–130.
87. R. Bertschy and P.A. Monney, A generalization of the algorithm of heidtmann to non-
monotone formulas, Journal of Computational and Applied Mathematics 76 (1996),
no. 12, 55 – 76.
88. P. Besnard and Jurg Kohlas, Evidence theory based on general consequence relations,
Int. J. of Foundations of Computer Science 6 (1995), no. 2, 119–135.
89. Philippe Besnard and Jürg Kohlas, Evidence theory based on general consequence relations, International Journal of Foundations of Computer Science 6 (1995), no. 2, 119–135.
90. B. Besserer, S. Estable, and B. Ulmer, Multiple knowledge sources and evidential
reasoning for shape recognition, Proceedings of IEEE, 1993, pp. 624–631.
91. Albrecht Beutelspacher and Ute Rosenbaum, Projective geometry, Cambridge Uni-
versity Press, Cambridge, 1998.
92. Malcolm Beynon, Bruce Curry, and Peter Morgan, The Dempster-Shafer theory of
evidence: approach to multicriteria decision modeling, OMEGA: The International
Journal of Management Science 28 (2000), 37–50.
93. KK Bharadwaj, Neerja, and GC Goel, Hierarchical censored production rules (hcprs)
system employing the Dempster-Shafer uncertainty calculus, Information and Soft-
ware Technology 36 (1994), no. 3, 155 – 164.
94. A.G. Bharatkumar, K.E. Diagle, M.G. Pandy, Q. Cai, and J.K. Aggarwal, Lower limb
kinematics of human walking with the medial axis transformation, Workshop on Mo-
tion of Non-Rigid and Articulated Objects, Austin, Texas, 1994.
95. P. Bhattacharya, On the dempster-shafer evidence theory and non-hierarchical aggre-
gation of belief structures, IEEE Transactions on Systems, Man, and Cybernetics -
Part A: Systems and Humans 30 (2000), no. 5, 526–536.
96. Loredana Biacino, Fuzzy subsethood and belief functions of fuzzy events, Fuzzy Sets
and Systems 158 (2007), no. 1, 38 – 49.
97. Elisabetta Binaghi, L. Luzi, P. Madella, F. Pergalani, and A. Rampini, Slope insta-
bility zonation: a comparison between certainty factor and fuzzy Dempster-Shafer
approaches, Natural Hazards 17 (1998), 77–97.
98. Elisabetta Binaghi, P. Madella, I. Gallo, and A. Rampini, A neural refinement strategy
for a fuzzy Dempster-Shafer classifier of multisource remote sensing images, Proceed-
ings of the SPIE - Image and Signal Processing for Remote Sensing IV, vol. 3500,
Barcelona, Spain, 21-23 Sept. 1998, pp. 214–224.
99. Elisabetta Binaghi and Paolo Madella, Fuzzy Dempster-Shafer reasoning for rule-
based classifiers, International Journal of Intelligent Systems 14 (1999), 559–583.
100. J. Binder, D. Koeller, S. Russell, and K. Kanazawa, Adaptive probabilistic networks
with hidden variables, Machine Learning, vol. 29, 1997, pp. 213–244.
101. G. Birkhoff, Abstract linear dependence and lattices, American Journal of Mathemat-
ics 57 (1935), 800–804.
102. , Lattice theory (3rd edition), Amer. Math. Soc. Colloquium Publications, Vol.
25, Providence, RI, 1967.
103. R. Bissig, Jurg Kohlas, and N. Lehmann, Fast-division architecture for Dempster-
Shafer belief functions, Qualitative and Quantitative Practical Reasoning, First In-
ternational Joint Conference on Qualitative and Quantitative Practical Reasoning;
ECSQARU–FAPR’97 (D. Gabbay, R. Kruse, A. Nonnengart, and H.J. Ohlbach, eds.),
Springer, 1997.
104. G. Biswas and T. S. Anand, Using the Dempster-Shafer scheme in a mixed-initiative
expert system shell, Uncertainty in Artificial Intelligence, volume 3 (L.N. Kanal, T.S.
Levitt, and J.F. Lemmer, eds.), North-Holland, 1989, pp. 223–239.
105. M. Black and P. Anandan, The robust estimation of multiple motions: parametric and
piecewise smooth flow fields, Computer Vision and Image Understanding, vol. 63(1),
January 1996, pp. 75–104.
106. M. J. Black, Explaining optical flow events with parameterized spatio-temporal mod-
els, Proc. of Conference on Computer Vision and Pattern Recognition, vol. 1, 1999,
pp. 326–332.
107. M.J. Black and A.D. Jepson, Eigentracking: Robust matching and tracking of articu-
lated objects using a view-based representation, ECCV’96, 1996, pp. 329–342.
108. M.J. Black and Y. Yacoob, Tracking and recognizing rigid and non-rigid facial mo-
tions using local parametric models of image motions, Proceedings of the International
Conference on Computer Vision ICCV’95, Cambridge, MA, 1995, pp. 374–381.
109. P. Black, Is Shafer general Bayes?, Proceedings of the Third AAAI Uncertainty in
Artificial Intelligence Workshop, 1987, pp. 2–9.
110. , An examination of belief functions and other monotone capacities, PhD dis-
sertation, Department of Statistics, Carnegie Mellon University, 1996, Pgh. PA 15213.
111. , Geometric structure of lower probabilities, Random Sets: Theory and Appli-
cations (Goutsias, Malher, and Nguyen, eds.), Springer, 1997, pp. 361–383.
112. Paul K. Black, Is shafer general bayes?, CoRR abs/1304.2711 (2013).
113. A. Blake and M. Isard, Active contours, Springer-Verlag, April 1998.
114. I. Bloch, Information combination operators for data fusion: a comparative review
with classification, IEEE Transactions on Systems, Man, and Cybernetics - Part A:
Systems and Humans 26 (1996), no. 1, 52–67.
115. Isabelle Bloch, Some aspects of Dempster-Shafer evidence theory for classification
of multi-modality medical images taking partial volume effect into account, Pattern
Recognition Letters 17 (1996), 905–919.
116. Edwin A. Bloem and Henk A.P. Blom, Joint probabilistic data association methods
avoiding track coalescence, Proceedings of the 34th Conference on Decision and Con-
trol, December 1995.
117. Aaron F. Bobick and Andrew D. Wilson, Learning visual behavior for gesture analy-
sis, IEEE Symposium on Computer Vision, November 1995.
118. P.L. Bogler, Shafer-Dempster reasoning with applications to multisensor target iden-
tification systems, IEEE Transactions on Systems, Man and Cybernetics 17 (1987),
no. 6, 968–977.
119. H. Borotschnig, L. Paletta, M. Prantl, and A. Pinz, A comparison of probabilistic, pos-
sibilistic and evidence theoretic fusion schemes for active object recognition, Comput-
ing 62 (1999), 293–319.
120. Michael Boshra and Hong Zhang, Accommodating uncertainty in pixel-based verifi-
cation of 3-d object hypotheses, Pattern Recognition Letters 20 (1999), 689–698.
121. E. Bosse and J. Roy, Fusion of identity declarations from dissimilar sources using the
Dempster-Shafer theory, Optical Engineering 36:3 (March 1997), 648–657.
122. J. R. Boston, A signal detection system based on Dempster-Shafer theory and compar-
ison to fuzzy detection, IEEE Transactions on Systems, Man, and Cybernetics - Part
C: Applications and Reviews 30:1 (February 2000), 45–51.
123. L. Boucher, T. Simons, and P. Green, Evidential reasoning and the combination of
knowledge and statistical techniques in syllable based speech recognition, Proceed-
ings of the NATO Advanced Study Institute, Speech Recognition and Understanding.
Recent Advances, Trends and Applications (R. Laface, P.; De Mori, ed.), Cetraro,
Italy, 1-13 July 1990, pp. 487–492.
124. S. Boucheron and E. Gassiat, Optimal error exponents for HMM order estimation,
IEEE Trans. Info. Th. 48 (2003), 964–980.
125. R. Bowden, T. Mitchell, and M. Sarhadi, Reconstructing 3D pose and motion from a
single camera view, BMVC’98, Southampton, UK, 1998, pp. 904–913.
126. M. Brand, Shadow puppetry, ICCV’99, Corfu, Greece, September 1999.
127. M. Brand, N. Oliver, and A. Pentland, Coupled hmm for complex action recogni-
tion, Proc. of Conference on Computer Vision and Pattern Recognition, vol. 29, 1997,
pp. 213–244.
128. Jerome J. Braun, Dempster-shafer theory and bayesian reasoning in multisensor data
fusion, 2000, pp. 255–266.
129. C. Bregler, Learning and recognizing human dynamics in video sequences, Proc. of
the Conference on Computer Vision and Pattern Recognition, 1997, pp. 568–574.
130. C. Bregler and J. Malik, Video motion capture, Tech. report, UCB//CSD-97-973, Com-
puter Science Dept., U.C. Berkeley, 1997.
131. , Estimating and tracking kinematic chains, Proceedings of the Conference on
Computer Vision and Pattern Recognition CVPR’98, Santa Barbara, CA, June 1998.
132. , Tracking people with twists and exponential maps, Proceedings of the Con-
ference on Computer Vision and Pattern Recognition CVPR’98, Santa Barbara, CA,
June 1998.
133. M. Bruning and D. Denneberg, Max-min σ-additive representation of monotone mea-
sures, Statistical Papers 34 (2002), 23–35.
134. Noel Bryson and Ayodele Mobolurin, Qualitative discriminant approach for gener-
ating quantitative belief functions, IEEE Transactions on Knowledge and Data Engi-
neering 10 (1998), 345–348.
135. B. G. Buchanan and E. H. Shortliffe, Rule-based expert systems, Addison-Wesley,
Reading (MA), 1984.
136. D. M. Buede and J. W. Martin, Comparison of bayesian and dempster-shafer fusion,
In 1989 Tri-Service Data Fusion Symposium, 1989, pp. 81–101.
137. Dennis M. Buede and Paul Girardi, Target identification comparison of Bayesian and
Dempster-Shafer multisensor fusion, IEEE Transactions on Systems, Man, and Cyber-
netics Part A: Systems and Humans. 27 (1997), 569–577.
138. A. Bundy, Incidence calculus: A mechanism for probability reasoning, Journal of au-
tomated reasoning 1 (1985), 263–283.
139. T. Burger, Defining new approximations of belief function by means of dempster’s
combination, Proceedings of the Workshop on the theory of belief functions, 2010.
140. T. Burger, O. Aran, A. Urankar, L. Akarun, and A. Caplier, A dempster-shafer theory
based combination of classifiers for hand gesture recognition, Computer Vision and
Computer Graphics - Theory and Applications, Lecture Notes in Communications in
Computer and Information Science (2008).
141. T. Burger and A. Caplier, A Generalization of the Pignistic Transform for Partial
Bet, Proceedings of the 10th European Conference on Symbolic and Quantitative
Approaches to Reasoning with Uncertainty (ECSQARU), Verona, Italy, July 1-3,
Springer-Verlag New York Inc, 2009, pp. 252–263.
142. T. Burger and F. Cuzzolin, The barycenters of the k-additive dominating belief func-
tions and the pignistic k-additive belief functions, First International Workshop on the
Theory of Belief Functions (BELIEF’10), Brest, France, 2010.
143. T. Burger and F. Cuzzolin, The barycenters of the k-additive dominating belief func-
tions & the pignistic k-additive belief functions, (2010).
144. T. Burger, Y. Kessentini, and T. Paquet, Dealing with precise and imprecise decisions
with a Dempster-Shafer theory based algorithm in the context of handwritten word
recognition, 2010 12th International Conference on Frontiers in Handwriting Recog-
nition, IEEE, 2010, pp. 369–374.
145. Thomas Burger, Oya Aran, and Alice Caplier, Modeling hesitation and conflict: A
belief-based approach for multi-class problems, Machine Learning and Applications,
Fourth International Conference on (2006), 95–100.
146. P. Burman, A comparative study of ordinary cross-validation, v-fold cross-validation
and the repeated learning-testing methods, Biometrika 76(3) (1989), 503–514.
147. A. C. Butler, F. Sadeghi, S. S. Rao, and S. R. LeClair, Computer-aided de-
sign/engineering of bearing systems using the Dempster-Shafer theory, Artificial In-
telligence for Engineering Design, Analysis and Manufacturing 9:1 (January 1995),
1–11.
148. R. Buxton, Modelling uncertainty in expert systems, International Journal of Man-
Machine Studies 31 (1989), 415–476.
149. Y. Li C. Hu, Q. Tu and S. Ma, Extraction of parametric human model for posture
recognition using genetic algorithm, Fourth International Conference on Automatic
Face and Gesture Recognition, Grenoble, France, March 2000.
150. X. Xu C. Wen and Z. Li, Research on unified description and extension of combination
rules of evidence based on random set theory, The Chinese Journal of Electronics 17.
151. J. Rocha C. Yaniz and F. Perales, 3D region graph for reconstruction of human motion,
Workshop on Perception of Human Motion at ECCV, 1998.
152. Q. Cai and J.K. Aggarwal, Tracking human motion using multiple cameras, Interna-
tional Conference on Pattern Recognition, 1996.
153. Q. Cai, A. Mitiche, and J.K. Aggarwal, Tracking human motion in an indoor environ-
ment, International Conference on Image Processing, 1995.
154. C. Camerer and M. Weber, Recent developments in modeling preferences: uncertainty
and ambiguity, Journal of Risk and Uncertainty 5 (1992), 325–370.
155. L. Campbell and A. Bobick, Recognition of human body motion using phase space
constraints, ICCV’95, Cambridge, MA, 1995.
156. F. Campos and S. Cavalcante, An extended approach for dempster-shafer theory, In-
formation Reuse and Integration, 2003. IRI 2003. IEEE International Conference on,
Oct 2003, pp. 338–344.
157. F. Campos and F. M. C. de Souza, Extending dempster-shafer theory to overcome
counter intuitive results, 2005 International Conference on Natural Language Process-
ing and Knowledge Engineering, Oct 2005, pp. 729–734.
158. F. Campos and F.M.C. de Souza, Extending Dempster-Shafer theory to overcome
counter intuitive results, Proceedings of IEEE NLP-KE ’05, vol. 3, 2005, pp. 729–
734.
159. J. Cano, M. Delgado, and S. Moral, An axiomatic framework for propagating uncer-
tainty in directed acyclic networks, International Journal of Approximate Reasoning 8
(1993), 253–280.
160. J. Carlson and R.R. Murphy, Use of Dempster-Shafer conflict metric to adapt sensor
allocation to unknown environments, Tech. report, Safety Security Rescue Research
Center, University of South Florida, 2005.
161. Lucas Caro and Araabi Babak Nadjar, Generalization of the Dempster-Shafer theory:
a fuzzy-valued measure, IEEE Transactions on Fuzzy Systems 7 (1999), 255–270.
162. W. F. Caselton and W. Luo, Decision making with imprecise probabilities: Dempster-
Shafer theory and application, Water Resources Research 28 (1992), 3071–3083.
163. M. E. G. V. Cattaneo, Combining belief functions issued from dependent sources.,
ISIPTA, 2003, pp. 133–147.
164. Marco E. G. V. Cattaneo, Combining belief functions issued from dependent sources.,
ISIPTA, 2003, pp. 133–147.
165. Subhash Challa and Don Koks, Bayesian and dempster-shafer fusion, Sadhana 29
(2004), no. 2, 145–174.
166. T.-J. Cham and J. Rehg, A multiple hypothesis approach to figure tracking, Proceed-
ings of CVPR’99, Fort Collins, Colorado, vol. 2, 1999, pp. 239–245.
167. M. Chan, D. Metaxas, and S. Dickinson, A new approach to tracking 3-d objects in
2-d image sequences, Proc. of AAAI’94, Seattle, WA, August 1994.
168. A. Chateauneuf and J. Y. Jaffray, Some characterizations of lower probabilities and
other monotone capacities through the use of Möbius inversion, Mathematical Social
Sciences 17 (1989), 263–283.
169. A. Chateauneuf and J.Y. Jaffray, Some characterization of lower probabilities and
other monotone capacities through the use of Möbius inversion, Math. Soc. Sci. 17
(1989), 263–283.
170. A. Chateauneuf and J.-C. Vergnaud, Ambiguity reduction through new statistical data,
International Journal of Approximate Reasoning 24 (2000), 283–299.
171. Alain Chateauneuf, On the use of capacities in modeling uncertainty aversion and risk
aversion, Journal of Mathematical Economics 20 (1991), no. 4, 343–369.
172. Alain Chateauneuf, Combination of compatible belief functions and relations of speci-
ficity, Papiers d’économie mathématique et applications, Université Panthéon-Sorbonne
(Paris 1), 1992.
173. , Decomposable capacities, distorted probabilities and concave capacities,
Mathematical Social Sciences 31 (1996), no. 1, 19 – 37.
174. Alain Chateauneuf and Jean-Yves Jaffray, Some characterizations of lower probabili-
ties and other monotone capacities through the use of Möbius inversion, Mathematical
Social Sciences 17 (1989), no. 3, 263 – 283.
175. Alain Chateauneuf and Jean-Yves Jaffray, Local Möbius transforms on monotone ca-
pacities, Proceedings of the European Conference on Symbolic and Quantitative Ap-
proaches to Reasoning and Uncertainty (London, UK, UK), ECSQARU ’95, Springer-
Verlag, 1995, pp. 115–124.
176. C. W. R. Chau, P. Lingras, and S. K. M. Wong, Upper and lower entropies of be-
lief functions using compatible probability functions, Proceedings of the 7th Interna-
tional Symposium on Methodologies for Intelligent Systems (ISMIS’93) (Z.W. Ko-
morowski, J.; Ras, ed.), Trondheim, Norway, 15-18 June 1993, pp. 306–315.
177. A. Cheaito, M. Lecours, and E. Bosse, A non-ad-hoc decision rule for the Dempster-
Shafer method of evidential reasoning, Proceedings of the SPIE - Sensor Fusion: Ar-
chitectures, Algorithms, and Applications II, Orlando, FL, USA, 16-17 April 1998,
pp. 44–57.
195. Etienne Côme, Laurent Bouillaut, Patrice Aknin, and Allou Samé, Bayesian network
for railway infrastructure diagnosis, IPMU, 2006.
196. B. Cobb and P.P. Shenoy, On the plausibility transformation method for translating
belief function models to probability models, Int. J. Approx. Reasoning 41 (2006),
no. 3, 314–330.
197. B. R. Cobb and P. P. Shenoy, A comparison of bayesian and belief function reasoning,
Information Systems Frontiers 5(4) (2003), 345–358.
198. , A comparison of methods for transforming belief function models to probabil-
ity models, Proceedings of ECSQARU’2003, Aalborg, Denmark, July 2003, pp. 255–
266.
199. B.R. Cobb and P.P. Shenoy, On transforming belief function models to probability
models, Tech. report, University of Kansas, School of Business, Working Paper No.
293, February 2003.
200. Paul R. Cohen and Milton R. Grinberg, A framework for heuristic reasoning about
uncertainty, Proceedings of the Eighth International Joint Conference on Artificial
Intelligence - Volume 1 (San Francisco, CA, USA), IJCAI’83, Morgan Kaufmann
Publishers Inc., 1983, pp. 355–357.
201. , Readings from the ai magazine, American Association for Artificial Intelli-
gence, Menlo Park, CA, USA, 1988, pp. 559–566.
202. D. Comaniciu, V. Ramesh, and P. Meer, Kernel-based object tracking, IEEE Trans.
PAMI 25 (2003).
203. Roger Cooke and Philippe Smets, Self-conditional probabilities and probabilistic in-
terpretations of belief functions, Annals of Mathematics and Artificial Intelligence 32
(2001), no. 1, 269–285.
204. K. Coombs, D. Freel, and D. Lampert S. Brahm, Using Dempster-Shafer methods
for object classification in the theater ballistic missile environment, Proceedings of
the SPIE - Sensor Fusion: Architectures, Algorithms, and Applications III, vol. 3719,
Orlando, FL, USA, 7-9 April 1999, pp. 103–113.
205. C.R. Corlin and J. Ellesggard, Real time tracking of a human arm, Tech. report, Lab-
oratory of Image Analysis, Aalborg University, Denmark, 1998.
206. M. Covell, A. Rahimi, M. Harville, and T. Darrell, Articulated pose estimation using
brightness- and depth-constancy constraints, Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition CVPR’00, Hilton Head Island, SC, USA,
July 2000, pp. 438–445.
207. F. G. Cozman, Calculation of posterior bounds given convex sets of prior probability
measures and likelihood functions, Journal of Computational and Graphical Statistics
8(4) (1999), 824–838.
208. , Credal networks, Artificial Intelligence 120 (2000), 199–233.
209. Fabio G. Cozman, Credal networks, Artificial Intelligence 120 (2000), 199–233.
210. Fabio G. Cozman and Serafı́n Moral, Reasoning with imprecise probabilities, Interna-
tional Journal of Approximate Reasoning 24 (2000), 121–123.
211. Fabio Gagliardi Cozman, Computing posterior upper expectations, International Jour-
nal of Approximate Reasoning 24 (2000), no. 23, 191 – 205.
212. H. H. Crapo and G.-C. Rota, On the foundations of combinatorial theory: combinato-
rial geometries, M.I.T. Press, Cambridge, Mass., 1970.
213. A. Cretual, F. Chaumette, and P. Bouthemy, Complex object tracking by visual ser-
voing based on 2d image motion, International Conference on Pattern Recognition,
1998.
214. Valerie Cross and T. Sudkamp, Compatibility measures for fuzzy evidential reason-
ing, Proceedings of the Fourth International Conference on Industrial and Engineering
Applications of Artificial Intelligence and Expert Systems, Kauai, HI, USA, 2-5 June
1991, pp. 72–78.
215. Valerie Cross and Thomas Sudkamp, Compatibility and aggregation in fuzzy eviden-
tial reasoning, Proceedings of IEEE, 1991, pp. 1901–1906.
216. J. Crowley, P. Stelmaszyk, T. Skordas, and P. Puget, Measurement and integration of
3D structures by tracking edge lines, Int. J. Computer Vision 8 (1992), 29–52.
217. Peter Cucka and Azriel Rosenfeld, Evidence-based pattern-matching relaxation, Pat-
tern Recognition 26 (1993), no. 9, 1417 – 1427.
218. W. Cui and D. I. Blockley, Interval probability theory for evidential support, Interna-
tional Journal of Intelligent Systems 5 (1990), no. 2, 183–192.
219. Shawn P. Curley and James I. Golden, Using belief functions to represent degrees of
belief, Organizational Behavior and Human Decision Processes 58 (1994), no. 2, 271
– 303.
220. F. Cuzzolin, Probabilistic approximations of belief functions, in preparation.
221. , Probabilistic approximations of belief functions, preparing for submission to
the IEEE Transactions on Systems, Man and Cybernetics B.
222. , Visions of a generalized probability theory, PhD dissertation, Università di
Padova, Dipartimento di Elettronica e Informatica, 19 February.
223. , Lattice modularity and linear independence, 18th British Combinatorial
Conference, Brighton, UK, 2001.
224. , Canonical decomposition of belief functions in the belief space, in prepara-
tion (2002).
225. , Geometry of Dempster’s rule of combination, IEEE Transactions on Systems,
Man and Cybernetics part B 34 (2004), no. 2, 961–977.
226. , Simplicial complexes of finite fuzzy sets, Proceedings of the 10th International
Conference on Information Processing and Management of Uncertainty IPMU’04,
Perugia, Italy, 2004, pp. 1733–1740.
227. , Algebraic structure of the families of compatible frames of discernment, An-
nals of Mathematics and Artificial Intelligence 45(1-2) (2005), 241–274.
228. , Probabilistic approximations of belief functions, in preparation (2005).
229. , The geometry of relative plausibility and belief of singletons, submitted to
the International Journal of Approximate Reasoning (2006).
230. , Learning evidential models for object pose estimation, submitted to the In-
ternational Journal of Approximate Reasoning (2006).
231. , Two new Bayesian approximations of belief functions based on convex geom-
etry, submitted to the IEEE Trans. on Systems, Man, and Cybernetics - part B (2006).
232. , Dual properties of relative belief of singletons, submitted to the IEEE Tr.
Fuzzy Systems (2007).
233. , Geometry of relative plausibility and belief of singletons, submitted to the
Annals of Mathematics and Artificial Intelligence (2007).
234. , On the orthogonal projection of a belief function, Symbolic and Quantitative
Approaches to Reasoning with Uncertainty, Lecture Notes in Computer Science, vol.
4724/2007, Springer Berlin / Heidelberg, 2007, pp. 356–367.
235. , On the relationship between the notions of independence in matroids, lattices,
and boolean algebras, British Combinatorial Conference (BCC’07), Reading, UK,
2007.
236. , Relative plausibility, affine combination, and Dempster’s rule, Tech. report,
INRIA Rhone-Alpes, 2007.
237. F. Cuzzolin, Two new Bayesian approximations of belief functions based on convex
geometry, IEEE Transactions on Systems, Man, and Cybernetics, Part B 37 (2007),
no. 4, 993–1008.
238. F. Cuzzolin, A geometric approach to the theory of evidence, IEEE Transactions on
Systems, Man and Cybernetics part C (2007 (to appear)).
239. F. Cuzzolin, A geometric approach to the theory of evidence, IEEE Transactions on
Systems, Man, and Cybernetics, Part C: Applications and Reviews 38 (2008), no. 4,
522–534.
240. F. Cuzzolin, Alternative formulations of the theory of evidence based on basic plau-
sibility and commonality assignments, Proceedings of the Pacific Rim International
Conference on Artificial Intelligence (PRICAI’08), Hanoi, Vietnam, 2008.
241. F. Cuzzolin, Dual properties of the relative belief of singletons, PRICAI 2008: Trends
in Artificial Intelligence (2008), 78–90.
242. F. Cuzzolin, Dual properties of the relative belief of singletons, Proceedings of the
Tenth Pacific Rim Conference on Artificial Intelligence (PRICAI’08), Hanoi, Viet-
nam, December 15-19 2008, 2008.
243. , Dual properties of the relative belief of singletons, Proceedings of the Pacific
Rim International Conference on Artificial Intelligence (PRICAI’08), Hanoi, Vietnam,
2008.
244. , A geometric approach to the theory of evidence, IEEE Transactions on Sys-
tems, Man, and Cybernetics - Part C 38 (2008), no. 4, 522–534.
245. , lp consistent approximations of belief functions, IEEE Transactions on Fuzzy
Systems (under review) (2008).
246. , On the credal structure of consistent probabilities, Logics in Artificial Intel-
ligence, vol. 5293/2008, Springer Berlin / Heidelberg, 2008, pp. 126–139.
247. F. Cuzzolin, Semantics of the relative belief of singletons, Interval/Probabilistic Un-
certainty and Non-Classical Logics (2008), 201–213.
248. F. Cuzzolin, Semantics of the relative belief of singletons, International Workshop on
Uncertainty and Logic UNCLOG’08, Kanazawa, Japan, 2008.
249. , Semantics of the relative belief of singletons, Workshop on Uncertainty and
Logic, Kanazawa, Japan, March 25-28 2008, 2008.
250. , Complexes of outer consonant approximations, Proceedings of EC-
SQARU’09, Verona, Italy, 2009.
251. , Complexes of outer consonant approximations, Proceedings of EC-
SQARU’09, 2009.
252. , Credal semantics of bayesian transformations in terms of probability inter-
vals, IEEE Transactions on Systems, Man, and Cybernetics - Part B (to appear) (2009).
253. , The intersection probability and its properties, Symbolic and Quantitative
Approaches to Reasoning with Uncertainty - Lecture Notes in Artificial Intelligence,
vol. 5590/2009, Springer, Berlin / Heidelberg, 2009, pp. 287–298.
254. , Rationale and properties of the intersection probability, submitted to Artifi-
cial Intelligence Journal (2009).
255. F. Cuzzolin, Credal semantics of Bayesian transformations in terms of probability
intervals, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
40 (2010), no. 2, 421–432.
256. F. Cuzzolin, Geometric conditioning of belief functions, Proceedings of BELIEF’10,
Brest, France, 2010.
257. , The geometry of consonant belief functions: simplicial complexes of necessity
measures, Fuzzy Sets and Systems 161 (2010), no. 10, 1459–1479.
280. F. Cuzzolin and R. Frezza, Evidential modeling for pose estimation, Proceedings of
the 4th International Symposium on Imprecise Probabilities and Their Applications
(ISIPTA’05), Pittsburgh, July 2005.
281. F. Cuzzolin, R. Frezza, A. Bissacco, and S. Soatto, Towards unsupervised detection
of actions in clutter, Proc. of the 2002 Asilomar Conference on Signals, Systems, and
Computers, vol. 1, 2002, pp. 463–467.
282. F. Cuzzolin, A. Sarti, and S. Tubaro, Action modeling with volumetric data, Proc. of
ICIP’04, vol. 2, 2004, pp. 881–884.
283. Fabio Cuzzolin, Geometry of relative plausibility and relative belief of singletons,
Annals of Mathematics and Artificial Intelligence 59 (2010), 47–79.
284. Fabio Cuzzolin, Geometry of upper probabilities, Proceedings of the 3rd International Symposium on Imprecise Probabilities and Their Applications (ISIPTA’03), July
2003.
285. , Families of compatible frames of discernment as semimodular lattices, Proc.
of the International Conference of the Royal Statistical Society (RSS2000), September
2000.
286. , Families of compatible frames of discernment as semimodular lattices, Proc.
of the International Conference of the Royal Statistical Society (RSS2000), September
2000.
287. Fabio Cuzzolin, Alessandro Bissacco, Ruggero Frezza, and Stefano Soatto, Towards
unsupervised detection of actions in clutter, submitted to the International Conference
on Computer Vision (ICCV2001), June 2001.
288. Fabio Cuzzolin and Ruggero Frezza, An evidential reasoning framework for object
tracking, SPIE - Photonics East 99 - Telemanipulator and Telepresence Technologies
VI (Matthew R. Stein, ed.), vol. 3840, 19-22 September 1999, pp. 13–24.
289. , An evidential reasoning framework for object tracking, SPIE - Photonics East
99 - Telemanipulator and Telepresence Technologies VI (Matthew R. Stein, ed.), vol.
3840, 19-22 September 1999, pp. 13–24.
290. , Integrating feature spaces for object tracking, Proc. of the International Sym-
posium on the Mathematical Theory of Networks and Systems (MTNS2000), 21-25
June 2000.
291. , Integrating feature spaces for object tracking, Proc. of the International Sym-
posium on the Mathematical Theory of Networks and Systems (MTNS2000), 21-25
June 2000.
292. , Geometric analysis of belief space and conditional subspaces, Proceedings
of the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), Cornell University, Ithaca, NY, 26-29 June 2001.
293. , Geometric analysis of belief space and conditional subspaces, Proceedings
of the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), 26-29 June 2001.
294. , Lattice structure of the families of compatible frames, Proceedings of
the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), 26-29 June 2001.
295. , Lattice structure of the families of compatible frames, Proceedings of
the 2nd International Symposium on Imprecise Probabilities and their Applications
(ISIPTA2001), 26-29 June 2001.
296. , Sequences of belief functions and model-based data association, submitted to
the IAPR Workshop on Machine Vision Applications (MVA2000), November 28-30,
2000.
317. V.I. Danilov and G.A. Koshevoy, Cores of cooperative games, superdifferentials of
functions and the minkovski difference of sets, Journal of Mathematical Analysis Ap-
plications 247 (2000), 1–14.
318. N. Daucher, M. Dhome, J. Lapreste, and G. Rives, Modeled object pose estimation and
tracking by monocular vision, BMVC’93, Guildford, UK, September 1993, pp. 249–
258.
319. S.J. Davey and S.B. Colgrove, A unified probabilistic data association filter with multiple models, Tech. Report DSTO-TR-1184, Surveillance Systems Division, Electronic and Surveillance Research Lab., 2001.
320. L. de Campos, J. Huete, and S. Moral, Probability intervals: a tool for uncertain rea-
soning, Int. J. Uncertainty Fuzziness Knowledge-Based Syst. 1 (1994), 167–196.
321. , Probability intervals: a tool for uncertain reasoning, IJUFKS 1 (1994), 167–
196.
322. Gert de Cooman, A behavioural model for vague probability assessments, Fuzzy Sets
and Systems 154 (2005), no. 3, 305 – 358.
323. Gert de Cooman and D. Aeyels, A random set description of a possibility measure and
its natural extension, (1998), submitted for publication.
324. Gert de Cooman and Marco Zaffalon, Updating beliefs with incomplete observations,
Artif. Intell. 159 (2004), no. 1-2, 75–125.
325. J. Kampé de Fériet, Interpretation of membership functions of fuzzy sets in terms of
plausibility and belief, Fuzzy Information and Decision Processes (M. M. Gupta and
E. Sanchez, eds.), North-Holland, Amsterdam, 1982, pp. 93–98.
326. J. Kampé de Fériet, Interpretation of membership functions of fuzzy sets in terms of plausibility and belief, Fuzzy Information and Decision Processes (E. Sanchez and M. M. Gupta, eds.), North-Holland, Amsterdam, 1982, pp. 93–98.
327. F. Dupin de Saint Cyr, J. Lang, and N. Schiex, Penalty logic and its link with Dempster-
Shafer theory, Proceedings of UAI’94, 1994, pp. 204–211.
328. Florence Dupin de Saint-Cyr, Jérôme Lang, and Thomas Schiex, Penalty logic and its
link with Dempster-Shafer theory, CoRR abs/1302.6804 (2013).
329. Q. Delamarre and O. Faugeras, 3D articulated models and multi-view tracking with
silhouettes, Proceedings of ICCV’99, Kerkyra, Greece, vol. 2, 20-27 September 1999,
pp. 716–721.
330. , Finding pose of hand in video images: a stereo-based approach, IEEE Pro-
ceedings of the International Conference on Automatic Face and Gesture Recognition
FG’98, Japan, April 1998, pp. 585–590.
331. , 3D articulated models and multi-view tracking with physical forces, Special
Issue of Computer Vision and Image Understanding on Modeling People 81 (March
2001), 328–357.
332. F. Delmotte and D. Gacquer, Detection of defective sources with belief functions, Pro-
ceedings of IPMU08, 2008.
333. D.F. DeMenthon and L.S. Davis, Model-based object pose in 25 lines of code, Int. J.
Computer Vision 15 (June 1995), 123–141.
334. S. Demotier, W. Schon, and T. Denoeux, Risk assessment based on weak information
using belief functions: a case study in water treatment, IEEE Transactions on Systems,
Man and Cybernetics, Part C 36(3) (May 2006), 382– 396.
335. A. P. Dempster, New methods for reasoning towards posterior distributions based on
sample data, Annals of Mathematical Statistics 37 (1966), 355–374.
336. , Upper and lower probability inferences based on a sample from a finite uni-
variate population, Biometrika 54 (1967), 515–528.
337. , Bayes, Fisher, and belief functions, Bayesian and Likelihood Methods in
Statistics and Economics (S. J. Press S. Geisser, J. S. Hodges and A. Zellner, eds.),
1990.
338. , Construction and local computation aspects of network belief functions, In-
fluence Diagrams, Belief Nets and Decision Analysis (R. M. Oliver and J. Q. Smith,
eds.), Wiley, Chichester, 1990.
339. , Normal belief functions and the Kalman filter, Tech. report, Department of
Statistics, Harvard University, Cambridge, MA, 1990.
340. A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete
data via the EM algorithm, Journal of the Royal Statistical Society B 39 (1977), 1–38.
341. A.P. Dempster, Belief functions in the 21st century: A statistical perspective.
342. , Upper and lower probabilities induced by a multivariate mapping, Annals of
Mathematical Statistics 38 (1967), 325–339.
343. , A generalization of Bayesian inference, Journal of the Royal Statistical Soci-
ety, Series B 30 (1968), 205–247.
344. , Upper and lower probabilities generated by a random closed interval, Annals
of Mathematical Statistics 39 (1968), 957–966.
345. , Upper and lower probability inferences for families of hypotheses with
monotone density ratios, Annals of Mathematical Statistics 40 (1969), 953–969.
346. , A generalization of Bayesian inference, Classic Works of the Dempster-
Shafer Theory of Belief Functions, 2008, pp. 73–104.
347. , Lindley’s paradox: Comment, Journal of the American Statistical Association
77:378 (June 1982), 339–341.
348. A.P. Dempster and Augustine Kong, Uncertain evidence and artificial analysis, Tech.
report, S-108, Department of Statistics, Harvard University, 1986.
349. Arthur P. Dempster, A generalization of bayesian inference, Classic Works of the
Dempster-Shafer Theory of Belief Functions, 2008, pp. 73–104.
350. C. Van den Acker, Belief function representation of statistical audit evidence, Interna-
tional Journal of Intelligent Systems 15 (2000), 277–290.
351. Y. Deng and W.-K. Shi, A modified combination rule of evidence theory, Journal of Shanghai Jiaotong University (2003).
352. Y. Deng, Dong Wang, and Qi Li, An improved combination rule in fault diagnosis
based on dempster shafer theory, 2008 International Conference on Machine Learning
and Cybernetics, vol. 1, July 2008, pp. 212–216.
353. D. Denneberg, Conditioning (updating) non-additive probabilities, Ann. Operations
Res. 52 (1994), 21–42.
354. Dieter Denneberg, Conditioning (updating) non-additive measures, Annals of Opera-
tions Research 52 (1994), no. 1, 21–42.
355. Dieter Denneberg, Representation of the Choquet integral with the σ-additive Möbius
transform, Fuzzy Sets and Systems 92 (1997), no. 2, 139 – 156, Fuzzy Measures and
Integrals.
356. , Totally monotone core and products of monotone measures, International
Journal of Approximate Reasoning 24 (2000), 273–281.
357. Dieter Denneberg and Michel Grabisch, Interaction transform of set functions over a
finite set, Information Sciences 121 (1999), 149–170.
358. T. Denoeux, Inner and outer approximation of belief structures using a hierarchi-
cal clustering approach, Int. Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems 9(4) (2001), 437–460.
359. , The cautious rule of combination for belief functions and some extensions,
2006 9th International Conference on Information Fusion, July 2006, pp. 1–8.
360. , Construction of predictive belief functions using a frequentist approach,
IPMU, 2006.
361. , Conjunctive and disjunctive combination of belief functions induced by non
distinct bodies of evidence, Artificial Intelligence (2007).
362. , A new justification of the unnormalized dempster’s rule of combination from
the Least Commitment Principle, Proceedings of FLAIRS’08, Special Track on Un-
certaint Reasoning, 2008.
363. T. Denoeux and A. Ben Yaghlane, Approximating the combination of belief functions
using the fast moebius transform in a coarsened frame, International Journal of Ap-
proximate Reasoning 31(1-2) (October 2002), 77–101.
364. Thierry Denoeux, Modeling vague beliefs using fuzzy-valued belief structures, Fuzzy
Sets and Systems.
365. , A k-nearest neighbour classification rule based on Dempster-Shafer theory,
IEEE Transactions on Systems, Man, and Cybernetics 25:5 (1995), 804–813.
366. , Analysis of evidence-theoretic decision rules for pattern classification, Pat-
tern Recognition 30:7 (1997), 1095–1107.
367. , Reasoning with imprecise belief structures, International Journal of Approx-
imate Reasoning 20 (1999), 79–111.
368. , Reasoning with imprecise belief structures, International Journal of Approx-
imate Reasoning 20 (1999), 79–111.
369. Thierry Denœux, Allowing imprecision in belief representation using fuzzy-valued be-
lief structures, pp. 269–281, Springer US, Boston, MA, 2000.
370. Thierry Denoeux, Allowing imprecision in belief representation using fuzzy-valued
belief structures, Proceedings of IPMU’98, vol. 1, July Paris, 1998, pp. 48–55.
371. , An evidence-theoretic neural network classifier, Proceedings of the 1995
IEEE International Conference on Systems, Man, and Cybernetics (SMC’95), vol. 3,
October 1995, pp. 712–717.
372. Thierry Denoeux and A.P. Dempster, The Dempster-Shafer calculus for statisticians,
International Journal of Approximate Reasoning 48 (2008), no. 2, 365 – 377.
373. Thierry Denoeux and G. Govaert, Combined supervised and unsupervised learning
for system diagnosis using Dempster-Shafer theory, Proceedings of the International
Conference on Computational Engineering in Systems Applications, Symposium on
Control, Optimization and Supervision, CESA ’96 IMACS Multiconference, vol. 1,
Lille, France, 9-12 July 1996, pp. 104–109.
374. Thierry Denoeux, Jürg Kohlas, and Paul-André Monney, An algebraic theory for statis-
tical information based on the theory of hints, International Journal of Approximate
Reasoning 48 (2008), no. 2, 378 – 398.
375. T. Denouex, Inner and outer approximation of belief structures using a hierarchical
clustering approach, International Journal of Uncertainty, Fuzziness and Knowledge-
Based Systems 9(4) (2001), 437–460.
376. Thierry Denœux, Reasoning with imprecise belief structures, International Journal of
Approximate Reasoning 20 (1999), no. 1, 79 – 111.
377. , Conjunctive and disjunctive combination of belief functions induced by
nondistinct bodies of evidence, Artificial Intelligence 172 (2008), no. 2, 234 – 264.
378. M. C. Desmarais and J. Liu, Experimental results on user knowledge assessment with
an evidential reasoning methodology, Proceedings of the 1993 International Workshop
on Intelligent User Interfaces (W.D. Gray, W.E. Hefley, and D. Murray, eds.), Orlando,
FL, USA, 4-7 January 1993, pp. 223–225.
379. S. Destercke and T. Burger, Toward an axiomatic definition of conflict between belief
functions, IEEE Trans Cybern. 43.
380. S. Destercke and D. Dubois, Idempotent conjunctive combination of belief functions:
Extending the minimum rule of possibility theory, Information Sciences 181 (2011),
no. 18, 3925 – 3945.
381. S. Destercke, D. Dubois, and E. Chojnacki, Unifying Practical Uncertainty Represen-
tations: I. Generalized P-Boxes, ArXiv e-prints (2008).
382. Sébastien Destercke and Thomas Burger, Revisiting the notion of conflicting belief
functions, pp. 153–160, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
383. Sébastien Destercke and Didier Dubois, Can the minimum rule of possibility theory
be extended to belief functions?, pp. 299–310, Springer Berlin Heidelberg, Berlin,
Heidelberg, 2009.
384. Sebastien Destercke, Didier Dubois, and Eric Chojnacki, Cautious conjunctive merg-
ing of belief functions, pp. 332–343, Springer Berlin Heidelberg, Berlin, Heidelberg,
2007.
385. M. Deutsch-Mccleish, A model for non-monotonic reasoning using Dempster’s rule,
Uncertainty in Artificial Intelligence 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and
J.F. Lemmer, eds.), Elsevier Science Publishers, 1991, pp. 481–494.
386. M. Deutsch-McLeish, A study of probabilities and belief functions under conflicting
evidence: comparisons and new method, Proceedings of the 3rd International Confer-
ence on Information Processing and Management of Uncertainty in Knowledge-Based
Systems (IPMU’90) (B. Bouchon-Meunier, R.R. Yager, and L.A. Zadeh, eds.), Paris,
France, 2-6 July 1990, pp. 41–49.
387. M. Deutsch-McLeish, P. Yao, Fei Song, and T. Stirtzinger, Knowledge-acquisition
methods for finding belief functions with an application to medical decision mak-
ing, Proceedings of the International Symposium on Artificial Intelligence (H. Cantu-
Ortiz, F.J.; Terashima-Marin, ed.), Cancun, Mexico, 13-15 November 1991, pp. 231–
237.
388. Mary Deutsch-McLeish, A study of probabilities and belief functions under conflict-
ing evidence: Comparisons and new methods, pp. 41–49, Springer Berlin Heidelberg,
Berlin, Heidelberg, 1991.
389. J. Deutscher, A. Blake, and I. Reid, Articulated body motion capture by annealed par-
ticle filtering, Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition CVPR’00, Hilton Head Island, SC, USA, July 2000, pp. 126–133.
390. J. Deutscher, A. Davidson, and I. Reid, Automatic partitioning of high dimensional
search spaces associated with articulated body motion capture, Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition CVPR’01, Hawaii,
December 2001.
391. J. Dezert and F. Smarandache, A new probabilistic transformation of belief mass as-
signment, 2007.
392. J. Dezert, F. Smarandache, and M. Daniel, The generalized pignistic transformation,
Arxiv preprint cs/0409007 (2004).
393. J. Dezert, P. Wang, and A. Tchamova, On the validity of dempster-shafer theory, Infor-
mation Fusion (FUSION), 2012 15th International Conference on, July 2012, pp. 655–
660.
394. Jean Dezert, Foundations for a new theory of plausible and paradoxical reasoning, In-
formation and Security (Tzv. Semerdjiev, ed.), Bulgarian Academy of Sciences, 2002.
395. Jean Dezert, An introduction to the theory of plausible and paradoxical reasoning,
pp. 12–23, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
396. Jean Dezert and Florentin Smarandache, On the generation of hyper-powersets for the
DSmT, Proceedings of FUSION 2003, 2003, pp. 8–11.
397. , An introduction to DSm theory of plausible, paradoxist, uncertain, and im-
precise reasoning for information fusion, Proc. of the 13th International Congress of
Cybernetics and Systems, 2005.
398. P. Diaconis, Review of ’A Mathematical Theory of Evidence’, Journal of the American Statistical Association 73 (1978), no. 363, 677–678.
399. J. Diaz, M. Rifqi, and B. Bouchon-Meunier, A similarity measure between basic belief
assignments, Proceedings of FUSION’06, 2006.
400. S. Dickinson and D. Metaxas, Integrating qualitative and quantitative shape recovery,
Int. J. Computer Vision 13 (1994), 1–20.
401. S. Dickinson, A. Pentland, and A. Rosenfeld, 3-d shape recovery using distributed
aspect matching, IEEE Trans. PAMI 14 (1992), 174–198.
402. S.J. Dickinson and D. Metaxas, Integrating qualitative and quantitative object repre-
sentations in the recovery and tracking of 3-d shape, Computational and Psychophys-
ical Mechanism of Visual Coding (L. Harris and M. Jenkin, eds.), Cambridge Univer-
sity Press, New York, NY.
403. R. P. Dilworth, Dependence relations in a semimodular lattice, Duke Math. J. 11
(1944), 575–587.
404. A. F. Dragoni, P. Giorgini, and A. Bolognini, Distributed knowledge elicitation
through the Dempster-Shafer theory of evidence: a simulation study, Proceedings of
the Second International Conference on Multi-Agent Systems (ICMAS’96), Kyoto,
Japan, 10-13 December 1996, p. 433.
405. T. Drummond and R. Cipolla, Real-time tracking of complex structures with on-line
camera calibration, Proc. of BMVC’99, Nottingham, 1999, pp. 574–583.
406. , Real-time tracking of multiple articulated structures in multiple views,
ECCV’00, Dublin, Ireland, 2000.
407. I. Dryden and K.V. Mardia, General shape distributions in a plane, Adv. Appl. Prob.
23 (1991), 259–276.
408. Werner Dubitzky, Alex G. Büchner, John G. Hughes, and David A. Bell, Towards
concept-oriented databases, Data and Knowledge Engineering 30 (1999), 23–55.
409. D. Dubois and H. Prade, Unfair coins and necessity measures: towards a possibilistic
interpretation of histograms, Fuzzy Sets and Systems 10 (1983), no. 1, 15–20.
410. , A set-theoretic view of belief functions: Logical operations and approxima-
tions by fuzzy sets, International Journal of General Systems 12 (1986), 193–226.
411. , Possibility theory, Plenum Press, New York, 1988.
412. , Consonant approximations of belief functions, International Journal of Ap-
proximate Reasoning 4 (1990), 419–449.
413. , On the combination of evidence in various mathematical frameworks, Relia-
bility Data Collection and Analysis (J. Flamm and T. Luisi, eds.), 1992, pp. 213–241.
414. D. Dubois, H. Prade, and S. Sandri, On possibility/probability transformations, 1993.
415. D. Dubois, H. Prade, and S.A. Sandri, On possibility-probability transformations,
Fuzzy Logic: State of the Art (R. Lowen and M. Lowen, eds.), Kluwer Academic
Publisher, 1993, pp. 103–112.
416. D. Dubois, H. Prade, and Ph. Smets, New semantics for quantitative possibility theory,
Proc. of the 6th European Conference on Symbolic and Quantitative Approaches to
Reasoning and Uncertainty (ECSQARU 2001) (Toulouse, France) (S. Benferhat and
Ph. Besnard, eds.), Springer-Verlag, 2001, pp. 410–421.
417. Didier Dubois, Hélène Fargier, and Henri Prade, Comparative uncertainty, belief func-
tions and accepted beliefs, Proceedings of the Fourteenth Conference on Uncertainty
in Artificial Intelligence (San Francisco, CA, USA), UAI’98, Morgan Kaufmann Pub-
lishers Inc., 1998, pp. 113–120.
418. Didier Dubois, M. Grabisch, Henri Prade, and Philippe Smets, Using the transferable
belief model and a qualitative possibility theory approach on an illustrative example:
the assessment of the value of a candidate, Intern. J. Intell. Systems (2001).
419. Didier Dubois, Michel Grabisch, Henri Prade, and Philippe Smets, Assessing the value
of a candidate: Comparing belief function and possibility theories, Proceedings of the
Fifteenth Conference on Uncertainty in Artificial Intelligence (San Francisco, CA,
USA), UAI’99, Morgan Kaufmann Publishers Inc., 1999, pp. 170–177.
420. Didier Dubois and Henri Prade, On several representations of an uncertain body of
evidence, Fuzzy Information and Decision Processes (M. M. Gupta and E. Sanchez,
eds.), North Holland, Amsterdam, 1982, pp. 167–181.
421. Didier Dubois and Henri Prade, Combination and propagation of uncertainty with be-
lief functions: A reexamination, Proceedings of the 9th International Joint Conference
on Artificial Intelligence - Volume 1 (San Francisco, CA, USA), IJCAI’85, Morgan
Kaufmann Publishers Inc., 1985, pp. 111–113.
422. , On the unicity of Dempster’s rule of combination, International Journal of In-
telligent Systems 1 (1986), no. 2, 133–142.
423. Didier Dubois and Henri Prade, On the unicity of Dempster’s rule of combination,
International Journal of Intelligent Systems 1 (1986), 133–142.
424. , A set theoretical view of belief functions, International Journal of Intelligent
Systems 12 (1986), 193–226.
425. , The mean value of a fuzzy number, Fuzzy Sets and Systems 24 (1987), 279–
300.
426. , The principle of minimum specificity as a basis for evidential reasoning, Un-
certainty in Knowledge-Based Systems (B. Bouchon and R. R. Yager, eds.), Springer-
Verlag, Berlin, 1987, pp. 75–84.
427. , Properties of measures of information in evidence and possibility theories,
Fuzzy Sets and Systems 24 (1987), 161–182.
428. Didier Dubois and Henri Prade, A tentative comparison of numerical approximate
reasoning methodologies, Int. J. Man-Mach. Stud. 27 (1987), no. 5-6, 717–728.
429. Didier Dubois and Henri Prade, Representation and combination of uncertainty with
belief functions and possibility measures, Computational Intelligence 4 (1988), 244–
264.
430. , Modeling uncertain and vague knowledge in possibility and evidence theo-
ries, Uncertainty in Artificial Intelligence, volume 4 (R. D. Shachter, T. S. Levitt, L. N.
Kanal, and J. F. Lemmer, eds.), North-Holland, 1990, pp. 303–318.
431. , Epistemic entrenchment and possibilistic logic, Artificial Intelligence 50
(1991), 223–239.
432. , Focusing versus updating in belief function theory, Tech. report, Internal Re-
port IRIT/91-94/R, IRIT, Universite P. Sabatier, Toulouse, France, 1991.
433. , Evidence, knowledge, and belief functions, International Journal of Approxi-
mate Reasoning 6 (1992), 295–319.
434. , Evidence, knowledge, and belief functions, International Journal of Approxi-
mate Reasoning 6 (1992), no. 3, 295 – 319.
435. , A survey of belief revision and updating rules in various uncertainty models,
International Journal of Intelligent Systems 9 (1994), 61–100.
458. Z. Elouedi, K. Mellouli, and P. Smets, Assessing sensor reliability for multisensor data
fusion within the transferable belief model, IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics) 34 (2004), no. 1, 782–787.
459. Z. Elouedi, K. Mellouli, and Philippe Smets, Decision trees using belief function the-
ory, Proceedings of the Eighth International Conference IPMU: Information Process-
ing and Management of Uncertainty in Knowledge-based Systems, vol. 1, Madrid,
2000, pp. 141–148.
460. , Classification with belief decision trees, Proceedings of the Ninth Inter-
national Conference on Artificial Intelligence: Methodology, Systems, Architectures:
AIMSA 2000, Varna, Bulgaria, 2000.
461. I. A. Essa and A. O. Pentland, Facial expression recognition using a dynamic model
and motion energy, Proc. of the 5th Conference on Computer Vision, 1995, pp. 360–
367.
462. G. Donato et al., Classifying facial actions, IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 21(10), October 1999, pp. 974–989.
463. F. Du, W. Shi, and Y. Deng, Feature extraction of evidence and its application in modification of evidence theory, Journal of Shanghai Jiaotong University (2004), 164–168.
464. R. Fagin and Joseph Y. Halpern, A new approach to updating beliefs, Uncertainty in
Artificial Intelligence, 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and J.F. Lemmer,
eds.), 1991, pp. 347–374.
465. R. Fagin, Joseph Y. Halpern, and Nimrod Megiddo, A logic for reasoning about prob-
abilities, Inf. Comput. 87 (1990), no. 1-2, 78–128.
466. R. Fagin and J.Y. Halpern, Uncertainty, belief and probability, Proc. Intl. Joint Conf.
in AI (IJCAI-89), 1989, pp. 1161–1167.
467. , Uncertainty, belief, and probability, Proc. of AAAI’89, 1989, pp. 1161–1167.
468. Xianfeng Fan and Ming J. Zuo, Fault diagnosis of machines based on DS evidence
theory. Part 1: DS evidence theory and its improvement, Pattern Recognition Letters
27 (2006), no. 5, 366 – 376.
469. Tao Feng, Shao-Pu Zhang, and Ju-Sheng Mi, The reduction and fusion of fuzzy cover-
ing systems based on the evidence theory, International Journal of Approximate Rea-
soning 53 (2012), no. 1, 87 – 103.
470. Juan M. Fernández-Luna, Juan F. Huete, Benjamin Piwowarski, Abdelaziz Kallel, and
Sylvie Le Hégarat-Mascle, Combination of partially non-distinct beliefs: The cautious-
adaptive rule, International Journal of Approximate Reasoning 50 (2009), no. 7, 1000
– 1021.
471. C. Ferrari and G. Chemello, Coupling fuzzy logic techniques with evidential reason-
ing for sensor data interpretation, Proceedings of Intelligent Autonomous Systems 2
(T. Kanade, F.C.A. Groen, and L.O. Hertzberger, eds.), vol. 2, Amsterdam, Nether-
lands, 11-14 December 1989, pp. 965–971.
472. Scott Ferson, Roger B. Nelsen, Janos Hajagos, Daniel J. Berleant, and Jianzhong
Zhang, Dependence in probabilistic modeling, Dempster-Shafer theory, and probability bounds analysis, Sandia National Laboratories Report SAND2004-3072 (2004), 1–151.
473. A. Filippidis, Fuzzy and Dempster-Shafer evidential reasoning fusion methods for de-
riving action from surveillance observations, Proceedings of the Third International
Conference on Knowledge-Based Intelligent Information Engineering Systems, Ade-
laide, September 1999, pp. 121–124.
474. , A comparison of fuzzy and Dempster-Shafer evidential reasoning fusion
methods for deriving course of action from surveillance observations, International
Journal of Knowledge-Based Intelligent Engineering Systems 3:4 (October 1999),
215–222.
497. G. Xin, Y. Xiao, and H. You, An improved Dempster-Shafer algorithm for resolving the
conflicting evidences, International Journal of Information Technology 11 (2005).
498. Zhun-ga Liu, Jean Dezert, Quan Pan, and Grégoire Mercier, Combination of sources
of evidence with different discounting factors based on a new dissimilarity measure,
Decision Support Systems 52 (2011), no. 1, 133 – 141.
499. Haim Gaifman, Causation, Chance and Credence: Proceedings of the Irvine Conference
on Probability and Causation, Volume 1, ch. A Theory of Higher Order Probabilities,
pp. 191–219, Springer Netherlands, Dordrecht, 1988.
500. Fabio Gambino, Giovanni Ulivi, and Marilena Vendittelli, The transferable belief
model in ultrasonic map building, Proceedings of IEEE, 1997, pp. 601–608.
501. H. Garcia-Compeán, J.M. López-Romero, M.A. Rodriguez-Segura, and M. So-
colovsky, Principal bundles, connections and BRST cohomology, Tech. report, Los
Alamos National Laboratory, hep-th/9408003, July 1994.
502. P. Gardenfors, Knowledge in flux: Modeling the dynamics of epistemic states, MIT
Press, Cambridge, MA, 1988.
503. Thomas D. Garvey, John D. Lowrance, and Martin A. Fischler, An inference technique
for integrating knowledge from disparate sources, Proceedings of the 7th International
Joint Conference on Artificial Intelligence - Volume 1 (San Francisco, CA, USA),
IJCAI’81, Morgan Kaufmann Publishers Inc., 1981, pp. 319–325.
504. W. L. Gau and D. J. Buehrer, Vague sets, IEEE Transactions on Systems, Man, and
Cybernetics 23 (1993), no. 2, 610–614.
505. D. M. Gavrila, The visual analysis of human movement: A survey, Computer Vision
and Image Understanding, vol. 73, 1999, pp. 82–98.
506. D. M. Gavrila and L. S. Davis, 3D model-based tracking of humans in action: A
multi-view approach, Proceedings of CVPR’96, San Francisco, CA, 18-20 June 1996,
pp. 73–80.
507. , Towards 3D model-based tracking and recognition of human movement:
A multi-view approach, International Workshop on Face and Gesture Recognition,
Zurich, 1995.
508. D.M. Gavrila, The visual analysis of human movement: a survey, Computer Vision
and Image Understanding 73 (1999), 82–98.
509. Jörg Gebhardt and Rudolf Kruse, The context model: An integrating view of vagueness
and uncertainty, International Journal of Approximate Reasoning 9 (1993), no. 3, 283
– 314.
510. W. Genxiu, Belief function combination and local conflict management, Computer
Engineering and Applications 40.
511. Geok See Ng and Harcharan Singh, Data equalisation with evidence combination for
pattern recognition, Pattern Recognition Letters 19 (1998), 227–235.
512. T. George and N.R. Pal, Quantification of conflict in Dempster-Shafer framework: a
new approach, International Journal of General Systems 24.
513. Janos J. Gertler and Kenneth C. Anderson, An evidential reasoning extension to quan-
titative model-based failure diagnosis, IEEE Transactions on Systems, Man, and Cy-
bernetics 22:2 (March/April 1992), 275–289.
514. G. Giacinto, R. Paolucci, and F. Roli, Application of neural networks and statisti-
cal pattern recognition algorithms to earthquake risk evaluation, Pattern Recognition
Letters 18 (1997), 1353–1362.
515. M. A. Giese and T. Poggio, Morphable models for the analysis and synthesis of com-
plex motion patterns, International Journal of Computer Vision, vol. 38(1), 2000,
pp. 1264–1274.
516. I. Gilboa and D. Schmeidler, Updating ambiguous beliefs, Journal of economic theory
59 (1993), 33–49.
517. , Additive representations of non-additive measures and the Choquet integral,
Annals of Operations Research 52 (1994), no. 1, 43–65.
518. Itzhak Gilboa and David Schmeidler, Updating ambiguous beliefs, Journal of Eco-
nomic Theory 59 (1993), no. 1, 33 – 49.
519. Itzhak Gilboa and David Schmeidler, Additive representations of non-additive mea-
sures and the Choquet integral, Annals of Operations Research 52 (1994), no. 1, 43–
65.
520. R. Giles, Foundations for a theory of possibility, Fuzzy Information and Decision
Processes (1982), 183–195.
521. Peter R. Gillett, Monetary unit sampling: a belief-function implementation for au-
dit and accounting applications, International Journal of Approximate Reasoning 25
(2000), 43–70.
522. M. L. Ginsberg, Non-monotonic reasoning using Dempster’s rule, Proc. 3rd National
Conference on AI (AAAI-84), 1984, pp. 126–129.
523. Lluís Godo, Petr Hájek, and Francesc Esteva, A fuzzy modal logic for belief functions,
Fundam. Inf. 57 (2003), no. 2-4, 127–146.
524. M. Goldszmidt and J. Pearl, Default ranking: A practical framework for evidential
reasoning, belief revision and update, In Proceedings of the 3rd International Confer-
ence on Knowledge Representation and Reasoning, 1992, pp. 661–672.
525. Forouzan Golshani, Enrique Cortes-Rello, and Thomas H. Howell, Dynamic route
planning with uncertain information, Knowledge-based Systems 9 (1996), 223–232.
526. I. J. Good, Subjective probability as the measure of a non-measurable set, Logic,
Methodology, and Philosophy of Science (P. Suppes E. Nagel and A. Tarski, eds.),
Stanford Univ. Press, 1962, pp. 319–329.
527. I. R. Goodman, Fuzzy sets as equivalence classes of random sets, Recent Develop-
ments in Fuzzy Sets and Possibility Theory, 1982, pp. 327–343.
528. I. R. Goodman and Hung T. Nguyen, Uncertainty models for knowledge-based sys-
tems, North Holland, New York, 1985.
529. J. Gordon and E. H. Shortliffe, Readings in uncertain reasoning, Morgan Kaufmann
Publishers Inc., San Francisco, CA, USA, 1990, pp. 529–539.
530. J. Gordon and E. H. Shortliffe, A method for managing evidential reasoning in a hi-
erarchical hypothesis space: a retrospective, Artificial Intelligence 59:1-2 (February
1993), 43–47.
531. J. Gordon and Edward H. Shortliffe, A method for managing evidential reasoning in
hierarchical hypothesis spaces, Artificial Intelligence 26 (1985), 323–358.
532. Jean Gordon and Edward H. Shortliffe, A method for managing evidential reasoning
in a hierarchical hypothesis space, Artificial Intelligence 26 (1985), 323–357.
533. Jean Goubault-Larrecq, Automata, Languages and Programming: 34th International
Colloquium, ICALP 2007, Wrocław, Poland, July 9-13, 2007, Proceedings, ch. Continu-
ous Capacities on Continuous State Spaces, pp. 764–776, Springer Berlin Heidelberg,
Berlin, Heidelberg, 2007.
534. John Goutsias, Modeling random shapes: an introduction to random closed set the-
ory, Tech. report, Department of Electrical and Computer Engineering, Johns Hopkins
University, Baltimore, JHU/ECE 90-12, April 1998.
535. John Goutsias, Ronald P.S. Mahler, and Hung T. Nguyen, Random sets: theory and
applications (IMA Volumes in Mathematics and Its Applications, Vol. 97), Springer-
Verlag, December 1997.
536. M. Grabisch, K-order additive discrete fuzzy measures and their representation, Fuzzy
sets and systems 92 (1997), 167–189.
537. , The Moebius transform on symmetric ordered structures and its application
to capacities on finite sets, Discrete Mathematics 287 (1-3) (2004), 17–34.
538. , Belief functions on lattices, Int. J. of Intelligent Systems (2006).
539. M. Grabisch, T. Murofushi, and M. Sugeno, Fuzzy measures and integrals: theory and
applications, New York: Springer, 2000.
540. Michel Grabisch, Belief functions on lattices, CoRR abs/0811.3373 (2008).
541. Michel Grabisch, Hung T. Nguyen, and Elbert A. Walker, Fundamentals of uncer-
tainty calculi with applications to fuzzy inference, Kluwer Academic Publishers, 1995.
542. M. Grabisch, Belief functions on lattices, Int. J. of Intelligent Systems (2009), 1–20.
543. Siegfried Graf, A Radon-Nikodym theorem for capacities, Journal für die reine und
angewandte Mathematik 320 (1980), 192–214.
544. K. Grauman, G. Shakhnarovich, and T.J. Darrell, Inferring 3d structure with a statis-
tical image-based shape model, 2003, pp. 641–648.
545. Frank J. Groen and Ali Mosleh, Foundations of probabilistic inference with uncertain
evidence, International Journal of Approximate Reasoning 39 (2005), no. 1, 49 – 83.
546. E. Grosicki, M. Carre, J.M. Brodin, and E. Geoffrois, Results of the RIMES evaluation
campaign for handwritten mail processing, International Conference on Document
Analysis and Recognition 0 (2009), 941–945.
547. Benjamin N. Grosof, Evidential confirmation as transformed probability, CoRR
abs/1304.3439 (2013).
548. , An inequality paradigm for probabilistic knowledge, CoRR abs/1304.3418
(2013).
549. J. Guan, D. A. Bell, and V. R. Lesser, Evidential reasoning and rule strengths in expert
systems, Proceedings of AI and Cognitive Science ’90 (M.F. McTear and N. Creaney,
eds.), Ulster, UK, 20-21 September 1990, pp. 378–390.
550. J. W. Guan and D. A. Bell, Generalizing the Dempster-Shafer rule of combination to Boolean algebras, Proceedings of the IEEE International Conference on Developing and Managing Intelligent System Projects, March 1993, pp. 229–236.
551. , The Dempster-Shafer theory on Boolean algebras, Chinese Journal of Ad-
vanced Software Research 3:4 (November 1996), 313–343.
552. , Evidential reasoning in intelligent system technologies, Proceedings of
the Second Singapore International Conference on Intelligent Systems (SPICIS’94),
vol. 1, Singapore, 14-17 November 1994, pp. 262–267.
553. , A linear time algorithm for evidential reasoning in knowledge base systems,
Proceedings of the Third International Conference on Automation, Robotics and Com-
puter Vision (ICARCV ’94), vol. 2, Singapore, 9-11 November 1994, pp. 836–840.
554. J. W. Guan, D. A. Bell, and Z. Guan, Evidential reasoning in expert systems: computa-
tional methods, Proceedings of the Seventh International Conference on Industrial and
Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-94)
(F.D. Anger, R.V. Rodriguez, and M. Ali, eds.), Austin, TX, USA, 31 May - 3 June
1994, pp. 657–666.
555. Jiwen Guan and David A. Bell, Discounting and combination operations in evidential
reasoning, CoRR abs/1303.1511 (2013).
556. Jiwen Guan, David A. Bell, and Victor R. Lesser, AI and Cognitive Science ’90: Uni-
versity of Ulster at Jordanstown, 20–21 September 1990, ch. Evidential Reasoning and
Rule Strengths in Expert Systems, pp. 378–390, Springer London, London, 1991.
557. Jiwen Guan, Jasmina Pavlin, and Victor R. Lesser, Combining evidence in the extended
Dempster-Shafer theory, pp. 163–178, Springer London, London, 1990.
558. J.W. Guan and D.A. Bell, Approximate reasoning and evidence theory, Information
Sciences 96 (1997), no. 3, 207 – 235.
559. Zhang Guang-Quan, Semi-lattice structure of all extensions of the possibility measure
and the consonant belief function on the fuzzy set, Fuzzy Sets and Systems 43 (1991),
no. 2, 183 – 188.
560. M. Guironnet, D. Pellerin, and Michèle Rombaut, Camera motion classification
based on the transferable belief model, Proceedings of EUSIPCO’06, Florence, Italy,
2006.
561. GUO Hua-wei, SHI Wen-kang, DENG Yong, and CHEN Zhi-jun, Evidential conflict
and its 3D strategy: discard, discover and disassemble?, Systems Engineering and
Electronics 6 (2007).
562. Michael A. S. Guth, Uncertainty analysis of rule-based expert systems with Dempster-
Shafer mass assignments, International Journal of Intelligent Systems 3 (1988), no. 2,
123–139.
563. H. Guo, W. Shi, Q. Liu, and Y. Deng, A new combination rule of evidence, Journal of
Shanghai Jiaotong University 40 (2006), no. 11, 1895–1900.
564. H. Moon, R. Chellappa, and A. Rosenfeld, 3D object tracking using shape-encoded par-
ticle propagation, Proceeding of the Eighth IEEE International Conference on Com-
puter Vision (ICCV’01), Vancouver, Canada, July 9-12, 2001, pp. 307–314.
565. H. Tao, H.S. Sawhney, and R. Kumar, Object tracking with Bayesian estimation of
dynamic layer representation, IEEE Transactions on PAMI 24 (January 2002), 75–89.
566. V. Ha and P. Haddawy, Geometric foundations for interval-based probabilities,
KR’98: Principles of Knowledge Representation and Reasoning (Anthony G. Cohn,
Lenhart Schubert, and Stuart C. Shapiro, eds.), San Francisco, California, 1998,
pp. 582–593.
567. , Theoretical foundations for abstraction-based probabilistic planning, Proc.
of the 12th Conference on Uncertainty in Artificial Intelligence, August 1996,
pp. 291–298.
568. M. Ha-Duong, Hierarchical fusion of expert opinion in the transferable belief model,
application on climate sensitivity, Working Papers halshs-00112129-v3, HAL, 2006.
569. Peter Haddawy, A variable precision logic inference system employing the Dempster-
Shafer uncertainty calculus, PhD dissertation, University of Illinois at Urbana-
Champaign.
570. R. Haenni, Are alternatives to Dempster’s rule of combination real alternatives?: Com-
ments on “About the belief function combination and the conflict management”, Infor-
mation Fusion 3 (2002), 237–239.
571. , Shedding new light on Zadeh’s criticism of Dempster’s rule of combination, 2005 7th International Conference on Information Fusion, vol. 2, July 2005, 6 pp.
572. , Towards a unifying theory of logical and probabilistic reasoning, Proceed-
ings of ISIPTA’05, 2005.
573. , Aggregating referee scores: an algebraic approach, COMSOC’08, 2nd In-
ternational Workshop on Computational Social Choice (U. Endriss and W. Goldberg,
eds.), 2008, pp. 277–288.
574. R. Haenni and N. Lehmann, Resource bounded and anytime approximation of belief
function computations, International Journal of Approximate Reasoning 31(1-2) (Oc-
tober 2002), 103–154.
575. R. Haenni, J.W. Romeijn, G. Wheeler, and J. Williamson, Possible semantics for a
common framework of probabilistic logics, UncLog’08, International Workshop on Interval/Probabilistic Uncertainty and Non-Classical Logics, 2008.
637. A. Hunter and W. Liu, Fusion rules for merging uncertain information, Information
Fusion 7(1) (2006), 97–134.
638. D. Hunter, Dempster-Shafer versus probabilistic logic, Proceedings of the Third
AAAI Uncertainty in Artificial Intelligence Workshop, 1987, pp. 22–29.
639. Daniel Hunter, Dempster-Shafer vs. probabilistic logic, CoRR abs/1304.2713 (2013).
640. E. Hunter, Visual estimation of articulated motion using the expectation-constrained
maximization algorithm, PhD dissertation, University of California at San Diego, Oc-
tober 1999.
641. E.A. Hunter, P.H. Kelly, and R.C. Jain, Estimation of articulated motion using kine-
matically constrained mixture densities, Workshop on Motion of Non-Rigid and Ar-
ticulated Objects, Puerto Rico, USA, 1997.
642. V.-N. Huynh, Y. Nakamori, H. Ono, J. Lawry, V. Kreinovich, and H.T. Nguyen (eds.),
Interval / probabilistic uncertainty and non-classical logics, Springer, 2008.
643. I. Iancu, Prosum-prolog system for uncertainty management, International Journal of
Intelligent Systems 12 (1997), 615–627.
644. Laurie Webster II, Jen-Gwo Chen, Simon S. Tan, Carolyn Watson, and André de Ko-
rvin, Validation of authentic reasoning expert systems, Information Sciences 117
(1999), 19–46.
645. S. S. Intille and A. F. Bobick, Visual recognition of multi agent action using binary
temporal relations, Proc. of the Conf. on Computer Vision and Pattern Recognition,
vol. 1, 1999, pp. 56–62.
646. Horace H. S. Ip and Richard C. K. Chiu, Evidential reasoning for facial gesture recog-
nition from cartoon images, Proceedings of IEEE, 1994, pp. 397–401.
647. Horace H. S. Ip and Hon-Ming Wong, Evidential reasoning in foreign exchange rates
forecasting, Proceedings of IEEE, 1991, pp. 152–159.
648. Michael Isard and Andrew Blake, Contour tracking by stochastic propagation of
conditional density, Proceedings of the European Conference of Computer Vision
(ECCV96), 1996, pp. 343–356.
649. Mitsuru Ishizuka, Inference methods based on extended Dempster & Shafer’s theory
for problems with uncertainty/fuzziness, New Generation Computing 1 (1983), no. 2,
159–168.
650. Mitsuru Ishizuka, K.S. Fu, and James T.P. Yao, Inference procedures under uncer-
tainty for the problem-reduction method, Information Sciences 28 (1982), no. 3, 179
– 206.
651. M. Itoh and T. Inagaki, A new conditioning rule for belief updating in the Dempster-
Shafer theory of evidence, Transactions of the Society of Instrument and Control En-
gineers 31:12 (December 1995), 2011–2017.
652. Y. A. Ivanov and A. F. Bobick, Recognition of visual activities and interactions by
stochastic parsing, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.
22(8), 2000, pp. 852–872.
653. Y. Iwai, K. Ogaki, and M. Yachida, Posture estimation using structure and motion
models, ICCV’99, Corfu, Greece, September 1999.
654. S. Iwasawa, J. Ohya, K. Takahashi, T. Sakaguchi, S. Kawato, K. Ebihara, and S. Mor-
ishima, Real-time estimation of human body posture from trinocular images, Interna-
tional Workshop on Modeling People at ICCV’99, Corfu, Greece, September 1999.
655. J. Deutscher, B. North, B. Bascle, and A. Blake, Tracking through singularities and
discontinuities by random sampling, Proceedings of ICCV’99, 1999, pp. 1144–1149.
656. J. Ma, W. Liu, D. Dubois, and H. Prade, Revision rules in the theory of evidence, Pro-
ceedings of ICTAI 2010, vol. 1, 2010, pp. 295–302.
657. Nathan Jacobson, Basic algebra I, Freeman and Company, New York, 1985.
658. J. Y. Jaffray, Application of linear utility theory for belief functions, Uncertainty and
Intelligent Systems, Springer-Verlag, Berlin, 1988, pp. 1–8.
659. , Coherent bets under partially resolving uncertainty and belief functions, The-
ory and Decision 26 (1989), 99–105.
660. , Linear utility theory for belief functions, Operation Research Letters 8
(1989), 107–112.
661. , Bayesian updating and belief functions, IEEE Transactions on Systems, Man
and Cybernetics 22 (1992), 1144–1152.
662. J. Y. Jaffray and P. P. Wakker, Decision making with belief functions: compatibility
and incompatibility with the sure-thing principle, Journal of Risk and Uncertainty 8
(1994), 255–271.
663. Jean-Yves Jaffray, On the maximum-entropy probability which is consis-
tent with a convex capacity, International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems 03 (1995), no. 01, 27–33.
664. J.Y. Jaffray, Dynamic decision making with belief functions, Advances in the
Dempster-Shafer Theory of Evidence (R.R. Yager, M. Fedrizzi, and J. Kacprzyk, eds.),
Wiley, New York, 1994, pp. 331–352.
665. F. Janez, Fusion de sources d’information définies sur des référentiels non exhaustifs
différents. Solutions proposées sous le formalisme de la théorie de l’évidence, PhD
dissertation, University of Angers, France.
666. Fabrice Janez and Alain Appriou, Theory of evidence and non-exhaustive frames of
discernment: Plausibilities correction methods, International Journal of Approximate
Reasoning 18 (1998), no. 1, 1 – 19.
667. R. Jeffrey, Conditioning, kinematics, and exchangeability, Causation, chance, and cre-
dence 1 (1988), 221–255.
668. R.C. Jeffrey, The logic of decision, McGraw-Hill, 1965.
669. W. Jiang, A. Zhang, and Q. Yang, A new method to determine evidence discounting
coefficient, Lecture Notes in Computer Science, vol. 5226/2008, 2008, pp. 882–887.
670. Jianping Yang, Hong-Zhong Huang, Qiang Miao, and Rui Sun, A novel information
fusion method based on Dempster-Shafer evidence theory for conflict resolution, Intel-
ligent Data Analysis 15.
671. N. Jojic, J. Gu, H.C. Shen, and T. Huang, 3-d reconstruction of multipart self-
occluding objects, Asian Conference on Computer Vision, 1998.
672. A. Josang, M. Daniel, and P. Vannoorenberghe, Strategies for combining conflicting
dogmatic beliefs, Proceedings of Fusion 2003, vol. 2, 2003, pp. 1133–1140.
673. , Strategies for combining conflicting dogmatic beliefs, Information Fusion,
2003. Proceedings of the Sixth International Conference of, vol. 2, July 2003,
pp. 1133–1140.
674. Audun Jøsang, A logic for uncertain probabilities, Int. J. Uncertain. Fuzziness Knowl.-
Based Syst. 9 (2001), no. 3, 279–311.
675. Audun Jøsang and Zied Elouedi, Symbolic and Quantitative Approaches to Reasoning
with Uncertainty: 9th European Conference, ECSQARU 2007, Hammamet, Tunisia, October
31 - November 2, 2007, Proceedings, ch. Interpreting Belief Functions as Dirichlet
Distributions, pp. 393–404, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
676. Audun Josang and Simon Pope, Dempster’s rule as seen by little colored balls, Com-
put. Intell. 28 (2012), no. 4, 453–474.
677. Audun Josang, Simon Pope, and David McAnally, Normalising the consensus opera-
tor for belief fusion, IPMU, 2006.
719. Frank Klawonn and Philippe Smets, The dynamic of belief in the transferable belief
model and specialization-generalization matrices, Proceedings of the Eighth Interna-
tional Conference on Uncertainty in Artificial Intelligence (San Francisco, CA, USA),
UAI’92, Morgan Kaufmann Publishers Inc., 1992, pp. 130–137.
720. J. Klein and O. Colot, Automatic discounting rate computation using a dissent crite-
rion, Workshop on the theory of belief functions (BELIEF 2010), 2010, pp. 1–6.
721. John Klein, Christèle Lecomte, and Pierre Miché, Hierarchical and conditional combi-
nation of belief functions induced by visual tracking, International Journal of Approx-
imate Reasoning 51 (2010), no. 4, 410 – 428.
722. G. J. Klir, Measures of uncertainty in the Dempster-Shafer theory of evidence, Advances in the Dempster-Shafer Theory of Evidence (R. R. Yager, M. Fedrizzi, and J. Kacprzyk, eds.), Wiley, New York, 1994, pp. 35–49.
723. G. J. Klir and T. A. Folger, Fuzzy sets, uncertainty and information, Prentice Hall,
Englewood Cliffs (NJ), 1988.
724. G. J. Klir and A. Ramer, Uncertainty in the Dempster-Shafer theory: a critical re-
examination, International Journal of General Systems 18 (1990), 155–166.
725. G. J. Klir and B. Yuan, Fuzzy sets and fuzzy logic: theory and applications, Prentice
Hall PTR, Upper Saddle River, NJ, 1995.
726. George J. Klir, Principles of uncertainty: What are they? why do we need them?, Fuzzy
Sets and Systems 74 (1995), 15–31.
727. , On fuzzy-set interpretation of possibility theory, Fuzzy Sets and Systems 108
(1999), 263–273.
728. , Generalized information theory: aims, results, and open problems, Reliability Engineering & System Safety 85 (2004), no. 1–3, 21 – 38, Alternative Representations of Epistemic Uncertainty.
729. , Generalized information theory: aims, results, and open problems, Reliability Engineering & System Safety 85 (2004), no. 1–3, 21 – 38, Alternative Representations of Epistemic Uncertainty.
730. George J. Klir and David Harmanec, Generalized information theory, Kybernetes 25
(1996), no. 7/8, 50–67.
731. George J. Klir, Wang Zhenyuan, and David Harmanec, Constructing fuzzy measures
in expert systems, Fuzzy Sets and Systems 92 (1997), 251–264.
732. M. A. Klopotek, A. Matuszewski, and S. T. Wierzchon, Overcoming negative-valued
conditional belief functions when adapting traditional knowledge acquisition tools to
Dempster-Shafer theory, Proceedings of the International Conference on Computa-
tional Engineering in Systems Applications, Symposium on Modelling, Analysis and
Simulation, CESA ’96 IMACS Multiconference, vol. 2, Lille, France, 9-12 July 1996,
pp. 948–953.
733. M.A. Klopotek and S.T. Wierzchon, An interpretation for the conditional belief func-
tion in the theory of evidence, Foundations of intelligent systems - Lecture Notes in
Computer Science, vol. 1609/1999, Springer Berlin/Heidelberg, 1999, pp. 494–502.
734. Mieczysław A. Kłopotek and Sławomir T. Wierzchoń, Rough Sets and Current Trends
in Computing: First International Conference, RSCTC’98, Warsaw, Poland, June 22–26,
1998, Proceedings, ch. A New Qualitative Rough-Set Approach to Modeling Belief
Functions, pp. 346–354, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.
735. Mieczysław Alojzy Kłopotek and Sławomir Tadeusz Wierzchoń, Belief functions in
business decisions, ch. Empirical Models for the Dempster-Shafer-Theory, pp. 62–
112, Physica-Verlag HD, Heidelberg, 2002.
736. W.W. Koczkodaj, A new definition of consistency of pairwise comparisons, Mathemat-
ical and Computer Modelling 18 (1993), no. 7, 79 – 84.
757. Jurg Kohlas, Paul-André Monney, R. Haenni, and N. Lehmann, Model-based diag-
nostics using hints, Symbolic and Quantitative Approaches to Uncertainty, European
Conference ECSQARU95, Fribourg (Ch. Fridevaux and J. Kohlas, eds.), Springer,
1995, pp. 259–266.
758. P. Kohli and Ph. Torr, Efficiently solving dynamic Markov random fields using graph
cuts, Proceedings of ICCV’05, vol. 2, 2005, pp. 922–929.
759. Don Koks and Subhash Challa, An introduction to Bayesian and Dempster-Shafer data
fusion, Tech. report, Defence Science and Technology Organisation, 2003.
760. D. Koller and H.-H. Nagel, Model-based object tracking in monocular image se-
quences of road traffic scenes, Int. J. Computer Vision 10 (1993), 257–281.
761. H. Kollnig and H.-H. Nagel, 3D pose estimation by fitting image gradients directly to
polyhedral models, ICCV’95, Boston, MA, May 1995, pp. 569–574.
762. Augustine Kong, Multivariate belief functions and graphical models, PhD disserta-
tion, Harvard University, Department of Statistics, 1986.
763. B. O. Koopman, The bases of probability, Bull. Amer. Math. Soc. 46.
764. , The axioms and algebra of intuitive probability, Ann. Math. 41 (1940), 269–
292.
765. P. Korpisaari and J. Saarinen, Dempster-Shafer belief propagation in attribute fu-
sion, Proceedings of the Second International Conference on Information Fusion (FU-
SION’99), vol. 2, Sunnyvale, CA, USA, 6-8 July 1999, pp. 1285–1291.
766. G.A. Koshevoy, Distributive lattices and products of capacities, Journal of Mathemat-
ical Analysis and Applications 219 (1998), 427–441.
767. Volker Kraetschmer, Constraints on belief functions imposed by fuzzy random vari-
ables: Some technical remarks on Römer-Kandel, IEEE Transactions on Systems, Man,
and Cybernetics, Part B: Cybernetics 28 (1998), 881–883.
768. I. Kramosil, Probabilistic analysis of belief functions, Kluwer Academic/Plenum Publishers, New York, 2001.
769. Ivan Kramosil, Expert systems with non-numerical belief functions, Problems of Con-
trol and Information Theory 17 (1988), 285–295.
770. , Possibilistic belief functions generated by direct products of single possibilis-
tic measures, Neural Network World 9:6 (1994), 517–525.
771. , Approximations of believeability functions under incomplete identification of
sets of compatible states, Kybernetika 31 (1995), 425–450.
772. , Dempster-Shafer theory with indiscernible states and observations, Interna-
tional Journal of General Systems 25 (1996), 147–152.
773. , Expert systems with non-numerical belief functions, Problems of control and
information theory 16 (1996), 39–53.
774. , Belief functions generated by signed measures, Fuzzy Sets and Systems 92
(1997), 157–166.
775. , Probabilistic analysis of Dempster-Shafer theory. part one, Tech. report,
Academy of Science of the Czech Republic, Technical Report 716, 1997.
776. , Probabilistic analysis of Dempster-Shafer theory. part three., Tech. report,
Academy of Science of the Czech Republic, Technical Report 749, 1998.
777. , Probabilistic analysis of Dempster-Shafer theory. part two., Tech. report,
Academy of Science of the Czech Republic, Technical Report 749, 1998.
778. , Fuzzy measures and integrals measure-theoretic approach to the inversion
problem for belief functions, Fuzzy Sets and Systems 102 (1999), no. 3, 363 – 369.
779. , Measure-theoretic approach to the inversion problem for belief functions,
Fuzzy Sets and Systems 102 (1999), 363–369.
780. Ivan Kramosil, Dempster combination rule with Boolean-like processed belief
functions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Sys-
tems 09 (2001), no. 01, 105–121.
781. Ivan Kramosil, Belief functions generated by fuzzy and randomized compatibility re-
lations, Fuzzy Sets and Systems 135 (2003), no. 3, 341 – 366.
782. , Nonspecificity degrees of basic probability assignments in Dempster-Shafer
theory, Computers and Artificial Intelligence 18:6 (April-June 1993), 559–574.
783. , Belief functions with nonstandard values, Proceedings of Qualitative and
Quantitative Practical Reasoning (Dov Gabbay, Rudolf Kruse, Andreas Nonnengart,
and H. J. Ohlbach, eds.), Bonn, June 1997, pp. 380–391.
784. , Dempster combination rule for signed belief functions, International Journal
of Uncertainty, Fuzziness and Knowledge-Based Systems 6:1 (February 1998), 79–
102.
785. , Jordan decomposition of signed belief functions, Proceedings of the inter-
national conference on Information Processing and Management of Uncertainty in
Knowledge-Based Systems (IPMU’96), Granada, Universidad de Granada, July 1996,
pp. 431–434.
786. , Monte-Carlo estimations for belief functions, Proceedings of the Fourth In-
ternational Conference on Fuzzy Sets Theory and Its Applications (A. Heckerman,
D.; Mamdani, ed.), vol. 16, Liptovsky Jan, Slovakia, 2-6 Feb. 1998, pp. 339–357.
787. , Definability of belief functions over countable sets by real-valued ran-
dom variables, IPMU. Information Processing and Management of Uncertainty in
Knowledge-Based Systems (Svoboda V., ed.), vol. 3, Paris, July 1994, pp. 49–50.
788. , Toward a boolean-valued Dempster-Shafer theory, LOGICA ’92 (Svoboda
V., ed.), Prague, 1993, pp. 110–131.
789. , A probabilistic analysis of Dempster combination rule, The Logica. Year-
book 1997 (Childers Timothy, ed.), Prague, 1997, pp. 174–187.
790. , Measure-theoretic approach to the inversion problem for belief functions,
Proceedings of IFSA’97, Seventh International Fuzzy Systems Association World
Congress, vol. 1, Prague, Academia, June 1997, pp. 454–459.
791. , Strong law of large numbers for set-valued random variables, Proceedings
of the 3rd Workshop on Uncertainty Processing in Expert Systems, Prague, University
of Economics, September 1994, pp. 122–142.
792. David H. Krantz and John Miyamoto, Priors and likelihood ratios as evidence, Journal
of the American Statistical Association 78 (June 1983), 418–423.
793. P. Krause and D. Clark, Representing uncertain knowledge, Kluwer, Dordrecht, 1993.
794. R. Kruse and E. Schwecke, Specialization: a new concept for uncertainty handling
with belief functions, International Journal of General Systems 18 (1990), 49–60.
795. Vladik Kreinovich, Claude Langrand, and Hung T. Nguyen, Combining fuzzy and
probabilistic knowledge using belief functions, Tech. report, University of Texas at El
Paso, Departmental Technical Reports (CS), Paper 414, 2001.
796. R. Kruse, D. Nauck, and F. Klawonn, Reasoning with mass, Uncertainty in Artificial
Intelligence (B. D. D’Ambrosio, P. Smets, and P. P. Bonissone, eds.), Morgan Kaufmann,
San Mateo, CA, 1991, pp. 182–187.
797. R. Kruse, E. Schwecke, and F. Klawonn, On a tool for reasoning with mass distribu-
tion, Proceedings of the 12th International Joint Conference on Artificial Intelligence
(IJCAI91), vol. 2, 1991, pp. 1190–1195.
798. Rudolf Kruse and Erhard Schwecke, Specialization - a new concept for uncertainty
handling with belief functions, International Journal of General Systems 18 (1990),
no. 1, 49–60.
799. J.J. Kuch and T. Huang, Vision based hand modeling and tracking for virtual telecon-
ferencing and telecollaboration, Proc. of the Fifth ICCV, pp. 666–672.
800. J. Kühr and D. Mundici, De Finetti theorem and Borel states in [0, 1]-valued algebraic logic, International Journal of Approximate Reasoning 46 (2007), no. 3, 605–616.
801. H. Kyburg, Bayesian and non-Bayesian evidential updating, Artificial Intelligence
31:3 (1987), 271–294.
802. H. E. Kyburg, Bayesian and non-Bayesian evidential updating, Artificial Intelligence
31 (1987), 271–293.
803. Henry E. Kyburg, Jr., Interval-valued probabilities, 1998.
804. L. Goncalves, E. Di Bernardo, E. Ursella, and P. Perona, Monocular tracking of the
human arm in 3D, Proceedings of the International Conference on Computer Vision
ICCV’95, Cambridge, MA, 1995, pp. 764–770.
805. M. Lamata and S. Moral, Calculus with linguistic probabilities and belief, Advances
in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, pp. 133–152.
806. M.T. Lamata and S. Moral, Classification of fuzzy measures, Fuzzy Sets and Systems
33 (1989), no. 2, 243 – 253.
807. S. Mac Lane, A lattice formulation for transcendence degrees and p-bases, Duke
Math. J. 4 (1938), 455–468.
808. K. Laskey and P.E. Lehner, Belief maintenance: an integrated approach to uncertainty
management, Proceedings of the Seventh National Conference on Artificial Intelligence
(AAAI-88), vol. 1, 1988, pp. 210–214.
809. K. B. Laskey, Beliefs in belief functions: an examination of Shafer’s canonical exam-
ples, AAAI Third Workshop on Uncertainty in Artificial Intelligence, Seattle, 1987,
pp. 39–46.
810. Kathryn Blackmond Laskey, Belief in belief functions: An examination of Shafer’s
canonical examples, CoRR abs/1304.2715 (2013).
811. Kathryn Blackmond Laskey and Paul E. Lehner, Assumptions, beliefs and probabili-
ties, Artificial Intelligence 41 (1989), 65–77.
812. , Assumptions, beliefs and probabilities, Artificial Intelligence 41 (1989),
no. 1, 65 – 77.
813. Chia-Hoang Lee, A comparison of two evidential reasoning schemes, Artificial Intel-
ligence 35 (1988), 127–134.
814. E. S. Lee and Q. Zhu, Fuzzy and evidential reasoning, Physica-Verlag, Heidelberg,
1995.
815. E.S. Lee and Qing Zhu, An interval Dempster-Shafer approach, Computers & Mathematics with Applications 24 (1992), no. 7, 89 – 95.
816. H.J. Lee and Z. Chen, Determination of 3D human body posture from a single view,
Computer Vision, Graphics, and Image Processing 30 (1985), 148–168.
817. , Knowledge-guided visual perception of 3-d human gait from a single image
sequence, IEEE Transactions on Systems, Man, and Cybernetics 22 (March 1992).
818. Seung-Jae Lee, Sang-Hee Kang, Myeon-Song Choi, Sang-Tae Kim, and Choong-Koo
Chang, Protection level evaluation of distribution systems based on Dempster-Shafer
theory of evidence, Proceedings of the IEEE Power Engineering Society Winter Meet-
ing, vol. 3, Singapore, 23-27 January 2000, pp. 1894–1899.
819. E. Lefevre, O. Colot, and P. Vannoorenberghe, Belief function combination and con-
flict management, Information Fusion 3 (2002), no. 2, 149 – 162.
820. , Belief functions combination and conflict management, Information Fusion
Journal 3 (2002), no. 2, 149–162.
880. Weiru Liu, Jun Hong, M.F. McTear, and J.G. Hughes, An extended frame-
work for evidential reasoning systems, International Journal of Pattern Recognition
and Artificial Intelligence 07 (1993), no. 03, 441–457.
881. Weiru Liu, Jun Hong, and Michael F. McTear, An extended framework for evidential
reasoning systems, Proceedings of IEEE, 1990, pp. 731–737.
882. Liu Da You, Ouyang Ji Hong, Tang Hai Ying, Chen Jian Zhong, and Yu Qiang Yuan,
Research on a simplified evidence theory model, Journal of Computer Research and
Development (1999).
883. L. Ljung and T. Söderström, Theory and practice of recursive identification, MIT Press,
1983.
884. K. C. Lo, Agreement and stochastic independence of belief functions, Mathematical
Social Sciences 51(1) (2006), 1–22.
885. G. Lohmann, An evidential reasoning approach to the classification of satellite im-
ages, Symbolic and Qualitative Approaches to Uncertainty (R. Kruse and P. Siegel,
eds.), Springer-Verlag, Berlin, 1991, pp. 227–231.
886. W. Long and Y.-H. Yang, Log-tracker: An attribute based approach to tracking human
body motion, Pattern Recognition and Artificial Intelligence 5 (1991), 439–458.
887. Pierre Loonis, El-Hadi Zahzah, and Jean-Pierre Bonnefoy, Multi-classifiers neural
network fusion versus Dempster-Shafer’s orthogonal rule, Proceedings of IEEE, 1995,
pp. 2162–2165.
888. D. Lowe, Integrated treatment of matching and measurement errors for robust model-
based motion tracking, ICCV’90, 1990, pp. 436–440.
889. , Fitting parameterised 3-d models to images, IEEE Trans. PAMI 13 (1991),
441–450.
890. , Robust model-based motion tracking through the integration of search and
estimation, International Journal on Computer Vision 8 (1992), 113–122.
891. John D. Lowrance, Evidential reasoning with Gister: A manual, Tech. report, Artificial
Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA.,
1987.
892. , Automated argument construction, Journal of Statistical Planning Inference
20 (1988), 369–387.
893. , Evidential reasoning with Gister-CL: A manual, Tech. report, Artificial Intel-
ligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA., 1994.
894. John D. Lowrance and T. D. Garvey, Evidential reasoning: A developing concept, Pro-
ceedings of the International Conference on Cybernetics and Society (Institute of Elec-
trical and Electronics Engineers, eds.), 1982, pp. 6–9.
895. , Evidential reasoning: an implementation for multisensor integration, Tech.
report, SRI International, Menlo Park, CA, Technical Note 307, 1983.
896. John D. Lowrance, T. D. Garvey, and Thomas M. Strat, A framework for evidential-
reasoning systems, Proceedings of the National Conference on Artificial Intelligence
(American Association for Artificial Intelligence, ed.), 1986, pp. 896–903.
897. John D. Lowrance, T.D. Garvey, and Thomas M. Strat, A framework for evidential
reasoning systems, Readings in uncertain reasoning (Shafer and Pearl, eds.), Morgan
Kaufmann, 1990, pp. 611–618.
898. C.-P. Lu, G.D. Hager, and E. Mjolsness, Fast and globally convergent pose estimation
from video images, IEEE Trans. PAMI 22 (2000), 610–622.
899. S. Y. Lu and H. E. Stephanou, A set-theoretic framework for the processing of uncer-
tain knowledge.
900. Y. Luo, F.J. Perales, and J.J. Villanueva, An automatic rotoscopy system for human
motion based on a biomechanical graphical model, Computers and Graphics 16 (1992).
921. M.-H. Masson and T. Denoeux, Belief functions and cluster ensembles, ECSQARU,
July 2009, pp. 323–334.
922. B. Mates, Elementary logic, Oxford University Press, 1972.
923. G. Matheron, Random sets and integral geometry, Wiley Series in Probability and
Mathematical Statistics.
924. , Random sets and integral geometry, Wiley, 1970.
925. , Random sets and integral geometry, Wiley, NY, 1975.
926. S. Mathevet, L. Trassoudaine, P. Checchin, and J. Auzon, Combinaison de segmenta-
tions en régions, Traitement du signal (1999).
927. Thomas Maurer and Christoph von der Malsburg, Tracking and learning graphs and
pose on image sequences of faces, FG ’96: Proceedings of the 2nd International
Conference on Automatic Face and Gesture Recognition (FG ’96) (Washington, DC,
USA), IEEE Computer Society, 1996, p. 76.
928. Sally McClean and Bryan Scotney, Using evidence theory for the integration of dis-
tributed databases, International Journal of Intelligent Systems 12 (1997), 763–776.
929. Sally McClean, Bryan Scotney, and Mary Shapcott, Using background knowledge in
the aggregation of imprecise evidence in databases, Data and Knowledge Engineering
32 (2000), 131–143.
930. G. McLachlan and D. Peel, Finite mixture models, Wiley-Interscience, 2000.
931. G. V. Meghabghab and D. B. Meghabghab, Multiversion information retrieval: per-
formance evaluation of neural networks vs. Dempster-Shafer model, Proceedings of
the Third Golden West International Conference on Intelligent Systems (E.A. Yfantis,
ed.), Las Vegas, NV, USA, 6-8 June 1994, pp. 537–545.
932. T. Melkonyan and R. Chambers, Degree of imprecision: Geometric and algebraic ap-
proaches, International Journal of Approximate Reasoning (2006).
933. K. Mellouli, On the propagation of beliefs in networks using the Dempster-Shafer
theory of evidence, PhD dissertation, University of Kansas, School of Business, 1986.
934. Khaled Mellouli and Zied Elouedi, Pooling experts opinion using Dempster-Shafer
theory of evidence, Proceedings of IEEE, 1997, pp. 1900–1905.
935. D. Mercier, T. Denoeux, and M.-H. Masson, General correction mechanisms for weak-
ening or reinforcing belief functions, 2006 9th International Conference on Informa-
tion Fusion, July 2006, pp. 1–7.
936. D. Mercier, T. Denoeux, and M. Masson, Refined sensor tuning in the belief function
framework using contextual discounting, IPMU, 2006.
937. David Mercier, Thierry Denœux, and Marie-Hélène Masson, Belief function correc-
tion mechanisms, pp. 203–222, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
938. David Mercier, Benjamin Quost, and Thierry Denœux, Contextual discounting of be-
lief functions, pp. 552–562, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005.
939. David Mercier, Benjamin Quost, and Thierry Denœux, Refined modeling of sensor re-
liability in the belief function framework using contextual discounting, Information
Fusion 9 (2008), no. 2, 246 – 258.
940. D. Metaxas and D. Terzopoulos, Shape and nonrigid motion estimation through
physics-based synthesis, IEEE Trans. Pattern Analysis and Machine Intelligence 15
(1993), 580–591.
941. D. Meyer, J. Denzler, and H. Niemann, Model based extraction of articulated objects
in image sequences, Fourth International Conference on Image Processing, 1997.
942. I. Mikic, Human body model acquisition and tracking using multi-camera voxel data,
PhD dissertation, University of California at San Diego, January 2002.
943. I. Mikic, M. Trivedi, E. Hunter, and P. Cosman, Articulated body posture estimation
from multi-camera voxel data, Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition CVPR’01, Hawaii, December 2001.
944. E. Miranda, I. Couso, and P. Gil, Extreme points of credal sets generated by 2-
alternating capacities, International Journal of Approximate Reasoning 33 (2003),
95–115.
945. E. Miranda and G. de Cooman, Marginal extension in the theory of coherent lower
previsions, Int. J. of Approximate Reasoning 46 (2007), no. 1, 188–225.
946. Enrique Miranda, A survey of the theory of coherent lower previsions, International
Journal of Approximate Reasoning 48 (2008), no. 2, 628 – 658, In Memory of Philippe
Smets (1938–2005).
947. Enrique Miranda, Hung T. Nguyen, and Jürg Kohlas, Uncertain information: Random variables in graded semilattices, International Journal of Approximate Reasoning 46 (2007), no. 1, 17 – 34 (special section on random sets and imprecise probabilities).
948. P. Miranda, M. Grabisch, and P. Gil, On some results of the set of dominating k-
additive belief functions, IPMU, 2004, pp. 625–632.
949. P. Miranda, M. Grabisch, and P. Gil, Dominance of capacities by k-additive belief
functions, European Journal of Operational Research 175 (2006), 912–930.
950. T. Moeslund, Summaries of 107 computer vision-based human motion capture papers,
Tech. report, Laboratory of Image Analysis, Aalborg University, Denmark, 1999.
951. T. Moeslund and E. Granum, A survey of computer vision-based human motion cap-
ture, Image and Vision Computing 81 (2001), 231–268.
952. T.B. Moeslund and E. Granum, 3D human pose estimation using 2D-data and an
alternative phase space representation, Workshop on Human Modeling, Analysis and
Synthesis at CVPR2000, Hilton Head Island, June 2000.
953. , Multiple cues in model-based human motion capture, Fourth International
Conference on Automatic Face and Gesture Recognition, Grenoble, France, March
2000.
954. S. M. Mohiddin and T. S. Dillon, Evidential reasoning using neural networks, Pro-
ceedings of IEEE, 1994, pp. 1600–1606.
955. I. Molchanov, Theory of random sets, Springer-Verlag, 2005.
956. Paul-André Monney, Analyzing linear regression models with hints and the Dempster-
Shafer theory, International Journal of Intelligent Systems 18 (2003), no. 1, 5–29.
957. Paul-André Monney, Moses W. Chan, Enrique H. Ruspini, and Marco E.G.V. Cattaneo,
Belief functions combination without the assumption of independence of the informa-
tion sources, International Journal of Approximate Reasoning 52 (2011), no. 3, 299 –
315.
958. Paul-André Monney, Moses W. Chan, Enrique H. Ruspini, and Johan Schubert, Conflict management in Dempster-Shafer theory using the degree of falsity, International Journal of Approximate Reasoning 52 (2011), no. 3, 449 – 460 (special issue on dependence issues in knowledge-based systems).
959. Paul-André Monney, A mathematical theory of arguments for statistical evidence,
Physica, 19 November 2002.
960. Paul-André Monney, Planar geometric reasoning with the theory of hints, Computa-
tional Geometry. Methods, Algorithms and Applications, Lecture Notes in Computer
Science, vol. 553 (H. Bieri and H. Noltemeier, eds.), 1991, pp. 141–159.
961. Paul-André Monney, Dempster specialization matrices and the combination of belief
functions, pp. 316–327, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.
962. Andrew Moore, Very fast EM-based mixture model clustering using multiresolution kd-
trees, Advances in Neural Information Processing Systems (340 Pine Street, 6th Fl.,
San Francisco, CA 94104) (M. Kearns and D. Cohn, eds.), Morgan Kaufmann, April
1999, pp. 543–549.
963. D. Moore, I. Essa, and M. Hayes III, Exploiting human actions and object context for
recognition tasks, Proc. of the International Conference on Computer Vision, vol. 1,
1999, pp. 80–86.
964. S. Moral and L. M. de Campos, Partially specified belief functions, Proceedings of the
Ninth Conference on Uncertainty in Artificial Intelligence (D. Heckerman and A. Mam-
dani, eds.), Washington, DC, USA, 9-11 July 1993, pp. 492–499.
965. S. Moral and N. Wilson, Importance sampling Monte-Carlo algorithms for the calcu-
lation of Dempster-Shafer belief, Proc. of IPMU’96, 1996.
966. Serafin Moral and Antonio Salmeron, A Monte-Carlo algorithm for combining
Dempster-Shafer belief based on approximate pre-computation, Proceedings of The
Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning
with Uncertainty - ECSQARU (Lecture Notes in Computer Science), London, 5-9
July 1999.
967. D. Morris and J.M. Rehg, Singularity analysis for articulated object tracking, Pro-
ceedings of CVPR’98, 1998, pp. 289–296.
968. E. Moutogianni and M. Lalmas, A Dempster-Shafer indexing for structured document
retrieval: implementation and experiments on a Web museum collection, IEE Two-day
Seminar. Searching for Information: Artificial Intelligence and Information Retrieval
Approaches, Glasgow, UK, 11-12 Nov. 1999, pp. 20–21.
969. O. Munkelt, C. Ridder, D. Hansel, and W. Hafner, A model driven 3D image interpre-
tation system applied to person detection in video images, International Conference
on Pattern Recognition, 1998.
970. T. Murai, M. Miyakoshi, and M. Shimbo, Soundness and completeness theorems be-
tween the Dempster-Shafer theory and logic of belief, Proceedings of the Third IEEE Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), vol. 2, June 1994, pp. 855–858.
971. Tetsuya Murai, Yasuo Kudo, and Yoshiharu Sato, Discovery Science: 6th International
Conference, DS 2003, Sapporo, Japan, October 17-19, 2003, Proceedings, ch. Associ-
ation Rules and Dempster-Shafer Theory of Evidence, pp. 377–384, Springer Berlin
Heidelberg, Berlin, Heidelberg, 2003.
972. Toshiaki Murofushi and Michio Sugeno, Some quantities represented by the Choquet
integral, Fuzzy Sets and Systems 56 (1993), no. 2, 229–235.
973. Catherine K. Murphy, Combining belief functions when evidence conflicts, Decision
Support Systems 29 (2000), 1–9.
974. , Combining belief functions when evidence conflicts, Decision Support Sys-
tems 29 (2000), no. 1, 1 – 9.
975. R. R. Murphy, Adaptive rule of combination for observations over time, Proceedings
of the 1996 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Inte-
gration for Intelligent Systems, December 1996, pp. 125–131.
976. Robin R. Murphy, Dempster-Shafer theory for sensor fusion in autonomous mobile
robots, IEEE Transactions on Robotics and Automation 14 (1998), 197–206.
977. N. Jojic, M. Turk, and T. Huang, Tracking self-occluding articulated objects in dense
disparity maps, Proceedings of the IEEE International Conference on Computer Vi-
sion ICCV’99, Corfu, Greece, September 1999.
1019. Zdzislaw Pawlak, Rough set theory and its applications to data analysis, Cy-
bernetics and Systems 29 (1998), no. 7, 661–688.
1020. J. Pearl, Readings in uncertain reasoning, Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1990, pp. 540–574.
1021. Judea Pearl, On evidential reasoning in a hierarchy of hypotheses, Artificial Intelli-
gence 28:1 (1986), 9–15.
1022. Judea Pearl, On evidential reasoning in a hierarchy of hypotheses, Artif. Intell. 28
(1986), no. 1, 9–15.
1023. , Probabilistic reasoning in intelligent systems: Networks of plausible infer-
ence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.
1024. Judea Pearl, Reasoning with belief functions: a critical assessment, Tech. report,
UCLA, Technical Report R-136, 1989.
1025. , Reasoning with belief functions: an analysis of compatibility, International
Journal of Approximate Reasoning 4 (1990), 363–389.
1026. , Reasoning with belief functions: An analysis of compatibility, International
Journal of Approximate Reasoning 4 (1990), 363–389.
1027. , Rejoinder to comments on ‘reasoning with belief functions: an analysis of
compatibility’, International Journal of Approximate Reasoning 6 (1992), 425–443.
1028. M. Pechwitz, S.S. Maddouri, V. Maergner, N. Ellouze, and H. Amiri, IFN/ENIT -
database of handwritten Arabic words, Colloque International Francophone sur l'Ecrit
et le Document (2002), 129–136.
1029. A. Pentland, Automatic extraction of deformable models, Int. J. Computer Vision 4
(1990), 107–126.
1030. A. Pentland and B. Horowitz, Recovery of non-rigid motion and structure, IEEE Trans.
Pattern Analysis and Machine Intelligence 13 (1991), 730–742.
1031. F.J. Perales and J. Torres, A system for human motion matching between synthetic and
real images based on biomechanic graphical models, IEEE Workshop on Motion of
Non-rigid and Articulated Objects, Austin, Texas, 1994.
1032. Andrés Perea, A model of minimal probabilistic belief revision, Theory and Decision
67 (2009), no. 2, 163–222.
1033. Joseph S. J. Peri, Dempster-Shafer theory, Bayesian theory, and measure theory, 2005,
pp. 378–389.
1034. C. Perneel, H. Van De Velde, and M. Acheroy, A heuristic search algorithm based on
belief functions, Proceedings of Fourteenth International Avignon Conference, vol. 1,
Paris, France, 30 May-3 June 1994, pp. 99–107.
1035. Laurent Perrussel, Luis Enrique Sucar, and Michael Scott Balch, Selected papers: un-
certain reasoning at FLAIRS 2010, Mathematical foundations for a theory of confidence
structures, International Journal of Approximate Reasoning 53 (2012), no. 7, 1003–
1019.
1036. C. Peterson, Local Dempster-Shafer theory, Tech. report CSC-AMTAS-98001, C.S.C.
Internal Report, 1998.
1037. Simon Petit-Renaud and Thierry Denoeux, Handling different forms of uncertainty
in regression analysis: a fuzzy belief structure approach, Proceedings of The Fifth
European Conference on Symbolic and Quantitative Approaches to Reasoning with
Uncertainty (ECSQARU, Lecture Notes in Computer Science Series), London, 5-9 July
1999.
1038. F. Pichon and T. Denoeux, T-norm and uninorm-based combination of belief functions,
Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS
2008), May 2008, pp. 1–6.
1039. F. Pichon and T. Denoeux, Interpretation and computation of alpha-junctions for com-
bining belief functions, Proceedings of the 6th International Symposium on Imprecise
Probability: Theories and Applications (ISIPTA’09), 2009.
1040. W. Pieczynski, Unsupervised Dempster-Shafer fusion of dependent sensors, Proceed-
ings of the 4th IEEE Southwest Symposium on Image Analysis and Interpretation,
Austin, TX, USA, 2-4 April 2000, pp. 247–251.
1041. W. Ping and Y. Genqing, Improvement method for the combining rule of Dempster-
Shafer evidence theory based on reliability, Journal of Systems Engineering and Elec-
tronics 16 (2005), no. 2, 471–474.
1042. C. S. Pinhanez and A. F. Bobick, Human action detection using PNF propagation of
temporal constraints, Proc. of the Conference on Computer Vision and Pattern Recog-
nition, 1998, pp. 898–904.
1043. Axel Pinz, Manfred Prantl, Harald Ganster, and Hermann Kopp-Borotschnig, Active
fusion - a new method applied to remote sensing image interpretation, Pattern Recog-
nition Letters 17 (1996), 1349–1359.
1044. L. Polkowski and A. Skowron, Rough mereology: A new paradigm for approximate
reasoning, International Journal of Approximate Reasoning 15 (1996), 333–365.
1045. , Rough mereology: A new paradigm for approximate reasoning, International
Journal of Approximate Reasoning 15 (1996), no. 4, 333 – 365, Rough Sets.
1046. R. Poppe and M. Poel, Comparison of silhouette shape descriptors for example-based
human pose recovery, 2006, pp. 541–546.
1047. R. W. Poppe and M. Poel, Example-based pose estimation in monocular images using
compact fourier descriptors, CTIT Technical Report series TR-CTIT-05-49, Univer-
sity of Twente, Enschede, 2005.
1048. R.W. Poppe, Evaluating example-based pose estimation: Experiments on the Hu-
manEva sets, Online Proceedings of the Workshop on Evaluation of Articulated Hu-
man Motion and Pose Estimation (EHuM) at the International Conference on Com-
puter Vision and Pattern Recognition (CVPR) (Minneapolis, Minnesota), June 2007,
pp. 1–8.
1049. G. Priest, R. Routley, and J. Norman, Paraconsistent logic: Essays on the inconsistent,
Philosophia Verlag, 1989.
1050. G. Provan, An analysis of ATMS-based techniques for computing Dempster-Shafer
belief functions, Proceedings of the International Joint Conference on Artificial Intel-
ligence, 1989.
1051. Gregory Provan, An analysis of exact and approximation algorithms for Dempster-
Shafer theory, Tech. report, Department of Computer Science, University of British
Columbia, Tech. Report 90-15, 1990.
1052. , The validity of Dempster-Shafer belief functions, International Journal of Ap-
proximate Reasoning 6 (1992), 389–399.
1053. Gregory M. Provan, The application of Dempster-Shafer theory to a logic-based visual
recognition system, Uncertainty in Artificial Intelligence, 5 (L. N. Kanal M. Henrion,
R. D. Schachter and J. F. Lemmers, eds.), North Holland, Amsterdam, 1990, pp. 389–
405.
1054. , A logic-based analysis of Dempster-Shafer theory, International Journal of
Approximate Reasoning 4 (1990), 451–495.
1055. Q. Ye, X.P. Wu, and Y.X. Song, An evidence combination method of introducing weight
factors, Fire Control and Command Control 32.
1056. R. Qian and T. Huang, Motion analysis of articulated objects with applications to
human ambulatory patterns, DARPA’92, 1992, pp. 549–553.
1076. J. Rehg and T. Kanade, DigitEyes: Vision-based human hand tracking, Tech. report,
CS-TR-93-220, Carnegie Mellon University, School of Computer Science, 1993.
1077. , Visual tracking of high dof articulated structures: an application to human
hand tracking, Proc. of the Third European Conference on Computer Vision, Stock-
holm, Sweden (J. Eklundh, ed.), vol. 2, 1994, pp. 35–46.
1078. J. M. Rehg and T. Kanade, Model-based tracking of self-occluding articulated objects,
Proceedings of the International Conference on Computer Vision ICCV’95, Cam-
bridge, MA, 20-23 June 1995, pp. 618–623.
1079. James M. Rehg and Takeo Kanade, DigitEyes: Vision-based human hand tracking,
Tech. report, School of Computer Science, Carnegie Mellon University, CMU-CS-93-
220, December 1993.
1080. , Visual tracking of self-occluding articulated objects, Tech. report, School of
Computer Science, Carnegie Mellon University, CMU-CS-94-224, December 1994.
1081. G. Resconi, G. Klir, U. St. Clair, and D. Harmanec, On the integration of uncertainty
theories, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 1 (1993), 1–18.
1082. G. Resconi, A.J. van der Wal, and D. Ruan, Speed-up of the monte carlo method
by using a physical model of the Dempster-Shafer theory, International Journal of
Intelligent Systems 13 (1998), 221–242.
1083. Germano Resconi, George J Klir, David Harmanec, and Ute St Clair, Interpretations
of various uncertainty theories using models of modal logic: a summary, Fuzzy Sets
and Systems 80 (1996), 7–14.
1084. Germano Resconi, George J. Klir, David Harmanec, and Ute St. Clair, Interpretations
of various uncertainty theories using models of modal logic: A summary, Fuzzy Sets
and Systems 80 (1996), no. 1, 7 – 14, Fuzzy Modeling.
1085. Sergio Rinaldi and Lorenzo Farina, I sistemi lineari positivi: teoria e applicazioni (Posi-
tive linear systems: theory and applications), Città Studi Edizioni.
1086. B. Ristic and P. Smets, Belief function theory on the continuous space with an appli-
cation to model based classification, IPMU, 2004, pp. 1119–1126.
1087. B. Ristic and Ph. Smets, The TBM global distance measure for the association of
uncertain combat ID declarations, Information Fusion 7(3) (2006), 276–284.
1088. Christoph Roemer and Abraham Kandel, Applicability analysis of fuzzy inference by
means of generalized Dempster-Shafer theory, IEEE Transactions on Fuzzy Systems
3:4 (November 1995), 448–453.
1089. Christopher Roesmer, Nonstandard analysis and Dempster-Shafer theory, Interna-
tional Journal of Intelligent Systems 15 (2000), 117–127.
1090. K. Rohr, Towards model-based recognition of human movements in image sequences,
CVGIP: Image Understanding 59 (1994), 94–115.
1091. , Human movement analysis based on explicit motion models, ch. 8, pp. 171–198,
Kluwer Academic Publishers, Dordrecht/Boston, 1997.
1092. Christoph Romer and Abraham Kandel, Constraints on belief functions imposed by
fuzzy random variables, IEEE Transactions on Systems, Man, and Cybernetics, Part
B: Cybernetics 25 (1995), 86–99.
1093. R. Rosales and S. Sclaroff, Learning and synthesizing human body motion and pos-
ture, Fourth Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, France,
March 2000.
1094. R. Rosales, M. Siddiqui, J. Alon, and S. Sclaroff, Estimating 3d body pose using un-
calibrated cameras, 2001, pp. I:821–827.
1095. Kimmo I. Rosenthal, Quantales and their applications, Longman Scientific and Tech-
nical, Longman House, Burnt Mill, Harlow, Essex, UK, 1990.
1096. David Ross, Random sets without separability, Annals of Probability 14:3 (July 1986),
1064–1069.
1097. Dan Roth, On the hardness of approximate reasoning, Artificial Intelligence 82
(1996), no. 1-2, 273–302.
1098. E. H. Ruspini, J.D. Lowrance, and T. M. Strat, Understanding evidential reasoning,
International Journal of Approximate Reasoning 6 (1992), 401–424.
1099. E.H. Ruspini, Epistemic logics, probability and the calculus of evidence, Proc. 10th
Intl. Joint Conf. on AI (IJCAI-87), 1987, pp. 924–931.
1100. Enrique H. Ruspini, The logical foundations of evidential reasoning, Tech. report, SRI
International, Menlo Park, CA, Technical Note 408, 1986.
1101. Enrique H. Ruspini, Classic works of the dempster-shafer theory of belief func-
tions, ch. Epistemic Logics, Probability, and the Calculus of Evidence, pp. 435–448,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
1102. Matthew J. Ryan, Violations of belief persistence in Dempster-Shafer equilibrium,
Games and Economic Behavior 39 (2002), no. 1, 167–174.
1103. S. Le Hégarat-Mascle, I. Bloch, and D. Vidal-Madjar, Introduction of neighborhood
information in evidence theory and application to data fusion of radar and optical
images with partial cloud cover, Pattern Recognition 31 (1998), 1811–1823.
1104. Alessandro Saffiotti, A belief-function logic, Université Libre de Bruxelles, MIT Press,
pp. 642–647.
1105. , A hybrid framework for representing uncertain knowledge, Proceedings of the
8th AAAI Conference, Boston, MA, 1990, pp. 653–658.
1106. , A hybrid belief system for doubtful agents, Uncertainty in Knowledge Bases,
Lecture Notes in Computer Science 251, Springer-Verlag, 1991, pp. 393–402.
1107. , Using Dempster-Shafer theory in knowledge representation, Uncertainty in
Artificial Intelligence 6 (B. D'Ambrosio, P. Smets, and P. P. Bonissone, eds.), Morgan
Kaufmann, San Mateo, CA, 1991, pp. 417–431.
1108. , A belief function logic, Proceedings of the 10th AAAI Conf. San Jose, CA,
1992, pp. 642–647.
1109. , Issues of knowledge representation in Dempster-Shafer’s theory, Advances
in the Dempster-Shafer theory of evidence (R.R. Yager, M. Fedrizzi, and J. Kacprzyk,
eds.), Wiley, 1994, pp. 415–440.
1110. , Using Dempster-Shafer theory in knowledge representation, CoRR
abs/1304.1123 (2013).
1111. Alessandro Saffiotti, S. Parsons, and E. Umkehrer, Comparing uncertainty manage-
ment techniques, Microcomputers in Civil Engineering 9 (1994), 367–380.
1112. Alessandro Saffiotti and E. Umkehrer, PULCINELLA: A general tool for propagating
uncertainty in valuation networks, Tech. report, IRIDIA, Université Libre de Brux-
elles, 1991.
1113. Leonard J. Savage, The foundations of statistics, John Wiley & Sons, Inc., 1954.
1114. K. Schneider, Dempster-Shafer analysis for species presence prediction of the winter
wren (Troglodytes troglodytes), Proceedings of the 1st International Conference on
GeoComputation (R.J. Abrahart, ed.), vol. 2, Leeds, UK, 17-19 Sept. 1996, p. 738.
1115. J. Schubert, Cluster-based specification techniques in Dempster-Shafer theory, Pro-
ceedings of ECSQARU’95 (C. Froidevaux and J. Kohlas, eds.), 1995.
1116. , Managing inconsistent intelligence, Information Fusion, 2000. FUSION
2000. Proceedings of the Third International Conference on, vol. 1, July 2000,
pp. TUB4/10–TUB4/16 vol.1.
1117. Johan Schubert, On nonspecific evidence, International Journal of Intelligent Systems
8:6 (1993), 711–725.
1136. H. Segawa and T. Totsuka, Torque-based recursive filtering approach to the recovery
of 3D articulated motion from image sequences, ICCV’99, Corfu, Greece, September
1999.
1137. T. Seidenfeld, Statistical evidence and belief functions, Proc. of the Biennial Meeting
of the Philosophy of Science Association, 1978, pp. 478–489.
1138. , Some static and dynamic aspects of robust Bayesian theory, Random Sets:
Theory and Applications (Goutsias, Mahler, and Nguyen, eds.), Springer, 1997,
pp. 385–406.
1139. T. Seidenfeld, M. Schervish, and J. Kadane, Coherent choice functions under uncer-
tainty, Proceedings of ISIPTA’07, 2007.
1140. T. Seidenfeld and L. Wasserman, Dilation for convex sets of probabilities, Annals of
Statistics 21 (1993), 1139–1154.
1141. Teddy Seidenfeld, Statistical evidence and belief functions, PSA: Proceedings of the
Biennial Meeting of the Philosophy of Science Association 1978 (1978), 478–489.
1142. K. Sentz and S. Ferson, Combination of evidence in Dempster-Shafer theory, Tech.
report, SANDIA Tech. Report, SAND2002-0835, April 2002.
1143. Pavel Sevastianov, Numerical methods for interval and fuzzy number comparison
based on the probabilistic approach and Dempster-Shafer theory, Information Sciences
177 (2007), no. 21, 4645–4661.
1144. G. Shafer, A mathematical theory of evidence, Princeton University Press, 1976.
1145. , Jeffrey's rule of conditioning, Philosophy of Science 48 (1981), 337–362.
1146. , Belief functions and parametric models, Journal of the Royal Statistical So-
ciety, Series B 44 (1982), 322–352.
1147. G. Shafer and P. P. Shenoy, Local computation on hypertrees, Working paper No. 201,
School of Business, University of Kansas (1988).
1148. Glenn Shafer, Foundations of Probability Theory, Statistical Inference, and Statistical
Theories of Science: Proceedings of an International Research Colloquium held at the
University of Western Ontario, London, Canada, 10–13 May 1973, Volume II: Founda-
tions and Philosophy of Statistical Inference, ch. A Theory of Statistical Evidence,
pp. 365–436, Springer Netherlands, Dordrecht, 1976.
1149. Glenn Shafer, A mathematical theory of evidence, Princeton University Press, 1976.
1150. , A theory of statistical evidence, Foundations of Probability Theory, Statistical
Inference, and Statistical Theories of Science (W. L. Harper and C. A. Hooker, eds.),
vol. 2, Reidel, Dordrecht, 1976, with discussion, pp. 365–436.
1151. , Nonadditive probabilities in the work of Bernoulli and Lambert, Arch. History
Exact Sci. 19 (1978), 309–370.
1152. , Allocations of probability, Annals of Probability 7:5 (1979), 827–839.
1153. , Constructive probability, Synthese 48 (1981), 309–370.
1154. , Two theories of probability, Philosophy of Science Association Proceedings
1978 (P. Asquith and I. Hacking, eds.), vol. 2, Philosophy of Science Association, East
Lansing (MI), 1981.
1155. , Belief functions and parametric models, Journal of the Royal Statistical So-
ciety B.44 (1982), 322–352.
1156. , The combination of evidence, Tech. report, School of Business, University of
Kansas, Lawrence, KS, Working Paper 162, 1984.
1157. , Conditional probability, International Statistical Review 53 (1985), 261–277.
1158. , Nonadditive probability, Encyclopedia of Statistical Sciences (Kotz and
Johnson, eds.), vol. 6, Wiley, 1985, pp. 271–276.
1159. , The combination of evidence, International Journal of Intelligent Systems 1
(1986), 155–179.
1160. Glenn Shafer, The combination of evidence, International Journal of Intelligent Sys-
tems 1 (1986), no. 3, 155–179.
1161. Glenn Shafer, Belief functions and possibility measures, Analysis of Fuzzy Informa-
tion 1: Mathematics and logic (Bezdek, ed.), CRC Press, 1987, pp. 51–84.
1162. , Probability judgment in artificial intelligence and expert systems, Statistical
Science 2 (1987), 3–44.
1163. , Perspectives on the theory and practice of belief functions, International Jour-
nal of Approximate Reasoning 4 (1990), 323–362.
1164. , Perspectives on the theory and practice of belief functions, International Jour-
nal of Approximate Reasoning 4 (1990), 323–362.
1165. , Perspectives on the theory and practice of belief functions, International Jour-
nal of Approximate Reasoning 4 (1990), no. 5, 323 – 362.
1166. , A note on Dempster’s Gaussian belief functions, Tech. report, School of Busi-
ness, University of Kansas, Lawrence, KS, 1992.
1167. , Rejoinders to comments on ‘perspectives on the theory and practice of belief
functions’, International Journal of Approximate Reasoning 6 (1992), 445–480.
1168. , Comments on "Constructing a logic of plausible inference: a guide to Cox's
theorem", by Kevin S. Van Horn, International Journal of Approximate Reasoning 35
(2004), no. 1, 97–105.
1169. , Probability judgement in artificial intelligence, CoRR abs/1304.3429 (2013).
1170. , Bayes’s two arguments for the rule of conditioning, Annals of Statistics 10:4
(December 1982), 1075–1089.
1171. Glenn Shafer and R. Logan, Implementing Dempster’s rule for hierarchical evidence,
Artificial Intelligence 33 (1987), 271–298.
1172. Glenn Shafer and Prakash P. Shenoy, Propagating belief functions using local compu-
tations, IEEE Expert 1 (1986), no. 3, 43–52.
1173. Glenn Shafer, Prakash P. Shenoy, and K. Mellouli, Propagating belief functions in
qualitative Markov trees, International Journal of Approximate Reasoning 1 (1987),
no. 4, 349–400.
1174. Glenn Shafer and R. Srivastava, The Bayesian and belief-function formalism: A gen-
eral perspective for auditing, Auditing: A Journal of Practice and Theory (1989).
1175. Glenn Shafer and Amos Tversky, Classic works of the dempster-shafer theory of be-
lief functions, ch. Languages and Designs for Probability Judgment, pp. 345–374,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
1176. Glenn Shafer and Vladimir Vovk, Probability and finance: It’s only a game!, Wiley,
New York, 2001.
1177. Gregory Shakhnarovich, Paul Viola, and Trevor Darrell, Fast pose estimation with
parameter-sensitive hashing, ICCV ’03: Proceedings of the Ninth IEEE International
Conference on Computer Vision (Washington, DC, USA), IEEE Computer Society,
2003, p. 750.
1178. L. Shapley, A value for n-person games, Contributions to the Theory of Games.
1179. L.S. Shapley, Cores of convex games, Int. J. Game Theory 1 (1971), 11–26.
1180. Lokendra Shastri, Evidential reasoning in semantic networks: A formal theory and its
parallel implementation (inheritance, categorization, connectionism, knowledge rep-
resentation), Ph.D. thesis, 1985, AAI8528562.
1181. Lokendra Shastri and Jerome A. Feldman, Evidential reasoning in semantic networks:
A formal theory, Proceedings of the 9th International Joint Conference on Artificial
Intelligence - Volume 1 (San Francisco, CA, USA), IJCAI’85, Morgan Kaufmann
Publishers Inc., 1985, pp. 465–474.
1182. P.P. Shenoy, No double counting semantics for conditional independence, Tech. report,
Working Paper No. 307. School of Business, University of Kansas, Lawrence, KS,
2005.
1183. Prakash P. Shenoy, On Spohn's rule for revision of beliefs, International Journal of
Approximate Reasoning 5 (1991), no. 2, 149–181.
1184. Prakash P. Shenoy, Uncertainty in Knowledge Bases: 3rd International Conference on
Information Processing and Management of Uncertainty in Knowledge-Based Systems,
IPMU '90, Paris, France, July 2–6, 1990, Proceedings, ch. On Spohn's theory of epistemic
beliefs, pp. 1–13, Springer Berlin Heidelberg, Berlin, Heidelberg, 1991.
1185. Prakash P. Shenoy, Using Dempster-Shafer’s belief function theory in expert systems,
Advances in the Dempster-Shafer Theory of Evidence (R. R. Yager, M. Fedrizzi, and
J. Kacprzyk, eds.), Wiley, New York, 1994, pp. 395–414.
1186. Prakash P. Shenoy and K. Mellouli, Propagation of belief functions: a distributed ap-
proach, Uncertainty in Artificial Intelligence 2 (Lemmer and Kanal, eds.), North Hol-
land, 1988, pp. 325–336.
1187. Prakash P. Shenoy and Glenn Shafer, An axiomatic framework for Bayesian and belief
function propagation, Proceedings of the AAAI Workshop of Uncertainty in Artificial
Intelligence, 1988, pp. 307–314.
1188. , Axioms for probability and belief functions propagation, Uncertainty in Arti-
ficial Intelligence, 4 (R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, eds.),
North Holland, Amsterdam, 1990, pp. 159–198.
1189. Prakash P. Shenoy, Glenn Shafer, and K. Mellouli, Propagation of belief functions: a
distributed approach, Proceedings of the AAAI Workshop of Uncertainty in Artificial
Intelligence, 1986, pp. 149–160.
1190. F. K. J. Sheridan, A survey of techniques for inference under uncertainty, Artificial
Intelligence Review 5 (1991), 89–119.
1191. C. Shi, Y. Cheng, Q. Pan, and Y. Lu, A new method to determine evidence distance,
Proceedings of the 2010 International Conference on Computational Intelligence and
Software Engineering (CiSE), 2010, pp. 1–4.
1192. Margaret F. Shipley, Charlene A. Dykman, and André de Korvin, Project manage-
ment: using fuzzy logic and the Dempster-Shafer theory of evidence to select team
members for the project duration, Proceedings of IEEE, 1999, pp. 640–644.
1193. Margaret F. Shipley and André de Korvin, Rough set theory fuzzy belief functions
related to statistical confidence: application and evaluation for golf course closing,
Stochastic Analysis and Applications 13 (1995), no. 4, 487–502.
1194. H. Sidenbladh and M.J. Black, Learning the statistics of people in images and video,
IJCV 54 (2003), 189–209.
1195. H. Sidenbladh, M.J. Black, and D.J. Fleet, Stochastic tracking of 3D human figures
using 2d image motion, ECCV’00, 2000.
1196. H. Sidenbladh, F. de la Torre, and M.J. Black, A framework for modeling the ap-
pearance of 3D articulated figures, Int. Conference on Automatic Face and Gesture
Recognition, 2000.
1197. Roman Sikorski, Boolean algebras, Springer Verlag, 1964.
1198. M.-A. Simard, J. Couture, and E. Bosse, Data fusion of multiple sensors attribute infor-
mation for target identity estimation using a Dempster-Shafer evidential combination
algorithm, Proceedings of the SPIE - Signal and Data Processing of Small Targets
(P. G. Anderson and K. Warwick, eds.), vol. 2759, Orlando, FL, USA, 9-11 April 1996,
pp. 577–588.
1218. , The combination of evidence in the transferable belief model, IEEE Tr. PAMI
12 (1990), 447–458.
1219. , Varieties of ignorance, Information Sciences 57-58 (1991), 135–144.
1220. , Belief functions: the disjunctive rule of combination and the generalized
Bayesian theorem, International Journal of Approximate reasoning 9 (1993), 1–35.
1221. , The axiomatic justification of the transferable belief model, Tech. report, Uni-
versite’ Libre de Bruxelles, Technical Report TR/IRIDIA/1995-8.1, 1995.
1222. , Belief functions on real numbers, International Journal of Approximate Rea-
soning 40 (2005), no. 3, 181–223.
1223. Ph. Smets, Decision making in the TBM: the necessity of the pignistic transformation,
International Journal of Approximate Reasoning 38(2) (February 2005), 133–147.
1224. Ph. Smets, The application of the matrix calculus to belief functions, International
Journal of Approximate Reasoning 31(1-2) (October 2002), 1–30.
1225. Philippe Smets, Belief functions : the disjunctive rule of combination and the general-
ized Bayesian theorem, International Journal of Approximate Reasoning 9.
1226. , Theory of evidence and medical diagnostic, Medical Informatics Europe 78
(1978), 285–291.
1227. , Information content of an evidence, International Journal of Man Machine
Studies 19 (1983), 33–43.
1228. , Data fusion in the transferable belief model, Proceedings of the 1984 Amer-
ican Control Conference, 1984, pp. 554–555.
1229. , Bayes’ theorem generalized for belief functions, Proceedings of ECAI-86,
vol. 2, 1986, pp. 169–171.
1230. , Belief functions, Non-Standard Logics for Automated Reasoning (Ph. Smets,
A. Mamdani, D. Dubois, and H. Prade, eds.), Academic Press, London, 1988, pp. 253–
286.
1231. , Belief functions versus probability functions, Uncertainty and Intelligent Sys-
tems (L. Saitta, B. Bouchon, and R. Yager, eds.), Springer Verlag, Berlin, 1988, pp. 17–
24.
1232. , Constructing the pignistic probability function in a context of uncertainty,
Uncertainty in Artificial Intelligence, 5 (M. Henrion, R.D. Shachter, L.N. Kanal, and
J.F. Lemmer, eds.), Elsevier Science Publishers, 1990, pp. 29–39.
1233. , The transferable belief model and possibility theory, Proceedings of
NAFIPS-90 (Kodratoff Y., ed.), 1990, pp. 215–218.
1234. , About updating, Proceedings of the 7th conference on Uncertainty in Arti-
ficial Intelligence (B. D'Ambrosio, Ph. Smets, and P. P. Bonissone, eds.), 1991,
pp. 378–385.
1235. , Patterns of reasoning with belief functions, Journal of Applied Non-Classical
Logic 1:2 (1991), 166–170.
1236. , Probability of provability and belief functions, Logique et Analyse 133-134
(1991), 177–195.
1237. , The transferable belief model and other interpretations of Dempster-Shafer’s
model, Uncertainty in Artificial Intelligence, volume 6 (P.P. Bonissone, M. Henrion,
L.N. Kanal, and J.F. Lemmer, eds.), North-Holland, Amsterdam, 1991, pp. 375–383.
1238. , The nature of the unnormalized beliefs encountered in the transferable belief
model, Proceedings of the 8th Annual Conference on Uncertainty in Artificial Intelli-
gence (UAI-92) (San Mateo, CA), Morgan Kaufmann, 1992, pp. 292–297.
1239. , The nature of the unnormalized beliefs encountered in the transferable be-
lief model, Proceedings of the 8th Conference on Uncertainty in Artificial Intelli-
gence (UAI-92) (D. Dubois, M. P. Wellman, B. D'Ambrosio, and Ph. Smets, eds.), 1992,
pp. 292–297.
1240. , Resolving misunderstandings about belief functions, International Journal
of Approximate Reasoning 6 (1992), 321–344.
1241. , The transferable belief model and random sets, International Journal of In-
telligent Systems 7 (1992), 37–46.
1242. , The transferable belief model for expert judgments and reliability problems,
Reliability Engineering and System Safety 38 (1992), 59–66.
1243. , Jeffrey's rule of conditioning generalized to belief functions, Proceedings of
the 9th Conference on Uncertainty in Artificial Intelligence (UAI-93) (D. Heckerman
and A. Mamdani, eds.), 1993, pp. 500–505.
1244. , Quantifying beliefs by belief functions : An axiomatic justification, Proceed-
ings of the 13th International Joint Conference on Artificial Intelligence, IJCAI93,
1993, pp. 598–603.
1245. , Belief induced by the knowledge of some probabilities, Proceedings of the
10th Conference on Uncertainty in Artificial Intelligence (UAI-94) (R. Lopez de Man-
taras, D. Heckerman, and D. Poole, eds.), 1994, pp. 523–530.
1246. , What is Dempster-Shafer's model?, Advances in the Dempster-Shafer The-
ory of Evidence (R. R. Yager, M. Fedrizzi, and J. Kacprzyk, eds.), Wiley, 1994, pp. 5–34.
1247. , Non standard probabilistic and non probabilistic representations of uncer-
tainty, Advances in Fuzzy Sets Theory and Technology, 3 (Wang P.P., ed.), Duke Uni-
versity, Durham, NC, 1995, pp. 125–154.
1248. , Probability, possibility, belief: which for what?, Foundations and Applica-
tions of Possibility Theory (G. De Cooman, D. Ruan, and E. E. Kerre, eds.), World
Scientific, Singapore, 1995, pp. 20–40.
1249. Philippe Smets, The α-junctions: Combination operators applicable to belief func-
tions, pp. 131–153, Springer Berlin Heidelberg, Berlin, Heidelberg, 1997.
1250. Philippe Smets, The normative representation of quantified beliefs by belief functions,
Artificial Intelligence 92 (1997), 229–242.
1251. , The transferable belief model for uncertainty representation, 1997.
1252. , The application of the transferable belief model to diagnostic problems, Int.
J. Intelligent Systems 13 (1998), 127–158.
1253. , Numerical representation of uncertainty, Handbook of Defeasible Reasoning
and Uncertainty Management Systems, Vol. 3: Belief Change (Gabbay D. and Smets
Ph., series eds.; Dubois D. and Prade H., vol. eds.), Kluwer, Dordrecht, 1998,
pp. 265–309.
1254. , Probability, possibility, belief: Which and where?, Handbook of Defeasible
Reasoning and Uncertainty Management Systems, Vol. 1: Quantified Representation
of Uncertainty and Imprecision (Gabbay D. and Smets Ph., eds.), Kluwer, Dordrecht,
1998, pp. 1–24.
1255. , The transferable belief model for quantified belief representation, Handbook
of Defeasible Reasoning and Uncertainty Management Systems, Vol. 1: Quantified
Representation of Uncertainty and Imprecision (Gabbay D. and Smets Ph., eds.),
Kluwer, Dordrecht, 1998, pp. 267–301.
1256. , Practical uses of belief functions, Uncertainty in Artificial Intelligence 15
(Laskey K. B. and Prade H., eds.), 1999, pp. 612–621.
1257. Philippe Smets, Practical uses of belief functions, Proceedings of the Fifteenth Con-
ference on Uncertainty in Artificial Intelligence (San Francisco, CA, USA), UAI’99,
Morgan Kaufmann Publishers Inc., 1999, pp. 612–621.
1258. Philippe Smets, Quantified epistemic possibility theory seen as an hyper cautious
transferable belief model, 2000.
1259. , Decision making in a context where uncertainty is represented by belief func-
tions, Belief Functions in Business Decisions (Srivastava R., ed.), Physica-Verlag,
2001, pp. 495–504.
1260. , Analyzing the combination of conflicting belief functions, Information Fusion
8 (2007), no. 4, 387 – 412.
1261. , The α-junctions: the commutative combination operators applicable to belief
functions, Proceedings of the International Joint Conference on Qualitative and Quan-
titative Practical Reasoning (ECSQARU / FAPR '97) (D. Gabbay, R. Kruse, A. Nonnen-
gart, and H. J. Ohlbach, eds.), Bad Honnef, Germany, 9-12 June 1997, pp. 131–
153.
1262. , Probability of deductibility and belief functions, Proceedings of the European
Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty
(ECSQARU'93) (M. Clarke, R. Kruse, and S. Moral, eds.), Granada, Spain, 8-10 Nov.
1993, pp. 332–340.
1263. , Upper and lower probability functions versus belief functions, Proceed-
ings of the International Symposium on Fuzzy Systems and Knowledge Engineering,
Guangzhou, China, 1987, pp. 17–21.
1264. , Applying the transferable belief model to diagnostic problems, Proceedings
of 2nd International Workshop on Intelligent Systems and Soft Computing for Nuclear
Science and Industry (D. Ruan, P. D’hondt, P. Govaerts, and E.E. Kerre, eds.), Mol,
Belgium, 25-27 September 1996, pp. 285–292.
1265. , The canonical decomposition of a weighted belief, Proceedings of the Inter-
national Joint Conference on AI, IJCAI95, Montréal, Canada, 1995, pp. 1896–1901.
1266. , The concept of distinct evidence, Proceedings of the 4th Conference on In-
formation Processing and Management of Uncertainty in Knowledge-Based Systems
(IPMU 92), Palma de Mallorca, 6-10 July 1992, pp. 789–794.
1267. , Data fusion in the transferable belief model, Proc. 3rd International Confer-
ence on Information Fusion, Paris, France, 2000, pp. 21–33.
1268. , Transferable belief model versus Bayesian model, Proceedings of ECAI 1988
(Kodratoff Y., ed.), Pitman, London, 1988, pp. 495–500.
1269. , No Dutch Book can be built against the TBM even though update is not ob-
tained by Bayes rule of conditioning, SIS, Workshop on Probabilistic Expert Systems
(R. Scozzafava, ed.), Roma, Italy, 1993, pp. 181–204.
1270. , Belief functions and generalized Bayes theorem, Proceedings of the Second
IFSA Congress, Tokyo, Japan, 1987, pp. 404–407.
1271. Philippe Smets and Roger Cooke, How to derive belief functions within probabilistic
frameworks?, Proceedings of the International Joint Conference on Qualitative and
Quantitative Practical Reasoning (ECSQARU / FAPR ’97), Bad Honnef, Germany,
9-12 June 1997.
1272. Philippe Smets and Y. T. Hsia, Default reasoning and the transferable belief model,
Uncertainty in Artificial Intelligence 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, and
J.F. Lemmer, eds.), Wiley, 1991, pp. 495–504.
1273. Philippe Smets, Y. T. Hsia, Alessandro Saffiotti, R. Kennes, H. Xu, and E. Umkehrer,
The transferable belief model, Symbolic and Quantitative Approaches to Uncertainty
(Kruse R. and Siegel P., eds.), Springer Verlag, Lecture Notes in Computer Science
No. 458, Berlin, 1991, pp. 91–96.
1274. Philippe Smets and Yen-Teh Hsia, Defeasible reasoning with belief functions, Tech.
report, Universite’ Libre de Bruxelles, Technical Report TR/IRIDIA/90-9, 1990.
1275. Philippe Smets and Yen-Teh Hsia, Default reasoning and the transferable belief
model, Proceedings of the Sixth Annual Conference on Uncertainty in Artificial In-
telligence (New York, NY, USA), UAI ’90, Elsevier Science Inc., 1991, pp. 495–504.
1276. Philippe Smets and Robert Kennes, The transferable belief model, Artificial Intelli-
gence 66 (1994), 191–234.
1277. Philippe Smets and R. Kruse, The transferable belief model for belief representation,
Uncertainty Management in information systems: from needs to solutions (Motro A.
and Smets Ph., eds.), Kluwer, Boston, 1997, pp. 343–368.
1278. C. Sminchisescu and B. Triggs, Covariance scaled sampling for monocular 3D body
tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition CVPR’01, Hawaii, December 2001.
1279. Cedric A. B. Smith, Consistency in statistical inference and decision, Journal of the
Royal Statistical Society, Series B 23 (1961), 1–37.
1280. , Personal probability and statistical analysis, Journal of the Royal Statistical
Society, Series A 128 (1965), 469–489.
1281. P. Smith and O.R. Jones, The philosophy of mind: An introduction, Cambridge Uni-
versity Press, 1986.
1282. M. J. Smithson, Ignorance and uncertainty: Emerging paradigms, Springer, New York
(NY), 1989.
1283. Paul Snow, The vulnerability of the transferable belief model to Dutch books, Artificial
Intelligence 105 (1998), 345–354.
1284. Leen-Kit Soh, Costas Tsatsoulis, Todd Bowers, and Andrew Williams, Representing
sea ice knowledge in a Dempster-Shafer belief system, Proceedings of IEEE, 1998,
pp. 2234–2236.
1285. Y. Song, L. Goncalves, E. Di Bernardo, and P. Perona, Monocular perception of biolog-
ical motion - detection and labelling, Int. Conf. on Computer Vision, 1999, pp. 805–
812.
1286. Andrea Sorrentino, Fabio Cuzzolin, and Ruggero Frezza, Using hidden Markov mod-
els and dynamic size functions for gesture recognition, Proceedings of the 8th British
Machine Vision Conference (BMVC97) (Adrian F. Clark, ed.), vol. 2, September
1997, pp. 560–570.
1287. Z. A. Sosnowski and J. S. Walijewski, Generating fuzzy decision rules with the use of
Dempster-Shafer theory, Proceedings of the 13th European Simulation Multiconfer-
ence 1999 (H. Szczerbicka, ed.), vol. 2, Warsaw, Poland, 1-4 June 1999, pp. 419–426.
1288. M. Spies, Conditional events, conditioning, and random sets, IEEE Transactions on
Systems, Man, and Cybernetics 24 (1994), 1755–1763.
1289. R. Spillman, Managing uncertainty with belief functions, AI Expert 5:5 (May 1990),
44–49.
1290. Wolfgang Spohn, Causation in decision, belief change, and statistics: Proceedings
of the irvine conference on probability and causation, ch. Ordinal Conditional Func-
tions: A Dynamic Theory of Epistemic States, pp. 105–134, Springer Netherlands,
Dordrecht, 1988.
1291. , A general non-probabilistic theory of inductive reasoning, Proceedings of
the Fourth Annual Conference on Uncertainty in Artificial Intelligence (Amsterdam,
The Netherlands, The Netherlands), UAI ’88, North-Holland Publishing Co., 1990,
pp. 149–158.
1292. R. P. Srivastava and Glenn Shafer, Integrating statistical and nonstatistical audit ev-
idence using belief functions: a case of variable sampling, International Journal of
Intelligent Systems 9:6 (June 1994), 519–539.
1293. Rajendra P. Srivastava, Alternative form of Dempster's rule for binary variables, Inter-
national Journal of Intelligent Systems 20 (2005), no. 8, 789–797.
1294. T. Starner and A. Pentland, Real-time American Sign Language recognition from video
using HMM, Proc. of ISCV 95, vol. 29, 1997, pp. 213–244.
1295. R. Stein, The Dempster-Shafer theory of evidential reasoning, AI Expert 8:8 (August
1993), 26–31.
1296. R.S. Stephens, Real-time 3D object tracking, Image and Vision Computing 8 (1990),
91–96.
1297. Manfred Stern, Semimodular lattices, Cambridge University Press, 1999.
1298. P. R. Stokke, T. A. Boyce, John D. Lowrance, J. William, and K. Ralston, Evidential
reasoning and project early warning systems, Research and Technology Management
(1994).
1299. , Industrial project monitoring with evidential reasoning, Nordic Advanced
Information Technology Magazine 8 (1994), 18–27.
1300. E. Straszecka, On an application of Dempster-Shafer theory to medical diagnosis sup-
port, Proceedings of the 6th European Congress on Intelligent Techniques and Soft
Computing (EUFIT’98), vol. 3, Aachen, Germany: Verlag Mainz, 1998, pp. 1848–
1852.
1301. Thomas M. Strat, The generation of explanations within evidential reasoning systems,
Proceedings of the Tenth Joint Conference on Artificial Intelligence (Institute of Elec-
trical and Electronics Engineers, eds.), 1987, pp. 1097–1104.
1302. , Making decisions with belief functions, Proceedings of the 5th Workshop on
Uncertainty in AI, 1989, pp. 351–360.
1303. , Decision analysis using belief functions, International Journal of Approxi-
mate Reasoning 4 (1990), 391–417.
1304. , Making decisions with belief functions, Uncertainty in Artificial Intelligence,
5 (L. N. Kanal M. Henrion, R. D. Schachter and J. F. Lemmers, eds.), North Holland,
Amsterdam, 1990.
1305. , Decision analysis using belief functions, Advances in the Dempster-Shafer
Theory of Evidence, Wiley, New York, 1994.
1306. , Continuous belief functions for evidential reasoning, Proceedings of the Na-
tional Conference on Artificial Intelligence (Institute of Electrical and Electronics
Engineers, eds.), August 1984, pp. 308–313.
1307. Thomas M. Strat and John D. Lowrance, Explaining evidential analyses, International
Journal of Approximate Reasoning 3 (1989), no. 4, 299 – 353.
1308. , Explaining evidential analysis, International Journal of Approximate Rea-
soning 3 (1989), 299–353.
1309. R. L. Streit, The moments of matched and mismatched hidden Markov models, IEEE
Trans. on Acoustics, Speech, and Signal Processing Vol. 38(4) (April 1990), 610–622.
1310. Xiaoyan Su, Sankaran Mahadevan, Wenhua Han, and Yong Deng, Combining depen-
dent bodies of evidence, Applied Intelligence 44 (2016), no. 3, 634–644.
1311. J. J. Sudano, Pignistic probability transforms for mixes of low- and high-probability
events, Proceedings of the International Conference on Information Fusion, 2001.
1312. , Inverse pignistic probability transforms, Proceedings of the International
Conference on Information Fusion, 2002.
1313. J.J. Sudano, Pignistic probability transforms for mixes of low- and high-probability
events, Proceedings of the Fourth International Conference on Information Fusion
(ISIF’01), Montreal, Canada, 2001, pp. 23–27.
1314. , Equivalence between belief theories and naive Bayesian fusion for systems
with independent evidential data, Proceedings of the Sixth International Conference
on Information Fusion (ISIF’03), 2003.
1315. Thomas Sudkamp, The consistency of Dempster-Shafer updating, International Jour-
nal of Approximate Reasoning 7 (1992), 19–44.
1316. M. Sugeno, Fuzzy automata and decision processes, ch. Fuzzy measures and fuzzy
integrals: A survey, pp. 89–102, North-Holland, Amsterdam, 1977.
1317. Michio Sugeno, Yasuo Narukawa, and Toshiaki Murofushi, Choquet integral and fuzzy
measures on locally compact space, Fuzzy Sets and Systems 99 (1998), no. 2, 205 –
211.
1318. H. Sun and M. Farooq, Conjunctive and disjunctive combination rules of evidence,
Signal Processing, Sensor Fusion, and Target Recognition XIII (I. Kadar, ed.), Proceed-
ings of the SPIE, vol. 5429, August 2004, pp. 392–401.
1319. P. Suppes and M. Zanotti, On using random relations to generate upper and lower
probabilities, Synthese 36 (1977), 427–440.
1320. S.Y. Zhang, Q. Pang, and H.C. Zhang, A new kind of combination rule of evidence
theory, Control and Decision 15 (2000), 540–544.
1321. Gábor Szász, Introduction to lattice theory, Academic Press, New York and London,
1963.
1322. R. Szeliski and S.B. Kang, Recovering 3D shape and motion from image streams using
nonlinear least squares, J. Vis. Comm. Im. Repr. 5 (1994), 10–28.
1323. T. Ali, P. Dutta, and H. Boruah, A new combination rule for conflict problem of
Dempster-Shafer evidence theory, International Journal of Energy, Information and
Communications 3.
1324. T. Darrell, G. Gordon, M. Harville, and J. Woodfill, Integrated person tracking using
stereo, color, and pattern detection, CVPR’98, 1998, pp. 601–608.
1325. Hideo Tanaka and Hisao Ishibuchi, Evidence theory of exponential possibility distri-
butions, International Journal of Approximate Reasoning 8 (1993), no. 2, 123 – 140.
1326. Hideo Tanaka, Kazutomi Sugihara, and Yutaka Maeda, Non-additive measures by in-
terval probability functions, Inf. Sci. Inf. Comput. Sci. 164 (2004), no. 1-4, 209–227.
1327. Y. Tang and J. Zheng, Dempster conditioning and conditional independence in evi-
dence theory, AI 2005: Advance in Artificial Intelligence, vol. 3809/2005, Springer
Berlin/Heidelberg, 2005, pp. 822–825.
1328. H. Tao, H.S. Sawhney, and R. Kumar, Dynamic layer representation with applications
to tracking, CVPR’00, vol. 2, 2000, pp. 134–141.
1329. A. Tchamova and J. Dezert, On the behavior of Dempster's rule of combination and
the foundations of Dempster-Shafer theory, 2012 6th IEEE International Conference
Intelligent Systems, Sept 2012, pp. 108–113.
1330. O. Teichmüller, p-Algebren, Deutsche Math. 1 (1936), 362–388.
1331. P. Teller, Conditionalization and observation, Synthese 26 (1973), no. 2, 218–258.
1332. D. Terzopoulos and D. Metaxas, Dynamic 3D models with local and global deforma-
tions: Deformable superquadrics, IEEE Trans. Pattern Analysis and Machine Intelli-
gence 13 (1991), 703–714.
1333. B. Tessem, Interval probability propagation, IJAR 7 (1992), 95–120.
1334. , Approximations for efficient computation in the theory of evidence, Artif.
Intell 61 (1993), no. 2, 315–329.
1335. Bjornar Tessem, Approximations for efficient computation in the theory of evidence,
Artificial Intelligence 61:2 (1993), 315–329.
1336. H. M. Thoma, Belief function computations, Conditional Logic in Expert Systems,
North Holland, 1991, pp. 269–308.
1337. Stelios C. Thomopoulos, Theories in distributed decision fusion: comparison and gen-
eralization, 1991, pp. 623–634.
1338. Sebastian Thrun, Wolfgang Burgard, and Dieter Fox, A probabilistic approach to con-
current mapping and localization for mobile robots, Autonomous Robots 5 (1998),
253–271.
1339. Tai-Peng Tian, Rui Li, and Stan Sclaroff, Articulated pose estimation in a learned
smooth space of feasible solutions, CVPR ’05: Proceedings of the 2005 IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition (CVPR’05) -
Workshops (Washington, DC, USA), IEEE Computer Society, 2005, p. 50.
1340. M. Tonko, K. Schafer, F. Heimes, and H.-H. Nagel, Towards visual servoed manip-
ulation of car engine parts, Proceedings of the IEEE International Conference on
Robotics and Automation ICRA’97, Albuquerque, NM, vol. 4, April 1997, pp. 3166–
3171.
1341. Bruce E. Tonn, An algorithmic approach to combining belief functions, International
Journal of Intelligent Systems 11 (1996), no. 7, 463–476.
1342. Vicenç Torra, A new combination function in evidence theory, International Journal of
Intelligent Systems 10 (1995), no. 12, 1021–1033.
1343. M. Troffaes, Decision making under uncertainty using imprecise probabilities, Inter-
national Journal of Approximate Reasoning 45 (2007), no. 1, 17–29.
1344. Elena Tsiporkova, Bernard De Baets, and Veselka Boeva, Dempster's rule of condi-
tioning translated into modal logic, Fuzzy Sets and Systems 102 (1999), 317–383.
1345. , Evidence theory in multivalued models of modal logic, Journal of Applica-
tions of Nonclassical Logic (1999).
1346. Elena Tsiporkova, Veselka Boeva, and Bernard De Baets, Dempster-Shafer theory
framed in modal logic, International Journal of Approximate Reasoning 21 (1999),
157–175.
1347. , Dempster-Shafer theory framed in modal logic, International Journal of Ap-
proximate Reasoning 21 (1999), no. 2, 157–175.
1348. Simukai W. Utete, Billur Barshan, and Birsel Ayrulu, Voting as validation in robot
programming, International Journal of Robotics Research 18 (1999), 401–413.
1349. R. Vaillant and O. Faugeras, Using extremal boundaries for 3-d object modeling, IEEE
Trans. PAMI 14 (February 1992), 157–173.
1350. Vakili, Approximation of hints, Tech. report, Institute for Automation and Operations
Research, University of Fribourg, Switzerland, Tech. Report 209, 1993.
1351. P. Valin, P. Djiknavorian, and E. Bosse, A pragmatic approach for the use of Dempster-
Shafer theory in fusing realistic sensor data, Tech. report, DRDC-VALCARTIER-SL-
2010-457, Defence R&D Canada - Valcartier, 2010.
1352. B. L. van der Waerden, Moderne algebra, vol. 1, Springer-Verlag, Berlin, 1937.
1353. P. Vasseur, C. Pegard, E. Mouaddib, and L. Delahoche, Perceptual organization ap-
proach based on Dempster-Shafer theory, Pattern Recognition 32 (1999), 1449–1462.
1354. G. Verghese, K. Gale, and C.R. Dyer, Real-time, parallel motion tracking of three-
dimensional objects from spatiotemporal image sequences, Parallel Algorithms for
Machine Intelligence and Vision (Kumar et al., ed.), Springer-Verlag, 1990.
1355. Christian Viard-Gaudin, Pierre Michel Lallican, Philippe Binter, and Stefan Knerr, The
IRESTE On/Off (IRONOFF) dual handwriting database, International Conference on Docu-
ment Analysis and Recognition 0 (1999), 455–458.
1356. M. Vincze, M. Ayromlou, and W. Kubinger, An integrating framework for robust real-
time 3D object tracking, ICVS’99, 1999, pp. 135–150.
1357. J. von Neumann and O. Morgenstern, Theory of games and economic behavior,
Princeton University Press, 1944.
1358. F. Voorbraak, A computationally efficient approximation of Dempster-Shafer theory,
International Journal of Man-Machine Studies 30 (1989), 525–536.
1359. , On the justification of Dempster’s rule of combination, Artificial Intelligence
48 (1991), 171–197.
1360. Frans Voorbraak, On the justification of dempster’s rule of combination, Artificial In-
telligence 48 (1991), no. 2, 171 – 197.
1361. F. Voorbraak, A computationally efficient approximation of Dempster-Shafer theory,
International Journal of Man-Machine Studies 30 (1989), 525–536.
1362. S. Wachter and H.H. Nagel, Tracking persons in monocular image sequences, Work-
shop on Motion of Non-Rigid and Articulated Objects, Puerto Rico, USA, 1997.
1363. , Tracking persons in monocular image sequences, CVIU 74 (1999), 174–192.
1364. Peter P. Wakker, Dempster belief functions are based on the principle of complete ig-
norance, International Journal of Uncertainty, Fuzziness and Knowledge-Based Sys-
tems 08 (2000), no. 03, 271–284.
1365. , Dempster-belief functions are based on the principle of complete ignorance,
Proceedings of the 1st International Symposium on Imprecise Probabilities and Their
Applications, Ghent, Belgium, 29 June - 2 July 1999, pp. 535–542.
1366. P. Walley, Statistical reasoning with imprecise probabilities, Chapman and Hall, New
York, 1991.
1367. Peter Walley, Coherent lower (and upper) probabilities, Tech. report, University of
Warwick, Coventry (U.K.), Statistics Research Report 22, 1981.
1368. , The elicitation and aggregation of beliefs, Tech. report, University of War-
wick, Coventry (U.K.), 1982, Statistics Research Report 23.
1369. , The elicitation and aggregation of beliefs, Tech. report, University of War-
wick, Coventry (U.K.), Statistics Research Report 23, 1982.
1370. , Belief function representations of statistical evidence, The Annals of Statis-
tics 15 (1987), 1439–1465.
1371. , Statistical reasoning with imprecise probabilities, Chapman and Hall, Lon-
don, 1991.
1372. , Measures of uncertainty in expert systems, Artificial Intelligence 83 (1996),
1–58.
1373. , Imprecise probabilities, The Encyclopedia of Statistical Sciences (C. B.
Read, D. L. Banks, and S. Kotz, eds.), Wiley, New York (NY), 1997.
1374. , Towards a unified theory of imprecise probability, International Journal of
Approximate Reasoning 24 (2000), 125–148.
1375. Peter Walley and T. L. Fine, Towards a frequentist theory of upper and lower proba-
bility, The Annals of Statistics 10 (1982), 741–761.
1376. A. Wallner, Maximal number of vertices of polytopes defined by f-probabilities,
ISIPTA 2005 – Proceedings of the Fourth International Symposium on Imprecise
Probabilities and Their Applications (F. G. Cozman, R. Nau, and T. Seidenfeld, eds.),
SIPTA, 2005, pp. 126–139.
1377. C. C. Wang and H. S. Don, Evidential reasoning using neural networks, Proceedings
of the 1991 IEEE International Joint Conference on Neural Networks, November 1991,
pp. 497–502, vol. 1.
1378. Chua-Chin Wang and Hon-Son Don, A continuous belief function model for evidential
reasoning, Proceedings of the Ninth Biennial Conference of the Canadian Society for
Computational Studies of Intelligence (J. Glasgow and R. F. Hadley, eds.), Vancouver, BC,
Canada, 11-15 May 1992, pp. 113–120.
1379. Chua-Chin Wang and Hon-Son Don, Evidential reasoning using neural networks, Pro-
ceedings of IEEE, 1991, pp. 497–502.
1380. , A geometrical approach to evidential reasoning, Proceedings of IEEE, 1991,
pp. 1847–1852.
1381. , The majority theorem of centralized multiple BAMs networks, Information
Sciences 110 (1998), 179–193.
1382. , A robust continuous model for evidential reasoning, Journal of Intelligent
and Robotic Systems: Theory and Applications 10:2 (June 1994), 147–171.
1383. J. Wang, G. Lorette, and P. Bouthemy, Analysis of human motion: A model-based
approach, Scandinavian Conference on Image Analysis, 1991.
1384. , Human motion analysis with detection of sub-part deformations, SPIE -
Biomedical Image Processing and Three-Dimensional Microscopy, 1992.
1385. P. Wang, The reliable combination rule of evidence in Dempster-Shafer theory, Pro-
ceedings of the 2008 Congress on Image and Signal Processing (CISP '08), vol. 2, May
2008, pp. 166–170.
1386. Pei Wang, A defect in Dempster-Shafer theory, CoRR abs/1302.6849 (2013).
1387. S. Wang and M. Valtorta, On the exponential growth rate of Dempster-Shafer be-
lief functions, Proceedings of the SPIE - Applications of Artificial Intelligence X:
Knowledge-Based Systems, vol. 1707, Orlando, FL, USA, 22-24 April 1992, pp. 15–
24.
1388. Ying-Ming Wang, Jian-Bo Yang, Dong-Ling Xu, and Kwai-Sang Chin, On the com-
bination and normalization of interval-valued belief structures, Information Sciences
177 (2007), no. 5, 1230 – 1247, Including: The 3rd International Workshop on Com-
putational Intelligence in Economics and Finance (CIEF2003).
1389. Z. Wang and G.J. Klir, Fuzzy measure theory, New York: Plenum Press, 1992.
1390. Zhenyuan Wang and George J. Klir, Choquet integrals and natural extensions of lower
probabilities, International Journal of Approximate Reasoning 16 (1997), 137–147.
1391. , Choquet integrals and natural extensions of lower probabilities, International
Journal of Approximate Reasoning 16 (1997), no. 2, 137 – 147.
1392. Chua-Chin Wang and Hon-Son Don, A polar model for evidential reasoning, Infor-
mation Sciences 77:3-4 (March 1994), 195–226.
1393. L. A. Wasserman, Belief functions and statistical inference, Canadian Journal of
Statistics 18 (1990), 183–196.
1394. , Comments on Shafer's 'Perspectives on the theory and practice of belief func-
tions', International Journal of Approximate Reasoning 6 (1992), 367–375.
1395. L.A. Wasserman, Prior envelopes based on belief functions, Annals of Statistics 18
(1990), 454–464.
1396. J. Watada, Y. Kubo, and K. Kuroda, Logical approach: to evidential reasoning under
a hierarchical structure, Proceedings of the International Conference on Data and
Knowledge Systems for Manufacturing and Engineering, vol. 1, Hong Kong, 2-4 May
1994, pp. 285–290.
1397. M. Weber, M. Welling, and P. Perona, Unsupervised learning of models for recog-
nition, Proc. of the 6th European Conference on Computer Vision, vol. 1, June/July
2000, pp. 18–32.
1398. , Unsupervised learning of models for recognition, Proc. of the 6th European
Conference on Computer Vision, vol. 1, June/July 2000, pp. 18–32.
1438. W. Z. Wu, Y. Leung, and J. S. Mi, On generalized fuzzy belief functions in infinite
spaces, IEEE Transactions on Fuzzy Systems 17 (2009), no. 2, 385–397.
1439. Wei-Zhi Wu, Yee Leung, and Wen-Xiu Zhang, Connections between rough set theory
and Dempster-Shafer theory of evidence, International Journal of General Systems 31
(2002), no. 4, 405–430.
1440. Wei-Zhi Wu, Mei Zhang, Huai-Zu Li, and Ju-Sheng Mi, Knowledge reduction in ran-
dom information systems via Dempster-Shafer theory of evidence, Information Sciences
174 (2005), no. 3-4, 143–164.
1441. Weizhi Wu and Jusheng Mi, Rough Sets and Knowledge Technology: First International
Conference, RSKT 2006, Chongqing, China, July 24-26, 2006, Proceedings, ch. Knowl-
edge Reduction in Incomplete Information Systems Based on Dempster-Shafer The-
ory of Evidence, pp. 254–261, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.
1442. Yong-Ge Wu, Jing-Yu Yang, Ke Liu, and Lei-Jian Liu, On the evidence inference the-
ory, Information Sciences 89 (1996), no. 3, 245 – 260.
1443. P. Wunsch, S. Winkler, and G. Hirzinger, Real-time pose estimation of 3D objects from
camera images using neural networks, ICRA’97, vol. 3, 1997, pp. 3232–3237.
1444. X. Wu, Q. Ye, and L. Liu, Dempster-Shafer theory of evidence based on improved BP
and its application, Journal of Wuhan University of Technology (2007).
1445. Yan Xia, S.S. Iyengar, and N.E. Brener, An event driven integration reasoning scheme
for handling dynamic threats in an unstructured environment, Artificial Intelligence
95 (1997), 169–186.
1446. Xin Guan, Xiaoming Sun, Xiao Yi, and You He, Efficient fusion approach for conflicting evidence, Journal of Tsinghua University (Science and Technology) 1 (2009).
1447. Guoping Xu, Weifeng Tian, Li Qian, and Xiangfen Zhang, A novel conflict reassignment method based on grey relational analysis (GRA), Pattern Recognition Letters 28 (2007), no. 15, 2080–2087.
1448. H. Xu, An efficient implementation of the belief function propagation, Proc. of the 7th Uncertainty in Artificial Intelligence (B. D. D'Ambrosio, Ph. Smets, and P. P. Bonissone, eds.), 1991, pp. 425–432.
1449. , An efficient tool for reasoning with belief functions, Proc. of the 4th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, 1992, pp. 65–68.
1450. , An efficient tool for reasoning with belief functions, Uncertainty in Intelligent Systems (B. Bouchon-Meunier, L. Valverde, and R. R. Yager, eds.), North-Holland: Elsevier Science, 1993, pp. 215–224.
1451. , Computing marginals from the marginal representation in Markov trees, Proc. of the 5th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, 1994, pp. 275–280.
1452. , Computing marginals from the marginal representation in Markov trees, Ar-
tificial Intelligence 74 (1995), 177–189.
1453. H. Xu and R. Kennes, Steps towards an efficient implementation of Dempster-
Shafer theory, Advances in the Dempster-Shafer Theory of Evidence (R.R. Yager,
M. Fedrizzi, and J. Kacprzyk, eds.), John Wiley and Sons, Inc., 1994, pp. 153–174.
1454. H. Xu and Philippe Smets, Evidential reasoning with conditional belief functions, Proceedings of the 10th Uncertainty in Artificial Intelligence (R. Lopez de Mantaras and D. Poole, eds.), 1994, pp. 598–605.
1455. , Generating explanations for evidential reasoning, Proceedings of the 11th Uncertainty in Artificial Intelligence (Ph. Besnard and S. Hanks, eds.), 1995, pp. 574–581.
1500. B. Ben Yaghlane, Philippe Smets, and K. Mellouli, Independence concepts for be-
lief functions, Proceedings of Information Processing and Management of Uncertainty
(IPMU’2000), 2000.
1501. Koichi Yamada, A new combination of evidence based on compromise, Fuzzy Sets and Systems 159 (2008), no. 13, 1689–1708.
1502. M. Yamamoto and K. Koshikawa, Human motion analysis based on a robot arm
model, CVPR’91, 1991, pp. 664–665.
1503. M. Yamamoto, Y. Ohta, T. Yamagiwa, and K. Yamanaka, Human action tracking
guided by key-frames, Fourth Int. Conf. on Automatic Face and Gesture Recognition,
Grenoble, France, March 2000.
1504. M. Yamamoto, A. Sato, S. Kawada, T. Kondo, and Y. Osaki, Incremental tracking
of human actions from multiple views, Proceedings of the Conference on Computer
Vision and Pattern Recognition CVPR’98, Santa Barbara, CA, June 1998, pp. 2–7.
1505. S. Yamamoto, Y. Mae, Y. Shirai, and J. Miura, Realtime multiple object tracking based
on optical flows, Proc. Robotics and Automation, vol. 3, 1995, pp. 2328–2333.
1506. Jian-Bo Yang and Madan G. Singh, An evidential reasoning approach for multiple-
attribute decision making with uncertainty, IEEE Transactions on Systems, Man, and
Cybernetics 24:1 (January 1994), 1–18.
1507. Jian-Bo Yang and Dong-Ling Xu, Evidential reasoning rule for evidence combination, Artificial Intelligence 205 (2013), 1–29.
1508. Miin-Shen Yang, Tsang-Chih Chen, and Kuo-Lung Wu, Generalized belief function, plausibility function, and Dempster's combinational rule to fuzzy sets, International Journal of Intelligent Systems 18 (2003), no. 8, 925–937.
1509. Y. Y. Yao, Two views of the theory of rough sets in finite universes, International Jour-
nal of Approximate Reasoning 15 (1996), 291–317.
1510. , A comparative study of fuzzy sets and rough sets, Information Sciences
109(1-4) (1998), 227–242.
1511. , Granular computing: basic issues and possible solutions, Proceedings of the
5th Joint Conference on Information Sciences, 2000, pp. 186–189.
1512. Y. Y. Yao and P. J. Lingras, Interpretations of belief functions in the theory of rough
sets, Information Sciences 104(1-2) (1998), 81–106.
1513. Yan-Qing Yao, Ju-Sheng Mi, and Zhou-Jun Li, Attribute reduction based on generalized fuzzy evidence theory in fuzzy decision systems, Fuzzy Sets and Systems 170 (2011), no. 1, 64–75, Theme: Information processing.
1514. Yiyu (Y. Y.) Yao, Churn-Jung Liau, and Ning Zhong, Foundations of Intelligent Systems: 14th International Symposium, ISMIS 2003, Maebashi City, Japan, October 28-31, 2003, Proceedings, ch. Granular Computing Based on Rough Sets, Quotient Space Theory, and Belief Functions, pp. 152–159, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
1516. J. Yen, GERTIS: a Dempster-Shafer approach to diagnosing hierarchical hypotheses, Communications of the ACM 32 (1989), 573–585.
1517. , Generalizing the Dempster-Shafer theory to fuzzy sets, IEEE Transactions on
Systems, Man, and Cybernetics 20:3 (1990), 559–569.
1518. John Yen, A reasoning model based on an extended Dempster-Shafer theory, Proceedings of the Fifth AAAI National Conference on Artificial Intelligence, AAAI'86, AAAI Press, 1986, pp. 125–131.
1519. John Yen, Computing generalized belief functions for continuous fuzzy sets, Interna-
tional Journal of Approximate Reasoning 6 (1992), 1–31.