
Artificial Intelligence, VII SEM, CSE, Module -III

MODULE III

Chapter 7

SYMBOLIC REASONING UNDER UNCERTAINTY


In this chapter and the next, we explore and discuss techniques for solving problems with incomplete
and uncertain models.

What is Reasoning?

 Reasoning is the act of deriving a conclusion from certain premises using a given
methodology.
 Reasoning is a process of thinking, logically arguing and drawing inferences.

 UNCERTAINTY IN REASONING

 The world is an uncertain place; knowledge is often imperfect, and this causes uncertainty.
Reasoning must therefore be able to operate under uncertainty.
 Uncertainty is a major problem in knowledge elicitation, especially when the expert's
knowledge must be quantified in rules.
 Uncertainty may cause bad treatment in medicine or loss of money in business.

 INTRODUCTION TO NONMONOTONIC REASONING

 Non-Monotonic Logic/Reasoning: A non-monotonic logic is a formal logic whose
consequence relation is not monotonic. A logic is non-monotonic if the truth of a proposition
may change when new information (axioms) is added; i.e., non-monotonic logic allows a
statement to be retracted (taken back). It is also used to formalize plausible (believable)
reasoning.
 Example 1: Birds typically fly.
Tweety is a bird.
Tweety flies (most probably).
- Conclusion of non-monotonic argument may not be correct.
 Example 2 with Ref. to Example 1
If Tweety is a penguin, it is incorrect to conclude that Tweety flies.
- All non-monotonic reasoning is concerned with consistency. Inconsistency is resolved by
removing the relevant conclusion(s) derived by default rules.

 The techniques that can be used to reason effectively even when a complete, consistent, and
constant model of the world is not available are discussed here. One of the examples, which
we call the ABC Murder story, clearly illustrates many of the main issues these techniques
must deal with.
 Let Abbott, Babbitt, and Cabot be suspects in a murder case. Abbott has an alibi (explanation/
defense): his name is in the register of a respectable hotel in Albany. Babbitt also has an alibi,
for his brother-in-law testified that Babbitt was visiting him in Brooklyn at the time. Cabot
pleads alibi too, claiming to have been watching a ski meet in the Catskills, but we have only
his word for that. So we believe:
1. That Abbott did not commit the crime.
2. That Babbitt did not commit the crime.
3. That Abbott or Babbitt or Cabot did.
 But presently Cabot documents his alibi: he had the good luck to have been caught by
television on the sidelines at the ski meet. A new belief is thus thrust upon us:
4. That Cabot did not.
 Our beliefs (1) through (4) are inconsistent, so we must choose one for rejection. Which has the
weakest evidence? The basis for (1) in the hotel register is good, since it is a fine old hotel. The
basis for (2) is weaker, since Babbitt’s brother-in-law might be lying. The basis for (3) is
perhaps twofold: that there is no sign of burglary (robbery) and that only Abbott, Babbitt, and
Cabot seem to have stood to gain from the murder apart from burglary. This exclusion of
burglary seems conclusive, but the other consideration does not. There could be some fourth
beneficiary. For (4), finally, the basis is conclusive: the evidence from television. Thus (2) and
(3) are the weak points. To resolve the inconsistency of (1) through (4) we should reject (2) or
(3), thus either incriminating Babbitt or widening our net for some new suspect.
 See also how the revision progresses downward. If we reject (2), we also revise our previous
underlying belief, however tentative, that the brother-in-law was telling the truth and Babbitt
was in Brooklyn. If instead we reject (3), we also revise our previous underlying belief that
none but Abbott, Babbitt, and Cabot stood to gain from the murder apart from burglary.
 Finally, certain arbitrariness should be noted in the organization of this analysis. The
inconsistent beliefs (1) through (4) were singled out, and then various further beliefs were
accorded a subordinate status as underlying evidence: a belief about a hotel register, a belief
about the prestige of the hotel, a belief about the television, a perhaps unwarranted belief
about the veracity of the brother-in-law, and so on.
 The strategy illustrated would seem in general to be a good one: divide and conquer. In
probing the evidence, where do we stop? In probing the evidence for (1) through (4) we turned
up various underlying beliefs, but we could have probed further, seeking evidence in turn for
them.
 This story illustrates some of the problems posed by uncertain, fuzzy, and often changing
knowledge. A variety of logical frameworks and computational methods have been proposed
for handling such problems.

In this chapter and the next, we discuss two approaches:

 Nonmonotonic reasoning, in which the axioms and/or the rules of inference are extended to
make it possible to reason with incomplete information. These systems preserve the property
that, at any given moment, a statement is either believed to be true, believed to be false, or
not believed to be either.
 Statistical reasoning in which the representation is extended to allow some kind of numeric
measure of certainty (not simply TRUE or FALSE) to be associated with each statement.
 Conventional reasoning systems, such as first-order predicate logic, are designed to work
with information that has three important properties:
- It is complete with respect to the domain of interest. In other words, all the facts that are
necessary to solve a problem are present in the system or can be derived from existing facts
with rules of first-order logic.
- It is consistent.
- The new facts can be added as they become available. If these new facts are consistent with
all the other facts that have already been asserted, then nothing will ever be retracted (taken
back) from the set of facts that are known to be true. This property is called monotonicity. In
other words, logic is monotonic if the truth of a proposition does not change when new
information (axioms) is added. The traditional logic like FOPL is monotonic.
 If any of these above properties is not satisfied, conventional logic-based reasoning systems
become inadequate. Then Nonmonotonic reasoning systems are designed to handle such
problems in which all of these properties may be missing.

In order to do this, we must address several key issues, including the following:

1. How can the knowledge base be extended to allow inferences to be made on the basis of lack
of knowledge as well as on the presence of it?
2. How can the knowledge base be updated properly when a new fact is added to the system (or
when an old one is removed)?
The usual solution to this problem is to keep track of proofs, which are often called justifications.
3. How can knowledge be used to help resolve conflicts when there are several inconsistent
nonmonotonic inferences that could be drawn?

To do this, we require additional methods for resolving such conflicts in ways that are most appropriate
for the particular problem that is being solved.

 LOGICS FOR NONMONOTONIC REASONING

Because monotonicity is fundamental to the definition of first-order predicate logic, we are
forced to find some alternative to support nonmonotonic reasoning. We examine several, because
no single formalism with all the desired properties has yet emerged. In particular, we would like to
find a formalism that does all of the following things:
 Defines the set of possible worlds that could exist, given the facts that we do have.
 Provides a way to say that we prefer to believe in some models rather than others.
 Provides the basis for a practical implementation of this kind of reasoning.
 Corresponds to our intuitions about how this kind of reasoning works.

- Logics for NMR (Nonmonotonic reasoning) Systems


Default Reasoning
1. Non-monotonic Logic
2. Default Logic
3. Abduction
4. Inheritance
Minimalist Reasoning
1. The Closed world assumption
2. Circumscription

- Models and Interpretations


An interpretation of a set of wff's consists of:
1. A domain (D)
2. A function (f) that assigns
- to each predicate, a relation
- to each n-ary function, an operator that maps from D^n into D
- to each constant, an element of D
- A model of a set of wff's is an interpretation that satisfies them.

 Models, Wff’s, and the importance of Non-monotonic Reasoning


A model of a set of wff’s is an interpretation that satisfies them.
Consider Figure below, which shows one way of visualizing how nonmonotonic reasoning
works.

 Default Reasoning
 This is a very common form of non-monotonic reasoning. Conclusions are drawn based
on what is most likely to be true. There are two logic-based approaches to default reasoning:
non-monotonic logic and default logic.

1. Nonmonotonic Logic
- Provides a basis for default reasoning.
- It has already been defined: the truth of a proposition may change when new
information (axioms) is added, and the logic may be built to allow the statement to be
retracted.
- Non-monotonic logic is predicate logic with one extension, the modal operator M,
which means "consistent with everything we know". The purpose of M is to allow
consistency checking; i.e., FOPL is augmented with a modal operator M that can be read
as "is consistent".
- Here the rules are wff's.
- A way to define consistency, in the style of PROLOG's negation as failure, is:
To show that M P holds, we attempt to prove ¬P.
If we fail, we may say that P is consistent, since ¬P is not derivable.
Example 1:
∀x, y: Related(x, y) ∧ M GetAlong(x, y) → WillDefend(x, y)

This should be read as "For all x and y, if x and y are related and if the fact that x gets along
with y is consistent with everything else that is believed, then conclude that x will
defend y".

∀x: plays_instrument(x) ∧ M manage(x) → jazz_musician(x)

This states that "For all x, if x plays an instrument and if the fact that x can manage is consistent
with all other knowledge, then we can conclude that x is a jazz musician".
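As an illustrative aside (not from the text), the consistency check behind M can be sketched in Python as negation-as-failure over a finite fact base; the fact strings below are hypothetical:

# Sketch: M P holds if the negation of P is absent from the fact base.
facts = {"bird(Tweety)"}

def consistent(p):
    # M p: the negation of p is not known
    return "not " + p not in facts

def flies(x):
    # Default: birds fly, unless flying is inconsistent with what we know.
    return "bird(" + x + ")" in facts and consistent("flies(" + x + ")")

print(flies("Tweety"))           # True: nothing contradicts flying
facts.add("not flies(Tweety)")   # learn that Tweety is a penguin
print(flies("Tweety"))           # False: the conclusion is retracted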
Example 2:
A second problem that arises in this approach is what to do when multiple nonmonotonic
statements taken together would be inconsistent. For example, consider the following set of
assertions: A Quaker and a Republican Example

Quakers are pacifists (Peace lover). Republicans are not pacifists. Richard is a republican
and a Quaker. Is he a pacifist?
These rules are ambiguous. Let us clarify:
Only a typical quaker is a pacifist. Only a typical republican is not a pacifist.
This can be expressed in terms of consistency:
∀x: Quaker(x) ∧ CONSISTENT(Pacifist(x)) → Pacifist(x) OR
∀x: Quaker(x) ∧ M Pacifist(x) → Pacifist(x)

∀x: Republican(x) ∧ CONSISTENT(¬Pacifist(x)) → ¬Pacifist(x) OR
∀x: Republican(x) ∧ M ¬Pacifist(x) → ¬Pacifist(x)
If we apply the first rule to Richard, we find he is a pacifist (nothing contradicts this
conclusion), but then the second rule cannot be used and vice versa. In effect, neither
pacifist(x) nor ¬pacifist (x) can be proven.
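A small Python sketch (names and encoding hypothetical) shows how the two defaults block each other, so the conclusion drawn depends on which rule fires first:

beliefs = {"quaker(Richard)", "republican(Richard)"}

def negate(p):
    return p[4:] if p.startswith("not ") else "not " + p

def consistent(p):
    return negate(p) not in beliefs

defaults = [
    ("quaker(Richard)", "pacifist(Richard)"),         # Quakers are pacifists
    ("republican(Richard)", "not pacifist(Richard)"), # Republicans are not
]
for prerequisite, conclusion in defaults:
    if prerequisite in beliefs and consistent(conclusion):
        beliefs.add(conclusion)

print("pacifist(Richard)" in beliefs)  # True in this order; reversing the
                                       # list of defaults yields the opposite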


2. Default Logic
- Default logic introduces a new inference rule:
A : B
C
where A is known as the prerequisite, B as the justification, and C as the consequent.
Read the above inference rule as: "if A is provable and if it is consistent to assume B, then
conclude C". The rule says that given the prerequisite, the consequent can be inferred,
provided its justification is consistent with the rest of the data.
- Example: The rule that "birds typically fly" would be represented as
bird(x) : flies(x)
flies(x)
which says "If x is a bird and the claim that x flies is consistent with what we know,
then infer that x flies".
Since all we know about Tweety is that Tweety is a bird, we therefore infer that
Tweety flies.
- These inferences are used as basis for computing possible set of extensions to the
knowledge base.
- Here, Rules are not Wff’s
- Applying Default Rules :
While applying default rules, it is necessary to check their justifications for
consistency, not only with initial data, but also with the consequents of any other default
rules that may be applied. The application of one rule may thus block the application of
another. To solve this problem, the concept of default theory was extended.

- The idea behind non-monotonic reasoning is to reason with first-order logic, and if an
inference cannot be obtained, then use the set of default rules available within the first-order
formulation.
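A minimal Python sketch of this process (assuming rules are given as prerequisite/justification/consequent triples over ground literals) repeatedly fires defaults whose justifications remain consistent:

def extension(facts, rules):
    # Fire each rule "A : B / C" when A is in the theory and not-B is not.
    theory = set(facts)
    changed = True
    while changed:
        changed = False
        for prereq, justif, conseq in rules:
            if (prereq in theory and "not " + justif not in theory
                    and conseq not in theory):
                theory.add(conseq)
                changed = True
    return theory

rules = [("bird(Tweety)", "flies(Tweety)", "flies(Tweety)")]
print(extension({"bird(Tweety)"}, rules))
# adds flies(Tweety)
print(extension({"bird(Tweety)", "not flies(Tweety)"}, rules))
# the justification is blocked, so flies(Tweety) is not added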

3. Abduction
- Abduction means systematic guessing: "inferring" an assumption from a conclusion.
- Definition: "Given two wff's, A → B and B, for any expressions A and B, if it is consistent
to assume A, do so".
- It refers to deriving conclusions by applying implications in reverse.
- For example, the following formula:
∀x: RainedOn(x) → wet(x)
could be used "backwards" with a specific x:
if wet(Tree) then RainedOn(Tree)
This, however, would not be logically justified. We could say:
wet(Tree) ∧ CONSISTENT(rainedOn(Tree)) → rainedOn(Tree)
We could also attach probabilities, for example like this:
wet(Tree) → rainedOn(Tree) || 70%
wet(Tree) → morningDewOn(Tree) || 20%
wet(Tree) → sprinkledOn(Tree) || 10%


- Example: Given
∀x: Measles(x) → Spots(x)
Spots (Jill)
conclude Measles(Jill)

In many domains, abductive reasoning is particularly useful if some measure of certainty is
attached to the resulting expressions.
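Using the tree example above, abduction can be sketched in Python as running the implications in reverse and ranking the candidate explanations by their attached (illustrative) strengths:

rules = [  # (cause, effect, strength) -- strengths are the ones quoted above
    ("rainedOn(Tree)", "wet(Tree)", 0.7),
    ("morningDewOn(Tree)", "wet(Tree)", 0.2),
    ("sprinkledOn(Tree)", "wet(Tree)", 0.1),
]

def abduce(observation):
    # Collect every cause whose effect matches the observation.
    candidates = [(cause, s) for cause, effect, s in rules
                  if effect == observation]
    return sorted(candidates, key=lambda cs: -cs[1])

print(abduce("wet(Tree)"))
# [('rainedOn(Tree)', 0.7), ('morningDewOn(Tree)', 0.2), ('sprinkledOn(Tree)', 0.1)]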

4. Inheritance
- Consider the baseball knowledge base described in Chapter 4.
- The concept is: "An object inherits attribute values from all the classes of which it is a
member, unless doing so leads to a contradiction, in which case a value from a more
restricted class has precedence over a value from a broader class."
- These logical ideas provide a basis for describing this idea more formally and can write
its inheritable knowledge as rules in Default Logic.
- We can write a rule to account for the inheritance of a default value for the height of a
baseball player as:
Baseball-player(x): height (x, 6-1)
height (x, 6-1)
- Assert Pitcher(Three-Finger-Brown). This concludes that Three-Finger-Brown is a
baseball player. This rule allows us to conclude that his height is 6-1.
- If, on the other hand, we had asserted a conflicting value for Three-Finger-Brown's height,
and had an axiom like

∀x, y, z: height(x, y) ∧ height(x, z) → y = z

which says that no one is allowed to have more than one height, then we would not be able to
apply the default rule. Thus an explicitly stated value will block the inheritance of a
default value, which is exactly what we want.
- But now, let’s encode the default rule for the height of adult males in general.

Adult-Male(x): height(x, 5-10)


height(x, 5-10)
- This rule does not work. If we again assert Pitcher(Three-Finger-Brown), then the
resulting theory contains two extensions: one in which the first rule fires and Brown's height
is 6-1, and one in which this new rule applies and Brown's height is 5-10. Neither of these
extensions is preferred.
- We could rewrite the default rule for adult males in general as:
Adult-Male(x) : ¬Baseball-Player(x) ∧ ¬Midget(x) ∧ ¬Jockey(x) ∧ height(x, 5-10)
height(x, 5-10)

- A clearer approach is to say something like, “Adult males typically have a height of 5-10
unless they are abnormal in some way.” So we could write, for example:


∀x: Adult-Male(x) ∧ ¬AB(x, aspect1) → height(x, 5-10)

∀x: Baseball-Player(x) → AB(x, aspect1)
∀x: Midget(x) → AB(x, aspect1)
∀x: Jockey(x) → AB(x, aspect1)

Then, if we add the single default rule:

: ¬AB(x, y)
¬AB(x, y)

we get the desired result.
- This effectively blocks the application of the default knowledge about adult males in the
case that more specific information from the class of baseball player is available.
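A rough Python rendering of this blocking behavior (class membership encoded as plain sets; the sketch is ours, not the book's):

classes = {
    "Three-Finger-Brown": {"Pitcher", "Baseball-Player", "Adult-Male"},
    "Smith": {"Adult-Male"},
}

def abnormal_aspect1(x):
    # AB(x, aspect1) holds for baseball players, midgets, and jockeys.
    return bool(classes[x] & {"Baseball-Player", "Midget", "Jockey"})

def height(x):
    if "Baseball-Player" in classes[x]:
        return "6-1"    # the more specific default
    if "Adult-Male" in classes[x] and not abnormal_aspect1(x):
        return "5-10"   # the general default, unblocked
    return None

print(height("Three-Finger-Brown"))  # 6-1
print(height("Smith"))               # 5-10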

 Minimalist Reasoning

 The idea behind using minimal models as a basis for nonmonotonic reasoning about the world
is the following: “There are many fewer true statements than false ones. If something is true
and relevant it makes sense to assume that it has been entered into the knowledge base.
Therefore, assume that the only true statements are those that necessarily must be true in
order to maintain the consistency of the knowledge base.”

1. The Closed World Assumption


- A simple kind of minimalist reasoning is the Closed World Assumption, or CWA. The
CWA says that "the only objects that satisfy any predicate P are those that must".
- The CWA is particularly powerful as a basis for reasoning with databases, which are
assumed to be complete with respect to the properties they describe. For example, a
personnel database can safely be assumed to list all of the company’s employees. If
someone asks whether Smith works for the company, we should reply “no” unless he is
explicitly listed as an employee.
- Some worlds are not closed, however, and the CWA can fail to produce an appropriate
answer for either of two reasons:
The first is that its assumptions are not always true in the world; some parts of the
world are not realistically "closable".
The second kind of problem arises from the fact that the CWA is a purely
syntactic reasoning process: its results depend on the form of the assertions that
are provided.
- Let’s look at two specific examples of this problem. Consider a knowledge base that
consists of just a single statement:

A(Joe) ∨ B(Joe)

We derive: ¬A(Joe)
¬B(Joe)
The CWA allows us to conclude both ¬A(Joe) and ¬B(Joe), since neither A nor B
must necessarily be true of Joe. So the resulting extended knowledge base is
inconsistent.
- The problem is that we have assigned a special status to positive instances of
predicates, as opposed to negative ones. Specifically, the CWA forces completion of the
knowledge base by adding the negative assertion ¬P whenever it is consistent to do so.
But the assignment of a real-world property to some predicate P and its complement to
the negation of P may be arbitrary. For example, suppose we define a predicate Single
and create the following knowledge base:
Single(John)
Single(Mary)
Then, if we ask about Jane, the CWA will yield the answer ¬Single(Jane).
But now suppose we had chosen instead to use the predicate Married rather than
Single. Then the corresponding knowledge base would be
Married(John)
Married(Mary)
If we now ask about Jane, the CWA will yield the result ¬Married(Jane).
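The sensitivity of the CWA to the form of the assertions can be seen in a few lines of Python (a sketch over the Single/Married example above):

db_single = {("Single", "John"), ("Single", "Mary")}
db_married = {("Married", "John"), ("Married", "Mary")}

def holds(db, predicate, constant):
    # Under the CWA, anything not listed is taken to be false.
    return (predicate, constant) in db

print(holds(db_single, "Single", "Jane"))    # False -> conclude not Single(Jane)
print(holds(db_married, "Married", "Jane"))  # False -> conclude not Married(Jane)
# The two databases describe the same world, yet the CWA draws opposite
# conclusions about Jane.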

2. Circumscription
- Circumscription is a Nonmonotonic logic to formalize the common sense assumption.
Circumscription is a formalized rule of conjecture (guess) that can be used along with
the rules of inference of first order logic.
- Circumscription involves formulating rules of thumb with "abnormality" predicates
and then restricting the extension of these predicates, circumscribing them, so that they
apply only to those things to which they are currently known to apply.
- Example: Take the case of the bird Tweety.
The rule of thumb that "birds typically fly" is conditional. The predicate
"Abnormal" signifies abnormality with respect to flying ability. Observe that the
rule ∀x: (Bird(x) ∧ ¬Abnormal(x) → Flies(x)) does not allow us to infer that
"Tweety flies", since we do not know that Tweety is not abnormal with respect to
flying ability.
But if we add axioms that circumscribe the Abnormal predicate to the individuals
currently known to be abnormal, then, since all we know of Tweety is
"Bird(Tweety)", the inference "Tweety flies" can be drawn. This inference is
non-monotonic.
- Two advantages over CWA :
Operates on whole formulas, not individual predicates.
Allows some predicates to be marked as closed and others as open.
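For a finite domain, circumscription can be sketched by brute force in Python: enumerate the models of the theory and keep only those with a minimal extension of Abnormal (the encoding is ours):

from itertools import product

def models():
    # Truth assignments to Abnormal(Tweety) and Flies(Tweety); Bird(Tweety)
    # is given. The theory is: Bird(x) and not Abnormal(x) -> Flies(x).
    found = []
    for abnormal, flies in product([False, True], repeat=2):
        if abnormal or flies:  # the implication holds in this assignment
            found.append({"Abnormal": abnormal, "Flies": flies})
    return found

def circumscribe(ms):
    # Prefer models where as few things as possible are abnormal.
    least = min(m["Abnormal"] for m in ms)
    return [m for m in ms if m["Abnormal"] == least]

print(all(m["Flies"] for m in circumscribe(models())))
# True: Tweety flies in every minimal model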

 IMPLEMENTATION ISSUES

 The issues and weaknesses related to implementation of nonmonotonic reasoning in problem
solving are:

1. How to derive exactly those Nonmonotonic conclusions that are relevant to solving the
problem at hand while not wasting time on those that are not necessary.
2. How to update our knowledge incrementally as problem solving progresses.
3. How to overcome the problem where more than one interpretation of the known facts is
qualified or approved by the available inference rules.
4. In general, the theories are not computationally effective: they are not decidable or even semi-decidable.

 The solutions offered divide the reasoning process into two parts:
- One, a problem solver that uses whatever mechanism it happens to have to draw
conclusions as necessary, and
- Second, a truth maintenance system whose job is to maintain consistency in knowledge
representation of a knowledge base.
 Search controls used are:
- Depth-first search
- Breadth-first search

 AUGMENTING A PROBLEM-SOLVER

 Problem-solving can be done using either forward or backward reasoning. Problem-solving
using uncertain knowledge is no exception. As a result, there are two basic approaches to this
kind of problem-solving.
- Reason forward from what is known: Treat nonmonotonically derivable conclusions the
same way monotonically derivable ones are handled. Nonmonotonic reasoning systems that
support this kind of reasoning allow standard forward-chaining rules to be augmented with
UNLESS (except) clauses, which introduce a basis for reasoning by default. Control is
handled in the same way that all other control decisions in the system are made.
- Reason backward to determine whether some expression P is true. Nonmonotonic
reasoning systems that support this kind of reasoning may do either or both of the following
two things:
Allow default clauses in backward rules. Resolve conflicts among defaults using the
same control strategy that is used for other kinds of reasoning.
Support a kind of debate in which an attempt is made to construct arguments both in
favor of P and opposed to it. Then some additional knowledge is applied to the
arguments to determine which side has the stronger case.
 Let's look at backward reasoning first. We begin with the simple case of backward reasoning in
which we attempt to prove an expression P. Suppose that we have a knowledge base that
consists of the backward rules shown in the figure below.
Figure: Backward Rules Using UNLESS


 Assume the knowledge base uses the usual PROLOG-style control structure, in which rules are
matched top to bottom, left to right. Then if we ask the question ?Suspect(x), the program will
first try Abbott and return Abbott as its answer. If we had also included the facts
RegisteredHotel(Abbott, Albany)
FarAway(Albany)
then the program would have failed to conclude that Abbott was a suspect, and it would
instead have located Babbitt and then Cabot.
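A Python sketch of this behavior (the rule set is abridged from the story; the book's full figure is not reproduced here):

facts = {"Beneficiary(Abbott)"}

def alibi(x):
    # Alibi(x) if x was registered at a faraway hotel,
    # UNLESS the register was forged.
    return ("RegisteredHotel(%s, Albany)" % x in facts
            and "FarAway(Albany)" in facts
            and "RegisterForged(%s)" % x not in facts)  # UNLESS clause

def suspect(x):
    # Suspect(x) if Beneficiary(x), UNLESS x has an alibi.
    return "Beneficiary(%s)" % x in facts and not alibi(x)

print(suspect("Abbott"))  # True: no alibi is derivable yet
facts.update({"RegisteredHotel(Abbott, Albany)", "FarAway(Albany)"})
print(suspect("Abbott"))  # False: the UNLESS clause now blocks the rule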
 Figure below shows how the same knowledge could be represented as forward rules.
Figure: Forward Rules Using UNLESS


 IMPLEMENTATION: DEPTH-FIRST SEARCH

 Dependency-Directed Backtracking
 Depth-first approach to nonmonotonic reasoning: We need to know a fact F, which can be
derived by making some assumption A, which seems believable. So we make assumption A,
derive F, and then derive some additional facts G and H from F. We later derive some other
facts M and N, but they are completely independent of A and F. Later, a new fact comes in that
invalidates A. We need to withdraw our proof of F, and also our proofs of G and H since they
depended on F. But what about M and N? They didn’t depend on F, so there is no logical need
to invalidate them. But if we use a conventional backtracking scheme, we have to back up
past conclusions in the order in which we derived them, so we have to back up past M and N,
thus undoing them, in order to get back to F, G, H and A. To get around this problem, we need
a slightly different notion of backtracking, one that is based on logical dependencies rather than
the chronological order in which decisions were made. We call this new method dependency-
directed backtracking.

 As an example, suppose we want to build a program that generates a solution to a fairly simple
problem: finding a time at which three busy people can all attend a meeting. One way to solve
such a problem is first to make an assumption that the meeting will be held on some
particular day, say Wednesday, and add it to the database. Then proceed to find a time, checking
along the way for any inconsistencies in people's schedules. If a conflict arises, the statement
representing the assumption must be discarded and replaced by another, hopefully non-
contradictory, one.
 This kind of solution can be handled by a straightforward tree search with chronological
backtracking. All assumptions, as well as the inferences drawn from them, are recorded at
the search node that created them. When a node is determined to represent a contradiction,
simply backtrack to the next node from which there remain unexplored paths. The
assumptions and their inferences will disappear automatically. The drawback to this approach
is illustrated in Figure below, which shows part of the search tree of a program that is trying to
schedule a meeting. To do so, the program must solve a constraint satisfaction problem to
find a day and time at which none of the participants is busy and at which there is a sufficiently
large room available.
Figure: Nondependency-Directed Backtracking

 In order to solve the problem, the system must try to satisfy one constraint at a time. Initially,
there is little reason to choose one alternative over another, so it decides to schedule the meeting
on Wednesday. That creates a new constraint that must be met by the rest of the solution. The
assumption that the meeting will be held on Wednesday is stored at the node it generated. Next
the program tries to select a time at which all participants are available. Among them, they have
regularly scheduled daily meetings at all times except 2:00. So 2:00 is chosen as the meeting
time. But it would not have mattered which day was chosen. Then the program discovers that on
Wednesday there are no rooms available. So it backtracks past the assumption that the day would
be Wednesday and tries another day, Tuesday. Now it must duplicate the chain of reasoning that
led it to choose 2:00 as the time, because that reasoning was lost when it backtracked to redo the
choice of day. This occurred even though that reasoning did not depend in any way on the
assumption that the day would be Wednesday. By withdrawing statements based on the order in
which they were generated by the search process rather than on the basis of responsibility for
inconsistency, we may waste a great deal of effort.
 If we want to use dependency-directed backtracking instead, so that we do not waste this effort,
then we need to do the following things:
- Associate with each node one or more justifications. Each justification corresponds to a
derivation process that led to the node. Each justification must contain a list of all the
nodes on which its derivation depended.
- Provide a mechanism that, when given a contradiction node and its justification, computes
the “no-good” set of assumptions that underlie the justification. The no-good set is defined
to be the minimal set of assumptions such that if you remove any element from the set, the
justification will no longer be valid and the inconsistent node will no longer be believed.
- Provide a mechanism for considering a no-good set and choosing an assumption to
retract.
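The bookkeeping this requires can be sketched in Python (node names follow the F, G, H, M, N example above; the representation is ours):

depends_on = {              # node -> the assumptions supporting it
    "F": {"A"}, "G": {"A"}, "H": {"A"},
    "M": set(), "N": set(), # independent of assumption A
}
believed = set(depends_on)

def retract(assumption):
    # Withdraw exactly the nodes whose support includes the assumption.
    for node, support in depends_on.items():
        if assumption in support:
            believed.discard(node)

retract("A")
print(sorted(believed))  # ['M', 'N'] -- M and N survive, unlike under
                         # chronological backtracking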


 Justification-Based Truth Maintenance Systems


 The idea of a Truth Maintenance System (TMS) is to provide the ability to do dependency-
directed backtracking and to support nonmonotonic reasoning.
 Truth Maintenance System (TMS) is a critical part of a reasoning system. Its purpose is to
assure that inferences made by the reasoning system (RS) are valid.
 The RS provides the TMS with information about each inference it performs, and in return the
TMS provides the RS with information about the whole set of inferences.
 The TMS maintains the consistency of a knowledge base whenever new knowledge is added. It
considers only one state at a time, so it is not possible to manipulate environments.
 Several implementations of TMS have been proposed for non-monotonic reasoning. The
important ones are:
- Justification-Based Truth Maintenance Systems (JTMS)
- Logic-Based Truth Maintenance Systems (LTMS)
- Assumption-based Truth Maintenance Systems (ATMS).

 The TMS maintains consistency in the knowledge representation of a knowledge base. The
functions of a TMS are to:
- Provide justifications for conclusions
When a problem solving system gives an answer to a user's query, an explanation
of that answer is required.
Example: An advice to a stockbroker is supported by an explanation of the reasons
for that advice. This is constructed by the Inference Engine (IE) by tracing the
justification of the assertion.
- Recognize inconsistencies
The Inference Engine (IE) may tell the TMS that some sentences are
contradictory. Then, TMS may find that all those sentences are believed true, and
reports to the IE that it can eliminate the inconsistencies by determining the
assumptions used and changing them appropriately.
Example: A statement that either Abbott, or Babbitt, or Cabot is guilty together
with other statements that Abbott is not guilty, Babbitt is not guilty, and Cabot is
not guilty, form a contradiction.
- Support default reasoning
In the absence of any firm knowledge, in many situations we want to reason from
default assumptions.
Example: If "Tweety is a bird", then until told otherwise, assume that "Tweety
flies" and for justification use the fact that "Tweety is a bird" and the assumption
that "birds fly".
 We consider a simple form of truth maintenance system, a Justification-Based Truth
Maintenance System (JTMS; for the rest of this discussion we refer to it simply as a TMS).
 Example: To see how a TMS works, let’s return to the ABC Murder story. Initially, we
might believe that Abbott is the primary suspect because he was a beneficiary of the deceased
(dead) and he had no alibi. There are three assertions here, a specific combination of which
we now believe, although we may change our beliefs later. We can represent these assertions
in shorthand as follows:
- Suspect Abbott (Abbott is the primary murder suspect.)
- Beneficiary Abbott (Abbott is a beneficiary of the victim.)
- Alibi Abbott (Abbott was at an Albany hotel at the time.)

In the notation of Default Logic, we can state the rule that produced it as

Beneficiary(x): Alibi(x)
Suspect(x)

Figure: A Justification

Figure above shows how these three facts would be represented in a dependency network,
which can be created as a result of applying the first rule of Figure: Backward Rules using
UNLESS. The assertion Suspect Abbott has an associated TMS justification. Each
justification consists of two parts: an IN-list and an OUT-list. In the figure, the assertions on
the IN-list are connected to the justification by + links, those on the OUT-list by - links. The
justification is connected by an arrow to the assertion that it supports. In the justification
shown, there is exactly one assertion in each list. Beneficiary Abbott is in the IN-list and Alibi
Abbott is in the OUT-list. Such a justification says that Abbott should be a suspect just when
it is believed that he is a beneficiary and it is not believed that he has an alibi.

More generally, assertions (usually called nodes) in a TMS dependency network are believed
when they have a valid justification. A justification is valid if every assertion in the IN-list is
believed and none of those in the OUT-list is.

Labeling in a Dependency Network


Labeling must maintain two properties: consistency and well-foundedness.
The state of affairs in the figure above is incomplete. We are told that Abbott is a beneficiary.
We have no further justification for this fact and we must simply accept it. For such facts, we
give a premise justification: a justification with empty IN- and OUT- lists. Premise
justifications are always valid. Figure below shows such a justification added to the network
and a consistent labeling for that network, which shows Suspect Abbott labeled IN.


Figure: Labeled Nodes with Premise Justification

Abbott was the primary suspect, but looking at the hotel register provided a valid reason to
believe Abbott's alibi. The figure below shows the effect of adding such a justification to the
network. Now Suspect Abbott and Register Forged are OUT, and Alibi Abbott, Registered,
and Far Away are IN.
Figure: Changed Labeling

Babbitt will have a similar justification, based upon lack of belief that his brother-in-law
lied, as shown in the figure below. Now Suspect Babbitt and Lies B-I-L are OUT, and
Alibi Babbitt and Says So B-I-L (Brother-In-Law) are IN.
Figure: Babbitt’s Justification


The figure below illustrates the fact that the only support for the alibi of attending the ski
meet is that Cabot is telling the truth about being there. The only support for his telling
the truth would be if we knew he was at the ski meet. But this is a circular argument.
The task of a TMS is to disallow such arguments. In particular, if the support for a node
depends only on an unbroken chain of positive links (IN-list links) leading back to itself,
then that node must be labeled OUT if the labeling is to be well-founded.
Figure: Cabot’s justification

The other major task of a TMS is resolving contradictions. In a TMS, a contradiction
node does not represent a logical contradiction, but rather a state of the database explicitly
declared to be undesirable. In this example, we have a contradiction if we do not have at
least one murder suspect. Thus a contradiction might have the justification shown in
the figure below, where the node Other Suspects means that there are suspects other than
Abbott, Babbitt, and Cabot. This is one way of explicitly representing an instance of the
closed world assumption. For now, this node has no valid justification and must be labeled OUT.
Figure: A Contradiction

Now we learn that Cabot was seen on television attending the ski tournament. Adding this
to the dependency network first illustrates the fact that nodes can have more than one
justification as shown in Figure below.


Figure: A Second Justification

Suppose, in particular, that we choose to believe that Babbitt’s brother-in-law lied. What
should be the justification for that belief? Figure below shows a complete abductive
justification for the belief that Babbitt’s brother-in-law lied.
Figure: A Complete Abductive Justification

At this point, we have described the key reasoning operations that are performed by a JTMS:
- Consistent labeling
- Contradiction resolution
We have also described a set of important reasoning operations that a JTMS does not perform,
including:
- Applying rules to derive conclusions
- Creating justifications for the results of applying rules
- Choosing among alternative ways of resolving a contradiction
- Detecting contradictions
All of these operations must be performed by the problem-solving program that is using the JTMS.
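A toy Python labeling loop for a small, non-circular network like the Abbott example (the encoding is a sketch, not a full JTMS):

justifications = {
    "Beneficiary Abbott": [([], [])],  # premise: empty IN- and OUT-lists
    "Suspect Abbott": [(["Beneficiary Abbott"], ["Alibi Abbott"])],
    "Alibi Abbott": [],                # no justification yet
}

def label(justs):
    # A node is IN if some justification has all IN-list nodes IN and all
    # OUT-list nodes OUT; sweep until the labeling stabilizes.
    IN = set()
    for _ in range(len(justs) + 1):
        IN = {node for node, js in justs.items()
              if any(all(i in IN for i in in_list) and
                     all(o not in IN for o in out_list)
                     for in_list, out_list in js)}
    return IN

print(sorted(label(justifications)))
# ['Beneficiary Abbott', 'Suspect Abbott']: Abbott stays a suspect exactly
# while Alibi Abbott is OUT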

 Logic-Based Truth Maintenance Systems


 A logic-based truth maintenance system (LTMS) is very similar to a JTMS. It differs in one
important way. In a JTMS, the nodes in the network are treated as atoms by the TMS,
which assumes no relationships among them except the ones that are explicitly stated in the
justifications. In particular, a JTMS has no problem simultaneously labeling both P and ¬P
IN. For example, we could have represented explicitly both Lies B-I-L and ¬Lies B-I-L
and labeled both of them IN. No contradiction will be detected automatically.
 In an LTMS, on the other hand, a contradiction would be asserted automatically in such a
case. If we had constructed the ABC example in an LTMS system, we would not have
created an explicit contradiction corresponding to the assertion that there was no suspect.
Instead we would replace the contradiction node by one that asserted something like No
Suspect. Then we would assert ¬No Suspect. When No Suspect came IN, it would cause a
contradiction to be asserted automatically.

 IMPLEMENTATION: BREADTH-FIRST SEARCH

 The Assumption-Based Truth Maintenance System (ATMS) is an alternative way of
implementing nonmonotonic reasoning. In both JTMS and LTMS systems, a single line of
reasoning is pursued at a time, and dependency-directed backtracking occurs whenever it is
necessary to change the system’s assumptions.
 In an ATMS, alternative paths are maintained in parallel. Backtracking is avoided at the
expense of maintaining multiple contexts, each of which corresponds to a set of consistent
assumptions. As reasoning proceeds in an ATMS-based system, the universe of consistent
contexts is pruned as contradictions are discovered. The remaining consistent contexts are used
to label assertions, thus indicating the contexts in which each assertion has a valid justification.
Assertions that do not have a valid justification in any consistent context can be pruned from
consideration by the problem solver. As the set of consistent contexts gets smaller, so too does
the set of assertions that can consistently be believed by the problem solver. Essentially, an
ATMS system works breadth-first, considering all possible contexts at once, while both JTMS
and LTMS systems operate depth-first.
 The ATMS, like the JTMS, is designed to be used in combination with a separate problem
solver. The Problem solver’s job is to:
- Create nodes that correspond to assertions
- Associate each node with one or more justifications, each of which describes a reasoning
chain that led to the node.
- Inform the ATMS of inconsistent contexts.
 The role of the ATMS is then to:
- Propagate inconsistencies, thus ruling out contexts that include sub contexts that are
known to be inconsistent.
- Label each problem solver node with the contexts in which it has a valid justification.
This is done by combining contexts that correspond to the components of a justification. In
particular, given a justification of the form
A1  A2 … An → C
 Assign as a context for the node corresponding to C the intersection of the contexts
corresponding to the nodes A1 through An. It is necessary to think of the set of contexts that are
defined by a set of assumptions as forming a lattice, as shown in the figure below for a simple
example with four assumptions. Lines going upward indicate a subset relationship.

Figure: A Context Lattice

 The first thing this lattice does for us is to illustrate a simple mechanism by which
contradictions (inconsistent contexts) can be propagated so that large parts of the space of 2^n
contexts can be eliminated. Suppose that the context labeled {A2, A3} is asserted to be
inconsistent. Then all contexts that include it (i.e., those that are above it) must also be
inconsistent.
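This propagation step is easy to sketch in Python: rule out every context that contains a known no-good set (the four-assumption lattice of the figure):

from itertools import combinations

assumptions = ["A1", "A2", "A3", "A4"]
contexts = [set(c) for r in range(len(assumptions) + 1)
            for c in combinations(assumptions, r)]

nogoods = [{"A2", "A3"}]  # asserted to be inconsistent

consistent = [c for c in contexts
              if not any(ng <= c for ng in nogoods)]  # <= is subset test

print(len(contexts), len(consistent))  # 16 12: the four supersets of
                                       # {A2, A3} are eliminated at once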
 As an example of how an ATMS-based problem-solver works, let’s return to the ABC Murder
story. Again, our goal is to find a primary suspect. We need the following assumptions:
 A1. Hotel register was forged.
 A2. Hotel register was not forged.
 A3. Babbitt’s brother-in-law lied.
 A4. Babbitt’s brother-in-law did not lie.
 A5. Cabot lied.
 A6. Cabot did not lie.
 A7. Abbott, Babbitt, and Cabot are the only possible suspects.
 A8. Abbott, Babbitt, and Cabot are not the only suspects.
 The problem-solver could then generate the nodes and associated justifications shown in the
first two columns of Figure below. In the figure, the justification for a node that corresponds to
a decision to make assumption N is shown as {N}. Justifications for nodes that correspond to the
result of applying reasoning rules are shown as the rule involved. Then the ATMS can assign
labels to the nodes as shown in the second two columns. The first shows the label that would be
generated for each justification taken by itself. The second shows the label (possibly containing
multiple contexts) that is actually assigned to the node given all its current justifications. These
columns are identical in simple cases, but they may differ in more complex situations, as we see
for nodes 12, 13, and 14 of our example.

Figure: Nodes and Their Justifications and Labels

There are several things to notice about this example:

- Nodes may have several justifications if there are several possible reasons for believing
them. This is the case for nodes 12, 13, and 14.
- Recall that when we were using a JTMS, a node was labeled IN if it had at least one valid
justification. Using an ATMS, a node will end up being labeled with a consistent context if
it has at least one justification that can occur in a consistent context.
- The label assignment process is sometimes complicated. We describe it in more detail
below.

Suppose that the problem-solving program first created nodes 1 through 14, representing the various
dependencies among them without committing to which of them it currently believes. It can
indicate known contradictions by marking as no-good the context:

- A, B, C are the only suspects; A, B, C are not the only suspects: {A7,A8}


CHAPTER 8

STATISTICAL REASONING
In this chapter we describe several representation techniques that can be used to model belief systems
in which, at any given point, a particular fact is believed to be true, believed to be false, or not
considered one way or the other.

Let’s consider two classes of such problems:

The first class contains problems in which there is genuine randomness in the world. Playing card
games such as bridge and blackjack are a good example of this class. Although in these problems it is
not possible to predict the world with certainty, some knowledge about the likelihood of various
outcomes is available, and we would like to be able to exploit it.

The second class contains problems that could be modeled using the techniques we described in the last
chapter. In these problems, the relevant world is not random. It behaves “normally” unless there is
some kind of exception. Many common sense tasks fall into this category, as do many expert reasoning
tasks such as medical diagnosis. For problems like this, statistical measures may serve a very useful
function as summaries of the world. We explore several techniques that can be used to augment
knowledge representation techniques with statistical measures that describe levels of evidence and
belief.

 PROBABILITY AND BAYES’ THEOREM

 An important goal for many problem - solving systems is to collect evidence as the system goes
along and to modify its behavior on the basis of the evidence. To model this behavior, we need a
statistical theory of evidence. Bayesian statistics is such a theory. The fundamental notion of
Bayesian statistics is that of conditional probability:
P(H|E)
means the probability of hypothesis H given that we have observed evidence E. To
compute this, we need to take into account the prior probability of H (the probability that we
would assign to H if we had no evidence) and the extent to which E provides evidence of H.
 To do this, we need to define a universe that contains an exhaustive, mutually exclusive set of
Hi ’s, among which we are trying to discriminate. Then, let
P(Hi|E) = the probability that hypothesis Hi is true given evidence E
P(E|Hi) = the probability that we will observe evidence E given that hypothesis Hi is true
P(Hi) = the a priori probability that hypothesis Hi is true in the absence of any specific
evidence. These probabilities are called prior probabilities or priors.
k = the number of possible hypotheses
Bayes’ theorem then states that
P(Hi|E) = P(E|Hi) · P(Hi) / Σ(n=1..k) [ P(E|Hn) · P(Hn) ]
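Bayes' theorem translates directly into Python; the priors and likelihoods below are illustrative, not taken from any real domain:

def posterior(priors, likelihoods):
    # priors[i] = P(Hi); likelihoods[i] = P(E | Hi), over an exhaustive,
    # mutually exclusive set of hypotheses.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # the denominator: P(E)
    return [j / total for j in joint]

# Two hypotheses, e.g. "mineral present" vs. "mineral absent":
print(posterior([0.3, 0.7], [0.8, 0.1]))
# ~[0.774, 0.226]: observing E raises P(H1) from 0.3 to about 0.77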
 Suppose, for example, that we are interested in examining the geological evidence at a particular
location to determine whether it would be a good place to dig to find a desired mineral. If we know
the prior probabilities of finding each of the various minerals, and we know the probabilities
that if a mineral is present then certain physical characteristics will be observed, then we can
use Bayes' formula to compute, from the evidence we collect, how likely it is that each mineral
is present. This is, in fact, what is done by the PROSPECTOR program, which has been used
successfully to help locate deposits of several minerals, including copper and uranium.
 The key to using Bayes’ theorem as a basis for uncertain reasoning is to recognize exactly
what it says. Specifically, when we say P(A|B), we are describing the conditional probability
of A given that the only evidence we have is B. If there is also other relevant evidence, then it
too must be considered. Suppose, for example, that we are solving a medical diagnosis problem.
Consider the following assertions:
S: patient has spots
M: patient has measles
F: patient has high fever
Without any additional evidence, the presence of spots serves as evidence in favor of measles.
It also serves as evidence of fever since measles would cause fever. But suppose we already
know that the patient has measles. Either spots alone or fever alone would constitute evidence
in favor of measles. If both are present, we need to take both into account in determining the
total weight of evidence. But, since spots and fever are not independent events, we cannot just
sum their effects. Instead, we need to represent explicitly the conditional probability that arises
from their combination. In general, given a prior body of evidence and some new observation
E, we need to compute
P(H|E, e) = P(H|E) · P(e|E, H) / P(e|E)

 In an arbitrarily complex world, the size of the set of joint probabilities that we require in
order to compute this function grows as 2^n if there are n different propositions being considered.
This makes using Bayes’ theorem difficult, for several reasons:
- The knowledge acquisition problem is intractable, because too many probabilities have to
be provided. In addition, there is substantial empirical evidence that people are very poor
probability estimators.
- The space that would be required to store all the probabilities is too large.
- The time required to compute the probabilities is too large.
 Bayesian statistics provide an attractive basis for an uncertain reasoning system. As a result,
several mechanisms for exploiting its power while at the same time making it tractable have
been developed. In the rest of the discussion, we explore three of these:
- Attaching certainty factors to rules
- Bayesian networks
- Dempster-Shafer theory


 CERTAINTY FACTORS AND RULE-BASED SYSTEMS

 The approach that we discuss here was first used in the MYCIN system, which attempts to
recommend appropriate therapies for patients with bacterial infections. It interacts with the
physician to acquire the clinical data it needs. MYCIN is an example of an expert system, since
it performs a task normally done by a human expert. Here we concentrate on the use of
probabilistic reasoning.
 MYCIN represents most of its diagnostic knowledge as a set of rules. Each rule has associated
with it a certainty factor, which is a measure of the extent to which the evidence that is
described by the antecedent of the rule supports the conclusion that is given in the rule’s
consequent. A typical MYCIN rule looks like:
If: 1. the stain of the organism is gram-positive, and
2. the morphology of the organism is coccus, and
3. the growth conformation of the organism is clumps (cluster),
then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus.
This is the form in which the rules are stated to the user.

 They are actually represented internally in LISP list structure. The rule we just saw would be
represented internally as
PREMISE: ($AND (SAME CNTXT GRAM GRAMPOS)
(SAME CNTXT MORPH COCCUS)
(SAME CNTXT CONFORM CLUMPS))
ACTION: (CONCLUDE CNTXT IDENT STAPHYLOCOCCUS TALLY 0.7)

 MYCIN uses these rules to reason backward from its goal of finding significant disease-causing
organisms to the clinical data available. Once it finds the identities of such organisms, it
then attempts to select a therapy by which the disease(s) may be treated.
 In order to understand how MYCIN exploits uncertain information, we need answers to two
questions: "What do certainty factors mean?" and "How does MYCIN combine the
estimates of certainty in each of its rules to produce a final estimate of the certainty of its
conclusions?"
In the rest of this discussion we answer these questions.
 A certainty factor (CF [h, e]) is defined in terms of two components:
- MB [h, e] - a Measure of Belief (between 0 and 1) in hypothesis h given the evidence e.
MB measures the extent to which the evidence supports the hypothesis. It is Zero if the
evidence fails to support the hypothesis.
- MD [h, e] - a Measure of Disbelief (between 0 and 1) in hypothesis h given the evidence e.
MD measures the extent to which the evidence supports the negation of the hypothesis. It
is Zero if the evidence supports the hypothesis.
From these two measures, we can define the certainty factors as


CF [h, e] = MB[h, e] - MD[h, e]

Since any particular piece of evidence either supports or denies a hypothesis, a single
number suffices for each rule to define both the MB and MD, and thus the CF.

 The CF’s of MYCIN’s rules are provided by the experts who write the rules. They reflect the
experts’ assessments of the strength of the evidence in support of the hypothesis. As MYCIN
reasons, these CF’s need to be combined to reflect the operation of multiple pieces of evidence
and multiple rules applied to a problem. The figure below illustrates three combination
scenarios that we need to consider. In Figure (a), several rules all provide evidence related
to a single hypothesis. In Figure (b), we need to consider our belief in a collection of several
propositions taken together. In Figure (c), the output of one rule provides the input to another.
Figure: Combining Uncertain Rules

 What formulas should be used to perform these combinations? Before answering that, we need
some properties that we would like the combining functions to satisfy:
- Since the order in which the evidence collected is arbitrary, the combining functions should
be commutative and associative.
- Until certainty is reached, additional confirming evidence should increase MB and
similarly for disconfirming evidence MD.
- If uncertain inferences are chained together, then the result should be less certain than
either of the inferences alone.
 Having accepted the desirability of these properties, let’s first consider the scenario in Figure (a)
above, in which several pieces of evidence are combined to determine the CF of one
hypothesis. The measures of belief and disbelief of a hypothesis given two observations S1 and
S2 are computed from:

MB[h, s1 ∧ s2] = 0, if MD[h, s1 ∧ s2] = 1
               = MB[h, s1] + MB[h, s2] · (1 − MB[h, s1]), otherwise

MD[h, s1 ∧ s2] = 0, if MB[h, s1 ∧ s2] = 1
               = MD[h, s1] + MD[h, s2] · (1 − MD[h, s1]), otherwise

The above can be read as follows: the measure of belief in h is 0 if h is disbelieved with certainty.
Otherwise, the measure of belief in h given two observations is the measure of belief given only
one observation, plus some increment for the second observation. This increment is computed
by first taking the difference between 1 (certainty) and the belief given only the first observation,
and then scaling the second belief by that difference. A corresponding explanation can be given
for disbelief. From MB and MD, CF can be computed.

Notice that if several sources of corroborating evidence are pooled, the absolute value of CF will
increase. If conflicting evidence is introduced, the absolute value of CF will decrease.

 Example: Suppose we make an initial observation that confirms our belief in h with MB = 0.3.
Then MD [h, s1] = 0 and CF [h, s1] = 0.3. Now we make a second observation, which also
confirms h, with MB [h, s2] =0.2. Now:

MB[h, s1 ∧ s2] = 0.3 + 0.2 · (1 − 0.3) = 0.44
MD[h, s1 ∧ s2] = 0.0
CF[h, s1 ∧ s2] = 0.44
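The parallel-combination rule is a one-liner in Python; the sketch below reproduces the 0.44 result and checks commutativity:

def combine_mb(mb1, mb2):
    # MB[h, s1 ^ s2] = MB1 + MB2 * (1 - MB1), assuming MD = 0.
    return mb1 + mb2 * (1 - mb1)

print(round(combine_mb(0.3, 0.2), 2))  # 0.44
print(round(combine_mb(0.2, 0.3), 2))  # 0.44: order does not matter,
                                       # as the desired properties require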

 Consider the scenario of Figure (b), in which we need to compute the certainty factor of a
combination of hypotheses. In particular, this is necessary when we need to know the certainty
factor of a rule antecedent that contains several clauses. The combination certainty factor can be
computed from its MB and MD. The formulas MYCIN uses for the MB of the conjunction and
the disjunction of two hypotheses are:

MB [h1  h2, e] = min (MB [h1, e], MB [h2, e])


MB [h1  h2, e] = max (MB [h1, e], MB [h2, e])
MD can be computed similarly.

 Finally, we need to consider the scenario in Figure (c), in which rules are chained together so
that the result of one rule provides the input to another. The certainty factor of the hypothesis must
take into account both the strength with which the evidence suggests the hypothesis and the
level of confidence in the evidence. MYCIN provides a chaining rule that is defined as follows.
Let MB' [h, s] be the measure of belief in h given that we are absolutely sure of the validity of
s. Let e be the evidence that led us to believe in s. Then:

MB [h, s] = MB' [h, s] ∙ max (0, CF[s, e])

 The initial CF's in MYCIN are estimates given by the experts who write the rules. The
original work did, however, provide a probabilistic definition of MB, as the proportionate
decrease in disbelief in h as a result of e:

MB[h, e] = 1, if P(h) = 1
         = (max[P(h|e), P(h)] − P(h)) / (1 − P(h)), otherwise

Similarly, MD is the proportionate decrease in belief in h as a result of e:

MD[h, e] = 1, if P(h) = 0
         = (min[P(h|e), P(h)] − P(h)) / (0 − P(h)), otherwise

But this definition is incompatible with Bayesian conditional probability; small changes to it,
and a corresponding change to the definition of MD, make it compatible.


Note:

When P(h|e) = 0: MB(h, e) = 0 and MD(h, e) = 1
When P(h|e) = 1: MB(h, e) = 1 and MD(h, e) = 0

To see what happens when independence assumptions are violated, first consider the scenario in
Figure (a), with three dependent observations that each confirm h with MB = 0.6:

MB[h, s1 ∧ s2] = 0.6 + (0.6 · 0.4) = 0.84
MB[h, (s1 ∧ s2) ∧ s3] = 0.84 + (0.6 · 0.16) = 0.936

This is a substantially different result from the true value, as expressed by the expert, of 0.7.

 Now let’s consider what happens when independence assumptions are violated in the scenario
of Figure (c). Let’s consider a concrete example in which:
S: sprinkler was on last night
W: grass is wet
R: it rained last night
We can write MYCIN-style rules that describe predictive relationships among these three
events:
If: the sprinkler was on last night
then there is suggestive evidence (0.9) that the grass will be wet this morning

Taken alone, this rule may accurately describe the world. But now consider a second rule:
If: the grass is wet this morning
then there is suggestive evidence (0.8) that it rained last night

Taken alone, this rule makes sense when rain is the most common source of water on the grass.
But if the two rules are applied together, using MYCIN’s rule for chaining, we get
MB [W, S] = 0.9 {sprinkler suggests wet}

MB [R, W] = 0.8 ∙ 0.9 = 0.72 {wet suggests rain}

In other words, we believe that it rained because we believe the sprinkler was on.
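Tracing the computation with the chain_mb sketch from earlier makes the problem explicit:

```python
cf_wet = chain_mb(mb_prime=0.9, cf_s=1.0)      # we are sure the sprinkler was on
mb_rain = chain_mb(mb_prime=0.8, cf_s=cf_wet)  # 0.8 * 0.9 = 0.72
print(mb_rain)  # 0.72 -- sprinkler evidence ends up supporting rain
```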

 BAYESIAN NETWORKS

 Here, we describe an alternative approach known as Bayesian networks. The main idea is that
to describe the real world, it is not necessary to use a huge joint probability table in which we
list the probabilities of all conceivable combinations of events. Here, we can use a more local
representation in which we will describe clusters of events that interact.
 Let’s return to the example of the sprinkler, rain, and grass. Figure (a) below shows the flow
of constraints we described in MYCIN-style rules. Specifically, we construct a directed acyclic
graph (DAG) that represents causality relationships among variables. The idea of a causality
graph (or network) has proved to be very useful in several systems, particularly medical
diagnosis systems such as CASNET and INTERNIST/CADUCEUS. The variables in such a
graph may be propositional (values TRUE or FALSE) or they may be variables that take on
values of some other type e.g., a specific disease, a body temperature, or a reading taken by
some other diagnostic device. In Figure (b) below, we show a causality graph for the wet grass
example. In addition to the three nodes the graph contains a new node corresponding to the
propositional variable that tells us whether it is currently the rainy season.

 A DAG illustrates the causality relationships that occur among the nodes it contains. In order to
use it as a basis for probabilistic reasoning, however, we need more information. In particular,
we need to know, for each value of a parent node, what evidence is provided about the values
that the child node can take on. We can state this in a table in which the conditional
probabilities are provided. We show such a table for our example in the Figure below. For
example, from the table we see that the prior probability of the rainy season is 0.5. Then, if
it is the rainy season, the probability of rain on a given night is 0.9; if it is not, the probability
is only 0.1.
Figure: Conditional Probabilities for a Bayesian Network

Attribute                        Probability
P(Wet | Sprinkler, Rain)         0.95
P(Wet | Sprinkler, ¬Rain)        0.9
P(Wet | ¬Sprinkler, Rain)        0.8
P(Wet | ¬Sprinkler, ¬Rain)       0.1
P(Sprinkler | Rainy Season)      0.0
P(Sprinkler | ¬Rainy Season)     1.0
P(Rain | Rainy Season)           0.9
P(Rain | ¬Rainy Season)          0.1
P(Rainy Season)                  0.5
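As a concrete (if naive) illustration of the inference such a network supports, the following sketch enumerates the joint distribution defined by the table above to compute P(Rain | Wet); the variable names are ours, and practical systems use the more efficient algorithms listed below.

```python
from itertools import product

P_SEASON = 0.5                            # P(Rainy Season)
P_RAIN = {True: 0.9, False: 0.1}          # P(Rain | Rainy Season = key)
P_SPRINKLER = {True: 0.0, False: 1.0}     # P(Sprinkler | Rainy Season = key)
P_WET = {(True, True): 0.95, (True, False): 0.9,   # P(Wet | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.1}

def joint(season, sprinkler, rain, wet):
    # The DAG structure lets the joint probability factor into
    # one term per node, conditioned only on that node's parents.
    p = P_SEASON if season else 1 - P_SEASON
    p *= P_SPRINKLER[season] if sprinkler else 1 - P_SPRINKLER[season]
    p *= P_RAIN[season] if rain else 1 - P_RAIN[season]
    p *= P_WET[(sprinkler, rain)] if wet else 1 - P_WET[(sprinkler, rain)]
    return p

# P(Rain | Wet) = P(Rain, Wet) / P(Wet), summing out the other variables
num = sum(joint(se, sp, True, True) for se, sp in product([True, False], repeat=2))
den = sum(joint(se, sp, r, True) for se, sp, r in product([True, False], repeat=3))
print(num / den)   # roughly 0.5 with these numbers
```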

 There are three broad classes of algorithms for computing the probabilities in such a network:
a message-passing method, a clique triangulation method, and a variety of stochastic algorithms.
 The message-passing approach is based on the observation that, to compute the probability of a
node A given what is known about other nodes in the network, it is necessary to know three
things:
- π- the total support arriving at A from its parent nodes (which represent its causes).
- λ- the total support arriving at A from its children (which represent its symptoms).
- The entry in the fixed conditional probability matrix that relates A to its causes.

 DEMPSTER-SHAFER THEORY

 Here, we consider an alternative technique, called Dempster-Shafer theory. This new approach
considers sets of propositions and assigns to each of them an interval in which the degree of
belief must lie.
[Belief, Plausibility]

Belief (usually denoted Bel) measures the strength of the evidence in favor of a set of
propositions. It ranges from 0 (indicating no evidence) to 1 (denoting certainty). Plausibility
(Pl) is defined to be
Pl (s) = 1 − Bel (¬s)


 Pl also ranges from 0 to 1 and measures the extent to which evidence in favor of ¬s leaves room
for belief in s. In particular, if we have certain evidence in favor of ¬s, then Bel (¬s) will be 1
and Pl (s) will be 0, which tells us that the only possible value for Bel (s) is also 0.
 Suppose that we are currently considering three competing hypotheses: A, B, and C. If we have
no information, we represent each of them in the range [0, 1]. As evidence is accumulated, this
interval can be expected to shrink, representing increased confidence that we know how likely
each hypothesis is. The interval approach makes it clear that we have no information when we
start.
 To do this, we need to start with an exhaustive universe of mutually exclusive hypotheses.
We’ll call this the frame of discernment and we’ll write it as Θ. For example, in a simplified
diagnosis problem, Θ might consist of the set {All, Flu, Cold, Pneu}:

All: allergy
Flu: flu
Cold: cold
Pneu: pneumonia
 Our goal is to attach some measures of belief to elements of Θ. However, not all evidence is
directly supportive of individual elements. Often it supports sets of elements i.e., subsets of Θ.
For example, in our diagnosis problem, fever might support {Flu, Cold, Pneu}.
 Dempster-Shafer theory lets us handle interactions by manipulating sets of hypotheses directly.
The key function we use is a probability density function, which we denote as m. The function
m is defined not just for elements of Θ but for all subsets of it. The quantity m(p) measures
the amount of belief that is currently assigned to exactly the set p of hypotheses. If Θ contains n
elements, then there are 2^n subsets of Θ. We must assign m so that the sum of all the m values
assigned to the subsets of Θ is 1.
 Let us see how m works for our diagnosis problem. Assume that we have no information about
how to choose among the four hypotheses when we start the diagnosis task. Then we define m
as:
{Θ} (1.0)
All other values of m are thus 0. This means that the actual value must be some element
among All, Flu, Cold, or Pneu. Now suppose we acquire a piece of evidence that suggests (at a
level of 0.6) that the correct diagnosis is in the set {Flu, Cold, Pneu}. Fever might be such a
piece of evidence. We update m as follows:

{Flu, Cold, Pneu} (0.6)


{Θ} (0.4)

 Having defined m, we can now define Bel (p) for a set p as the sum of the values of m for p
and for all of its subsets. Thus Bel (p) is our overall belief that the correct answer lies
somewhere in the set p.
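A small sketch of this definition, representing sets of hypotheses as Python frozensets (our own illustration):

```python
def bel(m, p):
    # Sum the mass m assigns to p and to every subset of p
    # (s <= p is the subset test for frozensets).
    return sum(mass for s, mass in m.items() if s <= p)

THETA = frozenset({'All', 'Flu', 'Cold', 'Pneu'})
m = {frozenset({'Flu', 'Cold', 'Pneu'}): 0.6, THETA: 0.4}  # after observing fever
print(bel(m, frozenset({'Flu', 'Cold', 'Pneu'})))          # 0.6
print(bel(m, THETA))                                       # 1.0
```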


 Suppose we are given two belief functions m1 and m2. Let X be the set of subsets of Θ to which
m1 assigns a nonzero value and let Y be the corresponding set for m2. We define the
combination m3 of m1 and m2 to be

This gives us a new belief function that we can apply to any subset Z of Θ. For example,
suppose m1 corresponds to our belief after observing fever:
{Flu, Cold, Pneu} (0.6)
Θ (0.4)
Suppose m2 corresponds to our belief after observing a runny nose:

{All, Flu, Cold} (0.8)


Θ (0.2)
Combining m1 and m2 leads to m3:

                          {All, Flu, Cold} (0.8)     Θ (0.2)
{Flu, Cold, Pneu} (0.6)   {Flu, Cold} (0.48)         {Flu, Cold, Pneu} (0.12)
Θ (0.4)                   {All, Flu, Cold} (0.32)    Θ (0.08)

The four sets generated by intersecting elements of X with elements of Y are shown in the
body of the table.

Here no empty subsets were created by the intersections, so the scaling factor (the denominator
of the combination rule) is 1. To see how the scaling works, let's add a new piece of evidence
to our example. Combining m1 and m2 produced m3:

{Flu, Cold} (0.48)


{All, Flu, Cold} (0.32)
{Flu, Cold, Pneu} (0.12)
Θ (0.08)
Now, let m4 correspond to our belief given the evidence that the problem goes away when the
patient goes on a trip:

{All} (0.9)
Θ (0.1)
We can apply the numerator of the combination rule to produce (ϕ is the empty set):


                    {A} (0.9)        Θ (0.1)
{F, C} (0.48)       ϕ (0.432)        {F, C} (0.048)
{A, F, C} (0.32)    {A} (0.288)      {A, F, C} (0.032)
{F, C, P} (0.12)    ϕ (0.108)        {F, C, P} (0.012)
Θ (0.08)            {A} (0.072)      Θ (0.008)

But there is now a total belief of 0.54 associated with ϕ; only 0.46 is associated with outcomes
that are in fact possible. So we need to scale the remaining values by dividing each by
1 − 0.54 = 0.46. If we do this, and also combine the two alternative ways of generating the
set {All}, then we get the final combined belief function, m5:

{Flu, Cold} (0.104)

{All, Flu, Cold} (0.070)
{Flu, Cold, Pneu} (0.026)
{All} (0.783)
Θ (0.017)
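The whole combination, including the normalization over ϕ, is compact in code; the following sketch (our own illustration) reproduces the m5 values above:

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    raw = defaultdict(float)
    for x, mx in m1.items():
        for y, my in m2.items():
            raw[x & y] += mx * my          # intersect the focal elements
    conflict = raw.pop(frozenset(), 0.0)   # mass that fell on the empty set
    # Assumes conflict < 1 (total conflict would make the rule undefined).
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}

THETA = frozenset({'All', 'Flu', 'Cold', 'Pneu'})
m3 = {frozenset({'Flu', 'Cold'}): 0.48,
      frozenset({'All', 'Flu', 'Cold'}): 0.32,
      frozenset({'Flu', 'Cold', 'Pneu'}): 0.12,
      THETA: 0.08}
m4 = {frozenset({'All'}): 0.9, THETA: 0.1}
m5 = dempster_combine(m3, m4)   # {All}: ~0.783, {Flu, Cold}: ~0.104, ...
```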

 FUZZY LOGIC

 Here, we take a different approach and briefly consider what happens if we make fundamental
changes to our idea of set membership and corresponding changes to our definitions of
logical operations.
 The motivation for fuzzy sets is provided by the need to represent such propositions as:

John is very tall.


Mary is slightly ill.
Sue and Linda are close friends.
Exceptions to the rule are nearly impossible.
Most Frenchmen are not very tall.

 Fuzzy set theory allows us to represent set membership as a possibility distribution, such as
the ones shown in Figure (a) below for the set of tall people and the set of very tall people.
Notice how this contrasts with the standard Boolean definition for tall people shown in Figure
(b) below. In the latter, one is either tall or not, and there must be a specific height that defines
the boundary; the same is true for very tall. In the former, one's tallness increases with one's
height until the value of 1 is reached.


Figure: Fuzzy versus Conventional Set Membership
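To make the contrast concrete, here is a sketch of one possible membership function for tall (the 170–190 cm breakpoints are illustrative assumptions, not values from the figure), along with the usual min/max fuzzy operators:

```python
def tall(height_cm):
    # Possibility rises linearly from 0 at 170 cm to 1 at 190 cm,
    # unlike the Boolean version, which jumps from 0 to 1 at one height.
    if height_cm <= 170:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 170) / 20.0

def very(mu):
    # A common hedge: squaring concentrates the set, so "very tall"
    # never exceeds "tall".
    return mu ** 2

def fuzzy_and(a, b): return min(a, b)   # standard (Zadeh) operators
def fuzzy_or(a, b):  return max(a, b)
def fuzzy_not(a):    return 1.0 - a

print(tall(180), very(tall(180)))   # 0.5 0.25
```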

Question Bank - III Module

1. What do you mean by Uncertainty? Discuss briefly the approaches to deal with it.
2. What are Non-Monotonic Reasoning Systems? Explain in the context of the ABC Murder story.
3. Explain the different logics for implementing non-monotonic reasoning, along with the issues associated with them.
4. Discuss the importance of Truth Maintenance Systems (TMS) and their variants (types).
5. State Bayes' theorem and illustrate how it helps in reasoning under uncertainty.
6. Write a note on: i) Rule-based Systems ii) Certainty Factors.
7. What are the advantages of Bayesian Networks? Explain with an example.
8. Briefly discuss the way reasoning is done using: i) Fuzzy Logic ii) Dempster-Shafer Theory.
