
UNIT III

Reasoning under uncertainty: Logics of non-monotonic reasoning - Implementation - Basic probability notation - Bayes' rule - Certainty factors and rule-based systems - Bayesian networks - Dempster-Shafer theory - Fuzzy logic.

1. What is uncertainty? Explain.

Uncertainty

Let action A_t = leave for airport t minutes before flight.

Will A_t get me there on time?

Problems:

1) partial observability (road state, other drivers' plans, etc.)

2) noisy sensors (KCBS traffic reports)

3) uncertainty in action outcomes (flat tire, etc.)

4) immense complexity of modelling and predicting traffic

Hence a purely logical approach either

1) risks falsehood: "A_25 will get me there on time"

or 2) leads to conclusions that are too weak for decision making:

"A_25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc."

Methods for handling uncertainty


Default or nonmonotonic logic:

Assume my car does not have a flat tire.

Assume A_25 works unless contradicted by evidence.

Issues: What assumptions are reasonable? How to handle contradiction?

Rules with fudge factors:

Issues: problems with combination, e.g., does Sprinkler cause Rain? Chaining a rule "Sprinkler suggests WetGrass" with "WetGrass suggests Rain" would wrongly conclude that turning on the sprinkler makes rain more likely.

2. What is non-monotonic reasoning? Explain.

A non-monotonic logic is a formal logic whose consequence relation is not monotonic.


Most studied formal logics have a monotonic consequence relation, meaning that adding a
formula to a theory never produces a reduction of its set of consequences. Intuitively,
monotonicity indicates that learning a new piece of knowledge cannot reduce the set of what is
known.

A monotonic logic cannot handle various reasoning tasks, such as reasoning by default (consequences may be derived only because of lack of evidence to the contrary), abductive reasoning (consequences are deduced only as the most likely explanations), some important approaches to reasoning about knowledge (the ignorance of a fact must be retracted when the fact becomes known), and belief revision (new knowledge may contradict old beliefs).

Default reasoning

An example of a default assumption is that the typical bird flies. As a result, if a given
animal is known to be a bird, and nothing else is known, it can be assumed to be able to fly.
The default assumption must however be retracted if it is later learned that the considered
animal is a penguin. This example shows that a logic that models default reasoning should not
be monotonic.
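As a minimal illustration, the following Python sketch (the predicates and the single rule are invented for this example, not taken from any particular default-logic system) shows a conclusion drawn by default being withdrawn when a contradicting fact arrives:

def flies(animal, facts):
    # Default rule: conclude fly(X) from bird(X) unless an exception is known.
    if ("penguin", animal) in facts:       # an explicit exception defeats the default
        return False
    return ("bird", animal) in facts       # default assumption: typical birds fly

facts = {("bird", "tweety")}
print(flies("tweety", facts))              # True: nothing contradicts the default
facts.add(("penguin", "tweety"))
print(flies("tweety", facts))              # False: new knowledge retracts a conclusion

Adding the penguin fact shrinks the set of conclusions, which is exactly the failure of monotonicity described above.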

Logics formalizing default reasoning can be roughly divided into two categories: logics able to deal with arbitrary default assumptions (default logic, defeasible logic/defeasible reasoning, and answer set programming) and logics that formalize the specific default assumption that facts not known to be true can be assumed false by default (the closed world assumption and circumscription).
Abductive reasoning

Abductive reasoning is the process of deriving the most likely explanations of the
known facts. An abductive logic should not be monotonic because the most likely
explanations are not necessarily correct.

For example, the most likely explanation for seeing wet grass is that it rained;
however, this explanation has to be retracted when learning that the real cause of the grass
being wet was a sprinkler. Since the old explanation (it rained) is retracted because of the
addition of a piece of knowledge (a sprinkler was active), any logic that models explanations
is non-monotonic.
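A toy Python sketch of this behaviour (the candidate explanations and their plausibility scores are invented for illustration):

priors = {"rain": 0.7, "sprinkler": 0.3}       # assumed plausibility of each explanation

def best_explanation(ruled_out=()):
    # Abduction: pick the most plausible explanation that has not been ruled out.
    candidates = {e: p for e, p in priors.items() if e not in ruled_out}
    return max(candidates, key=candidates.get)

print(best_explanation())                      # 'rain': the most likely explanation
print(best_explanation(ruled_out=("rain",)))   # 'sprinkler': the old explanation is retracted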

Reasoning about knowledge

If a logic includes formulae that mean that something is not known, this logic should not
be monotonic. Indeed, learning something that was previously not known leads to the removal of
the formula specifying that this piece of knowledge is not known. This second change (a removal
caused by an addition) violates the condition of monotonicity. A logic for reasoning about
knowledge is the autoepistemic logic.

Belief revision

Belief revision is the process of changing beliefs to accommodate a new belief that might be inconsistent with the old ones. Under the assumption that the new belief is correct, some of the old ones have to be retracted in order to maintain consistency. This retraction in response to the addition of a new belief makes any logic for belief revision non-monotonic. The belief revision approach is an alternative to paraconsistent logics, which tolerate inconsistency rather than attempting to remove it.

3. What is probability? Explain basic probability notation.


Probability
Given the available evidence, A_25 will get me there on time with probability 0.04.

(Fuzzy logic handles degree of truth, NOT uncertainty; e.g., WetGrass is true to degree 0.2.)

Probabilistic assertions summarize effects of

laziness: failure to enumerate exceptions, qualifications, etc.

ignorance: lack of relevant facts, initial conditions, etc.

Subjective or Bayesian probability:

Probabilities relate propositions to one's own state of knowledge,

e.g., P(A_25 | no reported accidents) = 0.06

These are not claims of a "probabilistic tendency" in the current situation (but might be learned from past experience of similar situations).

Probabilities of propositions change with new evidence:

e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15

(Analogous to logical entailment status KB ⊨ α, not truth.)

Making decisions under uncertainty


Suppose I believe the following:

P(A_25 gets me there on time | ...) = 0.04

P(A_90 gets me there on time | ...) = 0.70

P(A_120 gets me there on time | ...) = 0.95

P(A_1440 gets me there on time | ...) = 0.9999

Which action to choose?

Depends on my preferences for missing the flight vs. airport cuisine, etc.


Utility theory is used to represent and infer preferences

Decision theory = utility theory + probability theory
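A minimal sketch of that combination in Python, assuming invented utilities for missing the flight and for time spent waiting (none of these numbers come from the notes):

p_on_time = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}   # P(A_t on time | ...)
U_MISS, U_WAIT_PER_MIN = -1000.0, -0.5    # assumed utilities: missed flight, waiting

def expected_utility(t):
    # Decision theory: weight each outcome's utility by its probability.
    p = p_on_time[t]
    wait = U_WAIT_PER_MIN * t             # a longer buffer means more airport time
    return p * wait + (1 - p) * (U_MISS + wait)

print(max(p_on_time, key=expected_utility))   # 120 under these assumed utilities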

Probabilistic Reasoning
Using logic to represent and reason about the world, we can encode knowledge with facts and rules, like the following ones:
bird(tweety).
fly(X) :- bird(X).
We can also use a theorem prover to reason about the world and deduce new facts about the world, e.g.,
?- fly(tweety).
Yes
However, this often does not work outside of toy domains: non-tautologous certain rules are hard to find. A way to handle knowledge representation in real problems is to extend logic with certainty factors. In other words, replace

IF condition THEN fact

with

IF condition with certainty x THEN fact with certainty f(x)

Unfortunately, we cannot really adapt logical inference to probabilistic inference, since the latter is not context-free. Replacing rules with conditional probabilities makes inferencing simpler.
Replace smoking -> lung cancer
or
lots of conditions, smoking -> lung cancer
with
P(lung cancer | smoking) = 0.6
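For comparison, a sketch of how MYCIN-style certainty factors combine when two rules support the same conclusion (the second rule and both CF values are invented for illustration):

def combine_cf(cf1, cf2):
    # Standard combination of two positive certainty factors for one conclusion.
    return cf1 + cf2 * (1 - cf1)

cf_smoking = 0.6    # IF smoking THEN lung_cancer with CF 0.6
cf_history = 0.4    # IF family_history THEN lung_cancer with CF 0.4 (assumed)
print(combine_cf(cf_smoking, cf_history))   # 0.76: independent evidence accumulates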
Uncertainty is represented explicitly and quantitatively within probability theory,
a formalism that has been developed over centuries.
A probabilistic model describes the world in terms of a set S of possible states - the sample space. We don't know the true state of the world, so we (somehow) come up with a probability distribution over S which gives the probability of any state being the true one.
The world is usually described by a set of variables or attributes. Consider the probabilistic model of a fictitious medical expert system. The 'world' is described by 8 binary-valued variables:
Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S
Bronchitis? B
Dyspnoea? D
Positive X-ray? X

We have 2^8 = 256 possible states or configurations, and so 256 probabilities to find.
Review of probability theory. The primitives in probabilistic reasoning are random variables, just as the primitives in propositional logic are propositions.
A random variable is not in fact a variable, but a function from a sample space S to
another space, often the real numbers. For example, let the random variable Sum (representing
outcome of two die throws) be defined thus:
Sum(die1, die2) = die1 + die2

Each random variable has an associated probability distribution determined by the underlying distribution on the sample space.

Continuing our example: P(Sum = 2) = 1/36, P(Sum = 3) = 2/36, . . . , P(Sum = 12) = 1/36
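This distribution can be checked by brute force; a short Python sketch:

from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely (die1, die2) outcomes map to each Sum value.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
dist = {s: Fraction(c, 36) for s, c in counts.items()}
print(dist[2], dist[3], dist[12])   # 1/36, 1/18 (= 2/36), 1/36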
Consider the probabilistic model of the fictitious medical expert system mentioned before. The sample space is described by 8 binary-valued variables.
Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S
Bronchitis? B
Dyspnoea? D
Positive X-ray? X
There are 2^8 = 256 events in the sample space. Each event is determined by a joint instantiation of all of the variables.

S = {(A=f, T=f, E=f, L=f, S=f, B=f, D=f, X=f),

(A=f, T=f, E=f, L=f, S=f, B=f, D=f, X=t), . . . ,

(A=t, T=t, E=t, L=t, S=t, B=t, D=t, X=t)}

Since S is defined in terms of joint instantiations, any distribution defined on it is called a joint distribution. All underlying distributions will be joint distributions in this module. The variables {A, T, E, L, S, B, D, X} are in fact random variables, which 'project' values:

L(A=f, T=f, E=f, L=f, S=f, B=f, D=f, X=f) = f

L(A=f, T=f, E=f, L=f, S=f, B=f, D=f, X=t) = f

L(A=t, T=t, E=t, L=t, S=t, B=t, D=t, X=t) = t

Each of the random variables {A, T, E, L, S, B, D, X} has its own distribution, determined by the underlying joint distribution. This is known as the marginal distribution. For example, the distribution for L is denoted P(L), and this distribution is defined by the two probabilities P(L = f) and P(L = t). For example,

P(L = f)

= P(A=f, T=f, E=f, L=f, S=f, B=f, D=f, X=f)

+ P(A=f, T=f, E=f, L=f, S=f, B=f, D=f, X=t)

+ P(A=f, T=f, E=f, L=f, S=f, B=f, D=t, X=f)

+ . . .

+ P(A=t, T=t, E=t, L=f, S=t, B=t, D=t, X=t)

P(L) is an example of a marginal distribution.
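A sketch of that summation in Python, using a uniform placeholder joint (the medical model's 256 actual probabilities are not given in the notes):

from itertools import product

VARS = ["A", "T", "E", "L", "S", "B", "D", "X"]
joint = {state: 1 / 256 for state in product([False, True], repeat=8)}   # placeholder

def marginal(var, value):
    # P(var = value): sum the joint over every state consistent with the assignment.
    i = VARS.index(var)
    return sum(p for state, p in joint.items() if state[i] == value)

print(marginal("L", False))   # 0.5 under the uniform placeholder joint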

4. Explain Bayes' Rule.


Bayes' Rule and conditional independence

Bayes' rule follows from the product rule P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A), which gives

P(A | B) = P(B | A) P(A) / P(B)
Wumpus World

Specifying the probability model

Observations and query


Using conditional independence
Basic insight: observations are conditionally independent of other hidden squares given neighbouring
hidden squares

Manipulate query into a form where we can use this!
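Since the Wumpus details are only sketched here, a small numeric check of Bayes' rule itself in Python (the disease/test numbers are invented for illustration):

p_d = 0.01                   # prior P(disease)
p_pos_given_d = 0.95         # likelihood P(positive | disease)
p_pos_given_not_d = 0.05     # false-positive rate P(positive | no disease)

# Total probability of a positive test, then Bayes' rule for the posterior.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))   # 0.161: the posterior stays well below the test accuracy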

5. Explain Bayesian networks.


A simple, graphical notation for conditional independence assertions and hence for compact specification
of full joint distributions

Syntax:

a set of nodes, one per variable

a directed, acyclic graph (link ≈ "directly influences")

a conditional distribution for each node given its parents: P(X_i | Parents(X_i))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.

Example
Topology of network encodes conditional independence assertions:

Weather is independent of the other variables

Toothache and Catch are conditionally independent given Cavity

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge (a sketch with illustrative numbers follows the list below):

– A burglar can set the alarm off

– An earthquake can set the alarm off

– The alarm can cause Mary to call

– The alarm can cause John to call
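A Python sketch of this network's full joint as a product of per-node CPTs; the probability values are the commonly used textbook numbers and should be treated as illustrative:

P_B, P_E = 0.001, 0.002                               # priors: Burglary, Earthquake
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(Alarm | Burglary, Earthquake)
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    # Full joint = product of each node's CPT entry given its parents' values.
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    return p * (P_M[a] if m else 1 - P_M[a])

Only 10 numbers specify the whole 32-entry joint, which is the compactness the next heading refers to.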

Compactness

Constructing Bayesian networks


Need a method such that a series of locally testable assertions of conditional independence guarantees the
required global semantics

Example
Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions, and the network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

Example: Car diagnosis


Initial evidence: car won't start

Testable variables (green), "broken, so fix it" variables (orange)

Hidden variables (gray) ensure sparse structure, reduce parameters


Example: Car insurance

Compact conditional distributions


Hybrid (discrete+continuous) networks

Option 1: discretization; possibly large errors, large CPTs

Option 2: finitely parameterized canonical families

1) Continuous variable, discrete+continuous parents (e.g., Cost)

2) Discrete variable, continuous parents (e.g., Buys?)

Continuous child variables


Need one conditional density function for child variable given continuous

parents, for each possible assignment to discrete parents

Most common is the linear Gaussian model, e.g., P(Cost = c | Harvest = h) = N(a h + b, σ²)(c):

the mean of Cost varies linearly with Harvest, and the variance is fixed. Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow.

An all-continuous network with LG distributions has a full joint distribution that is a multivariate Gaussian.

A discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.
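A sketch of a linear Gaussian conditional density in Python (the coefficients a, b and the standard deviation sigma are invented):

from math import exp, pi, sqrt

def p_cost_given_harvest(c, h, a=-1.0, b=10.0, sigma=1.0):
    # Linear Gaussian: Cost ~ N(a*h + b, sigma^2); the mean varies linearly with Harvest.
    mu = a * h + b
    return exp(-((c - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

print(p_cost_given_harvest(c=5.0, h=5.0))   # density at the mean: about 0.399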

Discrete variable with continuous parents (e.g., Buys? given Cost): the usual choices are a probit or logit model, which give a "soft" threshold on the parent value.

Inference in Bayesian networks

Inference tasks

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually

constructing its explicit representation


6. Explain Dempster-Shafer theory in AI.

The Dempster–Shafer theory (DST) is a mathematical theory of evidence. It allows one to combine evidence from different sources and arrive at a degree of belief (represented by a belief function) that takes into account all the available evidence. The theory was first developed by Arthur P. Dempster and Glenn Shafer.

In a narrow sense, the term Dempster–Shafer theory refers to the original conception of
the theory by Dempster and Shafer. However, it is more common to use the term in the wider
sense of the same general approach, as adapted to specific kinds of situations. In particular, many
authors have proposed different rules for combining evidence, often with a view to handling
conflicts in evidence better.

Dempster–Shafer theory is a generalization of the Bayesian theory of subjective


probability; whereas the latter requires probabilities for each question of interest, belief functions
base degrees of belief (or confidence, or trust) for one question on the probabilities for a related
question.

These degrees of belief may or may not have the mathematical properties of probabilities;
how much they differ depends on how closely the two questions are related.[4] Put another way, it
is a way of representing epistemic plausibilities but it can yield answers that contradict those
arrived at using probability theory.

Dempster–Shafer theory is based on two ideas: obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule for combining such degrees of belief when they are based on independent items of evidence.

In essence, the degree of belief in a proposition depends primarily upon the number of
answers (to the related questions) containing the proposition, and the subjective probability of
each answer. Also contributing are the rules of combination that reflect general assumptions
about the data.

In this formalism a degree of belief (also referred to as a mass) is represented as a belief


function rather than a Bayesian probability distribution. Probability values are assigned to sets of
possibilities rather than single events: their appeal rests on the fact they naturally encode
evidence in favor of propositions.

Dempster–Shafer theory assigns its masses to all of the non-empty subsets of the entities that compose a system.

Belief and plausibility


Shafer's framework allows for belief about propositions to be represented as intervals, bounded
by two values, belief (or support) and plausibility:

belief ≤ plausibility.

Belief in a hypothesis is constituted by the sum of the masses of all sets enclosed by it (i.e. the sum of the masses of all subsets of the hypothesis).

It is the amount of belief that directly supports a given hypothesis at least in part, forming a
lower bound. Belief (usually denoted Bel) measures the strength of the evidence in favor of a set
of propositions. It ranges from 0 (indicating no evidence) to 1 (denoting certainty).

Plausibility is 1 minus the sum of the masses of all sets whose intersection with the hypothesis is empty. It is an upper bound on the possibility that the hypothesis could be true, i.e. it "could possibly be the true state of the system" up to that value, because there is only so much evidence that contradicts that hypothesis.

Plausibility (denoted by Pl) is defined to be Pl(s) = 1 − Bel(~s). It also ranges from 0 to 1 and measures the extent to which evidence in favor of ~s leaves room for belief in s. For example, suppose we have a belief of 0.5 and a plausibility of 0.8 for a proposition, say "the cat in the box is dead." This means that we have evidence that allows us to state strongly that the proposition is true with a confidence of 0.5. However, the evidence contrary to that hypothesis (i.e. "the cat is alive") only has a confidence of 0.2.

The remaining mass of 0.3 (the gap between the 0.5 supporting evidence on the one hand, and the 0.2 contrary evidence on the other) is "indeterminate," meaning that the cat could either be dead or alive. This interval represents the level of uncertainty based on the evidence in your system.

Hypothesis                      Mass   Belief   Plausibility
Null (neither alive nor dead)   0      0        0
Alive                           0.2    0.2      0.5
Dead                            0.5    0.5      0.8
Either (alive or dead)          0.3    1.0      1.0

The null hypothesis is set to zero by definition (it corresponds to "no solution"). The orthogonal hypotheses "Alive" and "Dead" have probabilities of 0.2 and 0.5, respectively. This could correspond to "Live/Dead Cat Detector" signals, which have respective reliabilities of 0.2 and 0.5. Finally, the all-encompassing "Either" hypothesis (which simply acknowledges there is a cat in the box) picks up the slack so that the sum of the masses is 1.

The belief for the "Alive" and "Dead" hypotheses matches their corresponding masses because they have no subsets; belief for "Either" consists of the sum of all three masses (Either, Alive, and Dead) because "Alive" and "Dead" are each subsets of "Either". The "Alive" plausibility is 1 − m(Dead) and the "Dead" plausibility is 1 − m(Alive). Finally, the "Either" plausibility sums m(Alive) + m(Dead) + m(Either). The universal hypothesis ("Either") will always have 100% belief and plausibility; it acts as a checksum of sorts.
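A Python sketch that reproduces the table's belief and plausibility values from the mass assignment (the frame {alive, dead} and the masses are exactly those above):

MASS = {frozenset(): 0.0,
        frozenset({"alive"}): 0.2,
        frozenset({"dead"}): 0.5,
        frozenset({"alive", "dead"}): 0.3}

def bel(h):
    # Belief: sum of the masses of all non-empty subsets of hypothesis h.
    return sum(m for s, m in MASS.items() if s and s <= h)

def pl(h):
    # Plausibility: sum of the masses of all sets that intersect h.
    return sum(m for s, m in MASS.items() if s & h)

h = frozenset({"alive"})
print(bel(h), pl(h))   # 0.2 0.5, matching the "Alive" row of the table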

Here is a somewhat more elaborate example where the behavior of belief and plausibility begins to emerge. We're looking through a variety of detector systems at a single faraway signal light, which can only be coloured in one of three colours (red, yellow, or green).

Combining beliefs

Beliefs corresponding to independent pieces of information are combined using


Dempster's rule of combination, which is a generalization of the special case of Bayes' theorem
where events are independent. Note that the probability masses from propositions that contradict
each other can also be used to obtain a measure of how much conflict there is in a system. This
measure has been used as a criterion for clustering multiple pieces of seemingly conflicting
evidence around competing hypotheses.
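A sketch of Dempster's rule of combination in Python; the two detector mass functions below are invented for the signal-light setting:

ANY = frozenset({"red", "yellow", "green"})

def combine(m1, m2):
    # Dempster's rule: multiply the masses of intersecting sets, discard the
    # conflicting mass, and renormalize by 1 - conflict.
    combined, conflict = {}, 0.0
    for s1, p1 in m1.items():
        for s2, p2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p1 * p2
            else:
                conflict += p1 * p2
    return {s: p / (1 - conflict) for s, p in combined.items()}

m1 = {frozenset({"red"}): 0.35, ANY: 0.65}      # detector 1: weak evidence for red
m2 = {frozenset({"yellow"}): 0.30, ANY: 0.70}   # detector 2: conflicting evidence
print(combine(m1, m2))   # the conflicting mass (0.105) is renormalized away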
In addition, one of the computational advantages of the Dempster–Shafer
framework is that priors and conditionals need not be specified, unlike Bayesian methods,
which often use a symmetry (minimax error) argument to assign prior probabilities to
random variables (e.g. assigning 0.5 to binary values for which no information is available
about which is more likely). However, any information contained in the missing priors and
conditionals is not used in the Dempster–Shafer framework unless it can be obtained
indirectly—and arguably is then available for calculation using Bayes equations.

7. Explain Fuzzy Logic in AI.

Fuzzy logic is a form of many-valued logic or probabilistic logic; it deals with


reasoning that is approximate rather than fixed and exact. In contrast with traditional
logic theory, where binary sets have two-valued logic: true or false, fuzzy logic variables
may have a truth value that ranges in degree between 0 and 1.

Fuzzy logic has been extended to handle the concept of partial truth, where the
truth value may range between completely true and completely false.[1] Furthermore,
when linguistic variables are used, these degrees may be managed by specific functions.

Overview

The reasoning in fuzzy logic is similar to human reasoning. It allows for approximate values and inferences as well as incomplete or ambiguous data (fuzzy data), as opposed to relying only on crisp data (binary yes/no choices). Fuzzy logic is able to process incomplete data and provide approximate solutions to problems other methods find difficult to solve.

Degrees of truth

Fuzzy logic and probabilistic logic are mathematically similar – both have truth values
ranging between 0 and 1 – but conceptually distinct, due to different interpretations—see
interpretations of probability theory. Fuzzy logic corresponds to "degrees of truth", while
probabilistic logic corresponds to "probability, likelihood"; as these differ, fuzzy logic and
probabilistic logic yield different models of the same real-world situations.
Both degrees of truth and probabilities range between 0 and 1 and hence may seem
similar at first. For example, let a 100 ml glass contain 30 ml of water. Then we may consider
two concepts: Empty and Full. The meaning of each of them can be represented by a certain
fuzzy set. Then one might define the glass as being 0.7 empty and 0.3 full.

Applying truth values

A basic application might characterize subranges of a continuous variable. For instance, a


temperature measurement for anti-lock brakes might have several separate membership functions
defining particular temperature ranges needed to control the brakes properly. Each function maps
the same temperature value to a truth value in the 0 to 1 range. These truth values can then be
used to determine how the brakes should be controlled.
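A sketch of such membership functions in Python (the temperature breakpoints are invented for illustration):

def cold(t):
    # Fully cold at or below 0 degrees, fading linearly to 0 truth at 20 degrees.
    return max(0.0, min(1.0, (20.0 - t) / 20.0))

def hot(t):
    # Fully hot at or above 80 degrees, starting to rise at 60 degrees.
    return max(0.0, min(1.0, (t - 60.0) / 20.0))

def warm(t):
    # Whatever truth remains after cold and hot (the three sum to 1 here).
    return max(0.0, 1.0 - cold(t) - hot(t))

t = 15.0
print(cold(t), warm(t), hot(t))   # 0.25 0.75 0.0: three truth values for one reading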

Figure: fuzzy logic temperature. In this image, the meanings of the expressions cold, warm, and hot are represented by functions mapping a temperature scale. A point on that scale has three "truth values", one for each of the three functions. The vertical line in the image represents a particular temperature that the three arrows (truth values) gauge. Since the red arrow points to zero, this temperature may be interpreted as "not hot". The orange arrow (pointing at 0.2) may describe it as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold".

Linguistic variables

While variables in mathematics usually take numerical values, in fuzzy logic applications non-numeric linguistic variables are often used to facilitate the expression of rules and facts.

A linguistic variable such as age may have a value such as young or its antonym old.
However, the great utility of linguistic variables is that they can be modified via linguistic hedges
applied to primary terms. The linguistic hedges can be associated with certain functions.

The most important propositional fuzzy logics are:

– Monoidal t-norm-based propositional fuzzy logic MTL is an axiomatization of logic where


conjunction is defined by a left continuous t-norm, and implication is defined as the residuum of
the t-norm. Its models correspond to MTL-algebras that are prelinear commutative bounded
integral residuated lattices.
– Basic propositional fuzzy logic BL is an extension of MTL logic where conjunction is defined by
a continuous t-norm, and implication is also defined as the residuum of the t-norm. Its models
correspond to BL-algebras.
– Łukasiewicz fuzzy logic is the extension of basic fuzzy logic BL where standard conjunction is
the Łukasiewicz t-norm. It has the axioms of basic fuzzy logic plus an axiom of double negation,
and its models correspond to MV-algebras.
– Gödel fuzzy logic is the extension of basic fuzzy logic BL where conjunction is the Gödel t-norm. It has the axioms of BL plus an axiom of idempotence of conjunction, and its models are called G-algebras.
– Product fuzzy logic is the extension of basic fuzzy logic BL where conjunction is product t-norm.
It has the axioms of BL plus another axiom for cancellativity of conjunction, and its models are
called product algebras.
– Fuzzy logic with evaluated syntax (sometimes also called Pavelka's logic), denoted by EVŁ, is a further generalization of mathematical fuzzy logic. While the above kinds of fuzzy logic have traditional syntax and many-valued semantics, in EVŁ the syntax is also evaluated. This means that each formula has an evaluation. The axiomatization of EVŁ stems from Łukasiewicz fuzzy logic. A generalization of the classical Gödel completeness theorem is provable in EVŁ.
