Module 4
Knowledge and Reasoning
By Dr. Sonali Patil
Reasoning Under Uncertainty
• Uncertainty
• Sources of uncertainty
• Methods to handle Uncertainty
• Probability Theory
• Uncertainty and Rational Decisions
• Basic Probability Notations
• Probability Axioms
• Independence
• Bayes Rule
• Bayesian Networks
Uncertainty
Reasoning under uncertainty.
• There are different types of uncertainty. In what ways can we deal with them?
• The doorbell problem
• The doorbell rang at 12 o'clock at midnight.
• Questions to answer:
• Was someone at the door?
• Mohan was sleeping in the room. Did Mohan wake up when the doorbell rang?
• The known fact is that the doorbell rang at midnight. Let us place the propositions in logical form.
Uncertainty
• Given Doorbell, can we infer AtDoor(x), because AtDoor(x) → Doorbell?
• Can we say that someone is at the door? Not by deductive reasoning / normal implication (p implies q: if p is true, q is necessarily true; if p is false, q may or may not be true).
• By abductive reasoning (p implies q, and on finding q true we infer p) we can. This is right most of the time, but not always; there are other possible causes, though rare:
• Short circuit
• Wind
• A dog or other animal pressed the button
Uncertainty
• Given Doorbell, can we say Wake(Mohan), because Doorbell → Wake(Mohan)?
• Using deductive reasoning: yes, if the implication is always true.
• However, it may not always be true (Mohan may be tired and in a sound sleep).
Hence, we cannot answer either question with certainty.
Uncertainty
Planning Example
• Let action A(t) denote leaving for the airport t minutes before the
flight
– For a given value of t, will A(t) get me there on time?
• Problems:
– Partial observability (roads, other drivers’ plans, etc.)
– Noisy sensors (traffic reports)
– Uncertainty in action outcomes (flat tire, etc.)
– Immense complexity of modelling and predicting traffic
Uncertainty
• Diagnosis always involves uncertainty.
• E.g. dental diagnosis (toothache):
Toothache → Cavity
This is wrong, as not all people with toothaches have cavities; the pain may be due to other problems:
Toothache → Cavity ∨ GumProblem ∨ Abscess ∨ …
To complete the list, we would have to add an almost unlimited list of possible problems.
The causal rule for this:
Cavity → Toothache
This is not right either: not all cavities cause pain.
Uncertainty
• Trying to cope with domains like medical diagnosis fails for three main reasons:
Laziness:
It is too hard to list out all the antecedents & consequents needed to ensure an exception-less rule, and too hard to use such rules.
Theoretical ignorance:
Domains like medical science have no complete theory.
Practical ignorance:
Even if all the rules are known, we may be uncertain about a particular patient: not all tests have been or can be run.
Uncertainty
• Problems like the doorbell and diagnosis are very common in the real world
• In AI, we need to reason under such circumstances
• We solve such problems by properly modelling uncertainty and imprecision and by developing appropriate reasoning techniques
Sources of uncertainty
• Implications may be weak
• Incomplete knowledge
• We may not know, or be able to guess, all the possible antecedents or consequents
• Propagation of uncertainties
• When uncertain knowledge is combined and propagated through a chain of inference, the uncertainty of the conclusions increases
Methods of handling Uncertainty
• Fuzzy Logic
• Logic that extends traditional 2-valued logic to a continuous logic (values from 0 to 1)
• While this was originally developed to handle natural-language ambiguities such as "you are very tall", it has been more successfully applied to device controllers
• Probabilistic Reasoning
• Using probabilities as part of the data, and using Bayes' theorem or variants to reason about what is most likely
• Hidden Markov Models
• A variant of probabilistic reasoning in which internal states are not observable (so they are called hidden)
• Certainty Factors and Qualitative Fuzzy Logics
• More ad hoc (non-formal) approaches that might be more flexible or at least more human-like (MYCIN expert system)
• Neural Networks
Uncertainty tradeoffs
• Bayesian networks: Nice theoretical properties combined with efficient reasoning make BNs very popular; limited expressiveness and knowledge-engineering challenges may limit their use
• Non-monotonic logic: Represents commonsense reasoning, but can be computationally very expensive
• Certainty factors: Not semantically well founded
• Fuzzy reasoning: Semantics are unclear (fuzzy!), but it has proved very useful for commercial applications
Probability Theory
• Deals with degrees of belief
• Provides a way of summarizing the uncertainty that comes from our laziness & ignorance, thereby solving the qualification problem (specifying all exceptions)
• A90 will take us to the airport on time, as long as the car doesn't break down or run out of gas, we don't get into an accident, there are no accidents on the bridge, the plane doesn't leave early, no meteorite hits the car, and …
• Toothache → Cavity with degree of belief 0.8, i.e.
P(Cavity | Toothache) = 0.8
• The probability that the patient has a cavity, given that she has a toothache, is 0.8
Probability Theory
• Consider the previous statement: "The probability that the patient has a cavity, given that she has a toothache, is 0.8"
• If we later learn that the patient has a history of gum disease, we can say: "The probability that the patient has a cavity, given that she has a toothache and a history of gum disease, is 0.4"
• If we gather further evidence, we may be able to say: "The probability that the patient has a cavity, given all we know now, is almost zero"
• The three statements above do not contradict each other; each is a separate assertion about a different knowledge state
Uncertainty and rational
decisions
• Say A90 has a 92% chance of catching our flight. Is it the rational choice? Not necessarily
• A180 has a higher probability of arriving on time. If it is vital not to miss the flight, it is worth risking the longer wait at the airport
• A1440 almost guarantees arriving on time, but I'd have to stay overnight in the airport (an intolerable wait, and perhaps an unpleasant diet of airport food)
• To make choices, an agent must have preferences between the different possible outcomes of the various plans
• Utility theory is used to represent & reason with preferences
Uncertainty and rational
decisions
• Utility Theory
• Every state has a degree of usefulness, or utility, to an agent, and the agent will prefer states with higher utility
• The utility of a state is relative to the agent
• Ex. Consider the state in which White has checkmated Black in chess. Here, utility is high for the agent playing White but low for the agent playing Black
• A utility function can account for any set of preferences: quirky or typical, noble or perverse
Uncertainty and rational
decisions
• Decision Theory
• Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational decisions:
Decision Theory = Probability Theory + Utility Theory
• Maximum Expected Utility (MEU)
• An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action. This is the principle of MEU
• Here the term "expected" is not vague: it is the average or statistical mean of the outcomes, weighted by the probability of each outcome
• The basic difference between a decision-theoretic agent & other agents is that the former's belief state represents not just the possibilities for world states but also their probabilities
Uncertainty and rational
decisions summary
• Rational behavior:
• For each possible action, identify the possible
outcomes
• Compute the probability of each outcome
• Compute the utility of each outcome
• Compute the probability-weighted (expected) utility
over possible outcomes for each action
• Select the action with the highest expected utility (the principle of Maximum Expected Utility); see the sketch below
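As a minimal numeric sketch of this procedure (all probabilities and utilities below are invented for illustration; they are not from the slides):

```python
# Hypothetical outcome models for the three airport-departure plans.
# Each action maps to a list of (probability, utility) pairs.
actions = {
    "A90":   [(0.92, 100.0), (0.08, -500.0)],    # catch the flight / miss it
    "A180":  [(0.98,  60.0), (0.02, -500.0)],    # safer, but the longer wait costs utility
    "A1440": [(0.999, -200.0), (0.001, -500.0)]  # near-certain, but an overnight stay is painful
}

def expected_utility(outcomes):
    # Probability-weighted (expected) utility over the possible outcomes
    return sum(p * u for p, u in outcomes)

for action, outcomes in actions.items():
    print(action, expected_utility(outcomes))

# The MEU principle: pick the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print("MEU choice:", best)   # A90 under these made-up numbers
```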
Basic Probability Notation
• A random variable is a variable whose possible values are the
numerical outcomes of a random experiment.
• It is a function which associates a unique numerical value with every
outcome of an experiment.
• Its value varies with every trial of the experiment.
• It describes an outcome that cannot be determined in advance
• It may be Boolean, discrete, or continuous
• Ex. Roll of a die, number of emails received in a day etc.
• The sample space S of the random variable X is the set of all possible
worlds
• The possible worlds are mutually exclusive & exhaustive (exactly one outcome occurs at a time, and all possible outcomes are in S)
• Tossing a coin: S = {H, T}
• Tossing two coins simultaneously: S = {HH, HT, TH, TT}
• Rolling a die: S = {1, 2, 3, 4, 5, 6}
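These sample spaces can be enumerated directly; a small sketch (nothing here beyond the examples above):

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

two_coins = ["".join(p) for p in product(coin, repeat=2)]  # S = {HH, HT, TH, TT}
two_dice = list(product(die, repeat=2))                    # 36 possible worlds

print(two_coins)      # ['HH', 'HT', 'TH', 'TT']
print(len(two_dice))  # 36, the count used for the pair-of-dice examples below
```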
Unconditional or Prior
Probabilities
• Degree of belief in proposition in the absence of any other
information/evidence
• P(Fever) = 0.1
• The probability that the patient has a fever is 0.1 (in the absence of any other information)
• When a die is rolled, P(odd) and P(even) denote the probability of getting an odd number and the probability of getting an even number respectively. Both of these are prior probabilities
• When a pair of dice is rolled simultaneously, there are 36 possible outcomes. P(doubles) and P(Total = 15) are prior probabilities (the latter is 0, since two dice total at most 12, but it is still a well-defined prior)
Unconditional or Prior
Probabilities
• The random variables Fever, Doubles, Odd, and Even are discrete random variables, as they take a finite number of distinct values
• Boolean random variables take the values true or false, e.g. P(cavity)
• A continuous random variable is a random variable that takes an infinite number of distinct values
• Ex. P(Temp = x) = Uniform[18C, 26C](x)
• Expresses that the temperature is distributed uniformly between 18 and 26 degrees
• This is called a probability density function
Conditional or Posterior
Probabilities
• Let A be an event in the world and B be another event. Suppose that events A and B are not mutually exclusive, but occur conditionally on the occurrence of the other. The probability that event A will occur if event B occurs is called the conditional probability. Conditional probability is denoted mathematically as p(A|B), in which the vertical bar represents GIVEN, and the complete probability expression is interpreted as "the conditional probability of event A occurring given that event B has occurred":
p(A|B) = p(A ∧ B) / p(B)
Product Rule
• Similarly, the conditional probability of event B occurring given that event A has occurred equals
p(B|A) = p(A ∧ B) / p(A)
Product Rule
• Rearranging the two definitions gives the product rule:
p(A ∧ B) = p(A|B) p(B) = p(B|A) p(A)
Probability Axioms
• All probabilities are between 0 & 1:
0 ≤ P(A) ≤ 1
• Necessarily true propositions have probability 1 and necessarily false propositions have probability 0:
P(true) = 1 and P(false) = 0
• Probability of a disjunction (the inclusion-exclusion principle):
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
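A quick numeric check of the disjunction axiom on one die roll (a sketch; the events chosen are arbitrary):

```python
from fractions import Fraction

S = range(1, 7)  # sample space of a single die

def P(event):
    # Probability of an event = fraction of equally likely worlds where it holds
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w % 2 == 1   # odd
B = lambda w: w <= 2       # 1 or 2

lhs = P(lambda w: A(w) or B(w))                 # P(A ∨ B)
rhs = P(A) + P(B) - P(lambda w: A(w) and B(w))  # inclusion-exclusion
print(lhs, rhs, lhs == rhs)                     # 2/3 2/3 True
```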
Inference using Full Joint Distribution
Probability distribution P(Cavity, Toothache):

             Toothache    ¬Toothache
Cavity         0.04          0.06
¬Cavity        0.01          0.89

Sum of all entries = 1
P(Cavity) = 0.04 + 0.06 = 0.1 (using the axioms)
P(Cavity ∨ Toothache) = 0.04 + 0.01 + 0.06 = 0.11
P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
= 0.04 / (0.04 + 0.01)
= 0.8
• Exercise (checkable with the sketch below): obtain P(¬Cavity), P(Toothache), P(¬Toothache), P(Cavity | ¬Toothache), P(¬Cavity | Toothache), and P(¬Cavity | ¬Toothache)
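All of these quantities follow mechanically from the four table entries; a sketch:

```python
# Joint distribution P(Cavity, Toothache) from the table above;
# keys are (cavity, toothache) truth values.
joint = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

def P(cavity=None, toothache=None):
    # Sum matching entries, marginalizing any variable left as None
    return sum(p for (c, t), p in joint.items()
               if (cavity is None or c == cavity)
               and (toothache is None or t == toothache))

print(P(cavity=False))                      # P(¬Cavity) = 0.9
print(P(toothache=True))                    # P(Toothache) = 0.05
print(P(True, False) / P(toothache=False))  # P(Cavity | ¬Toothache) ≈ 0.063
print(P(False, True) / P(toothache=True))   # P(¬Cavity | Toothache) = 0.2
```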
Inference using Full Joint
Distribution
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.6
(These entries come from a larger joint distribution over Cavity, Toothache, and a third variable Catch, summed appropriately.)
• Observe that P(cavity | toothache) + P(¬cavity | toothache) = 0.6 + 0.4 = 1, as it should be
• 1/P(toothache) remains constant no matter which value of Cavity we calculate. Such constants in probability are called normalization constants
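A sketch of the normalization step, using just the four toothache entries quoted above (the full three-variable table is not reproduced here):

```python
# Entries of the larger joint distribution that have toothache = true,
# as summed in the computation above.
p_cavity_and_toothache = 0.108 + 0.012
p_nocavity_and_toothache = 0.016 + 0.064

p_toothache = p_cavity_and_toothache + p_nocavity_and_toothache  # 0.2
alpha = 1 / p_toothache          # the normalization constant, 1/P(toothache)

print(alpha * p_cavity_and_toothache)    # P(cavity | toothache)  = 0.6
print(alpha * p_nocavity_and_toothache)  # P(¬cavity | toothache) = 0.4
```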
Independence
• How do we verify independence? Two events A and B are independent iff P(A ∧ B) = P(A) P(B), equivalently P(A | B) = P(A); the joint distribution then factors into its marginals
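A small sketch of this check on the Cavity/Toothache table from earlier; the factorization fails, so those two variables are not independent:

```python
from math import isclose

joint = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

p_cavity = sum(p for (c, t), p in joint.items() if c)     # 0.10
p_toothache = sum(p for (c, t), p in joint.items() if t)  # 0.05
p_both = joint[(True, True)]                              # 0.04

# Independent iff P(A ∧ B) == P(A) * P(B): here 0.04 != 0.005
print(isclose(p_both, p_cavity * p_toothache))            # False
```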
Bayesian or Bayes Rule
• Let A be an event in the world and B be another event.
Hence, from the product rule:
P(A ∧ B) = P(A|B) P(B)
P(A ∧ B) = P(B|A) P(A)
• The left-hand sides are the same, so equating the right-hand sides yields:
P(A|B) = P(B|A) P(A) / P(B)   (Bayes' rule)
P(B|A) = P(A|B) P(B) / P(A)
where:
p(A|B) is the conditional probability that event A occurs given that event B has occurred;
p(B|A) is the conditional probability that event B occurs given that event A has occurred.
Bayesian or Bayes Rule
• If the occurrence of event A depends on only two mutually exclusive events, B and NOT B, we obtain:
p(A) = p(A|B) p(B) + p(A|¬B) p(¬B)
• Substituting this for p(A) in Bayes' rule gives:
p(B|A) = p(A|B) p(B) / [ p(A|B) p(B) + p(A|¬B) p(¬B) ]
Bayesian or Bayes Rule
• The Bayesian rule expressed in terms of hypotheses and
evidence looks like this:
p(H|E) = p(E|H) p(H) / [ p(E|H) p(H) + p(E|¬H) p(¬H) ]
where:
p(H) is the prior probability of hypothesis H being true;
p(E|H) is the probability that hypothesis H being true will result in evidence E;
p(¬H) is the prior probability of hypothesis H being false;
p(E|¬H) is the probability of finding evidence E even when hypothesis H is false.
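A sketch of this form of the rule; the numbers passed in are arbitrary placeholders, not from the slides:

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    # p(H|E) = p(E|H)p(H) / [ p(E|H)p(H) + p(E|¬H)p(¬H) ]
    numerator = p_e_given_h * p_h
    return numerator / (numerator + p_e_given_not_h * (1 - p_h))

# Hypothetical hypothesis/evidence numbers, purely for illustration:
print(posterior(p_h=0.01, p_e_given_h=0.9, p_e_given_not_h=0.1))  # ≈ 0.083
```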
Example: Bayes’ rule
• Disease: meningitis
• It causes the patient to have a stiff neck 50% of the time.
• The prior probability that a patient has meningitis is 1/50000.
• The prior probability that a patient has a stiff neck is 1/20.
• Let s be stiff neck & m be meningitis.
P(s|m) = 0.5
P(m) = 1/50000
P(s) = 1/20
P(m|s) = P(s|m) P(m) / P(s)
= (0.5 × 1/50000) / (1/20) = 0.0002
• We expect 1 in 5000 patients with a stiff neck to have meningitis.
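The same arithmetic, as a quick sketch in exact fractions:

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)   # P(stiff neck | meningitis)
p_m = Fraction(1, 50000)       # prior P(meningitis)
p_s = Fraction(1, 20)          # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s   # Bayes' rule
print(p_m_given_s)             # 1/5000, i.e. 0.0002
```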
Conditional Independence & Chain Rule
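Stated compactly (the usual textbook definitions, assumed here): X and Y are conditionally independent given Z iff
P(X, Y | Z) = P(X | Z) P(Y | Z), or equivalently P(X | Y, Z) = P(X | Z)
The chain rule factors any joint distribution as
P(X1, …, Xn) = P(X1) P(X2 | X1) … P(Xn | X1, …, Xn-1)
and conditional independence lets factors drop conditioning variables, e.g. P(Catch | Toothache, Cavity) = P(Catch | Cavity) in the dental domain.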
What Bayesian Networks are good for?
• Diagnosis: P(cause | symptom) = ?
• Prediction: P(symptom | cause) = ?
• Classification: max over classes of P(class | data)
• Decision-making (given a cost function)
• Application areas: medicine, speech recognition, bioinformatics, text classification, computer troubleshooting, the stock market
(Diagram: causes C1, C2 linked to a symptom node.)
Bayesian Networks: Definition
1. Each node corresponds to a random variable.
2. Directed links connect pairs of nodes; a link X → Y means X is the parent of Y.
3. Each node Xi has a conditional probability distribution P(Xi | Parent(Xi)).
4. No directed cycles.
Topology
• The nodes & links specify the conditional independence relationships in the domain.
• Example variables: Weather, Toothache, Catch, Cavity.
Semantics of Bayesian network
• There are two ways to understand it:
1. See the network as a representation of the joint probability distribution.
2. See it as an encoding of a collection of conditional independence statements.
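A minimal sketch of the first view on the classic burglary/alarm network (the numbers below are the standard textbook CPT values, assumed here since the network's tables are not shown): the joint probability is the product of each node's conditional given its parents.

```python
# Joint: P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)
P_B = 0.001                                         # P(Burglary)
P_E = 0.002                                         # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                     # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    # Multiply each node's conditional probability given its parents
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# Alarm rings and both neighbours call, with no burglary and no earthquake:
print(joint(False, False, True, True, True))        # ≈ 0.000628
```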
Summary
• Uncertainty arises from weak implications, incomplete knowledge, and the propagation of uncertainties; laziness and ignorance make exception-less rules impractical.
• Probability theory summarizes uncertainty as degrees of belief; combined with utility theory it yields decision theory and the MEU principle.
• Bayes' rule computes posterior probabilities from priors and likelihoods; Bayesian networks compactly encode joint distributions via conditional independence.