

ARTIFICIAL INTELLIGENCE

UNIT-IV

PLANNING AND MACHINE LEARNING

Basic plan generation systems – Strips - Advanced plan generation systems - K strips
- Strategic explanations - Why, Why not and how explanations. Learning - Machine
learning, adaptive learning.

4.1 Uncertainty
 Agents almost never have access to the whole truth about the environment,
i.e., the agent must therefore act under uncertainty.
 Uncertainty can also arise because of incompleteness and incorrectness in the agent’s
understanding of the properties of the environment.

4.1.1 Handling of Uncertainty:-


 Identifying uncertainty in a dental diagnosis system.
 For all P Symptom(P,Toothache) → Disease(P,Cavity)
 This rule is logically wrong. Not all patients with toothache have cavities; some of them may
have gum disease, an impacted wisdom tooth or one of several other problems.
 For all P Symptom(P,Toothache) → Disease(P,Cavity) ᴠ Disease(P,GumDisease) ᴠ
Disease(P,ImpactedWisdom) …
i.e., an unlimited set of possibilities exists for the toothache symptom.
Changing it into a causal rule:
 For all P Disease(P,Cavity) → Symptom(P,Toothache), but this rule is not right
either; not all cavities cause pain.
 Trying to use FOL in medical diagnosis thus fails for three main reasons:
I. LAZINESS: Too much work to list the complete set of antecedents and
consequents needed.
II. THEORETICAL IGNORANCE: Medical science has no complete theory
for the domain.
III. PRACTICAL IGNORANCE: Even if we know all the rules, uncertainty
arises because some tests cannot be run on the patient's body.

 CONCLUSION:
o The agent's knowledge can at best provide only a degree of belief in the relevant
sentences. The tool used to deal with degrees of belief is probability theory, which
assigns a numerical degree of belief between 0 and 1 to sentences.

 PRIOR (or) UNCONDITIONAL PROBABILITY:Before the evidence is obtained.


 POSTERIOR (or) CONDITIONAL PROBABILITY:After the evidence is obtained.
 UTILITY THEORY: To represent and reason with preferences; utility is the quality of being
useful.
Decision theory = Probability theory + Utility theory
 The fundamental idea of decision theory is that an agent is rational if and only if it chooses
the action that yields the highest expected utility, averaged over all the possible outcomes of
the action (maximum expected utility), i.e., weighting the utility of a particular outcome by the
probability that it occurs.
 The following shows a decision theoretic agent

Function DT-AGENT(percept) returns an action

Static: belief_state, probabilistic beliefs about the current state of the world
        action, the agent's action

Update belief_state based on action and percept
Calculate outcome probabilities for actions,
    given action description and current belief_state
Select action with highest expected utility,
    given probabilities of outcomes and utility information
Return action
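A minimal Python sketch of the action-selection step follows; the actions, outcome probabilities and utilities here are invented purely for illustration and are not taken from the text:

# Maximum-expected-utility action selection (illustrative sketch).
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def select_action(action_models):
    """Return the action whose expected utility is highest."""
    return max(action_models, key=lambda a: expected_utility(action_models[a]))

# Hypothetical outcome models an agent might derive from its belief state.
action_models = {
    "treat":      [(0.8, 10), (0.2, -5)],    # EU = 7.0
    "do_nothing": [(0.5,  0), (0.5, -20)],   # EU = -10.0
}
print(select_action(action_models))          # -> "treat"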

4.2 Review of Probability

AXIOMS OF PROBABILITY:

I. All probabilities are between 0 and 1. 0 ≤ P(A) ≤ 1


II. Necessarily true (i.e. valid) propositions have probability 1 and necessarily false (i.e.
unsatisfiable) propositions have probability 0: P(True) = 1, P(False) = 0
III. The probability of a disjunction is given by P(A ᴠ B) = P(A) + P(B) - P(A ᴧ B)
IV. Let B = ⌐A in axiom (III)
V. P(True) = P(A) + P(⌐A) - P(False) (by logical equivalence)
VI. 1 = P(A) + P(⌐A) (by axiom II)
VII. P(⌐A) = 1 - P(A) (by algebra)


Joint probability distribution:
An agent’s probability assignments to all propositions in the domain (both simple and
complex)

Ex: Trivial medical domain with two Boolean variables.

            Toothache    ⌐Toothache
Cavity        0.04          0.06
⌐Cavity       0.01          0.89



I. Adding across a row or column gives the unconditional probability of a variable.


P(Cavity) = 0.06 + 0.04 = 0.1


P(Cavity ᴠ Toothache) = 0.04 + 0.01 + 0.06 = 0.11

II. Conditional Probability


P(Cavity / Toothache) = P(Cavity ᴧ Toothache) / P(Toothache) = 0.04 / (0.04 + 0.01) = 0.80
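The same marginal and conditional probabilities can be reproduced with a short Python sketch over the joint distribution table above:

# Full joint distribution over (Cavity, Toothache), values from the table above.
joint = {
    (True,  True):  0.04,
    (True,  False): 0.06,
    (False, True):  0.01,
    (False, False): 0.89,
}
p_cavity = sum(p for (cav, _), p in joint.items() if cav)              # 0.10
p_cav_or_tooth = sum(p for (cav, t), p in joint.items() if cav or t)   # 0.11
p_toothache = sum(p for (_, t), p in joint.items() if t)               # 0.05
p_cavity_given_toothache = joint[(True, True)] / p_toothache           # 0.80
print(round(p_cavity, 2), round(p_cav_or_tooth, 2), round(p_cavity_given_toothache, 2))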
Bayes' Rule:

1. Recall two forms of the product rule


P(A ᴧ B) = P(A/B) P(B)
P(A ᴧ B) = P(B/A) P(A)
Equating the two right-hand sides and dividing by P(A) gives

P(B/A) = P(A/B) P(B) / P(A)

This is called Bayes' rule (or Bayes' law or Bayes' theorem).
2. From the above equation, the general law for multivalued variables can be written using the P
notation:

P(Y / X) = P(X / Y) P(Y) / P(X)
3. Conditionalized on some background evidence E:

P(Y / X,E) = P(X / Y,E) P(Y / E) / P(X / E)
4. Disadvantage
It requires three terms to compute one conditional probability (P(B/A))
- One conditional probability P(A/B)
- Two unconditional probabilities P(B) and P(A)
5. Advantage
If three values are known,then the unknown fourth value → P(B/A) is computed
easily.
6. Example:
Given: P(S/M) = 0.5, P(M) = 1/50000, P(S) = 1/20
S – the proposition that the patient has a stiff neck
M – the proposition that the patient has meningitis
P(M/S) – only 1 in 5000 patients with a stiff neck has meningitis

P(M/S) = P(S/M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
7. Normalization
a) Consider again the equation for calculating the probability of meningitis given
a stiff neck.

P(M/S) = P(S/M) P(M) / P(S)

b) Consider the probability that the patient is suffering from whiplash W, given a stiff neck:

P(W/S) = P(S/W) P(W) / P(S)

c) To compute the relative likelihood of b and a, we need P(S/W) = 0.8 and
P(W) = 1/1000; P(S) is not required since it cancels out:

P(M/S) / P(W/S) = [P(S/M) P(M)] / [P(S/W) P(W)] = (0.5 × 1/50000) / (0.8 × 1/1000) = 1/80

i.e. whiplash is 80 times more likely than meningitis, given a stiff neck.

d) Disadvantages: consider the following equations:

P(M/S) = P(S/M) P(M) / P(S) ………. (1)

P(⌐M/S) = P(S/⌐M) P(⌐M) / P(S) ………….. (2)


Adding (1) and (2) using the fact that
P(M/S) + P(⌐M/S) = 1,we obtain
P(S) = P(S/M) P(M) + P(S/⌐M) P(⌐M)
Substituting into the equation for P(M/S),we have

P(M/S) = P(S/M) P(M) / [P(S/M) P(M) + P(S/⌐M) P(⌐M)]
This process is called normalization, because it treats 1/P(S) as a normalizing
constant that allows the conditional terms to sum to 1.
The general multivalued normalization equation is

P(Y / X) = α P(X / Y) P(Y)

α – the normalization constant
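A short Python sketch of Bayes' rule with normalization, using the meningitis/whiplash numbers from above (treating the two causes as if they were the only candidates, an assumption made only to keep the sketch small):

# Unnormalized posteriors P(cause|S) are proportional to P(S|cause) * P(cause).
priors      = {"meningitis": 1 / 50000, "whiplash": 1 / 1000}
likelihoods = {"meningitis": 0.5,       "whiplash": 0.8}     # P(S | cause)

unnormalized = {c: likelihoods[c] * priors[c] for c in priors}
alpha = 1 / sum(unnormalized.values())                 # normalization constant
posterior = {c: alpha * v for c, v in unnormalized.items()}

ratio = unnormalized["whiplash"] / unnormalized["meningitis"]
print(posterior, round(ratio))    # whiplash is about 80 times more likely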
8. Bayes' Rule and evidence
a) Two conditional probabilities relating to cavities:
P(Cavity / Toothache) = 0.8
P(Cavity / Catch) = 0.95
Using Bayes' Rule:

P(Cavity/Toothache ᴧ Catch) = P(Toothache ᴧ Catch / Cavity) P(Cavity) / P(Toothache ᴧ Catch)
b) Bayesian updating incorporates evidence one piece at a time:

P(Cavity/Toothache) = P(Cavity) P(Toothache/Cavity) / P(Toothache) ………..(1)


c) When Catch is observed, apply Bayes' Rule with the constant conditioning context Toothache:

P(Cavity/Toothache ᴧ Catch) = P(Cavity/Toothache) P(Catch/Toothache ᴧ Cavity) / P(Catch/Toothache) …………(2)

From (1) and (2):

P(Cavity/Toothache ᴧ Catch) = P(Cavity) [P(Toothache/Cavity) / P(Toothache)] [P(Catch/Toothache ᴧ Cavity) / P(Catch/Toothache)]

d) Mathematically, the equations are rewritten as:



P(Catch/Cavity ᴧ Toothache) = P(Catch/Cavity)


P(Toothache/Cavity ᴧ Catch) = P(Toothache/Cavity)
These equations express the conditional independence of Toothache and Catch given
Cavity.
e) Using conditional independence, the Bayesian updating equation simplifies to

P(Cavity/Toothache ᴧ Catch) = P(Cavity) [P(Toothache/Cavity) / P(Toothache)] [P(Catch/Cavity) / P(Catch/Toothache)]


f) Using normalization, it reduces further. With X = Toothache, Y = Catch and Z = Cavity,
the conditional independence P(X/Y,Z) = P(X/Z) gives
P(Z/X,Y) = α P(Z) P(X/Z) P(Y/Z), where α is chosen so that the values of P(Z/X,Y) sum to 1.
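The normalized update with two conditionally independent pieces of evidence can be sketched in Python as follows; the prior P(Cavity) = 0.1 comes from the joint table earlier, but the likelihood numbers are assumptions invented for illustration:

# P(Z|X,Y) = alpha * P(Z) * P(X|Z) * P(Y|Z) with Z = Cavity, X = Toothache, Y = Catch.
p_cavity = 0.1
likelihood = {                       # (P(e | Cavity), P(e | not Cavity)) - assumed values
    "toothache": (0.6, 0.05),
    "catch":     (0.9, 0.2),
}

def posterior_cavity(evidence):
    p_true, p_false = p_cavity, 1 - p_cavity
    for e in evidence:
        l_true, l_false = likelihood[e]
        p_true  *= l_true
        p_false *= l_false
    alpha = 1 / (p_true + p_false)   # normalization constant
    return alpha * p_true

print(round(posterior_cavity(["toothache", "catch"]), 3))   # 0.857 with these numbers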

4.3 Bayesian Network:-

4.3.1 Syntax:

 A data structure used to represent knowledge in an uncertain domain, i.e., to represent the
dependences between variables and to give a complete specification of the joint probability
distribution.
 A belief network is a graph in which the following holds:
I. A set of random variables makes up the nodes of the network.
II. A set of directed links or arrows connects pairs of nodes; an arrow x→y means x has a
direct influence on y.
III. Each node has a conditional probability table that quantifies the effects that the
parents have on the node. The parents of a node are all nodes that have arrows
pointing to it.
IV. The graph has no directed cycles (it is a DAG).
 The other names of a belief network are Bayesian network, probabilistic network, causal
network and knowledge map.

 Example:

A new burglar alarm has been installed at home.

 It is fairly reliable at detecting a burglary but also responds on occasion to minor
earthquakes.
 You also have two neighbours,John and Mary,who have promised to call you at
work when they hear the alarm.
 John always calls when he hears the alarm but sometimes confuses the telephone
ringing with the alarm and calls then too.
 Mary, on the other hand, likes rather loud music and sometimes misses the alarm
altogether.
 Given the evidence of who has or has not called, estimate the probability of a
burglary.
Uncertainty:

I. Mary is currently listening to loud music.
II. John confuses the telephone ringing with the alarm → laziness and ignorance in the
operation.
III. The alarm may fail to go off → power failure, dead battery, cut wires, etc.

Belief network
[Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls]

Conditional probability table for the random variable Alarm:

Burglary   Earthquake   P(Alarm=True | B,E)   P(Alarm=False | B,E)
   T           T              0.950                 0.050
   T           F              0.950                 0.050
   F           T              0.290                 0.710
   F           F              0.001                 0.999

Each row in the table must sum to 1, because the entries represent an exhaustive set of cases for the
variable. A table for a Boolean variable with n Boolean parents contains 2^n independently specifiable probabilities.

Belief network with conditional probability tables:
[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = .001        P(E) = .002

B   E   P(A)
T   T   .95
T   F   .94
F   T   .29
F   F   .001

A   P(J)           A   P(M)
T   .90            T   .70
F   .05            F   .01

4.3.2 Semantics

There are two ways in which one can understand the semantics of Belief networks

1. A network as a representation of the joint probability distribution - used to know how to construct
networks.

2. An encoding of a collection of conditional independence statements - used in designing inference
procedures.

 Joint probability distribution: How to construct networks? A belief network provides a
complete description of the domain. Every entry in the joint probability distribution can be
calculated from the information in the network. An entry in the joint is the probability of a
conjunction of particular assignments to each variable, i.e.
P(X1 = x1 ᴧ … ᴧ Xn = xn)
 We use the notation P(x1,…,xn) as an abbreviation for this. The value of this entry is given
by the following formula:
P(x1,…,xn) = ∏i P(xi | Parents(Xi))
 Thus each entry in the joint is represented by the product of the appropriate elements of the
CPTs in the belief network. The CPTs therefore provide a decomposed representation of
the joint.
 The probability of the event that the alarm has sounded but neither a burglary nor an
earthquake has occurred, and both John and Mary call (using single-letter names for the
variables):
P(J ᴧ M ᴧ A ᴧ ⌐B ᴧ ⌐E)
= P(J/A) P(M/A) P(A/⌐Bᴧ⌐E) P(⌐B) P(⌐E)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998
= 0.00062
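The same joint entry can be computed in Python from the CPT values in the figure above:

# P(J ^ M ^ A ^ ~B ^ ~E) as a product of CPT entries from the belief network.
p_b, p_e = 0.001, 0.002                               # P(Burglary), P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(Alarm | B, E)
p_j = {True: 0.90, False: 0.05}                       # P(JohnCalls | Alarm)
p_m = {True: 0.70, False: 0.01}                       # P(MaryCalls | Alarm)

p = (p_j[True] * p_m[True]          # P(J|A) P(M|A)
     * p_a[(False, False)]          # P(A | ~B ^ ~E)
     * (1 - p_b) * (1 - p_e))       # P(~B) P(~E)
print(p)                            # ≈ 0.000628, quoted as 0.00062 above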
Noisy-OR: a logical relationship with uncertainty. In propositional logic we might say that
Fever is true if and only if Cold, Flu or Malaria is true. The noisy-OR model adds some
uncertainty to this strict logical approach. The model makes three assumptions:
I. It assumes that each cause has an independent chance of causing the effect.
II. It assumes that all possible causes are listed.
III. It assumes that whatever inhibits, say, Flu from causing a fever is independent of
whatever inhibits the other causes. These inhibitors are not represented as separate
nodes but are summarized as "noise parameters".
Example:
P(Fever/Cold) = 0.4
P(Fever/Flu) = 0.8          Noise parameters are 0.6, 0.2 and 0.1
P(Fever/Malaria) = 0.9
 Conclusion:

I. If no parent node is true, then the output is false with 100% certainty.
II. If exactly one parent is true, then the output is false with probability equal to the
noise parameter for that node.
III. In general, the probability that the output node is false is just the product of the noise
parameters for all the input nodes that are true (see the sketch below).
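A short sketch that builds the full noisy-OR conditional probability table for Fever from the noise parameters given above:

from itertools import product

# P(fever is false | parents) = product of the noise parameters of the true parents.
noise = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}
parents = list(noise)

for values in product([False, True], repeat=len(parents)):
    assignment = dict(zip(parents, values))
    p_no_fever = 1.0
    for parent, is_true in assignment.items():
        if is_true:
            p_no_fever *= noise[parent]
    print(assignment, "P(fever) =", round(1 - p_no_fever, 3))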

 Conditional independence relations in belief networks:

 From a given network, is it possible to read off whether a set of nodes X is
independent of another set Y, given a set of evidence nodes E? The answer is
yes, and the method is provided by the notion of direction-dependent separation, or
d-separation.
 If every undirected path from a node in X to a node in Y is d-separated by E, then
X and Y are conditionally independent given E.

[Figure: a path from X to Y passing through a node Z can be blocked, given evidence E]

 There are three ways in which a path from X to Y can be blocked, given evidence E. If
every path from X to Y is blocked, then we say E d-separates X and Y, i.e.
I. Z is in E and Z has one arrow on the path leading in and one arrow leading out.
II. Z is in E and Z has both arrows leading out.
III. Neither Z nor any descendant of Z is in E, and both arrows lead into Z.

Example belief network for d-separation: a car's electrical system and engine

[Figure: Battery → Radio, Battery → Ignition; Ignition and Gas → Starts; Starts → Moves]

1. Whether there is gas in the car and whether the car radio plays are independent given
evidence about whether the spark plugs fire.
2. Gas and Radio are independent if it is known whether the battery works.
3. Gas and Radio are independent given no evidence at all.
4. Gas and Radio are dependent given evidence about whether the car starts.

4.5. Inference in Temporal models


The generic temporal model has the following set of inference tasks:
1. Monitoring (or) filtering
Filtering (monitoring): computing the conditional distribution over the current state, given all
evidence to date, P(Xt|e1:t).

 In the umbrella example, monitoring would mean computing the probability of rain today,
given all the observations of the umbrella so far, including today.

[Figure: temporal model unrolled over time - state variables X0, X1, …, Xt with evidence variables E1, …, Et]

2. Prediction

Prediction: computing the conditional distribution over a future state, given all evidence to
date, P(Xt+k|e1:t), for k > 0.
In the umbrella example, prediction would mean computing the probability of rain
tomorrow (k=1), or the day after tomorrow (k=2), etc., given all the observations of the umbrella
so far.

[Figure: prediction - the distribution over the future state Xt+k is computed from evidence E1, …, Et]


Monitoring(filtering)

 Filtering (monitoring): computing the conditional distribution over the current state, given all
evidence to date, corresponds to computing the distribution P(Xt|e1:t), or P(Xt+1|e1:t+1):
P(Xt+1|e1:t+1) = P(Xt+1|e1:t,et+1) = P(Xt+1|et+1,e1:t)
 General form of Bayes' rule, conditionalized also on evidence e:

P(Y|X,e) = P(X|Y,e) P(Y|e) / P(X|e) = α P(X|Y,e) P(Y|e)


 In a temporal Markov process, it reads:
P(Xt+1|et+1,e1:t) = α P(et+1|Xt+1,e1:t) P(Xt+1|e1:t)
 Since the evidence et+1 depends only on the current state Xt+1,
P(Xt+1|et+1,e1:t) = α P(et+1|Xt+1) P(Xt+1|e1:t)
 Then we can simplify
P(Xt+1|e1:t+1) = αP(et+1|Xt+1) P(Xt+1|e1:t)
 The second term, P(Xt+1|e1:t), corresponds to a one-step prediction of the next state, given
evidence up to time t, and the first term updates this new state with the new evidence at time
t+1. This updating is called filtering.
 Let us now obtain the one-step prediction:
P(Xt+1|e1:t) = Σxt P(Xt+1|xt) P(xt|e1:t)
 The first term is the (Markov) transition model and the second term is the current state
distribution given the evidence up to date.
 The recursive formula for monitoring/filtering then reads
P(Xt+1|e1:t+1) = α P(et+1|Xt+1) Σxt P(Xt+1|xt) P(xt|e1:t)
We can write the same set of equations for P(Xt|e1:t), where we replace
t+1 ← t and t ← t-1.
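A compact Python sketch of this filtering recursion, using the umbrella transition and sensor numbers given later in this unit (P(Rt|Rt-1 = t) = 0.7, P(Ut|Rt = t) = 0.9, P(Ut|Rt = f) = 0.2); the uniform prior over Rain0 is an assumption:

# One filtering step: P(X_t+1|e_1:t+1) = alpha P(e_t+1|X_t+1) sum_x P(X_t+1|x) P(x|e_1:t)
transition = {True: {True: 0.7, False: 0.3},     # P(Rain_t+1 | Rain_t)
              False: {True: 0.3, False: 0.7}}
sensor = {True: 0.9, False: 0.2}                 # P(Umbrella | Rain)

def filter_step(belief, umbrella_seen):
    """belief: {True: p, False: p} representing P(Rain_t | e_1:t)."""
    new = {}
    for x1 in (True, False):
        predict = sum(transition[x0][x1] * belief[x0] for x0 in (True, False))
        likelihood = sensor[x1] if umbrella_seen else 1 - sensor[x1]
        new[x1] = likelihood * predict
    alpha = 1 / sum(new.values())                # normalization
    return {x: alpha * p for x, p in new.items()}

belief = {True: 0.5, False: 0.5}                 # assumed uniform prior over Rain_0
for obs in [True, True]:                         # umbrella seen on days 1 and 2
    belief = filter_step(belief, obs)
print(round(belief[True], 3))                    # P(rain on day 2) ≈ 0.883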
Prediction to the far future:
 What happens when we want to predict further into the future, given only the evidence up to
this date?
 It can be shown that the predicted distribution for the state vector converges towards one constant
vector, the so-called fixed point: for every k greater than the mixing time,
P(Xt+k|e1:t) = P(Xt+k+1|e1:t)
 This is called the stationary distribution of the Markov process, and the time required to reach
this stationary state is called the mixing time.
 The stationary distribution of the Markov process dooms to failure any attempt to predict the
actual state for a number of steps ahead that is more than a small fraction of the mixing time.

3. Most likely sequence

 Given all the evidence to date, we want to find the sequence of states that is most likely to have
generated all the evidence, i.e. argmax X1:t P(X1:t|e1:t).
 In the umbrella example, if the umbrella appears on each of the first three days and is absent
on the fourth, then the most likely explanation is that it rained on the first three days and it did
not rain on the fourth.

 Algorithms for this task are useful in many applications, including speech recognition, i.e. to
find the most likely sequence of words given a series of sounds, or the reconstruction of bit strings
transmitted over a noisy channel (e.g. a cell phone), etc.

[Figure: umbrella HMM - Raint-1 → Raint → Raint+1, with Umbrellat observed at each time step]

Transition model:            Sensor model:
Rt-1   P(Rt)                 Rt   P(Ut)
 T      0.7                   T    0.9
 F      0.3                   F    0.2

 Suppose that [true, true, false, true, true] is the umbrella sequence which the security guard
observes during the first five days on the job.
 What is the weather sequence most likely to explain this, out of the 2^5 = 32 possible sequences, i.e.
argmax X1:5 P(X1:5|e1:5)?


 For each state,the bold arrow indicates its best predecessor as measured by the product of the
preceding sequence probability m1:t and the transition probability P(Xt|Xt-1)
 To derive the recursive formula, let us focus on paths that reach the state Rain5 = true. The most
likely path consists of the most likely path to some state at t = 4 followed by the transition to
Rain5 = true.
 The state at t=4,which will become part of the path to Rain5 = true is whichever maximizes
the likelihood of that path.
 There is a recursive relationship between most likely paths to each state Xt+1 and most likely
paths to each state Xt.

[Figure: Viterbi trellis for Rain1 … Rain5 - each column has states true/false, and bold arrows mark the best predecessor of each state]

Most-likely-path message m1:t (true / false):
m1:1 = .8182 / .1818    m1:2 = .5155 / .0491    m1:3 = .0361 / .1237
m1:4 = .0334 / .0173    m1:5 = .0210 / .0024

 Viterbi algorithm:

 Let us denote by m1:t the probability of the best sequence reaching each state at time t:
m1:t = max x1,…,xt-1 P(x1,…,xt-1, Xt | e1:t)
 Then the recursive relationship between most likely paths to each state Xt+1 and most
likely paths to each state Xt reads
m1:t+1 = max x1,…,xt P(x1,…,xt, Xt+1 | e1:t+1)
       = α P(et+1|Xt+1) max xt [ P(Xt+1|xt) max x1,…,xt-1 P(x1,…,xt-1, xt | e1:t) ]


This is the Viterbi formula.
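A compact Python sketch of the Viterbi recursion for the umbrella sequence [true, true, false, true, true]; the uniform prior over Rain1 is an assumption, and the normalization constant is dropped because only the argmax matters:

# Viterbi: m_1:t+1 = P(e_t+1|X_t+1) max_x [ P(X_t+1|x) m_1:t(x) ], plus back pointers.
transition = {True: {True: 0.7, False: 0.3},
              False: {True: 0.3, False: 0.7}}
sensor = {True: 0.9, False: 0.2}                 # P(umbrella | rain)

def viterbi(observations, prior):
    m = {x: prior[x] * (sensor[x] if observations[0] else 1 - sensor[x])
         for x in (True, False)}
    back = []                                    # best predecessor per step
    for obs in observations[1:]:
        new_m, pointers = {}, {}
        for x1 in (True, False):
            best_x0 = max((True, False), key=lambda x0: transition[x0][x1] * m[x0])
            likelihood = sensor[x1] if obs else 1 - sensor[x1]
            new_m[x1] = likelihood * transition[best_x0][x1] * m[best_x0]
            pointers[x1] = best_x0
        m = new_m
        back.append(pointers)
    path = [max(m, key=m.get)]                   # most likely final state
    for pointers in reversed(back):              # trace the pointers backwards
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi([True, True, False, True, True], {True: 0.5, False: 0.5}))
# -> [True, True, False, True, True]: rain, rain, no rain, rain, rain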

4.6 Hidden Markov model

 An HMM is a temporal probabilistic model in which the state of the process is described
by a single discrete random variable.
 The possible values of the variable are the possible states of the world.
 The umbrella example above can be described as an HMM, since it has just one state variable, Raint.
Additional state variables can be added to a temporal model while staying within the
HMM framework, but only by combining all the state variables into a single
"megavariable" whose values are all possible tuples of values of the individual state
variables.

 Simplified matrix algorithms:

 With a single, discrete state variable Xt, we can give concrete form to the representations
of the transition model, and the forward and backward messages.
 Let the state variable Xt have values denoted by integers 1,…,S, where S is the number of
possible states.
 The transition model P(Xt|Xt-1) becomes an S x S matrix T, where
Tij = P(Xt = j | Xt-1 = i)
i.e. Tij is the probability of a transition from state i to state j.
 For example, the transition matrix for the umbrella world is
T = P(Xt|Xt-1) = [ 0.7  0.3
                   0.3  0.7 ]

 We also put the sensor model in matrix form. In this case, because the value of the evidence
variable Et is known to be, say, et, we need use only that part of the model specifying the
probability that et appears.
 For each time step t, we construct a diagonal matrix Ot whose diagonal entries are given by
the values P(et|Xt = i) and whose other entries are 0. For example, if the umbrella is observed
on day 1,
O1 = [ 0.9  0
       0    0.2 ]
 If we use column vectors to represent the forward and backward messages, the computations
become simple matrix-vector operations.
The forward equation becomes
f1:t+1 = α Ot+1 T^T f1:t …………(1)
and the backward equation becomes
bk+1:t = T Ok+1 bk+2:t …………(2)
 From these equations, we can see that the time complexity of the forward and backward
algorithm applied to a sequence of length t is O(S^2 t). The space complexity is O(St).
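A numpy sketch of equations (1) and (2) for the umbrella model, with states ordered [rain, no-rain]; the O matrix for a day with no umbrella is derived from the sensor table (1 - 0.9 and 1 - 0.2), and the uniform prior is an assumption:

import numpy as np

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])                 # T_ij = P(X_t = j | X_t-1 = i)
O = {True:  np.diag([0.9, 0.2]),           # umbrella observed
     False: np.diag([0.1, 0.8])}           # umbrella not observed

def forward(f, observation):
    """Equation (1): f_1:t+1 = alpha O_t+1 T^T f_1:t."""
    f = O[observation] @ T.T @ f
    return f / f.sum()

def backward(b, observation):
    """Equation (2): b_k+1:t = T O_k+1 b_k+2:t."""
    return T @ O[observation] @ b

evidence = [True, True]                    # umbrella on days 1 and 2
f = np.array([0.5, 0.5])                   # assumed uniform prior over Rain_0
for e in evidence:
    f = forward(f, e)
print(f)                                   # P(Rain_2 | u_1, u_2) ≈ [0.883, 0.117]

# Smoothing at slice 1: combine f_1:1 with b_2:2 (the initial backward message is 1).
f1 = forward(np.array([0.5, 0.5]), evidence[0])
b2 = backward(np.ones(2), evidence[1])
smoothed = f1 * b2
print(smoothed / smoothed.sum())           # P(Rain_1 | u_1, u_2) ≈ [0.883, 0.117]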
 Besides providing an elegant description of the filtering and smoothing algorithms for
HMMs,the matrix formulation reveals opportunities for improved algorithms.
 The first is a simple variation on the forward-backward algorithm that allows smoothing to
be carried out in constant space,independently of the length of the sequence.
 The idea is that smoothing for any particular time slice k requires the simultaneous
presence of both forward and backward messages,f1:k and bk+1:t .
 The forward-backward algorithms achieves this by storing the fs computed on the forward
pass so that they are available during the backward pass.
f1:t = α' (T^T)^-1 Ot+1^-1 f1:t+1
 The modified smoothing algorithm works by first running the standard forward pass to
compute ft:t and then running the backward pass for both b and f together,using them to
compute the smoothed estimate at each step.
 A second area in which the matrix formulation reveals an improvement is in online
smoothing with a fixed lag.
 Let us suppose that the lag is d; that is, we are smoothing at time slice t-d, where the current
time is t. By the smoothing equation, we need to compute
α f1:t-d × bt-d+1:t
for slice t-d. Then, when a new observation arrives, we need to compute
α f1:t-d+1 × bt-d+2:t+1
for slice t-d+1. First, we can compute f1:t-d+1 from f1:t-d, using the standard filtering process.
 Computing the backward message incrementally is trickier, because there is no simple
relationship between the old backward message bt-d+1:t and the new backward message
bt-d+2:t+1.
 Instead, we will examine the relationship between the old backward message bt-d+1:t and the
backward message at the front of the sequence, bt+1:t. To do this, we apply equation (2) d
times to get
bt-d+1:t = ( ∏ i=t-d+1..t  T Oi ) bt+1:t = Bt-d+1:t 1 ………….(3)
where the matrix Bt-d+1:t is the product of the sequence of T and O matrices, and bt+1:t = 1 is a vector of ones.

 B can be thought of as a "transformation operator" that transforms a later backward
message into an earlier one. The same relationship holds one step later:
bt-d+2:t+1 = ( ∏ i=t-d+2..t+1  T Oi ) bt+2:t+1 = Bt-d+2:t+1 1 …………(4)
 Examining the product expressions in the two equations (3) & (4), we see that they
have a simple relationship: to get the second product, "divide" the first product by its first
element T Ot-d+1, and multiply by the new last element T Ot+1.
 In matrix language, then, there is a simple relationship between the old and new B matrices:
Bt-d+2:t+1 = Ot-d+1^-1 T^-1 Bt-d+1:t T Ot+1 …………….(5)
 This equation provides an incremental update for the B matrix, which in turn (via eqn (4))
allows us to compute the backward message bt-d+2:t+1.
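A small numpy sketch of the incremental update of equation (5), again with the umbrella matrices; the brute-force product is kept only to verify that the incremental result matches:

import numpy as np

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

def B_from_scratch(window):
    """Product of T O_i over the evidence window (used only to check eqn (5))."""
    B = np.eye(2)
    for e in window:
        B = B @ T @ O[e]
    return B

def B_incremental(B_old, e_oldest, e_newest):
    """Equation (5): B_t-d+2:t+1 = O_t-d+1^-1 T^-1 B_t-d+1:t T O_t+1."""
    return np.linalg.inv(O[e_oldest]) @ np.linalg.inv(T) @ B_old @ T @ O[e_newest]

window = [True, True, False]                     # evidence in slices t-d+1 .. t
B_old = B_from_scratch(window)
B_new = B_incremental(B_old, window[0], True)    # a new observation (umbrella) at t+1
assert np.allclose(B_new, B_from_scratch(window[1:] + [True]))
print(B_new @ np.ones(2))                        # backward message b_t-d+2:t+1, eqn (4)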
