Unit 4 Uncertainty
ARTIFICIAL INTELLIGENCE
UNIT-IV
Basic plan generation systems – STRIPS – Advanced plan generation systems – K-STRIPS
– Strategic explanations – Why, Why not and How explanations. Learning – Machine
learning, adaptive learning.
4.1 Uncertainty
Agents almost never have access to the whole truth about their environment,
i.e., the agent must therefore act under uncertainty.
Uncertainty can also arise because of incompleteness and incorrectness in the agent's
understanding of the properties of the environment.
CONCLUSION:
o The agent's knowledge can at best provide only a degree of belief in the relevant
sentences. The tool used to deal with degrees of belief is probability theory, which
assigns a numerical degree of belief between 0 and 1 to sentences.
Artificial Intelligence CSE/IIIYr/VISem UNIT-IV/UNCERTAIN KNOWLEDGE AND REASONING
AXIOMS OF PROBABILITY:
1. All probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1
2. Necessarily true propositions have probability 1, and necessarily false propositions have probability 0: P(True) = 1, P(False) = 0
3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Joint probability distribution:
An agent's probability assignments to all propositions in the domain (both simple and
complex). For example, for the two Boolean variables Cavity and Toothache:

            Toothache    ¬Toothache
Cavity        0.04          0.06
¬Cavity       0.01          0.89
SVCET
From the table, P(Cavity|Toothache) = P(Cavity ∧ Toothache) / P(Toothache) = 0.04 / (0.04 + 0.01) = 0.80
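The lookup above can be reproduced in a few lines of Python (a sketch; the dictionary layout is an illustrative choice, and the table values are the ones from the example):

```python
# Joint distribution over (Cavity, Toothache), taken from the table above.
joint = {
    (True,  True):  0.04,   # Cavity and Toothache
    (True,  False): 0.06,   # Cavity and no Toothache
    (False, True):  0.01,   # no Cavity, Toothache
    (False, False): 0.89,   # neither
}

# P(Toothache) is obtained by summing out Cavity (marginalization).
p_toothache = sum(p for (cavity, toothache), p in joint.items() if toothache)

# P(Cavity | Toothache) = P(Cavity AND Toothache) / P(Toothache)
p_cavity_given_toothache = joint[(True, True)] / p_toothache
print(p_cavity_given_toothache)  # approximately 0.80
```

Any conditional or marginal probability in the domain can be answered this way from the full joint table; the drawback is that the table grows exponentially with the number of variables.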
Bayes' Rule:
1. P(B|A) = P(A|B) P(B) / P(A)
is called Bayes' rule (or Bayes' law, or Bayes' theorem).
2. From the above equation, the general law for multivalued variables can be written using the P
notation:
P(Y|X) = P(X|Y) P(Y) / P(X)
3. Conditionalizing the above equation on some background evidence E:
P(Y|X,E) = P(X|Y,E) P(Y|E) / P(X|E)
4. Disadvantage
It requires three terms to compute one conditional probability P(B|A):
- one conditional probability, P(A|B)
- two unconditional probabilities, P(B) and P(A)
5. Advantage
If the three values are known, then the unknown fourth value P(B|A) is computed
easily.
6. Example:
Given: P(S|M) = 0.5, P(M) = 1/50000, P(S) = 1/20
S – the proposition that the patient has a stiff neck
M – the proposition that the patient has meningitis
P(S|M) = 0.5 – half of all meningitis patients have a stiff neck
P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
i.e., only one in 5000 patients with a stiff neck is expected to have meningitis.
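The arithmetic of the example can be checked directly (a sketch; the helper name `bayes` is an illustrative choice):

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(B|A) = P(A|B) P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

p_s_given_m = 0.5      # P(S|M): half of meningitis patients have a stiff neck
p_m = 1 / 50000        # P(M): prior probability of meningitis
p_s = 1 / 20           # P(S): prior probability of a stiff neck

p_m_given_s = bayes(p_s_given_m, p_m, p_s)
print(p_m_given_s)     # approximately 0.0002
```

Note the direction of the inference: the diagnostic probability P(M|S) is derived from the causal probability P(S|M), which a doctor can often assess more reliably.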
7. Normalization
a) Consider again the equation for calculating the probability of meningitis given
a stiff neck:
P(M|S) = P(S|M) P(M) / P(S)
b) Consider the probability that the patient is suffering from whiplash W given a stiff neck:
P(W|S) = P(S|W) P(W) / P(S)
c) To compute the relative likelihood of a) and b), we need P(S|W) = 0.8 and
P(W) = 1/1000, but P(S) is not required, since it cancels:
P(M|S) / P(W|S) = P(S|M) P(M) / (P(S|W) P(W)) = (0.5 × 1/50000) / (0.8 × 1/1000) = 1/80
d) To avoid assessing P(S) directly, we can also compute P(M|S) by considering the
exhaustive cases M and ¬M:
P(M|S) = P(S|M) P(M) / [P(S|M) P(M) + P(S|¬M) P(¬M)]
This process is called normalization, because it treats 1/P(S) as a normalizing
constant that allows the conditional terms to sum to 1.
The general multivalued normalization equation is
P(Y|X) = α P(X|Y) P(Y)
α – normalization constant
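The relative-likelihood computation can be sketched in Python. Note it never needs P(S); the final normalization step only yields true posteriors under the illustrative assumption that the hypotheses considered are exhaustive:

```python
# Unnormalized posterior weights P(S|Y) * P(Y) for each hypothesis Y.
weights = {
    "meningitis": 0.5 * (1 / 50000),   # P(S|M) P(M)
    "whiplash":   0.8 * (1 / 1000),    # P(S|W) P(W)
}

# Relative likelihood P(M|S) / P(W|S): the 1/P(S) factors cancel.
rel = weights["meningitis"] / weights["whiplash"]
print(rel)  # approximately 0.0125, i.e. 1/80

# Normalizing constant alpha = 1 / (sum of the weights); the normalized
# values sum to 1 over the hypotheses considered.
alpha = 1 / sum(weights.values())
posterior = {y: alpha * w for y, w in weights.items()}
```

This is exactly the role of α in the general equation: it rescales the unnormalized products P(X|Y) P(Y) so that they sum to 1.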
8. Bayes' rule and evidence
a) Two conditional probabilities relating to cavities:
P(Cavity|Toothache) = 0.8
P(Cavity|Catch) = 0.95
Using Bayes' rule with both pieces of evidence:
P(Cavity|Toothache ∧ Catch) = α P(Toothache ∧ Catch | Cavity) P(Cavity)
b) Bayesian updating incorporates evidence one piece at a time:
P(Cavity|Toothache ∧ Catch) = P(Cavity) × [P(Toothache|Cavity) / P(Toothache)] × [P(Catch|Cavity ∧ Toothache) / P(Catch|Toothache)]
4.3.1 Syntax:
A belief network is a data structure used to represent knowledge in an uncertain domain,
i.e., to represent the dependencies between variables and to give a complete specification
of the joint probability distribution.
A belief network is a graph in which the following holds:
I. A set of random variables makes up the nodes of the network.
II. A set of directed links or arrows connects pairs of nodes; an arrow x→y means x has a
direct influence on y.
III. Each node has a conditional probability table (CPT) that quantifies the effects that the
parents have on the node. The parents of a node are all nodes that have arrows
pointing to it.
IV. The graph has no directed cycles; it is a directed acyclic graph (DAG).
The other names for a belief network are Bayesian network, probabilistic network, causal
network, and knowledge map.
Example:
Sources of uncertainty:
II. John confuses the telephone ringing with the alarm → laziness and ignorance in the
operation.
III. The alarm may fail to go off → power failure, dead battery, cut wires, etc.
Belief network
[Figure: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.]
Each row in a conditional probability table must sum to 1, because the entries represent an
exhaustive set of cases for the variable. The table for a Boolean node with n Boolean parents
contains 2^n independently specifiable probabilities.

P(B) = 0.001        P(E) = 0.002

B    E    P(A|B,E)
T    T    0.95
T    F    0.94
F    T    0.29
F    F    0.001
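Using only the numbers in the tables above, any joint probability over B, E and A factors along the arrows as P(a, b, e) = P(a|b,e) P(b) P(e), since Burglary and Earthquake have no parents. A sketch (the function name `p_joint` is an illustrative choice):

```python
p_b = 0.001   # P(Burglary)
p_e = 0.002   # P(Earthquake)

# P(Alarm = true | Burglary, Earthquake), keyed by (B, E).
p_a_given = {
    (True,  True):  0.95,
    (True,  False): 0.94,
    (False, True):  0.29,
    (False, False): 0.001,
}

def p_joint(a, b, e):
    """P(A=a, B=b, E=e) = P(A=a | b, e) * P(B=b) * P(E=e)."""
    pa = p_a_given[(b, e)] if a else 1 - p_a_given[(b, e)]
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    return pa * pb * pe

# e.g. the alarm sounds during a burglary with no earthquake:
print(p_joint(True, True, False))  # 0.94 * 0.001 * 0.998
```

Summing p_joint over all eight assignments gives 1, confirming that the network specifies a complete joint distribution.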
4.3.2 Semantics
There are two ways in which one can understand the semantics of belief networks: as a
representation of the joint probability distribution, or as an encoding of a collection of
conditional independence statements.
A CPT can often be compressed using the noisy-OR model, in which:
I. If no parent node is true, then the output is false with 100% certainty.
II. If exactly one parent is true, then the output is false with probability equal to the
noise parameter for that node.
III. In general, the probability that the output node is false is just the product of the noise
parameters for all the parents that are true.
From the given network, is it possible to read off whether a set of nodes X is
independent of another set Y, given a set of evidence nodes E? The answer is
yes, and the method is provided by the notion of direction-dependent separation, or
d-separation.
If every undirected path from a node in X to a node in Y is d-separated by E, then
X and Y are conditionally independent given E.
[Figure: evidence nodes E separating the set X from the set Y.]
Example: the car network, with arrows Battery → Radio, Battery → Ignition,
Ignition → Starts, Gas → Starts, and Starts → Moves.
1. Whether there is gas in the car and whether the car radio plays are independent given
evidence about whether the ignition (spark plugs) fires.
2. Gas and Radio are independent given evidence about whether the battery works.
3. Gas and Radio are independent given no evidence at all.
4. Gas and Radio become dependent given evidence about whether the car starts.
1. Monitoring (filtering): computing the belief state P(Xt|e1:t), the posterior distribution
over the current state, given all evidence to date.
In the umbrella example, monitoring would mean computing the probability of rain today,
given all the observations of the umbrella so far, including today's.
[Figure: states X0, X1, …, Xt with evidence E1, …, Et.]
2. Prediction: computing the conditional distribution over a future state, given all evidence to
date, P(Xt+k|e1:t), for k > 0.
In the umbrella example, prediction would mean computing the probability of rain
tomorrow (k = 1), or the day after tomorrow (k = 2), etc., given all the observations of the
umbrella so far.
[Figure: states X0, X1, …, Xt+1 with evidence E1, …, Et.]
3. Most likely explanation:
Given all evidence to date, we want to find the sequence of states that is most likely to have
generated all the evidence, i.e. argmax over x1:t of P(x1:t|e1:t).
In the umbrella example, if the umbrella appears on each of the first three days and is absent
on the fourth, then the most likely explanation is that it rained on the first three days and did
not rain on the fourth.
Algorithms for this task are useful in many applications, including speech recognition, i.e.
finding the most likely sequence of words given a series of sounds, or the reconstruction of
bit strings transmitted over a noisy channel (e.g. a cell phone), etc.
The umbrella model:

Transition model P(Rt|Rt-1):
Rt-1    P(Rt)
T       0.7
F       0.3

Sensor model P(Ut|Rt):
Rt      P(Ut)
T       0.9
F       0.2

[Figure: Raint-1 → Raint → Raint+1, with each Raint → Umbrellat.]
Viterbi algorithm:
Let m1:t denote the probability of the best sequence reaching each state at time t:
m1:t = max over x1,…,xt-1 of P(x1, …, xt-1, Xt | e1:t)
Then the recursive relationship between most likely paths to each state Xt+1 and most
likely paths to each state Xt reads
m1:t+1 = max over x1,…,xt of P(x1, …, xt, Xt+1 | e1:t+1)
       = α P(et+1|Xt+1) max over xt of [ P(Xt+1|xt) m1:t ]
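The recursion can be sketched in Python for the umbrella model (the transition and sensor numbers are the ones tabulated above; the function name `viterbi` and the dictionary layout are illustrative choices, and the α factor is dropped since rescaling does not change the argmax):

```python
def viterbi(evidence, prior, p_trans, p_sense):
    """Most likely state sequence given evidence e_1..e_t.

    prior[s]       : P(X1 = s), folded together with the first observation
    p_trans[s][s2] : P(X_{t+1} = s2 | X_t = s)
    p_sense[s][e]  : P(E_t = e | X_t = s)
    """
    states = list(prior)
    # m[s] = max over paths ending in s of P(path, e_1..e_t); unnormalized.
    m = {s: prior[s] * p_sense[s][evidence[0]] for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t+1
    for e in evidence[1:]:
        prev = {s: max(states, key=lambda s0: m[s0] * p_trans[s0][s])
                for s in states}
        m = {s: p_sense[s][e] * m[prev[s]] * p_trans[prev[s]][s]
             for s in states}
        back.append(prev)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: m[s])
    path = [best]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1]

RAIN, DRY = True, False
prior = {RAIN: 0.5, DRY: 0.5}
p_trans = {RAIN: {RAIN: 0.7, DRY: 0.3}, DRY: {RAIN: 0.3, DRY: 0.7}}
p_sense = {RAIN: {True: 0.9, False: 0.2}, DRY: {True: 0.2, False: 0.8}}
p_sense[RAIN] = {True: 0.9, False: 0.1}  # P(U|rain) = 0.9, so P(no U|rain) = 0.1

print(viterbi([True, True, True, False], prior, p_trans, p_sense))
# [True, True, True, False]: rain on the first three days, dry on the fourth
```

This reproduces the umbrella example above: three umbrella days followed by one umbrella-free day yield the explanation "rain, rain, rain, no rain".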
Hidden Markov models:
An HMM is a temporal probabilistic model in which the state of the process is described
by a single discrete random variable.
The possible values of the variable are the possible states of the world.
The umbrella example can be described as an HMM, since it has just one state variable, Raint.
Additional state variables can be added to a temporal model while staying within the
HMM framework, but only by combining all the state variables into a single
"megavariable" whose values are all possible tuples of values of the individual state
variables.
With a single, discrete state variable Xt, we can give concrete form to the representations
of the transition model, and the forward and backward messages.
Let the state variable Xt have values denoted by integers 1, …, S, where S is the number of
possible states.
The transition model P(Xt|Xt-1) becomes an S × S matrix T, where
Tij = P(Xt = j | Xt-1 = i)
i.e., Tij is the probability of a transition from state i to state j.
For example, the transition matrix for the umbrella world is
T = P(Xt|Xt-1) = ( 0.7  0.3
                   0.3  0.7 )
We also put the sensor model in matrix form. In this case, because the value of the evidence
variable Et is known to be, say, et, we need use only that part of the model specifying the
probability that et appears.
For each time step t, we construct a diagonal matrix Ot whose diagonal entries are given by
the values P(et | Xt = i) and whose other entries are 0.
For example, if the umbrella appears on day 1, then in the umbrella world
O1 = ( 0.9  0
       0    0.2 )
If we use column vectors to represent the forward and backward messages, the computations
become simple matrix-vector operations.
The forward equation becomes
f1:t+1 = α Ot+1 T^T f1:t …………(1)
and the backward equation becomes
bk+1:t = T Ok+1 bk+2:t …………(2)
From these equations, we can see that the time complexity of the forward-backward
algorithm applied to a sequence of length t is O(S²t). The space complexity is O(St).
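Equation (1) can be sketched directly with small matrix helpers (no external libraries; the numbers are from the umbrella model, and the helper names are illustrative). Filtering two days of umbrella observations gives the posterior probability of rain:

```python
def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]   # plays the role of alpha in equation (1)

# Umbrella world: state 0 = rain, state 1 = no rain.
T = [[0.7, 0.3],
     [0.3, 0.7]]                # T[i][j] = P(X_t = j | X_t-1 = i)

def O(umbrella):
    """Diagonal sensor matrix O_t for the observed evidence."""
    return [[0.9 if umbrella else 0.1, 0.0],
            [0.0, 0.2 if umbrella else 0.8]]

# Forward equation (1): f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}
f = [0.5, 0.5]                  # prior P(X0)
for e in [True, True]:          # umbrella seen on days 1 and 2
    f = normalize(mat_vec(O(e), mat_vec(transpose(T), f)))

print(f[0])  # P(rain on day 2 | u1, u2), approximately 0.883
```

Each step costs one S × S matrix-vector multiply plus a diagonal scaling, which is the O(S²) per-step cost quoted above.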
Besides providing an elegant description of the filtering and smoothing algorithms for
HMMs, the matrix formulation reveals opportunities for improved algorithms.
The first is a simple variation on the forward-backward algorithm that allows smoothing to
be carried out in constant space, independently of the length of the sequence.
The difficulty is that smoothing for any particular time slice k requires the simultaneous
presence of both forward and backward messages, f1:k and bk+1:t.
The forward-backward algorithm achieves this by storing the f's computed on the forward
pass so that they are available during the backward pass.
Instead, we can propagate f backward by inverting equation (1):
f1:t = α' (T^T)^-1 (Ot+1)^-1 f1:t+1
The modified smoothing algorithm works by first running the standard forward pass to
compute f1:t and then running the backward pass for both b and f together, using them to
compute the smoothed estimate at each step.
A second area in which the matrix formulation reveals an improvement is online
smoothing with a fixed lag.
Let us suppose that the lag is d; that is, we are smoothing at time slice t−d, where the current
time is t. We compute
α f1:t-d bt-d+1:t
for slice t−d. Then, when a new observation arrives, we need to compute
α f1:t-d+1 bt-d+2:t+1
for slice t−d+1. First, we can compute f1:t-d+1 from f1:t-d, using the standard filtering process.
Computing the backward message incrementally is trickier, because there is no simple
relationship between the old backward message bt-d+1:t and the new backward message
bt-d+2:t+1.
Instead, we will examine the relationship between the old backward message bt-d+1:t and the
backward message at the front of the sequence, bt+1:t. To do this, we apply equation (2) d
times to get
bt-d+1:t = (T Ot-d+1)(T Ot-d+2) ⋯ (T Ot) bt+1:t = Bt-d+1:t 1 …………(3)
where the matrix Bt-d+1:t is the product of the sequence of T and O matrices, and bt+1:t is
the all-ones vector 1.
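Equation (3) can be checked numerically for the umbrella model (a sketch with plain nested-list matrices; the two-observation evidence window and the helper names are illustrative, with lag d = 2):

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

T = [[0.7, 0.3], [0.3, 0.7]]
def O(umbrella):
    return [[0.9 if umbrella else 0.1, 0.0],
            [0.0, 0.2 if umbrella else 0.8]]

evidence = [True, False]   # hypothetical observations at slices t-1 and t (d = 2)
ones = [1.0, 1.0]          # b_{t+1:t} = 1

# Backward recursion, equation (2), applied d times:
b = ones
for e in reversed(evidence):
    b = mat_vec(T, mat_vec(O(e), b))

# The same message via the matrix product B of equation (3):
B = mat_mul(mat_mul(T, O(evidence[0])), mat_mul(T, O(evidence[1])))
b2 = mat_vec(B, ones)
print(all(abs(x - y) < 1e-12 for x, y in zip(b, b2)))  # True
```

Because B is a plain matrix product, it can be updated incrementally as the window slides, which is what makes fixed-lag smoothing efficient.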