Unit 3 - AI
Declarative vs. Procedural Knowledge
• Declarative knowledge deals with factual ("what") questions (What is the capital of India? etc.)
• Procedural knowledge deals with "how" questions
• Procedural knowledge can be embedded in declarative knowledge
[Figure: the Wumpus world on a 4x4 grid. The environment is discrete and percepts are Boolean; a percept is a five-element feature vector (Stench, Breeze, Glitter, Bump, Scream), e.g. ⟨Stench, none, none, none, none⟩ or ⟨0, 0, 0, 0, 0⟩. Squares marked P? are possible pits, V is visited, A is the agent, S is a stench.]
[Figure: entailment by model checking: KB entails a sentence α about a pit in (3,1) exactly when Models(KB) ⊆ Models(α).]
• For all X: if X is a rose, then there exists Y such that X has Y and Y is a thorn
  ("every rose has a thorn": ∀x Rose(x) ⇒ ∃y (Has(x, y) ∧ Thorn(y)))
X | ¬X (negation)
T | F
F | T

X | F (constant false)
T | F
F | F

• Only negation is widely used (and has a symbol, ¬, for the operation)
Combined tables for unary operators
[Figure: the unary-operator truth tables combined into a single table.]
Binary operators
• There are sixteen possible binary operators:
X Y | o1 o2 o3 o4 o5 o6 o7 o8 o9 o10 o11 o12 o13 o14 o15 o16
T T | T  T  T  T  T  T  T  T  F  F   F   F   F   F   F   F
T F | T  T  T  T  F  F  F  F  T  T   T   T   F   F   F   F
F T | T  T  F  F  T  T  F  F  T  T   F   F   T   T   F   F
F F | T  F  T  F  T  F  T  F  T  F   T   F   T   F   T   F
• All these operators have names, but I haven’t tried to fit them in
• Only a few of these operators are normally used in logic
• Notice in particular that material implication (⇒) only approximately means the same as the English
word “implies”
• All the other operators can be constructed from a combination of these (along with unary not, ¬)
Logical expressions
• All logical expressions can be computed with some combination of and (∧),
or (∨), and not (¬) operators
• For example, logical implication can be computed this way:
X | Y | ¬X | ¬X ∨ Y | X ⇒ Y
T | T | F  | T      | T
T | F | F  | F      | F
F | T | T  | T      | T
F | F | T  | T      | T
• Notice that ¬X ∨ Y is equivalent to X ⇒ Y
X | Y | ¬X | ¬X ∨ Y (= X ⇒ Y) | Y ⇒ X | X ⇔ Y
T | T | F  | T               | T     | T
T | F | F  | F               | T     | F
F | T | T  | T               | F     | F
F | F | T  | T               | T     | T
Another example
• Exclusive or (xor) is true if exactly one of its operands is true
X | Y | ¬X | ¬Y | ¬X ∧ Y | X ∧ ¬Y | (¬X∧Y) ∨ (X∧¬Y) | X xor Y
T | T | F  | F  | F      | F      | F               | F
T | F | F  | T  | F      | T      | T               | T
F | T | T  | F  | T      | F      | T               | T
F | F | T  | T  | F      | F      | F               | F
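Such equivalences can be checked mechanically by enumerating all truth assignments. A minimal Python sketch (illustrative, not from the original slides):

from itertools import product

# Check two constructions from the tables above:
#   X => Y   is equivalent to  (not X) or Y
#   X xor Y  is equivalent to  ((not X) and Y) or (X and (not Y))
for X, Y in product([True, False], repeat=2):
    implies = (not X) or Y                        # material implication from not/or
    xor = ((not X) and Y) or (X and (not Y))      # xor from and/or/not
    print(X, Y, implies, xor, xor == (X != Y))    # last column is always True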
6. Remove universal quantifiers by (1) moving them all to the left end; (2)
making the scope of each the entire sentence; and (3) dropping the
“prefix” part
Ex: (∀x)P(x) becomes P(x)
7. Put into conjunctive normal form (a conjunction of disjunctions) using the
distributive and associative laws
(P ∧ Q) ∨ R becomes (P ∨ R) ∧ (Q ∨ R)
(P ∨ Q) ∨ R becomes (P ∨ Q ∨ R)
8. Split conjuncts into separate clauses
9. Standardize variables so each clause contains only variable names that do
not occur in any other clause
An example
(∀x)(P(x) → ((∀y)(P(y) → P(f(x,y))) ∧ ¬(∀y)(Q(x,y) → P(y))))
2. Eliminate →
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧ ¬(∀y)(¬Q(x,y) ∨ P(y))))
3. Reduce scope of negation
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧(∃y)(Q(x,y) ∧ ¬P(y))))
4. Standardize variables
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧(∃z)(Q(x,z) ∧ ¬P(z))))
5. Eliminate existential quantification
(∀x)(¬P(x) ∨((∀y)(¬P(y) ∨ P(f(x,y))) ∧(Q(x,g(x)) ∧ ¬P(g(x)))))
6. Drop universal quantification symbols
(¬P(x) ∨ ((¬P(y) ∨ P(f(x,y))) ∧(Q(x,g(x)) ∧ ¬P(g(x)))))
Example
7. Convert to conjunction of disjunctions
(¬P(x) ∨ ¬P(y) ∨ P(f(x,y))) ∧ (¬P(x) ∨ Q(x,g(x))) ∧
(¬P(x) ∨ ¬P(g(x)))
8. Create separate clauses
¬P(x) ∨ ¬P(y) ∨ P(f(x,y))
¬P(x) ∨ Q(x,g(x))
¬P(x) ∨ ¬P(g(x))
9. Standardize variables
¬P(x) ∨ ¬P(y) ∨ P(f(x,y))
¬P(z) ∨ Q(z,g(z))
¬P(w) ∨ ¬P(g(w))
Resolution
• Resolution is a sound and complete inference procedure for FOL
• Reminder: Resolution rule for propositional logic:
• P1 ∨ P2 ∨ ... ∨ Pn
• ¬P1 ∨ Q2 ∨ ... ∨ Qm
• Resolvent: P2 ∨ ... ∨ Pn ∨ Q2 ∨ ... ∨ Qm
• Examples
• P and ¬ P ∨ Q : derive Q (Modus Ponens)
• (¬ P ∨ Q) and (¬ Q ∨ R) : derive ¬ P ∨ R
• P and ¬ P : derive False [contradiction!]
• (P ∨ Q) and (¬ P ∨ ¬ Q) : derive Q ∨ ¬Q, a tautology (True)
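A minimal sketch of the propositional resolution rule in Python (the clause representation, frozensets of literal strings such as "P" or "~P", is an assumption for illustration):

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses, each clause a frozenset of literals."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:                              # complementary pair found
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# Example from above: (~P | Q) and (~Q | R) resolve to (~P | R)
print(resolve(frozenset({"~P", "Q"}), frozenset({"~Q", "R"})))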
Resolution in first-order logic
• Given sentences
P1 ∨ ... ∨ Pn
Q1 ∨ ... ∨ Qm
• in conjunctive normal form:
• each Pi and Qi is a literal, i.e., a positive or negated predicate symbol with its
terms,
• if Pj and ¬Qk unify with substitution list θ, then derive the resolvent sentence:
subst(θ, P1 ∨ ... ∨ Pj-1 ∨ Pj+1 ∨ ... ∨ Pn ∨ Q1 ∨ ... ∨ Qk-1 ∨ Qk+1 ∨ ... ∨ Qm)
• Example
• from clause P(x, f(a)) ∨ P(x, f(y)) ∨ Q(y)
• and clause ¬P(z, f(a)) ∨ ¬Q(z)
• derive resolvent P(z, f(y)) ∨ Q(y) ∨ ¬Q(z)
• using θ = {x/z}
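Unification, which supplies the substitution θ, can be sketched as follows (a minimal version without the occurs check; the term encoding, tuples for compound terms with lowercase single letters as variables and uppercase as constants, is an assumption for illustration):

def is_var(t):
    return isinstance(t, str) and len(t) == 1 and t.islower()

def unify(t1, t2, theta):
    """Return a substitution unifying t1 and t2, or None on failure."""
    if theta is None:
        return None
    if t1 == t2:
        return theta
    if is_var(t1):
        return unify_var(t1, t2, theta)
    if is_var(t2):
        return unify_var(t2, t1, theta)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            theta = unify(a, b, theta)
        return theta
    return None

def unify_var(v, t, theta):
    if v in theta:
        return unify(theta[v], t, theta)
    return {**theta, v: t}

# P(x, f(A)) unifies with P(z, f(A)) under {x: z}:
print(unify(("P", "x", ("f", "A")), ("P", "z", ("f", "A")), {}))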
Resolution refutation
• Given a consistent set of axioms KB and goal sentence Q, show that KB |=
Q
• Proof by contradiction: add ¬Q to KB and try to prove False,
i.e., (KB |- Q) ↔ (KB ∧ ¬Q |- False)
• Resolution is refutation complete: it can establish that a given sentence Q
is entailed by KB, but can’t (in general) be used to generate all logical
consequences of a set of sentences
• Also, it cannot be used to prove that Q is not entailed by KB.
• Resolution won’t always give an answer since entailment is only
semidecidable
• And you can’t just run two proofs in parallel, one trying to prove Q and the other
trying to prove ¬Q, since KB might not entail either one
Refutation resolution proof tree
[Figure: refutation proof tree. The clauses ¬allergies(w) ∨ sneeze(w) and ¬cat(y) ∨ ¬allergic-to-cats(z) ∨ allergies(z) resolve under {w/z}; resolving further with the facts cat(Felix) ({y/Felix}) and allergic-to-cats(Lise) ({z/Lise}) yields sneeze(Lise), which resolves with the negated query ¬sneeze(Lise) to give false.]
[Proof fragment from a longer example: with θ = {z/T}, R6: ¬C(T) is derived; resolving it against C(T) under the empty substitution {} gives R7: FALSE.]
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax, semantics and inferences-Propositional logic-Reasoning patterns
• Unification and Resolution
• Knowledge representation using rules-Knowledge representation using semantic nets
• Knowledge representation using frames-Inferences
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief network
• Probabilistic reasoning-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
Production Rules
• Condition-Action Pairs
• IF this condition (or premise or antecedent) occurs,
THEN some action (or result, or conclusion, or
consequence) will (or should) occur
• IF the traffic light is red AND you have stopped,
THEN a right turn is OK
• With "statement AND statement" conditions, all conditions must be true for the conclusion to be true
Rule-based Inference
• Production rules are typically used as part of a
production system
• Production systems provide pattern-directed control of
the reasoning process
• Production systems have:
• Productions: set of production rules
• Working Memory (WM): description of current state
of the world
• Recognise-act cycle
Production Systems
[Figure: production system architecture. A set of production rules (C1→A1, C2→A2, ..., Cn→An) is matched against Working Memory; the matching rules form the Conflict Set, Conflict Resolution selects one to fire, and its action updates Working Memory and the Environment.]
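A minimal recognise-act cycle in Python (a sketch; the rule and fact strings are illustrative, and conflict resolution is simply rule order):

# Working memory is a set of facts; a rule fires when all its conditions hold.
rules = [
    ({"traffic light is red", "you have stopped"}, "right turn is OK"),
    ({"right turn is OK"}, "turn right"),
]
working_memory = {"traffic light is red", "you have stopped"}

changed = True
while changed:                          # recognise-act cycle
    changed = False
    for conditions, action in rules:    # recognise: match conditions against WM
        if conditions <= working_memory and action not in working_memory:
            working_memory.add(action)  # act: add the rule's conclusion to WM
            changed = True              # keep cycling until nothing new is added
print(working_memory)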
[Figure: a semantic network. Animal (can breathe, can eat, has skin); Ostrich is-a Animal (runs fast, cannot fly, is tall); Fish is-a Animal (can swim, has fins, has gills); Salmon is-a Fish (swims upstream, is pink, is edible).]
[Figure: instance links. SNOOPY is an instance of BEAGLE (size: small) and of FICTIONAL CHARACTER; LASSIE is an instance of COLLIE; SNOOPY is a friend of CHARLIE BROWN.]
Semantic Networks
What does or should a node represent?
• A class of objects?
• An instance of a class?
• The canonical instance of a class?
• The set of all instances of a class?
DOG
  Fixed:    legs: 4
  Default:  diet: carnivorous
            sound: bark
  Variable: size, colour

COLLIE
  Fixed:    breed of: DOG
            type: sheepdog
  Default:  size: 65cm
  Variable: colour
ELEPHANT
  subclass: MAMMAL
  colour: grey
  size: large

Nellie
  instance: ELEPHANT
  likes: apples
• elephant(clyde)
  ∴ mammal(clyde), has_part(clyde, head)
ELEPHANT
  subclass: MAMMAL
  has_trunk: yes
  *colour: grey
  *size: large
  *furry: no

Clyde
  instance: ELEPHANT
  colour: pink
  owner: Fred

Nellie
  instance: ELEPHANT
  size: small
Frames (Contd.)
• Can represent subclass and instance relationships (both
sometimes called ISA or “is a”)
• Properties (e.g. colour and size) can be referred to as slots and
slot values (e.g. grey, large) as slot fillers
• Objects can inherit all properties of parent class (therefore
Nellie is grey and large)
• But can inherit properties which are only typical (usually called
default, here starred), and can be overridden
• For example, mammal is typically furry, but this is not so for an
elephant
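The slot lookup with default inheritance described above can be sketched in Python (the nested-dict frame encoding is an assumption for illustration; the slot values mirror the Elephant example):

frames = {
    "MAMMAL":   {"furry": "yes"},
    "ELEPHANT": {"subclass": "MAMMAL", "has_trunk": "yes",
                 "colour": "grey", "size": "large", "furry": "no"},
    "Clyde":    {"instance": "ELEPHANT", "colour": "pink", "owner": "Fred"},
    "Nellie":   {"instance": "ELEPHANT", "size": "small"},
}

def get_slot(frame, slot):
    """Look up a slot, following instance/subclass links for inherited defaults."""
    while frame is not None:
        f = frames[frame]
        if slot in f:
            return f[slot]                 # a local value overrides inherited defaults
        frame = f.get("instance") or f.get("subclass")
    return None

print(get_slot("Clyde", "colour"))    # pink (overrides the grey default)
print(get_slot("Nellie", "colour"))   # grey (inherited from ELEPHANT)
print(get_slot("Nellie", "furry"))    # no (ELEPHANT overrides MAMMAL's default)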
Types of inference:
• Deduction
• Induction
• Abduction
• In real life, it is not always possible to determine the state of the environment completely. In partially observable or non-deterministic environments, agents need to handle uncertainty and reason in spite of it.
• Uncertain data: data that is missing, unreliable, inconsistent or noisy.
• Uncertain knowledge: when the available knowledge has multiple causes leading to multiple effects, or the causality of the domain is incompletely known.
• Uncertain knowledge representation: a representation that provides only a restricted model of the real system, or has limited expressiveness.
• Inference: with incomplete or default reasoning methods, the conclusions drawn might not be completely accurate. Let's understand this better with the help of an example.
• IF primary infection is bacteremia
• AND site of infection is sterile
• AND entry point is gastrointestinal tract
• THEN organism is bacteroid (0.7).
• In such uncertain situations, the agent does not guarantee a solution but acts on its own assumptions
and probabilities and gives some degree of belief that it will reach the required solution.
• For example, in medical diagnosis consider the rule Toothache ⇒ Cavity. This is
not complete, as not all patients having toothache have cavities. So we can write a
more generalized rule: Toothache ⇒ Cavity ∨ Gum problems ∨ Abscess… To make this
rule complete, we would have to list all the possible causes of toothache. But this is
not feasible, for the following reasons:
• Laziness: it would require a lot of effort to list the complete set of antecedents and
consequents needed to make the rules complete.
• Theoretical ignorance: medical science does not have a complete theory for the domain.
• Practical ignorance: it might not be practical that all tests have been or can be
conducted for the patients.
• Such uncertain situations can be dealt with using
Probability theory
Truth Maintenance systems
Fuzzy logic.
Uncertain knowledge and reasoning
Probability
• Probability is the degree of likelihood that an event will occur. It provides a certain degree of belief in case
of uncertain situations. It is defined over a set of events U and assigns to each event e a value P(e), the
probability of occurrence of e, in the range [0, 1]. Here each sentence is labeled with a real number in the
range 0 to 1; 0 means the sentence is false and 1 means it is true.
• Conditional Probability or Posterior Probability is the probability of event A given that B has already
occurred:
• P(A|B) = P(A ∧ B) / P(B), which by Bayes' rule equals (P(B|A) * P(A)) / P(B)
• For example, P(It will rain tomorrow | It is raining today) represents the conditional probability of it raining
tomorrow given that it is raining today.
• P(A|B) + P(¬A|B) = 1
• Joint probability is the probability of two independent events happening simultaneously, like rolling two dice
or tossing two coins together. For example, the probability of getting 2 on one die and 6 on the other is 1/36.
Joint probability is widely used in fields such as physics and astronomy, and comes into play when there are
two independent events. The full joint probability distribution specifies the probability of each complete
assignment of values to random variables (see the sketch below).
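A minimal sketch of answering a conditional-probability query from a full joint distribution (the numbers are toy values, chosen only for illustration):

# P(Toothache, Cavity) over all four assignments (toy joint distribution)
joint = {
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}
p_toothache = sum(p for (toothache, _), p in joint.items() if toothache)
p_both = joint[(True, True)]             # P(Toothache and Cavity)
print(p_both / p_toothache)              # P(Cavity | Toothache) = 0.12 / 0.20 = 0.6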
Uncertain knowledge and reasoning
Bayes Theorem
• When used for classification (naive Bayes), it rests on the assumption that every
pair of features being classified is independent of the others. It calculates the
probability P(A|B), where A is a class of possible outcomes and B is the given
instance to be classified.
• P(A|B) = P(B|A) * P(A) / P(B)
• P(A|B) = probability that A is happening, given that B has
occurred (posterior probability)
• P(A) = prior probability of the class
• P(B) = prior probability of the predictor
• P(B|A) = likelihood
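As a one-line function (a sketch; the argument names are illustrative):

def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# The meningitis example from a later slide: 0.8 * (1/30000) / 0.02
print(posterior(0.8, 1 / 30000, 0.02))   # ~0.00133, i.e. about 1 in 750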
Uncertain knowledge and reasoning
CONDITIONAL PROBABILITY
• The Bayesian network has mainly two components:
Causal Component
Actual numbers
• Each node in the Bayesian network has a conditional probability
distribution P(Xi | Parent(Xi)), which determines the effect of the parents
on that node.
• A Bayesian network is based on the joint probability distribution and
conditional probability. So let's first understand the joint probability
distribution:
Bayesian probability and belief network
Problem:
• Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has
occurred, and David and Sophia both called Harry.
Solution:
• The Bayesian network for the above problem is given below. The network structure shows that
burglary and earthquake are the parent nodes of the alarm and directly affect the probability of the
alarm going off, while David's and Sophia's calls depend on the alarm's probability.
• The network represents the assumptions that the agents do not directly perceive the burglary, do not
notice a minor earthquake, and do not confer before calling.
• The conditional distribution for each node is given as a conditional probability table, or CPT.
• Each row in a CPT must sum to 1, because the entries in a row represent an exhaustive set of
cases for the variable.
• In a CPT, a Boolean variable with k Boolean parents requires 2^k rows of probabilities. Hence, if there
are two parents, the CPT will contain 4 probability values.
Bayesian probability and belief network
List of all events occurring in this network:
• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)
We can write the events of the problem statement as the joint probability P[D, S, A, B, E], and rewrite it
using the chain rule and the network's conditional independences:
• P[D, S, A, B, E] = P[D | S, A, B, E] . P[S, A, B, E]
• = P[D | S, A, B, E] . P[S | A, B, E] . P[A, B, E]
• = P[D | A] . P[S | A, B, E] . P[A, B, E]
• = P[D | A] . P[S | A] . P[A | B, E] . P[B, E]
• = P[D | A] . P[S | A] . P[A | B, E] . P[B | E] . P[E]
Bayesian probability and belief network
Let's take the observed probabilities for the Burglary and Earthquake components:
• P(B = True) = 0.002, the probability of a burglary.
• P(B = False) = 0.998, the probability of no burglary.
• P(E = True) = 0.001, the probability of a minor earthquake.
• P(E = False) = 0.999, the probability that an earthquake has not occurred.

The conditional probability of Sophia calling depends on its parent node "Alarm":

A     | P(S = True) | P(S = False)
True  | 0.75        | 0.25
False | 0.02        | 0.98
Bayesian probability and belief network
• From the formula of the joint distribution, we can write the problem statement in the form of
a probability distribution:
• P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
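The same number can be checked directly from the CPT entries (a sketch; P(D=T | A=T) = 0.91 is the value used in the slide's computation, taken from David's CPT):

p_S_given_A    = 0.75     # P(S=T | A=T), from Sophia's CPT
p_D_given_A    = 0.91     # P(D=T | A=T), from David's CPT
p_A_given_nBnE = 0.001    # P(A=T | B=F, E=F)
p_nB, p_nE     = 0.998, 0.999

print(p_S_given_A * p_D_given_A * p_A_given_nBnE * p_nB * p_nE)  # ~0.00068045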
• The semantics of a Bayesian network:
• There are two ways to understand the semantics of a Bayesian network:
1. As a representation of the joint probability distribution.
• This view is helpful in understanding how to construct the network.
2. As an encoding of a collection of conditional independence statements.
• This view is helpful in designing inference procedures.
Bayes' theorem in Artificial intelligence
Bayes' theorem:
• Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian
reasoning, which determines the probability of an event with uncertain
knowledge.
• In probability theory, it relates the conditional probability and marginal
probabilities of two random events.
• Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is
fundamental to Bayesian statistics.
• It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
• Bayes' theorem allows updating the probability prediction of an event by
observing new information of the real world.
Bayes' theorem in Artificial intelligence
• Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in
cases where we have good probabilities for these three terms and want to determine the fourth one. Suppose we
want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause | effect) = P(effect | cause) * P(cause) / P(effect)
Example-1:
Question: What is the probability that a patient has the disease meningitis with a stiff neck?
• Given data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time.
He is also aware of some more facts, which are given as follows:
The known probability that a patient has meningitis is 1/30,000.
The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b the proposition that the patient has meningitis,
so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000 = 3.3 * 10^-5
P(a) = 0.02
P(b|a) = P(a|b) * P(b) / P(a) = (0.8 * 1/30000) / 0.02 ≈ 0.00133
• Hence, we can assume that 1 patient out of 750 patients has meningitis with a stiff neck.
Applying Bayes' theorem in Artificial intelligence
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is
a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that a drawn face card
is a king.
Solution:
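A standard worked answer (assuming the usual 12 face cards in a 52-card deck):
P(King|Face) = P(Face|King) * P(King) / P(Face) = 1 * (4/52) / (12/52) = 1/3
since every king is a face card, P(Face|King) = 1.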
Definition
• Probabilistic reasoning is the representation of knowledge where the concept of probability is applied to indicate the
uncertainty in knowledge.
Reasons to use Probabilistic Reasoning in AI
• Some reasons to use this way of representing knowledge are given below:
• When we are unsure of the predicates.
• When the possibilities for the predicates become too large to list.
• When it is known that errors occur during an experiment.
• Probability of an event = number of outcomes in which the event occurs / total number of possible outcomes.
Notations and Properties
• Consider the statement S: March will be cold.
• Probability is often denoted as P(predicate).
• Considering that the chance of March being cold is only 30%, P(S) = 0.3
• Probability always takes a value between 0 and 1. If the probability is 0, then the event will never occur and if it is 1, then it
will occur for sure.
• Then, P(¬S) = 0.7
• This means, the probability of March not being cold is 70%.
• Property 1: P(S) + P(¬S) = 1
Bayesian Network
When designing a Bayesian Network, we keep
the local probability table at each node.
Bayesian Network - Example
Consider a Bayesian Network as given below:
[Figure: an example network in which one joint entry is computed as P(No electricity = T) x P(Not Interested = T) = 0.2 x 0.3 = 0.06; the updated Bayesian network follows.]
Fuzzy Input → Fuzzy Output
[Figure: fuzzy membership functions; the horizontal axis runs over 5, 10, 20, 30, 40, 60 and the inputs have membership degrees 0.7 and 0.2.]
The combined rule strength X1 AND X2 = 0.5 is projected back onto the output membership function:
(Y - 20)/(30 - 20) = 0.5
Y - 20 = 0.5 * 10 = 5
Y = 25 mins
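The defuzzification step above as code (a minimal sketch, assuming the output membership edge rises linearly from 20 to 30 minutes as in the example):

def rising_edge_crossing(mu, lo=20.0, hi=30.0):
    """Solve (y - lo) / (hi - lo) = mu for y on a linear membership edge."""
    return lo + mu * (hi - lo)

print(rising_edge_crossing(0.5))   # 25.0 minutes, matching (Y - 20)/10 = 0.5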
Dempster-Shafer Theory
Let the frame of discernment be Θ = {B, J, S}, the three suspects. Its power set is:
P(Θ) = {∅, {B}, {J}, {S}, {B,J}, {B,S}, {S,J}, {B,J,S}}
After examining the crime scene, detectives assign mass probabilities to the elements of the
power set:

Event                      | Mass
No one is guilty           | 0
B is guilty                | 0.1
J is guilty                | 0.2
S is guilty                | 0.1
Either B or J is guilty    | 0.1
Either B or S is guilty    | 0.1
Either S or J is guilty    | 0.3
One of B, J or S is guilty | 0.1
Dempster-Shafer Problem
Belief in A:
The belief in an element A of the power set is the sum of the
masses of elements which are subsets of A (including A itself).
Ex: Given A = {q1, q2, q3},
Bel(A) = m(q1) + m(q2) + m(q3) + m(q1,q2) + m(q2,q3) + m(q1,q3) + m(q1,q2,q3)
Ex: Given the above mass assignments,
Bel(B) = m(B) = 0.1
Bel(B,J) = m(B) + m(J) + m(B,J) = 0.1 + 0.2 + 0.1 = 0.4
RESULT:

A    | {B} | {J} | {S} | {B,J} | {B,S} | {S,J} | {B,J,S}
m(A) | 0.1 | 0.2 | 0.1 | 0.1   | 0.1   | 0.3   | 0.1
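The belief computation can be sketched in Python (encoding events as frozensets of suspects is an assumption for illustration):

masses = {
    frozenset({"B"}): 0.1, frozenset({"J"}): 0.2, frozenset({"S"}): 0.1,
    frozenset({"B", "J"}): 0.1, frozenset({"B", "S"}): 0.1,
    frozenset({"S", "J"}): 0.3, frozenset({"B", "J", "S"}): 0.1,
}

def bel(a):
    """Bel(A) = sum of the masses of all subsets of A (including A itself)."""
    return sum(m for e, m in masses.items() if e <= a)

print(bel(frozenset({"B"})))        # 0.1
print(bel(frozenset({"B", "J"})))   # 0.1 + 0.2 + 0.1 = 0.4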