Knowledge Representation & Reasoning
Knowledge Based Systems
Lecture: 28
Early Belief
Early researchers in AI believed that the best approach to problem solving was the development of general-purpose problem solvers.
Advantages:
A set of strict rules.
Can be used to derive more facts.
Truth of new statements can be verified.
Guaranteed correctness.
Popular in AI systems, e.g. automated theorem proving.
Procedural Knowledge
Here the knowledge is encoded in procedures: small programs that know how to do specific things and how to proceed.
Representation of Knowledge
Propositional Logic
FOPL (First Order Predicate Logic)
Frames & Associative Networks
Scripts
Case grammar theory
Production Rules
Inference System
Forward & backward chaining
Knowledge Representation & Reasoning
Propositional Logic
Lecture: 30
What is Logic?
Ex:
It is raining = P
New Delhi is the capital of India = Q
Proposition
A statement that is either true or false.
Examples of propositions:
Pitt is located in the Oakland section of Pittsburgh.
France is in Europe.
It rains outside.
2 is a prime number and 6 is a prime number.
Complex Sentences:
The truth value of a complex sentence in a model is computed from the
truth-values of its atomic sub-sentences in the same way as in
propositional logic.
Ex: Brother(Lakshman, Ram) ∧ Brother(Ram, Lakshman) : Lakshman is the brother of Ram and Ram is the brother of Lakshman.
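As a quick illustration (a minimal sketch, not from the slides), this recursive evaluation can be written in Python; the nested-tuple sentence representation and atom names P and Q are assumptions for the example:

# A model assigns truth values to atomic sentences.
model = {"P": True, "Q": False}

def evaluate(sentence, model):
    # Connectives are evaluated recursively from the atoms.
    op = sentence[0]
    if op == "atom":
        return model[sentence[1]]
    if op == "not":
        return not evaluate(sentence[1], model)
    if op == "and":
        return evaluate(sentence[1], model) and evaluate(sentence[2], model)
    if op == "or":
        return evaluate(sentence[1], model) or evaluate(sentence[2], model)
    if op == "implies":
        return (not evaluate(sentence[1], model)) or evaluate(sentence[2], model)
    raise ValueError("unknown connective: " + op)

# P ∧ ¬Q is True in the model above.
print(evaluate(("and", ("atom", "P"), ("not", ("atom", "Q"))), model))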
Semantics of FOPL (Quantifiers)
Universal quantifiers: Universal quantification allows us to make a statement about a collection of objects:
∀(x) : Cat(x) ⇒ Mammal(x) : All cats are mammals
∀(x) : Father(Bill,x) ⇒ Mother(Hillary,x) : All of Bill’s kids are also
Hillary’s kids.
1. Man(Marcus)
2. Pompeian(Marcus)
3. ∀x Pompeian(x) ⇒ Roman(x)
4. Ruler(Caesar)
5. ∀x Roman(x) ⇒ Loyalto(x, Caesar) ∨ Hate(x, Caesar)
6. ∀x ∃y Loyalto(x, y)
7. ∀x ∀y Man(x) ∧ Ruler(y) ∧ Tryassassinate(x, y) ⇒ ¬Loyalto(x, y)
8. Tryassassinate(Marcus, Caesar)
Thank You
Knowledge Representation & Reasoning
Predicate Logic (Clause Form Conversion)
Lecture: 34
Conversion to Clause Form
Eliminate bi-conditionals and implications:
Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α).
Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β.
Move ¬ inwards:
¬(∀ x p) ≡ ∃ x ¬p,
¬(∃ x p) ≡ ∀ x ¬p,
¬(α ∨ β) ≡ ¬α ∧ ¬β,
¬(α ∧ β) ≡ ¬α ∨ ¬β,
¬¬α ≡ α.
Standardize variables apart by renaming them: each quantifier
should use a different variable.
Conversion to Clause Form
Skolemize: each existential variable is replaced by a Skolem
constant or Skolem function of the enclosing universally
quantified variables.
For instance, ∃x Rich(x) becomes Rich(G1) where G1 is a new
Skolem constant.
“Everyone has a heart”: ∀ x Person(x) ⇒ ∃y Heart(y) ∧ Has(x, y)
becomes ∀ x Person(x) ⇒ Heart(H(x)) ∧ Has(x, H(x)), where H is a
new symbol (Skolem function).
Drop universal quantifiers.
For instance, ∀ x Person(x) becomes Person(x).
Distribute ∧ over ∨:
(α ∧ β) ∨ γ ≡ (α ∨ γ) ∧ (β ∨ γ).
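The first two steps can be sketched in runnable Python (an illustrative nested-tuple representation, not a standard API; only ⇒-elimination and ¬-movement are shown, not the full converter):

def eliminate_implications(f):
    # Step 1: alpha => beta becomes ~alpha | beta.
    if not isinstance(f, tuple):
        return f
    if f[0] == "=>":
        return ("|", ("~", eliminate_implications(f[1])),
                eliminate_implications(f[2]))
    return (f[0],) + tuple(eliminate_implications(x) for x in f[1:])

def move_not_inwards(f):
    # Step 2: push ~ down using double negation, De Morgan, quantifier duality.
    if not isinstance(f, tuple):
        return f
    if f[0] == "~" and isinstance(f[1], tuple):
        g = f[1]
        if g[0] == "~":                                  # ~~a == a
            return move_not_inwards(g[1])
        if g[0] == "&":                                  # ~(a & b) == ~a | ~b
            return ("|", move_not_inwards(("~", g[1])),
                    move_not_inwards(("~", g[2])))
        if g[0] == "|":                                  # ~(a | b) == ~a & ~b
            return ("&", move_not_inwards(("~", g[1])),
                    move_not_inwards(("~", g[2])))
        if g[0] == "forall":                             # ~forall x p == exists x ~p
            return ("exists", g[1], move_not_inwards(("~", g[2])))
        if g[0] == "exists":                             # ~exists x p == forall x ~p
            return ("forall", g[1], move_not_inwards(("~", g[2])))
    if f[0] == "~":
        return f
    return (f[0],) + tuple(move_not_inwards(x) for x in f[1:])

# ~(P => Q) becomes P & ~Q
print(move_not_inwards(eliminate_implications(("~", ("=>", "P", "Q")))))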
Conversion to Clause Form - Example
1. Man(Marcus)
2. Pompeian(Marcus)
3. ¬Pompeian(x1) ∨ Roman(x1)
4. Ruler(Caesar)
5. ¬Roman(x2) ∨ Loyalto(x2, Caesar) ∨ Hate(x2, Caesar)
6. Loyalto(x3, f(x3))
7. ¬Man(x4) ∨ ¬Ruler(y1) ∨ ¬Tryassassinate(x4, y1) ∨ ¬Loyalto(x4, y1)
8. Tryassassinate(Marcus, Caesar)
Skolemization
Skolemization eliminates existential quantifiers by replacing each existentially quantified variable with a Skolem constant or Skolem function. It is done as follows:
If the leftmost quantifier in an expression is an existential quantifier, replace all occurrences of the variable it quantifies with an arbitrary constant not appearing elsewhere in the expression and delete the quantifier. The same procedure should be followed for every other existential quantifier not preceded by a universal quantifier.
Ex:
∃x Rich(x) becomes Rich(G1) where G1 is a new Skolem
constant
For each existential quantifier that is preceded by one or more universal quantifiers, replace all occurrences of the existentially quantified variable by a function symbol not appearing elsewhere in the expression. The arguments of the function should be all the variables bound by the universal quantifiers that precede the existential quantifier.
Ex:
∃u ∀v ∀x ∃y : P(f(u), v, x, y) ⇒ Q(v, u, y)
becomes
∀v ∀x : P(f(a), v, x, g(v, x)) ⇒ Q(v, a, g(v, x))
u → a, because u is not preceded by any universal quantifier, so it is replaced by a Skolem constant.
y → g(v, x), because y is preceded by ∀v and ∀x, so it is replaced by a Skolem function of v and x.
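A small Python sketch of both rules (an assumed nested-tuple representation; the generated names a1, g2 stand for Skolem constants/functions, and variables are assumed standardized apart):

import itertools

fresh = itertools.count(1)

def substitute(f, var, term):
    # Replace every occurrence of variable var by term.
    if f == var:
        return term
    if not isinstance(f, tuple):
        return f
    return tuple(substitute(x, var, term) for x in f)

def skolemize(f, universals=()):
    # universals: the universally quantified variables enclosing this point.
    if not isinstance(f, tuple):
        return f
    if f[0] == "forall":
        return ("forall", f[1], skolemize(f[2], universals + (f[1],)))
    if f[0] == "exists":
        n = next(fresh)
        if universals:
            term = ("g%d" % n,) + universals   # Skolem function of the universals
        else:
            term = "a%d" % n                   # no enclosing universal: Skolem constant
        return skolemize(substitute(f[2], f[1], term), universals)
    return (f[0],) + tuple(skolemize(x, universals) for x in f[1:])

# exists u forall v forall x exists y : P(f(u), v, x, y) (antecedent of the example above)
f = ("exists", "u", ("forall", "v", ("forall", "x",
     ("exists", "y", ("P", ("f", "u"), "v", "x", "y")))))
print(skolemize(f))
# ('forall', 'v', ('forall', 'x', ('P', ('f', 'a1'), 'v', 'x', ('g2', 'v', 'x'))))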
Inference Rules for Quantifiers:
Universal Instantiation: The rule of Universal Instantiation says that we can infer any sentence obtained by substituting a ground term g for the universally quantified variable v. To write the inference rule formally, we use the notation of substitution, SUBST(θ, P), to denote the result of applying substitution θ to the sentence P, as shown below:
∀v P
SUBST({v/g}, P)
Then we apply UI (Universal Instantiation) to the sentence ∀x King(x) ∧ Greedy(x) ⇒ Evil(x) using all possible ground-term substitutions from the vocabulary of the knowledge base, i.e.:
{x/Ravana}
{x/Kumbi}
King(Ravana) ∧ Greedy(Ravana) ⇒ Evil(Ravana)
King(Kumbi) ∧ Greedy(Kumbi) ⇒ Evil(Kumbi)
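The substitution step can be illustrated in Python (a hedged sketch; the tuple encoding of the rule and the names are taken from the slide, not a library API):

def subst(theta, sentence):
    # Apply substitution theta (variable -> ground term) to a sentence tree.
    if isinstance(sentence, str):
        return theta.get(sentence, sentence)
    return tuple(subst(theta, part) for part in sentence)

rule = ("=>", ("&", ("King", "x"), ("Greedy", "x")), ("Evil", "x"))
print(subst({"x": "Ravana"}, rule))   # King(Ravana) & Greedy(Ravana) => Evil(Ravana)
print(subst({"x": "Kumbi"}, rule))    # King(Kumbi) & Greedy(Kumbi) => Evil(Kumbi)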
(c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
Example
1. Marcus was a man
2. Marcus was a Pompeian
3. All Pompeians were Romans
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. Men only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar
iii) The substitution must be consistent. Substituting y for x now and then z for x later is inconsistent. (A substitution of y for x is written y/x.)
Unify Q(x) and P(x): FAIL, as the predicate symbols are different and the literals cannot be unified.
Unify Q(x) and Q(x): NIL, as the literals are identical, so no substitution is needed.
Unify P(x) and P(x, y): FAIL, as the literals have different numbers of arguments.
Algorithm: UNIFY(L1, L2)
1. If L1 and L2 are both variables or constants, then:
(a) If L1 and L2 are identical, then return NIL.
(b) Else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else return (L2/L1).
(c) Else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else return (L1/L2).
(d) Else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If L1 and L2 have a different number of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions used to unify L1 and L2.)
5. For i = 1 to number of arguments in L1:
(a) Call UNIFY with the i-th argument of L1 and the i-th argument of L2, putting the result in S.
(b) If S contains FAIL, then return {FAIL}.
(c) If S is not equal to NIL, then:
(i) Apply S to the remainder of both L1 and L2.
(ii) SUBST := APPEND(S, SUBST).
6. Return SUBST.
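The procedure above can be sketched in runnable Python (an illustrative reading, not the textbook's code): literals are nested tuples, lowercase strings are variables, and a returned dict plays the role of SUBST (an empty dict is NIL, None is FAIL).

def is_variable(x):
    # Convention for this sketch: lowercase strings are variables.
    return isinstance(x, str) and x[:1].islower()

def unify(x, y, theta=None):
    # Returns a substitution dict ({} plays the role of NIL), or None for FAIL.
    if theta is None:
        theta = {}
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple):
        if len(x) != len(y):              # step 3: different number of arguments
            return None
        for a, b in zip(x, y):            # step 5: unify predicate symbol, then arguments
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None                           # step 2: distinct predicate/constant symbols

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    if occurs_check(var, x, theta):       # the "L1 occurs in L2" test
        return None
    new_theta = dict(theta)
    new_theta[var] = x
    return new_theta

def occurs_check(var, x, theta):
    if var == x:
        return True
    if is_variable(x) and x in theta:
        return occurs_check(var, theta[x], theta)
    return isinstance(x, tuple) and any(occurs_check(var, a, theta) for a in x)

print(unify(("Q", "x"), ("Q", "x")))        # {}   (NIL: identical literals)
print(unify(("Q", "x"), ("P", "x")))        # None (FAIL: different predicates)
print(unify(("P", "x"), ("P", "x", "y")))   # None (FAIL: different arity)
print(unify(("P", "x"), ("P", "A")))        # {'x': 'A'}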
Example 2: P(~A) - the probability of picking a card that is not a spade out of a deck of 52 cards is 1 - 1/4 = 3/4.
Example 3: P(A ∨ B) - the probability of picking a card that is either a spade or an ace is 1/4 + 1/13 - (1/4 × 1/13) = 16/52 = 4/13.
Another way to obtain the same result: there are 13 spade cards and 3 additional ace cards in the set of desired outcomes. The total number of cards is 52, thus the probability is 16/52.
Example 4: P(A ∧ B) - the probability of picking the ace of spades is 1/52.
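These card computations can be verified exactly with Python's fractions module (a quick check, not slide content):

from fractions import Fraction

p_spade = Fraction(13, 52)                 # P(A): 13 spades in a 52-card deck
p_ace = Fraction(4, 52)                    # P(B): 4 aces
print(1 - p_spade)                         # P(~A) = 3/4
print(p_spade + p_ace - p_spade * p_ace)   # P(A v B) = 4/13
print(p_spade * Fraction(1, 13))           # P(A ^ B) = 1/52: the ace of spades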
Random Variables and Probability Distributions
To handle outcomes more conveniently, we can treat them as values of so-called random variables. For example, "spade" is one possible value of the variable Suit; "clubs" is another possible value.
The set of the probabilities of each value is called the probability distribution of the random variable.
Let X be a random variable with domain <x1, x2, ..., xn>.
The probability distribution of X is denoted by
P(X) = <P(X = x1), P(X = x2), ..., P(X = xn)>
Note that P(X = x1) + P(X = x2) + ... + P(X = xn) = 1.
Example 5: Let Weather be a random variable with values <sunny, cloudy, rainy, snowy>. Assume that records for some town show that in a year 100 days are rainy, 50 days are snowy, 120 days are cloudy (but without snow or rain) and 95 days are sunny.
i.e. P(Weather = sunny) = 95/365 = 0.26
P(Weather = cloudy) = 120/365 = 0.33
P(Weather = rainy) = 100/365 = 0.27
P(Weather = snowy) = 50/365 = 0.14
Thus P(Weather) = <0.26, 0.33, 0.27, 0.14> is the probability distribution of the random variable Weather.
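A minimal Python sketch of this computation (the counts are from the example above):

counts = {"sunny": 95, "cloudy": 120, "rainy": 100, "snowy": 50}
total = sum(counts.values())                              # 365 days
distribution = {v: round(n / total, 2) for v, n in counts.items()}
print(distribution)  # {'sunny': 0.26, 'cloudy': 0.33, 'rainy': 0.27, 'snowy': 0.14}
print(round(sum(n / total for n in counts.values()), 10)) # probabilities sum to 1.0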
Joint Distributions
Let's take an example to understand the concept:
Consider a sample S of 1000 individuals of age 25-30. Assume that 600 individuals come from high-income families, 570 of those with high income have college education, and 100 individuals with low income have college education.
The following table illustrates the example:
                     C (college ed.)   ~C (not college ed.)
H (high income)            570                  30             600
~H (low income)            100                 300             400
                           670                 330            1000
Example contd..
Let H be the subset of S of individuals coming from high-income families, |H| = 600.
Let C be the subset of S of individuals that have college education, |C| = 670.
We can also compute P(H & C), P(H & ~C), P(~H & C), P(~H & ~C).
Example contd..
P(H & C) = |H & C| / |S| = 570/1000 = 0.57 (57%) - the probability that a randomly selected individual in S comes from a high-income family and has college education.
P(H & ~C) = |H & ~C| / |S| = 30/1000 = 0.03 (3%) - the probability that a randomly selected individual in S comes from a high-income family and does not have college education.
P(~H & C) = |~H & C| / |S| = 100/1000 = 0.1 (10%) - the probability that a randomly selected individual in S comes from a low-income family and has college education.
P(~H & ~C) = |~H & ~C| / |S| = 300/1000 = 0.3 (30%) - the probability that a randomly selected individual in S comes from a low-income family and does not have college education.
Example contd..
Thus we come to the following table:
                     C (college ed.)   ~C (not college ed.)
H (high income)           0.57                0.03             0.6
~H (low income)           0.10                0.30             0.4
                          0.67                0.33             1.0
Here we treat C and H as random variables with values "yes" and "no". The values in the table represent the joint distribution of C and H.
For example, P(C = yes, H = yes) = 0.57.
Joint Distribution Definition
Definition: Let X1, X2, ..., Xn be a set of random variables, each with a range of specific values. P(X1, X2, ..., Xn) is called the joint distribution of the variables X1, X2, ..., Xn, and it is defined by an n-dimensional table where each cell corresponds to one particular assignment of values to the variables X1, X2, ..., Xn.
Each cell in the table corresponds to an atomic event, described by a particular assignment of values to the variables.
Given a joint distribution table we can compute prior
probabilities:
P(H) = P(H & C) + P(H & ~C) = 0.57 + 0.03 = 0.6
Given a joint distribution table we can compute conditional
probabilities.
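A quick Python check of the prior computation above (a sketch; the dictionary representation of the joint table, keyed by (H, C) value pairs, is an assumption for the example):

# Joint distribution of H (income) and C (college education) from the table above.
joint = {("yes", "yes"): 0.57, ("yes", "no"): 0.03,
         ("no", "yes"): 0.10, ("no", "no"): 0.30}   # keyed (H, C)

def prior_h(h):
    # Marginalize C out: P(H = h) = sum over c of P(H = h, C = c).
    return sum(p for (hh, c), p in joint.items() if hh == h)

print(round(prior_h("yes"), 2))   # P(H)  = 0.57 + 0.03 = 0.6
print(round(prior_h("no"), 2))    # P(~H) = 0.4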
Conditional Probabilities
We may ask: what is the probability that an individual in S has college education given that he/she comes from a high-income family?
In this case we consider only those individuals that come from high-income families. Their number is 600. The number of individuals with college education within the high-income group is 570. Thus the probability of having college education given a high-income family is 570/600 = 0.95.
This type of probability is called conditional probability.
The probability of event B given event A is denoted P(B|A), read "P of B given A".
In our example, P(C|H) = |C & H| / |H|.
We can express P(C|H) in terms of P(C & H) and P(H):
P(C|H) = |C & H| / |H| = (|C & H| / |S|) / (|H| / |S|) = P(C & H) / P(H)
Therefore P(C|H) = P(C & H) / P(H).
Conditional Probabilities: Definition
The conditional probability of an event B to occur given that event A has occurred is:
P(B|A) = P(B & A) / P(A)
P(B|A) is also known as the posterior probability of B.
P(B & A) is an element of the joint distribution of the random variables A and B.
In our example, P(C & H) = P(C = yes, H = yes). Thus given the joint distribution P(H, C), we can compute the prior probabilities P(H), P(~H), P(C), P(~C) and then the conditional probabilities P(C|H), P(C|~H), P(H|C), P(H|~C).
P(C|H) = P(C & H) / P(H) = 0.57 / 0.6 = 0.95
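The same computation in Python (a hedged sketch continuing the dictionary representation used earlier):

# Same joint table, keyed (H, C) as before.
joint = {("yes", "yes"): 0.57, ("yes", "no"): 0.03,
         ("no", "yes"): 0.10, ("no", "no"): 0.30}

p_c_and_h = joint[("yes", "yes")]                    # P(C & H) = 0.57
p_h = joint[("yes", "yes")] + joint[("yes", "no")]   # P(H) = 0.6, marginalizing C out
print(round(p_c_and_h / p_h, 2))                     # P(C|H) = 0.95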
Independent Events
Some events are not related; for example, each outcome in a sequence of coin flips is independent of the previous outcome. Two events A and B are independent when P(A & B) = P(A) × P(B).
Bayesian Belief Networks
A Bayesian network, Bayes network, belief
network, Bayes(ian) model or probabilistic directed acyclic
graphical model is a probabilistic graphical model (a type
of statistical model) that represents a set of random
variables and their conditional dependencies via a directed
acyclic graph (DAG).
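A minimal sketch of this idea in Python (a hypothetical two-variable network, not from the slides): each node stores its parents and a conditional probability table, and the joint distribution factorizes along the DAG.

# A two-node DAG: Rain -> WetGrass, with hypothetical probabilities.
bayes_net = {
    "Rain":     {"parents": [], "cpt": {(): 0.2}},            # P(Rain = true) = 0.2
    "WetGrass": {"parents": ["Rain"],
                 "cpt": {(True,): 0.9, (False,): 0.1}},       # P(WetGrass = true | Rain)
}

# P(Rain = true, WetGrass = true) = P(Rain) * P(WetGrass | Rain) = 0.2 * 0.9
p = bayes_net["Rain"]["cpt"][()] * bayes_net["WetGrass"]["cpt"][(True,)]
print(round(p, 2))   # 0.18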
A ≻ B : A is preferred to B
A ∼ B :The agent is indifferent between A and B
A ≿ B : The agent prefers A to B, or is indifferent between them.
Axioms of Utility Theory
Orderability: Given any two states, a rational agent prefers one of them, or else finds the two equally preferable.
(A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)
Transitivity: Given any three states, if an agent prefers A to B and prefers B to C, the agent must prefer A to C.
(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Continuity: If some state B is between A and C in preference, then there is a probability p for which the rational agent will be indifferent between state B and the lottery in which A comes with probability p and C with probability (1-p).
(A ≻ B ≻ C) ⇒ ∃p [p,A; 1-p,C] ∼ B
Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them.
(A ∼ B) ⇒ [p,A; 1-p,C] ∼ [p,B; 1-p,C]
Monotonicity: If an agent prefers A to B, then the agent must prefer the lottery in which A occurs with the higher probability.
(A ≻ B) ⇒ (p ≥ q ⇔ [p,A; 1-p,B] ≿ [q,A; 1-q,B])
Decomposability: Compound lotteries can be reduced to simpler lotteries using the laws of probability.
[p,A; 1-p,[q,B; 1-q,C]] ∼ [p,A; (1-p)q,B; (1-p)(1-q),C]
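A quick arithmetic check of decomposability in Python (hypothetical numbers p = 0.5 and q = 0.4, chosen for illustration):

# Flattening the compound lottery [p,A; 1-p,[q,B; 1-q,C]]:
p, q = 0.5, 0.4
print(p)                  # probability of A in the flattened lottery: 0.5
print((1 - p) * q)        # probability of B: 0.2
print((1 - p) * (1 - q))  # probability of C: 0.3
print(round(p + (1 - p) * q + (1 - p) * (1 - q), 10))  # the three cases sum to 1.0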
Maximum Expected Utility principle (MEU)
If an agent obeys the axioms of utility theory, then:
It follows from these axioms that there exists a real-valued function U that operates on states such that
U(A) > U(B) ⇔ A ≻ B
U(A) = U(B) ⇔ A ∼ B
The utility of a lottery is its expected utility, that is, the sum of the utilities of the outcomes weighted by their probabilities:
U([p,A; 1-p,B]) = p·U(A) + (1-p)·U(B)
Maximum Expected Utility principle (MEU): a rational agent makes decisions in the presence of uncertainty by maximizing its expected utility.
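A minimal Python sketch of the expected-utility formula (the utilities U(A) = 10 and U(B) = 2 are hypothetical values for illustration):

def expected_utility(lottery):
    # lottery: list of (probability, utility) pairs; U = sum of p_i * U(outcome_i).
    return sum(p * u for p, u in lottery)

# U([0.7, A; 0.3, B]) with U(A) = 10, U(B) = 2
print(round(expected_utility([(0.7, 10), (0.3, 2)]), 2))   # 7.6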
Utility Values
A numeric utility value only has to provide an ordering of states according to preference; otherwise the values can be subjective.
Captures preferences for rewards and resource consumption.
Captures risk attitude.
Expected Monetary Value (EMV)
EMV is the "strict objective utility" where utility is measured in dollar amounts.
For a gamble of $1000 on the toss of a coin, the EMV would be 50% × $1000 = $500.
Grayson (1960) proved Bernoulli (1738) right, showing that the utility of money for most people is proportional to the logarithm of the amount:
U(current_$ + $n gained) = -263.31 + 22.09 log(n + 150,000)
for the range -$150,000 < n < $800,000.
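A quick Python check of this curve (a sketch; the natural logarithm is assumed here, which makes the utility of gaining nothing come out near zero):

import math

def utility_of_money(n):
    # Grayson's fitted curve as quoted above; n is dollars gained.
    # Natural log assumed; valid for -150,000 < n < 800,000.
    return -263.31 + 22.09 * math.log(n + 150000)

print(0.5 * 1000)                        # EMV of the coin-toss gamble: 500.0
print(round(utility_of_money(0), 2))     # utility of gaining nothing: about 0
print(round(utility_of_money(1000), 2))  # utility of gaining $1000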