Knowledge Representation & Reasoning


Knowledge Representation & Reasoning
Knowledge Based Systems
Lecture:28
Early Belief
Early AI researchers believed that the best approach to problem solving was the development of general-purpose problem solvers.

In practice, however, a system becomes effective only when its solution methods incorporate domain-specific rules and facts.
Definition and Importance of Knowledge
Knowledge can be defined as the body of facts and principles accumulated by humankind.

The meaning of knowledge is closely related to the meaning of intelligence: intelligence requires the possession of, and access to, knowledge.

E.g.: In biological organisms, knowledge is stored as complex structures of interconnected neurons.

E.g.: In computers, knowledge is also stored as symbolic structures, but in the form of collections of magnetic spots and voltage states.
Knowledge Vs Data
Knowledge should not be confused with data. An example to differentiate the two:
A physician treating a patient uses both knowledge and data.
The data is the patient's record, including patient history, measurements of vital signs, drugs given, response to drugs, and so on.
The knowledge is what the physician learned in medical school and in years of internship, specialization and practice.

Thus we can say that knowledge requires the use of data and information.
Belief & Hypothesis
Knowledge consists of facts, prejudices, beliefs, hypotheses and, most importantly, heuristics.

Belief: a meaningful and coherent expression (which may be true or false).
Hypothesis: a justified belief that is not known to be true; that is, a belief backed up with some supporting evidence.
Knowledge: true justified belief.
Types of Knowledge
Procedural Knowledge: compiled knowledge related to the performance of some task. Ex: the steps used to solve an algebraic equation.
Declarative Knowledge: passive knowledge expressed as statements of facts about the world. Ex: personal data in a database.
Heuristic Knowledge: a special type of knowledge used by humans to solve complex problems. Heuristics are the knowledge used to make good judgments.
Knowledge Based Agents
Knowledge-based agents combine general knowledge with current percepts to infer hidden aspects of the current state prior to selecting actions.
The agent operates as follows:
1. It TELLs the knowledge base what it perceives.
2. It ASKs the knowledge base what action it should
perform.
3. It performs the chosen action.
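The TELL/ASK cycle above can be sketched in code. This is a minimal illustration, not a standard API: the `KnowledgeBase` class, its facts, and the obstacle rule are all invented for the example.

```python
# A minimal sketch of the TELL/ASK agent cycle described above.
class KnowledgeBase:
    def __init__(self):
        self.facts = set()

    def tell(self, fact):      # step 1: record what the agent perceives
        self.facts.add(fact)

    def ask(self):             # step 2: infer which action to perform
        if "obstacle_ahead" in self.facts:
            return "turn"
        return "move_forward"

kb = KnowledgeBase()
kb.tell("obstacle_ahead")      # TELL the KB the current percept
action = kb.ask()              # ASK the KB for an action
print(action)                  # step 3: the agent performs this action
```

The point is the separation: the agent loop only TELLs and ASKs; how the answer is inferred is the knowledge base's concern.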
Knowledge Based Systems
In the early days of AI, one of the important lessons learned was that general-purpose problem solvers, which used a limited number of laws or axioms, were too weak to solve problems of any complexity.

This realization eventually led to the design of what are now known as knowledge based systems. Ex: DENDRAL, MYCIN.
Components of Knowledge Based System
The knowledge is stored in a knowledge base separate from the control and inferencing components.
This makes it possible to add new knowledge or refine existing knowledge without recompiling the control and inferencing programs.
Properties of Knowledge Representation
Strategies
Representational Adequacy -- the ability to represent the required knowledge.
Inferential Adequacy -- the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original.
Inferential Efficiency -- the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides.
Acquisitional Efficiency -- the ability to acquire new knowledge using automatic methods wherever possible rather than relying on human intervention.
Thank You
Knowledge Representation & Reasoning
Approaches to Knowledge Representation
Lecture:29
Approaches to Knowledge Representation

There are four approaches to KR:
1. Simple relational knowledge
2. Inheritable knowledge
3. Inferential knowledge
4. Procedural knowledge
Simple relational knowledge
The simplest way of storing facts is to use a
relational method where each fact about a set of
objects is set out systematically in columns.
This representation gives little opportunity for
inference, but it can be used as the knowledge
basis for inference engines.
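The column-per-attribute layout described above can be sketched directly. The player data below is invented for illustration; note that the representation itself supports only direct lookup, not inference.

```python
# Simple relational knowledge: each fact about a set of objects is laid
# out systematically in columns, as in a database table.
players = [
    {"name": "Aman", "height": 6.0, "team": "Blue"},
    {"name": "Ravi", "height": 5.8, "team": "Red"},
    {"name": "Dev",  "height": 6.2, "team": "Blue"},
]

# Only direct retrieval is possible; any inference must come from an
# external inference engine that reads this table.
blue_team = [p["name"] for p in players if p["team"] == "Blue"]
print(blue_team)   # ['Aman', 'Dev']
```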
Inheritable knowledge
Inheritable knowledge builds on relational knowledge: objects consist of attributes with corresponding associated values. The knowledge base is extended by allowing inference mechanisms such as:
Property inheritance -- elements inherit values by being members of a class, so the data must be organized into a hierarchy of classes.
Slot-and-filler structures
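Property inheritance maps naturally onto class hierarchies. A small sketch (the animal classes and their attribute values are invented for illustration):

```python
# Property inheritance: members of a class inherit attribute values from
# higher levels of the class hierarchy.
class Animal:
    has_tail = True          # default value stored at the class level

class Dog(Animal):
    sound = "bark"

class Beagle(Dog):
    pass                     # stores nothing of its own

b = Beagle()
# Both values are found by walking up the hierarchy, not stored on b:
print(b.has_tail, b.sound)   # True bark
```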


Inferential Knowledge
Knowledge is represented in formal logic and can be used to derive more facts. Ex:
All dogs have tails
∀x: dog(x) ⇒ hasatail(x)

Advantages:
A set of strict rules.
Can be used to derive more facts.
Truths of new statements can be verified.
Guaranteed correctness.
Popular in AI systems. e.g Automated theorem proving
Procedural Knowledge
Here the knowledge is encoded in procedures: small programs that know how to do specific things and how to proceed.
Representation of Knowledge
Propositional Logic
FOPL (First Order Predicate Logic)
Frames & Associative Networks
Scripts
Case grammar theory
Production Rules
Inference System
Forward & backward chaining
Knowledge Representation & Reasoning
Propositional Logic
Lecture:30
What is Logic???

A logic consists of:

A formal system for expressing knowledge about a domain, consisting of:
 Syntax: the set of legal sentences (well-formed formulas)
 Semantics: the interpretation of legal sentences

A proof system: a set of axioms plus rules of inference for deducing sentences from a knowledge base.
Propositional Logic
Propositional logic deals with the determination of the truth of a sentence.
A sentence is built from propositional symbols, each of which can be either true or false.
The names of the symbols can be anything from letters like P, Q or R, to symbols like α, β, γ, to variable names like 'IsOld', and may hold meaning relative to their contexts.

Ex:
It is raining = P
New Delhi is the capital of India = Q
Proposition
A statement that is either true or false.
Examples of propositions:
Pitt is located in the Oakland section of Pittsburgh.
France is in Europe.
It rains outside.
2 is a prime number and 6 is a prime number.

"How are you?" is not a proposition.
Syntax
The syntax of propositional logic defines the allowable sentences. Sentences are of two types:

Atomic Sentences: An atomic sentence consists of a single propositional symbol. Each symbol stands for a proposition that is either TRUE or FALSE. Two propositions are fixed in meaning: TRUE is always true and FALSE is always false.

Complex Sentences: Complex sentences are constructed from simpler sentences using logical connectives. There are five connectives in common use:
¬ or ~ (not) : A sentence such as ¬P or ~P is called the negation of P. A literal is either an atomic sentence or a negated atomic sentence.
∧ (and) : A sentence such as P∧Q is called a conjunction.
∨ (or) : A sentence such as P∨Q is called a disjunction.
⇒ or → (implies) : A sentence such as (P∨Q) → R is called an implication or conditional. Its premise or antecedent is (P∨Q) and its consequent or conclusion is R. Implications are also called IF-THEN statements.
⇔ or ↔ (if and only if, or iff) : P ↔ Q is a biconditional sentence.
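The five connectives can be sketched as ordinary Boolean functions. This is an illustrative encoding (the function names are ours, not standard library calls):

```python
# The five connectives of propositional logic over Python truth values.
def NOT(p):        return not p
def AND(p, q):     return p and q
def OR(p, q):      return p or q
def IMPLIES(p, q): return (not p) or q      # P -> Q is equivalent to ~P v Q
def IFF(p, q):     return p == q            # true when both sides agree

# The complex sentence (P v Q) -> R from the text:
P, Q, R = True, False, True
print(IMPLIES(OR(P, Q), R))   # True
```

Note that IMPLIES is false only when the premise is true and the conclusion is false.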
Semantics
Semantics or meaning of a sentence is just the value
True or False. It is an assignment of the truth value to
sentence.
Properties of Statements/Sentence
Satisfiability: A sentence is satisfiable if there is some interpretation for which it is true.
Contradiction: A sentence is contradictory (unsatisfiable) if there is no interpretation, in any world, in which it is true.
Validity: A sentence is valid (necessarily true) if and only if it is true under all possible interpretations in all possible worlds.
Logical Equivalence: Two sentences are logically equivalent if they have the same truth value under every interpretation; e.g., P∧Q and Q∧P are logically equivalent.

Validity and satisfiability are connected: P is valid iff ~P is unsatisfiable; contrapositively, P is satisfiable iff ~P is not valid.
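These properties can be checked mechanically by enumerating every interpretation. A small sketch, with sentences represented as Boolean functions:

```python
from itertools import product

# Check satisfiability and validity by enumerating all interpretations
# of n propositional symbols.
def interpretations(n):
    return product([True, False], repeat=n)

def satisfiable(sentence, n):
    return any(sentence(*vals) for vals in interpretations(n))

def valid(sentence, n):
    return all(sentence(*vals) for vals in interpretations(n))

contradiction = lambda p: p and not p    # unsatisfiable
tautology     = lambda p: p or not p     # valid

print(satisfiable(contradiction, 1))     # False
print(valid(tautology, 1))               # True
# P is valid iff ~P is unsatisfiable:
print(valid(tautology, 1) == (not satisfiable(lambda p: not tautology(p), 1)))
```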
Laws in PL
Reasoning Patterns in Propositional logic
One method is to use truth tables as the standard semantic method. Alternatively, new sentences that are logical consequences of those given in the knowledge base can be inferred with the help of the following rules:

Modus Ponens: The best-known inference rule is Modus Ponens. From P and P→Q, Q is inferred. The rule is written as:
P
P→Q
Q
Ex:
Given : (Gaurav is a father)
And : (Gaurav is a father) → (Gaurav has a child)
Conclude : (Gaurav has a child)

Chain Rule: From P→Q and Q→R, P→R is inferred:
P→Q
Q→R
P→R
Ex:
Given : (Gaurav is a father) → (Gaurav has a child)
And : (Gaurav has a child) → (Gaurav is married)
Conclude : (Gaurav is a father) → (Gaurav is married)

And-Elimination: From a conjunction, any of the conjuncts can be inferred:
P∧Q
P
Ex: From (Gaurav is a father) ∧ (Gaurav has a child),
either "Gaurav is a father" or "Gaurav has a child" can be inferred.
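Modus Ponens is a purely syntactic step, which makes it trivial to mechanize. A sketch using the example above, with sentences as plain strings and a rule as a (premise, conclusion) pair:

```python
# Modus Ponens as a single inference step over string-encoded sentences.
def modus_ponens(fact, rule):
    premise, conclusion = rule
    # From fact P and rule P -> Q, infer Q; otherwise nothing follows.
    return conclusion if fact == premise else None

rule = ("Gaurav is a father", "Gaurav has a child")
print(modus_ponens("Gaurav is a father", rule))   # Gaurav has a child
```

Applying Modus Ponens repeatedly through a chain of such rules yields exactly the Chain Rule's conclusion.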
Thank You
Knowledge Representation & Reasoning
Propositional Logic(CNF,DNF)
Lecture:31
Normal Forms in Propositional Logic

CNF (Conjunctive Normal Form): A formula P is in CNF if it is of the form
P = P1 ∧ P2 ∧ P3 ……….∧ Pn ; n ≥ 1, where each Pi is a disjunction of literals.

DNF (Disjunctive Normal Form): A formula P is in DNF if it is of the form
P = P1 ∨ P2 ∨ P3 ..……….∨ Pn ; n ≥ 1, where each Pi is a conjunction of literals.
Conversion Procedure to Normal Form
Step I : Eliminate implications and biconditionals. We use the following laws:
(P→Q) = (¬P∨Q)
(P↔Q) = (P→Q) ∧ (Q→P)
      = (¬P ∨ Q) ∧ (¬Q ∨ P)
Step II : Reduce the NOT symbols using the formula (¬(¬P)) = P and apply De Morgan's laws to bring negation directly before the atoms.
¬ (P ∨ Q) = ¬ P ∧ ¬ Q
¬ (P ∧ Q) = ¬ P ∨ ¬ Q
Step III : Use the distributive laws and the other equivalences given in Table III to obtain the normal form
P ∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R)
P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R)

Example : Convert ((P→Q) → R) into CNF


((P→Q) → R)
= ¬ (P→Q) ∨ R
= ¬ (¬ P ∨ Q) ∨ R
= (P ∧ ¬ Q) ∨ R
= (P ∨ R) ∧ (¬ Q ∨ R)

Hence, (P ∨ R) ∧ (¬ Q ∨ R) is the CNF of ((P→Q) → R).
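The worked conversion can be double-checked by brute force: the original sentence and its CNF must agree in every one of the eight interpretations of P, Q, R. A small verification sketch:

```python
from itertools import product

# Verify by truth table that ((P -> Q) -> R) is equivalent to its CNF
# (P v R) ^ (~Q v R) under every interpretation.
def implies(a, b):
    return (not a) or b

original = lambda P, Q, R: implies(implies(P, Q), R)
cnf      = lambda P, Q, R: (P or R) and ((not Q) or R)

agree = all(original(P, Q, R) == cnf(P, Q, R)
            for P, Q, R in product([True, False], repeat=3))
print(agree)   # True
```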


Knowledge Representation & Reasoning
Propositional Logic(Resolution)
Lecture:32
Resolution
Resolution is a single inference rule. It is a simple iterative procedure where, at each step, two clauses (called the parent clauses) are resolved to yield a new clause that is inferred from them. Suppose there are two clauses in the system:
winter ∨ summer
¬winter ∨ cold
The resolution procedure deduces:
summer ∨ cold
Resolution contd..
Resolution proves statements by refutation.

In other words, to prove a statement (show that it is valid), resolution shows that the negation of the statement produces a contradiction with the known statements. If the produced clause is the empty clause, then a contradiction has been found. If a contradiction exists, then eventually it will be found.
Resolution in Propositional Logic
Algorithm of Propositional Resolution :
1. Convert all the propositions of F to clause form
2. Negate P and convert the result to clause form. Add it to the set of
clauses obtained in step 1.
3. Repeat until either a contradiction is found or no progress can be made:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resulting clause, called the resolvent, will be
the disjunction of all of the literals of both of the parent clauses with the
following exception: If there are any pairs of literals L and ¬L such that
one of the parent clauses contains L and the other contains ¬L, then select
one such pair and eliminate both L and ¬L from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If
it is not then add it to the set of clauses available to the procedure.
Suppose we are given the axioms shown in the first column of the following table and we want to prove R.

Sl. No   Given Axiom     Converted to Clause Form
1        P               P
2        (P∧Q) → R       ¬P ∨ ¬Q ∨ R
3        (S∨T) → Q       ¬S ∨ Q
4                        ¬T ∨ Q
5        T               T

Now the algorithm is applied. First the axioms are converted into clause form. Then we negate R, producing ¬R, which is already in clause form. Then we select pairs of clauses to resolve together.
For proposition 2 to be true, one of three things must be true: ¬P, ¬Q, or R. But we are assuming ¬R is true.

Thus these clauses cannot all be true in a single interpretation. This is indicated by the empty clause.
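The refutation above can be run mechanically. A sketch in which clauses are frozensets of string literals ("~" marks negation), seeded with the five clauses from the table plus the negated goal ¬R:

```python
# Propositional resolution by refutation for the worked example.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    # Resolve on the first complementary pair found, if any.
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None

clauses = {frozenset({"P"}),
           frozenset({"~P", "~Q", "R"}),
           frozenset({"~S", "Q"}),
           frozenset({"~T", "Q"}),
           frozenset({"T"}),
           frozenset({"~R"})}          # negation of the goal R

derived = True
while derived and frozenset() not in clauses:
    derived = False
    for c1 in list(clauses):
        for c2 in list(clauses):
            r = resolve(c1, c2)
            if r is not None and frozenset(r) not in clauses:
                clauses.add(frozenset(r))
                derived = True

print(frozenset() in clauses)   # True: empty clause found, so R holds
```

This is an exhaustive (and inefficient) search; real provers choose parent clauses with strategies such as set-of-support.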
Features of Propositional Logic- Merits
Propositional logic is declarative: the syntax corresponds to facts.
Unlike most data structures and databases, propositional logic allows partial/disjunctive/negated information.
It is compositional: the meaning of P1 ∧ P2 is derived from the meanings of P1 and P2.
Propositional logic is context-independent, unlike natural language, where meaning depends on the context.
Features of Propositional Logic-Limitations
Propositional logic is only one of many formal languages. It is possible that the structure of an argument is lost in converting it from English to propositional logic.
Ex: 1
Socrates is a man.
Plato is a man.
All men are mortal.
P: Socrates is a man.
Q: Plato is a man.
R: All men are mortal.
Here, we cannot infer from these sentences that Socrates and Plato are mortal, because propositional logic fails to capture the relationship between the individuals (Socrates, Plato) and the general statement about all men.
Limitations
Ex: 2
All men are mortal.
Aristotle is a man.
Therefore, Aristotle is mortal. 
A: All men are mortal.
B: Aristotle is a man.
C: Therefore, Aristotle is mortal.
 The argument becomes :
A, B
C
Although clearly a "logical" conclusion, this is a completely invalid argument in propositional logic, since A, B and C bear no relation to each other.
Limitations
Propositional logic does not permit us to make generalized statements about classes of similar objects.
Ex: "All men are mortal" can only be represented as:
man1 is mortal ∧ man2 is mortal ∧ ……….
Thank You
Knowledge Representation & Reasoning
Predicate Logic
Lecture:33
Predicate Logic / First Order Predicate Logic (FOPL)
Propositional logic assumes that the world contains facts.
First-order logic assumes that the world contains:
Objects: people, houses, numbers, theories, Donald Duck, colors, centuries, …
Relations: red, round, prime, multistoried, …, brother of, bigger than, part of, has color, occurred after, owns, …
Functions: +, middle of, father of, one more than, beginning of, …
Syntax of FOPL
Syntax of FOPL like PL is determined by the allowed symbols and
rules of combinations.

 Constants: fixed-value terms that belong to a given domain of discourse. Ex: KingJohn, 2, Koblenz, C, ...
 Predicates: predicate symbols refer to a particular relation among objects. Ex: Brother, >, =, ...
 Functions: functions allow us to refer to objects indirectly (via some relationship). Ex: Sqrt, LeftLegOf, ...
 Variables: terms that can assume different values over a given domain. Ex: x, y, a, b, ...
 Connectives: ∧, ∨, ¬, ⇒, ⇔
 Quantifiers: ∀ (universal quantifier), ∃ (existential quantifier)
Syntax of FOPL
Syntax of First-order Logic: Atomic Sentences
Examples: Brother(KingJohn, RichardTheLionheart)
>(Length(LeftLegOf(Richard)), Length(LeftLegOf(KingJohn)))

Syntax of First-order Logic: Complex Sentences:
Built from atomic sentences using connectives:
¬S, S1 ∧ S2, S1 ∨ S2, S1 ⇒ S2, S1 ⇔ S2 (as in propositional logic)

Example: Sibling(KingJohn, Richard) ⇒ Sibling(Richard, KingJohn)
Semantics of FOPL
Semantics are determined by the interpretation assigned to predicates rather than to propositions.
Models of first-order logic:
Sentences are true or false with respect to models, which
consist of a domain (also called universe) and an interpretation
Domain: A non-empty (finite or infinite) set of arbitrary
elements
Interpretation: Assigns to each-
– constant symbol: a domain element
– predicate symbol: a relation on the domain
– function symbol: a function on the domain
Semantics of FOPL contd..
Atomic sentences:
predicate(term1, ………, termn)
Constants, variables and functional expressions are referred to as terms. An atomic sentence is true in a given model (consisting of a domain and an interpretation) iff the domain elements that are the interpretations of term1, ……., termn are in the relation that is the interpretation of the predicate.
Ex: Brother(Lakshman, Ram) : Lakshman is the brother of Ram.

Complex Sentences:
The truth value of a complex sentence in a model is computed from the truth values of its atomic sub-sentences in the same way as in propositional logic.
Ex: Brother(Lakshman, Ram) ∧ Brother(Ram, Lakshman) : Lakshman is the brother of Ram and Ram is the brother of Lakshman.
Semantics of FOPL(Quantifiers)
Universal quantifiers:Universal Quantification allows us to make a
statement about a collection of objects:
∀(x) : Cat(x) ⇒ Mammal(x) : All cats are mammals 
∀(x) : Father(Bill,x) ⇒ Mother(Hillary,x) : All of Bill’s kids are also
Hillary’s kids.

Existential Quantifiers :Existential Quantification allows us to state


that an object does exist (without naming it):
∃(x) : Cat(x) ∧ Mean(x) : There is a mean cat.
∃(x) : Father(Bill,x) ∧ Mother(Hillary,x) : There is a kid whose father is Bill and whose mother is Hillary.
Nested Quantifiers :
∀x,y Parent(x,y) ⇒ Child(y,x) : if x is a parent of y, then y is a child of x.
∀x∃y Loves(x,y) : everybody loves somebody.
Connection between ∀ and ∃
Both quantifiers are intimately connected to each
other, through negation.
Ex: Everyone dislikes garlic : ∀x ¬Likes(x, Garlic)
There does not exist someone who likes garlic : ¬∃x Likes(x, Garlic)
Both sentences have the same meaning, therefore
∀x ¬Likes(x, Garlic) ≡ ¬∃x Likes(x, Garlic)
Example
Representing facts with Predicate
Logic

1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. Men only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
Predicate Logic Knowledgebase

1. Man(Marcus)
2. Pompeian(Marcus)
3. ∀x Pompeian(x) ⇒ Roman(x)
4. Ruler(Caesar)
5. ∀x Roman(x) ⇒ Loyalto(x,Caesar) ∨ Hate(x,Caesar)
6. ∀x ∃y Loyalto(x,y)
7. ∀x ∀y Man(x) ∧ Ruler(y) ∧ Tryassassinate(x,y) ⇒ ¬Loyalto(x,y)
8. Tryassassinate(Marcus,Caesar)
Thank You
Knowledge Representation & Reasoning
Predicate Logic(Clause Form Conversion)
Lecture:34
Conversion to Clause Form
Eliminate bi-conditionals and implications:
Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α).
Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β.
Move ¬ inwards:
¬(∀ x p) ≡ ∃ x ¬p,
¬(∃ x p) ≡ ∀ x ¬p,
¬(α ∨ β) ≡ ¬α ∧ ¬β,
¬(α ∧ β) ≡ ¬α ∨ ¬β,
¬¬α ≡ α.
Standardize variables apart by renaming them: each quantifier
should use a different variable.
Conversion to Clause Form
Skolemize: each existential variable is replaced by a Skolem
constant or Skolem function of the enclosing universally
quantified variables.
For instance, ∃x Rich(x) becomes Rich(G1) where G1 is a new
Skolem constant.
“Everyone has a heart”: ∀ x Person(x) ⇒ ∃y Heart(y) ∧ Has(x, y)
becomes ∀ x Person(x) ⇒ Heart(H(x)) ∧ Has(x, H(x)), where H is a
new symbol (Skolem function).
Drop universal quantifiers.
For instance, ∀ x Person(x) becomes Person(x).
Distribute ∧ over ∨:
(α ∧ β) ∨ γ ≡ (α ∨ γ) ∧ (β ∨ γ).
Conversion to Clause Form-Example
1. Man(marcus)
2. Pompeian(marcus)
3. ¬Pompeian(x) ∨ Roman(x)
4. Ruler(caesar)
5. ¬Roman(x) ∨ Loyal(x,caesar) ∨ Hate(x,caesar)
6. Loyal(x,f(x))
7. ¬Man(x) ∨ ¬Ruler(y) ∨ ¬Tryassassinate(x,y) ∨ ¬Loyal(x,y)
8. Tryassassinate(marcus,caesar)
Skolemization
Skolemization eliminates existential quantifiers by replacing each existentially quantified variable with a Skolem constant or Skolem function. It is done as follows:
If the leftmost quantifier in an expression is an existential quantifier, replace all occurrences of the variable it quantifies with an arbitrary constant not appearing elsewhere in the expression and delete the quantifier. The same procedure should be followed for all other existential quantifiers not preceded by a universal quantifier.
Ex:
∃x Rich(x) becomes Rich(G1), where G1 is a new Skolem constant.
For each existential quantifier that is preceded by one or more universal quantifiers, replace all occurrences of the existentially quantified variable by a function symbol not appearing elsewhere in the expression. The arguments assigned to the function must match all the variables appearing in each universal quantifier which precedes the existential quantifier.
Ex:
∃u ∀v ∀x ∃y : P(f(u), v, x, y) ⇒ Q(v, u, y)
becomes
∀v ∀x : P(f(a), v, x, g(v, x)) ⇒ Q(v, a, g(v, x))
u ⇒ a, because ∃u is not preceded by any universal quantifier
y ⇒ g(v, x), because ∃y is preceded by ∀v ∀x
Inference Rules for Quantifiers:
Universal Instantiation: The rule of Universal Instantiation says that we can infer any sentence obtained by substituting a ground term for the variable. To write the inference rule formally, we use the notation of substitution, SUBST(θ, P), to denote the result of applying substitution θ to the sentence P, as shown below:
∀v P
SUBST({v/g}, P)

for any variable v and ground term (a term without variables) g.

Ex: A knowledge base contains the axiom "All greedy kings are evil":
∀x: King(x) ∧ Greedy(x) ⇒ Evil(x)
By applying the rule of Universal Instantiation as
∀x P
SUBST({x/Duryodhan}, P)
the statement becomes:
King(Duryodhan) ∧ Greedy(Duryodhan) ⇒ Evil(Duryodhan)
Inference Rules for Quantifiers:
Existential Instantiation: Existential Instantiation is slightly more complicated than UI. For any sentence P, variable v, and constant symbol k that does not appear anywhere else in the knowledge base:
∃v P
SUBST({v/k}, P)
Ex: From the sentence ∃x Crown(x) ∧ OnHead(x, Ram)
we can infer the sentence
Crown(u) ∧ OnHead(u, Ram)
as long as u does not appear elsewhere in the knowledge base.
Thank You
Knowledge Representation & Reasoning
Predicate Logic(Resolution)
Lecture:35
Reduction to Propositional Logic
It is possible to reduce first order inference to propositional inference. Suppose our
knowledgebase contains:
∀x: king(x) ∧ Greedy(x) ⇒ Evil(x)
King(Ravana)
Greedy(Ravana)
Brother(Kumbi, Ravana)

Then we apply UI (Universal Instantiation) to the first sentence using all possible ground-term substitutions from the vocabulary of the knowledge base, i.e.:
{x/Ravana}
{x/Kumbi}
King(Ravana) ∧ Greedy(Ravana) ⇒ Evil(Ravana)
King(Kumbi) ∧ Greedy(Kumbi) ⇒ Evil(Kumbi)

If we view King(Kumbi), Greedy(Kumbi), etc. as propositional symbols, we can apply any of the complete propositional algorithms to obtain conclusions, such as Evil(Ravana).
Resolution Principle
Robinson in 1965 introduced the resolution principle, which can be directly applied to any set of clauses.

The principle is: "Given any two clauses A and B, if there is a literal P1 in A which has a complementary literal P2 in B, delete P1 and P2 from A and B and construct a disjunction of the remaining clauses. The clause so constructed is called the resolvent of A and B."
Resolution Algorithm
1. Convert all the statements of F to clause form
2. Negate P and convert the result to clause form. Add it to the set of clauses
obtained in step 1.
3. Repeat until either a contradiction is found or no progress can be made or a
predetermined amount of effort has been expended:
(a) Select two clauses. Call these the parent clauses.
(b) Resolve them together. The resulting clause, called the resolvent, will be
the disjunction of all of the literals of both the parent clauses with appropriate
substitutions performed and with the following exception:
If there is one pair of literals T1 and T2 such that one of the parent clauses contains
T1 and the other contains T2 and if T1 and T2 are unifiable, then neither T1 nor T2
should appear in the resolvent. We call T1 and T2 complementary literals. Use the
substitution produced by the unification to create the resolvent. If there is one pair of
complementary literals, only one such pair should be omitted from the resolvent.

(c). If the resolvent is the empty clause, then a contradiction has been found. If it
is not then add it to the set of clauses available to the procedure.
Example
1. Marcus was a man
2. Marcus was a Pompeian
3. All Pompeians were Romans
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. Men only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar

9. Find out :- Does Marcus hate Caesar ?


Axioms in clause form are:
1. Man(marcus)
2. Pompeian(marcus)
3. ¬Pompeian(x1) ∨ Roman(x1)
   (implication removed; the universal quantifier is replaced by the variable x1)
4. Ruler(caesar)
5. ¬Roman(x2) ∨ Loyal(x2,caesar) ∨ Hate(x2,caesar)
   (implication removed; the universal quantifier is replaced by the variable x2)
6. Loyal(x3,f(x3))
   (the universal quantifier is replaced by the variable x3; the existential quantifier becomes the Skolem function f(x3), since it occurred inside the universal quantifier)
7. ¬Man(x4) ∨ ¬Ruler(y1) ∨ ¬Tryassassinate(x4,y1) ∨ ¬Loyal(x4,y1)
   (the universal quantifiers for x and y are replaced by x4 and y1 respectively)
8. Tryassassinate(marcus,caesar)

By applying the resolution theorem:
take the negation of the goal: ¬Hate(marcus,caesar)
Unification Algorithm
In propositional logic it is easy to determine that two literals cannot both be true at the same time: simply look for L and ¬L.
In predicate logic, this matching process is more complicated, since the bindings of variables must be considered.
Ex: man(john) and ¬man(john) is a contradiction, while man(john) and ¬man(Himalayas) is not.
Thus, in order to determine contradictions, we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical. There is a recursive procedure that does this matching, called the unification algorithm.
The matching rules are :
 
i) Different constants , functions or predicates cannot match, whereas
identical ones can.
 
ii) A variable can match another variable, any constant, or a function or predicate expression, subject to the condition that the function or predicate expression must not contain any instance of the variable being matched (otherwise it would lead to infinite recursion).

iii) The substitution must be consistent. Substituting y for x now and then z
for x later is inconsistent. (a substitution y for x written as y/x)
 

Unify Q(x) and P(x) ------ FAIL, as the predicates are different and cannot be unified.
Unify Q(x) and Q(x) ------ NIL, as the literals are identical, so no substitution is needed.
Unify P(x) and P(x,y) ------ FAIL, as the literals have different numbers of arguments.
Algorithm: UNIFY(L1, L2)
1. If L1 or L2 is a variable or constant, then:
(a) if L1 and L2 are identical, then return NIL.
(b) else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else return (L2/L1).
(c) else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else return (L1/L2).
(d) else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If L1 and L2 have different numbers of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions used to unify L1 and L2.)
5. For i = 1 to the number of arguments in L1:
a) Call UNIFY with the i-th argument of L1 and the i-th argument of L2, putting the result in S.
b) If S contains FAIL, then return {FAIL}.
c) If S is not equal to NIL, then:
i) apply S to the remainder of both L1 and L2,
ii) SUBST := APPEND(S, SUBST).
6. Return SUBST.

Ex: UNIFY (Knows(Ravana, x), Knows(Ravana, Sita)) = (x/Sita)
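A compact sketch of this procedure in code. Terms are strings, with single lowercase letters treated as variables and tuples as compound terms; the occurs check from rule (ii) is omitted here for brevity, as noted in the comments.

```python
# A simplified unification sketch. Variables are single lowercase letters;
# compound terms are tuples: (predicate, arg1, arg2, ...).
def is_variable(t):
    return isinstance(t, str) and t.islower() and len(t) == 1

def unify(t1, t2, subst=None):
    if subst is None:
        subst = {}
    if subst is False:
        return False                      # an earlier step already failed
    t1 = subst.get(t1, t1)                # dereference bound variables
    t2 = subst.get(t2, t2)
    if t1 == t2:
        return subst                      # identical: nothing to substitute
    if is_variable(t1):
        subst[t1] = t2                    # bind variable (no occurs check)
        return subst
    if is_variable(t2):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):          # unify argument by argument
            subst = unify(a, b, subst)
            if subst is False:
                return False
        return subst
    return False    # distinct constants/predicates or arity mismatch: FAIL

# UNIFY(Knows(Ravana, x), Knows(Ravana, Sita)) = {x/Sita}
print(unify(("Knows", "Ravana", "x"), ("Knows", "Ravana", "Sita")))
# Different predicate symbols cannot match:
print(unify(("Q", "x"), ("P", "x")))      # False
```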


Thank You
Knowledge Representation & Reasoning
Forward and Backward Chaining
Lecture:36
Forward & Backward Chaining
Forward chaining:
--works from the facts to a conclusion.
-- Sometimes called the data driven approach.
--To chain forward, match data in working memory
against 'conditions' of rules in the rule-base.
--When one of them fires, this is liable to produce
more data.
--So the cycle continues.
--A forward-chaining system tends to produce a
sequence which seems random & unconnected.
Forward chaining is the best choice if:
--All the facts are provided with the problem statement; or
--There are many possible goals, and a smaller number of patterns of data; or
--There isn't any sensible way to guess what the goal is at the beginning of the consultation.
Forward chaining Illustration
--Goal state: Z. Termination condition: stop if Z is derived or no further rule can be applied.
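The cycle can be sketched in code. The rule set below is invented for illustration; the loop matches working memory against rule conditions, fires a rule to produce more data, and repeats until Z is derived or no rule applies.

```python
# Forward (data-driven) chaining toward goal Z.
rules = [({"A", "B"}, "C"),     # if A and B then C
         ({"C"}, "D"),          # if C then D
         ({"D", "B"}, "Z")]     # if D and B then Z

working_memory = {"A", "B"}     # the facts provided with the problem
goal = "Z"

fired = True
while goal not in working_memory and fired:
    fired = False
    for conditions, conclusion in rules:
        # match the rule's conditions against the data in working memory
        if conditions <= working_memory and conclusion not in working_memory:
            working_memory.add(conclusion)
            fired = True       # firing produced more data; cycle again

print(goal in working_memory)   # True
```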
Backward Chaining
--working from the conclusion to the facts.
--Sometimes called the goal-driven approach.
--To chain backward, match a goal in working memory
against 'conclusions' of rules in the rule- base.
--When one of them fires, this is liable to produce
more goals.
-- So the cycle continues.
--A backward-chaining system tends to produce a sequence of questions which seems focused and logical to the user.
Backward chaining is the best choice if:
--The goal is given in the problem statement, or can sensibly be guessed at the beginning of the consultation; or
--The system has been built so that it sometimes asks for pieces of data (e.g. "please now do the gram test on the patient's blood, and tell me the result"), rather than expecting all the facts to be presented to it.
Backward chaining Illustration
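Backward chaining over the same style of rule base can be sketched as a recursive proof of the goal: match the goal against rule conclusions, then try to establish each condition in turn. The rules and facts below are invented for illustration.

```python
# Backward (goal-driven) chaining: start from the goal Z.
rules = [({"A", "B"}, "C"),
         ({"C"}, "D"),
         ({"D", "B"}, "Z")]
facts = {"A", "B"}

def prove(goal):
    if goal in facts:
        return True
    # match the goal against rule conclusions, then chain on the conditions,
    # each of which becomes a new (sub)goal
    for conditions, conclusion in rules:
        if conclusion == goal and all(prove(c) for c in conditions):
            return True
    return False

print(prove("Z"))   # True
```

Note this simple sketch assumes an acyclic rule base; cyclic rules would need a visited set to avoid infinite recursion.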
Forward and Backward Chaining

Backward chaining: if you already know what you are looking for.

Forward chaining: if you don't necessarily know the final state of your solution.

Forward Vs Backward Chaining
Thank You
Knowledge Representation & Reasoning
Probabilistic Reasoning
Lecture:37
Probability
Probability of an Event: Consider an experiment that may have different outcomes. We are interested in the probability of a particular set of outcomes.
Let the sample space S be the set of all possible outcomes, and let an event A be any subset of S.

Definition: probability(A) = (number of outcomes in A) / (total number of outcomes)
P(A) = |A| / |S|
i.e. the probability of A is equal to the number of outcomes of interest divided by the number of all possible outcomes.
P(A) is called the prior (unconditional) probability of A. P(~A) is the probability that event A does not take place.
Example 1: The probability of picking a spade out of a deck of 52 cards is 13/52 = 1/4. The probability of picking an Ace out of a deck of 52 cards is 4/52 = 1/13.
Probability Axioms
1. 0 ≤ P(A) ≤ 1
2. P(A) = 1 – P( ~A)
3. P(A v B) = P(A) + P(B) – P(A ∧ B)
P(A v B) means the probability that either A or B or both are true; P(A ∧ B) means the probability that both A and B are true.

Example 2: P(~A) – The probability to pick a card that is not a spade out of
a deck of 52 cards is 1 – 1/4 = 3/4
Example 3: P(A v B) – The probability to pick a card that is either a spade
or an Ace is 1/4 + 1/13 - 1/4 *1/13 = 16/52 = 4/13
Another way to obtain the same result: There are 13 spade cards and 3
additional Ace cards in the set of desired outcomes. The total number of
cards is 52, thus the probability is 16/52.
Example 4: P(A ∧ B) – The probability to pick the spade Ace is 1/52.
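The card-deck calculations in Examples 2-4 can be reproduced exactly with rational arithmetic:

```python
from fractions import Fraction

# The card-deck examples, computed with exact fractions.
spade = Fraction(13, 52)                   # P(A): pick a spade = 1/4
ace   = Fraction(4, 52)                    # P(B): pick an ace = 1/13

not_spade    = 1 - spade                   # axiom 2: P(~A) = 1 - P(A)
spade_or_ace = spade + ace - spade * ace   # axiom 3 (A and B independent here)
spade_ace    = Fraction(1, 52)             # P(A ^ B): the ace of spades

print(not_spade)                  # 3/4
print(spade_or_ace)               # 4/13
print(spade * ace == spade_ace)   # True: 1/4 * 1/13 = 1/52
```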
Random Variables and Probability
Distributions
To handle the outcomes more conveniently, we can treat them as values of so-called random variables. For example, "spade" is one possible value of the variable Suit; "clubs" is another possible value.
The set of the probabilities of each value is called the probability distribution of the random variable.
Let X be a random variable with domain <x1, x2, x3, ……… xn>.
The probability distribution of X is denoted by
P(X) = <P(X = x1), P(X = x2), …………. P(X = xn)>
Note that P(X = x1) + P(X = x2) + … + P(X = xn) = 1.
Example 5: Let Weather be a random variable with values
sunny, cloudy, rainy, and snowy. Assume that records for
some town show that in a year 100 days are rainy, 50 days
are snowy, 120 days are cloudy (but without snow or rain),
and 95 days are sunny.
i.e. P(Weather = sunny) = 95/365 = 0.26
P(Weather = cloudy) = 120/365 = 0.33
P(Weather = rainy) = 100/365 = 0.27
P(Weather = snowy) = 50/365 = 0.14
Thus P(Weather) = <0.26, 0.33, 0.27, 0.14> is the probability
distribution of the random variable Weather.
Joint Distributions
Lets take an Example to understand the concept:
Consider a sample S of 1000 individuals aged 25–30. Assume
that 600 individuals come from high-income families, 570 of those
with high income have college education, and 100 individuals with
low income have college education.
The following table illustrates the example:

                    C (College ed.)   ~C (Not college ed.)   Total
H  (High income)         570                  30               600
~H (Low income)          100                 300               400
Total                    670                 330              1000
Example contd..
Let H be the subset of S of individuals coming from high-income
families, |H| = 600.
Let C be the subset of S of individuals that have college education,
|C| = 670.

The prior probabilities of H, ~H, C and ~C are:

P(H) = 600 / 1000 = 0.6 (60%)
P(~H) = 400 / 1000 = 0.4 (40%)
P(C) = 670 / 1000 = 0.67 (67%)
P(~C) = 330 / 1000 = 0.33 (33%)

We can also compute P(H & C), P(H & ~C), P(~H & C), P(~H & ~C).
Example contd..
P(H&C) = |H &C| / |S| = 570/1000 = 0.57 (57%) - the
probability of a randomly selected individual in S to be of
high-income family and to have college education.
P(H & ~C) = |H & ~C| / |S| = 30/1000 = 0.03 (3%) - the
probability of a randomly selected individual in S to be of
high-income family and not to have college education.
P(~H & C) = |~H& C| / |S| = 100/1000 = 0.1 (10%) - the
probability of a randomly selected individual in S to be of
low-income family and to have college education.
P(~H & ~C) = |~H& ~C| / |S| = 300/1000 = 0.3(30%) - the
probability of a randomly selected individual in S to be of
low-income family and not to have college education.
Example contd..
Thus we come to the following table:

                    C (College ed.)   ~C (Not college ed.)   Total
H  (High income)        0.57               0.03               0.6
~H (Low income)         0.10               0.30               0.4
Total                   0.67               0.33               1.0

Here we will treat C and H as random variables with values
"yes" and "no". The values in the table represent the joint
distribution of C and H.
For example P(C = yes, H = yes) = 0.57
Joint Distribution Definition
Definition: Let X1, X2, ..., Xn be a set of random variables, each
with a range of specific values. P(X1, X2, ..., Xn) is called the joint
distribution of the variables X1, X2, ..., Xn, and it is defined by an
n-dimensional table, where each cell corresponds to one particular
assignment of values to the variables X1, X2, ..., Xn.
Each cell in the table corresponds to an atomic event – described
by a particular assignment of values to the variables.
Given a joint distribution table we can compute prior
probabilities:
P(H) = P(H & C) + P(H & ~C) = 0.57 + 0.03 = 0.6
Given a joint distribution table we can also compute conditional
probabilities.
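Summing out (marginalizing) a joint distribution table is a one-liner. A small sketch in Python (the dictionary encoding is ours):

```python
# Joint distribution P(H, C) from the income/education example,
# keyed by (H, C) assignments with values "yes"/"no".
joint = {
    ("yes", "yes"): 0.57, ("yes", "no"): 0.03,
    ("no", "yes"): 0.10, ("no", "no"): 0.30,
}

def marginal(var_index, value):
    """Prior probability of one variable, obtained by summing out the other."""
    return sum(p for assignment, p in joint.items()
               if assignment[var_index] == value)

p_H = marginal(0, "yes")   # P(H) = 0.57 + 0.03 = 0.6
p_C = marginal(1, "yes")   # P(C) = 0.57 + 0.10 = 0.67
```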
Conditional Probabilities
We may ask: what is the probability of an individual in
S to have a college education given that he/she comes
from a high income family?
In this case we consider only those individuals that
come from high-income families. Their number is 600.
The number of individuals with college education
within the high-income group is 570. Thus
the probability of having college education given a high-
income family is 570/600 = 0.95.
This type of probability is called conditional
probability
The probability of event B given event A is denoted as P(B|A), read "P of B
given A".

In our example, P(C|H) = |C & H| / |H|.

We can express P(C|H) in terms of P(C & H) and P(H):

P(C|H) = |C & H| / |H| = (|C & H| / |S|) / (|H| / |S|) = P(C & H) / P(H)

Therefore P(C|H) = P(C & H) / P(H).
Conditional Probabilities:
Definition
The conditional probability of an event B to occur given that event A has
occurred is:

P(B|A) = P(B & A) / P(A)

P(B|A) is also known as the posterior probability of B.
P(B & A) is an element of the joint distribution of the random variables A
and B.
In our example, P(C & H) = P(C = yes, H = yes). Thus given the joint
distribution P(H, C), we can compute the prior probabilities P(H), P(~H),
P(C), P(~C) and then the conditional probabilities P(C|H), P(C|~H),
P(H|C), P(H|~C).

P(C|H) = P(C & H) / P(H) = 0.57 / 0.6 = 0.95
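The same joint table gives the conditional probability directly. A minimal sketch (dictionary encoding is ours):

```python
# Joint distribution P(H, C) from the income/education example.
joint = {
    ("yes", "yes"): 0.57, ("yes", "no"): 0.03,
    ("no", "yes"): 0.10, ("no", "no"): 0.30,
}

# P(H) by summing out C, then P(C|H) = P(C & H) / P(H).
p_H = joint[("yes", "yes")] + joint[("yes", "no")]   # 0.6
p_C_given_H = joint[("yes", "yes")] / p_H            # 0.57 / 0.6 = 0.95
```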
Independent Events
Some events are not related; for example, each
outcome in a sequence of coin flips is independent of
the previous outcome.

Definition: Two events A and B are independent if
P(A|B) = P(A) and P(B|A) = P(B).

Theorem: A and B are independent if and only if
P(A & B) = P(A)*P(B).
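The theorem gives a direct numerical test for independence. A small sketch (the function name and tolerance are ours):

```python
def independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """A and B are independent iff P(A & B) = P(A) * P(B)."""
    return abs(p_a_and_b - p_a * p_b) < tol

# Two fair coin flips: the second flip is independent of the first.
assert independent(0.5, 0.5, 0.25)

# The income/education example: H and C are NOT independent,
# since P(H & C) = 0.57 but P(H) * P(C) = 0.6 * 0.67 = 0.402.
assert not independent(0.6, 0.67, 0.57)
```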
Thank You
Knowledge Representation &
Reasoning
Bayes Theorem
Lecture:38
Bayes' Theorem
From the definition of conditional probability we have:

P(A & B) = P(A|B) * P(B)
P(B & A) = P(B|A) * P(A)

However, P(A & B) = P(B & A).
Therefore P(B|A) * P(A) = P(A|B) * P(B), and so

P(B|A) = P(A|B) * P(B) / P(A)

This is the Bayes formula for conditional probabilities, known also as Bayes'
Theorem.
Note that the conditional probability relationship, '|', is not commutative.

Bayes' theorem for evidence (effect) E, given hypotheses H1, H2, ..., Hn, is:

P(H1|E) = P(H1) P(E|H1) / [ P(H1) P(E|H1) + ... + P(Hn) P(E|Hn) ]

Therefore the generalized form of Bayes' theorem is:

P(Hi|E) = P(E|Hi) P(Hi) / Σj P(E|Hj) P(Hj)
Example
Assume the probability of getting the flu is 0.2.
Assume the probability of getting a fever is 0.3.
Assume the probability of having a fever given the flu is 0.9.
What is the probability of having the flu given the fever?

P(flu | fever) = P(fever | flu) * P(flu) / P(fever)
               = 0.9 x 0.2 / 0.3
               = 0.18 / 0.3 = 0.6
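The flu/fever calculation can be sketched as a direct application of the formula (function name is ours):

```python
def bayes(p_e_given_h, p_h, p_e):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# P(flu | fever) = 0.9 * 0.2 / 0.3 = 0.6
p_flu_given_fever = bayes(p_e_given_h=0.9, p_h=0.2, p_e=0.3)
```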
Bayesian Belief Networks
A Bayesian network, Bayes network, belief
network, Bayes(ian) model or probabilistic directed acyclic
graphical model is a probabilistic graphical model (a type
of statistical model) that represents a set of random
variables and their conditional dependencies via a directed
acyclic graph (DAG). 
Each node is associated with a probability function that
takes, as input, a particular set of values for the node's parent
variables, and gives (as output) the probability (or
probability distribution, if applicable) of the variable
represented by the node.
Bayesian Belief Networks
For example, if m parent nodes represent m Boolean
variables, then the probability function could be
represented by a table of 2^m entries,
one entry for each of the 2^m possible combinations of
its parents being true or false.
If A and B are independent:
P(A, B) = P(A) P(B)
If A and B are conditionally independent given C:
P(A, B | C) = P(A | C) P(B | C)
P(A | C, B) = P(A | C)
Alarm System example:
Question:
Assume your house has an alarm system against
burglary.
You live in a seismically active area, and the alarm
system can occasionally be set off by an earthquake.
You have two neighbors, Mary and John, who do not
know each other. If they hear the alarm they call you,
but this is not guaranteed.
Solution:
We want to represent the probability distribution of
events: – Burglary, Earthquake, Alarm, Mary calls and
John calls.
Example
E.g: What is the probability that the alarm goes off and
both John and Mary call, but there is neither a burglary
nor an earthquake?
P(J & M & A & ¬B & ¬E)
= P(J|A)P(M|A)P(A|¬B,¬E)P(¬B)P(¬E)
= 0.9 x 0.7 x 0.001 x 0.999 x 0.998
= 0.00062
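The product above can be sketched in Python. The priors P(B) = 0.001 and P(E) = 0.002 follow from the factors P(¬B) = 0.999 and P(¬E) = 0.998 that appear in the slide's own calculation:

```python
# Conditional probabilities used in the alarm calculation.
p_not_B, p_not_E = 0.999, 0.998      # P(~B), P(~E)
p_A_given_not_B_not_E = 0.001        # P(A | ~B, ~E)
p_J_given_A, p_M_given_A = 0.9, 0.7  # P(J|A), P(M|A)

# Chain-rule factorization over the network:
# P(J & M & A & ~B & ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
p = p_J_given_A * p_M_given_A * p_A_given_not_B_not_E * p_not_B * p_not_E
# p is approximately 0.00062
```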
Each node is conditionally independent of its non-
descendants given its parents:
P(X | U1, ..., Um, Z1, ..., Zn) = P(X | U1, ..., Um)
where U1, ..., Um are the parents of X and Z1, ..., Zn are
non-descendants of X.
The conditional probability distributions define the
joint probability distribution of the variables of the
network
Thank You
Knowledge Representation &
Reasoning
Utility Theory
Lecture:39
Utility Theory: Example
Let's take the example of a lottery:

Until now, the optimal action choice was the option
that maximized the expected monetary value
(selection based on expected values).
But is the expected monetary value always the quantity
we want to optimize?
Answer: Yes, but only if we are risk-neutral.
But what if we do not like the risk (we are risk-averse)?
In that case we may want to get a premium for
undertaking the risk (of losing the money).
Example: we may prefer to get $101 for sure over
$102 in expectation but with the risk of losing the
money.
Problem: How do we model decisions and account for the
risk?
Utility function (denoted U)
U quantifies how we "value" outcomes, i.e., it
reflects our preferences. It can also be applied to
"value" outcomes other than money and gains (e.g.
the utility of a patient being healthy or ill).
Decision making uses expected utilities (denoted EU).

Under some conditions on preferences, we can always
design a utility function that fits our preferences.
Utility theory Definition
Defines axioms on preferences that involve uncertainty and ways to
manipulate them.
Uncertainty is modeled through lotteries.
Lottery:
[p : A; (1-p) : C]
Outcome A with probability p
Outcome C with probability (1-p)
Notation:
A ≻ B : A is preferred to B
A ∼ B : the agent is indifferent between A and B
A ≿ B : the agent prefers A to B, or is indifferent between them
Axioms of Utility Theory
Orderability: Given any two states, a rational agent either prefers one of them
or rates the two as equally preferable.
(A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)
Transitivity: Given any three states, if an agent prefers A to B and prefers B to C,
then the agent must prefer A to C.
(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Continuity : If some state B is between A and C in preference, then there is a p for which
the rational agent will be indifferent between state B and the lottery in which A comes with
probability p, C with probability (1-p).
(A ≻ B ≻ C) ⇒ ∃p [p,A; 1-p,C] ∼ B
Substitutability : If an agent is indifferent between two lotteries, A and B, then there is a
more complex lottery in which A can be substituted with B.
(A ∼ B) ⇒ [p,A; 1-p,C] ∼ [p,B; 1-p,C]
Monotonicity: If an agent prefers A to B, then the agent must prefer the lottery in which A
occurs with a higher probability.
(A ≻ B) ⇒ (p ≥ q ⇔ [p,A; 1-p,B] ≿ [q,A; 1-q,B])
Decomposability: Compound lotteries can be reduced to simpler lotteries using the laws
of probability.
[p,A; 1-p,[q,B; 1-q,C]] ∼ [p,A; (1-p)q,B; (1-p)(1-q),C]
Maximum Expected Utility principle
(MEU)
If the agent obeys the axioms of utility theory, then
it follows from these axioms that there exists a real-valued
function U that operates on states such that:

U(A) > U(B) ⇔ A ≻ B
U(A) = U(B) ⇔ A ∼ B

The utility of a lottery is its expected utility, that is, the sum
of the utilities of its outcomes weighted by their probabilities:
U[p : A; (1-p) : B] = p U(A) + (1-p) U(B)
Maximum Expected Utility principle (MEU): A rational agent
makes decisions in the presence of uncertainty by
maximizing its expected utility.
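Expected utility and the MEU principle can be sketched in a few lines. The concave (risk-averse) log utility below is an illustrative assumption, not from the slides:

```python
import math

def expected_utility(lottery, U):
    """EU of a lottery [(p1, A1), ..., (pn, An)] = sum of p_i * U(A_i)."""
    return sum(p * U(outcome) for p, outcome in lottery)

# MEU principle: choose the option with the highest expected utility.
def best_option(options, U):
    return max(options, key=lambda lot: expected_utility(lot, U))

# Illustrative concave (risk-averse) utility function -- an assumption.
U = lambda x: math.log(1 + x)

sure_thing = [(1.0, 101)]        # $101 for sure
gamble = [(0.5, 0), (0.5, 204)]  # EMV = $102, but risky
# A risk-averse agent prefers the sure $101 despite the lower EMV.
assert best_option([sure_thing, gamble], U) == sure_thing
```

With the identity utility U(x) = x, the same code reduces to choosing by expected monetary value, which is the risk-neutral case discussed above.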
Utility Values
A numeric utility value only has to provide an ordering
of outcomes according to preference; beyond that, the
values can be subjective.
Captures preferences for rewards and resource
consumption.
Captures risk attitude.
Expected Monetary Value(EMV)
EMV is the "strict objective utility", where utility is measured in
dollar amounts.
Gambling $1000 on the toss of a coin, the EMV would be
50% * $1000 = $500.
Grayson (1960) proved Bernoulli (1738) right, showing that
the utility of money for most people is proportional to the
logarithm of the amount:
U(current_$ + $_gained) = -263.31 + 22.09 log(n + 150,000)
for the range -$150,000 to $800,000.

This means people are more willing to gamble if they have
more money (or debt) beyond some threshold.
RISK
Risk-averse: an agent whose utility measurement for a
lottery is less than the EMV of the lottery.
Risk-seeking: the opposite.
Certainty equivalent: the amount an agent would rather
walk away with than gamble at winning some increased
amount at a given probability. Studies show that in the $1000 coin
toss most people would rather walk away with $400. Thus
$400 is the certainty equivalent of a lottery consisting of a
$1000 payoff at 50% odds.
Insurance premium = EMV - (certainty equivalent)
For the $1000 coin toss it would be $500 - $400 = $100.
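The coin-toss arithmetic as a sketch (variable names are ours):

```python
# $1000 coin toss: EMV = 50% of $1000.
emv = 0.5 * 1000                     # $500

# Empirical value cited in the slide: most people settle for $400.
certainty_equivalent = 400

# Insurance premium = EMV - certainty equivalent.
insurance_premium = emv - certainty_equivalent   # $100
```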
Thank You
Knowledge Representation &
Reasoning
Hidden Markov Model
Lecture: 40
Markov chain 
A Markov chain is a special sort of belief network
used to represent sequences of values, such as the
sequence of states in a dynamic system or the sequence
of words in a sentence.

A Markov chain is based on the principle of
"memorylessness".
Markov Models
Set of states: {S1,S2, ……….Sn}
Process moves from one state to another generating a
sequence of states : Si1, Si2, ……….Sik,….
Markov chain property: probability of each
subsequent state depends only on what was the
previous state:
P(Sik| Si1,Si2, ……….Sik-1) = P(Sik| Sik-1)
To define a Markov model, the following probabilities
have to be specified:
transition probabilities: aij = P(Si | Sj)
initial probabilities: πi = P(Si)
By the Markov chain property, the probability of a state sequence
can be found by the formula:

P(Si1, Si2, ..., Sik) = P(Sik | Sik-1) ... P(Si2 | Si1) P(Si1)

Suppose we want to calculate the probability of a sequence
of states in our example, {'Dry','Dry','Rain','Rain'}:
P({'Dry','Dry','Rain','Rain'}) = P('Rain'|'Rain')
P('Rain'|'Dry') P('Dry'|'Dry') P('Dry') =
0.3 * 0.2 * 0.8 * 0.6 = 0.0288
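The chain computation can be sketched in Python (the dictionary encoding is ours; only the probabilities quoted in the example are needed, with P('Dry'|'Rain') = 0.7 following because each row must sum to 1):

```python
# Transition probabilities, keyed as (previous state, next state).
transition = {("Dry", "Dry"): 0.8, ("Dry", "Rain"): 0.2,
              ("Rain", "Rain"): 0.3, ("Rain", "Dry"): 0.7}
initial = {"Dry": 0.6, "Rain": 0.4}

def sequence_probability(states):
    """P(s1, ..., sk) = P(s1) * product of P(s_i | s_{i-1})."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[(prev, cur)]
    return p

# 0.6 * 0.8 * 0.2 * 0.3 = 0.0288
p = sequence_probability(["Dry", "Dry", "Rain", "Rain"])
```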
Hidden Markov Model(HMM)
A hidden Markov model (HMM) is an augmentation
of the Markov chain to include observations.
Just like the state transition of the Markov chain, an
HMM also includes observations of the state.
These observations can be partial, in that different
states can map to the same observation,
and noisy, in that the same state can stochastically map
to different observations at different times.
The assumptions behind an HMM are that the state at
time t+1 only depends on the state at time t, as in the
Markov chain.
The observation at time t only depends on the state at
time t. The observations are modeled using the
variable Ot for each time t whose domain is the set of
possible observations.
The belief network representation of an HMM is
depicted in the figure below. A stationary HMM includes the following
probability distributions:
P(S0) specifies initial conditions.
P(St+1|St) specifies the dynamics.
P(Ot|St) specifies the sensor model.
A hidden Markov model as a belief network
Hidden Markov Model(HMM)
Set of states: {S1,S2, ……….Sn}
Process moves from one state to another generating a sequence of states :
Si1, Si2, ……….Sik,….
Markov chain property: probability of each subsequent state depends only
on what was the previous state:
P(Sik| Si1,Si2, ……….Sik-1) = P(Sik| Sik-1)
States are not visible, but each state randomly generates one of M
observations (or visible states) :
 {v1, v2, v3,…..vk….}
To define a hidden Markov model, the following probabilities have to be
specified:
matrix of transition probabilities A=(aij), aij = P(Si| Sj)
matrix of observation probabilities B=(bi (vm )), bi (vm ) = P(vm | Si)
a vector of initial probabilities π=(πi), πi = P(Si) .
Model is represented by M=(A, B, π).
Example of HMM
Solution:
Two states : ‘Low’ and ‘High’ atmospheric pressure.
Two observations : ‘Rain’ and ‘Dry’.
Transition probabilities: P(‘Low’|‘Low’)=0.3 ,
P(‘High’|‘Low’)=0.7 , P(‘Low’|‘High’)=0.2,
P(‘High’|‘High’)=0.8
Observation probabilities: P('Rain'|'Low') = 0.6,
P('Dry'|'Low') = 0.4, P('Rain'|'High') = 0.4,
P('Dry'|'High') = 0.6.
Initial probabilities: say P(‘Low’)=0.4 ,
P(‘High’)=0.6 .
Solution:
Calculation of observation sequence probability
Suppose we want to calculate a probability of a sequence of
observations in our example, {‘Dry’,’Rain’}. Consider all
possible hidden state sequences:
 P({‘Dry’,’Rain’} ) = P({‘Dry’,’Rain’} , {‘Low’,’Low’}) +
P({‘Dry’,’Rain’} , {‘Low’,’High’}) + P({‘Dry’,’Rain’} ,
{‘High’,’Low’}) + P({‘Dry’,’Rain’} , {‘High’,’High’})
where the first term is: P({'Dry','Rain'}, {'Low','Low'}) =
P({'Dry','Rain'} | {'Low','Low'}) P({'Low','Low'}) =
P('Dry'|'Low') P('Rain'|'Low') P('Low') P('Low'|'Low') =
0.4 * 0.6 * 0.4 * 0.3 = 0.0288
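The full sum over hidden state sequences can be sketched by brute-force enumeration (dictionary encoding is ours; P('Dry'|'High') is taken as 0.6 so each emission row sums to 1):

```python
from itertools import product

states = ["Low", "High"]
initial = {"Low": 0.4, "High": 0.6}
transition = {("Low", "Low"): 0.3, ("Low", "High"): 0.7,
              ("High", "Low"): 0.2, ("High", "High"): 0.8}
emission = {("Low", "Rain"): 0.6, ("Low", "Dry"): 0.4,
            ("High", "Rain"): 0.4, ("High", "Dry"): 0.6}

def observation_probability(obs):
    """P(obs) = sum of P(obs, path) over every hidden state path."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = initial[path[0]] * emission[(path[0], obs[0])]
        for t in range(1, len(obs)):
            p *= transition[(path[t - 1], path[t])] * emission[(path[t], obs[t])]
        total += p
    return total

p = observation_probability(["Dry", "Rain"])
```

Enumeration costs O(n^k) for n states and k observations; the forward algorithm computes the same quantity in O(n^2 k), which is why it is used in practice.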
What is hidden in HMM??
In a regular Markov model, the state is directly visible to the
observer, so the state transition probabilities are the
only parameters. In a hidden Markov model, the states are not
directly visible, but an output which depends on the state is
visible.
A hidden Markov model gets its name from two defining
properties:
It assumes that the observation at a time t was generated by some
process whose state Sk is hidden from the observer.
It assumes that the state of this hidden process satisfies the Markov
property: given the value of Sk-1, the current state Sk is
independent of all the states prior to k-1. This is called the first-
order Markov property.
Thank You