AI&ML 4 & 5 Module Notes

MODULE-4

Limitations of Propositional logic:


o Propositional logic is a weak mechanism for expressing knowledge and performing inference.
o We cannot represent quantified relations such as all, some, or none in propositional logic.
Example:
a. All the girls are intelligent.
b. Some apples are sweet.
o Propositional logic has limited expressive power.
o In propositional logic, we cannot describe statements in terms of their
properties or logical relationships.
Alternate Solutions:
Programming languages:
The syntax and semantics of programming languages can be used to express
reasoning. In a programming language, procedures/methods can encode the logic and
data structures holding data can serve as the facts. However, programming languages
have the following limitations:
One cannot derive facts from other facts
Inferencing is not automatic and needs to be programmed for each case
They lack the expressiveness required to handle partial information
Compositionality is missing.

Natural Language libraries:


Natural language processing (NLP) is a branch of artificial intelligence (AI) that
enables computers to comprehend, generate, and manipulate human language.
Natural language processing has the ability to interrogate the data with natural
language text or voice. This is also called “language in.” Most consumers have probably
interacted with NLP without realizing it. For instance, NLP is the core technology
behind virtual assistants, such as the Oracle Digital Assistant (ODA), Siri, Cortana, or
Alexa. Ambiguity, in the context of natural language processing, refers to the capability
of being understood in more than one way. Natural language is highly ambiguous. NLP
deals with the following types of ambiguity:
Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example, treating the
word silver as a noun, an adjective, or a verb.
Syntactic Ambiguity
This kind of ambiguity occurs when a sentence is parsed in different ways. For
example, the sentence “The man saw the girl with the telescope”. It is ambiguous
whether the man saw the girl carrying a telescope or he saw her through his telescope.
Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words themselves can be
misinterpreted. In other words, semantic ambiguity happens when a sentence
contains an ambiguous word or phrase. For example, the sentence “The car hit the pole
while it was moving” has semantic ambiguity because the interpretations can be
“The car, while moving, hit the pole” and “The car hit the pole while the pole was
moving”.
Anaphoric Ambiguity



This kind of ambiguity arises due to the use of anaphoric entities in discourse. For
example, the horse ran up the hill. It was very steep. It soon got tired. Here, the
anaphoric reference of “it” in the two situations causes ambiguity.
Pragmatic ambiguity
Such kind of ambiguity refers to the situation where the context of a phrase gives
it multiple interpretations. In simple words, we can say that pragmatic ambiguity
arises when the statement is not specific. For example, the sentence “I like you too”
can have multiple interpretations, such as I like you (just as you like me) or I like you
(just as someone else does).

Combining the best of formal and natural languages


Formal languages are languages that are designed by people for specific applications.
For example, the notation that mathematicians use is a formal language that is
particularly good at denoting relationships among numbers and symbols. Chemists
use a formal language to represent the chemical structure of molecules.
Elements of a natural language: Nouns and noun phrases that refer to objects
(squares, pits, wumpuses) and verbs and verb phrases that refer to relations among
objects (is breezy, is adjacent to, shoots). Some of these relations are functions—
relations in which there is only one “value” for a given “input.” It is easy to start listing
examples of objects, relations, and functions:
• Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball
games, wars, centuries
• Relations: these can be unary relations or properties such as red, round, bogus,
prime, or n-ary relations such as brother of, bigger than, part of
• Functions: father of, best friend, third inning of, one more than, beginning of
Eg: Identify the objects, relations, functions and properties if any in the following
examples:
“one plus two equals three”
Objects: one, two, three; Relations: equals; Function: plus
“Squares neighboring wumpus are smelly”
Objects: squares, Wumpus; Relation: neighbouring; Property: smelly
“Evil King John ruled England in 1200”
Objects: King John, England, 1200; Relation: ruled; Property: Evil

Formal languages and their ontological and epistemological commitments

Propositional logic is the logic that deals with a collection of declarative

statements which have a truth value, true or false. Propositions are combined
with Logical Operators or Logical Connectives like Negation(¬),
Disjunction(∨), Conjunction(∧), Exclusive OR(⊕), Implication(⇒), Bi-
Conditional or Double Implication(⇔). It cannot deal with sets of entities.
First order logic is an expression consisting of variables with a specified
domain. It consists of objects, relations and functions between the objects. It
helps analyze the scope of the subject over the predicate. There are three
quantifiers : Universal Quantifier (∀) depicts for all, Existential Quantifier (∃)
depicting there exists some and Uniqueness Quantifier (∃!) depicting exactly
one. It can deal with set of entities with the help of quantifiers.
Temporal logic is a subfield of mathematical logic that deals with reasoning
about time and the temporal relationships between events. In artificial
intelligence, temporal logic is used as a formal language to describe and
reason about the temporal behavior of systems and processes. Temporal logic
extends classical propositional and first-order logic with constructs for
specifying temporal relationships, such as “before,” “after,” “during,” and
“until.” This allows for the expression of temporal constraints and the
modeling of temporal aspects of a system, such as its evolution over time and
the relationships between events.
Probability is defined as the chance of happening or occurrences of an event.
Generally, the possibility of analyzing the occurrence of any event with respect
to previous data is called probability. For example, if a fair coin is tossed, what
is the chance that it lands on the head? These types of questions are answered
under probability. Probability theory uses the concept of random variables
and probability distribution to find the outcome of any situation. Probability
theory is an advanced branch of mathematics that deals with the odds and
statistics of happening an event.
The term fuzzy refers to things that are not clear or are vague. In the real world
many times we encounter a situation when we can’t determine whether the
state is true or false, their fuzzy logic provides very valuable flexibility for
reasoning. In this way, we can consider the inaccuracies and uncertainties of
any situation. Fuzzy Logic is a form of many-valued logic in which the truth
values of variables may be any real number between 0 and 1, instead of just
the traditional values of true or false. It is used to deal with imprecise or
uncertain information and is a mathematical method for representing
vagueness and uncertainty in decision-making. Fuzzy Logic is based on the
idea that in many cases, the concept of true or false is too restrictive, and that
there are many shades of gray in between. It allows for partial truths, where a
statement can be partially true or false, rather than fully true or false.
Ontology is the study of what there is in the world that we should know about,
and Epistemology is the study of how we should get to know the things in the
world. These questions long predate AI and were originally concerned with the
structure of knowledge and its acquisition by humans. When it comes to creating
AI, we can take an epistemological approach: asking how the only truly intelligent
system we know of (the human brain) models the world, innately or by learning.
This is in contrast with the ontological approach: focusing on organizing what we
know in data ontologies and then trying to instill those in computers.



Syntax of First-Order logic:
The syntax of FOL determines which collection of symbols is a logical expression in
first-order logic. The basic syntactic elements of first-order logic are symbols. We write
statements in short-hand notation in FOL.
Basic Elements of First-order logic: constant, variable, predicate, function, connective,
equality and quantifier symbols.
The syntax of First Order Logic is commonly written using Backus-Naur Form (BNF).

Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These
sentences are formed from a predicate symbol followed by a parenthesis with a
sequence of terms.
o We can represent atomic sentences as
Predicate (term1, term2, ......, term n).
Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).
Chinky is a cat: => cat (Chinky).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
First-order logic statements can be divided into two parts:
o Subject: Subject is the main part of the statement.



o Predicate: A predicate can be defined as a relation, which binds two atoms
together in a statement.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the
subject of the statement, and the second part, "is an integer," is known as the predicate.
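For concreteness, here is one possible way (an illustration only, not a standard library) to encode atomic sentences like those above in Python: constants and variables are strings, and an atomic sentence Predicate(term1, ..., termN) is a tuple whose first element is the predicate symbol. The same convention is reused by the unification sketch later in these notes.

```python
# Hypothetical encoding of FOL atomic sentences as plain Python tuples:
# the first element is the predicate symbol, the rest are the terms.
brothers = ("Brothers", "Ravi", "Ajay")   # Brothers(Ravi, Ajay)
cat      = ("cat", "Chinky")              # cat(Chinky)
integer  = ("Integer", "x")               # "x is an integer": subject x, predicate Integer

def predicate_symbol(atom):
    """Return the predicate symbol of an atomic sentence."""
    return atom[0]

def terms(atom):
    """Return the tuple of terms (arguments) of an atomic sentence."""
    return atom[1:]

print(predicate_symbol(brothers), terms(brothers))   # Brothers ('Ravi', 'Ajay')
```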

Quantifiers in First-order logic:


o A quantifier is a language element which generates quantification, and
quantification specifies the quantity of specimens in the universe of discourse.
o These are the symbols that permit us to determine or identify the range and scope
of a variable in a logical expression. There are two types of quantifier:
a. Universal Quantifier, (for all, everyone, everything)
b. Existential quantifier, (for some, at least one).
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the
statement within its range is true for everything or every instance of a particular thing.
The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.
If x is a variable, then ∀x is read as:
o For all x
o For each x
o For every x.
Example:
All men drink coffee.
Let x be a variable that refers to a man; then the statement can be represented in the UOD as below:



∀x man(x) → drink (x, coffee).
It will be read as: For all x, if x is a man then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement
within its scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used
with a predicate variable then it is called an existential quantifier.
Note: In Existential quantifier we always use AND or Conjunction symbol (∧).
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:
o There exists a 'x.'
o For some 'x.'
o For at least one 'x.'
Example:
Some boys are intelligent.

∃x: boys(x) ∧ intelligent(x)


It will be read as: There exists some x such that x is a boy and x is intelligent.

Some Examples of FOL using quantifier:


1. All birds fly.
In this question the predicate is "fly(bird)."
And since there are all birds who fly so it will be represented as follows.
∀x bird(x) →fly(x).

2. Every man respects his parent.


In this question, the predicate is "respect(x, y)," where x=man, and y= parent.
Since there is every man so will use ∀, and it will be represented as follows:
∀x man(x) → respects (x, parent).

3. Some boys play cricket.



In this question, the predicate is "play(x, y)," where x= boys, and y= game. Since
there are some boys so we will use ∃, and it will be represented as:
∃x boys(x) → play(x, cricket).

4. Not all students like both Mathematics and Science.


In this question, the predicate is "like(x, y)," where x= student, and y= subject.
Since there are not all students, so we will use ∀ with negation, so following
representation for this:
¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].

5. Only one student failed in Mathematics.


In this question, the predicate is "failed(x, y)," where x= student, and y= subject.
Since there is only one student who failed in Mathematics, so we will use following
representation for this:
∃(x) [ student(x) ∧ failed (x, Mathematics) ∧ ∀ (y) [¬(x==y) ∧ student(y) →
¬failed (y, Mathematics)]].

Kinship domain

Symbols and Interpretations


The symbols of the language (0, 1, add, prime, and so on) have very suggestive names.
When we interpret sentences of this language over the domain N, for example, it is
clear for which elements of the domain prime “should” be true, and for which it
“should” be false. But consider a first-order language that has only two unary
predicate symbols, fancy and tall; then it depends on the domain to define their values.
They are undefined for the domain of natural numbers.
Symbols stand for objects, relations and functions. There are 3 types of symbols:



❑ Constant symbols representing objects. Eg: John, Richard
❑ Predicate symbols representing relations. Eg: Brother, OnHead
❑ Function symbols representing functions. Eg: LeftLeg
Each model includes an interpretation that specifies exactly which objects, relations
and functions are referred to by the constant, predicate, and function symbols.
Logicians call this the intended interpretation.

Nested Quantifiers
Nested quantifiers are quantifiers that occur within the scope of other quantifiers.
Example: ∀x∃yP(x, y). Quantifier order matters! ∀x∃yP(x, y) ≠ ∃y∀xP(x, y)
Let L(x, y) be the statement “x loves y,” where the domain for both x and y consists of
all people in the world. Use quantifiers to express each of these statements.
a) Everybody loves Jerry. ∀x L(x, Jerry)
b) Everybody loves somebody. ∀x∃yL(x, y)
c) There is somebody whom everybody loves. ∃y∀xL(x, y)
d) Nobody loves everybody. ∀x∃y¬L(x, y) or ¬∃x∀yL(x, y)
e) Everyone loves himself or herself ∀xL(x, x)

Connections between ∀ and ∃


The two quantifiers are actually intimately connected with each other, through
negation. Asserting that everyone dislikes parsnips is the same as asserting there does
not exist someone who likes them, and vice versa:
∀ x ¬Likes(x, Parsnips ) is equivalent to ¬∃ x Likes(x, Parsnips) . We can go one step
further: “Everyone likes ice cream” means that there is no one who does not like ice
cream: ∀ x Likes(x, IceCream) is equivalent to ¬∃ x ¬Likes(x, IceCream) .
Equality
We can use the equality symbol to signify that two terms refer to the same
object. For example, Father (John)=Henry says that the object referred to by Father
(John) and the object referred to by Henry are the same. The equality symbol can be
used to state facts about a given function, as we just did for the Father symbol. It can
also be used with negation to insist that two terms are not the same object. To say that
Richard has at least two brothers, we would write ∃ x, y Brother (x,Richard ) ∧ Brother
(y,Richard ) ∧¬(x=y) .

USING FIRST ORDER LOGIC


Sentences are added to a knowledge base using TELL, exactly as in
propositional logic. Such sentences are called assertions. For example, we can assert
that John is a king, Richard is a person, and all kings are persons:
TELL(KB, King(John))
TELL(KB, Person(Richard))



TELL(KB, ∀ x King(x) ⇒ Person(x)) We can ask questions of the knowledge base
using ASK.
For example, ASK(KB, King(John)) returns true. Questions asked with ASK are
called queries or goals. If we want to know what value of x makes a sentence true, we
need a different function, ASKVARS, which we call with ASKVARS(KB, Person(x))
and which yields a stream of answers. In this case there will be two answers: {x/John}
and {x/Richard}. Such an answer is called a substitution or binding list.
The kinship domain
The first example we consider is the domain of family relationships, or kinship.
This domain includes facts such as “Elizabeth is the mother of Charles” and “Charles
is the father of William” and rules such as “One’s grandmother is the mother of one’s
parent.” Clearly, the objects in our domain are people. We have two unary predicates,
Male and Female. Kinship relations—parenthood, brotherhood, marriage, and so on—
are represented by binary predicates: Parent, Sibling, Brother , Sister , Child ,
Daughter, Son, Spouse, Wife, Husband, Grandparent , Grandchild , Cousin, Aunt, and
Uncle. We use functions for Mother and Father , because every person has exactly one
of each of these.
For example, one’s mother is one’s female parent:
∀ m, c Mother (c)=m ⇔ Female(m) ∧ Parent(m, c) .
One’s husband is one’s male spouse:
∀ w, h Husband(h,w) ⇔ Male(h) ∧ Spouse(h,w) .
Male and female are disjoint categories:
∀ x Male(x) ⇔ ¬Female(x) . Parent and child are inverse relations: ∀ p,
c Parent(p, c) ⇔ Child (c, p) .
A grandparent is a parent of one’s parent:
∀ g, c Grandparent (g, c) ⇔ ∃p Parent(g, p) ∧ Parent(p, c) . A sibling is
another child of one’s parents: ∀ x, y
Sibling(x, y) ⇔ x ≠ y ∧ ∃p Parent(p, x) ∧ Parent(p, y) . Each of these
sentences can be viewed as an axiom of the kinship domain. Axioms are
commonly associated with purely mathematical domains. Our kinship axioms
are also definitions; they have the form ∀ x, y P(x, y) ⇔ ..... The axioms define
the Mother function and the Husband, Male, Parent, Grandparent, and Sibling
predicates in terms of other predicates. For example, consider the assertion that
siblinghood is symmetric: ∀ x, y Sibling(x, y) ⇔ Sibling(y, x) .
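As a quick illustration (not part of the textbook treatment), the kinship axioms above can be checked over a small hand-built family in Python: Mother and Father are functions (dicts), and Parent, Mother and Grandparent are derived exactly as the definitional axioms state. The extra names Diana and Philip below are purely illustrative data.

```python
# A hand-built family (illustrative data): Mother and Father as functions (dicts).
mother = {"Charles": "Elizabeth", "William": "Diana"}
father = {"Charles": "Philip", "William": "Charles"}
female = {"Elizabeth", "Diana"}
people = set(mother) | set(father) | set(mother.values()) | set(father.values())

def parent(p, c):
    # Parent(p, c) holds if p is the mother or the father of c
    return mother.get(c) == p or father.get(c) == p

def is_mother(m, c):
    # Mother(c) = m  <=>  Female(m) and Parent(m, c)
    return m in female and parent(m, c)

def grandparent(g, c):
    # Grandparent(g, c) <=> there exists p with Parent(g, p) and Parent(p, c)
    return any(parent(g, p) and parent(p, c) for p in people)

print(parent("Elizabeth", "Charles"))       # True
print(is_mother("Elizabeth", "Charles"))    # True
print(grandparent("Elizabeth", "William"))  # True, via Parent(Elizabeth, Charles)
```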

Natural numbers in first-order logic


The natural numbers can be described in first-order logic. The language of natural
numbers has
a single constant 0, defined by the predicate NatNum(0)
a function Successor, S(n), which gives the next number after n in the series of
natural numbers
The successor axioms are written with quantifiers:
∀n, NatNum(n) ⇒ NatNum(S(n))
0 is not the successor of any natural number: ∀n, 0 ≠ S(n)
Two natural numbers cannot have the same successor:
∀m, n m ≠ n ⇒ S(m) ≠ S(n)
+ is a function defined on two natural numbers and equality is defined between 2
natural numbers using FOL:



∀m, n NatNum(m) ∧ NatNum(n) ⇒ +(S(m), n) = S(+(m, n))

▪ 0 is the additive identity for natural numbers:
▪ ∀m, NatNum(m) ⇒ +(0, m) = m
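A small Python sketch of these axioms (an illustration only): 0 is the constant 0, S(n) builds a successor term, and addition follows the two addition axioms literally.

```python
def S(n):
    # Successor as a term: S(n) is represented by the tuple ("S", n)
    return ("S", n)

def add(m, n):
    # +(0, n) = n   and   +(S(k), n) = S(+(k, n))
    return n if m == 0 else S(add(m[1], n))

two, one = S(S(0)), S(0)
print(add(two, one))   # ('S', ('S', ('S', 0))), i.e. the numeral 3
```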

A set is a collection of objects; any one of the objects in a set is called a member or
an element of the set.
The basic statement in set theory is element inclusion: an element a is included in
some set S. Formally written as: a ∈ S
If an element is not included, we write: a ∉ S
Statements are either true or false, depending on the context (the particular sets
involved). If a statement S is true in a given context C, we say the statement is valid
in C. Formally, we write this as: C ⊨ S
If the statement is not valid in that context, we write: C ⊭ S
The operators to compose new sets out of existing ones are:


1. A special set is the empty set, which contains no elements at all: ∅
2. Union: create a set S containing all elements from A, from B, or from both.
Formally: S = A ∪ B = {x | x ∈ A ∨ x ∈ B}
3. Intersection: create a set S containing all elements that are both in A and in B.
Formally: S = A ∩ B = {x | x ∈ A ∧ x ∈ B}
4. Exclusion: create a set S from the elements of A that are not in B.
Formally: S = A \ B = {x | x ∈ A ∧ x ∉ B}
These sets can be interpreted as quantified statements:

Subsets:
∀ s1, s2 s1 ⊆ s2 ⇔ (∀ x x ∈ s1 ⇒ x ∈ s2) .

Equality of two sets:


∀ s1, s2 (s1 = s2) ⇔ (s1 ⊆ s2 ∧ s2 ⊆ s1) .

List vs Sets
Lists are similar to sets. The differences are that lists are ordered and the same element
can appear more than once in a list. We can use the vocabulary of Lisp for lists:
Nil is the constant list with no elements;
Cons, Append, First, and Rest are functions; and
Find is the predicate that does for lists what Member does for sets.
List? is a predicate that is true only of lists.



First Order Logic with Wumpus World
The wumpus agent receives a percept vector with five elements. The
corresponding first-order sentence stored in the knowledge base must include
both the percept and the time at which it occurred; otherwise, the agent will get
confused about when it saw what.
We use integers for time steps. A typical percept sentence would be Percept
([Stench, Breeze, Glitter , None, None], 5) . Here, Percept is a binary predicate,
and Stench and so on are constants placed in a list.
The actions in the wumpus world can be represented by logical terms:
Turn(Right ), Turn(Left ), Forward , Shoot , Grab, Climb .
To determine which is best, the agent program executes the query ASKVARS(∃
a BestAction(a, 5)) , which returns a binding list such as {a/Grab}.
The agent program can then return Grab as the action to take.
The raw percept data implies certain facts about the current state. For example:
∀t,s,g,m,c Percept ([s,Breeze,g,m,c],t) ⇒ Breeze(t)
∀t,s,b,m,c Percept ([s,b,Glitter,m,c],t) ⇒ Glitter (t)
These rules exhibit a trivial form of the reasoning process called perception.
Simple “reflex” behavior can also be implemented by quantified implication
sentences. For example, we have ∀ t Glitter (t) ⇒ BestAction(Grab, t) .
Given the percept and rules from the preceding paragraphs, this would yield the
desired conclusion BestAction(Grab, 5)—that is, Grab is the right thing to do.
For example, if the agent is at a square and perceives a breeze, then that square
is breezy: ∀ s, t At(Agent, s, t) ∧ Breeze(t) ⇒ Breezy(s) . It is useful to know that
a square is breezy because we know that the pits cannot move about. Notice that
Breezy has no time argument.
Having discovered which places are breezy (or smelly) and, very important, not
breezy (or not smelly), the agent can deduce where the pits are (and where the
wumpus is). first-order logic just needs one axiom: ∀ s Breezy(s) ⇔ ∃r Adjacent
(r, s) ∧ Pit(r) .

Inference in First-Order Logic


Inference in First-Order Logic is used to deduce new facts or sentences from
existing sentences. Before understanding the FOL inference rule, let's
understand some basic terminologies used in FOL.
Substitution is a fundamental operation performed on terms and formulas. It
occurs in all inference systems in first-order logic. The substitution is complex
in the presence of quantifiers in FOL. If we write F[a/x], it denotes the result of
substituting the constant "a" for the variable "x" in F.
Equality: First-Order logic does not only use predicates and terms for making
atomic sentences but also uses another construct, equality. For this,
we can use equality symbols which specify that two terms refer to the same
object.
Example: Brother (John) = Smith.
As in the above example, the object referred to by Brother (John) is the same as
the object referred to by Smith. The equality symbol can also be used with
negation to represent that two terms are not the same objects.
Example: ¬(x=y) which is equivalent to x ≠y.



FOL inference rules for quantifier:

As propositional logic we also have inference rules in first-order logic, so following are
some basic inference rules in FOL:
o Universal Instantiation
o Existential Instantiation
Universal Instantiation:
o Universal instantiation, also called universal elimination or UI, is a valid
inference rule. It can be applied multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for
the variable.
o The UI rule states that from ∀v P(v) we can infer P(g) for any ground term g in
the universe of discourse, i.e. SUBST({v/g}, P(v)).
Example: 1.
o If "Every person likes ice-cream", i.e. ∀x P(x), we can infer
"John likes ice-cream", i.e. P(John).
Example: 2.
o "All kings who are greedy are Evil." So let our knowledge base contains this
detail as in the form of FOL:
o ∀x king(x) ∧ greedy (x) → Evil (x),
So from this information, we can infer any of the following statements using Universal
Instantiation:
o King(John) ∧ Greedy (John) → Evil (John),
o King(Richard) ∧ Greedy (Richard) → Evil (Richard),
o King(Father(John)) ∧ Greedy (Father(John)) → Evil (Father(John)),
Existential Instantiation:
o Existential instantiation is also called as Existential Elimination, which is a
valid inference rule in first-order logic.
o It can be applied only once to replace the existential sentence.
o The new KB is not logically equivalent to old KB, but it will be satisfiable if old
KB was satisfiable.
o Represented as: from ∃v P(v) we infer SUBST({v/K}, P(v)) for a new constant
symbol K that does not appear elsewhere in the knowledge base.
Example:
From the given sentence: ∃x Crown(x) ∧ OnHead(x, John),
So we can infer: Crown(K) ∧ OnHead( K, John), as long as K does not appear in the
knowledge base.
o The above used K is a constant symbol, which is called Skolem constant.
o The Existential instantiation is a special case of Skolemization process.



Generalized Modus Ponens Rule:
For the inference process in FOL, we have a single inference rule which is called
Generalized Modus Ponens. It is a lifted version of Modus Ponens.
Generalized Modus Ponens can be summarized as: "P implies Q and P is asserted to
be true, therefore Q must be true."
For atomic sentences pi, pi', q, where there is a substitution θ such that
SUBST(θ, pi') = SUBST(θ, pi) for all i, it can be represented as:
(p1', p2', ..., pn', (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)) ⊢ SUBST(θ, q)
Example:
We will use this rule for "All greedy kings are evil": we will find some x such that x is a
king and x is greedy, so we can infer that x is evil.
1. p1' is king(John) p1 is king(x)
2. p2' is Greedy(y) p2 is Greedy(x)
3. θ is {x/John, y/John} q is evil(x)
4. SUBST(θ,q).

Unification
o Unification is a process of making two different logical atomic expressions
identical by finding a substitution. Unification depends on the substitution
process.
o It takes two literals as input and makes them identical using substitution.
o Let Ψ1 and Ψ2 be two atomic sentences and 𝜎 be a unifier such that, Ψ1𝜎 = Ψ2𝜎,
then it can be expressed as UNIFY(Ψ1, Ψ2).
o Example: Find the MGU for Unify{King(x), King(John)}
Let Ψ1 = King(x), Ψ2 = King(John),
Substitution θ = {John/x} is a unifier for these atoms; applying this substitution
makes both expressions identical.
o The UNIFY algorithm is used for unification, which takes two atomic sentences
and returns a unifier for those sentences (If any exist).
o Unification is a key component of all first-order inference algorithms.
o It returns fail if the expressions do not match with each other.
o The most general such substitution is called the Most General Unifier or MGU.
Conditions for Unification:
Following are some basic conditions for unification:
o Predicate symbol must be same, atoms or expression with different predicate
symbol can never be unified.
o Number of Arguments in both expressions must be identical.
o Unification will fail if there are two similar variables present in the same
expression.

Unification Algorithm:
Algorithm: Unify(Ψ1, Ψ2)
Step. 1: If Ψ1 or Ψ2 is a variable or constant, then:
a) If Ψ1 and Ψ2 are identical, then return NIL.
b) Else if Ψ1is a variable,
a. then if Ψ1 occurs in Ψ2, then return FAILURE
b. Else return { (Ψ2/ Ψ1)}.



c) Else if Ψ2 is a variable,
a. If Ψ2 occurs in Ψ1 then return FAILURE,
b. Else return {( Ψ1/ Ψ2)}.
d) Else return FAILURE.
Step.2: If the initial Predicate symbol in Ψ1 and Ψ2 are not same, then return
FAILURE.
Step. 3: IF Ψ1 and Ψ2 have a different number of arguments, then return FAILURE.
Step. 4: Set Substitution set(SUBST) to NIL.
Step. 5: For i=1 to the number of elements in Ψ1.
a) Call Unify function with the ith element of Ψ1 and ith element of Ψ2, and
put the result into S.
b) If S = failure then returns Failure
c) If S ≠ NIL then do,
a. Apply S to the remainder of both Ψ1 and Ψ2.
b. SUBST= APPEND(S, SUBST).
Step.6: Return SUBST.
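Below is a minimal Python sketch of the algorithm above (my own rendering, not the canonical pseudocode): FAILURE is represented as None, NIL/SUBST as a dict, a variable is a lowercase string, a constant is a capitalised string, and a compound expression is a tuple whose first element is its predicate or function symbol, as in the earlier atomic-sentence sketch.

```python
def is_variable(t):
    # Convention: variables are lowercase strings, constants are capitalised strings
    return isinstance(t, str) and t[:1].islower()

def occurs_in(var, term):
    # Occurs check: does the variable appear anywhere inside the term?
    if var == term:
        return True
    if isinstance(term, tuple):
        return any(occurs_in(var, arg) for arg in term[1:])
    return False

def substitute(term, subst):
    # Apply the substitution (a dict {variable: term}) to a term
    if is_variable(term):
        return substitute(subst[term], subst) if term in subst else term
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, subst) for arg in term[1:])
    return term

def unify(psi1, psi2, subst=None):
    """Return an MGU of psi1 and psi2 as a dict, or None for FAILURE."""
    if subst is None:
        subst = {}
    psi1, psi2 = substitute(psi1, subst), substitute(psi2, subst)
    if psi1 == psi2:                       # Step 1a: identical, nothing to do
        return subst
    if is_variable(psi1):                  # Step 1b: bind variable, with occurs check
        return None if occurs_in(psi1, psi2) else {**subst, psi1: psi2}
    if is_variable(psi2):                  # Step 1c: symmetric case
        return None if occurs_in(psi2, psi1) else {**subst, psi2: psi1}
    if isinstance(psi1, tuple) and isinstance(psi2, tuple):
        if psi1[0] != psi2[0] or len(psi1) != len(psi2):
            return None                    # Steps 2-3: symbol or arity mismatch
        for a, b in zip(psi1[1:], psi2[1:]):   # Step 5: unify argument by argument
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Example 4 from the worked problems below: knows(Richard, x) vs knows(Richard, John)
print(unify(("knows", "Richard", "x"), ("knows", "Richard", "John")))   # {'x': 'John'}
```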

For each pair of the following atomic sentences find the most general unifier (If exist).
1. Find the MGU of {p(f(a), g(Y)) and p(X, X)}
Sol: S0 => Here, Ψ1 = p(f(a), g(Y)), and Ψ2 = p(X, X)
SUBST θ= {f(a) / X}
S1 => Ψ1 = p(f(a), g(Y)), and Ψ2 = p(f(a), f(a))
g(Y) and f(a) have different function symbols, so unification fails.
Unification is not possible for these expressions.

2. Find the MGU of {p(b, X, f(g(Z))) and p(Z, f(Y), f(Y))}


Here, Ψ1 = p(b, X, f(g(Z))) , and Ψ2 = p(Z, f(Y), f(Y))
S0 => { p(b, X, f(g(Z))); p(Z, f(Y), f(Y))}
SUBST θ={b/Z}
S1 => { p(b, X, f(g(b))); p(b, f(Y), f(Y))}
SUBST θ={f(Y) /X}
S2 => { p(b, f(Y), f(g(b))); p(b, f(Y), f(Y))}
SUBST θ= {g(b) /Y}
S3 => { p(b, f(g(b)), f(g(b))); p(b, f(g(b)), f(g(b))) } Unified Successfully.
And Unifier = { b/Z, f(Y) /X , g(b) /Y}.

3. Find the MGU of {p (X, X), and p (Z, f(Z))}


Here, Ψ1 = {p (X, X), and Ψ2 = p (Z, f(Z))
S0 => {p (X, X), p (Z, f(Z))}
SUBST θ= {X/Z}
S1 => {p (Z, Z), p (Z, f(Z))}
SUBST θ= {f(Z) / Z}, Unification Failed.

4. UNIFY(knows(Richard, x), knows(Richard, John))


Here, Ψ1 = knows(Richard, x), and Ψ2 = knows(Richard, John)
S0 => { knows(Richard, x); knows(Richard, John)}
SUBST θ= {John/x}
S1 => { knows(Richard, John); knows(Richard, John)}, Successfully Unified.
Unifier: {John/x}.



Subsumption Lattice
It is a structure with the most general expressions at the top and more specific ones at
the bottom. It helps to answer queries efficiently. Each lower level is derived by applying
a single substitution to the level above. For example, substituting y/Richard gives the
left child and x/IBM gives the right child; applying the remaining substitution to either
child then yields their common child. (Lattice figures omitted.)

Inference Engine using FOL


The inference engine is the component of the intelligent system in artificial
intelligence, which applies logical rules to the knowledge base to infer new information
from known facts. The first inference engine was part of the expert system. Inference
engine commonly proceeds in two modes, which are:
a. Forward chaining
b. Backward chaining
Horn Clause and Definite clause:
Horn clause and definite clause are the forms of sentences, which enables knowledge
base to use a more restricted and efficient inference algorithm. Logical inference
algorithms use forward and backward chaining approaches, which require KB in the
form of the first-order definite clause.
Definite clause: A clause which is a disjunction of literals with exactly one positive
literal is known as a definite clause or strict horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive
literal is known as horn clause. Hence all the definite clauses are horn clauses.
Example: (¬ p V ¬ q V k). It has only one positive literal k.
It is equivalent to p ∧ q → k.



Forward Chaining
Forward chaining is also known as forward deduction or forward reasoning when
using an inference engine. Forward chaining is a form of reasoning which starts
with the atomic sentences in the knowledge base and applies inference rules (Modus
Ponens) in the forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose
premises are satisfied, and adds their conclusions to the known facts. This process
repeats until the problem is solved.
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country
A, an enemy of America, has some missiles, and all the missiles were sold to it by
Robert, who is an American citizen."
Prove that "Robert is criminal."
To solve the above problem, first, we will convert all the above facts into first-order
definite clauses, and then we will use a forward-chaining algorithm to reach the goal.
Facts Conversion into FOL:
o It is a crime for an American to sell weapons to hostile nations. (Let's say p, q,
and r are variables)
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)
o Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). It can be written as
two definite clauses by using Existential Instantiation, introducing a new
constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
o All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns (A, p) → Sells (Robert, p, A) ......(4)
o Missiles are weapons.
Missile(p) → Weapons (p) .......(5)
o Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p) ........(6)
o Country A is an enemy of America.
Enemy (A, America) .........(7)
o Robert is American
American(Robert). ..........(8)
Forward chaining proof:
Step-1:
In the first step we will start with the known facts and will choose the sentences which
do not have implications, such as: American(Robert), Enemy(A, America), Owns(A,
T1), and Missile(T1). These facts form the first level of the proof graph.

Step-2:
At the second step, we will see those facts which infer from available facts and with
satisfied premises.
Rule-(1) does not yet have all its premises satisfied, so it will not fire in the first iteration.
Facts (2) and (3) are already added.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells (Robert, T1, A) is added; it
is inferred from the conjunction of facts (2) and (3).
Rule-(5) is satisfied with the substitution {p/T1}, so Weapon(T1) is added.



Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added; it is inferred
from fact (7).

Step-3:
At step-3, we can check that Rule-(1) is satisfied with the substitution {p/Robert, q/T1,
r/A}, so we can add Criminal(Robert), which is inferred from all the available facts.
Hence we have reached our goal statement.

Hence it is proved that Robert is a criminal using the forward chaining approach.
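To make the procedure concrete, here is a small Python sketch of forward chaining on this example (my own illustration): the rules are written as ground (premises, conclusion) pairs with the substitutions from the proof already applied by hand, so no unification is needed. A full first-order forward chainer would obtain those substitutions by unifying rule premises against the known facts.

```python
# Ground facts from the crime example (spellings follow the notes).
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}

rules = [
    # Missile(p) -> Weapon(p), instantiated with p = T1
    ({"Missile(T1)"}, "Weapon(T1)"),
    # Missile(p) & Owns(A,p) -> Sells(Robert,p,A), with p = T1
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),
    # Enemy(p,America) -> Hostile(p), with p = A
    ({"Enemy(A,America)"}, "Hostile(A)"),
    # American(p) & Weapon(q) & Sells(p,q,r) & Hostile(r) -> Criminal(p)
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"},
     "Criminal(Robert)"),
]

changed = True
while changed:                      # keep firing rules until nothing new is added
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("Criminal(Robert)" in facts)  # True: the goal has been derived
```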

Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning
method when using an inference engine. A backward chaining algorithm is a form of
reasoning, which starts with the goal and works backward, chaining through rules to
find known facts that support the goal.
Example:
In backward-chaining, we will use the same above example, and will rewrite all the
rules.
o American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)
o Owns(A, T1) ........(2)
o Missile(T1) .........(3)
o ∀p Missile(p) ∧ Owns (A, p) → Sells (Robert, p, A) ......(4)
o Missile(p) → Weapon (p) .......(5)
o Enemy(p, America) → Hostile(p) ........(6)
o Enemy (A, America) .........(7)
o American(Robert). ..........(8)



Backward-Chaining proof:
In Backward chaining, we will start with our goal predicate, which is Criminal(Robert),
and then infer further rules.
Step-1:
At the first step, we will take the goal fact. And from the goal fact, we will infer other
facts, and at last, we will prove those facts true. So our goal fact is "Robert is Criminal,"
so following is the predicate of it.

Step-2:
At the second step, we will infer other facts from the goal fact which satisfy the rules. As
we can see in Rule-(1), the goal predicate Criminal(Robert) is present with the
substitution {Robert/p}. So we will add all the conjunctive facts below the first level
and replace p with Robert.
Here we can see American(Robert) is a fact, so it is proved here.

Step-3: At step-3, we will extract the further fact Missile(q), which is inferred from
Weapon(q), as it satisfies Rule-(5). Weapon(q) is also true with the substitution of the
constant T1 for q.

Step-4:



At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r),
which satisfies Rule-(4), with the substitution of A in place of r. So these two
statements are proved here.

Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A) which satisfies
Rule- 6. And hence all the statements are proved true using backward chaining.
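For comparison, here is a minimal backward-chaining sketch over the same ground rules (again an illustration with the substitutions applied by hand): a goal is proved if it is a known fact, or if some rule concludes it and all of that rule's premises can be proved recursively.

```python
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}

# Each conclusion maps to a list of premise sets (one per rule concluding it).
rules = {
    "Weapon(T1)": [{"Missile(T1)"}],
    "Sells(Robert,T1,A)": [{"Missile(T1)", "Owns(A,T1)"}],
    "Hostile(A)": [{"Enemy(A,America)"}],
    "Criminal(Robert)": [{"American(Robert)", "Weapon(T1)",
                          "Sells(Robert,T1,A)", "Hostile(A)"}],
}

def prove(goal, depth=0):
    print("  " * depth + "trying", goal)     # trace of the backward search
    if goal in facts:
        return True
    for premises in rules.get(goal, []):
        if all(prove(p, depth + 1) for p in premises):
            return True
    return False

print(prove("Criminal(Robert)"))   # True
```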

Resolution in FOL
Resolution is a theorem-proving technique that proceeds by building refutation proofs,
i.e., proofs by contradiction. It was invented by the mathematician John Alan Robinson
in 1965. Resolution is used when several statements are given and we need to prove a
conclusion from those statements. Unification is a key concept in proofs by
resolution. Resolution is a single inference rule which can efficiently operate on
the conjunctive normal form or clausal form.



Clause: A disjunction of literals (atomic sentences) is called a clause. A clause with a
single literal is known as a unit clause.
Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said
to be conjunctive normal form or CNF.

The resolution inference rule:


The resolution rule for first-order logic is simply a lifted version of the propositional
rule. Resolution can resolve two clauses if they contain complementary literals, which
are assumed to be standardized apart so that they share no variables.

(l1 ∨ ... ∨ lk), (m1 ∨ ... ∨ mn)
⊢ SUBST(θ, l1 ∨ ... ∨ li−1 ∨ li+1 ∨ ... ∨ lk ∨ m1 ∨ ... ∨ mj−1 ∨ mj+1 ∨ ... ∨ mn)
where li and mj are complementary literals, i.e. UNIFY(li, ¬mj) = θ.


This rule is also called the binary resolution rule because it only resolves exactly two
literals.
Example:
We can resolve two clauses which are given below:
[Animal(g(x)) V Loves(f(x), x)] and [¬Loves(a, b) V ¬Kills(a, b)]
where the two complementary literals are Loves(f(x), x) and ¬Loves(a, b). These literals
can be unified with the unifier θ = {a/f(x), b/x}, and resolution generates the resolvent
clause: [Animal(g(x)) V ¬Kills(f(x), x)].

Steps for Resolution:


1. Conversion of facts into first-order logic.
2. Convert FOL statements into CNF
3. Negate the statement which needs to prove (proof by contradiction)
4. Draw resolution graph (unification).
To better understand all the above steps, we will take an example in which we will
apply resolution.
Example:
a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats and is not killed by is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.
Step-1: Conversion of Facts into FOL
In the first step we will convert all the given statements into first-order logic:
a. ∀x food(x) → likes(John, x)
b. food(Apple) ∧ food(vegetables)
c. ∀x ∀y eats(x, y) ∧ ¬ killed(x) → food(y)
d. eats (Anil, Peanuts) ∧ alive(Anil)
e. ∀x eats(Anil, x) → eats(Harry, x)
f. ∀x ¬ killed(x) → alive(x) (added axiom)
g. ∀x alive(x) → ¬ killed(x) (added axiom)
h. likes(John, Peanuts) (conclusion to be proved)



Step-2: Conversion of FOL into CNF
In First order logic resolution, it is required to convert the FOL into CNF as CNF form
makes easier for resolution proofs.
o Eliminate all implication (→) and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x ¬(¬ killed(x)) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Move negation (¬)inwards and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Rename variables or standardize variables
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
o Eliminate existential quantifiers (Skolemization).
In this step, we would eliminate existential quantifiers ∃; this process is known
as Skolemization. In this example there is no existential quantifier, so all the
statements remain the same in this step.



o Drop universal quantifiers.
In this step we drop all universal quantifiers, since all remaining variables are
implicitly universally quantified and the explicit quantifiers are no longer needed.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
o Distribute conjunction ∧ over disjunction ∨.
This step will not make any change in this problem.
Step-3: Negate the statement to be proved
In this statement, we will apply negation to the conclusion statements, which will be
written as ¬likes(John, Peanuts)

Step-4: Draw the resolution graph. Now in this step, we will solve the problem by a
resolution tree using substitution; the steps are explained below. (Resolution graph
figure omitted.)
Hence the negation of the conclusion has been proved as a complete contradiction with
the given set of statements.
Explanation of Resolution graph:
o In the first step of resolution graph, ¬likes(John, Peanuts) , and likes(John,
x) get resolved(canceled) by substitution of {Peanuts/x}, and we are left with ¬
food(Peanuts)



o In the second step of the resolution graph, ¬ food(Peanuts) , and food(z) get
resolved (canceled) by substitution of { Peanuts/z}, and we are left with ¬
eats(y, Peanuts) V killed(y) .
o In the third step of the resolution graph, ¬ eats(y, Peanuts) and eats (Anil,
Peanuts) get resolved by substitution {Anil/y}, and we are left
with Killed(Anil) .
o In the fourth step of the resolution graph, Killed(Anil) and ¬ killed(k) get
resolved by the substitution {Anil/k}, and we are left with ¬ alive(Anil).
o In the last step of the resolution graph ¬ alive(Anil) and alive(Anil) get
resolved.
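The resolution steps just described can be replayed mechanically. The sketch below (an illustration) applies the graph's substitutions {Peanuts/x, Peanuts/z, Anil/y, Anil/k} by hand first, so each step is plain propositional resolution on ground clauses; reaching the empty clause is the contradiction.

```python
def resolve(c1, c2):
    """Resolve two clauses (frozensets of literal strings) on one complementary pair."""
    for lit in c1:
        complement = lit[1:] if lit.startswith("~") else "~" + lit
        if complement in c2:
            return (c1 - {lit}) | (c2 - {complement})
    return None

clause = frozenset({"~likes(John,Peanuts)"})          # negated goal
for ground_clause in [
    frozenset({"~food(Peanuts)", "likes(John,Peanuts)"}),
    frozenset({"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"}),
    frozenset({"eats(Anil,Peanuts)"}),
    frozenset({"~alive(Anil)", "~killed(Anil)"}),
    frozenset({"alive(Anil)"}),
]:
    clause = resolve(clause, ground_clause)
    print(sorted(clause))          # the last step prints []: the empty clause

# The empty clause is a contradiction, so likes(John, Peanuts) is proved.
```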



MODULE-5
Acting Under uncertainty
A logical agent believes each sentence to be either true or false. Probabilistic agents
instead hold a degree of belief in the truth of a given sentence, ranging from 0 to 1.
Uncertainty can arise because of incompleteness and incorrectness in the agent's
understanding of the properties of the environment. When we talk of handling
uncertainty, recall that first-order logic struggles to cope with complex domains such as
medical diagnosis or criminal investigation, where the agent cannot establish every
relevant fact as definitely true or false.
Example: diagnosing a toothache. Diagnosis is a classic example of a problem with
inherent uncertainty.
o Attempt 1: Toothache ⇒ HasCavity. But not all toothaches are caused by cavities,
so this rule is not true.
o Attempt 2: Toothache ⇒ Cavity ∨ GumDisease ∨ Abscess ∨ ... To be true, this rule
would need an almost unlimited list of possible causes, some of them unknown.
o Attempt 3: make the rule causal: Cavity ⇒ Toothache. This also fails, since not all
cavities cause toothaches.
There are three main reasons why such first-order logic systems fail in these domains.
The first is laziness: it is too much work to list the complete set of antecedents or
consequents needed to ensure an exceptionless rule, and the enormous rules that result
would be too hard to use. The second is theoretical ignorance: the expertise in the area
may not be sufficient; there may simply be no complete theory for the domain being
worked with. The third is practical ignorance: even if we know all the rules, we may be
uncertain about a particular case, because all the necessary tests or observations have
not been, or cannot be, carried out for that case.
An agent's knowledge in such situations can at best provide only a degree of belief in
the relevant sentences. Under such a scenario a sentence cannot be evaluated as simply
true or false; we have to associate with each sentence in the agent's knowledge a degree
of belief. This is not only true for the medical domain; it also holds for most other
judgmental domains such as law, business, design, automobile repair, gardening, and
so on. Dealing with degrees of belief is done through probability theory, which assigns
a numerical degree of belief between 0 and 1 to sentences.
Probability provides a way of summarizing the uncertainty, and this uncertainty comes
from our laziness and ignorance: laziness refers to our inability to completely quantify
the domain, and ignorance refers either to our theoretical ignorance of the domain or to
practical ignorance about particular cases. To make rational decisions, an agent must
also have preferences between the possible outcomes of its plans, and this is where
utility theory is used.
Utility theory is used to represent and reason with preferences. Here preference
refers to which options, choices, or alternatives are preferred; outcomes are completely
specified states; and utility theory is about figuring out which outcome is more useful,
i.e. its quality of being useful.



Basic Notations of Probability
Probability: Probability can be defined as the chance that an uncertain event will
occur. It is a numerical measure of the likelihood that an event will occur. The value
of a probability always lies between 0 and 1.
0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0 indicates that event A is impossible (will certainly not occur).
P(A) = 1 indicates that event A is certain to occur.

We can find the probability of an uncertain event by using the below formula:
P(A) = (number of outcomes favourable to A) / (total number of possible outcomes)
o P(¬A) = probability of event A not happening.
o P(¬A) + P(A) = 1.
Event: Each possible outcome of a variable is called an event. An event A is any subset
of Ω. This allows us to group possible worlds, e.g., "doubles rolled with two dice".
P(A) = Σ{ω∈A} P(ω), e.g., P(doubles rolled) = P(1,1) + P(2,2) + ... + P(6,6).
Sample space: The collection of all possible outcomes is called the sample space, the
set Ω of all possible worlds that might exist. For example, after two dice are rolled there
are 36 possible worlds (assuming distinguishable dice). Possible worlds are mutually
exclusive and exhaustive: only one can be true (the actual world), and at least one must
be true. ω ∈ Ω is a sample point (possible world).
A probability space or probability model is a sample space with an assignment
P(ω) for every ω ∈ Ω such that 0 ≤ P(ω) ≤ 1 and Σω P(ω) = 1. For example, for the two
dice rolls: P(1,1) = P(1,2) = P(1,3) = ... = P(6,6) = 1/36.
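A quick Python sketch of this probability model (illustrative only): 36 equally likely possible worlds, and an event's probability is the sum over the worlds it contains.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))    # the 36 possible worlds of two dice
P = {w: Fraction(1, 36) for w in omega}         # uniform probability model

doubles = {w for w in omega if w[0] == w[1]}    # the event "doubles rolled"
print(sum(P[w] for w in doubles))               # 1/6
```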
Random variables: Random variables are used to represent the events and objects
in the real world.
A proposition in the probabilistic world is simply an assertion that some event
(describing a set of possible worlds) is true. For example, θ = "doubles rolled" asserts
that the event "doubles" is true, i.e. that {[1,1] ∨ [2,2] ∨ ... ∨ [6,6]} is true. Propositions
can be compound, e.g. θ = (doubles ∧ (total > 4)). P(θ) = Σω∈θ P(ω): the probability of
a proposition is the sum of the probabilities of the worlds in which it holds.
Prior probability: The prior probability of an event is the probability computed before
observing new information.
Posterior Probability: The probability that is calculated after all evidence or
information has been taken into account. It is a combination of the prior probability
and the new information.



Conditional probability: Conditional probability is the probability of an event
occurring when another event has already happened.
Suppose we want to calculate the probability of event A given that event B has already
occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A ⋀ B) / P(B)
where P(A ⋀ B) is the joint probability of A and B, and P(B) is the marginal probability
of B.
If the probability of A is given and we need to find the probability of B given A, then it
will be:
P(B|A) = P(A ⋀ B) / P(A)
This can be explained using a Venn diagram: once B has occurred, the sample space is
reduced to the set B, and event A can only occur where it overlaps B, so we divide
P(A ⋀ B) by P(B).

Probability Distributions
One can express the probability of a single proposition, e.g. P(Weather=sunny) = 0.6;
P(Cavity=false) = P(¬cavity) = 0.1.
A probability distribution expresses the probabilities of all possible values of a random
variable. So, with P(Weather=sunny) = 0.6 and P(Weather=rain) = 0.1, for
Weather = {sun, rain, clouds, snow} we have P(Weather) = <0.6, 0.1, 0.29, 0.01>.
Hence a probability distribution can be seen as a total function that returns the
probability for every value of Weather. It is normalized, i.e., the sum of all the
probabilities adds up to 1.
Joint Probability Distribution: for a set of random variables, this gives the probability of
every combination of values of the variables, i.e. the probability of every event within
the sample space. P(Weather, Cavity) is a 4 x 2 table of values.

Full Joint Probability Distribution = joint distribution for all random variables in
domain. Every probability question about a domain can be answered by full joint
distribution, because every event is a sum of sample points (variable/value pairs). If
the variables are Cavity, Toothache, and Weather, then the full joint distribution is
given by P(Cavity, Toothache, Weather ). This joint distribution can be represented as
a 2 × 2 × 4 table with 16 entries. Because every proposition’s probability is a sum over
possible worlds, a full joint distribution suffices, in principle, for calculating the

probability of any proposition.
Some variables are continuous, e.g. P(Temp=82.3) = 0.23, P(Temp=82.5) = 0.24, etc.
(these are density values). We can also assert ranges: P(Temp < 85) or P(40 < Temp < 67).
We can express such distributions as a parameterized function of the value:
P(X = x) = U[18, 26](x) = the uniform density between 18 and 26. This is known as a
probability density function (pdf). P here is really a density; the whole range integrates
to 1. (Plot of the uniform density omitted.)

Kolmogorov's Axioms:
1. P(¬a) = 1 - P(a)
Proof:
P(¬a) = Σω∈¬a P(ω) (definition of the probability of an event)
= Σω∈¬a P(ω) + Σω∈a P(ω) - Σω∈a P(ω)
= Σω∈Ω P(ω) - Σω∈a P(ω) (grouping the first two terms)
= 1 - P(a)

2. The probability distribution of a discrete random variable sums to 1.

Proof: Let X be a discrete random variable with n possible values, X = {X1, X2, ..., Xn}.
P(X=X1) + P(X=X2) + ... + P(X=Xn)
= Σ(x=1 to n) P(x)
= Σω∈Ω P(ω) (summing over all sample points)
= 1

3. P(A|B)=1-P(A’|B)
Proof:
P(B)=P((A∩B)∪(A′∩B))
=P(A∩B)+P(A′∩B)
P(A∩B)=P(B)−P(A′∩B)
Divide each term by P(B)
P(A∩B)/P(B)=1- P(A′∩B)/P(B)
P(A|B)=1-P(A’|B) (By definition of conditional probability)



4. Inclusion-Exclusion principle

P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

Proof:

P(A∪B) = P(A ∪ (B−A))
= P(A) + P(B−A)
= P(A) + P(B−A) + P(A∩B) − P(A∩B)
= P(A) + P((B−A) ∪ (A∩B)) − P(A∩B)
= P(A) + P(B) − P(A∩B)

Eg: Given that the probability of the bus arriving late is 0.3 and the probability of a
student oversleeping is 0.4, find the probability that the student gets late (assuming
the two events are independent).

Let A = event that the bus arrives late
B = event that the student oversleeps
Given, P(A) = 0.3 and P(B) = 0.4
As per the inclusion-exclusion principle,
P(student gets late) = P(bus arrives late or student oversleeps)
= P(A∪B)
= P(A) + P(B) − P(A∩B)
= 0.3 + 0.4 − P(A).P(B) [since A and B are independent]
= 0.3 + 0.4 − 0.3×0.4 = 0.7 − 0.12 = 0.58

Expectations and probability


When money is attached to the outcomes of probabilistic beliefs, we can reason about
bets and expectations. Think of it as a game between two agents: Agent 1 states, "my
degree of belief in event a is 0.4." Agent 2 is then free to choose whether to wager for
or against a at stakes that are consistent with the stated degree of belief. That is, Agent
2 could choose to accept Agent 1's bet that a will occur, offering $6 against Agent 1's
$4. Or Agent 2 could accept Agent 1's bet that ¬a will occur, offering $4 against Agent
1's $6. Then we observe the outcome of a, and whoever is right collects the money. If
an agent's degrees of belief do not accurately reflect the world, then you would expect
that it would tend to lose money over the long run to an opposing agent whose beliefs
more accurately reflect the state of the world.



Thus, when its degrees of belief are inconsistent, Agent 1 loses money in all cases
because of the belief values it has attached. (Payoff table omitted.)

Inference using Full Joint Distributions


Logical inference asks whether something is true (entailed), given the KB.
Probabilistic inference instead asks how likely something is, given the KB. The process
is to compute the posterior probability of the query proposition, given the KB. Here we
use the full joint probability distribution as the KB, since it contains the probability of
every possible world. The mechanism of inference is to look up the probability of the
query proposition: extract and sum up the appropriate "slice" of the joint distribution.
Example: Consider a world with just three Boolean variables: Toothache (has one or
not), Cavity (has one or not), and Catch (the dentist's tool catches or not).
Start with the full joint distribution for this world (the probabilities of its eight
possible worlds):
                  toothache              ¬toothache
                  catch     ¬catch       catch     ¬catch
cavity            0.108     0.012        0.072     0.008
¬cavity           0.016     0.064        0.144     0.576
Marginalization:
Sum up the probabilities across the values of the other (non-specified) variables; in this
case, across Cavity and Catch.
Generally: P(Y) = Σz∈Z P(Y, z), or, by the product rule (conditioning):
P(Y) = Σz∈Z P(Y|z) P(z)



Mutual exclusion of possible worlds

Because the possible worlds (atomic events) are mutually exclusive, the probability of
any proposition φ is the sum over the atomic events in which it is true:
P(φ) = Σω:ω⊨φ P(ω)
For example, P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

Conditional probabilities

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.12 / 0.2 = 0.6

Normalization

The denominator P(toothache) can be viewed as a normalization constant α for the
distribution P(Cavity | toothache); it ensures that the probabilities of the distribution
add up to 1.
P(Cavity|toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [(0.108, 0.016) + (0.012, 0.064)]
= α (0.12, 0.08) = (0.6, 0.4)



Note that the proportions between (0.12, 0.08) and (0.6, 0.4) are the same; the latter are
just normalized by α so that they add up to 1. Since α just normalizes, one could also
normalize "manually" by dividing each value by the sum of the two.
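The whole inference-by-enumeration procedure above can be captured in a few lines of Python (a sketch, using the joint table given earlier): any proposition's probability is a sum over the worlds where it holds, and conditioning is just such a sum followed by normalization.

```python
# Full joint distribution from the table above; worlds keyed by (cavity, toothache, catch).
P = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Sum the probabilities of all worlds where the proposition holds."""
    return sum(p for world, p in P.items() if pred(*world))

# P(cavity or toothache) = 0.28
print(prob(lambda cavity, toothache, catch: cavity or toothache))

# P(Cavity | toothache) by normalization: alpha * (0.12, 0.08) = (0.6, 0.4)
p_cav  = prob(lambda c, t, _: c and t)
p_ncav = prob(lambda c, t, _: (not c) and t)
alpha = 1 / (p_cav + p_ncav)
print(alpha * p_cav, alpha * p_ncav)
```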

Problems
1. In a shipment of 20 apples, 3 are rotten. 3 apples are randomly selected.
What is the probability that all three are rotten if the first and second are
not replaced?

Solution:
P(first apple is rotten)=3/20
P(second apple is rotten)=2/19 (as first apple is not replaced)
P(third apple is rotten)=1/18 (as second apple is also not replaced)
P(all three are rotten)=3/20 * 2/19 * 1/18=1/1140

2. A die is cast twice and a coin is tossed twice. What is the probability that
the die will turn a 6 each time and the coin will turn a tail every time?

Solution:
P(die turn up 6 first time)=1/6
P(die turn up 6 second time)=1/6
P(coin turn up tail first time)=1/2
P(coin turn up tail second time)=1/2
P(die turn up 6 and coin turn up tail each time)=1/6*1/6*1/2*1/2=1/144

3. An instructor has a question bank with 300 Easy T/F, 200 Difficult T/F,
500 Easy MCQ, and 400 Difficult MCQ. If a question is selected randomly
from the question bank, What is the probability that it is an easy question
given that it is an MCQ?
Solution
P(Easy)=800/1400
P(MCQ)=900/1400
P(Easy ∧ MCQ) = 500/1400
P(Easy | MCQ) = P(Easy ∧ MCQ)/P(MCQ) = 500/900 = 5/9

Independence of variables
The full joint distribution gets huge fast: it is the cross-product of all variables and all
values in their ranges, and a different probability is required for every combination of
values. But are all of these variables really related? Is every variable really related to
all the others?
Consider P(toothache, catch, cavity, cloudy): a 2 x 2 x 2 x 4 joint distribution with 32
entries. By the product rule:
P(toothache, catch, cavity, cloudy) = P(cloudy|toothache, catch, cavity)
P(toothache, catch, cavity).
But is the weather really conditional on toothaches, cavities and the dentist's tools? No.
So realistically:
P(cloudy|toothache, catch, cavity) = P(cloudy).
So then actually:
P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity).



We say that the cloudy and dental variables are independent (absolute
independence). Effectively, the 32-element joint distribution table becomes one
8-element table plus one 4-element table.

Independence assertions are based on judgment and specific knowledge of the domain.
They can dramatically reduce the information needed to specify the full joint
distribution, as the sketch below illustrates.
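A tiny sketch of this factorization (illustrative only; the weather numbers are the example values quoted earlier in these notes): the 32-entry joint table is recovered as the product of an 8-entry dental table and a 4-entry weather table.

```python
P_dental = {  # (cavity, toothache, catch) -> probability, 8 entries (from the earlier table)
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
P_weather = {"sun": 0.6, "rain": 0.1, "clouds": 0.29, "snow": 0.01}  # 4 entries

joint = {(d, w): pd * pw                      # 32 entries, by absolute independence
         for d, pd in P_dental.items()
         for w, pw in P_weather.items()}
print(len(joint), round(sum(joint.values()), 10))   # 32 1.0
```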

Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge. In probability
theory, it relates the conditional probability and marginal probabilities of two random
events. Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental to
Bayesian statistics.

It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).


Bayes' theorem can be derived using product rule and conditional probability of event
A with known event B:
As from the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)
Equating the right hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B)     ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis
of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the
probability of hypothesis A given that we have observed evidence B.
P(B|A) is called the likelihood: assuming that the hypothesis is true, it is the probability
of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering
the evidence.



P(B) is called the marginal probability, the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai), hence Bayes'
rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(B|Ak) P(Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and
P(A). This is very useful in cases where we have good estimates of these three terms
and want to determine the fourth one. Suppose we want to perceive the effect of some
unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Eg: A doctor knows that the disease meningitis causes the patient to have a stiff neck,
say, 70% of the time. The probability that a patient has meningitis is 1/50,000, and the
probability that any patient has a stiff neck is 1%. Find the probability that a patient
with a stiff neck has meningitis.
o The Known probability that a patient has meningitis disease is 1/50,000.
o The Known probability that a patient has a stiff neck is 1%.
Let s be the proposition that the patient has a stiff neck and m the proposition that the
patient has meningitis. Then:
P(s|m) = 0.7
P(m) = 1/50000
P(s) = 0.01
P(m|s) = P(s|m) P(m) / P(s) = 0.7 × (1/50000) / 0.01 = 0.0014
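The arithmetic can be checked directly (a sketch using the figures quoted above, with the stiff-neck rate taken as the stated 1%):

```python
# Bayes' rule: P(m | s) = P(s | m) * P(m) / P(s)
p_s_given_m = 0.7        # stiff neck given meningitis
p_m = 1 / 50000          # prior probability of meningitis
p_s = 0.01               # prior probability of a stiff neck (1%)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0014
```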

Problems on Bayes Theorem:


In a neighbourhood, 90% of the children fell sick due to flu and 10% due to
measles, and no other disease. The probability of observing rashes for
measles is 0.95 and for flu is 0.08. If a child develops rashes, find the
probability that the child has flu.
Solution:
Let,
F: children with flu
M: children with measles
R: children showing the symptom of rash
P(F) = 90% = 0.9
P(M) = 10% = 0.1
P(R|F) = 0.08



P(R|M) = 0.95
By Bayes' theorem:
P(F|R) = P(R|F) P(F) / [P(R|F) P(F) + P(R|M) P(M)]
= (0.08 × 0.9) / (0.08 × 0.9 + 0.95 × 0.1)
= 0.072 / 0.167 ≈ 0.43

Bayes Theorem with combined evidence

Naive Bayes classifiers are a collection of classification algorithms based on Bayes'
Theorem. It is not a single algorithm but a family of algorithms which all share a
common principle: every pair of features being classified is independent of the others,
given the class. One of the simplest and most effective classification algorithms, the
Naïve Bayes classifier aids in the rapid development of machine learning models with
fast prediction capabilities. The Naïve Bayes algorithm is used for classification
problems and is widely used in text classification. In text classification tasks, the data
has high dimension (as each word represents one feature). It is used in spam filtering,
sentiment detection, rating classification, etc. The advantage of naïve Bayes is its
speed: it is fast, and prediction is easy even with high-dimensional data.
The “Naive” part of the name indicates the simplifying assumption made by
the Naïve Bayes classifier. The classifier assumes that the features used to describe
an observation are conditionally independent, given the class label. The “Bayes” part
of the name refers to Reverend Thomas Bayes, an 18th-century statistician and
theologian who formulated Bayes’ theorem.
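A minimal from-scratch sketch of the naive Bayes idea (the priors and word likelihoods below are made-up toy numbers, not values from these notes): posteriors are obtained by multiplying the class prior by the per-feature likelihoods, assumed conditionally independent given the class, and then normalizing.

```python
priors = {"spam": 0.4, "ham": 0.6}                       # P(class), assumed toy values
likelihood = {                                           # P(word | class), assumed toy values
    "spam": {"offer": 0.30, "meeting": 0.05, "free": 0.25},
    "ham":  {"offer": 0.05, "meeting": 0.20, "free": 0.02},
}

def classify(words):
    scores = {}
    for c in priors:
        score = priors[c]
        for w in words:                                  # multiply "independent" likelihoods
            score *= likelihood[c].get(w, 1e-3)          # small default for unseen words
        scores[c] = score
    total = sum(scores.values())                         # normalize to posteriors
    return {c: s / total for c, s in scores.items()}

print(classify(["free", "offer"]))     # spam gets the higher posterior
print(classify(["meeting"]))           # ham gets the higher posterior
```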

Probability and Wumpus world


Uncertainty arises in the wumpus world because the agent's sensors give only partial
information about the world. For example, consider a situation in which each of the
three reachable squares, [1,3], [2,2], and [3,1], might contain a pit. Pure logical
inference can conclude nothing about which square is most likely to be safe, so a
logical agent might have to choose randomly. A probabilistic agent performs better
than a logical agent.



The aim is to calculate the probability that each of the three squares contains a pit. The
relevant properties of the wumpus world are that (1) a pit causes breezes in all
neighboring squares, and (2) each square other than [1,1] contains a pit with
probability 0.2. The Wumpus world is divided into regions: the query node, whose
probability is sought; the known nodes, whose values are known; the frontier nodes,
which are the nodes adjacent to the known nodes other than the query node; and the
remaining nodes of the Wumpus world, which form the other nodes.

The probability calculations proceed by summing over the frontier (the calculation
figures are omitted here):
Case-1: [1,3] contains a pit, and the various combinations of frontier squares consistent
with the observed breezes are computed.
Case-2: [1,3] does not contain a pit, and the various combinations of frontier squares
consistent with the observed breezes are computed.
A sketch of this enumeration is given below.
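A short Python sketch of this frontier enumeration (under the assumptions stated above: known pit-free squares [1,1], [1,2], [2,1], breezes observed at [1,2] and [2,1], and a 0.2 pit prior for each frontier square [1,3], [2,2], [3,1]):

```python
from itertools import product

prior = 0.2
total = p13_true = 0.0
for p13, p22, p31 in product([True, False], repeat=3):
    # breeze at [1,2] needs a pit in [1,3] or [2,2];
    # breeze at [2,1] needs a pit in [2,2] or [3,1]
    if not ((p13 or p22) and (p22 or p31)):
        continue                                  # inconsistent with the observations
    weight = 1.0
    for pit in (p13, p22, p31):
        weight *= prior if pit else (1 - prior)   # prior over this frontier configuration
    total += weight
    if p13:
        p13_true += weight

print(round(p13_true / total, 2))   # about 0.31: [1,3] holds a pit roughly 31% of the time
```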

