AI&ML 4 & 5 Module Notes
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These
sentences are formed from a predicate symbol followed by a parenthesis with a
sequence of terms.
o We can represent atomic sentences as
Predicate(term1, term2, ..., termN).
Example: Ravi and Ajay are brothers => Brothers(Ravi, Ajay).
Chinky is a cat => cat(Chinky).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
First-order logic statements can be divided into two parts:
o Subject: the subject is the main part of the statement.
o Predicate: the predicate is a relation that binds the terms together in a statement.
Kinship domain
The kinship (family relationship) domain is a typical example domain for first-order logic, with predicates such as Parent, Sibling, and Brother and functions such as Mother and Father.
Nested Quantifiers
Nested quantifiers are quantifiers that occur within the scope of other quantifiers.
Example: ∀x∃yP(x, y). Quantifier order matters! ∀x∃yP(x, y) ≠ ∃y∀xP(x, y)
Let L(x, y) be the statement “x loves y,” where the domain for both x and y consists of
all people in the world. Use quantifiers to express each of these statements.
a) Everybody loves Jerry. ∀x L(x, Jerry)
b) Everybody loves somebody. ∀x∃yL(x, y)
c) There is somebody whom everybody loves. ∃y∀xL(x, y)
d) Nobody loves everybody. ∀x∃y¬L(x, y) or ¬∃x∀yL(x, y)
e) Everyone loves himself or herself. ∀x L(x, x)
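Quantifier order can be checked directly over a small finite domain. The following Python sketch uses a made-up domain of three people and an illustrative loves relation (both are assumptions, not part of the notes) to show that ∀x∃y L(x, y) may hold while ∃y∀x L(x, y) fails.

# Checking nested-quantifier order over a small, made-up finite domain.
people = ["Alice", "Bob", "Jerry"]
loves = {("Alice", "Jerry"), ("Bob", "Alice"), ("Jerry", "Jerry")}

def L(x, y):
    return (x, y) in loves

# ∀x ∃y L(x, y): everybody loves somebody.
forall_exists = all(any(L(x, y) for y in people) for x in people)
# ∃y ∀x L(x, y): there is somebody whom everybody loves.
exists_forall = any(all(L(x, y) for x in people) for y in people)

print(forall_exists)   # True: each person loves at least one person
print(exists_forall)   # False: no single person is loved by everyone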
▪ 0 is an identity element for addition:
▪ ∀m NatNum(m) ⇒ +(0, m) = m
A set is a collection of objects; any one of the objects in a set is called a member or
an element of the set.
The basic statement in set theory is element inclusion: an element a is included in
some set S. Formally, this is written as a ∈ S.
Such statements are either true or false, depending on the context, i.e., on the particular
sets involved: the same inclusion statement may be true for one set and false for another.
If a statement S is true in a given context C, we say the statement is valid in C.
Subsets:
∀ s1, s2 s1 ⊆ s2 ⇔ (∀ x x ∈ s1 ⇒ x ∈ s2) .
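The subset definition can be read directly as a test over finite sets; a minimal Python sketch (the function name subset is an illustrative choice):

# Subset definition: s1 ⊆ s2 ⇔ (∀x x ∈ s1 ⇒ x ∈ s2).
def subset(s1, s2):
    return all(x in s2 for x in s1)    # every element of s1 is also an element of s2

print(subset({1, 2}, {1, 2, 3}))   # True
print(subset({1, 4}, {1, 2, 3}))   # False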
List vs Sets
Lists are similar to sets. The differences are that lists are ordered and the same element
can appear more than once in a list. We can use the vocabulary of Lisp for lists:
Nil is the constant list with no elements;
Cons, Append, First, and Rest are functions; and
Find is the predicate that does for lists what Member does for sets.
List? is a predicate that is true only of lists.
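A minimal sketch of this list vocabulary, mapping it onto nested Python tuples (the representation is an assumption made for illustration):

# Lisp-style list vocabulary on nested Python tuples.
Nil = None                                 # Nil: the list with no elements

def Cons(x, l): return (x, l)              # Cons: put x in front of list l
def First(l):   return l[0]                # First: head of a non-empty list
def Rest(l):    return l[1]                # Rest: tail of a non-empty list

def Append(l1, l2):
    # Append: concatenation of two lists.
    return l2 if l1 is Nil else Cons(First(l1), Append(Rest(l1), l2))

def Find(x, l):
    # Find does for lists what Member does for sets: is x an element of l?
    return False if l is Nil else (First(l) == x or Find(x, Rest(l)))

nums = Cons(1, Cons(2, Cons(2, Nil)))      # lists are ordered and allow repeats
print(First(nums), Find(2, nums), Find(5, nums))   # 1 True False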
As in propositional logic, we also have inference rules in first-order logic. Following are
some basic inference rules in FOL:
o Universal Instantiation
o Existential Instantiation
Universal Instantiation:
o Universal Instantiation (UI), also called universal elimination, is a valid
inference rule. It can be applied multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for
the variable.
o The UI rule states that, for a variable v and any ground term g (a term without
variables), from ∀v α we can infer the sentence obtained by substituting g for v:
o ∀v α ⊢ SUBST({v/g}, α)
Example: 1.
o From "Every person likes ice-cream", i.e. ∀x Likes(x, IceCream), we can infer
"John likes ice-cream", i.e. Likes(John, IceCream), by substituting the constant John for x.
Example: 2.
o "All kings who are greedy are Evil." So let our knowledge base contains this
detail as in the form of FOL:
o ∀x king(x) ∧ greedy (x) → Evil (x),
So from this information, we can infer any of the following statements using Universal
Instantiation:
o King(John) ∧ Greedy(John) → Evil(John)
o King(Richard) ∧ Greedy(Richard) → Evil(Richard)
o King(Father(John)) ∧ Greedy(Father(John)) → Evil(Father(John))
Existential Instantiation:
o Existential Instantiation, also called Existential Elimination, is a valid
inference rule in first-order logic.
o It can be applied only once to replace the existential sentence.
o The new KB is not logically equivalent to old KB, but it will be satisfiable if old
KB was satisfiable.
o Represented as: from ∃v α, infer SUBST({v/k}, α), where k is a new constant
symbol that does not appear elsewhere in the knowledge base.
Example:
From the given sentence: ∃x Crown(x) ∧ OnHead(x, John),
So we can infer: Crown(K) ∧ OnHead( K, John), as long as K does not appear in the
knowledge base.
o The constant symbol K used above is called a Skolem constant.
o Existential Instantiation is a special case of the Skolemization process.
Example:
We will use Generalized Modus Ponens for "kings are evil": we find some x such that x is a
king and x is greedy, so we can infer that x is evil.
1. p1' is King(John); p1 is King(x)
2. p2' is Greedy(y); p2 is Greedy(x)
3. θ is {x/John, y/John}; q is Evil(x)
4. SUBST(θ, q) is Evil(John).
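A small sketch of applying SUBST to an atomic sentence, using tuples for atoms and lowercase strings for variables (this encoding is an assumption made for illustration):

# Applying a substitution θ to an atomic sentence.
# Atoms are tuples, e.g. ("Evil", "x") stands for Evil(x); lowercase args are variables.
def is_variable(t):
    return isinstance(t, str) and t[0].islower()

def subst(theta, atom):
    # Replace each variable argument by its binding in θ, if it has one.
    pred, *args = atom
    return (pred, *(theta.get(a, a) if is_variable(a) else a for a in args))

theta = {"x": "John", "y": "John"}
q = ("Evil", "x")
print(subst(theta, q))   # ('Evil', 'John'), i.e. Evil(John)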
Unification
o Unification is a process of making two different logical atomic expressions
identical by finding a substitution. Unification depends on the substitution
process.
o It takes two literals as input and makes them identical using substitution.
o Let Ψ1 and Ψ2 be two atomic sentences and 𝜎 be a unifier such that, Ψ1𝜎 = Ψ2𝜎,
then it can be expressed as UNIFY(Ψ1, Ψ2).
o Example: Find the MGU for Unify{King(x), King(John)}
Let Ψ1 = King(x), Ψ2 = King(John),
Substitution θ = {John/x} is a unifier for these atoms; applying this substitution
makes both expressions identical.
o The UNIFY algorithm is used for unification, which takes two atomic sentences
and returns a unifier for those sentences (If any exist).
o Unification is a key component of all first-order inference algorithms.
o It returns fail if the expressions do not match with each other.
o The simplest substitution that makes the expressions identical (the one placing the fewest constraints on the variables) is called the Most General Unifier or MGU.
Conditions for Unification:
Following are some basic conditions for unification:
o Predicate symbol must be same, atoms or expression with different predicate
symbol can never be unified.
o Number of Arguments in both expressions must be identical.
o Unification will fail if there are two similar variables present in the same
expression.
Unification Algorithm:
Algorithm: Unify(Ψ1, Ψ2)
Step. 1: If Ψ1 or Ψ2 is a variable or constant, then:
a) If Ψ1 or Ψ2 are identical, then return NIL.
b) Else if Ψ1is a variable,
a. then if Ψ1 occurs in Ψ2, then return FAILURE
b. Else return {(Ψ2/Ψ1)}.
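Only the first step of the algorithm is listed above. A minimal Python sketch of the whole idea (variable binding with an occurs check, plus predicate-symbol and arity checks) might look like the following; the tuple encoding and helper names are assumptions for illustration, not the official pseudocode.

# Minimal unification sketch. Atoms/terms are tuples ("p", arg1, ...);
# lowercase strings are variables, capitalised strings are constants.
def is_var(t):
    return isinstance(t, str) and t[0].islower()

def occurs(v, t):
    # Occurs check: does variable v appear anywhere inside term t?
    return v == t or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def unify(x, y, theta=None):
    theta = {} if theta is None else theta
    if theta is False:
        return False                       # a FAILURE from an earlier step propagates
    if x == y:
        return theta
    if is_var(x):
        return unify_var(x, y, theta)
    if is_var(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple):
        if x[0] != y[0] or len(x) != len(y):
            return False                   # different predicate/function symbol or arity
        for a, b in zip(x[1:], y[1:]):
            theta = unify(a, b, theta)
        return theta
    return False

def unify_var(v, t, theta):
    if v in theta:
        return unify(theta[v], t, theta)
    if occurs(v, t):
        return False                       # occurs-check failure
    return {**theta, v: t}

print(unify(("King", "x"), ("King", "John")))                  # {'x': 'John'}
# Exercise 1 below, written in this sketch's case convention
# (the constant a becomes "A", the variable Y becomes "y"):
print(unify(("p", ("f", "A"), ("g", "y")), ("p", "x", "x")))   # False: f and g clash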
For each pair of the following atomic sentences find the most general unifier (If exist).
1. Find the MGU of {p(f(a), g(Y)) and p(X, X)}
Sol: S0 => Here, Ψ1 = p(f(a), g(Y)), and Ψ2 = p(X, X)
SUBST θ= {f(a) / X}
S1 => Ψ1 = p(f(a), g(Y)), and Ψ2 = p(f(a), f(a))
Now g(Y) must be unified with f(a); since the function symbols g and f differ, unification fails.
Unification is not possible for these expressions.
Substitute y/Richard to get the left child and x/IBM to get the right child; applying both
substitutions yields the common child.
Forward Chaining:
Example: (the rules and facts used here are the numbered statements (1)-(8) listed again under Backward Chaining below)
Step-2:
At the second step, we add those facts that can be inferred from the available facts whose
premises are satisfied.
Rule-(1) does not have its premises satisfied yet, so it is not applied in the first iteration.
Rule-(2) and (3) are already added.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added; it is
inferred from the conjunction of Rule (2) and (3).
Step-3:
At step-3, Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert), which is inferred from all the available facts. Hence we have reached our
goal statement.
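A compact forward-chaining sketch of this example; the tuple encoding of facts and the hand-written rule checks are assumptions for illustration, not a general FOL forward chainer.

# Forward chaining to a fixed point on the crime example.
facts = {("American", "Robert"), ("Missile", "T1"),
         ("Owns", "A", "T1"), ("Enemy", "A", "America")}

def apply_rules(kb):
    new = set()
    for f in list(kb):
        if f[0] == "Missile":                           # Rule: Missile(p) -> Weapon(p)
            new.add(("Weapon", f[1]))
            if ("Owns", "A", f[1]) in kb:               # Rule: Missile(p) ^ Owns(A, p) -> Sells(Robert, p, A)
                new.add(("Sells", "Robert", f[1], "A"))
        if f[0] == "Enemy" and f[-1] == "America":      # Rule: Enemy(p, America) -> Hostile(p)
            new.add(("Hostile", f[1]))
    for p in [f[1] for f in kb if f[0] == "American"]:  # Rule (1): American ^ Weapon ^ Sells ^ Hostile -> Criminal
        for q in [f[1] for f in kb if f[0] == "Weapon"]:
            for s in [f for f in kb if f[0] == "Sells" and f[1] == p and f[2] == q]:
                if ("Hostile", s[3]) in kb:
                    new.add(("Criminal", p))
    return new - kb

while True:                                             # iterate until no new facts appear
    delta = apply_rules(facts)
    if not delta:
        break
    facts |= delta

print(("Criminal", "Robert") in facts)                  # True, matching step-3 above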
Backward Chaining:
Backward chaining is also known as backward deduction or backward reasoning when using
an inference engine. A backward-chaining algorithm is a form of reasoning that starts with
the goal and works backward, chaining through rules to find known facts that support the
goal.
Example:
In backward-chaining, we will use the same above example, and will rewrite all the
rules.
o American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
o Owns(A, T1) ...(2)
o Missile(T1) ...(3)
o Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ...(4)
o Missile(p) → Weapon(p) ...(5)
o Enemy(p, America) → Hostile(p) ...(6)
o Enemy(A, America) ...(7)
o American(Robert) ...(8)
Step-2:
At the second step, we infer other facts from the goal fact that satisfy the rules. As we can
see in Rule-(1), the goal predicate Criminal(Robert) is present with the substitution
{Robert/p}. So we add all the conjunctive facts below the first level and replace p with
Robert.
Here we can see that American(Robert) is a fact, so it is proved here.
Step-3:
At step-3, we extract the further fact Missile(q), which is inferred from Weapon(q), as it
satisfies Rule-(5). Weapon(q) is also true with the substitution of the constant T1 for q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r),
which satisfies Rule-(4) with the substitution of A in place of r. So these two statements are
proved here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies
Rule-(6). Hence all the statements are proved true using backward chaining.
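A small backward-chaining sketch for the same query; to keep it short, the rules are written here as ground Horn clauses with the substitutions {p/Robert, q/T1, r/A} already applied (a full version would unify each goal against rule heads).

# Backward chaining: prove a goal from facts and ground Horn-clause rules.
rules = {
    ("Criminal", "Robert"): [("American", "Robert"), ("Weapon", "T1"),
                             ("Sells", "Robert", "T1", "A"), ("Hostile", "A")],
    ("Weapon", "T1"): [("Missile", "T1")],
    ("Sells", "Robert", "T1", "A"): [("Missile", "T1"), ("Owns", "A", "T1")],
    ("Hostile", "A"): [("Enemy", "A", "America")],
}
facts = {("American", "Robert"), ("Missile", "T1"),
         ("Owns", "A", "T1"), ("Enemy", "A", "America")}

def prove(goal):
    # A goal holds if it is a known fact, or if some rule concludes it and
    # all of that rule's premises can be proved recursively.
    if goal in facts:
        return True
    premises = rules.get(goal)
    return premises is not None and all(prove(p) for p in premises)

print(prove(("Criminal", "Robert")))   # True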
Resolution in FOL
Resolution is a theorem-proving technique that proceeds by building refutation proofs,
i.e., proofs by contradiction. It was invented by the mathematician John Alan Robinson in
1965. Resolution is used when several statements are given and we need to prove a
conclusion from those statements. Unification is a key concept in proofs by resolution.
Resolution is a single inference rule that can efficiently operate on the conjunctive normal
form (clausal form).
Step-4: Draw Resolution graph: Now in this step, we will solve the problem by
resolution tree using substitution. For the above problem, it will be given as follows:
Hence the negation of the conclusion has been proved as a complete contradiction with
the given set of statements.
Explanation of Resolution graph:
o In the first step of the resolution graph, ¬likes(John, Peanuts) and likes(John, x)
get resolved (cancelled) with the substitution {Peanuts/x}, and we are left with
¬food(Peanuts).
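A sketch of that single resolution step in code; the literal encoding (sign, predicate, argument tuple) and the hand-supplied substitution are assumptions for illustration.

# One resolution step: apply θ to both clauses, cancel a complementary pair, keep the rest.
def substitute(theta, literal):
    sign, pred, args = literal
    return (sign, pred, tuple(theta.get(a, a) for a in args))

def resolve(clause1, clause2, theta):
    c1 = [substitute(theta, lit) for lit in clause1]
    c2 = [substitute(theta, lit) for lit in clause2]
    for lit in c1:
        complement = (not lit[0], lit[1], lit[2])
        if complement in c2:
            return [l for l in c1 if l != lit] + [l for l in c2 if l != complement]
    return None   # no complementary pair: the clauses do not resolve

neg_goal = [(False, "likes", ("John", "Peanuts"))]                   # ¬likes(John, Peanuts)
clause = [(False, "food", ("x",)), (True, "likes", ("John", "x"))]   # ¬food(x) ∨ likes(John, x)
print(resolve(neg_goal, clause, {"x": "Peanuts"}))
# [(False, 'food', ('Peanuts',))], i.e. we are left with ¬food(Peanuts)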
Probability:
We can find the probability of an uncertain event A by using the formula
P(A) = (number of outcomes favourable to A) / (total number of outcomes).
Conditional probability can be explained using a Venn diagram: when B is the event that has
occurred, the sample space is reduced to the set B, and we can calculate the probability of
event A given that event B has already occurred by dividing the probability of P(A ∧ B) by
P(B):
P(A | B) = P(A ∧ B) / P(B)
Probability Distributions
One can express the probability of a proposition: –P(Weather=sunny) = 0.6 ;
P(Cavity=false) = P(¬cavity)=0.1
A probability distribution expresses all possible probabilities for some random variable. So for
P(Weather=sunny) = 0.6 and P(Weather=rain) = 0.1, the full distribution is
P(Weather) = {0.6, 0.1, 0.29, 0.01} for Weather = {sunny, rain, clouds, snow}.
Hence a probability distribution can be seen as a total function that returns probabilities
for all values of Weather. It is normalized, i.e., the sum of all the probabilities adds up to 1.
Joint Probability Distribution: for a set of random variables, gives probability for every
combination of values of every variable. – Gives probability for every event within the
sample space – P(Weather, Cavity) = a 4x2 matrix of values:
Full Joint Probability Distribution = joint distribution for all random variables in
domain. Every probability question about a domain can be answered by full joint
distribution, because every event is a sum of sample points (variable/value pairs). If
the variables are Cavity, Toothache, and Weather, then the full joint distribution is
given by P(Cavity, Toothache, Weather). This joint distribution can be represented as
a 2 × 2 × 4 table with 16 entries. Because every proposition’s probability is a sum over
possible worlds, a full joint distribution suffices, in principle, for calculating the
probability of any proposition.
Kolmogorov's Axioms:
1. P(¬a) = 1- P(a)
Proof:
P(¬a) = Σ_{ω∈¬a} P(ω)  (definition of the probability of an event)
= Σ_{ω∈¬a} P(ω) + Σ_{ω∈a} P(ω) − Σ_{ω∈a} P(ω)
= Σ_{ω∈Ω} P(ω) − Σ_{ω∈a} P(ω)  (grouping the first two terms)
= 1 − P(a)
3. P(A|B)=1-P(A’|B)
Proof:
P(B)=P((A∩B)∪(A′∩B))
=P(A∩B)+P(A′∩B)
P(A∩B)=P(B)−P(A′∩B)
Divide each term by P(B)
P(A∩B)/P(B)=1- P(A′∩B)/P(B)
P(A|B)=1-P(A’|B) (By definition of conditional probability)
P(A∪B) = P(A) + P(B) − P(A∩B)
Proof:
P(A∪B) = P(A ∪ (B−A))
= P(A) + P(B−A)  (A and B−A are disjoint)
= P(A) + P(B−A) + P(A∩B) − P(A∩B)
= P(A) + P((B−A) ∪ (A∩B)) − P(A∩B)
= P(A) + P(B) − P(A∩B)  (since (B−A) ∪ (A∩B) = B)
Marginalization:
Conditional probabilities
Normalization
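These three operations can be illustrated on a small full joint distribution over Toothache, Catch, and Cavity; the numbers below are illustrative values assumed for the example.

# Marginalization, conditioning, and normalization on a full joint distribution.
# Keys are (toothache, catch, cavity) truth values.
joint = {
    (True,  True,  True ): 0.108, (True,  False, True ): 0.012,
    (False, True,  True ): 0.072, (False, False, True ): 0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Marginalization: P(cavity) = sum over the other variables.
p_cavity = sum(p for (t, c, cav), p in joint.items() if cav)                       # 0.2

# Conditional probability: P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache).
p_toothache = sum(p for (t, c, cav), p in joint.items() if t)                      # 0.2
p_cavity_and_toothache = sum(p for (t, c, cav), p in joint.items() if t and cav)   # 0.12
print(p_cavity_and_toothache / p_toothache)                                        # 0.6

# Normalization: compute unnormalized terms for Cavity given toothache,
# then divide by their sum instead of computing P(toothache) explicitly.
unnormalized = [p_cavity_and_toothache, p_toothache - p_cavity_and_toothache]
print([v / sum(unnormalized) for v in unnormalized])                               # [0.6, 0.4]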
Problems
1. In a shipment of 20 apples, 3 are rotten. 3 apples are randomly selected.
What is the probability that all three are rotten if the first and second are
not replaced?
Solution:
P(first apple is rotten)=3/20
P(second apple is rotten)=2/19 (as first apple is not replaced)
P(third apple is rotten)=1/18 (as second apple is also not replaced)
P(all three are rotten)=3/20 * 2/19 * 1/18=1/1140
2. A die is cast twice and a coin is tossed twice. What is the probability that
the die will turn a 6 each time and the coin will turn a tail every time?
Solution:
P(die turn up 6 first time)=1/6
P(die turn up 6 second time)=1/6
P(coin turn up tail first time)=1/2
P(coin turn up tail second time)=1/2
P(die turn up 6 and coin turn up tail each time)=1/6*1/6*1/2*1/2=1/144
3. An instructor has a question bank with 300 Easy T/F, 200 Difficult T/F,
500 Easy MCQ, and 400 Difficult MCQ. If a question is selected randomly
from the question bank, What is the probability that it is an easy question
given that it is an MCQ?
Solution
P(Easy)=800/1400
P(MCQ)=900/1400
P(Easy ∧ 𝑀𝐶𝑄)=500/1400
P(Easy | MCQ)=P(Easy ∧ 𝑀𝐶𝑄)/P(MCQ)=500/900=5/9
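A quick check of problem 3 with exact fractions (just a verification of the arithmetic above):

from fractions import Fraction

p_mcq = Fraction(900, 1400)            # P(MCQ)
p_easy_and_mcq = Fraction(500, 1400)   # P(Easy ∧ MCQ)
print(p_easy_and_mcq / p_mcq)          # 5/9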
Independence of variables
The full joint distribution gets huge fast: it is the cross-product of all the variables over all
the values in their ranges, and a different probability is required for every combination of
values, conditional on all values of all the other variables. But are all of these variables
really related? Is every variable really related to all the others?
Consider P(toothache, catch, cavity, cloudy) → a 2 × 2 × 2 × 4 joint distribution = 32 entries.
By the product rule:
P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity)
P(toothache, catch, cavity).
But is the weather really conditional on toothaches, cavities, and the dentist's tools? No!
So realistically:
P(cloudy | toothache, catch, cavity) = P(cloudy).
So then actually:
P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity).
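A quick way to see the saving: with the independence assumption the 32-entry joint factors into two smaller tables (entry counts only; no probabilities are needed).

# Table sizes with and without the independence assumption.
dental_entries = 2 * 2 * 2     # P(Toothache, Catch, Cavity)
weather_entries = 4            # P(Cloudy/Weather) over its four values
print(dental_entries * weather_entries)   # 32 entries in the unfactored joint
print(dental_entries + weather_entries)   # 12 entries once the joint factors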
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge. In probability
theory, it relates the conditional probability and marginal probabilities of two random
events. Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental to
Bayesian statistics.
P(A | B) = P(B | A) P(A) / P(B) ... (a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis
of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of
hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the
evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the
evidence.
P(B) is called the marginal probability, the probability of the evidence.
In the general form, P(Ai | B) = P(B | Ai) P(Ai) / Σk P(B | Ak) P(Ak), where A1, A2, A3, ..., An
is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and
P(A). This is very useful in cases where we have good probability estimates of these three
terms and want to determine the fourth one. Suppose we want to perceive the effect of some
unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
Eg: A doctor knows that the disease meningitis causes the patient to have
a stiff neck, say, 70% of the time. The probability that a patient has meningitis
is 1/50,000, and the probability that any patient has a stiff neck is 1%. Find the
probability of meningitis given a stiff neck.
o The Known probability that a patient has meningitis disease is 1/50,000.
o The Known probability that a patient has a stiff neck is 1%.
Let s be the proposition that the patient has a stiff neck and m be the proposition that the
patient has meningitis, so we can calculate the following:
P(s|m) = 0.7
P(m) = 1/50000
P(s) = 0.01
Applying Bayes' rule: P(m|s) = P(s|m) P(m) / P(s) = (0.7 × 1/50000) / 0.01 = 0.0014.
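Plugging these numbers into Bayes' rule (a one-line check of the calculation above):

# P(m|s) = P(s|m) P(m) / P(s)
p_s_given_m = 0.7        # stiff neck given meningitis
p_m = 1 / 50000          # prior probability of meningitis
p_s = 0.01               # prior probability of a stiff neck (1%)
print(p_s_given_m * p_m / p_s)   # 0.0014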
Case-2: [1,3] does not have a pit, and the various combinations of frontier squares are computed.