AI Unit-3
RCA-403
Syllabus
UNIT-I INTRODUCTION: - Introduction to Artificial Intelligence, Foundations and
History of Artificial Intelligence, Applications of Artificial Intelligence, Intelligent
Agents, Structure of Intelligent Agents. Computer vision, Natural Language
Processing.
UNIT-II INTRODUCTION TO SEARCH: - Searching for solutions, uninformed
search strategies, informed search strategies, Local search algorithms and optimization
problems, Adversarial Search, Search for Games, Alpha - Beta pruning.
UNIT-III KNOWLEDGE REPRESENTATION & REASONING: -
Propositional logic, Theory of first order logic, Inference in First order logic,
Forward & Backward chaining, Resolution, Probabilistic reasoning, Utility
theory, Hidden Markov Models (HMM), Bayesian Networks.
Syllabus
UNIT-IV MACHINE LEARNING: - Supervised and unsupervised learning,
Decision trees, Statistical learning models, learning with complete data - Naive
Bayes models, Learning with hidden data – EM algorithm, Reinforcement
learning.
There are mainly four ways of knowledge representation which are given as
follows:
1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules
Logical Representation
Logical representation is a language with some concrete rules which deals with
propositions and has no ambiguity in representation. Logical representation means
drawing conclusions based on various conditions. Each sentence can be translated
into logic using its syntax and semantics.
Syntax: Syntax consists of the rules which decide how we can construct legal
sentences in the logic. It determines which symbols we can use in knowledge
representation and how to write those symbols.
Semantics: Semantics consists of the rules by which we can interpret a sentence
in the logic. Semantics also involves assigning a meaning to each sentence.
Logical representation can be categorized into mainly two logics:
• Propositional Logics
• Predicate logics
Propositional Logics (PL) or Propositional Calculus
A proposition (or sentence) is a declarative sentence whose value is either
‘TRUE’ or ‘FALSE’.
For example:
• New Delhi is the capital of India → Proposition
• Square root of 4 is 2 → Proposition
• No, thank you → not a proposition; we can’t assign TRUE or FALSE to it.
Types of propositions:
• Atomic or simple propositions
• Compound or complex or molecular propositions
Propositional Logics (PL) or Propositional Calculus
Atomic Propositions: In logic, an atomic proposition is a statement which cannot be
broken down into smaller statements; it is also simply called an “atom”. Small letters
like p, q, r, s etc. are used to represent atomic propositions. Examples of atomic
propositions are-
p : Sun rises in the east.
q : Sun sets in the west.
r : Apples are red.
s : Grapes are green.
Compound propositions are those propositions that are formed by combining one
or more atomic propositions using connectives. Capital letters like P, Q, R, S etc are
used to represent compound propositions. Examples-
P : Sun rises in the east and Sun sets in the west.
Q : Apples are red and Grapes are green.
Propositional Logics (PL) or Propositional Calculus
Syntax of Propositions:
Uppercase letters are used to denote the atomic symbols like P, Q, R and so on.
Logical connectives: ¬ (negation), ∧ (conjunction), ∨ (disjunction),
→ (implication), ↔ (bi-conditional)
Semantics of propositions: The semantics of the propositional calculus defines the
meanings of its sentences. The truth assignment for sentences involving connectives
is defined by the following truth table:
P       Q       ¬P      P ∧ Q   P ∨ Q   P → Q   P ↔ Q
True    True    False   True    True    True    True
True    False   False   False   True    False   False
False   True    True    False   True    True    False
False   False   True    False   False   True    True
Propositional Logics (PL) or Propositional Calculus
Semantics of propositions - Example: Consider the following propositions:
• P: It is cloudy
• Q: It is raining
Now,
• ¬P: It is not cloudy
• ¬Q: It is not raining
• P ∧ Q: It is cloudy and it is raining
• P ∨ Q: It is cloudy or it is raining
• P → Q: It is cloudy indicates it is raining
• P ↔ Q: It is cloudy indicates it is raining and it is raining indicates it is cloudy
Well-formed formulas (wff) in Propositional Logic
A well-formed formula consists of atomic symbols joined with connectives. Thus, if P
is a propositional variable then it is a wff.
• If P is a wff then ¬P is a wff
• If P and Q are wff then
P ∧ Q, P ∨ Q, P → Q, P ↔ Q are wff
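To make the recursive definition of wffs concrete, here is a minimal Python sketch (the tuple encoding and function name are illustrative assumptions) that evaluates any wff built from atoms and the connectives above:

# A minimal sketch: formulas are nested tuples, e.g. ("not", "p") or ("and", "p", "q").
# The encoding and names are illustrative assumptions, not part of the syllabus.
def evaluate(formula, assignment):
    """Evaluate a propositional wff under a truth assignment (dict: atom -> bool)."""
    if isinstance(formula, str):                      # atomic proposition
        return assignment[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], assignment)
    left, right = (evaluate(f, assignment) for f in formula[1:])
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right                    # P -> Q is equivalent to (not P) or Q
    if op == "iff":
        return left == right
    raise ValueError(f"Unknown connective: {op}")

# Example: P: it is cloudy (True), Q: it is raining (False)
print(evaluate(("implies", "P", "Q"), {"P": True, "Q": False}))   # False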
Example:
1. Convert the following sentences into predicate logic and then prove "Is someone
smiling?" using resolution:
• All people who are graduating are happy
• All happy people smile
• Someone is graduating
• Someone is smiling
2. Anyone passing his history exams and winning the lottery is happy. But anyone
who studies or is lucky can pass all his exams. John did not study but John is
lucky. Anyone who is lucky wins the lottery. Is John happy?
Predicate Logic or First Order Logic
Solution 2:
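A sketch of one standard formalization, assuming illustrative predicate names (Study, Lucky, Pass, Win, Happy):
1. ∀x (Pass(x, History) ∧ Win(x, Lottery) → Happy(x))
2. ∀x ∀y (Study(x) ∨ Lucky(x) → Pass(x, y))
3. ¬Study(John) ∧ Lucky(John)
4. ∀x (Lucky(x) → Win(x, Lottery))
Goal: Happy(John).
Outline: from 3, Lucky(John); with 2, Pass(John, History); with 4, Win(John, Lottery);
with 1, Happy(John), so John is happy. For a resolution proof, negate the goal
(¬Happy(John)), convert sentences 1-4 to clause form, and resolve until the empty
clause is derived.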
Probabilistic reasoning in Artificial intelligence
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
1. Bayes' rule
2. Bayesian Statistics
Probabilistic reasoning in Artificial intelligence
Conditional probability:
Conditional probability is a probability of occurring an event when another event
has already happened. Consider the event A when event B has already occurred, "the
probability of A under the conditions of B", it can be written as:
P(A|B) = P(A ∧ B) / P(B)
Where P(A ∧ B) = Joint probability of A and B, P(B) = Marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will
be given as:
P(B|A) = P(A ∧ B) / P(A)
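As a small worked illustration with assumed numbers (not taken from the notes): suppose 40% of days are cloudy and 30% of days are both cloudy and rainy. Then
P(Raining | Cloudy) = P(Raining ∧ Cloudy) / P(Cloudy) = 0.30 / 0.40 = 0.75.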
Probabilistic reasoning in Artificial intelligence
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning,
which determines the probability of an event with uncertain knowledge. In
probability theory, it relates the conditional probability and marginal probabilities of
two random events. It is a way to calculate the value of P(B|A) with the knowledge
of P(A|B).
As from the product rule we can write: P(A ∧ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A: P(A ∧ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get: P(A|B) P(B) = P(B|A) P(A)
Therefore,
P(A|B) = P(B|A) P(A) / P(B)
The above equation is called Bayes' rule or Bayes' theorem. This equation is the
basis of most modern AI systems for probabilistic inference.
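A minimal Python sketch of Bayes' rule; the function name and the numbers in the example (0.01, 0.9, 0.08) are assumptions used only for illustration:

def bayes_rule(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Assumed values: P(Disease) = 0.01, P(Positive | Disease) = 0.9,
# and overall P(Positive) = 0.08 (illustrative only).
p_disease_given_positive = bayes_rule(0.9, 0.01, 0.08)
print(p_disease_given_positive)   # ≈ 0.11, i.e. about an 11% chance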
Probabilistic reasoning in Artificial intelligence
Utility theory
• The main idea of Utility Theory is: an agent's preferences over possible outcomes
can be captured by a function that maps these outcomes to a real number; the
higher the number the more that agent likes that outcome. The function is called a
utility function.
• Utility Theory uses the notion of Expected Utility (EU) as a value that represents
the average utility of all possible outcomes of a state, weighted by the probability
that the outcome occurs.
• The agent can use probability theory to reason about uncertainty. The agent can
use utility theory for rational selection of actions based on preferences. Decision
theory is a general theory for combining probability with rational decisions:
Decision theory = Probability theory + Utility theory
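A short Python sketch of expected utility; the actions, probabilities and utility values below are assumptions for illustration only:

def expected_utility(outcomes):
    """Expected utility EU = sum over outcomes of P(outcome) * U(outcome)."""
    return sum(prob * utility for prob, utility in outcomes)

# Each action is a list of (probability, utility) pairs for its possible outcomes.
take_umbrella = [(0.3, 70), (0.7, 60)]     # assumed: rain vs. no rain
leave_umbrella = [(0.3, 0), (0.7, 100)]

# A rational agent selects the action with the highest expected utility.
print(expected_utility(take_umbrella))     # ≈ 63
print(expected_utility(leave_umbrella))    # ≈ 70, so the agent leaves the umbrella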
Hidden Markov Model (HMM)
HMM is a statistical Markov Model in which the system being modelled is assumed
to be a Markov process with unobserved (hidden) states.
In probability theory, a Markov model or Markov chain is a stochastic model used to
model randomly changing systems. It is assumed that future states depend only on the
current state, not on the events that occurred before it; this is called the Markov
Property.
• If the system state is fully observable → Markov Chain or Markov Model
• If the system state is partially observable → Hidden Markov Model (HMM)
Hidden Markov Model (HMM)
Definition of Markov Model: Let S = {s1, s2, s3, …, sn} be a set of states, and let the
process move from one state to another generating a sequence of states si1, si2, si3,
…, sik. The Markov chain property states that the probability of each subsequent
state depends only on the previous state, i.e.,
P(sik | si1, si2, si3, …, sik-1) = P(sik | sik-1)
For example: If a person is feeling healthy today, then the probability is high that his
health will be good tomorrow. The probability of an individual's health (H = healthy,
I = ill) on day four can be represented as:
P(H4) = P(H4|H3)·P(H3) + P(H4|I3)·P(I3)
      = P(H4|H3)·(P(H3|H2)·P(H2) + P(H3|I2)·P(I2))
        + P(H4|I3)·(P(I3|H2)·P(H2) + P(I3|I2)·P(I2))
      = ……….
The expansion continues in the same manner; such an expression represents a chain,
which is called a Markov Chain.
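A short Python sketch of this chained computation for the health example; the initial distribution and transition probabilities below are assumed values, not given in the notes:

# States: H = healthy, I = ill. All probabilities are illustrative assumptions.
initial = {"H": 0.8, "I": 0.2}                      # P(state on day 1)
transition = {("H", "H"): 0.7, ("H", "I"): 0.3,     # P(next state | current state)
              ("I", "H"): 0.4, ("I", "I"): 0.6}

def state_distribution(day):
    """Apply the Markov chain expansion above, one day at a time."""
    dist = dict(initial)
    for _ in range(day - 1):
        dist = {s: sum(dist[prev] * transition[(prev, s)] for prev in dist)
                for s in ("H", "I")}
    return dist

print(state_distribution(4)["H"])   # P(H4) under the assumed numbers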
Hidden Markov Model (HMM)
Example 1: Consider the following initial probability as:
• P(R) = 0.7
• P(S) = 0.3
Transition Probability:
• R → R = 0.4
• S → S = 0.7
• R → S = 0.6
• S → R = 0.3
What is the probability that it is raining on Day 2?
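Reading P(R) and P(S) as the Day 1 distribution, the answer follows by total probability:
P(Rain on Day 2) = P(R)·P(R → R) + P(S)·P(S → R) = 0.7 × 0.4 + 0.3 × 0.3 = 0.28 + 0.09 = 0.37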
Hidden Markov Model (HMM)
A hidden Markov model (HMM) is a Markov chain for which the state is only
partially observable. In other words, observations are related to the state of the
system, but they are typically insufficient to precisely determine the state.
An HMM is specified by the following components:
• Transition Probability: probability of transition of one state to another
• Emission Probability: probability of observed variable at a particular time
given the state of the hidden variable at that time
• Initial Probability: probability of the state at the start of the sequence.
Hidden Markov Model (HMM)
Let’s assume that we observe only three different kinds of weather, namely sunny,
rainy or foggy weather. We will now use a Markov Model to model the weather. The
Markov Model can be built using three states, given by
{S = sunny, R = rainy, F = foggy}
Given that today is sunny, what's the probability that tomorrow is sunny and the day
after is rainy?
Transition probability is given as:
Assume the weather yesterday was rainy, today is foggy. What is the probability that
tomorrow is raining?
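A Python sketch of how both questions are answered from a transition table; the matrix values below are assumptions chosen only to show the calculation:

# P(tomorrow | today); rows sum to 1. Values are illustrative assumptions.
transition = {
    "S": {"S": 0.8, "R": 0.05, "F": 0.15},
    "R": {"S": 0.2, "R": 0.6,  "F": 0.2},
    "F": {"S": 0.2, "R": 0.3,  "F": 0.5},
}

# Q1: today is sunny -> P(tomorrow sunny AND day after rainy)
# equals P(S -> S) * P(S -> R) by the Markov property.
print(transition["S"]["S"] * transition["S"]["R"])   # 0.8 * 0.05 = 0.04

# Q2: yesterday rainy, today foggy -> only today matters (Markov property),
# so P(tomorrow rainy) = P(F -> R).
print(transition["F"]["R"])                          # 0.3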
Bayesian Network
Bayesian networks are a type of Probabilistic Graphical Model that can be used to
build models from data and/or expert opinion. They can be used for a wide range of
tasks including prediction, anomaly detection, diagnostics, reasoning, time series
prediction and decision making under uncertainty. They are also commonly referred
to as Bayes nets, Belief networks and sometimes Causal networks. Let {X1, X2, …,
Xn} be some events. In a Bayesian Network, they can be represented as nodes. Now,
if a node has some dependency on another node, then an arrow/arc is drawn from
the parent node to the child node. It is interpreted as: the child node's occurrence
is influenced by the occurrence of its parent node. So a Bayesian Network is
represented as a directed acyclic graph (DAG).
Bayesian Network
A Bayesian network is a DAG:
• Each node corresponds to a random variable
• Directed edges link nodes
• Edge goes from parent to child
• Each node has a conditional probability dist. P(Xi | Parents(Xi) )
• P(node|parent(node))
• Topology of network represents causality
Example: The Alarm Problem (Pearl, 1990): You have a new burglar alarm
installed. It is reliable at detecting burglary, but also responds to minor earthquakes.
Two neighbours (John, Mary) promise to call you at work when they hear the alarm.
John always calls when he hears the alarm, but confuses the alarm with the phone
ringing (and calls then also). Mary likes loud music and sometimes misses the alarm!
Given evidence about who has and hasn't called, estimate the probability of a burglary.
Bayesian Network
Solution: Represent problem using 5 binary variables:
• B = a burglary occurs at your house
• E = an earthquake occurs at your house
• A = the alarm goes off
• J = John calls to report the alarm
• M = Mary calls to report the alarm
Constructing the Bayesian Network:
1. Order the variables in terms of causality (may be a partial order), e.g.,
{E, B} → {A} → {J, M}
2. Use these assumptions to create the graph structure of the Bayesian network.
3. Fill in the Conditional Probability Tables (CPTs):
a. One for each node
b. 2^p entries, where p is the number of parents
Note: Where do these probabilities come from? Expert knowledge or data (relative
frequency estimates).
Bayesian Network
Example: What is the probability that the alarm has gone off and both John and Mary
call the police, but nothing (no burglary, no earthquake) happened?
• Formulate the probability of a burglary given that both John and Mary call the
police.
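A Python sketch of the first query. Along the network the joint factorizes as P(J, M, A, ¬B, ¬E) = P(J|A)·P(M|A)·P(A|¬B,¬E)·P(¬B)·P(¬E); the CPT values used below are the ones commonly quoted for this example and should be treated as assumptions here:

# CPT values commonly used with the burglary-alarm example (assumed here).
P_B = 0.001                                          # P(Burglary)
P_E = 0.002                                          # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(Alarm | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls | Alarm)

# P(John calls, Mary calls, alarm on, no burglary, no earthquake)
joint = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(joint)   # ≈ 0.00063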
Bayesian Network
Advantages:
• A Bayesian network is a complete and non-redundant representation of the
domain
• Each subcomponent interacts directly with only a bounded number of other
components, regardless of the total number of components
• Local structure is usually associated with linear rather than exponential growth
in complexity
• In the domain of a Bayesian network, if each of the n random variables is
influenced by at most k others, then specifying each conditional probability
table will require at most 2^k numbers
Bayesian Network
Disadvantages:
• BNs tend to perform poorly on high-dimensional data
• It may be wiser to leave out very weak dependencies from the network in
order to restrict the complexity, but this yields a lower accuracy
• Only useful when prior knowledge is reliable
• Adding nodes in the wrong order makes the network unnecessarily complex and
unintuitive
End of UNIT-III