
SRI MANAKULA VINAYAGAR ENGINEERING COLLEGE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Subject Name: ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS


Subject Code: U20CST613

Prepared By:
Mrs.P. BHAVANI, Asst.Prof / CSE
Mr.S.KUMARAKRISHNAN, Asst.Prof / CSE

Verified by: Approved by:

SYLLABUS – UNIT - 3

Basic Probability Notations – Bayes Rule and its Applications – Bayesian Networks – Hidden Markov Models –
Kalman Filters, Dempster-Shafer Theory.


2 MARKS

1. What is called decision theory?

Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational decisions, called decision theory.
Decision theory = probability theory + utility theory.
2. Define prior probability.
P(A) denotes the unconditional or prior probability that the proposition A is true.
It is important to remember that P(A) can only be used when there is no other information.
3. Define conditional probability. (Jan 2024)
Once the agent has obtained some evidence concerning the previously unknown propositions making up the domain, conditional or posterior probabilities, written P(A|B), are used. It is important that P(A|B) can only be used when B is all that is known.
4. Define probability distribution.
A probability distribution assigns a probability to each possible value of a random variable, e.g. P(Weather) = (0.7, 0.2, 0.08, 0.02). This type of notation simplifies many equations.
5. What is an atomic event?
An atomic event is an assignment of particular values to all variables; in other words, the complete specification of the state of the domain.
6. Define joint probability distribution.
This completely specifies an agent's probability assignments to all propositions in the domain. The joint probability distribution P(X1, X2, ..., Xn) assigns probabilities to all possible atomic events, where X1, X2, ..., Xn are the variables.
7. Give the Bayes' rule equation.
We know that P(A ∧ B) = P(A|B) P(B) ... (1)
and P(A ∧ B) = P(B|A) P(A) ... (2)
Equating (1) and (2) and dividing by P(A), we get:
P(B|A) = P(A|B) P(B) / P(A)
8. What is meant by belief network?
A belief network is a graph in which the following holds:
- The nodes represent a set of random variables.
- A set of directed links or arrows connects pairs of nodes.
- Each node has a conditional probability table.
- The graph has no directed cycles.
9. What are the ways in which one can understand the semantics of a belief network?
There are two ways: to see the network as a representation of the joint probability distribution, or to view it as an encoding of a collection of conditional independence statements.
10. What is meant by Bayesian network? (Jan 2024)
A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph. It is also called a Bayes network, belief network, decision network, or Bayesian model.
11. Why is Bayesian network important in AI?
A Bayesian network in AI can be used to build models from data and expert knowledge. It comprises two parts: a directed acyclic graph and a table of conditional probabilities. Bayesian-network example: it can represent the probabilistic relationships between diseases and symptoms.
12. What is Bayes rule in artificial intelligence?
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the
probability of an event with uncertain knowledge. In probability theory, it relates the conditional probability
and marginal probabilities of two random events.
13. What is Bayes rule? Explain Bayes rule with an example.
Bayes' rule provides us with a way to update our beliefs based on the arrival of new, relevant pieces of evidence. For example, if we were trying to give the probability that a given person has cancer, we would initially just say it is whatever percentage of the population has cancer; evidence such as the person's test results would then update this estimate.
14. What is Hidden Markov model with example?


Hidden Markov Models (HMMs) are a class of probabilistic graphical models that allow us to predict a sequence of unknown (hidden) variables from a set of observed variables. A simple example of an HMM is predicting the weather (hidden variable) based on the type of clothes that someone wears (observed).
15. What are different Hidden Markov Models?
Hidden Markov models (HMMs) have been extensively used in biological sequence analysis, where several variants are applied to a variety of problems in molecular biology. Three types of HMMs are especially common: profile-HMMs, pair-HMMs, and context-sensitive HMMs.
16. How do hidden Markov models work?
The Hidden Markov Model (HMM) is a relatively simple way to model sequential data. A hidden Markov
model implies that the Markov Model underlying the data is hidden or unknown to you. More specifically, you
only know observational data and not information about the states.
17. What is Kalman filter in AI?
A Kalman Filter is an algorithm that takes data inputs from multiple sources and estimates unknown variables,
despite a potentially high level of signal noise.
18. What is the Kalman filter used for?
Kalman filters are used to optimally estimate the variables of interest when they can't be measured directly,
but an indirect measurement is available. They are also used to find the best estimate of states by combining
measurements from various sensors in the presence of noise.
19. What is Dempster-Shafer theory in artificial intelligence?
Often used as a method of sensor fusion, Dempster–Shafer theory is based on two ideas: obtaining degrees of
belief for one question from subjective probabilities for a related question, and Dempster's rule for combining
such degrees of belief when they are based on independent items of evidence.
20. What is Dempster-Shafer theory? Compare it with Bayesian reasoning.
Dempster-Shafer theory is a generalization of Bayesian reasoning in which belief mass may be assigned to sets of outcomes rather than only to single outcomes, so distributions that would be malformed as ordinary probabilities are permitted as a way to capture uncertainty (ignorance).


5 MARKS
Probabilistic reasoning:
• Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate the uncertainty in knowledge.
• In probabilistic reasoning, we combine probability theory with logic to handle the uncertainty.
• We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from someone's laziness and ignorance.
• In the real world there are many scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behaviour of someone in some situation," or "a match between two teams or two players." These are probable sentences for which we can assume that something will happen but are not sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
• When there are unpredictable outcomes.
• When specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge: Bayes' rule and Bayesian statistics.
As probabilistic reasoning uses probability and related terms, let us first understand some common terms:
Probability:
Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1:
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates total uncertainty in an event A.
3. P(A) = 1 indicates total certainty in an event A.

Dempster-Shafer Theory (Jan 2024)

Dempster-Shafer Theory (DST) was given by Arthur P. Dempster in 1967 and his student Glenn Shafer in 1976.
This theory was proposed for the following reasons:
• Bayesian theory is only concerned with single pieces of evidence.
• Bayesian probability cannot describe ignorance.
DST is an evidence theory; it combines all possible outcomes of the problem. Hence it is used to solve problems where there may be a chance that different evidence will lead to a different result.
The uncertainty in this model is handled by:
1. Considering all possible outcomes.
2. Belief: evidence that directly supports a possibility leads to belief in it.
3. Plausibility: the extent to which the evidence is compatible with a possible outcome.
Example
Let us consider a room where four people are present: A, B, C and D. Suddenly the lights go out, and when the lights come back, B has been stabbed in the back by a knife, leading to his death. No one came into the room and no one left the room. We know that B has not committed suicide. Now we have to find out who the murderer is.
To solve this, there are the following possibilities:
• Either {A}, {C} or {D} has killed him.
• Either {A, C}, {C, D} or {A, D} have killed him.
• Or all three of them have killed him, i.e. {A, C, D}.
Advantages:
• As we add more information, the uncertainty interval reduces.
• DST has a much lower level of ignorance.
• Diagnostic hierarchies can be represented using this.
• A person dealing with such problems is free to think about the evidence.


Disadvantages:
• Computation effort is high, as we have to deal with 2^n sets.

Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the probability
of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an
application of Bayes' theorem, which is fundamental to Bayesian statistics.
Bayes' theorem allows updating the probability prediction of an event by observing new information of the
real world.
Example: If the probability of cancer depends on a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
From the product rule we can write:
P(A ∧ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
P(A ∧ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B) ... (a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
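
To make the update concrete, here is a minimal Python sketch applying equation (a) to a disease/test scenario. All numbers are hypothetical, chosen only to illustrate how a prior is revised by evidence:

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B), with P(B) expanded by the
# total probability rule over the two cases A and not-A.
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical figures: 1% prior of disease, 90% test sensitivity,
# 5% false-positive rate.
posterior = bayes(p_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(posterior, 3))  # 0.154: a positive test raises 1% to about 15%

Note how the posterior stays well below 50% despite the positive test, because the prior is so small; this is exactly the belief update described in question 13 above.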
10 MARKS

Basic probability notation


Probability

Given the available evidence, A25 will get me there on time with probability 0.04.

(Fuzzy logic handles degree of truth, NOT uncertainty; e.g., WetGrass is true to degree 0.2)

Probabilistic assertions summarize effects of

laziness: failure to enumerate exceptions, qualifications, etc.

ignorance: lack of relevant facts, initial conditions, etc.

Subjective or Bayesian probability:

Probabilities relate propositions to one's own state of knowledge

e.g., P(A25 | no reported accidents) = 0.06

These are not claims of a "probabilistic tendency" in the current situation (but they might be learned from past experience of similar situations).

Probabilities of propositions change with new evidence:

e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15

(Analogous to logical entailment status KB ⊨ α, not truth.)

Making decisions under uncertainty

Suppose I believe the following:


P(A25 gets me there on time | ...) = 0.04

P(A90 gets me there on time | ...) = 0.70

P(A120 gets me there on time | ...) = 0.95

P(A1440 gets me there on time | ...) = 0.9999

Which action to choose?

Depends on my preferences for missing the flight vs. airport cuisine, etc.

Utility theory is used to represent and infer preferences

Decision theory = utility theory + probability theory
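
The following minimal Python sketch implements this decision rule: choose the action with the maximum expected utility. The probabilities are the ones listed above; the utility numbers are hypothetical, invented only to encode a preference for arriving on time without an absurdly early start:

# Decision theory = probability theory + utility theory.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}
u_on_time = {"A25": 100, "A90": 90, "A120": 80, "A1440": 10}  # hypothetical
u_miss = -200  # hypothetical cost of missing the flight

def expected_utility(action):
    p = p_on_time[action]
    return p * u_on_time[action] + (1 - p) * u_miss

best = max(p_on_time, key=expected_utility)
print(best, expected_utility(best))  # A120 66.0 under these assumed utilities

Under these assumptions A120 wins: A1440 is nearly certain to arrive on time, but the long wait makes its utility low, which is exactly the trade-off utility theory is meant to capture.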

Probabilistic Reasoning

Using logic to represent and reason, we can represent knowledge about the world with facts and rules, like the following:

bird(tweety).

fly(X) :- bird(X).
We can also use a theorem-prover to reason about the world and deduce new facts about the world, e.g.,
?- fly(tweety).
Yes
However, this often does not work outside of toy domains: non-tautologous certain rules are hard to find. A way to handle knowledge representation in real problems is to extend logic by using certainty factors. In other words, replace
IF condition THEN fact
with
IF condition with certainty x THEN fact with certainty f(x)
Unfortunately, we cannot really adapt logical inference to probabilistic inference, since the latter is not context-free. Replacing rules with conditional probabilities makes inference simpler.
Replace smoking -> lung cancer
or
lots of conditions, smoking -> lung cancer
with
P(lung cancer | smoking) = 0.6
Uncertainty is represented explicitly and quantitatively within probability theory, a formalism that has been
developed over centuries.
A probabilistic model describes the world in terms of a set S of possible states - the sample space. We don’t
know the true state of the world, so we (somehow) come up with a probability distribution over S which gives
the probability of any state being the true one.
The world is usually described by a set of variables or attributes. Consider the probabilistic model of a fictitious medical expert system. The 'world' is described by 8 binary-valued variables:
Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S


Bronchitis? B
Dyspnoea? D
Positive X-ray? X
We have 2^8 = 256 possible states or configurations, and so 256 probabilities to find.

Review of Probability Theory

The primitives in probabilistic reasoning are random variables, just as the primitives in Propositional Logic are propositions.

A random variable is not in fact a variable, but a function from a sample space S to another space, often the real numbers. For example, let the random variable Sum (representing the outcome of two die throws) be defined thus: Sum(die1, die2) = die1 + die2.
Each random variable has an associated probability distribution determined by the underlying distribution on the sample space.
Continuing our example: P(Sum = 2) = 1/36, P(Sum = 3) = 2/36, ..., P(Sum = 12) = 1/36.
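
A quick Python sketch confirming the distribution of Sum by enumerating the 36 equally likely outcomes of the two dice:

from collections import Counter
from fractions import Fraction

# Tally the random variable Sum over the whole sample space.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
for s in sorted(counts):
    print(s, Fraction(counts[s], 36))
# Prints 2 1/36, 3 1/18 (= 2/36), ..., 12 1/36, matching the text above.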
Consider the probabilistic model of the fictitious medical expert system mentioned before. The sample space
is described by 8 binary valued variables.

Explain Bayes' Rule (Jan 2024)

(Refer to the derivation of Bayes' theorem and the worked example in the 5-mark section above.)

Explain Bayesian networks


A simple, graphical notation for conditional independence assertions and hence for compact specification of
full joint distributions

Syntax:

a set of nodes, one per variable


a directed, acyclic graph (a link means "directly influences")

a conditional distribution for each node given its parents:

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

Example

Topology of network encodes conditional independence assertions:

Weather is independent of the other variables

Toothache and Catch are conditionally independent given Cavity

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:

• A burglar can set the alarm off

• An earthquake can set the alarm off

• The alarm can cause Mary to call

• The alarm can cause John to call
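
A minimal sketch of how this network defines a full joint distribution: each atomic event's probability is the product of each node's conditional probability given its parents, P(B, E, A, J, M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A). The CPT numbers below are the standard illustrative ones for this example and should be treated as assumptions:

p_b = 0.001                      # P(Burglary)
p_e = 0.002                      # P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
p_m = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pj = p_j[a] if j else 1 - p_j[a]
    pm = p_m[a] if m else 1 - p_m[a]
    return pb * pe * pa * pj * pm

# Alarm sounds and both neighbors call, with no burglary and no earthquake:
print(joint(b=False, e=False, a=True, j=True, m=True))  # about 0.000628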

Hidden Markov Model


A Hidden Markov Model is a temporal probabilistic model in which a single discrete random variable determines all the states of the system.

• It means that the possible values of the variable = the possible states in the system.

• For example: sunlight can be the variable and sun can be the only possible state.
• The structure of the Hidden Markov Model is restricted enough that the basic algorithms can be implemented using matrix representations.
Hidden Markov Model: The Concept
• In a Hidden Markov Model, every individual state has a limited number of transitions and emissions.
• A probability is assigned to each transition between states.
• The Markov (memoryless) property holds: given the current state, the future states are independent of the past states.
• The model is called "hidden" because the states themselves are not directly observed; only the emissions are.
Since an HMM is rich in mathematical structure, it can be implemented for practical applications.
This can be achieved using two algorithms (a sketch of the forward recursion follows this list):
1. Forward Algorithm.
2. Backward Algorithm.
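
Here is a hedged sketch of the Forward Algorithm on a toy umbrella-and-weather HMM; the transition and emission probabilities are hypothetical. It computes the likelihood of an observation sequence by summing over all hidden state sequences, using the matrix-style recursion the notes mention:

states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}                 # assumed initial distribution
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},      # P(next state | current state)
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"umbrella": 0.9, "none": 0.1},     # P(observation | state)
        "Sunny": {"umbrella": 0.2, "none": 0.8}}

def forward(observations):
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())  # P(observation sequence)

print(forward(["umbrella", "umbrella", "none"]))

The Backward Algorithm runs the analogous recursion from the end of the sequence; together the two support inference and parameter learning.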
Applications : Hidden Markov Model
• Speech Recognition.
• Gesture Recognition.
• Language Recognition.
• Motion Sensing and Analysis.
• Protein Folding.
Markov Model
• A Markov model is an imprecise model that is used for systems that do not have any fixed pattern of occurrence, i.e. randomly changing systems.
• A Markov model is based upon having a random probability distribution or pattern that may be analysed statistically but cannot be predicted precisely.
• In a Markov model, it is assumed that the future states depend only upon the current state and not the previously occurred states.
• There are four common Markov models, of which the most commonly used is the hidden Markov model.
Kalman Filter

A Kalman Filter is an algorithm that takes data inputs from multiple sources and estimates unknown variables,
despite a potentially high level of signal noise. Often used in navigation and control technology, the Kalman Filter
has the advantage of being able to predict unknown values more accurately than if individual predictions are made
using singular methods of measurement.

Optimal in what sense?

• If the noise is Gaussian: the Kalman filter minimizes the mean square error of the estimated parameters.
• If the noise is NOT Gaussian: the Kalman filter is still the best linear estimator; nonlinear estimators may be better.
• Gauss-Markov theorem: optimal among all linear, unbiased estimators.


• Rao-Blackwell theorem: optimal among non-linear estimators with Gaussian noise.


Why is Kalman Filtering so popular:

• Good results in practice due to optimality and structure.


• Convenient form for online real time processing.
• Easy to formulate and implement given a basic understanding.
• Measurement equations need not be inverted.
Why use the word "Filter"?
• The process of finding the "best estimate" from noisy data amounts to "filtering out" the noise.
• The Kalman filter doesn't just clean up the data measurements, but also projects them onto the state estimate.
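
The following is a minimal one-dimensional sketch of the predict/update cycle just described; the noise variances and measurements are hypothetical. The Kalman gain decides how much each noisy reading should move the state estimate:

def kalman_1d(measurements, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    # q: assumed process-noise variance, r: assumed measurement-noise variance
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: random-walk state model, variance grows
        k = p / (p + r)          # Kalman gain: trust in the new measurement
        x = x + k * (z - x)      # update: correct estimate by weighted innovation
        p = (1 - k) * p          # updated estimate variance shrinks
        estimates.append(x)
    return estimates

# Hypothetical noisy readings of a quantity whose true value is near 1.0:
print(kalman_1d([0.9, 1.1, 1.0, 1.2, 0.8, 1.05]))

Each successive estimate is a weighted blend of prediction and measurement, which is the "filtering out the noise" described above.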
Dempster-Shafer Theory
Dempster-Shafer Theory (DST) was given by Arthur P. Dempster in 1967 and his student Glenn Shafer in 1976.
This theory was proposed for the following reasons:
• Bayesian theory is only concerned with single pieces of evidence.
• Bayesian probability cannot describe ignorance.
DST is an evidence theory; it combines all possible outcomes of the problem. Hence it is used to solve problems where there may be a chance that different evidence will lead to a different result.
The uncertainty in this model is handled by:
1. Considering all possible outcomes.
2. Belief: evidence that directly supports a possibility leads to belief in it.
3. Plausibility: the extent to which the evidence is compatible with a possible outcome.
For example:
Let us consider a room where four people are present: A, B, C and D. Suddenly the lights go out, and when the lights come back, B has been stabbed in the back by a knife, leading to his death. No one came into the room and no one left the room. We know that B has not committed suicide. Now we have to find out who the murderer is.
To solve this, there are the following possibilities:
• Either {A}, {C} or {D} has killed him.
• Either {A, C}, {C, D} or {A, D} have killed him.
• Or all three of them have killed him, i.e. {A, C, D}.
• None of them have killed him, {ø} (let's say).
There will be possible evidence by which we can find the murderer, using the measure of plausibility.
Using the above example we can say:
Set of possible conclusions (P): {p1, p2, ..., pn}
where P is the set of possible conclusions and must be exhaustive, i.e. at least one pi must be true, and all pi must be mutually exclusive.
The power set will contain 2^n elements, where n is the number of elements in the possible set.
For example:

If P = {a, b, c}, then the power set is given as
{ø, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, i.e. 2^3 = 8 elements.
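
A short Python sketch generating this power set with itertools (subset ordering may differ from the listing above):

from itertools import chain, combinations

def power_set(p):
    # All subsets of p, from the empty set up to p itself: 2**len(p) in total.
    return list(chain.from_iterable(combinations(p, r) for r in range(len(p) + 1)))

print(power_set(["a", "b", "c"]))       # 8 subsets, from () to ('a', 'b', 'c')
print(len(power_set(["a", "b", "c"])))  # 2**3 = 8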
Mass function m(K): An interpretation of m({K or B}) is that there is evidence for {K or B} which cannot be divided among the more specific beliefs K and B.
Belief in K: The belief in element K of the power set is the sum of the masses of the elements which are subsets of K. This can be explained through an example.
Let us say K = {a, b, c}. Then
Bel(K) = m(a) + m(b) + m(c) + m(a, b) + m(a, c) + m(b, c) + m(a, b, c)
Plausibility in K: It is the sum of the masses of the sets that intersect with K, i.e.
Pl(K) = m(a) + m(b) + m(c) + m(a, b) + m(b, c) + m(a, c) + m(a, b, c)
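
A minimal sketch of these two definitions, representing power-set elements as frozensets; the mass values are hypothetical numbers for the murder example and must sum to 1:

# Hypothetical mass assignments over the frame {A, C, D}.
m = {
    frozenset({"A"}): 0.2,
    frozenset({"C"}): 0.1,
    frozenset({"A", "D"}): 0.3,
    frozenset({"A", "C", "D"}): 0.4,  # mass on the whole frame = ignorance
}

def bel(k):
    # Belief: total mass of all subsets of k.
    return sum(v for s, v in m.items() if s <= k)

def pl(k):
    # Plausibility: total mass of all sets that intersect k.
    return sum(v for s, v in m.items() if s & k)

k = frozenset({"A", "D"})
print(bel(k), pl(k))  # 0.5 0.9; Bel <= Pl, and the gap is the uncertainty interval

As the advantages below note, adding evidence moves mass off the whole frame onto smaller sets, narrowing the [Bel, Pl] interval.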
Characteristics of Dempster-Shafer Theory:
• It accounts for ignorance explicitly, while the masses over all events still aggregate to 1.
• Ignorance is reduced in this theory by adding more and more evidence.
• Dempster's combination rule is used to combine various types of possibilities.
Advantages:
• As we add more information, the uncertainty interval reduces.
• DST has a much lower level of ignorance.
• Diagnostic hierarchies can be represented using this.
• A person dealing with such problems is free to think about the evidence.
Disadvantages:
• Computation effort is high, as we have to deal with 2^n sets.
