
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING APEC

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


II YEAR / IV SEM
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT II PROBABILISTIC REASONING

SYLLABUS:
Acting under uncertainty – Bayesian inference – Naïve Bayes
models. Probabilistic reasoning – Bayesian networks – exact
inference in BN – approximate inference in BN – causal networks.

PART A
1. Define uncertainty and list the causes of uncertainty.
Uncertainty:
• The knowledge representation A→B means that if A is true, then B is true. In a situation where we are not sure whether A is true or not, this statement cannot be expressed; such a situation is called uncertainty.
• To represent uncertain knowledge, uncertain reasoning or probabilistic reasoning is used.
Causes of uncertainty in the real world:
1. Information from unreliable sources.
2. Experimental errors.
3. Equipment faults.
4. Temperature variation.
5. Climate change.

2. Define Probabilistic reasoning. Mention the need of probabilistic reasoning in AI.
Probabilistic reasoning:
• Probabilistic reasoning is a way of knowledge representation in which the concept of probability is applied to indicate the uncertainty in knowledge.
Need of probabilistic reasoning in AI:
• When there are unpredictable outcomes.
• When specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.


3. List the Ways to solve problems with uncertain knowledge.


• Bayes' rule
• Bayesian Statistics

4. Define Probability and the probability of occurrence.


• Probability can be defined as the chance that an uncertain event will occur.
• The value of a probability always lies between 0 and 1.
o 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
o P(A) = 0 indicates that event A will certainly not occur.
o P(A) = 1 indicates that event A is certain to occur.
• Formula to find the probability of an uncertain event:
P(A) = Number of favourable outcomes / Total number of outcomes

5. Define the terms event, sample space, random variables, prior probability and posterior probability.
• Event: Each possible outcome of a variable is called an event.
• Sample space: The collection of all possible events is called sample
space.
• Random variables: Random variables are used to represent the
events and objects in the real world.
• Prior probability: The prior probability of an event is probability
computed before observing new information.
• Posterior Probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of prior probability and new information.

6. Define Conditional probability.


• Conditional probability is the probability of an event occurring given that another event has already happened.
• Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B":

P(A|B) = P(A⋀B) / P(B)


where P(A⋀B) = joint probability of A and B, and
P(B) = marginal probability of B.

7. In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
Solution:
Let A be the event that a student likes Mathematics, and B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57

Hence, 57% of the students who like English also like Mathematics.

8. Define Bayesian Inference.


▪ Bayesian inference is a probabilistic approach to machine learning that
provides estimates of the probability of specific events.
▪ Bayesian inference is a statistical method for understanding the
uncertainty inherent in prediction problems.
▪ In practice, Bayesian inference is often carried out with algorithms such as Markov Chain Monte Carlo (MCMC), which combine prior probability distributions with the likelihood function to approximate the posterior distribution.

9. State Bayes' Theorem or Bayes' Rule.


• Bayes' theorem can be derived using product rule and conditional
probability of event A with known event B:
• Product Rule:
1. P(A ⋀ B)= P(A|B) P(B) or
2. P(A ⋀ B)= P(B|A) P(A)
• Conditional Probability:
• Let A and B be events,
• P(A|B) is the conditional probability of A given B,
• P(B|A) is the conditional probability of B given A.
• Equating the right-hand sides of the two product-rule equations gives:

P(A|B) = P(B|A) P(A) / P(B)    ... (a)

The above equation (a) is called Bayes' rule or Bayes' theorem.

This equation is the basis of most modern AI systems for probabilistic inference.


• P(A|B) is known as the posterior: the probability of hypothesis A after evidence B has been observed.
• P(B|A) is called the likelihood: the probability of the evidence given that the hypothesis is true.
• P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
• P(B) is called the marginal probability: the probability of the evidence alone.

10. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

P(cause | effect) = P(effect | cause) P(cause) / P(effect)

What is the probability that a patient has the disease meningitis given a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have
a stiff neck, and it occurs 80% of the time. He is also aware of
some more facts, which are given as follows:
o The Known probability that a patient has meningitis disease is
1/30,000.
o The Known probability that a patient has a stiff neck is 2%.
Solution
Let a be the proposition that patient has stiff neck and b be the
proposition that patient has meningitis.
So, calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133 ≈ 1/750

Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has meningitis.

11. Consider two events: A (it will rain tomorrow) and B (the sun will
shine tomorrow).
• Use Bayes’ theorem to compute the posterior probability of each event
occurring, given the resulting weather conditions for today:
P(A|sunny) = P(sunny|A) * P(A) / P(sunny)


P(B|sunny) = P(sunny|B) * P(B) / P(sunny)


where sunny is our evidence (the resulting weather condition for
today).

12. What are the applications of Bayes' theorem in Artificial Intelligence?
• It is used to calculate the next step of the robot when the already
executed step is given.
• Bayes' theorem is helpful in weather forecasting.

13. Define Bayesian Network.


• "A Bayesian network is a probabilistic graphical model which
represents a set of variables and their conditional dependencies using
a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network,
or Bayesian model.
• A Bayesian Network can be used for building models from data and expert opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities

14. Define Joint probability distribution.


• If variables are x1, x2, x3,....., xn, then the probabilities of a different
combination of x1, x2, x3.. xn, are known as Joint probability
distribution.
• P[x1, x2, x3, ..., xn] can be written in terms of the joint probability distribution as follows:
P[x1, x2, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
• In general for each variable Xi,
P(Xi|Xi-1, ............. , X1) = P(Xi |Parents(Xi ))

15. Write an algorithm for Constructing a Bayesian Network
1. Choose an ordering of the random variables X1, X2, ..., Xn.
2. For i = 1 to n:
a. Add node Xi to the network.
b. Select a minimal set of parents from X1, ..., Xi-1 such that P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi)).
c. Add a link from each parent to Xi.
d. Write down the conditional probability table P(Xi | Parents(Xi)).

16. Define Global semantics and local semantics.

Global Semantics: The full joint distribution is the product of the local conditional distributions:
P(x1, ..., xn) = ∏i P(xi | parents(Xi))

Local Semantics: Each node is conditionally independent of its nondescendants given its parents.

17. List the ways to understand the semantics of Bayesian Network


There are two ways to understand the semantics of the Bayesian
network, which is given below:
1. To understand the network as the representation of the Joint
probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of
conditional independence statements.
It is helpful in designing inference procedure.

18. What are the Applications of Bayesian networks in AI?


1. Spam filtering
2. Bio monitoring
3. Information retrieval
4. Image processing
5. Gene regulatory network
6. Turbo code
7. Document classification

19. Define inference in a Bayesian Network.

• A basic task for a Bayesian Network is to perform inference, which computes the marginal probability P(V=v) for each node V and each possible instantiation v.
• Inference can also be done on a Bayesian network when the values of
some nodes are known (as evidence) and wish to compute the
likelihood of values of other nodes.
• There are two types of inference on Bayesian networks: exact and
approximate.
• Exact inference algorithms compute the exact values of each marginal
or posterior probability, while approximate inference algorithms
sacrifice some accuracy of the probabilities to report results quickly.


PART B

1. Explain the concept of uncertainty and acting under uncertainty with suitable examples. Explain in detail about probabilistic reasoning.

UNCERTAINTY & PROBABILISTIC REASONING


1.1 Uncertainty:
1.1.1 Causes of uncertainty
1.2 Probabilistic reasoning:
1.2.1 Need of probabilistic reasoning in AI
1.2.2 Ways to solve problems with uncertain
knowledge
1.2.3 Probability
1.2.4 Conditional probability
1.2.4.1 Example

Agents almost never have access to the whole truth about their environment.
Agents must, therefore, act under uncertainty.

Handling uncertain knowledge


In this section, we look more closely at the nature of uncertain knowledge.
We will use a simple diagnosis example to illustrate the concepts involved.
Diagnosis, whether for medicine, automobile repair, or anything else, is a task that almost always involves uncertainty. Let us try to write rules for dental diagnosis using first-order logic, so that we can see how the logical approach breaks down. Consider the following rule:

Toothache ⇒ Cavity

The problem is that this rule is wrong. Not all patients with toothaches have cavities; some of them have gum disease, an abscess, or one of several other problems:

Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ ...

Unfortunately, in order to make the rule true, we have to add an almost unlimited list of possible causes. We could try turning the rule into a causal rule:

Cavity ⇒ Toothache


But this rule is not right either; not all cavities cause pain. The only way to fix the
rule is to make it logically exhaustive: to augment the left-hand side with all the
qualifications required for a cavity to cause a toothache. Even then, for the
purposes of diagnosis, one must also take into account the possibility that the
patient might have a toothache and a cavity that are unconnected. Trying to use
first-order logic to cope with a domain like medical diagnosis thus fails for three
main reasons:

• Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule, and too hard to use such rules.
• Theoretical ignorance: Medical science has no complete theory for the domain.
• Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.

The connection between toothaches and cavities is just not a logical


consequence in either direction. This is typical of the medical domain, as well as
most other judgmental domains: law, business, design, automobile repair,
gardening, dating, and so on. The agent's knowledge can at best provide only a
degree of belief in the relevant sentences. Our main tool for dealing with degrees of
belief will be probability theory, which assigns to each sentence a numerical
degree of belief between 0 and 1.

Probability provides a way of summarizing the uncertainty that comes from


our laziness and ignorance. We might not know for sure what afflicts a particular
patient, but we believe that there is, say, an 80% chance (that is, a probability of 0.8) that the patient has a cavity if he or she has a toothache.

That is, we expect that out of all the situations that are indistinguishable
from the current situation as far as the agent's knowledge goes, the patient will
have a cavity in 80% of them. This belief could be derived from statistical data (80% of the toothache patients seen so far have had cavities), from some general rules, or from a combination of evidence sources.

The 80% summarizes those cases in which all the factors needed for a cavity
to cause a toothache are present and other cases in which the patient has both
toothache and cavity but the two are unconnected. The missing 20% summarizes
all the other possible causes of toothache that we are too lazy or ignorant to confirm
or deny.


Design for a decision-theoretic agent


The algorithm below sketches the structure of an agent that uses decision theory
to select actions. The agent is identical, at an abstract level, to the logical agent.
The primary difference is that the decision-theoretic agent's knowledge of the
current state is uncertain; the agent's belief state is a representation of the
probabilities of all possible actual states of the world. As time passes, the agent
accumulates more evidence and its belief state changes. Given the belief state, the
agent can make probabilistic predictions of action outcomes and hence select the
action with highest expected utility.
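Since the agent-program figure is not reproduced in these notes, the following minimal Python sketch shows only the key selection step: choose the action with the highest expected utility. The actions, outcome probabilities, and utilities are purely illustrative values, not taken from the text.

# Illustrative sketch: choosing the action with the highest expected utility.
# The actions, outcome probabilities, and utilities below are made-up values.
outcomes = {
    "treat":    [(0.8, 10), (0.2, -5)],   # (probability, utility) pairs per outcome
    "wait":     [(0.5, 2), (0.5, -2)],
    "run_test": [(1.0, 1)],
}

def expected_utility(action):
    return sum(p * u for p, u in outcomes[action])

best_action = max(outcomes, key=expected_utility)
print(best_action, expected_utility(best_action))   # -> treat 7.0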

1.1.1 Causes of uncertainty:


Causes of uncertainty in the real world
1. Information obtained from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

1.2 Probabilistic reasoning:


• Probabilistic reasoning is a way of knowledge representation in which the concept of probability is applied to indicate the uncertainty in knowledge.

1.2.1 Need of probabilistic reasoning in AI:


o When there are unpredictable outcomes.


o When specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.
1.2.2 Ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics

1.2.3 Probability:
• Probability can be defined as the chance that an uncertain event will occur.
• The value of a probability always lies between 0 and 1.
o 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
o P(A) = 0 indicates that event A will certainly not occur.
o P(A) = 1 indicates that event A is certain to occur.
• Formula to find the probability of an uncertain event:

P(A) = Number of favourable outcomes / Total number of outcomes

P(¬A) = probability of event A not happening.


P(¬A) + P(A) = 1.

o Event: Each possible outcome of a variable is called an event.


o Sample space: The collection of all possible events is called
sample space.
o Random variables: Random variables are used to represent the
events and objects in the real world.
o Prior probability: The prior probability of an event is
probability computed before observing new information.
o Posterior Probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of prior probability and new information.

1.2.4 Conditional probability:


• Conditional probability is the probability of an event occurring given that another event has already happened.


• Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B":

P(A|B) = P(A⋀B) / P(B)

where P(A⋀B) = joint probability of A and B, and
P(B) = marginal probability of B.

• If the probability of A is given and we need to find the probability of B, then:

P(B|A) = P(A⋀B) / P(A)

1.2.4.1 Example:
In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
Solution:
Let A be the event that a student likes Mathematics, and B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57

Hence, 57% of the students who like English also like Mathematics.
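A quick check of this example in Python (the numbers are the ones given in the problem):

p_english = 0.70            # P(B): student likes English
p_both = 0.40               # P(A and B): student likes English and Mathematics

p_math_given_english = p_both / p_english
print(round(p_math_given_english, 2))   # 0.57, i.e. about 57%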

2. Explain in detail about Bayesian inference and Naive Bayes Model or Naive
Bayes Theorem or Bayes Rule.

Naive Bayes Model or Naive Bayes Theorem or Bayes Rule


2.1 Bayesian Inference
2.2 Bayes Theorem or Bayes Rule
2.3 Example - Applying Bayes' rule:
2.4 Application of Bayes' theorem in Artificial intelligence

2.1 Bayesian Inference


▪ Bayesian inference is a probabilistic approach to machine learning that
provides estimates of the probability of specific events.
▪ Bayesian inference is a statistical method for understanding the
uncertainty inherent in prediction problems.


▪ In practice, Bayesian inference is often carried out with algorithms such as Markov Chain Monte Carlo (MCMC), which combine prior probability distributions with the likelihood function to approximate the posterior distribution.
• The basis of Bayesian inference is the notion of a priori and a posteriori probabilities.
o The a priori (prior) probability is the probability of an event before any evidence is considered.
o The a posteriori (posterior) probability is the probability of an event after taking into account all available evidence.
• For example, if we want to know the probability that it will rain
tomorrow, our priori probability would be based on our knowledge of
the weather patterns in our area.

2.2 Bayes Theorem or Bayes Rule


• Bayes' theorem can be derived using product rule and conditional
probability of event A with known event B:
• Product Rule:
1. P(A ⋀ B) = P(A|B) P(B) or
2. P(A ⋀ B) = P(B|A) P(A)
• Equating the right-hand sides of both equations gives:

P(A|B) = P(B|A) P(A) / P(B)    ... (a)

The above equation (a) is called Bayes' rule or Bayes' theorem.

This equation is the basis of most modern AI systems for probabilistic inference.
• P(A|B) is known as the posterior: the probability of hypothesis A after evidence B has been observed.
• P(B|A) is called the likelihood: the probability of the evidence given that the hypothesis is true.
• P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
• P(B) is called the marginal probability: the probability of the evidence alone.
• In general,
P(B) = Σi P(Ai) P(B|Ai),


• Hence Bayes' rule can be written as:

P(Ai|B) = P(Ai) P(B|Ai) / Σj P(Aj) P(B|Aj)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
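A small Python sketch of this form of Bayes' rule over a set of mutually exclusive and exhaustive hypotheses; the priors and likelihoods are illustrative values, not taken from the text:

priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}         # P(Ai), sums to 1
likelihoods = {"A1": 0.9, "A2": 0.4, "A3": 0.1}     # P(B | Ai)

# Marginal probability of the evidence: P(B) = sum_i P(Ai) P(B | Ai)
p_b = sum(priors[a] * likelihoods[a] for a in priors)

# Posterior for each hypothesis: P(Ai | B) = P(Ai) P(B | Ai) / P(B)
posteriors = {a: priors[a] * likelihoods[a] / p_b for a in priors}
print(posteriors)                                   # the posteriors sum to 1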

2.3 Example 1 - Applying Bayes' rule:


Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

P(cause | effect) = P(effect | cause) P(cause) / P(effect)

What is the probability that a patient has the disease meningitis given a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to
have a stiff neck, and it occurs 80% of the time. He is also
aware of some more facts, which are given as follows:
The Known probability that a patient has meningitis disease
is 1/30,000.
The Known probability that a patient has a stiff neck is 2%.
Solution
Let a be the proposition that patient has stiff neck and b be the
proposition that patient has meningitis.
So, calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133 ≈ 1/750

Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has meningitis.
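The same calculation in Python:

p_stiff_given_meningitis = 0.8      # P(a|b)
p_meningitis = 1 / 30000            # P(b)
p_stiff_neck = 0.02                 # P(a)

p_meningitis_given_stiff = p_stiff_given_meningitis * p_meningitis / p_stiff_neck
print(p_meningitis_given_stiff)     # ~0.00133, i.e. roughly 1 in 750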


Example 2 - Applying Bayes' rule:


• Consider two events: A (it will rain tomorrow) and B (the sun will shine
tomorrow).
• Use Bayes’ theorem to compute the posterior probability of each event
occurring, given the resulting weather conditions for today:
P(A|sunny) = P(sunny|A) * P(A) / P(sunny)
P(B|sunny) = P(sunny|B) * P(B) / P(sunny)
where sunny is our evidence (the resulting weather condition for
today).
• From these equations,
o if event A is more likely to result in sunny weather than event B,
then the posterior probability of A occurring, given that the
resulting weather condition for today is sunny, will be higher than
the posterior probability of B occurring.
o Conversely, if event B is more likely to result in sunny weather than
event A, then the posterior probability of B occurring, given that the
resulting weather condition for today is sunny, will be higher than
the posterior probability of A occurring.

2.4 Application of Bayes' theorem in Artificial intelligence:


• It is used to calculate the next step of the robot when the already
executed step is given.
• Bayes' theorem is helpful in weather forecasting.

Naive Bayes Theorem


The dentistry example illustrates a commonly occurring pattern in which a single
cause directly influences a number of effects, all of which are conditionally
independent, given the cause. The full joint distribution can be written as

P(Cause, Effect1, ..., Effectn) = P(Cause) ∏i P(Effecti | Cause)
Such a probability distribution is called a naive Bayes model—“naive” because it is


often used (as a simplifying assumption) in cases where the “effect” variables are not
strictly independent given the cause variable. (The naive Bayes model is sometimes
called a Bayesian classifier, a somewhat careless usage that has prompted true
Bayesians to call it the idiot Bayes model.) In practice, naive Bayes systems often
work very well, even when the conditional independence assumption is not strictly
true.
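A minimal Python sketch of a naive Bayes model with one cause (Cavity) and two conditionally independent effects (Toothache, Catch); the CPT numbers are illustrative assumptions, not taken from the text:

p_cavity = {True: 0.2, False: 0.8}             # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}           # P(Toothache=true | Cavity)
p_catch = {True: 0.9, False: 0.2}               # P(Catch=true | Cavity)

def posterior(toothache, catch):
    # P(Cavity | effects) is proportional to P(Cavity) * prod_i P(effect_i | Cavity)
    scores = {}
    for cavity in (True, False):
        pt = p_toothache[cavity] if toothache else 1 - p_toothache[cavity]
        pc = p_catch[cavity] if catch else 1 - p_catch[cavity]
        scores[cavity] = p_cavity[cavity] * pt * pc
    z = sum(scores.values())                    # normalization constant
    return {c: s / z for c, s in scores.items()}

print(posterior(toothache=True, catch=True))    # P(Cavity | toothache, catch)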


3. Explain in detail about Bayesian Network

3.1 Bayesian Network


• "A Bayesian network is a probabilistic graphical model which represents a
set of variables and their conditional dependencies using a directed
acyclic graph."
• It is also called a Bayes network, belief network, decision network,
or Bayesian model.
• A Bayesian Network can be used for building models from data and expert opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities
• The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an Influence Diagram.
• It is used to represent conditional dependencies.
• It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series
prediction, and decision making under uncertainty.
• A Bayesian network graph is made up of nodes and Arcs (directed links).

Figure 2.1 – Example for Bayesian Network


• Each node corresponds to the random variables, and a variable can be
continuous or discrete.


• Arc or directed arrows represent the causal relationship or conditional


probabilities between random variables.
• These directed links or arrows connect the pair of nodes in the graph.
• These links represent that one node directly influences the other node; if there is no directed link between two nodes, it means they are independent of each other.
Example
In the figure 2.1, A, B, C, and D are random variables represented by
the nodes of the network graph.
• Considering node B, which is connected with node A by a directed
arrow, then node A is called the parent of Node B.
• Node C is independent of node A.
• The Bayesian network graph does not contain any cyclic graph. Hence,
it is known as a directed acyclic graph or DAG.
• The Bayesian network has mainly two components:
1. Causal Component
2. Actual numbers

• Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
• Bayesian network is based on Joint probability distribution and
conditional probability.

3.2 Joint probability distribution:


• If variables are x1, x2, x3,....., xn, then the probabilities of a different
combination of x1, x2, x3.. xn, are known as Joint probability
distribution.
• P[x1, x2, x3, ..., xn] can be written in terms of the joint probability distribution as follows:
P[x1, x2, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
• In general for each variable Xi,
P(Xi|Xi-1, ............. , X1) = P(Xi |Parents(Xi ))

3.3 Constructing Bayesian Network
1. Choose an ordering of the random variables X1, X2, ..., Xn.
2. For i = 1 to n:
a. Add node Xi to the network.
b. Select a minimal set of parents from X1, ..., Xi-1 such that P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi)).
c. Add a link from each parent to Xi.
d. Write down the conditional probability table P(Xi | Parents(Xi)).

Global Semantics
▪ The full joint distribution is the product of the local conditional distributions:
P(x1, ..., xn) = ∏i P(xi | parents(Xi))

Local Semantics
▪ Each node is conditionally independent of its nondescendants given its parents.

Markov Blanket
▪ Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

3.4 Example:
Harry installed a new burglar alarm at his home to detect burglary.
The alarm reliably responds at detecting a burglary but also responds
for minor earthquakes. Harry has two neighbors David and Sophia,
who have taken a responsibility to inform Harry at work when they
hear the alarm. David always calls Harry when he hears the alarm,
but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses hearing the alarm. Here we would like to compute the probability of the burglar alarm event.

Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.


Solution:
• The Bayesian network for the above problem is given in figure 2.2. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend on the alarm probability.




Figure 2.2 - The Bayesian network for the example problem


All events occurring in this network:


o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

Write the events of the problem statement in the form of probability: P[D, S, A, B, E].

Rewriting this probability statement using the joint probability distribution:

P[D, S, A, B, E] = P[D|A] P[S|A] P[A|B, E] P[B] P[E]

Let's take the observed probability for the Burglary and earthquake
component:
• P(B=True) = 0.002, which is the probability of burglary.
• P(B=False)= 0.998, which is the probability of no burglary.
• P(E=True)= 0.001, which is the probability of a minor earthquake
• P(E=False)= 0.999, Which is the probability that an earthquake not
occurred.

Conditional probability table for Alarm A:


The Conditional probability of Alarm A depends on Burglar and
earthquake:

B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.05

False True 0.31 0.69

False False 0.001 0.999

Conditional probability table for David Calls:


The Conditional probability of David that he will call depends on the
probability of Alarm.


A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95

Conditional probability table for Sophia Calls:


The conditional probability that Sophia calls depends on its parent node "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98

From the formula of joint distribution, the problem statement in the form of
probability distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain
by using Joint distribution.
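The same joint-probability calculation in Python, using the CPT entries listed above:

p_s_given_a = 0.75                 # P(S=true | A=true)
p_d_given_a = 0.91                 # P(D=true | A=true)
p_a_given_not_b_not_e = 0.001      # P(A=true | B=false, E=false)
p_not_b = 0.998                    # P(B=false)
p_not_e = 0.999                    # P(E=false)

joint = p_s_given_a * p_d_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(joint)                       # ~0.00068045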

3.5 The semantics of Bayesian Network:


There are two ways to understand the semantics of the Bayesian network,
which is given below:
1. To understand the network as the representation of the Joint
probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of
conditional independence statements.
It is helpful in designing inference procedure.

3.6 Applications of Bayesian networks in AI


Bayesian networks find applications in a variety of tasks such as:
1. Spam filtering:
a. A spam filter is a program that helps in detecting unsolicited and
spam mails. Bayesian spam filters check whether a mail is spam
or not.


2. Biomonitoring:
a. This involves the use of indicators to quantify the concentration of
chemicals in the human body.
3. Information retrieval:
a. Bayesian networks assist in information retrieval for research,
which is a constant process of extracting information from
databases.
4. Image processing:
a. A form of signal processing, image processing uses mathematical
operations to convert images into digital format.
5. Gene regulatory network:
a. A Bayesian network is an algorithm that can be applied to gene
regulatory networks in order to make predictions about the effects
of genetic variations on cellular phenotypes.
b. Gene regulatory networks are a set of mathematical equations that
describe the interactions between genes, proteins, and metabolites.
c. They are used to study how genetic variations affect the
development of a cell or organism.
6. Turbo code:
a. Turbo codes are a type of error correction code capable of achieving
very high data rates and long distances between error correcting
nodes in a communications system.
b. They have been used in satellites, space probes, deep-space
missions, military communications systems, and civilian wireless
communication systems, including WiFi and 4G LTE cellular
telephone systems.
7. Document classification:
a. The main task is to assign a document to one or more classes. This can be done manually or algorithmically. Since manual effort takes too much time, algorithmic classification is used to complete it quickly and effectively.

4. Explain in detail about Bayesian Inference and its type Exact Inference
with suitable example.

Exact inference in Bayesian networks

The basic task for any probabilistic inference system is to compute the
posterior probability distribution for a set of query variables, given some observed
event, that is, some assignment of values to a set of evidence variables. We will use


the following notation: X denotes the query variable; E denotes the set of evidence variables E1, ..., Em, and e is a particular observed event; Y denotes the nonevidence variables Y1, ..., Yl (sometimes called the hidden variables). Thus, the complete set of variables is X = {X} ∪ E ∪ Y. A typical query asks for the posterior probability distribution P(X|e).

In the burglary network, we might observe the event in which JohnCalls =


true and MaryCalls = true. We could then ask for, say, the probability that a burglary has occurred: P(Burglary | JohnCalls = true, MaryCalls = true).

Inference by enumeration

Conditional probabilities can be computed by summing terms from the full joint distribution. More specifically, a query P(X|e) can be answered using the equation

P(X|e) = α P(X, e) = α Σy P(X, e, y)

Now, a Bayesian network gives a complete representation of the full joint


distribution. More specifically, Equation shows that the terms P(x, e, y) in the joint
distribution can be written as products of conditional probabilities from the
network. Therefore, a query can be answered using a Bayesian network by
computing sums of products of conditional probabilities from the network. In Figure
an algorithm, ENUMERATE-JOINT-ASK, was given for inference by enumeration
from the full joint distribution. The algorithm takes as input a full joint distribution
P and looks up values therein. It is a simple matter to modify the algorithm so that
it takes as input a Bayesian network bn and "looks up" joint entries by multiplying
the corresponding CPT entries from bn.

Consider the query P(Burglary | JohnCalls = true, MaryCalls = true). The hidden variables for this query are Earthquake and Alarm. Using initial letters for the variables in order to shorten the expressions, we have

P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, e, a, j, m)

The semantics of Bayesian networks then gives us an expression in terms of CPT entries. For simplicity, we do this just for Burglary = true:

P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)


To compute this expression, we have to add four terms, each computed by multiplying five numbers. In the worst case, where we have to sum out almost all the variables, the complexity of the algorithm for a network with n Boolean variables is O(n 2^n). An improvement can be obtained from the following simple observation: the P(b) term is a constant and can be moved outside the summations over a and e, and the P(e) term can be moved outside the summation over a. Hence, we have

P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)

This expression can be evaluated by looping through the variables in order, multiplying CPT entries as we go. For each summation, we also need to loop over the variable's possible values. Using the numbers from the network's CPTs, we obtain P(b | j, m) = α × 0.00059224. The corresponding computation for ¬b yields α × 0.0014919; hence

P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩


That is, the chance of a burglary, given calls from both neighbors, is about 28%.
The evaluation process for this expression can be visualized as an expression tree.
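A compact Python sketch of inference by enumeration for this query. The CPT values below are the ones of the standard textbook burglary network (the figure is not reproduced in these notes), and they reproduce the 0.284 result quoted above:

from itertools import product

P_B = {True: 0.001, False: 0.999}                  # P(Burglary)
P_E = {True: 0.002, False: 0.998}                  # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls=true | Alarm)

def p_burglary_given_jm():
    scores = {}
    for b in (True, False):
        total = 0.0
        for e, a in product((True, False), repeat=2):   # sum out E and A
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * pa * P_J[a] * P_M[a]
        scores[b] = total
    alpha = 1 / sum(scores.values())                    # normalization constant
    return {b: alpha * s for b, s in scores.items()}

print(p_burglary_given_jm())   # {True: ~0.284, False: ~0.716}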

The variable elimination algorithm


The enumeration algorithm can be improved substantially by eliminating
repeated calculations of the kind illustrated in Figure. The idea is simple: do the
calculation once and save the results for later use. This is a form of dynamic
programming. There are several versions of this approach; we present the variable
elimination algorithm, which is the simplest. Variable elimination works by
evaluating expressions such as Equation in right-to-left order (that is, bottom-up in
Figure). Intermediate results are stored, and summations over each variable are
done only for those portions of the expression that depend on the variable. Let us illustrate this process for the burglary network by evaluating the expression for P(B | j, m) given above in right-to-left order.
The complexity of exact inference

We have argued that variable elimination is more efficient than enumeration


because it avoids repeated computations (as well as dropping irrelevant variables).
The time and space requirements of variable elimination are dominated by the size
of the largest factor constructed during the operation of the algorithm. This in turn
is determined by the order of elimination of variables and by the structure of the
network.

The burglary network of Figure belongs to the family of networks in which


there is at most one undirected path between any two nodes in the network. These
are called singly connected networks or polytrees, and they have a particularly
nice property: The time and space complexity of exact inference in polytrees is linear
in the size of the network. Here, the size is defined as the number of CPT entries; if


the number of parents of each node is bounded by a constant, then the complexity
will also be linear in the number of nodes. These results hold for any ordering
consistent with the topological ordering of the network .

For multiply connected networks, such as that of Figure, variable


elimination can have exponential time and space complexity in the worst case, even
when the number of parents per node is bounded. This is not surprising when one
considers that, because it includes inference in propositional logic as a special case, inference in Bayesian networks is NP-hard. In fact, it can be shown that the problem is as hard as that of computing the number of satisfying assignments for a propositional logic formula. This means that it is #P-hard ("number-P hard"), that is, strictly harder than NP-complete problems.

There is a close connection between the complexity of Bayesian network


inference and the complexity of constraint satisfaction problems (CSPs): the difficulty of solving a discrete CSP is related to how "tree-like" its constraint graph is. Measures such as hypertree width, which bound the complexity of solving a
CSP, can also be applied directly to Bayesian networks. Moreover, the variable
elimination algorithm can be generalized to solve CSPs as well as Bayesian
networks.

5. Explain Causal Network or Causal Bayesian Network in Machine Learning.


5.1 Causal Network or Causal Bayesian Network
• A causal network is an acyclic digraph arising from an evolution of
a substitution system, and representing its history.
• In an evolution of a multiway system, each substitution event is a vertex
in a causal network.
• Two events which are related by causal dependence, meaning one occurs
just before the other, have an edge between the corresponding vertices in
the causal network.
• More precisely, the edge is a directed edge leading from the past event to
the future event.
• Refer Figure 2.3 for an example causal network.
• A CBN is a graph formed by nodes representing random variables,
connected by links denoting causal influence.


Figure 2.3 – Causal Network Example

• Some causal networks are independent of the choice of evolution, and


these are called causally invariant.

• Structural Causal Models (SCMs).


• SCMs consist of two parts: a graph, which visualizes causal connections, and equations, which express the details of the connections. A graph is a mathematical construction that consists of vertices (nodes) and edges (links).
• SCMs use a special kind of graph, called a Directed Acyclic Graph
(DAG), for which all edges are directed and no cycles exist.
• DAGs are a common starting place for causal inference.
• Structurally, a Bayesian network and a causal network can be completely identical; the difference lies in their interpretations.
Fire -> Smoke

• A network with 2 nodes (fire icon and smoke icon) and 1 edge (arrow
pointing from fire to smoke).
• This network can be both a Bayesian or causal network.
• The key distinction, however, is when interpreting this network.
• For a Bayesian network, we view the nodes as variables and the arrow
as a conditional probability, namely the probability of smoke given
information about fire.
• When interpreting this as a causal network, we still view nodes as
variables, however, the arrow indicates a causal connection.


• In this case, both interpretations are valid. However, if we were to flip the
edge direction, the causal network interpretation would be invalid, since
smoke does not cause fire.

Implementing Causal Inference

1. The do-operator
• The do-operator is a mathematical representation of a physical
intervention.
• If the model starts with Z → X → Y, simulate an intervention in X by
deleting all the incoming arrows to X, and manually setting X to some
value x_0. Refer Figure 2.4 denotes the example of do-operator.

Figure 2.4 – do-operator Example

P(Y|X) is the conditional probability, that is, the probability of Y given an observation of X, while P(Y|do(X)) is the probability of Y given an intervention in X.


2. Confounding
A simple example of confounding is shown in the figure 2.5 below.

Figure 2.5 – Confounding Example

• In this example, age is a confounder of education and wealth. In


other words, if trying to evaluate the impact of education on wealth
one would need to adjust for age.
• Adjusting for (or conditioning on) age just means that when
looking at age, education, and wealth data, one would compare
data points within age groups, not between age groups.
• Confounding is anything that leads to P(Y|X) being different than
P(Y|do(X)).
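The following Python simulation illustrates this difference on a toy model in which Z (age) influences both X (education) and Y (wealth). All probabilities are made-up values, used only to show that conditioning on X mixes in the effect of Z, while intervening on X (cutting the Z → X arrow and setting X by hand) does not:

import random
random.seed(0)

def sample(do_x=None):
    z = random.random() < 0.5                        # Z: "older" with probability 0.5
    if do_x is None:
        x = random.random() < (0.8 if z else 0.3)     # Z -> X (observational regime)
    else:
        x = do_x                                      # intervention: X set directly
    y = random.random() < 0.2 + 0.3 * x + 0.4 * z     # X -> Y and Z -> Y
    return x, y

N = 100_000
observed = [sample() for _ in range(N)]
p_y_given_x = sum(y for x, y in observed if x) / sum(x for x, _ in observed)
p_y_do_x = sum(sample(do_x=True)[1] for _ in range(N)) / N
print(round(p_y_given_x, 3), round(p_y_do_x, 3))      # the observational estimate is higher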

6. Explain approximate inference in Bayesian network (BN)

Given the intractability of exact inference in large networks, we will now


consider approximate inference methods. This section describes randomized
sampling algorithms, also called Monte Carlo algorithms, that provide approximate
answers whose accuracy depends on the number of samples generated.

They work by generating random events based on the probabilities in the


Bayes net and counting up the different answers found in those random events.
With enough samples, we can get arbitrarily close to recovering the true probability distribution, provided the Bayes net has no deterministic conditional distributions.

Direct sampling methods


The primitive element in any sampling algorithm is the generation of samples
from a known probability distribution. For example, an unbiased coin can be
thought of as a random variable Coin with values (heads, tails) and a prior
distribution P(Coin) = (0.5,0.5). Sampling from this distribution is exactly like
flipping the coin: with probability 0.5 it will return heads, and with probability 0.5
it will return tails.
Given a source of random numbers r uniformly distributed in the range [0,1],
it is a simple matter to sample any distribution on a single variable, whether
discrete or continuous. This is done by constructing the cumulative distribution for
the variable and returning the first value whose cumulative probability exceeds r.

We begin with a random sampling process for a Bayes net that has no evidence
associated with it. The idea is to sample each variable in turn, in topological order.
The probability distribution from which the value is sampled is conditioned on the
values already assigned to the variable’s parents. (Because we sample in topological
order, the parents are guaranteed to have values already.) This algorithm is shown
in Figure. Applying it to the network with the ordering Cloudy, Sprinkler, Rain, WetGrass, we might produce a random event such as [Cloudy = true, Sprinkler = false, Rain = true, WetGrass = true].
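A minimal Python sketch of this prior-sampling procedure for the Cloudy/Sprinkler/Rain/WetGrass network, using the standard CPT values for this textbook example:

import random

def prior_sample():
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.00}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return cloudy, sprinkler, rain, wet

print(prior_sample())   # one random event, e.g. (True, False, True, True)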

Rejection sampling in Bayesian networks


Rejection sampling is a general method for producing samples from a hard-
to-sample distribution given an easy-to-sample distribution. In its simplest form, it
can be used to compute conditional probabilities, that is, to determine P(X|e). The REJECTION-SAMPLING algorithm works as follows: first, it generates samples from the prior distribution specified by the network; then, it rejects all those that do not match the evidence; finally, the estimate P̂(X = x | e) is obtained by counting how often X = x occurs in the remaining samples.


Let P̂(X|e) be the estimated distribution that the algorithm returns; this distribution is computed by normalizing N_PS(X, e), the vector of sample counts for each value of X where the sample agrees with the evidence e:

P̂(X|e) = α N_PS(X, e) = N_PS(X, e) / N_PS(e)
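A Python sketch of rejection sampling for the query P(Rain | Sprinkler = true), reusing the prior_sample() function defined in the sketch above:

def rejection_sample_rain(n=100_000):
    counts = {True: 0, False: 0}
    for _ in range(n):
        cloudy, sprinkler, rain, wet = prior_sample()
        if not sprinkler:            # reject samples that contradict the evidence
            continue
        counts[rain] += 1
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

print(rejection_sample_rain())       # estimate of P(Rain | Sprinkler = true)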

Inference by Markov chain simulation


In this section, we describe the Markov chain Monte Carlo (MCMC)
algorithm for inference in Bayesian networks. We will first describe what the
algorithm does, then we will explain why it works and why it has such a
complicated name.

The MCMC algorithm

MCMC generates each event by making a random change to the preceding


event. It is therefore helpful to think of the network as being in a particular current
state specifying a value for every variable. The next state is generated by randomly
sampling a value for one of the nonevidence variables Xi,conditioned on the current
values of the variables in the Markov blanket of Xi. MCMC therefore wanders
randomly around the state space-the space of possible complete assignments-
flipping one variable at a time, but keeping the evidence variables fixed.

Consider the query P(Rain | Sprinkler = true, WetGrass = true) applied to the sprinkler network. The evidence variables Sprinkler and WetGrass are fixed to their observed values, and the hidden variables Cloudy and Rain are initialized randomly, let us say to true and false respectively. Thus, the initial state is [true, true, false, true]. Now the following steps are executed repeatedly:

1. Cloudy is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Cloudy | Sprinkler = true, Rain = false). Suppose the result is Cloudy = false. Then the new current state is [false, true, false, true].

2. Rain is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true). Suppose this yields Rain = true. The new current state is [false, true, true, true].

Each state visited during this process is a sample that contributes to the estimate for the query variable Rain. If the process visits 20 states where Rain is true and 60 states where Rain is false, then the answer to the query is NORMALIZE(⟨20, 60⟩) = ⟨0.25, 0.75⟩.
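A compact Python sketch of this Gibbs-sampling (MCMC) procedure for P(Rain | Sprinkler = true, WetGrass = true), sampling each hidden variable from its distribution given its Markov blanket; the CPT values are the standard ones for this sprinkler network:

import random

P_C = 0.5                                             # P(Cloudy=true)
P_S = {True: 0.1, False: 0.5}                         # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}                         # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}     # P(WetGrass=true | S, R)

def cloudy_score(c, sprinkler, rain):
    # P(Cloudy=c | Markov blanket) is proportional to P(c) P(sprinkler|c) P(rain|c)
    ps = P_S[c] if sprinkler else 1 - P_S[c]
    pr = P_R[c] if rain else 1 - P_R[c]
    return (P_C if c else 1 - P_C) * ps * pr

def rain_score(r, cloudy, sprinkler, wet):
    # P(Rain=r | Markov blanket) is proportional to P(r|cloudy) P(wet|sprinkler, r)
    pr = P_R[cloudy] if r else 1 - P_R[cloudy]
    pw = P_W[(sprinkler, r)] if wet else 1 - P_W[(sprinkler, r)]
    return pr * pw

def gibbs_rain(n=50_000):
    sprinkler, wet = True, True                       # evidence variables, held fixed
    cloudy, rain = random.random() < 0.5, random.random() < 0.5   # random initial state
    rain_true = 0
    for _ in range(n):
        # Sample Cloudy given its Markov blanket (Sprinkler, Rain).
        t, f = cloudy_score(True, sprinkler, rain), cloudy_score(False, sprinkler, rain)
        cloudy = random.random() < t / (t + f)
        # Sample Rain given its Markov blanket (Cloudy, Sprinkler, WetGrass).
        t, f = rain_score(True, cloudy, sprinkler, wet), rain_score(False, cloudy, sprinkler, wet)
        rain = random.random() < t / (t + f)
        rain_true += rain
    return rain_true / n

print(gibbs_rain())   # estimate of P(Rain=true | Sprinkler=true, WetGrass=true), about 0.32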
