KCE-CSE –AI&ML 2023

UNIT- II

PROBABILISTIC REASONING

S.NO  TOPICS

8     Acting under uncertainty

9     Bayesian inference, Naïve Bayes models

10    Probabilistic reasoning, Bayesian networks, exact inference in BN

11    Approximate inference in BN

12    Causal networks


TOPIC 8: ACTING UNDER UNCERTAINTY

Agents in the real world need to handle uncertainty, whether due to partial
observability, nondeterminism, or adversaries. An agent may never know for sure what
state it is in now or where it will end up after a sequence of actions.

• In practice, programs have to act under uncertainty in one of two ways:

– by using a simple but incorrect theory of the world that ignores uncertainty and works most of the time, or

– by handling uncertain knowledge and utility (a tradeoff between accuracy and usefulness) in a rational way

• The right thing to do (the rational decision) depends on:

– the relative importance of the various goals

– the likelihood that, and the degree to which, they will be achieved

Handling Uncertain Knowledge

• Example of rule for dental diagnosis using first-order logic:

∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)

• This rule is wrong; to make it true we would have to add an almost unlimited list of possible causes:

∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, Abscess) ∨ …

• Trying to use first-order logic to cope with a domain like medical diagnosis fails
for three main reasons:

• Laziness. It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule, and too hard to use such rules.

• Theoretical ignorance. Medical science has no complete theory for the domain.

• Practical ignorance. Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.

• In fact, the connection between toothaches and cavities is not a strict logical consequence in either direction.


• In judgmental domains (medical, law, design...) the agent’s knowledge can at best
provide a degree of belief in the relevant sentences.

• The main tool for dealing with degrees of belief is probability theory, which
assigns to each sentence a numerical degree of belief between 0 and 1.

• The belief could be derived from:

• statistical data

• 80% of the toothache patients have had cavities

• some general rules

• some combination of evidence sources

• Assigning a probability of 0 to a given sentence corresponds to an unequivocal belief that the sentence is false.

• Assigning a probability of 1 corresponds to an unequivocal belief that the sentence is true.

• A degree of belief is different from a degree of truth.

• A probability of 0.8 does not mean “80% true”, but rather an 80% degree of
belief that something is true.

• In logic, a sentence such as “The patient has a cavity” is true or false.

• In probability theory, a sentence such as “The probability that the patient has a
cavity is 0.8” is about the agent’s belief, not directly about the world.

• These beliefs depend on the percepts that the agent has received to date.

• These percepts constitute the evidence on which probability assertions are based.

• For example:

• An agent draws a card from a shuffled pack.

• Before looking at the card, the agent might assign a probability of 1/52 to
its being the ace of spades.

• After looking at the card, an appropriate probability for the same proposition would be 0 or 1.

Following are some leading causes of uncertainty in the real world:

• Information obtained from unreliable sources


• Experimental Errors

• Equipment fault

• Temperature variation

• Climate change.

Probabilistic Reasoning

Probabilistic reasoning is a way of knowledge representation in which we apply the concept of probability to indicate the uncertainty in knowledge.

In probabilistic reasoning, we combine probability theory with logic to handle that uncertainty.

Need of probabilistic reasoning in AI:

• When there are unpredictable outcomes.

• When the specifications or possibilities of predicates become too large to handle.

• When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:

• Bayes' rule

• Bayesian Statistics

Probability

0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.

P(A) = 0 indicates complete certainty that event A will not occur.

P(A) = 1 indicates complete certainty that event A will occur.

Probabilistic Reasoning Terminologies

• Sample space: The collection of all possible outcomes is called the sample space.

• Random variables: Random variables are used to represent events and objects in the real world.

• Prior probability: The prior probability of an event is the probability computed before observing new information.


• Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It combines the prior probability with the new information.

Conditional probability:

• Conditional probability is the probability of an event occurring given that another event has already happened.

• Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

where P(A⋀B) = joint probability of A and B

P(B) = marginal probability of B.

• We can find the probability of an event not occurring by using the complement rule:

P(¬A) = probability of event A not happening

P(¬A) + P(A) = 1.

Example

• In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?

Solution:

• Let A be the event that a student likes Mathematics.

• Let B be the event that a student likes English.

• P(A|B) = P(A⋀B) / P(B) = 0.40 / 0.70 ≈ 0.57

• Hence, about 57% of the students who like English also like mathematics.
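The same conditional-probability calculation can be expressed in a few lines of Python; this is a minimal illustrative sketch, and the variable names are not part of the original notes.

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_english = 0.70            # P(B): student likes English
p_both = 0.40               # P(A and B): student likes both subjects

p_math_given_english = p_both / p_english
print(round(p_math_given_english, 2))   # 0.57, i.e. about 57%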

TOPIC 9: Bayesian inference, Naïve Bayes models.

Bayes' Theorem

Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.


In probability theory, it relates the conditional probabilities and marginal probabilities of two random events.

It is a way to calculate the value of P(A|B) with the knowledge of P(B|A).

Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.

Example: If the probability of cancer depends on a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of the person's age.

Bayes' theorem can be derived using the product rule and the definition of the conditional probability of event A given event B:

P(A⋀B) = P(A|B) P(B) = P(B|A) P(A)

Dividing through by P(B) gives Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)

This equation is the basis of most modern AI systems for probabilistic inference.

It shows the simple relationship between joint and conditional probabilities. Here,

P(A|B) is known as the posterior, which is what we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.

P(B|A) is called the likelihood: assuming that the hypothesis is true, it is the probability of observing the evidence.

P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.


P(B) is called the marginal probability: the probability of the evidence alone.

Bayes' rule allows us to compute the single term P(A|B) in terms of P(B|A), P(A), and P(B). This is very useful in cases where we have good estimates of these three terms and want to determine the fourth. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)

Question: What is the probability that a patient has meningitis, given that the patient has a stiff neck?

Given Data:

A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it
occurs 80% of the time. He is also aware of some more facts, which are given as follows:

The Known probability that a patient has meningitis disease is 1/30,000.

The Known probability that a patient has a stiff neck is 2%.

Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:

P(a|b) = 0.8

P(b) = 1/30000

P(a) = 0.02

Applying Bayes' rule:

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133

Hence, we can assume that about 1 patient in 750 with a stiff neck has meningitis.
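The same Bayes'-rule update as a short Python sketch, using the numbers from the example above (the variable names are illustrative):

# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
p_stiff_given_meningitis = 0.8      # P(a|b), likelihood
p_meningitis = 1 / 30000            # P(b), prior
p_stiff_neck = 0.02                 # P(a), evidence

posterior = p_stiff_given_meningitis * p_meningitis / p_stiff_neck
print(posterior)                    # ~0.00133, i.e. about 1 in 750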

Application of Bayes' theorem in Artificial intelligence:

Following are some applications of Bayes' theorem:

o It is used to calculate the probability of a robot's next step when the step already executed is given.

o Bayes' theorem is helpful in weather forecasting.

o It can solve the Monty Hall problem.


TOPIC 10: Probabilistic reasoning, Bayesian networks, exact inference in BN

A Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and anomaly
detection.

Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.

A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:

o Directed Acyclic Graph

o Table of conditional probabilities.

The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.

A Bayesian network graph is made up of nodes and Arcs (directed links), where:

o Each node corresponds to a random variable, and a variable can be continuous or discrete.


o Arcs (directed arrows) represent causal relationships or conditional dependencies between random variables. These directed links connect pairs of nodes in the graph: a link indicates that one node directly influences the other, and the absence of a directed link means that one node has no direct influence on the other.

o For example, consider a network whose nodes represent the random variables A, B, C, and D.

o If node B is connected to node A by a directed arrow from A to B, then node A is called the parent of node B.

o Node C is independent of node A.

The Bayesian network has mainly two components:

o Causal Component

o Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.

A Bayesian network is based on the joint probability distribution and conditional probability, so let's first understand the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2, x3, ....., xn form the joint probability distribution P[x1, x2, x3, ....., xn]. By the chain rule it can be written as:

P[x1, x2, x3, ....., xn] = P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]

= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn]

In general, for each variable Xi (taking the variables in an order where parents come before their children), we can write:

P(Xi | Xi-1, ........., X1) = P(Xi | Parents(Xi))

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused by the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute probabilities in this burglary-alarm domain.


Problem:

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.

Solution:

o In the Bayesian network for this problem, Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend only on the Alarm node.

o The network encodes the assumptions that David and Sophia do not perceive the burglary directly, do not notice minor earthquakes, and do not confer before calling.

o The conditional distribution for each node is given as a conditional probability table, or CPT.

o Each row in a CPT must sum to 1 because its entries represent an exhaustive set of cases for the variable.

o In a CPT, a Boolean variable with k Boolean parents contains 2^k independently specifiable probabilities, one per combination of parent values. Hence, if there are two parents, the CPT contains 4 probability values.

List of all events occurring in this network:

o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the joint event in the problem statement as the probability P[D, S, A, B, E], and we can rewrite this probability using the joint probability distribution:


P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]

= P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]

= P[D | A] · P[S | A, B, E] · P[A, B, E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B, E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B] · P[E]   (since B and E are independent)

Let's take the observed probability for the Burglary and earthquake component:

P(B= True) = 0.002, which is the probability of burglary.

P(B= False)= 0.998, which is the probability of no burglary.

P(E= True)= 0.001, which is the probability of a minor earthquake

P(E= False)= 0.999, which is the probability that an earthquake has not occurred.

We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:

The Conditional probability of Alarm A depends on Burglar and earthquake:


B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.05

False True 0.31 0.69

False False 0.001 0.999

Conditional probability table for David Calls:

The conditional probability that David calls depends only on the probability of the Alarm.

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95

Conditional probability table for Sophia Calls:

The conditional probability that Sophia calls depends on its parent node, "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98

From the formula of joint distribution, we can write the problem statement in the form
of probability distribution:

P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).


= 0.75* 0.91* 0.001* 0.998*0.999

= 0.00068045.

Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
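The tables above can be encoded directly and used to evaluate the same joint query. The following is a minimal illustrative Python sketch; the dictionary structure and names are choices of this sketch, not part of the notes.

# CPTs of the burglary-alarm network, taken from the tables above
P_B = {True: 0.002, False: 0.998}            # P(Burglary)
P_E = {True: 0.001, False: 0.999}            # P(Earthquake)
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(Alarm=True | B, E)
P_D = {True: 0.91, False: 0.05}              # P(DavidCalls=True | Alarm)
P_S = {True: 0.75, False: 0.02}              # P(SophiaCalls=True | Alarm)

def joint(d, s, a, b, e):
    """P(D=d, S=s, A=a, B=b, E=e) from the network factorization."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pd = P_D[a] if d else 1 - P_D[a]
    ps = P_S[a] if s else 1 - P_S[a]
    return pd * ps * pa * P_B[b] * P_E[e]

print(joint(True, True, True, False, False))  # ~0.00068, as computed above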

The semantics of Bayesian Network:

There are two ways to understand the semantics of the Bayesian network, which is
given below:

1. To understand the network as the representation of the Joint probability distribution.

It is helpful to understand how to construct the network.

2. To understand the network as an encoding of a collection of conditional independence statements.

It is helpful in designing inference procedures.

Exact inference by enumeration
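Inference by enumeration answers a query P(X | e) by summing the full joint distribution, expressed as a product of CPT entries, over all the hidden variables and then normalizing. The following is a brief illustrative sketch that reuses the CPT dictionaries and the joint() function from the sketch in Topic 10; the function name and query are choices of this sketch.

from itertools import product

def enumerate_burglary(d=True, s=True):
    """P(Burglary | DavidCalls=d, SophiaCalls=s) by summing out Alarm and Earthquake."""
    dist = {}
    for b in (True, False):
        dist[b] = sum(joint(d, s, a, b, e)
                      for a, e in product((True, False), repeat=2))
    norm = sum(dist.values())
    return {b: p / norm for b, p in dist.items()}

print(enumerate_burglary())   # posterior over Burglary given both calls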


Exact inference by variable elimination
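Variable elimination evaluates the same nested sums right-to-left, storing intermediate results as factors so that repeated subexpressions are computed only once. A core operation is summing a variable out of a factor; the sketch below shows that single operation (the factor representation used here is an assumption of this sketch, not taken from the notes).

# A factor maps assignments of its variables to numbers.
# Summing out a variable adds together rows that agree on the remaining variables.
def sum_out(var, factor_vars, factor_table):
    keep = [i for i, v in enumerate(factor_vars) if v != var]
    new_vars = tuple(factor_vars[i] for i in keep)
    new_table = {}
    for assignment, value in factor_table.items():
        key = tuple(assignment[i] for i in keep)
        new_table[key] = new_table.get(key, 0.0) + value
    return new_vars, new_table

# Example: sum Alarm out of a factor f(Alarm, DavidCalls)
f_vars = ("A", "D")
f_table = {(True, True): 0.91, (True, False): 0.09,
           (False, True): 0.05, (False, False): 0.95}
print(sum_out("A", f_vars, f_table))   # a factor over D alone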


TOPIC 11: Approximate inference in BN

Approximate inference by stochastic simulation
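Stochastic simulation approximates a posterior P(X | e) from samples. In rejection sampling, each sample is drawn from the network in topological order, samples inconsistent with the evidence are discarded, and the query is estimated from the remaining counts. The sketch below reuses the CPT dictionaries from the sketch in Topic 10; the names and number of samples are choices of this sketch.

import random

def prior_sample():
    """One complete sample (B, E, A, D, S) drawn from the network."""
    b = random.random() < P_B[True]
    e = random.random() < P_E[True]
    a = random.random() < P_A[(b, e)]
    d = random.random() < P_D[a]
    s = random.random() < P_S[a]
    return b, e, a, d, s

def rejection_sample_burglary(n=100_000):
    """Estimate P(Burglary=True | DavidCalls=True, SophiaCalls=True)."""
    accepted = hits = 0
    for _ in range(n):
        b, e, a, d, s = prior_sample()
        if d and s:                    # keep only samples matching the evidence
            accepted += 1
            hits += b
    return hits / accepted if accepted else float("nan")

print(rejection_sample_burglary())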


Approximate inference by Markov chain Monte Carlo
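Markov chain Monte Carlo (for example, Gibbs sampling) fixes the evidence variables and repeatedly resamples each hidden variable conditioned on the current values of all the others; the long-run fraction of visited states estimates the posterior. A compact illustrative sketch, again reusing the joint() function from the sketch in Topic 10 (sampling each hidden variable from a distribution proportional to the full joint is one simple, if not the most efficient, way to do this):

import random

def gibbs_burglary(n_steps=50_000):
    """Estimate P(Burglary=True | DavidCalls=True, SophiaCalls=True) by Gibbs sampling."""
    d = s = True                         # evidence, kept fixed
    b, e, a = False, False, True         # arbitrary initial state of hidden variables
    count_b = 0
    for _ in range(n_steps):
        # Resample B given everything else
        p_t, p_f = joint(d, s, a, True, e), joint(d, s, a, False, e)
        b = random.random() < p_t / (p_t + p_f)
        # Resample E given everything else
        p_t, p_f = joint(d, s, a, b, True), joint(d, s, a, b, False)
        e = random.random() < p_t / (p_t + p_f)
        # Resample A given everything else
        p_t, p_f = joint(d, s, True, b, e), joint(d, s, False, b, e)
        a = random.random() < p_t / (p_t + p_f)
        count_b += b
    return count_b / n_steps

print(gibbs_burglary())   # should be close to the exact and rejection-sampling answers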


TOPIC 12: Causal networks.

Causal AI refers to the use of AI to make decisions and predictions based on cause-and-effect relationships rather than just correlational relationships.

Causal reasoning is the process of understanding the relationships between causes and
effects. It is the way that we, as humans, make sense of the world around us and draw
conclusions based on our observations. In a similar vein, causal AI uses algorithms and
models to identify and analyse causal relationships in data, allowing it to make
predictions and decisions based on these relationships.


To illustrate, imagine a simplistic world where educational outcomes Y are related to school expenditures X as well as to the parents' involvement in their children's education.

Suppose now that you have some data on educational outcomes Y, school expenditures
X, and parent involvement C. The unit of observation is, say, a school district. The
educational outcome data might come from standardized testing. Parent involvement
might be the records of what fraction of parents attend their student’s quarterly teacher
conferences. You, the modeler, work for the national government. You’ve been asked to
figure out what will be the effect on educational outcomes of an intervention where the
national government will give additional funding to schools.

For the purpose of anticipating the impact of a change in X on Y, either of two models
might be appropriate: either Y ~ X or Y ~ X + C.
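The difference between the two models can be made concrete with a tiny synthetic experiment; the data below is entirely made up for illustration and is not from the notes.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
c = rng.normal(size=n)                        # parent involvement (confounder)
x = 2.0 * c + rng.normal(size=n)              # expenditures, partly driven by C
y = 1.0 * x + 3.0 * c + rng.normal(size=n)    # outcomes depend on both X and C

# Model Y ~ X (confounder omitted): slope of a simple regression
coef_x_only = np.polyfit(x, y, 1)[0]

# Model Y ~ X + C (confounder included): least-squares coefficient on X
A = np.column_stack([x, c, np.ones(n)])
coef_with_c = np.linalg.lstsq(A, y, rcond=None)[0][0]

print(coef_x_only)    # overstates the effect of X (well above 1.0)
print(coef_with_c)    # close to the true causal coefficient, 1.0

Which of the two estimates answers the government's question depends on the causal structure relating X, C, and Y, which is exactly what a causal network makes explicit.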


A second, simple yet typical Bayesian network describes the relationships among the season of the year (X1), whether it is raining (X2), whether the sprinkler is on (X3), whether the pavement is wet (X4), and whether the pavement is slippery (X5). In contrast to the statistical relationships in the non-causal example above, these relationships are causal.

Here, the absence of a direct link between X1 and X5, for example, captures our
understanding that there is no direct influence of season on slipperiness. The influence
is mediated by the wetness of the pavement (if freezing were a possibility, a direct link
could be added).

Perhaps the most important aspect of Bayesian networks is that they are direct
representations of the world, not of reasoning processes.

The arrows in the diagram represent real causal connections and not the flow of
information during reasoning (as in rule-based systems and neural networks).
Reasoning processes can operate on Bayesian networks by propagating information in
any direction.

For example, if the sprinkler is on, then the pavement is probably wet (prediction,
simulation). If someone slips on the pavement, that will also provide evidence that it is
wet (abduction, reasoning to a probable cause, or diagnosis).

On the other hand, if we see that the pavement is wet, that will make it more likely that
the sprinkler is on or that it is raining (abduction); but if we then observe that the
sprinkler is on, that will reduce the likelihood that it is raining (explaining away).

It is the latter form of reasoning, explaining away, that is especially difficult to model in
rule-based systems and neural networks in a natural way because it seems to require
the propagation of information in two directions.


Causal Reasoning

Most probabilistic models, including general Bayesian networks, describe a Joint Probability Distribution (JPD) over possible observed events but say nothing about what will happen if a certain intervention occurs.

For example, what if I turn the Sprinkler on instead of just observing that it is turned
on? What effect does that have on the Season, or on the connection between Wet and
Slippery?

A causal network, intuitively speaking, is a Bayesian network with the added property
that the parents of each node are its direct causes.

In such a network, the result of an intervention is obvious: the Sprinkler node is set to X3 = on and the causal link between the Season X1 and the Sprinkler X3 is removed. All other causal links and conditional probabilities remain intact, so the new model is:

P(x1, x2, x4, x5) = P(x1) P(x2|x1) P(x4|x2, X3 = on) P(x5|x4)

This differs from observing that X3 = on, which would result in a new model that included the term P(X3 = on|x1). This mirrors the difference between seeing and doing: after observing that the Sprinkler is on, we wish to infer that the season is dry, that it probably did not rain, and so on. An arbitrary decision to turn on the Sprinkler should not result in any such beliefs.

Causal networks are more properly defined, then, as Bayesian networks in which the
correct probability model—after intervening to fix any node’s value—is given simply by
deleting links from the node’s parents. For example, Fire → Smoke is a causal network,
whereas Smoke → Fire is not, even though both networks are equally capable of
representing any Joint Probability Distribution (JPD) of the two variables.
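The seeing-versus-doing distinction on the Fire → Smoke network can be sketched in a few lines; the probability values below are invented purely for illustration.

# Causal model: Fire -> Smoke
P_fire = 0.01                                # prior P(Fire)
P_smoke_given = {True: 0.9, False: 0.05}     # P(Smoke=True | Fire)

# SEEING: observing Smoke=on updates the belief about Fire via Bayes' rule
p_smoke = P_smoke_given[True] * P_fire + P_smoke_given[False] * (1 - P_fire)
p_fire_given_smoke = P_smoke_given[True] * P_fire / p_smoke
print(p_fire_given_smoke)                    # belief in Fire rises well above 0.01

# DOING: do(Smoke=on) deletes the link from Fire to Smoke,
# so the belief about Fire is unchanged
print(P_fire)                                # still 0.01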

Causal networks model the environment as a collection of stable component mechanisms. These mechanisms may be reconfigured locally by interventions, with corresponding local changes in the model. This, in turn, allows causal networks to be used very naturally for prediction by an agent that is considering various courses of action.

Learning Bayesian Network Parameters

Given a qualitative Bayesian network structure, the conditional probability tables, P(xi|pai), are typically estimated with the maximum-likelihood approach from the observed frequencies in the dataset associated with the network.
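Maximum-likelihood estimation of a CPT amounts to counting how often each value of a node occurs for each configuration of its parents. A minimal sketch on a made-up dataset (the counts and variable names are hypothetical):

from collections import Counter

# Toy dataset of (Burglary, Alarm) observations -- hypothetical counts
data = ([(False, False)] * 940 + [(False, True)] * 10 +
        [(True, False)] * 2 + [(True, True)] * 48)

counts = Counter(data)
for b in (True, False):
    total = counts[(b, True)] + counts[(b, False)]
    p_alarm = counts[(b, True)] / total       # ML estimate of P(Alarm=True | Burglary=b)
    print(f"P(Alarm=True | Burglary={b}) = {p_alarm:.3f}")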

In pure Bayesian approaches, Bayesian networks are designed from expert knowledge and include hyperparameter nodes. Data (usually scarce) is used as evidence for incrementally updating the distributions of the hyperparameters.
