UNIT- II
PROBABILISTIC REASONING
Agents in the real world need to handle uncertainty, whether due to partial
observability, nondeterminism, or adversaries. An agent may never know for sure what
state it is in now or where it will end up after a sequence of actions.
• One way to cope is to use a simple but incorrect theory of the world that ignores uncertainty and works most of the time, for example the diagnostic rule "Toothache ⇒ Cavity".
• This rule is wrong, and in order to make it true we would have to add an almost unlimited list of possible causes (gum disease, an abscess, and so on).
• Trying to use first-order logic to cope with a domain like medical diagnosis fails for three main reasons:
– Laziness: it is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule.
– Theoretical ignorance: medical science has no complete theory for the domain.
– Practical ignorance: even if we know all the rules, we may be uncertain about a particular patient because not all the necessary tests have been or can be run.
• Actually, the connection between toothaches and cavities is just not a logical
consequence in any direction.
• In judgmental domains (medical, law, design...) the agent’s knowledge can at best
provide a degree of belief in the relevant sentences.
• The main tool for dealing with degrees of belief is probability theory, which
assigns to each sentence a numerical degree of belief between 0 and 1.
• These degrees of belief can be derived, for example, from statistical data.
• A probability of 0.8 does not mean “80% true”, but rather an 80% degree of
belief that something is true.
• In probability theory, a sentence such as “The probability that the patient has a
cavity is 0.8” is about the agent’s belief, not directly about the world.
• These beliefs depend on the percepts that the agent has received to date.
• For example:
• Before looking at a card drawn from a standard deck, the agent might assign a probability of 1/52 to its being the ace of spades; after looking at the card, the appropriate probability is either 0 or 1.
The following are some of the leading causes of uncertainty in the real world:
• Experimental Errors
• Equipment fault
• Temperature variation
• Climate change.
Probabilistic Reasoning
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
• Bayes' rule
• Bayesian Statistics
Probability
• Sample space: The collection of all possible events is called sample space.
• Random variables: Random variables are used to represent the events and
objects in the real world.
Conditional probability:
• Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A ⋀ B) / P(B)
where P(A ⋀ B) is the joint probability of A and B, and P(B) is the marginal probability of B.
• We can find the probability of an uncertain event by using the formula below:
P(A) = Number of desired outcomes / Total number of outcomes
• P(¬A) + P(A) = 1, where ¬A denotes the event that A does not occur.
Example
• In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
Solution:
Let E be the event that a student likes English and M be the event that a student likes Mathematics. We are given P(E) = 0.7 and P(E ⋀ M) = 0.4.
P(M|E) = P(E ⋀ M) / P(E) = 0.4 / 0.7 = 0.57 (approximately)
Hence, 57% of the students who like English also like Mathematics.
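As a quick sketch, the same calculation in Python (the variable names here are chosen purely for illustration):

```python
# Conditional probability: P(M|E) = P(E and M) / P(E)
p_english = 0.70           # P(E): fraction of students who like English
p_english_and_math = 0.40  # P(E and M): fraction who like both subjects

p_math_given_english = p_english_and_math / p_english
print(f"P(M|E) = {p_math_given_english:.2%}")  # roughly 57.14%
```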
Bayes' Theorem
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can
determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
From the product rule we can write: P(A ⋀ B) = P(A|B) P(B)
Similarly, for the probability of event B with known event A: P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of the above equations, we get Bayes' rule:
P(A|B) = P(B|A) P(A) / P(B)
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is called the posterior, the probability of the hypothesis A after taking the evidence B into account.
P(B|A) is called the likelihood, in which we consider that the hypothesis is true and then calculate the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, the probability of the evidence.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in cases where we have good estimates of these three terms and want to determine the fourth one. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
Question: What is the probability that a patient has the disease meningitis, given that they have a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck, and it occurs 80% of the time (that is, 80% of patients with meningitis have a stiff neck). He is also aware of some more facts, which are given as follows:
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133
Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has the meningitis disease.
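The same calculation as a short Python sketch (the variable names are illustrative):

```python
# Bayes' rule: P(meningitis | stiff neck) = P(stiff neck | meningitis) * P(meningitis) / P(stiff neck)
p_stiff_given_men = 0.8     # P(a|b)
p_meningitis = 1 / 30000    # P(b)
p_stiff_neck = 0.02         # P(a)

p_men_given_stiff = p_stiff_given_men * p_meningitis / p_stiff_neck
print(f"P(meningitis | stiff neck) = {p_men_given_stiff:.5f}")  # ~0.00133, about 1 in 750
```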
Application of Bayes' theorem in artificial intelligence:
o It is used to calculate the next step of the robot when the already executed step is given.
A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph (DAG)."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and anomaly
detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:
o Directed Acyclic Graph (DAG)
o Table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where each node represents a random variable (which may be continuous or discrete) and each arc represents the causal relationship or conditional dependency between the connected variables.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has an associated conditional probability distribution P(Xi | Parent(Xi)), which determines the effect of the parents on that node.
If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2, x3, ....., xn are known as the joint probability distribution.
P[x1, x2, x3, ....., xn] can be written in the following way in terms of the joint probability distribution:
P[x1, x2, x3, ....., xn] = P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn]
In general, for each variable Xi in the network we can write:
P(Xi | Xi-1, ....., X1) = P(Xi | Parents(Xi))
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken the responsibility of informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused by the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses hearing the alarm. Here we would like to compute the probability of the burglary alarm.
Problem:
Calculate the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm.
o The network encodes the assumptions that David and Sophia do not directly perceive the burglary, do not notice minor earthquakes, and do not confer before calling.
o The conditional distribution for each node is given as a conditional probability table, or CPT.
o Each row in a CPT must sum to 1 because the entries in the row represent an exhaustive set of cases for the variable.
List of all events occurring in this network:
o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David Calls (D)
o Sophia Calls (S)
We can write the events of the problem statement in the form of probability as P[D, S, A, B, E]. We can rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] · P[S | A, B, E] · P[A | B, E] · P[B | E] · P[E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B] · P[E]
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, which is the probability of a burglary.
P(B = False) = 0.998, which is the probability of no burglary.
P(E = True) = 0.001, which is the probability of a minor earthquake.
P(E = False) = 0.999, which is the probability that an earthquake has not occurred.
The conditional probability that David calls depends on the probability of Alarm; in particular, P(D = True | A = True) = 0.91.
The conditional probability that Sophia calls depends on its parent node "Alarm"; in particular, P(S = True | A = True) = 0.75.
The conditional probability of the alarm going off when there is neither a burglary nor an earthquake is P(A = True | B = False, E = False) = 0.001.
From the formula of the joint distribution, we can write the problem statement as a product of these probabilities:
P(S, D, A, ¬B, ¬E) = P(S | A) · P(D | A) · P(A | ¬B ⋀ ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint
distribution.
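A minimal Python sketch of this joint-probability calculation, using the CPT values from the worked example above (the dictionary-based representation is only an illustration, not a library API):

```python
# Joint probability for the burglar-alarm network:
# P(D, S, A, ~B, ~E) = P(D|A) * P(S|A) * P(A|~B, ~E) * P(~B) * P(~E)

p_burglary = {True: 0.002, False: 0.998}     # P(B)
p_earthquake = {True: 0.001, False: 0.999}   # P(E)

# Only the CPT entries needed for this query are listed here
p_alarm = {(False, False): 0.001}            # P(A=True | B=False, E=False)
p_david_given_alarm = 0.91                   # P(D=True | A=True)
p_sophia_given_alarm = 0.75                  # P(S=True | A=True)

joint = (p_david_given_alarm * p_sophia_given_alarm
         * p_alarm[(False, False)]
         * p_burglary[False] * p_earthquake[False])
print(f"P(D, S, A, not B, not E) = {joint:.8f}")  # about 0.00068045
```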
There are two ways in which the semantics of a Bayesian network can be understood:
1. To understand the network as a representation of the joint probability distribution; this view is helpful in understanding how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence statements; this view is helpful in designing inference procedures.
Causal reasoning is the process of understanding the relationships between causes and
effects. It is the way that we, as humans, make sense of the world around us and draw
conclusions based on our observations. In a similar vein, causal AI uses algorithms and
models to identify and analyse causal relationships in data, allowing it to make
predictions and decisions based on these relationships.
Suppose now that you have some data on educational outcomes Y, school expenditures
X, and parent involvement C. The unit of observation is, say, a school district. The
educational outcome data might come from standardized testing. Parent involvement
might be the records of what fraction of parents attend their student’s quarterly teacher
conferences. You, the modeler, work for the national government. You’ve been asked to
figure out what will be the effect on educational outcomes of an intervention where the
national government will give additional funding to schools.
For the purpose of anticipating the impact of a change in X on Y, either of two models might be appropriate: Y ~ X or Y ~ X + C.
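As an illustration only, here is what fitting these two candidate regressions might look like in Python, assuming the district-level data sit in a pandas DataFrame with columns named Y, X, and C (the data values and column names below are invented for the sketch):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical district-level data: Y = test outcomes, X = expenditures, C = parent involvement
df = pd.DataFrame({
    "Y": [72, 68, 80, 75, 64, 83],
    "X": [9.1, 8.4, 10.2, 9.6, 7.9, 10.8],      # spending per pupil, thousands
    "C": [0.55, 0.40, 0.70, 0.60, 0.35, 0.75],  # fraction of parents attending conferences
})

# Candidate model 1: regress outcomes on expenditures alone (Y ~ X)
model_x = smf.ols("Y ~ X", data=df).fit()

# Candidate model 2: also adjust for parent involvement (Y ~ X + C)
model_xc = smf.ols("Y ~ X + C", data=df).fit()

# The coefficient on X differs between the two models; which one answers the
# intervention question depends on the assumed causal role of C.
print(model_x.params["X"], model_xc.params["X"])
```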
The above graph illustrates another simple yet typical Bayesian network. In contrast to the statistical relationships in the non-causal example, this graph describes the causal relationships among the season of the year (X1), whether it is raining (X2), whether the sprinkler is on (X3), whether the pavement is wet (X4), and whether the pavement is slippery (X5).
Here, the absence of a direct link between X1 and X5, for example, captures our
understanding that there is no direct influence of season on slipperiness. The influence
is mediated by the wetness of the pavement (if freezing were a possibility, a direct link
could be added).
Perhaps the most important aspect of Bayesian networks is that they are direct
representations of the world, not of reasoning processes.
The arrows in the diagram represent real causal connections and not the flow of
information during reasoning (as in rule-based systems and neural networks).
Reasoning processes can operate on Bayesian networks by propagating information in
any direction.
For example, if the sprinkler is on, then the pavement is probably wet (prediction,
simulation). If someone slips on the pavement, that will also provide evidence that it is
wet (abduction, reasoning to a probable cause, or diagnosis).
On the other hand, if we see that the pavement is wet, that will make it more likely that
the sprinkler is on or that it is raining (abduction); but if we then observe that the
sprinkler is on, that will reduce the likelihood that it is raining (explaining away).
It is the latter form of reasoning, explaining away, that is especially difficult to model in
rule-based systems and neural networks in a natural way because it seems to require
the propagation of information in two directions.
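To make the "explaining away" pattern concrete, here is a small Python sketch over three binary variables Rain, Sprinkler, and Wet; all the probability numbers below are assumptions made up for this illustration, not values from the text:

```python
from itertools import product

# Assumed priors for the two independent causes
P_RAIN = 0.2
P_SPRINKLER = 0.3

def p_wet(rain, sprinkler):
    """Assumed P(Wet=True | Rain, Sprinkler): the pavement is wet if either cause is active."""
    if rain and sprinkler:
        return 0.99
    if rain or sprinkler:
        return 0.9
    return 0.01

def p_joint(rain, sprinkler, wet):
    pw = p_wet(rain, sprinkler)
    return ((P_RAIN if rain else 1 - P_RAIN)
            * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
            * (pw if wet else 1 - pw))

def p_rain_given(wet=None, sprinkler=None):
    """P(Rain=True | evidence), by brute-force enumeration of the joint distribution."""
    numerator = denominator = 0.0
    for rain, sprinkler_val, wet_val in product([True, False], repeat=3):
        if wet is not None and wet_val != wet:
            continue
        if sprinkler is not None and sprinkler_val != sprinkler:
            continue
        p = p_joint(rain, sprinkler_val, wet_val)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator

print(p_rain_given(wet=True))                   # ~0.46: wet pavement raises belief in rain
print(p_rain_given(wet=True, sprinkler=True))   # ~0.22: the sprinkler "explains away" the rain
```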
Causal Reasoning
For example, what if I turn the Sprinkler on instead of just observing that it is turned
on? What effect does that have on the Season, or on the connection between Wet and
Slippery?
A causal network, intuitively speaking, is a Bayesian network with the added property
that the parents of each node are its direct causes.
In such a network, the result of an intervention is obvious: the Sprinkler node is set to X3 = on and the causal link between the Season X1 and the Sprinkler X3 is removed. All other causal links and conditional probabilities remain intact, so the new model is:
P(x1, x2, x4, x5 | do(X3 = on)) = P(x1) P(x2 | x1) P(x4 | x2, X3 = on) P(x5 | x4)
This differs from observing that X3=on, which would result in a new model
that included the term P(X3=on|x1). This mirrors the difference between
seeing and doing: after observing that the Sprinkler is on, we wish to infer
that the Season is dry, that it probably did not rain, and so on. An arbitrary
decision to turn on the Sprinkler should not result in any such beliefs.
Causal networks are more properly defined, then, as Bayesian networks in which the
correct probability model—after intervening to fix any node’s value—is given simply by
deleting links from the node’s parents. For example, Fire → Smoke is a causal network,
whereas Smoke → Fire is not, even though both networks are equally capable of
representing any Joint Probability Distribution (JPD) of the two variables. Interventions, or local changes in the world, can then be modeled by
corresponding local changes in the model. This, in turn, allows causal networks to be
used very naturally for prediction by an agent that is considering various courses of
action.
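The seeing-versus-doing distinction can also be sketched in a few lines of Python on a tiny Season → {Rain, Sprinkler} → Wet fragment of the network above; every number here is an assumption chosen for the illustration only:

```python
# Assumed illustrative probabilities (not taken from the text)
P_DRY_SEASON = 0.5                         # P(Season = dry)

def p_sprinkler_on(dry):                   # P(Sprinkler = on | Season)
    return 0.5 if dry else 0.1

def p_rain(dry):                           # P(Rain = true | Season)
    return 0.1 if dry else 0.6

def p_wet(rain, sprinkler):                # P(Wet = true | Rain, Sprinkler)
    if rain and sprinkler:
        return 0.99
    if rain or sprinkler:
        return 0.9
    return 0.01

# "Seeing": observing Sprinkler = on changes our belief about the season (Bayes' rule)
num = P_DRY_SEASON * p_sprinkler_on(True)
den = num + (1 - P_DRY_SEASON) * p_sprinkler_on(False)
print("P(dry season | see Sprinkler=on) =", num / den)      # > 0.5

# "Doing": under do(Sprinkler = on) the Season -> Sprinkler link is deleted,
# so forcing the sprinkler on tells us nothing about the season
print("P(dry season | do(Sprinkler=on)) =", P_DRY_SEASON)    # unchanged, 0.5

# The downstream effect of the intervention on Wet is computed from the remaining links
p_wet_do = 0.0
for dry in (True, False):
    p_season = P_DRY_SEASON if dry else 1 - P_DRY_SEASON
    for rain in (True, False):
        p_r = p_rain(dry) if rain else 1 - p_rain(dry)
        p_wet_do += p_season * p_r * p_wet(rain, True)
print("P(Wet | do(Sprinkler=on)) =", p_wet_do)
```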
In pure Bayesian approaches, Bayesian networks are designed from expert knowledge
and include hyperparameter nodes. Data (usually scarce) is used as pieces of evidence
for incrementally updating the distributions of the hyperparameters.