Artificial Intelligence Notes 5
AL3391-ARTIFICIAL INTELLIGENCE
UNIT V
PROBABILISTIC REASONING
SYLLABUS
Acting under uncertainty – Bayesian inference – naïve Bayes models. Probabilistic
reasoning – Bayesian networks – exact inference in BN – approximate inference in BN
– causal networks.
INTRODUCTION
Causes of uncertainty:
The following are some of the most common sources of uncertainty in the real world:
• The information came from unreliable sources.
• Errors in experimental design
• Equipment failure
• Temperature variations
• Climate change
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and
propositional logic with certainty, which means we were sure about the predicates.
With this knowledge representation we might write A → B, meaning that if A is
true then B is true. But consider a situation where we are not sure whether A is
true or not; then we cannot express this statement. This situation is called
uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates,
we need uncertain reasoning or probabilistic reasoning.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the
concept of probability to indicate the uncertainty in knowledge.
In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle
the uncertainty that results from laziness and ignorance.
Agents may need to handle uncertainty, whether due to partial observability,
nondeterminism, or a combination of the two.
An agent may never know for certain what state it’s in or where it will end up after
a sequence of actions.
Problem-solving agents and logical agents are designed to handle uncertainty by
keeping track of a belief state—a representation of the set of all possible world
states that the agent might be in—and generating a contingency plan that handles
every possible eventuality that its sensors may report during execution.
Despite its many virtues, however, this approach has significant drawbacks when
taken literally as a recipe for creating agent programs:
Suppose, for example, that an automated taxi has the goal of delivering a
passenger to the airport on time, and that plan A90 involves leaving home 90
minutes before the flight departs and driving at a reasonable speed. Even though
the airport is only about 5 miles away, a logical taxi agent will not be able to
conclude with certainty that "Plan A90 will get us to the airport in time."
Instead, it reaches the weaker conclusion: "Plan A90 will get us to the airport in
time, as long as the car doesn't break down or run out of gas, and I don't get
into an accident, and there are no accidents on the bridge, and the plane doesn't
leave early, and no meteorite hits the car, and ...". None of these conditions can
be deduced for sure, so the plan's success cannot be inferred. This is the
qualification problem, for which we have so far seen no real solution.
Consider an example of uncertain reasoning: diagnosing a dental patient's
toothache. A simple diagnostic rule might be:
Toothache ⇒ Cavity.
The problem is that this rule is wrong. Not all patients with toothaches have
cavities; some of them have gum disease, an abscess, or one of several other
problems:
Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ ...
Unfortunately, in order to make the rule true, we have to add an almost unlimited
list of possible problems. We could try turning the rule into a causal rule:
Cavity ⇒ Toothache.
But this rule is not right either; not all cavities cause pain.
The only way to fix the rule is to make it logically exhaustive: to augment the left-
hand side with all the qualifications required for a cavity to cause a toothache.
Trying to use logic to cope with a domain like medical diagnosis thus fails for
three main reasons:
Laziness:
• It is too much work to list the complete set of antecedents or consequents needed
to ensure an exceptionless rule, and too hard to use such rules.
Theoretical ignorance:
• Medical science has no complete theory for the domain.
Practical ignorance:
• Even if we know all the rules, we might be uncertain about a particular patient
because not all the necessary tests have been or can be run.
The connection between toothaches and cavities is just not a logical consequence
in either direction.
This is typical of the medical domain, as well as most other judgmental domains:
law, business, design, automobile repair, gardening, dating, and so on.
The agent’s knowledge can at best provide only a degree of belief in the relevant
sentences.
Our main tool for dealing with degrees of belief is probability theory.
Probability provides a way of summarizing the uncertainty that comes from our
laziness and ignorance, thereby solving the qualification problem.
We might not know for sure what afflicts a particular patient, but we believe that
there is, say, an 80% chance—that is, a probability of 0.8—that the patient who
has a toothache has a cavity.
That is, we expect that out of all the situations that are indistinguishable from the
current situation as far as our knowledge goes, the patient will have a cavity in
80% of them.
This belief could be derived from statistical data—80% of the toothache patients
seen so far have had cavities—or from some general dental knowledge, or from a
combination of evidence sources.
Consider again the A90 plan for getting to the airport. Suppose it gives us a 97%
chance of catching our flight.
Does this mean it is a rational choice? Not necessarily: there might be other plans,
such as A180, with higher probabilities.
If it is vital not to miss the flight, then it is worth risking the longer wait at the
airport.
What about A1440, a plan that involves leaving home 24 hours in advance?
In most circumstances, this is not a good choice, because although it almost
guarantees getting there on time, it involves an intolerable wait—not to mention a
possibly unpleasant diet of airport food.
To make such choices, an agent must first have preferences between the different
possible outcomes of the various plans.
An outcome is a completely specified state, including such factors as whether the
agent arrives on time and the length of the wait at the airport.
Preferences, as expressed by utilities, are combined with probabilities in the
general theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory .
The fundamental idea of decision theory is that an agent is rational if and only if
it chooses the action that yields the highest expected utility, averaged over all
the possible outcomes of the action.
This is called the principle of maximum expected utility (MEU).
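To make the MEU principle concrete, here is a minimal Python sketch; the plan
names, probabilities, and utilities below are hypothetical illustrations, not
values from the text:

# Minimal sketch of the maximum-expected-utility (MEU) principle.
# Each plan maps to a list of (probability, utility) outcome pairs.
plans = {
    "A90":   [(0.97, 100), (0.03, -1000)],          # catch flight / miss flight
    "A1440": [(0.9999, 40), (0.0001, -1000)],       # almost certain, but long wait
}

def expected_utility(outcomes):
    # Average the utility over all possible outcomes, weighted by probability.
    return sum(p * u for p, u in outcomes)

best = max(plans, key=lambda name: expected_utility(plans[name]))
print("MEU choice:", best)   # A90 (EU = 67) beats A1440 (EU ≈ 39.9)

With these illustrative numbers, the long-wait plan loses despite its near-certain
success, matching the intuition in the text.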
BAYES' THEOREM AND BAYESIAN INFERENCE
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning; it
determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal
probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental
to Bayesian statistics.
It is a way to calculate the value of P(A|B) with the knowledge of P(B|A):

P(A|B) = P(B|A) · P(A) / P(B)    ...(a)

In equation (a), in general, we can write P(B) = Σi P(Ai) · P(B|Ai); hence Bayes'
rule can be written as:

P(Ai|B) = P(B|Ai) · P(Ai) / Σk P(Ak) · P(B|Ak)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
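As a worked illustration of Bayes' rule, the following minimal Python sketch uses
hypothetical numbers: a condition with a 1% prior, and a test with an assumed 90%
true-positive rate and 5% false-positive rate.

# Sketch of Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
# All numbers here are hypothetical illustrations.
p_a = 0.01            # prior P(A)
p_b_given_a = 0.90    # likelihood P(B|A)
p_b_given_not_a = 0.05

# Total probability: P(B) = P(B|A)P(A) + P(B|¬A)P(¬A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")   # ~0.154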
NAÏVE BAYES MODELS
The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be
described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain
feature is independent of the occurrence of other features. For example, if a fruit
is identified on the basis of color, shape, and taste, then a red, spherical, and
sweet fruit is recognized as an apple. Hence each feature individually contributes
to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Bayes' Theorem:
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge.
• It depends on the conditional probability.
• The formula for Bayes' theorem is given as:

P(c|x) = P(x|c) · P(c) / P(x)

Above,
• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, i.e., the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities of each feature
value for each class.
Step 3: Now, use the Naive Bayesian equation to calculate the posterior probability
for each class. The class with the highest posterior probability is the outcome of
the prediction. A minimal sketch of these steps is given below.
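The following minimal Python sketch walks through the three steps above on a tiny
hypothetical dataset (weather outlook → play decision); the data values are
illustrative only.

# Minimal Naive Bayes sketch over a tiny hypothetical dataset.
from collections import Counter

data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no"),
        ("overcast", "yes"), ("sunny", "yes")]

classes = Counter(c for _, c in data)           # Step 1: class frequency table
likelihood = Counter((x, c) for x, c in data)   # Step 2: feature/class counts

def posterior(x, c):
    # P(c|x) is proportional to P(x|c) * P(c); P(x) cancels when comparing classes.
    p_c = classes[c] / len(data)
    p_x_given_c = likelihood[(x, c)] / classes[c]
    return p_x_given_c * p_c

x = "sunny"
# Step 3: pick the class with the highest posterior probability.
print(max(classes, key=lambda c: posterior(x, c)))   # "no" for this toy data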
Cons:
• If a categorical variable has a category in the test data set that was not
observed in the training data set, then the model will assign it a zero probability
and will be unable to make a prediction. This is often known as the "zero
frequency" problem. To solve this, we can use a smoothing technique; one of the
simplest is Laplace estimation (see the sketch after this list).
• On the other side, naive Bayes is also known as a bad estimator, so the
probability outputs from predict_proba are not to be taken too seriously.
• Another limitation of naive Bayes is the assumption of independent predictors. In
real life, it is almost impossible to get a set of predictors that are completely
independent.
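As referenced above, here is a minimal sketch of Laplace (add-one) estimation for
the zero-frequency problem; the counts are hypothetical.

# Laplace (add-one) smoothing: add alpha to every count so that a category
# never seen in training does not get probability exactly zero.
counts = {"red": 3, "green": 0}    # "green" never observed with this class
total = sum(counts.values())
k = len(counts)                    # number of categories

def smoothed(category, alpha=1):
    return (counts[category] + alpha) / (total + alpha * k)

print(smoothed("green"))   # 1/5 = 0.2 instead of 0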
Applications of Naive Bayes algorithms:
• Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes
classifiers, mostly used in text classification (due to better results in
multi-class problems and the independence rule), have a higher success rate
compared to other algorithms. As a result, naive Bayes is widely used in spam
filtering (identifying spam e-mail) and sentiment analysis (in social media
analysis, to identify positive and negative customer sentiments).
• Recommendation System: A Naive Bayes classifier and collaborative filtering
together build a recommendation system that uses machine learning and data mining
techniques to filter unseen information and predict whether a user would like a
given resource or not.
BAYESIAN NETWORKS
A Bayesian belief network is a key technology for dealing with probabilistic
events and for solving problems that involve uncertainty. We can define a Bayesian
network as:
"A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph."
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network.
It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction, and decision
making under uncertainty.
A Bayesian network can be used for building models from data and experts'
opinions, and it consists of two parts:
• Directed Acyclic Graph
• Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
• Each node represents a random variable, which may be continuous or discrete.
• Each arc represents a causal relationship or conditional dependency between
variables: an arc from node X to node Y means that X is a parent of Y and directly
influences it.
Example Problem:
Harry has installed a new burglar alarm at his home. The alarm reliably responds
to a burglary, but it can also be triggered by minor earthquakes. Harry has two
neighbors, David and Sophia, who have agreed to call him at work when they hear
the alarm.
Calculate the probability that the alarm has sounded, but neither a burglary nor
an earthquake has occurred, and both David and Sophia have called Harry.
Solution:
• The Bayesian network for the above problem is given below. The network structure
shows that Burglary and Earthquake are the parent nodes of Alarm and directly
affect the probability of the alarm going off, whereas David's and Sophia's calls
depend on the alarm.
• The network thus represents our assumptions: David and Sophia do not directly
perceive the burglary, do not notice minor earthquakes, and do not confer before
calling.
• The conditional distributions for each node are given as conditional
probabilities table or CPT.
• Each row in the CPT must sum to 1, because the entries in a row represent an
exhaustive set of cases for the variable.
• In a CPT, a boolean variable with k boolean parents requires 2^k rows of
probabilities. Hence, if there are two parents, the CPT will contain 4 rows of
probability values.
List of all events occurring in this network:
• Burglary (B)
• Earthquake (E)
• Alarm (A)
• David calls (D)
• Sophia calls (S)
We can write the events of the problem statement in the form of the probability
P[D, S, A, B, E]. Using the joint probability distribution and the conditional
independencies encoded by the network, we can rewrite the above probability
statement as:

P[D, S, A, B, E] = P[D|A] · P[S|A] · P[A|B, E] · P[B] · P[E]
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, which is the probability of a burglary.
P(B = False) = 0.998, which is the probability of no burglary.
P(E = True) = 0.001, which is the probability of a minor earthquake.
P(E = False) = 0.999, which is the probability that an earthquake has not occurred.
We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm (A), given Burglary (B) and Earthquake (E):
B     | E     | P(A = True) | P(A = False)
True  | True  | 0.94        | 0.06
True  | False | 0.95        | 0.05
False | True  | 0.31        | 0.69
False | False | 0.001       | 0.999

Conditional probability table for David calls (D), given Alarm (A):
A     | P(D = True) | P(D = False)
True  | 0.91        | 0.09
False | 0.05        | 0.95

Conditional probability table for Sophia calls (S), given Alarm (A):
A     | P(S = True) | P(S = False)
True  | 0.75        | 0.25
False | 0.02        | 0.98

From the formula of the joint distribution, the problem statement can be written as:

P(S, D, A, ¬B, ¬E) = P(S|A) · P(D|A) · P(A|¬B, ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045

Hence, a Bayesian network can answer any query about the domain by using the joint
distribution.
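As a cross-check, here is a minimal Python sketch of the same query, reusing the
CPT values given above (treat them as assumed values for this textbook example):

# Query: P(Sophia calls, David calls, Alarm, no Burglary, no Earthquake)
p_b = 0.002                       # P(Burglary)
p_e = 0.001                       # P(Earthquake)
p_a = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(Alarm | B, E)
p_d_given_a = {True: 0.91, False: 0.05}              # P(David calls | A)
p_s_given_a = {True: 0.75, False: 0.02}              # P(Sophia calls | A)

# P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = (p_s_given_a[True] * p_d_given_a[True]
     * p_a[(False, False)] * (1 - p_b) * (1 - p_e))
print(p)   # ≈ 0.00068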
Applications of Bayesian networks:
Web Search: Bayesian network models can be used to improve search accuracy based
on user intent; these models show results that are relevant to the person
searching. For instance, if we search for Python functions most of the time, the
web search model picks up our intent and makes sure to show relevant results.
Mail Spam Filtering: Gmail uses Bayesian models to filter mail by understanding
its context. For instance, we may have noticed spam emails in the spam folder in
Gmail. How are these emails classified as spam? A Bayesian model observes each
mail and, based on previous experience and probability, classifies it as spam or
not.
Topic 5: INFERENCES IN BAYESIAN NETWORKS
Types
1. Exact Inference
2. Approximate Inference
Exact Inference
• The term used when inference is performed exactly (subject to standard
numerical rounding errors).
• Exact inference is applicable to a large range of problems, but may become
infeasible when the number of variable combinations or paths grows large.
Approximate inference
• Applicable to a wider class of problems.
• Nondeterministic (typically based on random sampling).
• No guarantee of the correct answer, though accuracy generally improves with more
samples. A minimal sampling sketch is given below.
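As an illustration of approximate inference, here is a minimal rejection-sampling
sketch that estimates P(Burglary | David calls) in the alarm network above,
reusing the assumed CPT values from the earlier example:

# Rejection sampling: draw full samples from the network, keep only those
# consistent with the evidence (David calls), and average the query variable.
import random

def sample():
    b = random.random() < 0.002
    e = random.random() < 0.001
    p_alarm = {(True, True): 0.94, (True, False): 0.95,
               (False, True): 0.31, (False, False): 0.001}[(b, e)]
    a = random.random() < p_alarm
    d = random.random() < (0.91 if a else 0.05)
    return b, d

accepted = [b for b, d in (sample() for _ in range(200_000)) if d]
print(sum(accepted) / len(accepted))   # estimate of P(Burglary | David calls)

Unlike exact inference, repeated runs give slightly different answers, and the
estimate tightens as the number of samples grows.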
CAUSAL NETWORKS
Though humans rely heavily on causal reasoning to navigate the world, our
cognitive biases make our causal inferences highly error-prone. We developed
empiricism, the scientific method, and experimental statistics to address our
tendencies to make errors in causal inference tasks such as finding and
validating causal relations, distinguishing causality from mere correlation, and
predicting the consequences of actions, decisions, and policies. Yet even
empiricism still requires humans to interpret and explain observational data
(data we observe in passing). The way we interpret causality from those
different types of data is also error-prone. Causal AI attempts to use statistics
and computational modeling to make these causal inference tasks more reliable.
A common belief is that in the era of big data, it is easy to run a virtually
unlimited number of experiments. If you can run unlimited experiments, who
needs causal inference? But even when you can run experiments at little actual
cost, there are often opportunity costs to running experiments. For example,
suppose a data scientist at an e-commerce company has a choice of one
thousand experiments to run and running each one would take time and
potentially sacrifice some sales or clicks in the process. Causal modeling
allows that data scientist to use results from a past experiment that answers one
question to simulate the results of a new experiment that would answer a
slightly different question. That means she could use past data to simulate the
results for each of these one thousand experiments and then prioritize the
experiments by those predicted to have the most impactful results in
simulation. She avoids wasting clicks and sales on less insightful experiments.
Training large machine learning artifacts is often more art than science, relying
on the intuition and experience of individual machine learning engineers. When
they leave the organization to move on to their next opportunity, they take all
of that intuition and experience with them. In contrast, a causal model is
decomposable and explainable by design, so it is easier for your team to
understand its individual components. Further, by forcing the engineer to encode
their causal knowledge about a problem domain into the structure of the artifact,
more of the fruits of their knowledge and experience stay with the organization
as valuable intellectual property after that employee moves on.