
AL3391-ARTIFICIAL INTELLIGENCE

UNIT V
PROBABILISTIC REASONING
SYLLABUS
Acting under uncertainty – Bayesian inference – naïve Bayes models. Probabilistic
reasoning – Bayesian networks – exact inference in BN – approximate inference in BN
– causal networks.

INTRODUCTION

In Artificial Intelligence and Cognitive Science, probabilistic approaches are
critical for reasoning and for making decisions, both simple and complex.
Probabilistic reasoning is a form of knowledge representation in which the concept
of probability is used to indicate the degree of uncertainty in knowledge.
In AI, probabilistic models are used to examine data using statistical techniques.
Probabilistic modeling was one of the first machine learning approaches, and it is
still widely used today. The Naive Bayes algorithm is one of the most
well-known algorithms in this group.

How does it work?


Probabilistic modeling provides a framework for understanding learning.
The probabilistic framework specifies how to express and handle uncertainty about
models.
Predictions play a significant role in scientific data analysis, and machine learning,
automation, cognitive computing, and artificial intelligence all rely heavily on
them.
AI makes use of probabilistic reasoning:
• When we are uncertain about the premises
• When the number of possible predicates becomes unmanageable
• When it is known that an experiment contains an error

How does knowledge support reasoning?


Representing any realistic domain requires some simplification.
Preparing knowledge to support reasoning necessitates leaving out numerous facts,
ignoring them altogether, or summarizing them crudely.


Rather than ignoring or enumerating exceptions, an alternative is to summarize
them, i.e., to provide some warning signs indicating which areas of the minefield
are more dangerous than others.
Summarization is critical for striking a reasonable balance between safety and
movement speed.
One way to summarize exceptions is to assign a numerical measure
of uncertainty to each proposition and then combine these measures using uniform
syntactic principles, similar to how truth values are combined in logic.
AI is a computational representation of intelligent behavior and common-sense
reasoning.
Probability theory provides a principled account of how belief should change in
the presence of incomplete or uncertain information.
Network representations are already familiar in AI systems.
Thus, we need uncertain reasoning, or probabilistic reasoning, to represent uncertain
knowledge.

Causes of uncertainty:
The following are some of the most common sources of uncertainty in the real world:
• Information from unreliable sources
• Errors in experimental design
• Equipment failure
• Temperature variations
• Climate change

Why Is Probabilistic Reasoning Necessary in AI?


• When the outcome is unpredictable.
• When the specification or set of possible predicates becomes unmanageable.
• When an experiment encounters an unknown error.
In probabilistic reasoning, there are two ways to solve problems involving
uncertain knowledge:
• Bayes' rule
• Bayesian statistics

Numerous problems in AI (reasoning, planning, learning, perception, and robotics)
require the agent to operate with incomplete or uncertain information.


Using methods from probability theory and economics, AI researchers have


developed a number of powerful tools for resolving these problems.
Bayesian networks are a very versatile tool that can be used to solve a variety of
problems, including reasoning (via the Bayesian inference algorithm), learning
(via the expectation maximization algorithm), planning (via decision networks),
and perception (using dynamic Bayesian networks).
Probabilistic algorithms can also be used to filter, predict, smooth, and find
explanations for data streams, assisting perception systems in analyzing time-
dependent processes (e.g., hidden Markov models or Kalman filters).

Topic: 1 : ACTING UNDER UNCERTAINTY

Uncertainty:
Till now, we have learned knowledge representation using first-order logic and
propositional logic with certainty, which means we were sure about the predicates.
With this knowledge representation we might write A→B, which means that if A is
true then B is true. But consider a situation where we are not sure whether A is
true or not; then we cannot express this statement. This situation is called
uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates,
we need uncertain reasoning or probabilistic reasoning.

Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the
concept of probability to indicate the uncertainty in knowledge.
In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle
the uncertainty that results from laziness and ignorance.
Agents may need to handle uncertainty, whether due to partial observability,
nondeterminism, or a combination of the two.
An agent may never know for certain what state it is in or where it will end up after
a sequence of actions.
Problem-solving agents and logical agents are designed to handle uncertainty by
keeping track of a belief state—a representation of the set of all possible world


states that it might be in—and generating a contingency plan that handles every
possible eventuality that its sensors may report during execution.
Despite its many virtues, however, this approach has significant drawbacks when
taken literally as a recipe for creating agent programs:

• When interpreting partial sensor information, a logical agent must consider


every logically possible explanation for the observations, no matter how
unlikely. This leads to impossibly large and complex belief-state
representations.
• A correct contingent plan that handles every eventuality can grow arbitrarily
large and must consider arbitrarily unlikely contingencies.
• Sometimes there is no plan that is guaranteed to achieve the goal—yet the
agent must act. It must have some way to compare the merits of plans that are
not guaranteed.

Suppose, for example, that an automated taxi has the goal of delivering a
passenger to the airport on time, and it forms a plan, A90, that involves leaving
home 90 minutes before the flight departs and driving at a reasonable speed. Even
though the airport is only about 5 miles away, a logical taxi agent will not be able
to conclude with certainty that "Plan A90 will get us to the airport in time."
Instead, it reaches the weaker conclusion "Plan A90 will get us to the airport in
time, as long as the car doesn't break down or run out of gas, and I don't get into
an accident, and there are no accidents on the bridge, and the plane doesn't leave
early, and no meteorite hits the car, and ... ." None of these conditions can be
deduced for sure, so the plan's success cannot be inferred. This is the
qualification problem, for which we have so far seen no real solution.

Let’s consider an example of uncertain reasoning: diagnosing a dental patient’s


toothache. Diagnosis—whether for medicine, automobile repair, or whatever—
almost always involves uncertainty.
Let us try to write rules for dental diagnosis using propositional logic, so that we
can see how the logical approach breaks down.
Consider the following simple rule:


Toothache ⇒ Cavity
The problem is that this rule is wrong. Not all patients with toothaches have
cavities; some of them have gum disease, an abscess, or one of several other
problems:
Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ ...
Unfortunately, in order to make the rule true, we have to add an almost unlimited
list of possible problems. We could try turning the rule into a causal rule:
Cavity ⇒ Toothache
But this rule is not right either; not all cavities cause pain.
The only way to fix the rule is to make it logically exhaustive: to augment the
left-hand side with all the qualifications required for a cavity to cause a toothache.
Trying to use logic to cope with a domain like medical diagnosis thus fails for
three main reasons:

Laziness:
• It is too much work to list the complete set of antecedents or consequents needed
to ensure an exceptionless rule, and too hard to use such rules.

Theoretical ignorance:
• Medical science has no complete theory for the domain.

Practical ignorance:
• Even if we know all the rules, we might be uncertain about a particular patient
because not all the necessary tests have been or can be run.

The connection between toothaches and cavities is just not a logical consequence
in either direction.
This is typical of the medical domain, as well as most other judgmental domains:
law, business, design, automobile repair, gardening, dating, and so on.
The agent’s knowledge can at best provide only a degree of belief in the relevant
sentences.
Our main tool for dealing with degrees of belief is probability theory.
Probability provides a way of summarizing the uncertainty that comes from our
laziness and ignorance, thereby solving the qualification problem.


We might not know for sure what afflicts a particular patient, but we believe that
there is, say, an 80% chance—that is, a probability of 0.8—that the patient who
has a toothache has a cavity.
That is, we expect that out of all the situations that are indistinguishable from the
current situation as far as our knowledge goes, the patient will have a cavity in
80% of them.
This belief could be derived from statistical data—80% of the toothache patients
seen so far have had cavities—or from some general dental knowledge, or from a
combination of evidence sources.

Consider again the A90 plan for getting to the airport. Suppose it gives us a 97%
chance of catching our flight.
Does this mean it is a rational choice? Not necessarily: there might be other plans,
such as A180, with higher probabilities.
If it is vital not to miss the flight, then it is worth risking the longer wait at the
airport.
What about A1440 , a plan that involves leaving home 24 hours in advance?
In most circumstances, this is not a good choice, because although it almost
guarantees getting there on time, it involves an intolerable wait—not to mention a
possibly unpleasant diet of airport food.
To make such choices, an agent must first have preferences between the different
possible outcomes of the various plans.
An outcome is a completely specified state, including such factors as whether the
agent arrives on time and the length of the wait at the airport.
Preferences, as expressed by utilities, are combined with probabilities in the
general theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory.
The fundamental idea of decision theory is that an agent is rational if and only if it
chooses the action that yields the highest expected utility, averaged over all the
possible outcomes of the action.
This is called the principle of maximum expected utility (MEU).
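To make the MEU principle concrete, the following small Python sketch compares airport plans by expected utility. The probabilities and utility values are invented purely for illustration; they are not given in these notes.

plans = {
    # plan name: (P(catch flight), utility if caught, utility if missed) - hypothetical values
    "A90":   (0.95, 100, -1000),    # moderate wait at the airport
    "A180":  (0.97,  80, -1000),    # longer wait, slightly less pleasant if we do catch it
    "A1440": (0.9999, -500, -1000), # almost certain, but an intolerable 24-hour wait
}

def expected_utility(p_catch, u_catch, u_miss):
    # Average the utility over the two possible outcomes of the plan.
    return p_catch * u_catch + (1 - p_catch) * u_miss

for name, params in plans.items():
    print(name, "EU =", round(expected_utility(*params), 1))

best = max(plans, key=lambda name: expected_utility(*plans[name]))
print("MEU choice:", best)

Under these made-up numbers the agent prefers A180: the slightly higher chance of catching the flight outweighs the longer wait, while A1440 is ruled out by its intolerable waiting cost.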


Topic: 2 : BAYESIAN INFERENCE

Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning,
which determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal
probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental
to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).

Bayes' theorem allows updating the probability prediction of an event by observing
new information from the real world.
Example: If the risk of cancer is related to a person's age, then, using Bayes'
theorem, we can determine the probability of cancer more accurately with the help
of the person's age.
Bayes' theorem can be derived using the product rule and the conditional
probability of event A with known event B:
From the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, for the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)

Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)    ...... (a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation
is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities.
Here, P(A|B) is known as the posterior, which we need to calculate; it is read as
the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, we calculate the
probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before
considering the evidence.
P(B) is called the marginal probability: the probability of the evidence alone.


In equation (a), when A1, A2, A3, ......, An form a set of mutually exclusive and
exhaustive events, we can write P(B) = P(A1) P(B|A1) + P(A2) P(B|A2) + ...... + P(An) P(B|An),
and hence Bayes' rule can be written as:

P(Ai|B) = P(B|Ai) P(Ai) / [ P(A1) P(B|A1) + P(A2) P(B|A2) + ...... + P(An) P(B|An) ]

Applying Bayes' rule:


Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B),
and P(A).
This is very useful in cases where we have good estimates of these three terms
and want to determine the fourth one.
Suppose we want to perceive the effect of some unknown cause and want to
compute that cause; then Bayes' rule becomes:

P(cause | effect) = P(effect | cause) P(cause) / P(effect)
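As a minimal numeric sketch of this cause/effect form (the disease/test numbers below are hypothetical, not taken from these notes):

p_cause = 0.01                    # prior P(cause), e.g. 1% of patients have the disease
p_effect_given_cause = 0.90       # likelihood P(effect | cause), e.g. test sensitivity
p_effect_given_no_cause = 0.05    # P(effect | no cause), e.g. false-positive rate

# Marginal probability of the evidence, P(effect), by the law of total probability
p_effect = (p_effect_given_cause * p_cause
            + p_effect_given_no_cause * (1 - p_cause))

p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
print(round(p_cause_given_effect, 3))   # about 0.154

Even with a fairly reliable test, the posterior stays modest because the prior P(cause) is so low; this is exactly the kind of belief update that Bayes' rule formalizes.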

Application of Bayes' theorem in Artificial intelligence:


Following are some applications of Bayes' theorem:
• It is used to calculate the next step of the robot when the already executed step
is given.
• Bayes' theorem is helpful in weather forecasting.
• It can solve the Monty Hall problem.
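As a quick check of the Monty Hall claim above, the following simulation sketch (not part of the original notes) estimates the win rate for staying versus switching; Bayes' rule predicts roughly 1/3 and 2/3 respectively.

import random

def play(switch):
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print("stay  :", sum(play(False) for _ in range(trials)) / trials)   # ~0.33
print("switch:", sum(play(True) for _ in range(trials)) / trials)    # ~0.67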

Topic: 3 : NAÏVE BAYES MODEL

The Naïve Bayes algorithm is composed of two words, Naïve and Bayes, which can
be described as:
• Naïve: It is called naïve because it assumes that the occurrence of a certain
feature is independent of the occurrence of other features. For example, if a fruit is
identified on the basis of color, shape, and taste, then a red, spherical, and sweet
fruit is recognized as an apple. Each feature individually contributes to
identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes'
theorem.


Bayes' Theorem:
• Bayes' theorem is also known as Bayes' rule or Bayes' law; it is used to
determine the probability of a hypothesis with prior knowledge.
• It depends on conditional probability.
• The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) P(A) / P(B)

What is Naive Bayes algorithm?


It is a classification technique based on Bayes’ Theorem with an assumption of
independence among predictors. In simple terms, a Naive Bayes classifier assumes
that the presence of a particular feature in a class is unrelated to the presence of
any other feature.
For example, a fruit may be considered to be an apple if it is red, round, and about
3 inches in diameter.
Even if these features depend on each other or upon the existence of the other
features, all of these properties independently contribute to the probability that this
fruit is an apple and that is why it is known as ‘Naive’.
Naive Bayes model is easy to build and particularly useful for very large data sets.
Along with simplicity, Naive Bayes is known to outperform even
highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from
P(c), P(x), and P(x|c):

P(c|x) = P(x|c) P(c) / P(x)

and, under the naive independence assumption over predictors x1, x2, ......, xn,

P(c | x1, ..., xn) ∝ P(x1|c) × P(x2|c) × ...... × P(xn|c) × P(c)

Above,
• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of class.


• P(x|c) is the likelihood which is the probability of predictor given class.


• P(x) is the prior probability of predictor.

How does the Naive Bayes algorithm work?

Let's understand it using an example. Consider a training data set of weather
observations and the corresponding target variable 'Play' (indicating whether a
game was played).
Now we need to classify whether players will play or not based on the weather
condition. Let's follow the steps below.
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities, e.g. the probability
of Overcast = 0.29 and the probability of playing = 0.64.

Step 3: Use the naive Bayes equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome of the
prediction.

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the method of posterior probability discussed above.

P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

Here we have P(Sunny | Yes) = 3/9 = 0.33,
P(Sunny) = 5/14 = 0.36,
P(Yes) = 9/14 = 0.64
Now,
P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60,


which is higher than the posterior for "No", so we predict that the players will play when it is sunny.


Naive Bayes uses a similar method to predict the probability of different classes based
on various attributes. This algorithm is mostly used in text classification and in
problems having multiple classes.
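The hand calculation above can be reproduced with a short Python sketch. Only the counts used in the text are fixed (3 of the 9 "Yes" days are Sunny, 5 of the 14 days are Sunny, 9 of the 14 days are "Yes"); the remaining counts below are assumed from the classic 14-row weather/play data set.

# Assumed class-conditional counts for the classic weather/play data set.
weather = (["Sunny"] * 3 + ["Overcast"] * 4 + ["Rainy"] * 2 +   # the 9 "Yes" days
           ["Sunny"] * 2 + ["Rainy"] * 3)                        # the 5 "No" days
play = ["Yes"] * 9 + ["No"] * 5

def posterior(target_class, outlook):
    n = len(play)
    prior = play.count(target_class) / n                      # P(class)
    likelihood = sum(1 for w, p in zip(weather, play)
                     if w == outlook and p == target_class) / play.count(target_class)
    evidence = weather.count(outlook) / n                     # P(outlook)
    return likelihood * prior / evidence                      # Bayes' rule

print("P(Yes | Sunny) =", round(posterior("Yes", "Sunny"), 2))   # 0.6
print("P(No  | Sunny) =", round(posterior("No", "Sunny"), 2))    # 0.4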

What are the Pros and Cons of Naive Bayes?


Pros:
• It is easy and fast to predict the class of a test data set. It also performs well in
multi-class prediction.
• When the assumption of independence holds, a Naive Bayes classifier performs
better compared to other models such as logistic regression, and it needs less
training data.
• It performs well with categorical input variables compared to numerical
variable(s). For numerical variables, a normal distribution is assumed (a bell curve,
which is a strong assumption).

Cons:
• If a categorical variable has a category in the test data set that was not observed
in the training data set, the model will assign it a zero probability and will be
unable to make a prediction. This is often known as the "zero frequency" problem.
To solve it, we can use a smoothing technique; one of the simplest is Laplace
estimation (see the sketch after this list).
• On the other hand, naive Bayes is also known to be a bad estimator, so the
probability outputs from predict_proba are not to be taken too seriously.
• Another limitation of Naive Bayes is the assumption of independent predictors. In
real life, it is almost impossible to get a set of predictors that are
completely independent.
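The zero-frequency problem and Laplace smoothing mentioned above can be seen with scikit-learn's CategoricalNB, whose alpha parameter is the smoothing term. This is an illustrative sketch: it assumes scikit-learn is installed, and the tiny data set is made up.

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# One categorical feature, Outlook, encoded as 0 = Sunny, 1 = Overcast, 2 = Rainy.
X_train = np.array([[0], [0], [1], [1], [2]])
y_train = np.array(["No", "No", "Yes", "Yes", "Yes"])

# "Rainy" (2) was never observed together with class "No" in training.
model = CategoricalNB(alpha=1.0)   # alpha = 1.0 gives Laplace smoothing
model.fit(X_train, y_train)

print(model.classes_)
print(model.predict_proba([[2]]))  # both classes get a non-zero probability

With the smoothing term made very small, the probability of "Rainy" under class "No" would collapse toward zero, reproducing the zero-frequency behaviour described above.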

4 Applications of Naive Bayes Algorithms

• Real-time prediction: Naive Bayes is an eager learning classifier and it is
fast, so it can be used for making predictions in real time.
• Multi-class prediction: This algorithm is also well known for its multi-class
prediction capability; we can predict the probability of multiple classes of the
target variable.
• Text classification / spam filtering / sentiment analysis: Naive Bayes
classifiers are widely used in text classification (due to better results in multi-class
problems and the independence assumption) and have a higher success rate compared
to many other algorithms. As a result, they are widely used in spam filtering
(identifying spam e-mail) and sentiment analysis (in social media analysis, to
identify positive and negative customer sentiments).
• Recommendation systems: A Naive Bayes classifier and collaborative
filtering together build a recommendation system that uses machine learning
and data mining techniques to filter unseen information and predict whether a
user would like a given resource or not.

Topic: 4 : BAYESIAN NETWORK

A Bayesian belief network is a key technology for dealing with probabilistic
events and for solving problems that involve uncertainty. We can define a Bayesian
network as:

"A Bayesian network is a probabilistic graphical model which represents a set of


variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian


model.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and anomaly
detection.

Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network.
It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction, and decision
making under uncertainty.
A Bayesian network can be used for building models from data and expert
opinions, and it consists of two parts:
• Directed Acyclic Graph
• Table of conditional probabilities.


The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:

Each node corresponds to a random variable, which can be continuous or discrete.
Arcs, or directed arrows, represent causal relationships or conditional
probabilities between random variables. These directed links connect pairs of
nodes in the graph.
A link indicates that one node directly influences the other node; if there is
no directed link between two nodes, they are independent of each other.
• For example, consider a network graph whose nodes represent the random
variables A, B, C, and D.
• If node B is connected to node A by a directed arrow from A to B, then node A
is called the parent of node B.
• A node such as C, with no directed link to or from node A, is independent of node A.

The Bayesian network has mainly two components:


• Causal Component
• Actual numbers
Each node in the Bayesian network has a conditional probability
distribution P(Xi | Parents(Xi)), which determines the effect of the parents on
that node.
A Bayesian network is based on the joint probability distribution and conditional
probability.
So let's first understand the joint probability distribution:


Joint probability – It is the measure of two events happening at the same
time. It can be written as P(A ∩ B).
Conditional probability – It is the measure of the probability of an event
occurring given that another event has occurred. In other words, the conditional
probability of an event X is the probability that the event will occur given that
event Y has already occurred.
P(X|Y): probability of event X occurring given that event Y has already occurred.
If X and Y are dependent events, then P(X|Y) = P(X ∩ Y) / P(Y)
If X and Y are independent events, then P(X ∩ Y) = P(X) P(Y),
so P(X|Y) = P(X)
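A tiny numeric illustration of these definitions (the probabilities below are hypothetical):

p_x_and_y = 0.12   # joint probability P(X ∩ Y)
p_y = 0.30         # P(Y)
p_x = 0.40         # P(X)

p_x_given_y = p_x_and_y / p_y
print(round(p_x_given_y, 2))              # 0.4, i.e. P(X|Y) = P(X)
print(abs(p_x_given_y - p_x) < 1e-9)      # True: knowing Y tells us nothing about X
print(abs(p_x * p_y - p_x_and_y) < 1e-9)  # True: equivalently P(X ∩ Y) = P(X) P(Y)

Here X and Y happen to be independent; for dependent events the conditional probability would differ from P(X).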
Let’s understand the Bayesian network by an example.

Example Problem:
Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution:
• The Bayesian network for the above problem has the following structure:
Burglary and Earthquake are the parent nodes of Alarm and directly affect the
probability of the alarm going off, while David's and Sophia's calls depend only
on the Alarm node.
• The network represents that our assumptions do not directly perceive the
burglary, do not notice a minor earthquake, and that the two callers do not confer
before calling.
• The conditional distribution for each node is given as a conditional
probability table, or CPT.
• Each row in a CPT must sum to 1 because the entries in the row
represent an exhaustive set of cases for the variable.
• In a CPT, a Boolean variable with k Boolean parents contains 2^k rows of probabilities.
Hence, if there are two parents, the CPT will contain 4 probability values.

List of all events occurring in this network:


• Burglary (B)
• Earthquake(E)


• Alarm(A)
• David Calls(D)
• Sophia calls(S)
We can write the events of the problem statement in the form of a probability, P[D, S, A, B,
E], and rewrite this probability statement using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] . P[S, A, B, E]
= P[D | S, A, B, E] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B | E] . P[E]

Let's take the observed probability for the Burglary and earthquake component:

P(B= True) = 0.002, which is the probability of burglary.


P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake


P(E= False)= 0.999, which is the probability that an earthquake has not occurred.
We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:

The conditional probability of Alarm A depends on Burglary and Earthquake:

B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.05

False True 0.31 0.69

False False 0.001 0.999


Conditional probability table for David Calls:
The conditional probability that David will call depends on the probability of the
Alarm.

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95


Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node,
"Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98


From the formula of joint distribution, we can write the problem statement in the form
of probability distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
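The calculation above can be reproduced directly from the CPTs with a short Python sketch (an illustrative sketch that uses only the numbers given in the tables of this section):

P_B = {True: 0.002, False: 0.998}                 # Burglary
P_E = {True: 0.001, False: 0.999}                 # Earthquake
P_A = {  # P(Alarm = True | Burglary, Earthquake)
    (True, True): 0.94, (True, False): 0.95,
    (False, True): 0.31, (False, False): 0.001,
}
P_D = {True: 0.91, False: 0.05}                   # P(David calls = True | Alarm)
P_S = {True: 0.75, False: 0.02}                   # P(Sophia calls = True | Alarm)

# P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B, ¬E) * P(¬B) * P(¬E)
b, e, a = False, False, True
p = P_S[a] * P_D[a] * P_A[(b, e)] * P_B[b] * P_E[e]
print(round(p, 8))   # 0.00068045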


Hence, a Bayesian network can answer any query about the domain by using the
joint distribution.

Application of Bayesian Network


Healthcare Industry: Bayesian networks are used in the healthcare industry for the
detection and prevention of diseases. Based on the models built, we can find the likely
symptoms and predict whether a person will become diseased or not. For instance, if a
person has high cholesterol, then there is a high chance that the person will develop a
heart problem. With this information, the person can take preventive measures.

Web Search: Bayesian network models can be used to improve search accuracy based on user
intent. Based on the user's intent, these models show results that are relevant to the
person. For instance, when we search for Python functions most of the time, the
web search model picks up on that intent and makes sure to show relevant results to the
user.

Mail Spam Filtering: Gmail uses Bayesian models to filter mails by reading
or understanding the context of the mail. For instance, we may have observed spam
emails in the spam folder in Gmail. So how are these emails classified as spam?
A Bayesian model observes the mail and, based on previous experience and
probability, classifies the mail as spam or not.

Biomonitoring: Bayesian models are used to quantify the concentration of chemicals
in blood and human tissues, using indicators measured in blood and urine. To find
the level of such chemicals, one can conduct biomonitoring studies.
Information Retrieval: Bayesian models can be used for retrieving information from a
database. During this process, we refine our problem multiple times, and the
approach helps reduce information overload.


Topic: 5 : INFERENCES

Inference is the process of calculating a probability distribution of interest.


The terms inference and queries are used interchangeably.

Types
1. Exact Inference
2. Approximate Inference

Exact Inference
• It is the term used when inference is performed exactly (subject to standard
numerical rounding errors).
• Exact inference is applicable to a large range of problems, but may not be
possible when the number of combinations/paths becomes large.

Approximate inference
• Applicable to a wider class of problems
• Non-deterministic
• No guarantee of the correct answer
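To make the two styles concrete, the sketch below answers the query P(Burglary | David = True, Sophia = True) on the burglary network from Topic 4: first exactly, by enumerating the full joint distribution, and then approximately, by rejection sampling. This is an illustrative sketch rather than an algorithm given in the notes, and the sampling answer will vary from run to run.

import itertools, random

P_B = {True: 0.002, False: 0.998}
P_E = {True: 0.001, False: 0.999}
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(Alarm=True | B, E)
P_D = {True: 0.91, False: 0.05}                       # P(David=True | Alarm)
P_S = {True: 0.75, False: 0.02}                       # P(Sophia=True | Alarm)

def joint(b, e, a, d, s):
    # Full joint probability of one complete assignment, from the network factorization.
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pd = P_D[a] if d else 1 - P_D[a]
    ps = P_S[a] if s else 1 - P_S[a]
    return P_B[b] * P_E[e] * pa * pd * ps

def exact(d=True, s=True):
    # Exact inference: sum the joint over the hidden variables (E, A), then normalize.
    score = {b: sum(joint(b, e, a, d, s)
                    for e, a in itertools.product((True, False), repeat=2))
             for b in (True, False)}
    total = sum(score.values())
    return {b: p / total for b, p in score.items()}

def rejection_sampling(n=200_000, d=True, s=True):
    # Approximate inference: sample the network top-down, keep only samples that
    # match the evidence, and count how often Burglary is true among those kept.
    hits = kept = 0
    for _ in range(n):
        b = random.random() < P_B[True]
        e = random.random() < P_E[True]
        a = random.random() < P_A[(b, e)]
        if (random.random() < P_D[a]) == d and (random.random() < P_S[a]) == s:
            kept += 1
            hits += b
    return hits / kept if kept else float("nan")

print("exact:      ", round(exact()[True], 4))
print("approximate:", round(rejection_sampling(), 4))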


Topic: 6 : CAUSAL NETWORK

What is a causal network?

Causal reasoning is a crucial element of how humans understand, explain, and
make decisions about the world. Causal AI means automating causal reasoning
with machine learning. Today's learning machines have superhuman prediction
ability but aren't particularly good at causal reasoning, even when we train
them on enormous amounts of data. Causal AI is about writing algorithms that
capture causal reasoning in the context of machine learning and automated data
science.

Though humans rely heavily on causal reasoning to navigate the world, our
cognitive biases make our causal inferences highly error-prone. We developed
empiricism, the scientific method, and experimental statistics to address our
tendencies to make errors in causal inference tasks such as finding and
validating causal relations, distinguishing causality from mere correlation, and
predicting the consequences of actions, decisions, and policies. Yet even
empiricism still requires humans to interpret and explain observational data
(data we observe in passing). The way we interpret causality from those
different types of data is also error-prone. Causal AI attempts to use statistics,


probability, and computer science to help us surpass these errors in our


reasoning.

The difficulty of answering causal questions has motivated the work of


millennia of philosophers, centuries of scientists, and decades of statisticians.
But now, a convergence of statistical and computational advances has shifted
the focus from discourse to algorithms that we can train on data and deploy to
software. It is now a fascinating time to learn how to build causal AI.

Simulated experiments and causal effect inference


Causal effect inference – quantifying how much a cause (e.g., a promotion)
affects an effect (e.g., sales) – is the most common goal of applied data science.
The gold standard for causal effect inference is the randomized experiment, such
as an A/B test. The concepts of causal inference explain why randomized
experiments work so well: randomization eliminates non-causal sources of
statistical correlation.
More importantly, causal inference enables data scientists to simulate
experiments and estimate causal effects from observational data. Most data in
the world is observational data because most data is not the result of an
experiment. When people say “big data,” they almost always mean
observational data. When a tech company boasts of training a natural language
model on petabytes of Internet data, it is observational data. When we can’t run
a randomized experiment because of infeasibility, cost, or ethics, causal
inference enables data scientists to turn to observational data to estimate causal
effects.
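The following sketch illustrates the point with synthetic data (all numbers are invented): a confounder, customer loyalty, drives both who receives a promotion and how much they buy, so the naive observational comparison overstates the promotion's effect, while adjusting for the confounder recovers something close to the true effect a randomized experiment would measure.

import random

random.seed(0)
TRUE_EFFECT = 5.0   # ground-truth lift in sales caused by the promotion

def simulate(n=100_000):
    data = []
    for _ in range(n):
        loyal = random.random() < 0.5                      # confounder: loyal customer
        promo = random.random() < (0.8 if loyal else 0.2)  # loyal customers targeted more often
        sales = 20 + 10 * loyal + TRUE_EFFECT * promo + random.gauss(0, 1)
        data.append((loyal, promo, sales))
    return data

def mean_sales(data, promo_value, loyal_value=None):
    rows = [s for l, p, s in data
            if p == promo_value and (loyal_value is None or l == loyal_value)]
    return sum(rows) / len(rows)

data = simulate()

# Naive observational comparison: confounded, overstates the effect.
naive = mean_sales(data, True) - mean_sales(data, False)

# Adjust for the confounder: compare within each loyalty group, then average.
adjusted = sum(0.5 * (mean_sales(data, True, l) - mean_sales(data, False, l))
               for l in (True, False))

print("naive estimate   :", round(naive, 2))     # well above the true effect of 5
print("adjusted estimate:", round(adjusted, 2))  # close to 5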


A common belief is that in the era of big data, it is easy to run a virtually
unlimited amount of experiments. If you can run unlimited experiments, who
needs causal inference? But even when you can run experiments at little actual
cost, there are often opportunity costs to running experiments. For example,
suppose a data scientist at an e-commerce company has a choice of one
thousand experiments to run and running each one would take time and
potentially sacrifice some sales or clicks in the process. Causal modeling
allows that data scientist to use results from a past experiment that answers one
question to simulate the results of a new experiment that would answer a
slightly different question. That means she could use past data to simulate the
results for each of these one thousand experiments and then prioritize the
experiments by those predicted to have the most impactful results in
simulation. She avoids wasting clicks and sales on less insightful experiments.

Training large machine learning artifacts is often more art than science, relying
on the intuition and experience of individual machine learning engineers. When
they leave the organization to move on to their next opportunity, they take all
of that intuition and experience with them. In contrast, causal models are
decomposable and explainable, so it is easier for your team to understand the
individual components of the model. Further, by forcing the engineer to encode
their causal knowledge about a problem domain into the structure of the
artifact, more of the fruits of their knowledge and experience stays with the
organization as valuable intellectual property after that employee moves on.

S.PRABU, Associate Professor, AI&DS Department, KVCET
