Lecture 5 Bayesian Classification 3
2
Example
A simple example is the toss of a fair (unbiased) coin.
Since the two outcomes are equally probable,
the probability of "heads" equals the probability of "tails",
so there is a 1/2 (or 50%) chance of either "heads" or
"tails".
3
Probability Theory
Try to write rules for dental diagnosis using first-order logic, so that
we can see how the logical approach breaks down.
Consider the following rule:
Toothache ⇒ Cavity
The problem is that this rule is wrong. Not all patients with
toothaches have cavities; some of them have gum disease, an abscess,
or one of several other problems.
4
Unfortunately, in order to make the rule true, we have to add an
almost unlimited list of possible causes. We could try turning the
rule into a causal rule instead:
Cavity ⇒ Toothache
But this rule is not right either; not all cavities cause pain. The only
way to fix the rule is to make it logically exhaustive:
to augment the left-hand side with all the qualifications required for
a cavity to cause a toothache.
Even then, for the purposes of diagnosis, one must also take into
account the possibility that the patient might have a toothache and a
cavity that are unconnected.
5
Trying to use first-order logic to cope with a domain like medical
diagnosis thus fails for three main reasons: laziness (it is too much
work to list the complete set of antecedents), theoretical ignorance
(medical science has no complete theory of the domain), and practical
ignorance (even if we know all the rules, we may be uncertain about a
particular patient).
We might not know for sure what afflicts a particular patient, but
we believe that there is, say, an 80% chance (that is, a probability of
0.8) that the patient has a cavity if he or she has a toothache.
The 80% summarizes those cases in which all the factors needed for
a cavity to cause a toothache are present, and other cases in which
the patient has both a toothache and a cavity but the two are
unconnected. The missing 20% summarizes all the other possible
causes of toothache that we are too lazy or ignorant to confirm or
deny.
7
Probability Theory: Variables and Events
• In probability theory, the set of all possible worlds is called the sample space.
• For example, if we are about to roll two (distinguishable) dice, there are 36
possible worlds to consider: (1,1), (1,2), . . ., (6,6).
• A random variable can be an observation, outcome, or event whose value is
uncertain.
• Total and Die1 are random variables; each random variable has a domain, the set of
possible values it can take on.
• The domain of Total for two dice is the set {2, . . . , 12} and the domain of Die1 is
{1, . . . , 6}.
• E.g., a coin toss. Let's use Throw as the random variable denoting the outcome.
• The set of possible outcomes for a random variable is called its domain.
• The domain of Throw is {head, tail}.
• A Boolean random variable has two outcomes.
• University has the domain {true, false}.
• Cavity has the domain {true, false}.
• Toothache has the domain {true, false}.
• Weather has the domain {sunny, rainy, cloudy, snow}.
Probability Theory: Variables and Events
Each random variable has a domain of values that it can take on.
11
Independent Variables
Two random variables X and Y are considered independent if their joint
probability, that is, the probability of X and Y, equals the product of
their marginals:
P(X, Y) = P(X) P(Y)
12
Independence
Let's see an example.
Imagine we have a deck of 52 cards and randomly draw two cards from it.
Let the first random variable be the picture drawn on the first card
and the second be the picture drawn on the second card.
These variables are dependent, since it is impossible to
draw the same card twice.
Another example is throwing two coins independently.
The probability that the first coin lands heads up and the second
lands tails up equals the product of the two probabilities.
• e.g. P(Toothache = true) = 0.2
• If one event doesn't affect the likelihood of another event, they are said to be
independent, and therefore
P(a | b) = P(a)
• E.g. if you roll a 6 on a die, it doesn't make it more or less likely that you will
roll a 6 on the next throw. The rolls are independent.
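To make the two-coin example concrete, here is a minimal Python sketch (the function and variable names are mine, not the lecture's) that estimates both marginals and the joint by simulation, and shows that the joint matches the product of the marginals:

```python
import random

def simulate_two_coins(trials=100_000):
    """Estimate P(coin1 = heads), P(coin2 = tails), and the joint
    P(heads, tails) to illustrate that independent events factorize."""
    first_heads = second_tails = both = 0
    for _ in range(trials):
        c1_heads = random.random() < 0.5
        c2_heads = random.random() < 0.5
        first_heads += c1_heads
        second_tails += not c2_heads
        both += c1_heads and not c2_heads
    p1, p2, joint = first_heads / trials, second_tails / trials, both / trials
    print(f"P(H) = {p1:.3f}, P(T) = {p2:.3f}, "
          f"P(H)*P(T) = {p1 * p2:.3f}, P(H, T) = {joint:.3f}")

simulate_two_coins()  # the joint is ~0.25, the product of the marginals
```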
Conditional Probability
Let's consider an example.
Imagine you are a student and you want to pass some course.
It has two exams in it, a midterm and a final.
The probability that the student will pass the midterm is 0.4, and
the probability that the student will pass both the midterm and the final is 0.25.
If you want to find the probability that you will pass the final, given that you already passed
the midterm, you can apply the formula from the previous slide:
P(M) = 0.4
P(M ∧ F) = 0.25
P(F | M) = P(M ∧ F) / P(M) = 0.25 / 0.4 = 0.625
This gives a value of about 62.5%.
P(sunny, Cavity) would be a two-element vector giving the probabilities of a sunny day
with a cavity and a sunny day with no cavity.
17
Joint Probability
P(Weather, Cavity) denotes the probabilities of all combinations of the values of Weather and Cavity.
18
Conditional Probability
A joint probability distribution that covers this complete set is
called the full joint probability distribution.
We borrow this part directly from the semantics of propositional
logic, as follows. A possible world is defined to be an
assignment of values to all of the random variables under
consideration.
For example, with Cavity, Toothache, and Weather, the full joint
distribution is given by P(Cavity, Toothache, Weather).
This joint distribution can be represented as a 2 x 2 x 4 table with
16 entries.
Probability distributions for continuous variables are called
probability density functions.
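To illustrate how such a table can be represented and queried, here is a small Python sketch; the uniform probabilities are placeholders of my own, not values from the textbook:

```python
from itertools import product

# Domains of the three variables from the slide.
cavity_vals    = [True, False]
toothache_vals = [True, False]
weather_vals   = ["sunny", "rainy", "cloudy", "snow"]

# Full joint distribution: a mapping from each possible world to its
# probability. The 16 uniform entries below are placeholders; a real
# table would hold measured probabilities that sum to 1.
joint = {w: 1 / 16 for w in product(cavity_vals, toothache_vals, weather_vals)}

# Any query is answered by summing out the other variables,
# e.g. the marginal P(Cavity = true):
p_cavity = sum(p for (cavity, _, _), p in joint.items() if cavity)
print(p_cavity)  # 0.5 with the uniform placeholder table
```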
19
Combining Probabilities
For example, the sentence P(a | b) = 0.8 cannot be interpreted to mean "whenever b holds, conclude that P(a) is 0.8."
First, P(a) always denotes the prior probability of a, not the posterior probability given some evidence;
second, the statement P(a | b) = 0.8 is immediately relevant just when b is the only available evidence.
When additional information c is available, the degree of belief in a is P(a | b ∧ c), which might have little relation
to P(a | b).
For example, c might tell us directly whether a is true or false.
If we examine a patient who complains of toothache, and discover a cavity, then we have additional evidence cavity, and
we conclude (trivially) that P(cavity | toothache ∧ cavity) = 1.0.
Chain rule:
P(X1, . . . , Xn) = P(X1) P(X2 | X1) · · · P(Xn | X1, . . . , Xn−1)
The evidence term P(A) can be expanded with the law of total probability:
P(A) = P(A | B1) P(B1) + P(A | B2) P(B2) + . . . + P(A | Bn) P(Bn)
where B = parameter and A = observation/data.
29
BAYES RULE
The Bayes theorem was developed by, and named for, Thomas
Bayes (1702-1761).
It shows the relation between one conditional probability and its
inverse:
P(B | A) = P(A | B) P(B) / P(A)
It provides a mathematical rule for revising an estimate or forecast in
light of experience and observation.
In the 18th century, Thomas Bayes pondered this question:
"Does God really exist?"
• Being interested in mathematics, he attempted to develop a formula
to arrive at the probability that God does exist based on
the evidence that was available to him on earth.
Later, Laplace refined Bayes' work and gave it the name
"Bayes' Theorem".
30
Applying: Bayesian Rule
1. Classification:
2. Regularization:
This formula can also lead to regularization:
we can treat the prior on theta as a regularizer.
Imagine you want to estimate the probability that your coin will land
heads up.
You already know that most coins land heads up with probability 0.5, and
so you can use a prior that says most coins are fair.
However, if you know that for your experiment the probability
of heads can either be fair, that is 0.5, or biased towards heads,
that is greater than 0.5, you could choose a prior that reflects this.
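One way to make the prior-as-regularizer idea concrete is a MAP estimate of the coin bias under a Beta prior. This is an illustrative sketch; the counts and prior parameters are assumptions of mine, not from the lecture:

```python
def map_heads_probability(heads, tails, alpha=50, beta=50):
    """MAP estimate of a coin's P(heads) under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + heads, beta + tails), whose mode is
    (heads + alpha - 1) / (heads + tails + alpha + beta - 2). A strong
    Beta(50, 50) prior encodes the belief that most coins are fair."""
    return (heads + alpha - 1) / (heads + tails + alpha + beta - 2)

# With few observations, the prior dominates and pulls the estimate
# toward 0.5 -- it acts as a regularizer:
print(map_heads_probability(heads=7, tails=3))                   # ~0.519
# With a flat Beta(1, 1) prior we recover the maximum likelihood estimate:
print(map_heads_probability(heads=7, tails=3, alpha=1, beta=1))  # 0.7
```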
31
Applying: BAYES RULE
3. Online Learning
4. Prediction:
Bayes' rule is useful in practice because there are many cases where we do have good probability
estimates for three of these numbers and need to compute the fourth.
The conditional probability P(effect | cause) quantifies the relationship in the causal direction, whereas
P(cause | effect) describes the diagnostic direction. In a task such as medical diagnosis,
we often have conditional probabilities on causal relationships (that is, the doctor knows P(symptoms |
disease)) and want to derive a diagnosis, P(disease | symptoms):
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
32
Example-1 BAYES RULE
A doctor knows that the disease meningitis causes the patient to have
a stiff neck, say, 50% of the time. The doctor also knows some
unconditional facts: the prior probability that a patient has meningitis is
1/50,000, and the prior probability that any patient has a stiff neck is 1/20.
Letting s be the proposition that the patient has a stiff neck and m be
the proposition that the patient has meningitis, we have
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
That is, we expect only 1 in 5000 patients with a stiff neck to have
meningitis.
33
Bayes’ rule can capture causal models
• Suppose a doctor knows that meningitis causes a stiff neck in 50% of cases:
P(s | m) = 0.5
• She also knows that the probability in the general population of someone having
a stiff neck at any time is 1/20:
P(s) = 0.05
• She also has to know the incidence of meningitis in the population (1/50,000):
P(m) = 0.00002
• Using Bayes' rule she can calculate the probability that the patient has meningitis:
P(m | s) = P(s | m) P(m) / P(s) = 0.5 × 0.00002 / 0.05 = 0.0002
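The same calculation as a small Python helper (an illustrative sketch; the function name is my own):

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return likelihood * prior / evidence

# Meningitis example from the slide:
print(posterior(likelihood=0.5, prior=1 / 50_000, evidence=1 / 20))
# 0.0002, i.e. 1 in 5000 stiff-neck patients has meningitis
```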
Solution…
The sample space is defined by two mutually exclusive events: it rains
or it does not rain. Additionally, a third event occurs when the
weatherman predicts rain. Notation for these events appears below.
35
In terms of probabilities, we know the following:
P(A1) = 5/365 = 0.0136985 [It rains 5 days out of the year.]
P(A2) = 360/365 = 0.9863014 [It does not rain 360 days out of the year.]
P(B | A1) = 0.9 [When it rains, the weatherman predicts rain 90% of the time.]
P(B | A2) = 0.1 [When it does not rain, the weatherman predicts rain 10% of the time.]
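The question this solution is building toward (its statement is not reproduced on these slides) is presumably: given that the weatherman predicts rain, what is the probability that it actually rains? Bayes' rule gives
P(A1 | B) = P(B | A1) P(A1) / [P(B | A1) P(A1) + P(B | A2) P(A2)]
= (0.9 × 0.0136985) / (0.9 × 0.0136985 + 0.1 × 0.9863014) ≈ 0.111
so even when rain is predicted, it actually rains only about 11% of the time.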
36
Using Bayes' rule: Combining evidence
What happens when we have two or more pieces of evidence? For example, what can a
dentist conclude if her nasty steel probe catches in the aching tooth of a patient? If we
know the full joint distribution, we can read off the answer P(Cavity | toothache ∧ catch).
That might be feasible for just two evidence variables, but again it will not scale up. If there are
n possible evidence variables (X-rays, diet, oral hygiene, etc.), then there are 2^n possible
combinations of observed values for which we would need to know conditional probabilities.
Toothache and catch are, however, conditionally independent given Cavity:
P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
We can plug this equation into Equation (13.12) to obtain the probability of a cavity.
37
Using Bayes' rule: Combining evidence
The dentistry example illustrates a commonly occurring pattern in which a single cause
directly influences a number of effects, all of which are conditionally independent, given
the cause. The full joint distribution can be written as
P(Cause, Effect1, . . . , Effectn) = P(Cause) ∏i P(Effecti | Cause)
38
What is a Bayesian Network ?
A graphical model that efficiently encodes the joint probability
distribution for a large set of variables.
We remarked on the importance of independence and conditional
independence relationships in simplifying probabilistic
representations of the world.
This chapter introduces a systematic way to represent
such relationships explicitly in the form of Bayesian networks.
39
Bayesian Networks or Belief Networks
A Bayesian network is a directed graph in which each node is
annotated with quantitative probability information. The full
specification is as follows:
1. Each node corresponds to a random variable, which may be discrete or continuous.
2. Directed links (arrows) connect pairs of nodes; if there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles (it is a directed acyclic graph, DAG).
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
43
b) Determine the probabilities for the following Bayesian network
Answer:
44
c) Which Bayesian network would you have specified using the
rules learned in class?
Answer:
The first one. It is good practice to add nodes that correspond to
causes before nodes that correspond to their effects.
Answer:
No, since (for example) P(F) = 0.1 but P(F | C) = 0.23
e) Are C and F independent in the Bayesian network from Question b?
Answer:
No, for the same reason.
45
2) To safeguard your house, you recently installed two different
alarm systems by two different reputable manufacturers that use
completely different sensors for their alarm systems.
a) Which one of the two Bayesian networks given below makes
independence assumptions that are not true? Explain all of your
reasoning. Alarm1 means that the first alarm system rings,
Alarm2 means that the second alarm system rings, and Burglary
means that a burglary is in progress.
Answer: The second one falsely assumes that Alarm1 and Alarm2 are
independent if the value of Burglary is unknown. However, if the
alarms are working as intended, it should be more likely that Alarm1
rings if Alarm2 rings (that is, they should not be independent).
46
2) To safeguard your house, you recently installed two different
alarm systems by two different reputable manufacturers that use
completely different sensors for their alarm systems.
47
Answer:
48
c) Consider the second Bayesian network. Assume that:
49
Answer:
50
4) Consider the following Bayesian network. A, B, C, and D
are Boolean random variables. If we know that A is true, what
is the probability of D being true?
51
5) For the following Bayesian network
P(t, a, e, r) = P(t) P(e) P(a | t, e) P(r | e)
What is the probability that there is a thief in our house?
Let's define it as 10 to the power of minus three; that is, one in a
thousand houses gets robbed.
What is the probability of an earthquake?
Let's say it is 10 to the power of minus two; earthquakes
happen about once in 100 days. Now we define the probability
of the alarm given thief and earthquake, so those will be four
numbers. If there is a thief in our house, the alarm will go off for sure.
BN probabilistic distribution
Factorization: P(t, a, e, r) = P(t) P(e) P(a | t, e) P(r | e)
Priors: P(T=1) = 0.001, P(E=1) = 0.01
P(A=1 | T, E):
         E=0    E=1
  T=0     0     1/10
  T=1     1      1
P(R=1 | E): P(R=1 | E=0) = 0, P(R=1 | E=1) = 1/2
If there is no thief and there is no earthquake, the alarm has no reason
to send us signals. However, if there is no thief but there is an
earthquake, the alarm will notify us by mistake one time in ten.
If there is no earthquake, the radio has nothing to report, and so it
will not report. However, if there is an earthquake, the radio will
report it with probability one half; that is, it does not report
some small earthquakes.
BN probabilistic distribution
You are at work, and you get a notification from the alarm system.
What is the probability that there is a thief in the house, P(T=1 | A=1)?
Using the priors and CPTs above, P(T=1 | A=1) ≈ 0.5. If the radio also
reports an earthquake, the posterior drops to P(T=1 | A=1, R=1) ≈ 1%:
the earthquake "explains away" the alarm.
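A short Python sketch (the structure and names are mine) that performs inference by enumeration over the CPTs above and verifies both posteriors:

```python
from itertools import product

# CPTs from the slides: priors, P(A=1 | T, E), and P(R=1 | E).
P_T = {1: 0.001, 0: 0.999}
P_E = {1: 0.01, 0: 0.99}
P_A = {(0, 0): 0.0, (0, 1): 0.1, (1, 0): 1.0, (1, 1): 1.0}
P_R = {0: 0.0, 1: 0.5}

def joint(t, a, e, r):
    """P(t, a, e, r) = P(t) P(e) P(a | t, e) P(r | e)."""
    pa = P_A[(t, e)] if a == 1 else 1 - P_A[(t, e)]
    pr = P_R[e] if r == 1 else 1 - P_R[e]
    return P_T[t] * P_E[e] * pa * pr

def p_thief(evidence):
    """P(T=1 | evidence) by enumerating all worlds (t, a, e, r)."""
    worlds = list(product([0, 1], repeat=4))
    def matches(w):
        return all(w[i] == v for i, v in evidence.items())
    num = sum(joint(*w) for w in worlds if matches(w) and w[0] == 1)
    den = sum(joint(*w) for w in worlds if matches(w))
    return num / den

# World-tuple index order: 0 = t, 1 = a, 2 = e, 3 = r.
print(p_thief({1: 1}))        # P(T=1 | A=1)      ~ 0.50
print(p_thief({1: 1, 3: 1}))  # P(T=1 | A=1, R=1) ~ 0.01 (explaining away)
```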
Bayesian network earthquake example
Example:
I'm at work, neighbor John calls to say my alarm is ringing, but
neighbor Mary doesn't call. Sometimes the alarm is set off by minor
earthquakes. Is there a burglary?
59
Compactness
A CPT for a Boolean variable with k Boolean parents has 2^k rows for the
combinations of parent values; each row requires one number p for the
probability that the variable is true.
60
Compactness
61
• Calculate
the probability that the alarm has sounded, but neither a
burglary nor an earthquake has occurred, and both John and Mary
call
• P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.000628
(using the standard CPT values of the AIMA burglary network)
• Q2: P(J) = ?
• P(J) = P(J | A) P(A) + P(J | ¬A) P(¬A)
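A sketch of both calculations in Python. The CPT values are the standard ones from the AIMA burglary network, an assumption on my part since the slide's network figure is not reproduced here:

```python
# Assumed CPTs of the standard AIMA burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J | A)
P_M = {True: 0.70, False: 0.01}                     # P(M | A)

# Q1: P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
q1 = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(q1)  # ~0.000628

# Q2: P(J) = P(J|A) P(A) + P(J|~A) P(~A), with P(A) summed over B and E.
p_a = sum(P_A[(b, e)]
          * (P_B if b else 1 - P_B)
          * (P_E if e else 1 - P_E)
          for b in (True, False) for e in (True, False))
p_j = P_J[True] * p_a + P_J[False] * (1 - p_a)
print(p_j)  # ~0.052
```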
63
Compactness example
64
Compactness example
65
Compactness example
The resulting network has two more links than the original network and
requires three more probabilities to be specified, e.g., the probability
of Earthquake given Burglary and Alarm.
Deciding conditional independence is hard in noncausal directions.
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.
66
Compactness example
67
Naïve Bayes Classifier
We have a class C that directly impacts
the values of the features; that is, for different
classes the distribution of the features may be
different. The joint distribution can be written
using the following formula:
P(C, f1, . . . , fn) = P(C) ∏i P(fi | C)
Since we have a lot of identical sub-graphs, a more convenient way to write down this graph is
called plate notation. It is written as follows.
How does the Naive Bayes algorithm work?
69
How does the Naive Bayes algorithm work?
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 × 0.64 / 0.36 = 0.60, which is the higher probability.
Naive Bayes uses a similar method to predict the probability of different classes based on
various attributes. This algorithm is mostly used in text classification and with problems
having multiple classes.
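The slide's computation, spelled out in a few lines of Python (the counts come from the classic weather/play table this example is based on):

```python
# Counts from the classic 14-day weather/play table:
# 9 "Yes" days, 5 "Sunny" days, 3 of which are "Yes".
p_sunny_given_yes = 3 / 9   # P(Sunny | Yes)
p_sunny = 5 / 14            # P(Sunny)
p_yes = 9 / 14              # P(Yes)

# Bayes' rule: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6 -- so "Yes" is the predicted class
```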
70
Predict(Cloudy, Warm, Outdoor) = ?
P(sunny) = 0.40
P(cloudy) = 0.60
71
72
Predict(Cloudy,Warm,Outdoor)=No
73
Why Bayesian Networks?
74
What are the pros and cons of the Bayesian theorem?
Pros:
It is easy and fast to predict the class of a test data set. It also performs well in multi-class
prediction.
When the assumption of independence holds, a Naive Bayes classifier performs better
compared to other models like logistic regression, and you need less training data.
It performs well in the case of categorical input variables compared to numerical variable(s).
For numerical variables, a normal distribution is assumed (a bell curve, which is a strong
assumption).
Cons:
If a categorical variable has a category (in the test data set) which was not observed in the
training data set, then the model will assign a 0 (zero) probability and will be unable to
make a prediction. This is often known as "Zero Frequency". To solve this, we can use
a smoothing technique. One of the simplest smoothing techniques is called Laplace
estimation.
On the other hand, naive Bayes is also known as a bad estimator, so the probability
outputs from predict_proba are not to be taken too seriously.
Another limitation of Naive Bayes is the assumption of independent predictors. In real
life, it is almost impossible to get a set of predictors which are completely
independent.
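A minimal sketch of the Laplace (add-one) smoothing mentioned above; the category counts are invented for illustration:

```python
def laplace_smoothed_probs(counts, alpha=1):
    """Add-alpha (Laplace) smoothing over category counts, so a category
    never seen in training still gets a small nonzero probability."""
    total = sum(counts.values()) + alpha * len(counts)
    return {category: (n + alpha) / total for category, n in counts.items()}

# "rainy" was never observed with this class, but no longer gets probability 0:
print(laplace_smoothed_probs({"sunny": 3, "overcast": 4, "rainy": 0}))
# {'sunny': 0.4, 'overcast': 0.5, 'rainy': 0.1}
```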
75
Limitations of Bayesian Networks
76
Decision Network
Uncertainty
Generally:
EU(A | E) = Σi P(Resulti(A) | Do(A), E) U(Resulti(A))
MEU(E) = maxA EU(A | E); the agent chooses the action that achieves it.
79
Expected Utility:
Prior to execution of A, the agent assigns probability P(Resulti(A) | Do(A), E) to each outcome, where E
summarizes the agent's available evidence about the world, and Do(A) is the proposition that action A is
executed in the current state. Then we can calculate the expected utility of the action given the evidence,
EU(A | E), using the following formula:
EU(A | E) = Σi P(Resulti(A) | Do(A), E) U(Resulti(A))
Maximum Expected Utility: a rational agent should choose an action
that maximizes the agent’s EU
Simple decisions are one-shot decisions.
80
VPI(A)?
If A is independent of C:
P(C | A) = P(C)
so VPI(A) = 0
81
Decision networks
83
Decision network representation
• Chance nodes: random variables, as in Bayes nets
86
Decision network example
Chance nodes (as in BNs)
Action nodes (rectangles; cannot have parents)
Utility node (diamond; depends on action and chance nodes)
87
Decision Networks
Action-utility tables:
Notice that because the noise, death, and cost chance nodes refer to
future states, they can never have their values set as evidence
variables.
88
Action-utility tables:
The simplified version omits these nodes. Omitting an explicit
description of the outcome of the siting decision means that the network is less
flexible with respect to changes in circumstances.
92
Evaluating decision networks
Set the evidence variables for the current state.
For each possible value of the decision node (assume just
one):
Set the decision node to that value.
Calculate the posterior probabilities for the parent
nodes of the utility node, using BN inference.
Calculate the resulting utility for the action.
Return the action with the highest utility.
Example 9.11: Consider whether the agent should take an
umbrella when it goes out. The agent's
utility depends on the weather and
whether it takes an umbrella. However, it
does not get to observe the weather. It only
gets to observe the forecast. The forecast
probabilistically depends on the weather.
94
Consider a simple decision network for the decision of whether the
agent should take an umbrella when it goes out. The agent's utility
depends on the weather and whether it takes an umbrella.
We specify the domain of each random variable and the domain of each
decision variable:
Random variable Weather has domain {sunny, rain}
Decision variable Umbrella has domain {take, leave}
95
Expected Utilities: Optimal decision = leave
EU(take) = 70 × 0.3 + 20 × 0.7 = 35
EU(leave) = 0 × 0.3 + 100 × 0.7 = 70
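The same expected-utility evaluation as a small Python sketch, following the evaluation procedure given earlier (the table layout and names are mine):

```python
# Utilities and the weather prior implied by the slide: P(rain) = 0.3.
utility = {("take", "rain"): 70, ("take", "sunny"): 20,
           ("leave", "rain"): 0, ("leave", "sunny"): 100}
p_weather = {"rain": 0.3, "sunny": 0.7}

def expected_utility(action):
    """EU(action) = sum over weather of P(weather) * U(action, weather)."""
    return sum(p * utility[(action, w)] for w, p in p_weather.items())

for action in ("take", "leave"):
    print(action, expected_utility(action))  # take 35.0, leave 70.0

print("optimal decision:", max(("take", "leave"), key=expected_utility))
```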
97
The value of Information
98
The value of Information
99
The value of Information
P(w | f=bad) = P(f | w) P(w) / P(f)
P(f) = P(f | w=rain) P(w=rain) + P(f | w=sun) P(w=sun)
100
The value of Information
Forecast Distribution
Value of information = (0.59 × 95 + 0.41 × 53) − 70 = 7.78 ≈ 7.8
102
Summary
103