
Bayesian Decision Theory

• Bayesian Decision Theory is a fundamental statistical approach to the problem of pattern classification.
• It is considered the ideal pattern classifier and is often used as a benchmark for other algorithms, because its decision rule minimizes the expected loss.
• It involves making decisions based on probabilities and the cost of decisions.
• The core idea is to use the probability of different outcomes to make optimal decisions.
• The entire purpose of Bayes Decision Theory is to help us select the decision that carries the least 'risk'; there is always some risk attached to any decision we choose.

Example

Basic Decision:

For a customer in a computer store, there will be two classes:

w1 – Yes (Customer will buy a computer)

w2 – No (Customer will not buy a computer)

From previous customer records, the probability of a customer buying, P(w1), and the probability of a customer not buying, P(w2), can be estimated.

For a new customer,

If P(w1) > P(w2), then the customer will buy a computer (w1)

And, if P(w2) > P(w1), then the customer will not buy a computer (w2)

However, based only on previous records, this rule gives the same decision for every future customer, which is clearly unsatisfactory. We need something that will help us make better decisions for individual customers, and we do that by introducing features.

Let’s say we add a feature ‘x’ where ‘x’ denotes the age of the customer. Now with this added feature, we will be
able to make better decisions. To do this, we need to know what Bayes Theorem is.

Bayes Theorem:

For our class w1 and feature 'x', Bayes' theorem gives:

P(w1 | x) = P(x | w1) P(w1) / P(x)

There are 4 terms in this formula that we need to understand:

• Prior – P(w1) is the prior probability that w1 is true before the data is observed.
• Posterior – P(w1 | x) is the posterior probability that w1 is true after the data is observed.
• Evidence – P(x) is the total probability of the data; it normalizes the posterior.
• Likelihood – P(x | w1) is the probability of observing 'x' when the class is w1, i.e. the information about w1 provided by 'x'.

P(w1 | x) is read as Probability of w1 given x.


More precisely, it is the probability that a customer will buy a computer given that specific customer's age.

Thus, for a new customer: if P(w1 | x) > P(w2 | x), the customer will buy a computer (w1), and if P(w2 | x) > P(w1 | x), the customer will not buy a computer (w2). A small numerical sketch of this rule follows the facts below.
FACTS:
• If P(w1) = P(w2), the decision will be based on P(x | w1) and P(x | w2).
• If P(x | w1) = P(x | w2), the decision will be based on P(w1) and P(w2).
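As a minimal numerical sketch of this rule, assuming hypothetical priors and a Gaussian model of the age feature (none of these values or function names come from the original example):

```python
import math

# Hypothetical priors estimated from previous customer records
priors = {"w1": 0.6, "w2": 0.4}          # w1 = buys, w2 = does not buy

# Hypothetical class-conditional models P(x | w): age as a Gaussian per class
age_models = {"w1": (35.0, 8.0), "w2": (55.0, 12.0)}   # (mean, std) of age

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def posterior(age):
    """P(w | age) for both classes via Bayes' theorem."""
    joint = {w: priors[w] * gaussian_pdf(age, *age_models[w]) for w in priors}
    evidence = sum(joint.values())        # P(x), the total probability of the data
    return {w: joint[w] / evidence for w in joint}

post = posterior(age=30)
decision = max(post, key=post.get)        # class with the larger posterior
print(post, "->", decision)
```

With equal priors the comparison reduces to the likelihoods, and with equal likelihoods it reduces to the priors, matching the facts above.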

Risk calculation

There is always going to be some amount of ‘risk’ or error made in the decision. So, we also need to determine the
probability of error made in a decision.

In a plot of the posteriors against the feature, the y-axis is the posterior probability P(wi | x) and the x-axis is our feature 'x'. The point where the posterior probabilities of both classes are equal is called the decision boundary. So at the

Decision Boundary:
P(w1 | x) = P(w2 | x)

But, as the graph shows, there is some non-zero posterior probability of w2 to the left of the decision boundary, and some non-zero posterior probability of w1 to the right of it. This overlap of one class into the region assigned to the other is what we call the risk, or probability of error.

Calculation of Probability Error

To calculate the probability of error for class w1, we find the probability that the true class is w2 in the region to the left of the decision boundary. Similarly, the probability of error for class w2 is the probability that the true class is w1 in the region to the right of the decision boundary.
Mathematically, when we decide in favour of w2 the probability of error is P(w1 | x), and vice versa.
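For the two-class case this can be summarised by the standard expressions (restated here for completeness; they are not written out in the original notes):

P(error | x) = min [ P(w1 | x), P(w2 | x) ]

P(error) = ∫ P(error | x) p(x) dx

so deciding for the class with the larger posterior at each x makes P(error | x), and hence the overall probability of error, as small as possible.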

Loss function

Let there be C classes (also called states of nature):

C → { w1, w2, w3, …, wc }

There can also be more actions than just the Yes or No of the earlier example. Let there be 'a' actions, denoted by α:

a → { α1, α2, α3, …, αa }

The loss function will be:

λ ( αi | wj ) → Loss incurred for taking action αi when the state of nature is wj

Generalized Bayes theory

Let X be a d-dimensional feature vector.

For this X, let the action taken be αi. Then, for the true state of nature wj, the loss incurred is λ(αi | wj).

Average Risk / average loss


R(αi | x) = Σ (j = 1 to c) λ(αi | wj) P(wj | x)    [ taking the summation over all the states of nature ]

This is also called the risk function, conditional risk, or expected loss. The action αi for which this risk is minimum is the one taken, so this is also called the Minimum Risk Classifier.
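A minimal sketch of this minimum-risk rule, with an illustrative loss matrix and posteriors that are not taken from the notes:

```python
# Hypothetical loss matrix: loss[i][j] = λ(αi | wj)
loss = [
    [0.0, 2.0],   # action α1: no loss if truth is w1, loss 2 if truth is w2
    [5.0, 0.0],   # action α2: loss 5 if truth is w1, no loss if truth is w2
]

# Hypothetical posteriors P(wj | x) for one observation x
posteriors = [0.7, 0.3]

def conditional_risk(i, posteriors, loss):
    """R(αi | x) = sum over j of λ(αi | wj) * P(wj | x)."""
    return sum(loss[i][j] * p for j, p in enumerate(posteriors))

risks = [conditional_risk(i, posteriors, loss) for i in range(len(loss))]
best = min(range(len(risks)), key=risks.__getitem__)   # minimum-risk action
print(risks, "-> take action α%d" % (best + 1))
```

Here the risks are 0.6 for α1 and 3.5 for α2, so action α1 is taken.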

Two class problem ( Minimum Risk Classifier)

No. of classes = 2, {w1,w2}


Actions: {α1, α2}   [ where α1 means deciding that the object belongs to class w1, and α2 that it belongs to w2 ]

Loss incurred for taking action αi when the true state of nature is wj:

λij = λ(αi | wj),   i, j = 1, 2    .......(1)

Risk function:

R(α1 | x) = λ11 P(w1 | x) + λ12 P(w2 | x)
R(α2 | x) = λ21 P(w1 | x) + λ22 P(w2 | x)
So, for the two-class problem the decision rule simply becomes:

If R(α1 | x) < R(α2 | x), then action α1 is taken, and vice versa.

Under that condition, taking an action in favour of α1 leads to:

(λ21 − λ11) P(w1 | x) > (λ12 − λ22) P(w2 | x)

or, applying Bayes' theorem,

(λ21 − λ11) P(x | w1) P(w1) > (λ12 − λ22) P(x | w2) P(w2)
→ Here λ11 and λ22 are the losses incurred for correct decisions, and λ21 and λ12 are the losses incurred for wrong decisions.
→ So λ21 and λ12 must be greater than λ11 and λ22 (ideally λ11 and λ22 should be zero).
→ So (λ21 − λ11) > 0, and the same holds for the other term.
→ Loss functions are predefined based on the application.
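Dividing both sides, the rule can be read as a likelihood-ratio test: decide w1 when P(x | w1) / P(x | w2) > [(λ12 − λ22) P(w2)] / [(λ21 − λ11) P(w1)]. A minimal sketch of this test, with purely illustrative numbers:

```python
# Hypothetical losses: rows are actions α1, α2; columns are true classes w1, w2
lam11, lam12 = 0.0, 1.0
lam21, lam22 = 3.0, 0.0

# Hypothetical priors and class-conditional likelihood values at a particular x
p_w1, p_w2 = 0.4, 0.6
p_x_given_w1, p_x_given_w2 = 0.05, 0.02

likelihood_ratio = p_x_given_w1 / p_x_given_w2
threshold = ((lam12 - lam22) / (lam21 - lam11)) * (p_w2 / p_w1)

# Decide α1 (class w1) when the likelihood ratio exceeds the threshold
action = "α1 (decide w1)" if likelihood_ratio > threshold else "α2 (decide w2)"
print(likelihood_ratio, threshold, action)
```

With these numbers the ratio 2.5 exceeds the threshold 0.5, so action α1 is taken.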

Classifiers Derived from the Minimum Risk Classifier

1. Minimum Error Rate classifier

If action αi is taken, we decide that the true state of nature is wi. For the minimum error rate classifier the zero-one (symmetric) loss is used:

λ(αi | wj) = 0 if i = j, and 1 if i ≠ j,    i, j = 1, …, c    ( c no. of true states of nature )

Substituting this loss into the expected risk:

R(αi | x) = Σ (j = 1 to c) λ(αi | wj) P(wj | x) = Σ (j ≠ i) P(wj | x)

Now, there is only one value of j that equals i, and the posterior probabilities of all the classes sum to 1. Therefore, the sum over j ≠ i is

R(αi | x) = 1 − P(wi | x)
To minimize R(αi | x), the expression above has to be minimized, which means P(wi | x) has to be maximized. This proves that whenever P(wi | x) is maximum the risk is minimum, and, exactly as in simple Bayes decision theory, that action αi is taken.
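Under the zero-one loss the minimum-risk and maximum-posterior decisions coincide, as this small check sketches (the posterior values are illustrative assumptions):

```python
# Hypothetical posteriors P(wi | x) for c = 3 classes at one observation x
posteriors = [0.2, 0.5, 0.3]

# Conditional risk under zero-one loss: R(αi | x) = 1 - P(wi | x)
risks = [1.0 - p for p in posteriors]

argmin_risk = min(range(len(risks)), key=risks.__getitem__)
argmax_post = max(range(len(posteriors)), key=posteriors.__getitem__)
assert argmin_risk == argmax_post      # both rules pick the same class
print("decide w%d" % (argmin_risk + 1))
```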

Discriminant Functions

The classifier can be thought of as a box that computes c functions gi(x), one for each class/state of nature. Depending on the results of these functions, and applying a maximum criterion, the decision about the true state of nature is taken.

These functions gi(x) are called DISCRIMINANT FUNCTIONS.

Thus, the feature vector x is assigned to class wi if

gi(x) > gj(x)   for all j ≠ i

Nature of Discriminant Functions

According to the Minimum Risk Classifier, the risk R(αi | x) has to be minimum to take action αi.

However, the discriminant gi(x) has to be maximum to take action αi.

Thus, gi(x) = − R(αi | x)

According to the Minimum Error Rate Classifier, 1 − P(wi | x) has to be minimum, which means P(wi | x) should be maximum.
Thus, gi(x) = P(wi | x)
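A minimal sketch of a discriminant-function classifier, assuming hypothetical one-dimensional Gaussian class models (the parameter values and helper names are not from the notes):

```python
import math

# Hypothetical priors and 1-D Gaussian likelihood parameters per class
classes = {
    "w1": {"prior": 0.5, "mean": 0.0, "std": 1.0},
    "w2": {"prior": 0.3, "mean": 3.0, "std": 1.5},
    "w3": {"prior": 0.2, "mean": -2.0, "std": 0.5},
}

def g(x, params):
    """Discriminant gi(x) = ln p(x | wi) + ln P(wi).

    This differs from ln P(wi | x) only by ln p(x), which is the same for
    every class, so maximizing it gives the same decision as maximizing
    the posterior."""
    mean, std, prior = params["mean"], params["std"], params["prior"]
    log_likelihood = -((x - mean) ** 2) / (2 * std ** 2) - math.log(std * math.sqrt(2 * math.pi))
    return log_likelihood + math.log(prior)

def classify(x):
    """Assign x to the class whose discriminant function is maximum."""
    scores = {name: g(x, params) for name, params in classes.items()}
    return max(scores, key=scores.get)

print(classify(0.5))   # picks w1 with these illustrative parameters
```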
