PR Mod1
• Bayesian Decision Theory is a fundamental statistical approach to the problem of pattern classification.
• It is considered the ideal pattern classifier and is often used as a benchmark for other algorithms,
because its decision rule minimizes the expected loss (risk).
• It involves making decisions based on probabilities and the cost of decisions.
• The core idea is to use the probability of different outcomes to make optimal decisions.
• The entire purpose of the Bayes Decision Theory is to help us select decisions that will cost us the least
‘risk’. There is always some sort of risk attached to any decision we choose.
Example
Basic Decision:
According to the previous customer records, the probability of a customer buying, P(w1), and the
probability of a customer not buying, P(w2), are calculated.
If P(w1) > P(w2), then the customer will buy a computer (w1)
And, if P(w2) > P(w1), then the customer will not buy a computer (w2)
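A minimal sketch of this prior-only rule in Python, assuming hypothetical counts from the previous
customer records (the numbers are made up for illustration):

```python
# Prior-only Bayes decision: compare P(w1) and P(w2) estimated
# from hypothetical purchase records (counts are made up).
bought, not_bought = 60, 40
total = bought + not_bought

p_w1 = bought / total        # P(w1): prior probability of buying
p_w2 = not_bought / total    # P(w2): prior probability of not buying

# Because only the priors are compared, every future customer
# receives the same decision.
decision = "w1 (buys)" if p_w1 > p_w2 else "w2 (does not buy)"
print(decision)              # -> w1 (buys)
```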
However, based on just the previous records, this rule gives the same decision for every future customer,
which is clearly inadequate. We need something that helps us make better decisions for individual future
customers, and we do that by introducing features.
Let’s say we add a feature ‘x’ where ‘x’ denotes the age of the customer. Now with this added feature, we will be
able to make better decisions. To do this, we need to know what Bayes Theorem is.
Bayes Theorem:

P(wi | x) = [P(x | wi) · P(wi)] / P(x)
• Prior – P(w1) is the Prior Probability that w1 is true before the data is observed
• Posterior – P(w1 | x) is the Posterior Probability that w1 is true after the data is observed.
• Evidence – P(x) is the Total Probability of the Data: P(x) = P(x | w1) P(w1) + P(x | w2) P(w2)
• Likelihood – P(x | w1) is the information about w1 provided by ‘x’
Thus, for a new customer, if P(w1 | x) > P(w2 | x), then the customer will buy a computer (w1); and if
P(w2 | x) > P(w1 | x), then the customer will not buy a computer (w2).
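As a sketch, the posterior rule with the age feature x, assuming made-up priors and Gaussian
class-conditional densities P(x | w1) and P(x | w2) (none of these numbers come from the example's
records):

```python
from math import exp, pi, sqrt

def gaussian(x, mean, std):
    # Normal density, standing in for an assumed class-conditional P(x | w).
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

p_w1, p_w2 = 0.6, 0.4                        # assumed priors P(w1), P(w2)

def decide(age):
    lik_w1 = gaussian(age, mean=35, std=8)   # assumed likelihood P(x | w1)
    lik_w2 = gaussian(age, mean=55, std=10)  # assumed likelihood P(x | w2)
    evidence = lik_w1 * p_w1 + lik_w2 * p_w2 # P(x), the total probability
    post_w1 = lik_w1 * p_w1 / evidence       # P(w1 | x) by Bayes Theorem
    post_w2 = lik_w2 * p_w2 / evidence       # P(w2 | x)
    return "w1 (buys)" if post_w1 > post_w2 else "w2 (does not buy)"

print(decide(30))   # young customer -> w1 (buys)
print(decide(60))   # older customer -> w2 (does not buy)
```

Unlike the prior-only rule, the decision now changes with the observed feature x.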
FACTS:
• If P(w1) = P(w2) , decision will be based on P(x | w1) and P(x | w2).
• If P(x | w1) = P(x | w2) , then decision will be based on P(w1) and P(w2).
Risk calculation
There is always going to be some amount of ‘risk’ or error made in the decision. So, we also need to determine the
probability of error made in a decision.
But, as the graph of the two class densities shows, there is some non-zero density of w2 to the left of the
decision boundary, and some non-zero density of w1 to the right of it. This overlap of one class's density
into the other class's region is what constitutes the risk, or probability of error.
To calculate the probability of error for class w1, we need to find the probability that the class is w2 in the area that
is to the left of the decision boundary. Similarly, the probability of error for class w2 is the probability that the class is
w1 in the area that is to the right of the decision boundary.
Mathematically, if we decide in favour of w2, the probability of error is P(w1 | x), and vice versa.
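Written out as the standard two-class identity, consistent with the definitions above:

```latex
P(\text{error} \mid x) =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2, \\
P(\omega_2 \mid x) & \text{if we decide } \omega_1,
\end{cases}
\qquad
P(\text{error}) = \int \min\big[P(\omega_1 \mid x),\, P(\omega_2 \mid x)\big]\, p(x)\, dx
```

The Bayes decision rule picks the larger posterior at every x, so it makes P(error | x), and hence the
total error, as small as possible.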
Loss function
There can also be more actions than just Yes or No for a particular class, as in the example. Let there be
'a' actions, denoted by α:

{α1, α2, α3, …, αa}
λ(αi | wj) = loss incurred for taking action αi when the true state of nature is wj …… (1)

Risk function:

R(αi | x) = Σ(j = 1 to c) λ(αi | wj) · P(wj | x)
→ Here λ11 and λ22 are the losses incurred for taking correct decisions, and λ21, λ12 are the losses
incurred for wrong decisions.
→ So λ21 and λ12 must be greater than λ11 and λ22 (ideally λ11 and λ22 should be zero).
→ So (λ21 − λ11) > 0, and likewise (λ12 − λ22) > 0. Hence, in the two-category case, decide w1 if
(λ21 − λ11) P(w1 | x) > (λ12 − λ22) P(w2 | x), and w2 otherwise.
→Loss functions are predefined based on the application.
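A minimal sketch of choosing the minimum-risk action, assuming an illustrative loss matrix λ(αi | wj)
with zero diagonal and already-computed posteriors (all numbers made up):

```python
# lam[i][j] = lambda(alpha_i | w_j): loss for taking action alpha_i
# when the true state of nature is w_j. Correct decisions cost zero.
lam = [[0.0, 2.0],   # action a1: lambda11, lambda12
       [5.0, 0.0]]   # action a2: lambda21, lambda22

posteriors = [0.7, 0.3]   # assumed P(w1 | x) and P(w2 | x)

# Conditional risk R(alpha_i | x) = sum over j of lam[i][j] * P(w_j | x)
risks = [sum(l * p for l, p in zip(row, posteriors)) for row in lam]

best = min(range(len(risks)), key=lambda i: risks[i])
print(f"R(a1|x)={risks[0]:.2f}, R(a2|x)={risks[1]:.2f} -> take action a{best + 1}")
# -> R(a1|x)=0.60, R(a2|x)=3.50 -> take action a1
```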
For the zero-one loss, R(αi | x) reduces to 1 − P(wi | x) (derived below), so to minimize R(αi | x),
P(wi | x) has to be maximized. This proves that whenever P(wi | x) is maximum, the risk is minimum, and
the action αi taken agrees with simple Bayes decision theory.
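The derivation, using the zero-one (symmetrical) loss:

```latex
\lambda(\alpha_i \mid \omega_j) =
\begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}
\quad\Longrightarrow\quad
R(\alpha_i \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)
```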
Discriminant Functions
The classifier can be considered a box that computes c functional models, giving c functions gi(x), one
for each class/state of nature. Applying the max criterion to the results of these functions, the decision
about the true state of nature is taken: assign x to wi if gi(x) > gj(x) for all j ≠ i.
Thus, according to the Minimum Risk classifier, the risk R(αi | x) has to be minimum to take action αi,
so we can take gi(x) = −R(αi | x).
According to the Minimum Error Rate classifier, 1 − P(wi | x) has to be minimum, which means P(wi | x)
should be maximum, so we can take gi(x) = P(wi | x).
Since any monotonically increasing function of gi(x) gives the same decision, gi(x) = p(x | wi) P(wi) and
gi(x) = ln p(x | wi) + ln P(wi) are equivalent choices.
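A minimal sketch of such a discriminant-function classifier, using gi(x) = ln p(x | wi) + ln P(wi) with
assumed one-dimensional Gaussian class-conditionals (all parameters are made up):

```python
from math import log, pi

def log_gaussian(x, mean, var):
    # ln N(x; mean, var), standing in for an assumed ln p(x | w_i).
    return -0.5 * (log(2 * pi * var) + (x - mean) ** 2 / var)

# Assumed (mean, variance, prior) for c = 3 classes/states of nature.
classes = [(0.0, 1.0, 0.5), (3.0, 1.5, 0.3), (6.0, 2.0, 0.2)]

def classify(x):
    # Compute g_i(x) = ln p(x | w_i) + ln P(w_i) for each class,
    # then apply the max criterion to decide the state of nature.
    g = [log_gaussian(x, m, v) + log(p) for m, v, p in classes]
    return max(range(len(g)), key=lambda i: g[i]) + 1

print(classify(0.5))   # -> 1
print(classify(5.0))   # -> 3
```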