
Unit - 2

Pattern Recognition
Statistical Decision Making
Dr. Srinath. S
Syllabus for Unit - 2
• Statistical Decision Making:
• Introduction, Bayes’ Theorem
• Conditionally Independent Features
• Decision Boundaries
Classification (Revision)
Classification is the task of assigning a class label to an input pattern; the class label indicates
one of a given set of classes. The classification is carried out with the help of a
model obtained using a learning procedure. There are two main categories of
classification: supervised learning and unsupervised learning (with semi-supervised learning in between).

• Supervised learning makes use of a set of examples which already have the class labels assigned to them.

• Unsupervised learning attempts to find inherent structures in the data.

• Semi-supervised learning makes use of a small number of labeled data and a large number of unlabeled data to learn the classifier.
Learning - Continued
• The classifier to be designed is built using input samples which are a mixture of
all the classes.
• The classifier learns how to discriminate between samples of different
classes.
• If the learning is offline, i.e. a supervised method, the classifier is first
given a set of training samples, the optimal decision boundary is found, and
then the classification is done.
• Supervised learning refers to the process of designing a pattern classifier by
using a training set of patterns to assign class labels.
• If the learning involves no teacher and no training samples (unsupervised),
the input samples are the test samples themselves: the classifier learns from the
samples and classifies them at the same time.
Statistical / Parametric decision making
This refers to the situation in which we assume the general form of the probability
distribution function or density function for each class.
• Statistical/parametric methods use a fixed number of parameters to build the
model.

• Parametric methods commonly assume that the data follow a normal distribution.

• The parameters needed to use the normal distribution are:
  – Mean
  – Standard deviation
• For each feature, we first estimate the mean and standard deviation of the feature
for each class.
Statistical / Parametric decision making (Continued)
• If a group of features is multivariate normally distributed, estimate the mean, standard
deviation and covariance.
• Covariance is a statistical measure of the relationship between two random variables.
• The covariance indicates the relation between the two variables and helps to know if the
two variables vary together. (It is used to find the relationship between two numerical variables.)
• The covariance between two random variables X and Y is denoted Cov(X, Y) and is computed as:

  Cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / N

  where
  • xᵢ are the values of the X-variable
  • yᵢ are the values of the Y-variable
  • x̄ is the mean of the X-variable
  • ȳ is the mean of the Y-variable
  • N is the number of data points
Positive and negative covariance
• Positive covariance: if the temperature goes up, the sale of ice cream also goes up.
This is positive covariance; the relation is very close.

• Negative covariance: on the other hand, cold-related disease decreases as the temperature
increases.

• No covariance: temperature and stock market movements are essentially unrelated.
Example: Two sets of data X and Y
Compute xᵢ − x̄ and yᵢ − ȳ
Apply the covariance formula
• The final result will be 35/5 = 7, which is a positive covariance
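A minimal Python sketch of this computation, using the population covariance (dividing by N) as in the 35/5 example. The data values below are assumed for illustration (they are not the slide's original data); they are chosen so that the result also works out to 35/5 = 7.

```python
def covariance(xs, ys):
    """Population covariance: sum of (x - x_mean)(y - y_mean), divided by N."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n

# Assumed illustrative data (not from the slides)
X = [1, 2, 3, 4, 5]
Y = [2, 8, 10, 13, 17]
print(covariance(X, Y))  # 7.0 -> positive covariance
```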
Statistical / Parametric Decision making - continued
• Parametric methods can perform well in many situations, and their performance is
at its peak when the spread of each group is different.
• The goal of most classification procedures is to estimate the probabilities that a
pattern to be classified belongs to the various possible classes, based on the values
of some feature or set of features.
  Ex1: Classify the fish on a conveyor belt as salmon or sea bass.
  Ex2: Estimate the probabilities that a patient has various diseases given
  some symptoms or lab tests (using laboratory parameters).
  Ex3: Identify a person as Indian/Japanese based on statistical parameters
  such as height, face and nose structure.
• In most cases, we decide which class is the most likely.
• We need a mathematical decision-making algorithm to obtain the classification or
decision.
Bayes Theorem
When the joint probability P(A∩B) is hard to calculate, or when the inverse (Bayes)
probability P(B|A) is easier to calculate, Bayes theorem can be applied.

Revisiting conditional probability


Suppose that we are interested in computing the probability of event A and we have been
told event B has occurred.
Then the conditional probability of A given B is defined to be:

P(A|B) = P(A ∩ B) / P(B),  if P(B) ≠ 0

Similarly, P(B|A) = P(A ∩ B) / P(A),  if P(A) ≠ 0
• The original sample space is the red coloured rectangular box.
• We ask: what is the probability of A occurring, given that the sample space is restricted to B?
• Hence P(B) is in the denominator.
• The area in question is the intersection of A and B.
P(A|B) = P(A ∩ B) / P(B)

From the above expressions, we can rewrite

P(A ∩ B) = P(B) · P(A|B)
and P(A ∩ B) = P(A) · P(B|A)
This can also be used to calculate P(A ∩ B).

So
P(A ∩ B) = P(B) · P(A|B) = P(A) · P(B|A)
or
P(B) · P(A|B) = P(A) · P(B|A)

P(A|B) = P(A) · P(B|A) / P(B)   -  Bayes Rule


Bayes Theorem
The goal is to measure P(wi|X), the measurement-conditioned or posterior probability,
from three quantities:
• the class-conditional probability P(X|wi)
• the prior probability P(wi)
• the evidence P(X) of the observed vector X
Bayes Rule combines these to give P(wi|X), the probability of any vector X being assigned to class wi.
Example for Bayes Rule / Theorem
• Given Bayes' Rule: P(A|B) = P(A) · P(B|A) / P(B)
Example 1:

• Compute the following probability in a deck of 52 cards (excluding jokers):

• P(King | Face)

• It is given by P(King|Face) = P(Face|King) · P(King) / P(Face)
                             = 1 · (4/52) / (12/52)
                             = 1/3
Example2:

Cold (C) and not-cold (C’). Feature is fever (f).

Prior probability of a person having a cold, P(C) = 0.01.

Prob. of having a fever, given that a person has a cold is, P(f|C) = 0.4.
Overall prob. of fever P(f) = 0.02.

Then using Bayes Th., the Prob. that a person has a cold, given that she (or he)
has a fever is:
P(C|f) = P(f|C) P(C) / P(f) = (0.4 × 0.01) / 0.02 = 0.2
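A minimal Python sketch of this calculation (the function name bayes_posterior is an illustrative choice, not from the slides):

```python
def bayes_posterior(likelihood, prior, evidence):
    """Plain Bayes rule: P(C|f) = P(f|C) * P(C) / P(f)."""
    return likelihood * prior / evidence

# Cold/fever example: P(f|C) = 0.4, P(C) = 0.01, P(f) = 0.02
print(bayes_posterior(likelihood=0.4, prior=0.01, evidence=0.02))  # 0.2
```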
Generalized Bayes Theorem
• Consider we have 3 classes A1, A2 and A3.
• The area under the red box is the sample space.
• Consider them to be mutually exclusive and collectively exhaustive.
• Mutually exclusive means that if one event occurs, then another event cannot happen.
• Collectively exhaustive means that if we combine all the probabilities, i.e. P(A1),
P(A2) and P(A3), we get the sample space, i.e. the total rectangular red coloured space.
• Consider now another event B that occurs over A1, A2 and A3.
• Some area of B is common with A1, with A2 and with A3, as shown in the figure.
• Portion common with A1 and B: P(A1 ∩ B) = P(B|A1) · P(A1)
• Portion common with A2 and B: P(A2 ∩ B) = P(B|A2) · P(A2)
• Portion common with A3 and B: P(A3 ∩ B) = P(B|A3) · P(A3)

• The probability of B in total can therefore be given by

  P(B) = P(B|A1) · P(A1) + P(B|A2) · P(A2) + P(B|A3) · P(A3)

• Substituting this expression for P(B) into Bayes rule, we arrive at the generalized version of Bayes theorem:

  P(Ai|B) = P(B|Ai) · P(Ai) / Σⱼ P(B|Aj) · P(Aj)
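A minimal Python sketch of the generalized rule for mutually exclusive, collectively exhaustive classes; the function name and the example priors/likelihoods are illustrative assumptions, not values from the slides.

```python
def generalized_bayes(priors, likelihoods):
    """Return the posteriors P(Ai|B) for all classes Ai.

    priors[i]      = P(Ai)
    likelihoods[i] = P(B|Ai)
    """
    # P(B) by the law of total probability
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Three hypothetical classes A1, A2, A3
priors = [0.5, 0.3, 0.2]        # P(A1), P(A2), P(A3)
likelihoods = [0.1, 0.4, 0.7]   # P(B|A1), P(B|A2), P(B|A3)
print(generalized_bayes(priors, likelihoods))
# [0.161..., 0.387..., 0.451...] -- the posteriors sum to 1
```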
Example 3: Problem on Bayes theorem with a 3-class case
What is being asked
• While solving a problem based on Bayes theorem, we need to split the given information carefully:
• Identify what is asked (the posterior probability).
• Note that the flip of what is asked (the corresponding likelihood) will always be given; it is found in the problem statement.
• Identify what else is given (the prior probabilities of the classes).
• The given problem can then be represented in the generalized Bayes form above.
Example-4.
Given: 1% of people have a certain genetic defect (which means 99% don't have the genetic defect).
In 90% of tests on people with the genetic defect, the defect/disease is found positive (true positives).
9.6% of the tests on non-diseased people are false positives.

If a person gets a positive test result, what is the probability that they actually have the genetic defect?

A = chance of having the genetic defect. That was given in the question as 1%. (P(A) = 0.01)
That also means the probability of not having the gene (~A) is 99%. (P(~A) = 0.99)
X = A positive test result.

P(A|X) = Probability of having the genetic defect given a positive test result. (To be computed)

P(X|A) = Chance of a positive test result given that the person actually has the genetic defect = 90%. (0.90)
p(X|~A) = Chance of a positive test if the person doesn’t have the genetic defect. That was given in the question as 9.6% (0.096)
Now that we have all of the information, we put it into the equation:

P(A|X) = (0.9 × 0.01) / (0.9 × 0.01 + 0.096 × 0.99) = 0.0865 (8.65%).

The probability of actually having the faulty gene, given the positive test, is 8.65%.
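A quick numerical check of Example 4 in Python, using the numbers given in the problem statement (the variable names are illustrative):

```python
# Numbers from Example 4
p_defect = 0.01                 # P(A)
p_pos_given_defect = 0.90       # P(X|A)
p_pos_given_no_defect = 0.096   # P(X|~A)

# Evidence P(X) by total probability, then the posterior P(A|X)
p_pos = p_pos_given_defect * p_defect + p_pos_given_no_defect * (1 - p_defect)
p_defect_given_pos = p_pos_given_defect * p_defect / p_pos
print(round(p_defect_given_pos, 4))  # 0.0865 -> 8.65%
```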


Example - 5
Given the following statistics, what is the probability that a woman has
cancer if she has a positive mammogram result?
One percent of women over 50 have breast cancer.
Ninety percent of women who have breast cancer test positive on
mammograms.
Eight percent of women who do not have breast cancer will nevertheless test positive (false positives).

Let W denote a woman having cancer and ~W a woman not having cancer.
A positive test result is PT.
Solution for Example 5
What is asked: what is the probability that a woman has cancer if she
has a positive mammogram result?

• P(W)=0.01
• P(~W)=0.99
• P(PT|W)=0.9
• P(PT|~W)=0.08
Compute P(testing positive) as the denominator:
P(W|PT) = (0.9 × 0.01) / ((0.9 × 0.01) + (0.08 × 0.99)) ≈ 0.10.
Example-6
A disease occurs in 0.5% of the population
(0.5% means 0.5/100 = 0.005)

A diagnostic test gives a positive result in:

◦ 99% of people with the disease

◦ 5% of people without the disease (false positives)

A person receives a positive result.

What is the probability of them having the disease, given the positive result?
◦ P(disease | positive test) = P(PT|D) × P(D) / (P(PT|D) × P(D) + P(PT|~D) × P(~D))

◦ = (0.99 × 0.005) / (0.99 × 0.005 + 0.05 × 0.995)

Therefore:
P(disease | positive test) = (0.99 × 0.005) / 0.0547 = 0.09
i.e. 9%

◦ We know:
  P(D) = chance of having the disease
  P(~D) = chance of not having the disease

◦ P(positive test | disease) = 0.99

◦ P(disease) = 0.005
Decision Regions
• The likelihood ratio R between two classes can be computed by dividing the posterior
probabilities of the two classes.
• So P(Ci|x) (the posterior probability of class Ci) and P(Cj|x) (the posterior probability of class
Cj) are divided to obtain the likelihood ratio.
• If there are only two classes, Ci and Cj can be replaced by A and B, and the
equation becomes (the p(x) term in the denominators cancels):
R = P(A|x) / P(B|x) = P(A) p(x|A) / (P(B) p(x|B))

• If the likelihood ratio R is greater than 1, we should select class A as the most likely
class of the sample; otherwise we select class B.
• A boundary between the decision regions is called a decision boundary.
• Optimal decision boundaries separate the feature space into decision regions R1,
R2, ..., Rn such that class Ci is the most probable class for values of x in Ri.
• For feature values exactly on the decision boundary between two
classes, the two classes are equally probable.

• Thus, to compute the optimal decision boundary between two
classes A and B, we can equate their posterior probabilities if the
densities are continuous and overlapping:
  – P(A|x) = P(B|x).

• Substituting Bayes Theorem and cancelling the p(x) term:
  – P(A) p(x|A) = P(B) p(x|B)

• If the feature x in both classes is normally distributed:

  P(A) · (1 / (σ_A √(2π))) · exp(−(x − μ_A)² / (2σ_A²)) = P(B) · (1 / (σ_B √(2π))) · exp(−(x − μ_B)² / (2σ_B²))

• Cancelling √(2π) and taking the natural logarithm:

  −2 ln(P(A)/σ_A) + ((x − μ_A)/σ_A)² = −2 ln(P(B)/σ_B) + ((x − μ_B)/σ_B)²

• Define D = −2 ln(P(A)/σ_A) + ((x − μ_A)/σ_A)² + 2 ln(P(B)/σ_B) − ((x − μ_B)/σ_B)²

• D equals 0 on the decision boundary;
• D is positive in the decision region in which B is most likely the class;
• and D is negative in the decision region in which A is most likely.

• Example problem can be seen in the next slide
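A minimal Python sketch of this discriminant; the priors, means and standard deviations below are illustrative assumptions, not the values of the slide's example.

```python
import math

def discriminant_D(x, prior_A, mu_A, sigma_A, prior_B, mu_B, sigma_B):
    """D < 0 -> class A is more likely, D > 0 -> class B, D == 0 -> on the boundary."""
    term_A = -2 * math.log(prior_A / sigma_A) + ((x - mu_A) / sigma_A) ** 2
    term_B = -2 * math.log(prior_B / sigma_B) + ((x - mu_B) / sigma_B) ** 2
    return term_A - term_B

# Assumed parameters: class A ~ N(5, 1) with P(A) = 0.6, class B ~ N(8, 2) with P(B) = 0.4
for x in (4.0, 6.5, 9.0):
    D = discriminant_D(x, 0.6, 5.0, 1.0, 0.4, 8.0, 2.0)
    print(x, "->", "A" if D < 0 else "B")
```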


Independence
• Independent random variables: two random variables X and Y are said to be
statistically independent if and only if:

• p(x, y) = p(x) · p(y)

• Ex: the tosses of two coins are independent.

• Then the joint probability of the two will be the product of their individual probabilities.
• Another example: X = throw of a die, Y = toss of a coin
• (The events X and Y are independent, so their joint probability factorises.)

• X = height and Y = weight are not independent; usually they are dependent.
• Independence is equivalent to saying

• P(y|x) = P(y) or
• P(x|y) = P(x)
Conditional Independence
• Two random variables X and Y are said to be independent given Z if and
only if

• P(x, y|z) = P(x|z) · P(y|z): this indicates that X and Y are independent given Z.

• Example: X: throw of a die
  Y: toss of a coin
  Z: card drawn from a deck

So X and Y are independent, and also conditionally independent given Z.

Joint probabilities that are dependent but conditionally independent
• Let us consider:
  – X: height
  – Y: vocabulary
  – Z: age

  – A small height indicates a small age, and hence the vocabulary varies with it.
  – So vocabulary is dependent on height.

  – Further, let us add a condition Z.

  – If age is fixed, say 30, and we consider only samples of people aged 30, then as the height
  increases the vocabulary does not change.
  – So X and Y are conditionally independent given Z, but the joint probabilities are dependent without the condition.
Reverse:
• Two events that are independent can become dependent when a condition is added.

• Let us say X: throw of die 1

• Y: throw of die 2

• Basically, they are independent.

• Let us add Z = the sum of the two dice.

• Given Z, if the value of X is fixed, then the value of Y depends on the value of X.
• So X and Y are no longer independent given Z.
• (Notation: when X is independent of Y given Z, X is said to be orthogonal or perpendicular to Y, given Z.)
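A small Monte Carlo sketch of this reverse case (the simulation setup is an illustrative assumption, not from the slides): the two dice are independent, but become dependent once their sum is known.

```python
import random

random.seed(0)
samples = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(200_000)]

# Unconditionally: P(Y=6) vs P(Y=6 | X=6) -- roughly equal, so X and Y look independent
p_y6 = sum(y == 6 for _, y in samples) / len(samples)
x6 = [(x, y) for x, y in samples if x == 6]
p_y6_given_x6 = sum(y == 6 for _, y in x6) / len(x6)
print(p_y6, p_y6_given_x6)              # both close to 1/6

# Condition on Z = X + Y = 7: now knowing X fixes Y exactly (dependence appears)
z7 = [(x, y) for x, y in samples if x + y == 7]
p_y6_given_z7 = sum(y == 6 for _, y in z7) / len(z7)
p_y6_given_z7_and_x1 = sum(y == 6 for x, y in z7 if x == 1) / sum(x == 1 for x, _ in z7)
print(p_y6_given_z7, p_y6_given_z7_and_x1)  # about 1/6 vs 1.0
```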
Multiple Features
• A single feature may not discriminate well between classes.
• Recall the example of considering just 'dapg' or 'dwp': we cannot discriminate well
between the two classes. (Example of the hypothetical basketball games – Unit 1.)
• If the joint conditional density of multiple features is known for each class, Bayesian
classification is very similar to classification with one feature.
• Replace the single feature value x by the feature vector X whose components are the individual features.

• P(wi|x) = P(wi) P(x|wi) / Σⱼ P(wj) P(x|wj)   (single feature x; sum over j = 1, ..., k)

• P(wi|X) = P(wi) p(X|wi) / Σⱼ P(wj) p(X|wj)   (feature vector X)

• For multiple features, the vector X replaces the single feature x, and the
conditional probabilities P(x|wi) are replaced by the conditional densities p(X|wi).
Example of Naïve Bayes Classifier
Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals
Give Birth Can Fly Live in Water Have Legs Class
yes no yes no ?
Solution

P(A|M) = (6/7) × (6/7) × (2/7) × (2/7) = 0.06
P(A|N) = (1/13) × (10/13) × (3/13) × (4/13) = 0.0042

P(A|M) P(M) = 0.06 × (7/20) = 0.021
P(A|N) P(N) = 0.0042 × (13/20) = 0.0027

P(A|M) P(M) > P(A|N) P(N)  =>  Mammals
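A minimal Python sketch of this naïve Bayes computation for the query (Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no); the conditional probabilities are the counts read from the 20-animal table above, while the code structure itself is an illustrative choice.

```python
from math import prod

# P(feature value | class), in the order: Give Birth=yes, Can Fly=no,
# Live in Water=yes, Have Legs=no (counts from the table above)
p_given_mammal     = [6/7, 6/7, 2/7, 2/7]
p_given_non_mammal = [1/13, 10/13, 3/13, 4/13]

prior_mammal, prior_non_mammal = 7/20, 13/20

score_mammal     = prod(p_given_mammal) * prior_mammal          # ~0.021
score_non_mammal = prod(p_given_non_mammal) * prior_non_mammal  # ~0.0027

print("mammals" if score_mammal > score_non_mammal else "non-mammals")  # mammals
```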
Example: ‘Play Tennis’ data
• The naïve Bayes classifier is very popular for document classification.

• (Naïve means: all attributes are treated as equal and independent; all the attributes
have equal weightage and are assumed independent.)
Based on the examples in the table, classify the following datum x:
x = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)
• That means: play tennis or not?
h_NB = argmax_{h ∈ {yes, no}} P(h) P(x|h) = argmax_{h ∈ {yes, no}} P(h) Πₜ P(aₜ|h)

     = argmax_{h ∈ {yes, no}} P(h) P(Outlook=sunny|h) P(Temp=cool|h) P(Humidity=high|h) P(Wind=strong|h)

• Working:
P(PlayTennis = yes) = 9/14 = 0.64
P(PlayTennis = no) = 5/14 = 0.36
P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
P(Wind = strong | PlayTennis = no) = 3/5 = 0.60
etc.
P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.0053
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.0206
⇒ answer: PlayTennis(x) = no
What is our probability of error?
• For the two-class situation, we have
  P(error|x) = P(ω1|x) if we decide ω2
  P(error|x) = P(ω2|x) if we decide ω1
• We can minimize the probability of error by following the posterior:
  Decide ω1 if P(ω1|x) > P(ω2|x)
  The probability of error then becomes P(error|x) = min[P(ω1|x), P(ω2|x)]
• Equivalently, decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2,
  i.e. the evidence term is not used in decision making.
Conversely, if we have uniform priors, then the decision will rely exclusively on the
likelihoods.
Take Home Message: Decision making relies on both the priors and the likelihoods and
Bayes Decision Rule combines them to achieve the minimum probability of error.
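A minimal Python sketch of this minimum-error rule; the Gaussian class-conditional densities and all numbers below are illustrative assumptions, not values from the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def decide(x, prior1, mu1, sigma1, prior2, mu2, sigma2):
    """Decide w1 if p(x|w1)P(w1) > p(x|w2)P(w2), otherwise w2; also return P(error|x)."""
    score1 = gaussian_pdf(x, mu1, sigma1) * prior1
    score2 = gaussian_pdf(x, mu2, sigma2) * prior2
    p_error = min(score1, score2) / (score1 + score2)  # smaller posterior = P(error|x)
    return ("w1" if score1 > score2 else "w2"), p_error

# Assumed classes: w1 ~ N(5, 1), w2 ~ N(8, 1.5), equal priors
print(decide(x=6.0, prior1=0.5, mu1=5.0, sigma1=1.0, prior2=0.5, mu2=8.0, sigma2=1.5))
```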
Application of Naïve Bayes Classifier for NLP
• Consider the following sentences:
– S1 : The food is Delicious : Liked
– S2 : The food is Bad : Not Liked
– S3 : Bad food : Not Liked

– Given a new sentence, decide whether it can be classified as a Liked sentence or Not Liked.

– Given Sentence: Delicious Food


• Remove stop words, then perform stemming

          F1 (Food)   F2 (Delicious)   F3 (Bad)   Output
• S1          1              1              0        1
• S2          1              0              1        0
• S3          1              0              1        0

• P(Liked | attributes) = P(Delicious | Liked) * P(Food | Liked) * P(Liked)
                        = (1/1) * (1/1) * (1/3) = 0.33

• P(Not Liked | attributes) = P(Delicious | Not Liked) * P(Food | Not Liked) * P(Not Liked)
                            = (0) * (2/2) * (2/3) = 0

• Hence the given sentence belongs to the Liked class
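A minimal Python sketch of this word-presence naïve Bayes calculation (no smoothing, matching the hand computation above; the variable names are illustrative).

```python
# Training sentences after stop-word removal and stemming, with their labels
training = [
    ({"food", "delicious"}, "liked"),      # S1
    ({"food", "bad"}, "not_liked"),        # S2
    ({"food", "bad"}, "not_liked"),        # S3
]

def score(query_words, label):
    docs = [words for words, lab in training if lab == label]
    s = len(docs) / len(training)                     # prior P(class)
    for w in query_words:
        s *= sum(w in d for d in docs) / len(docs)    # P(word | class)
    return s

query = {"delicious", "food"}   # "Delicious Food"
liked, not_liked = score(query, "liked"), score(query, "not_liked")
print(liked, not_liked)                                  # 0.333... and 0.0
print("Liked" if liked > not_liked else "Not Liked")     # Liked
```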
End of Unit 2
