ML-09-naive-bayes-classifier

This document discusses the Naïve Bayes classifier, a probabilistic framework for classification problems that assumes conditional independence among attributes given the class. It explains Bayes theorem, how to compute posterior probabilities, and the implications of the conditional independence assumption for classification accuracy. It also highlights the advantages and disadvantages of the Naïve Bayes classifier, including its robustness to noise and its handling of missing values, while noting potential performance degradation due to correlated attributes.

CS 60050

Machine Learning

Naïve Bayes Classifier

Some slides taken from course materials of Tan, Steinbach, Kumar


Bayes Classifier

● A probabilistic framework for solving classification problems
● An approach for modeling probabilistic relationships between the attribute set and the class variable
  – It may not be possible to predict the class label of a test record with certainty, even if it has attributes identical to some training records
  – Reason: noisy data, or the presence of factors that affect the class but are not included in the analysis
Probability Basics

● P(A = a, C = c): joint probability that random variables A and C take the values a and c respectively
● P(A = a | C = c): conditional probability that A takes the value a, given that C has taken the value c

  P(C | A) = P(A, C) / P(A)
  P(A | C) = P(A, C) / P(C)
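To keep these definitions concrete, here is a tiny numeric illustration in Python (the joint distribution values are invented for this sketch):

```python
# Illustrative joint distribution over binary A and C: P(A=a, C=c)
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

p_a1 = joint[(1, 0)] + joint[(1, 1)]   # marginal P(A=1) = 0.7
p_c1_given_a1 = joint[(1, 1)] / p_a1   # P(C=1 | A=1) = P(A=1, C=1) / P(A=1)
print(p_c1_given_a1)                   # ≈ 0.571
```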
Bayes Theorem

● Bayes theorem:

  P(C | A) = P(A | C) P(C) / P(A)

● P(C) is known as the prior probability
● P(C | A) is known as the posterior probability
Example of Bayes Theorem

● Given:
  – A doctor knows that meningitis causes a stiff neck 50% of the time
  – The prior probability of any patient having meningitis is 1/50,000
  – The prior probability of any patient having a stiff neck is 1/20

● If a patient has a stiff neck, what is the probability that he/she has meningitis?

  P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
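A minimal Python sketch of this computation (the variable names are illustrative, not from the slides):

```python
# Bayes theorem: P(M | S) = P(S | M) * P(M) / P(S)
p_s_given_m = 0.5      # P(stiff neck | meningitis)
p_m = 1 / 50_000       # prior P(meningitis)
p_s = 1 / 20           # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0002
```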
Bayesian Classifiers

● Consider each attribute and the class label as random variables

● Given a record with attributes (A1, A2, …, An):
  – The goal is to predict the class C
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)

● Can we estimate P(C | A1, A2, …, An) directly from data?
Bayesian Classifiers

● Approach:
  – Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes theorem:

  P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

  – Here P(C | A1, A2, …, An) is the posterior probability, P(A1, A2, …, An | C) is the class-conditional probability, P(C) is the prior probability, and the denominator P(A1, A2, …, An) is the evidence
  – Choose the value of C that maximizes P(C | A1, A2, …, An)
  – Since the evidence P(A1, A2, …, An) is the same for every value of C, this is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C)

● How to estimate P(A1, A2, …, An | C)?


Naïve Bayes Classifier

● Assumes all attributes Ai are conditionally independent, given the class C:
  – P(A1, A2, …, An | C) = P(A1 | C) P(A2 | C) … P(An | C)
  – We can estimate P(Ai | Cj) for all Ai and Cj from the training data
  – A new point is classified as Cj if P(Cj) Π P(Ai | Cj) is maximal
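As an illustrative sketch (not from the slides), the decision rule can be written in a few lines of Python, assuming the priors and conditional probability tables have already been estimated:

```python
import math

def naive_bayes_predict(record, priors, cond_probs):
    """Return the class maximizing P(C) * prod_i P(Ai | C).

    record:     dict attribute -> value
    priors:     dict class -> P(C)
    cond_probs: dict (attribute, value, class) -> P(Ai = value | C)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Sum logs instead of multiplying to avoid numerical underflow
        score = math.log(prior)
        for attr, value in record.items():
            p = cond_probs.get((attr, value, c), 0.0)
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```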
Conditional independence: basics

● Let X, Y, Z denote three sets of random variables
● The variables in X are said to be conditionally independent of the variables in Y, given Z, if

  P(X | Y, Z) = P(X | Z)

● An example:
  – People's reading skill tends to increase with the length of their arm
  – Explanation: both increase with a person's age
  – If age is given, arm length and reading skill are (conditionally) independent
Conditional independence: basics

● If X and Y are conditionally independent given Z:

  P(X, Y | Z) = P(X, Y, Z) / P(Z)
              = [P(X, Y, Z) / P(Y, Z)] × [P(Y, Z) / P(Z)]
              = P(X | Y, Z) × P(Y | Z)
              = P(X | Z) × P(Y | Z)     (by conditional independence)

  Hence P(X, Y | Z) = P(X | Z) × P(Y | Z)

● This is exactly the NB assumption:
  P(A1, A2, …, An | C) = P(A1 | C) P(A2 | C) … P(An | C)
How to Estimate Probabilities from Data?

Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class):

  Tid  Refund  Marital Status  Taxable Income  Evade
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

● Class priors: P(C) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

● For discrete attributes: P(Ai | Ck) = |Aik| / Nc
  – where |Aik| is the number of instances that have attribute value Ai and belong to class Ck
  – Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
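The counting above is easy to reproduce in code. A minimal Python sketch (the data mirrors the table; all names are illustrative):

```python
from collections import Counter

# Training records from the table: (Refund, Marital Status, Evade)
records = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]

# Class priors: P(C) = Nc / N
class_counts = Counter(evade for _, _, evade in records)
priors = {c: n_c / len(records) for c, n_c in class_counts.items()}
print(priors)  # {'No': 0.7, 'Yes': 0.3}

# Discrete conditional, e.g. P(Status=Married | No) = |Aik| / Nc
n_married_no = sum(1 for _, status, evade in records
                   if status == "Married" and evade == "No")
print(n_married_no / class_counts["No"])  # 4/7 ≈ 0.571
```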
How to Estimate Probabilities from Data?

● For continuous attributes, two options:
  – Discretize the range into bins
    · One ordinal attribute per bin
  – Probability density estimation:
    · Assume the attribute follows a Gaussian (normal) distribution
    · Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    · Once the probability distribution is known, use it to estimate the conditional probability P(Ai | c)
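For the binning option, here is a brief sketch (the bin edges are invented for illustration); each resulting bin index is then treated like any other discrete attribute value:

```python
import bisect

# Discretize Taxable Income (in K) into bins at illustrative cut points
bin_edges = [80, 100, 150]
incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]

# bisect_right maps each value to the index of the bin it falls into
bins = [bisect.bisect_right(bin_edges, v) for v in incomes]
print(bins)  # [2, 2, 0, 2, 1, 0, 3, 1, 0, 1]
```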
How to Estimate Probabilities from Data?

● Normal distribution, with one distribution per (Ai, cj) pair:

  P(Ai | cj) = (1 / √(2π σij²)) · exp( −(Ai − µij)² / (2 σij²) )

● For (Income, Class=No), using the training table:
  – sample mean = 110, sample variance = 2975

  P(Income = 120 | No) = (1 / √(2π · 2975)) · exp( −(120 − 110)² / (2 · 2975) ) = 0.0072
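A quick numeric check of this value (a sketch; the mean and variance come from the slide):

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Income for Class=No: sample mean 110, sample variance 2975
print(gaussian_pdf(120, 110, 2975))  # ≈ 0.0072
```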
A complete example

Example of Naïve Bayes Classifier

Given a test record:

  X = (Refund = No, Marital Status = Married, Income = 120K)

Training data: the same table as above. The Naïve Bayes classifier estimates:

  P(Refund=Yes | No) = 3/7            P(Refund=No | No) = 4/7
  P(Refund=Yes | Yes) = 0             P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7
  P(Marital Status=Divorced | No) = 1/7
  P(Marital Status=Married | No) = 4/7
  P(Marital Status=Single | Yes) = 2/3
  P(Marital Status=Divorced | Yes) = 1/3
  P(Marital Status=Married | Yes) = 0

  For Taxable Income:
    If Class=No:  sample mean = 110, sample variance = 2975
    If Class=Yes: sample mean = 90,  sample variance = 25
Example of Naïve Bayes Classifier

Using these estimates for the test record X = (Refund = No, Marital Status = Married, Income = 120K):

● P(X | Class=No) = P(Refund=No | Class=No)
                    × P(Married | Class=No)
                    × P(Income=120K | Class=No)
                  = 4/7 × 4/7 × 0.0072 = 0.0024

● P(X | Class=Yes) = P(Refund=No | Class=Yes)
                     × P(Married | Class=Yes)
                     × P(Income=120K | Class=Yes)
                   = 1 × 0 × 1.2×10⁻⁹ = 0

Since P(X | No) P(No) > P(X | Yes) P(Yes), it follows that P(No | X) > P(Yes | X)
=> Class = No
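Putting the whole example together, here is a self-contained Python sketch that reproduces this classification (the values follow the slides; the helper names are illustrative):

```python
import math

def gaussian_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

priors = {"No": 7 / 10, "Yes": 3 / 10}

# Conditional probabilities estimated from the training table
cond = {
    ("Refund", "No", "No"): 4 / 7,       ("Refund", "No", "Yes"): 1.0,
    ("Marital", "Married", "No"): 4 / 7, ("Marital", "Married", "Yes"): 0.0,
}
income_params = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance)

x = {"Refund": "No", "Marital": "Married", "Income": 120}

scores = {}
for c in priors:
    mean, var = income_params[c]
    likelihood = (cond[("Refund", x["Refund"], c)]
                  * cond[("Marital", x["Marital"], c)]
                  * gaussian_pdf(x["Income"], mean, var))
    scores[c] = priors[c] * likelihood

print(scores)                       # {'No': ≈0.0016, 'Yes': 0.0}
print(max(scores, key=scores.get))  # 'No'
```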
Naïve Bayes Classifier

● If one of the conditional probabilities is zero, the entire product becomes zero
● Alternative probability estimates:

  Original:    P(Ai | C) = Nic / Nc
  Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)
  m-estimate:  P(Ai | C) = (Nic + m·p) / (Nc + m)

  where c is the number of classes, p is a prior probability, and m is a parameter
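A minimal sketch of these three estimators in Python (parameter names mirror the formulas above):

```python
def original_estimate(n_ic, n_c):
    # P(Ai | C) = Nic / Nc — zero whenever the count Nic is zero
    return n_ic / n_c

def laplace_estimate(n_ic, n_c, c):
    # P(Ai | C) = (Nic + 1) / (Nc + c) — never exactly zero
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # P(Ai | C) = (Nic + m*p) / (Nc + m) — p is a prior, m weights it
    return (n_ic + m * p) / (n_c + m)

# e.g., P(Refund=Yes | Yes) was 0/3 with the original estimate:
print(laplace_estimate(0, 3, c=2))  # 0.2 instead of 0
```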
Naïve Bayes: Pros and Cons

● Robust to isolated noise points
● Can handle missing values by ignoring the instance during probability estimate calculations
● Robust to irrelevant attributes
● The independence assumption may not hold for some attributes
  – The presence of correlated attributes can degrade the performance of the NB classifier
Example with correlated attribute

● Two attributes A, B and class Y (all binary)
● Prior probabilities:
  – P(Y=0) = P(Y=1) = 0.5
● Class-conditional probabilities of A:
  – P(A=0 | Y=0) = 0.4,  P(A=1 | Y=0) = 0.6
  – P(A=0 | Y=1) = 0.6,  P(A=1 | Y=1) = 0.4
● The class-conditional probabilities of B are the same as those of A
● B is perfectly correlated with A when Y=0, but is independent of A when Y=1
Example with correlated attribute

● We need to classify a record with A=0, B=0
● Under the NB assumption:

  P(Y=0 | A=0, B=0) = P(A=0, B=0 | Y=0) P(Y=0) / P(A=0, B=0)
                    = P(A=0 | Y=0) P(B=0 | Y=0) P(Y=0) / P(A=0, B=0)
                    = (0.4 × 0.4 × 0.5) / P(A=0, B=0) = 0.08 / P(A=0, B=0)

  P(Y=1 | A=0, B=0) = P(A=0, B=0 | Y=1) P(Y=1) / P(A=0, B=0)
                    = P(A=0 | Y=1) P(B=0 | Y=1) P(Y=1) / P(A=0, B=0)
                    = (0.6 × 0.6 × 0.5) / P(A=0, B=0) = 0.18 / P(A=0, B=0)

● Since 0.18 > 0.08, the NB prediction is Y=1
Example with correlated attribute

● We need to classify a record with A=0, B=0
● In reality, since B is perfectly correlated with A when Y=0, we have P(A=0, B=0 | Y=0) = P(A=0 | Y=0):

  P(Y=0 | A=0, B=0) = P(A=0, B=0 | Y=0) P(Y=0) / P(A=0, B=0)
                    = P(A=0 | Y=0) P(Y=0) / P(A=0, B=0)
                    = (0.4 × 0.5) / P(A=0, B=0) = 0.2 / P(A=0, B=0)

● Since 0.2 > 0.18, the prediction should have been Y=0
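A short Python sketch (illustrative, not from the slides) makes the gap concrete by comparing the naive factorization with the true joint probability:

```python
# P(A=0 | Y=y); B has the same class-conditional probabilities as A
p_a0 = {0: 0.4, 1: 0.6}

# Naive Bayes scores for A=0, B=0 (the shared denominator P(A=0, B=0) cancels)
nb_score = {y: p_a0[y] * p_a0[y] * 0.5 for y in (0, 1)}
print(nb_score)    # {0: 0.08, 1: 0.18} -> NB predicts Y=1

# True scores: when Y=0, B always equals A, so P(A=0, B=0 | Y=0) = P(A=0 | Y=0);
# when Y=1, A and B really are independent
true_score = {0: p_a0[0] * 0.5, 1: p_a0[1] * p_a0[1] * 0.5}
print(true_score)  # {0: 0.2, 1: 0.18} -> the correct prediction is Y=0
```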


Other Bayesian classifiers

● If it is suspected that the attributes are correlated, other techniques such as Bayesian Belief Networks (BBNs) can be used
● A BBN uses a graphical model (network) to capture prior knowledge in a particular domain and the causal dependencies among variables
