Lesson 3.3 - Supervised Learning: Bayesian Classification
November 5, 2018
Bayesian Classification
Roadmap
Introduction
Bayes Theorem
Naive Bayesian Classification
References
Introduction
Bayesian classifier
A statistical classifier that performs probabilistic prediction, i.e., it predicts class membership probabilities, such as the probability that a given instance belongs to a particular class.
Foundation
Based on Bayes' theorem.
Performance
A simple Bayesian classifier, the naive Bayesian classifier, exhibits accuracy and speed comparable to decision tree and selected neural network classifiers when applied to large databases.
Incremental
Each training example can incrementally increase/decrease the
probability that a hypothesis is correct
Popular methods
Naive Bayesian classifier
Bayesian belief networks
Bayes Theorem
Let:
X: a data sample whose class label is unknown
H: a hypothesis, e.g., that X belongs to class C
P(H|X): the probability that instance X belongs to class C, given that we know the attribute description of X (this is what the classifier determines)
P(H): the prior probability of H
P(X): the probability that the sample data is observed
P(X|H): the probability of observing X conditioned on H
Bayes' theorem relates these quantities:
P(H|X) = P(X|H) P(H) / P(X)
Example.
Suppose the data samples describe customers by age and income, and let H be the hypothesis that a customer will buy a computer. Consider a customer X who is 35 years old and earns $40,000.
P(H|X): the probability that customer X will buy a computer, given that we know X's age and income.
P(H): the probability that any given customer will buy a computer, regardless of age and income.
P(X): the probability that a person from our set of customers is 35 years old and earns $40,000.
P(X|H): the probability that a customer, X, is 35 years old and earns $40,000, given that we know the customer will buy a computer.
Practical difficulty
Requires initial knowledge of many probabilities, at significant computational cost.
In the next section, we will look at how Bayes’ theorem is used in the
naive Bayesian classifier.
Note
The assumptions that all attributes are equally important and independent are never correct in real-life datasets.
Naive Bayesian Classification
Let D be a training set of instances, each described by an n-dimensional attribute vector X = (x1, x2, ..., xn), and let there be m classes C1, C2, ..., Cm. The classifier predicts that X belongs to the class Ci with the highest posterior probability P(Ci|X), i.e., the class maximizing P(X|Ci)P(Ci). Assuming class-conditional independence (the naive assumption), the likelihood factorizes:
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
If Ak is categorical:
P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D.
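To make the counting concrete, here is a minimal Python sketch of this estimate (the function and variable names are my own, not from the lesson):

    def categorical_conditional_prob(D, class_label, attr_index, value):
        # Estimate P(xk|Ci): the fraction of class-Ci tuples in D
        # whose attribute Ak (at attr_index) equals the given value.
        in_class = [x for x, c in D if c == class_label]   # tuples of Ci in D
        if not in_class:
            return 0.0
        matches = sum(1 for x in in_class if x[attr_index] == value)
        return matches / len(in_class)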
If Ak is continuous-valued:
P(xk|Ci) is usually computed based on a Gaussian distribution with mean µ and standard deviation σ:
g(x, µ, σ) = (1 / (√(2π) · σ)) · e^(−(x − µ)² / (2σ²))

with sample mean and standard deviation

µ = (1/n) Σ xᵢ                    (sum over i = 1, ..., n)
σ = √( (1/(n − 1)) Σ (xᵢ − µ)² )
µCi and σCi are the mean and standard deviation, respectively, of the values of attribute Ak for training instances of class Ci, so that P(xk|Ci) = g(xk, µCi, σCi).
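A minimal sketch of these estimates in Python (names are hypothetical; only the standard library is used):

    import math

    def gaussian(x, mu, sigma):
        # g(x, mu, sigma): Gaussian (normal) density
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    def fit(values):
        # Sample mean and standard deviation (n - 1 denominator)
        n = len(values)
        mu = sum(values) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
        return mu, sigma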
Example.
Let X = (35, $40,000), where A1 and A2 are the attributes age and income, and let the class label attribute be buys_computer. The associated class label for X is yes (i.e., buys_computer = yes).
For attribute age and this class, we have µ = 38 years and σ = 12. Plug these quantities, along with x1 = 35 for our instance X, into g(x, µ, σ) to estimate P(age = 35 | buys_computer = yes).
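Carrying out that computation (a quick self-contained check in Python; the values µ = 38, σ = 12, x1 = 35 are the ones above):

    import math

    mu, sigma, x1 = 38.0, 12.0, 35.0
    p = math.exp(-((x1 - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    print(round(p, 4))   # 0.0322, i.e. P(age = 35 | buys_computer = yes) ≈ 0.032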
In other words, the predicted class label is the class Ci for which
P (X |Ci )P (Ci ) is the maximum.
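Putting the pieces together, a minimal Python sketch of training and of this argmax rule for categorical attributes (the dataset layout and names are assumptions, not from the lesson):

    from collections import Counter, defaultdict

    def train(D):
        # D: list of (attribute_tuple, class_label) pairs.
        priors = Counter(c for _, c in D)        # class counts
        counts = defaultdict(Counter)            # (class, attr index) -> value counts
        for x, c in D:
            for k, v in enumerate(x):
                counts[(c, k)][v] += 1
        return priors, counts, len(D)

    def predict(x, priors, counts, n_total):
        # Return the class Ci maximizing P(X|Ci) * P(Ci).
        best_class, best_score = None, -1.0
        for c, n_c in priors.items():
            score = n_c / n_total                # P(Ci)
            for k, v in enumerate(x):
                score *= counts[(c, k)][v] / n_c # P(xk|Ci); zero if value unseen
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    # Usage: priors, counts, n = train(weather_data)
    #        predict(("sunny", "cool", "high", "true"), priors, counts, n)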
Example 1: AllElectronics
[The AllElectronics training data, 14 customer tuples described by age, income, student, and credit_rating with class label buys_computer, and the derived probabilities were shown here as figures.]
For the instance X = (age <= 30, income = medium, student = yes, credit_rating = fair):
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357
P(X|buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
Similarly,
P(X|buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
P(X|buys_computer = yes) P(buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X|buys_computer = no) P(buys_computer = no) = 0.019 × 0.357 = 0.007
Therefore, the naive Bayesian classifier predicts buys_computer = yes for instance X.
Example:
For the attribute-value pair student = yes of X, we need two counts:
the number of customers who are students and for which buys_computer = yes, which contributes to P(X|buys_computer = yes)
the number of customers who are students and for which buys_computer = no, which contributes to P(X|buys_computer = no)
But if there are no training instances representing students for the class buys_computer = no, then P(student = yes|buys_computer = no) = 0.
Plugging this zero value into the product for P(X|Ci) would return a zero probability for P(X|Ci), no matter what the other probabilities are.
Laplacian correction (Laplace estimator)
To avoid the zero-probability problem, pretend each attribute value occurs one more time than it actually does: add 1 to each count. If an attribute has q distinct values, the denominator grows by q, so the corrected estimates remain valid probabilities.
Example:
Suppose that for the class buys_computer = yes, a training database D contains 1,000 instances:
0 instances with income = low,
990 instances with income = medium, and
10 instances with income = high.
The probabilities of these events are 0 (from 0/1,000), 0.990 (from 990/1,000), and 0.010 (from 10/1,000).
Using the Laplacian correction for the three quantities, we pretend that we have 1 more instance for each income value.
The corrected probability estimates are:
P(income = low) = 1/1,003 ≈ 0.001
P(income = medium) = 991/1,003 ≈ 0.988
P(income = high) = 11/1,003 ≈ 0.011
The corrected estimates are close to the uncorrected ones, but the zero probability is avoided.
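A short Python sketch of the correction, using the counts from the example (the helper name is hypothetical):

    def laplace_corrected(counts):
        # Add 1 to each count; the denominator grows by the number
        # of distinct attribute values, so the estimates still sum to 1.
        total = sum(counts.values()) + len(counts)
        return {v: (n + 1) / total for v, n in counts.items()}

    print(laplace_corrected({"low": 0, "medium": 990, "high": 10}))
    # {'low': 0.000997..., 'medium': 0.988..., 'high': 0.01097...}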
Weather Problem
Outlook          P(·|yes)  P(·|no)      Temperature   P(·|yes)  P(·|no)
  sunny             2/9       3/5          hot             2/9      2/5
  overcast          4/9       0/5          mild            4/9      2/5
  rainy             3/9       2/5          cool            3/9      1/5

Humidity         P(·|yes)  P(·|no)      Windy         P(·|yes)  P(·|no)
  high              3/9       4/5          false           6/9      2/5
  normal            6/9       1/5          true            3/9      3/5

Play:  P(yes) = 9/14,  P(no) = 5/14
E.g.
P (outlook = sunny|play = yes) = 2/9
P(windy = true|play = no) = 3/5
A new day:
Outlook Temperature Humidity Windy Play
sunny cool high true ?
Bayes rule
For the new day X, Bayes' rule gives P[yes|X] = P[X|yes] × P[yes] / P[X], where the naive independence assumption factorizes P[X|yes] over the four attributes:
P[yes|X] ∝ P[Outlook = sunny|yes] × P[Temperature = cool|yes] × P[Humidity = high|yes] × P[Windy = true|yes] × P[yes]
= 2/9 × 3/9 × 3/9 × 3/9 × 9/14 ≈ 0.0053
Likewise, P[no|X] ∝ 3/5 × 1/5 × 4/5 × 3/5 × 5/14 ≈ 0.0206.
Normalizing so that the two posteriors sum to 1 (this divides out P[X]):
P[yes|X] ≈ 0.0053 / (0.0053 + 0.0206) ≈ 0.205
P[no|X] ≈ 0.0206 / (0.0053 + 0.0206) ≈ 0.795
so the prediction for the new day is play = no.
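The same computation as a small Python sketch (the probabilities are read off the table above):

    # Unnormalized scores for the new day X = (sunny, cool, high, true)
    score_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ≈ 0.0053
    score_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206

    evidence = score_yes + score_no                      # plays the role of P[X]
    print(score_yes / evidence)                          # ≈ 0.205 -> P[yes|X]
    print(score_no / evidence)                           # ≈ 0.795 -> P[no|X]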
Missing values
If the instance to be classified is missing an attribute value, that attribute is simply omitted from the product; the calculation uses only the attributes whose values are known.
Numeric attributes
Numeric attributes are handled as described earlier: within each class, their values are assumed to follow a Gaussian distribution, summarized by a per-class mean µ and standard deviation σ.
Outlook        yes  no      Windy        yes  no      Play   yes  no
  sunny          2    3       false        6    2              9    5
  overcast       4    0       true         3    3
  rainy          3    2

Temperature (yes): 83, 70, 68, 64, 69, 75, 75, 72, 81     µ = 73,   σ = 6.2
Temperature (no):  85, 80, 65, 72, 71                     µ = 74.6, σ = 7.9
Humidity (yes):    86, 96, 80, 65, 70, 80, 70, 90, 75     µ = 79.1, σ = 10.2
Humidity (no):     85, 90, 70, 95, 91                     µ = 86.2, σ = 9.7

Conditional probabilities for the categorical attributes:
  sunny 2/9 | 3/5,   overcast 4/9 | 0/5,   rainy 3/9 | 2/5
  false 6/9 | 2/5,   true 3/9 | 3/5
  P(play = yes) = 9/14,  P(play = no) = 5/14
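As a quick check, the per-class statistics can be reproduced with Python's statistics module (values taken from the table):

    from statistics import mean, stdev   # stdev uses the (n - 1) denominator

    temp_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
    print(mean(temp_yes), round(stdev(temp_yes), 1))   # prints: 73 6.2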
A new day:
[The new day's attribute values and the resulting likelihood computation were shown here as a figure.]
Missing values
During training, an instance with a missing value for a numeric attribute is simply not included in the calculation of that attribute's mean and standard deviation for its class.
Advantages
Easy to implement
Good results obtained in most cases
Disadvantages
The assumption of class-conditional independence causes a loss of accuracy
In practice, dependencies exist among variables
How to deal with these dependencies? Bayesian belief networks
References