Chapter 4: Classification & Prediction: 4.1 Basic Concepts of Classification and Prediction 4.2 Decision Tree Induction
Bayes' Theorem in the Classification Context
} X is a data tuple. In Bayesian terms, it is considered "evidence"
} H is some hypothesis that X belongs to a specified class C

P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}

Example: predict whether a customer will buy a computer or not
" Customers are described by two attributes: age and income
" X is a 35-year-old customer with an income of 40k
" H is the hypothesis that the customer will buy a computer

} P(H|X) is the posterior probability of H conditioned on X
" It reflects the probability that customer X will buy a computer, given that we know the customer's age and income
} P(X|H) is the posterior probability of X conditioned on H
" It reflects the probability that customer X is 35 years old and earns 40k, given that we know the customer will buy a computer
} P(H) is the prior probability of H
" It is the probability that a customer will buy a computer, regardless of age, income, or any other information
" The posterior probability P(H|X) is based on more information than the prior probability P(H), which is independent of X
} P(X) is the prior probability of X
" It is the probability that a person from our set of customers is 35 years old and earns 40k
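As a small numeric illustration of how these four quantities fit together, the sketch below estimates P(H), P(X), and P(X|H) from a tiny, purely hypothetical table of customer records (invented for illustration, not taken from the chapter) and applies Bayes' theorem to obtain P(H|X).

# Minimal sketch of Bayes' theorem on the customer example.
# The records below are hypothetical and exist only to make the arithmetic concrete.
records = [
    # (age, income_in_k, buys_computer)
    (35, 40, True), (35, 40, False), (35, 40, True),
    (50, 80, True), (22, 25, False), (35, 40, True),
    (60, 90, False), (35, 40, False), (41, 55, True), (28, 30, False),
]

x = (35, 40)                          # evidence X: a 35-year-old customer earning 40k
n = len(records)
buyers = [r for r in records if r[2]]

p_h = len(buyers) / n                                                     # P(H): prior of buying
p_x = sum(1 for r in records if (r[0], r[1]) == x) / n                    # P(X): prior of the evidence
p_x_given_h = sum(1 for r in buyers if (r[0], r[1]) == x) / len(buyers)   # P(X|H)

p_h_given_x = p_x_given_h * p_h / p_x                                     # Bayes' theorem: P(H|X)
print(f"P(H|X) = {p_h_given_x:.2f}")                                      # 0.60 for this toy table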
Naïve Bayesian Classification
D: A training set of tuples and their associated class labels
Each tuple is represented by an n-dimensional vector X = (x1, …, xn), holding n
measurements of the n attributes A1, …, An
Classes: suppose there are m classes C1,…,Cm
Principle
} Given a tuple X, the classifier will predict that X belongs to the
class having the highest posterior probability conditioned on X
} Predict that tuple X belongs to the class Ci if and only if P(Ci|X) > P(Cj|X)
for 1 ≤ j ≤ m, j ≠ i. By Bayes' theorem:

P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}

} P(X) is constant for all classes; thus, it suffices to maximize P(X|Ci)P(Ci), as the decision rule below makes explicit
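The resulting decision rule, which is just a restatement of the maximization above, can be written compactly as

\hat{C}(X) = \arg\max_{1 \le i \le m} P(X \mid C_i)\, P(C_i)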
Naïve Bayesian Classification
} To maximize P(X|Ci)P(Ci), we need to know class prior
probabilities
" If the probabilities are not known, assume that P(C1)=P(C2)=…=P
(Cm) ⇒ maximize P(X|Ci)
" Class prior probabilities can be estimated by P(Ci)=|Ci,D|/|D|
} Assume Class Conditional Independence to reduce
computational cost of P(X|Ci)
" given X(x1,…,xn), P(X|Ci) is:
P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i)
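A minimal sketch of this computation for purely categorical attributes is given below: class priors estimated as |Ci,D|/|D|, class-conditional probabilities estimated by simple counting, and prediction by maximizing P(X|Ci)P(Ci). The training tuples and attribute values are hypothetical, and no smoothing or continuous attributes are handled here.

from collections import Counter, defaultdict

# Hypothetical training data: each entry is ((x1, ..., xn), class_label).
train = [
    (("youth",  "high"),   "no"),
    (("youth",  "medium"), "no"),
    (("middle", "high"),   "yes"),
    (("senior", "medium"), "yes"),
    (("senior", "low"),    "yes"),
    (("middle", "low"),    "yes"),
    (("youth",  "low"),    "no"),
]

# Class priors P(Ci) estimated as |Ci,D| / |D|.
class_counts = Counter(label for _, label in train)
total = len(train)
priors = {c: class_counts[c] / total for c in class_counts}

# Conditional counts for P(xk | Ci): one counter per (class, attribute index).
cond_counts = defaultdict(Counter)
for x, c in train:
    for k, xk in enumerate(x):
        cond_counts[(c, k)][xk] += 1

def predict(x):
    """Return the class maximizing P(X|Ci) P(Ci) under the naive independence assumption."""
    best_class, best_score = None, -1.0
    for c in priors:
        score = priors[c]
        for k, xk in enumerate(x):
            score *= cond_counts[(c, k)][xk] / class_counts[c]   # P(xk | Ci) by counting
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(("youth", "medium")))   # -> "no" for this toy data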
" If attribute Ak is continuous-valued, P(xk|Ci) is typically assumed to follow a
Gaussian distribution with mean µCi and standard deviation σCi:

P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i}) = \frac{1}{\sqrt{2\pi}\,\sigma_{C_i}} \, e^{-\frac{(x_k-\mu_{C_i})^2}{2\sigma_{C_i}^2}}

" Estimate µCi and σCi, the mean and standard deviation of the values of attribute
Ak for the training tuples of class Ci
" Example
X a 35 years-old costumer with an income of 40k (age, income)
Assume the age attribute is continuous-valued
Consider class Cyes (the costumer will buy a computer)
We find that in D, the costumers who will buy a computer are
38±12 years of age ⇒ µCyes=38 and σCyes=12
Tuple to classify is
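To make the number concrete, the snippet below evaluates the Gaussian density g(35, 38, 12) with the mean and standard deviation estimated above; the income attribute would be handled the same way with its own µCyes and σCyes.

import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for continuous-valued attributes."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# P(age = 35 | Cyes) = g(35, 38, 12) ≈ 0.032
print(gaussian(35, 38, 12))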
} Ex. Suppose a training set with 1000 tuples: income = low (0 tuples), income =
medium (990 tuples), and income = high (10 tuples)
" The zero count gives Prob(income = low) = 0, which would force any product
P(X|Ci) containing that factor to 0, no matter what the other attributes say
} Use the Laplacian correction (or Laplacian estimator)
" Add 1 to each count
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
" The “corrected” prob. estimates are close to their “uncorrected”
counterparts
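These corrected estimates can be reproduced directly from the counts; a minimal sketch:

# Counts of income = low, medium, high among the 1000 tuples.
counts = {"low": 0, "medium": 990, "high": 10}

# Laplacian correction: add 1 to each count; the denominator grows by the number
# of distinct values (1000 + 3 = 1003), and no estimate is exactly zero anymore.
total = sum(counts.values()) + len(counts)
corrected = {value: (c + 1) / total for value, c in counts.items()}

print(corrected)   # low: 1/1003 ≈ 0.001, medium: 991/1003 ≈ 0.988, high: 11/1003 ≈ 0.011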
Summary of Section 4.3
} Advantages
" Easy to implement
" Good results obtained in most of the cases
} Disadvantages
" Assumption: class conditional independence, therefore loss of
accuracy
" Practically, dependencies exist among variables
E.g., hospitals: patients: Profile: age, family history, etc.
Symptoms: fever, cough etc., Disease: lung cancer, diabetes, etc.
Dependencies among these cannot be modeled by Naïve
Bayesian Classifier
} How to deal with these dependencies?
" Bayesian Belief Networks
4.3.2 Bayesian Belief Networks
} A Bayesian belief network allows a subset of the variables to be
conditionally independent
} A graphical model of causal relationships
" Represents dependency among the variables
" Gives a specification of joint probability distribution
" Given both the network structure and all variables observable: learn
only the CPTs
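As an illustration of how a belief network gives a specification of the joint probability distribution, a network over variables Y1, …, Yn factorizes the joint as

P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P\big(y_i \mid \mathrm{Parents}(Y_i)\big)

where each factor P(yi | Parents(Yi)) is read off the CPT attached to node Yi.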