The document provides an overview of Bayesian Classification, detailing its principles, including Bayes Theorem and the Naïve Bayesian Classifier, which simplifies computations through the assumption of class conditional independence. It also discusses the k-Nearest Neighbor algorithm and Case-Based Reasoning as alternative classification methods, while addressing classifier accuracy and techniques for handling class-imbalanced datasets. The document emphasizes the importance of probabilistic learning and the practical applications of Bayesian methods in data classification.

Classification-Bayesian

Classification
Dr. Manish Kumar
Associate Professor
Chair: Data Analytics Lab & M.Tech (Data Engg.)
Department of Information Technology
Indian Institute of Information Technology-Allahabad, Prayagraj
Bayesian Classification
What are Bayesian Classifiers?
▪ Statistical Classifiers
▪ Predict class membership probabilities
▪ Based on Bayes Theorem
▪ Naïve Bayesian Classifier
▪ Computationally Simple
▪ Comparable performance with decision tree and neural network classifiers
Bayesian Classification
▪ Probabilistic learning: calculates explicit probabilities for a hypothesis; among the most practical approaches to certain types of learning problems
▪ Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Bayes Theorem
▪ Let X be a data sample whose class label is unknown
▪ Let H be some hypothesis that X belongs to a class C
▪ For classification, determine P(H|X)
▪ P(H|X) is the probability that H holds given the observed data sample X
▪ P(H|X) is the posterior probability of H conditioned on X
Bayes Theorem
Example: Sample space: all fruits, described by their color and shape
X is “round” and “red”
H = hypothesis that X is an apple
P(H|X) is our confidence that X is an apple given that X is “round” and “red”
▪ P(H) is the prior probability of H, i.e., the probability that any given data sample is an apple, regardless of how it looks
▪ P(H|X) is based on more information
▪ Note that P(H) is independent of X
Bayes Theorem
Example: Sample space: all fruits
▪ P(X|H)?
▪ It is the probability that X is round and red, given that we know it is true that X is an apple
▪ Here P(X) is the prior probability = P(a data sample from our set of fruits is red and round)
Estimating Probabilities
▪ P(X), P(H), and P(X|H) may be estimated from the given data
▪ Bayes Theorem:
P(H|X) = P(X|H) P(H) / P(X)
▪ This is the theorem used by the Naïve Bayesian Classifier!
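To make the theorem concrete before turning to the Naïve Bayesian Classifier, the short Python sketch below plugs numbers into the fruit example. The three input probabilities are hypothetical values assumed only for illustration; they are not given in the slides.

p_h = 0.20          # P(H): prior probability that a random fruit is an apple (assumed)
p_x_given_h = 0.90  # P(X|H): probability a fruit is round and red, given it is an apple (assumed)
p_x = 0.30          # P(X): prior probability that a random fruit is round and red (assumed)

# Bayes Theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(H|X) = {p_h_given_x:.3f}")  # 0.600: confidence that X is an apple, given it is round and red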
Naïve Bayesian Classification
▪ Also called the Simple BC
▪ Class conditional independence:
the effect of an attribute value on a given class is independent of the values of the other attributes
▪ This assumption simplifies computations
Naïve Bayesian Classification
Steps Involved
1. Each data sample is of the type X = (x1, x2, …, xn), where xi is the value of X for attribute Ai, i = 1, …, n
2. Suppose there are m classes Ci, i = 1, …, m. Then X ∈ Ci iff
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
i.e. the BC assigns X to the class Ci having the highest posterior probability conditioned on X
Naïve Bayesian Classification
The class Ci for which P(Ci|X) is maximized is called the maximum posteriori (MAP) hypothesis.
From Bayes Theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
3. P(X) is constant for all classes, so only P(X|Ci) P(Ci) need be maximized.
▪ If the class prior probabilities are not known, assume all classes to be equally likely, i.e. P(C1) = P(C2) = … = P(Cm), and therefore maximize P(X|Ci)
▪ Otherwise maximize P(X|Ci) P(Ci), where P(Ci) is estimated as si/s (si = number of training samples in class Ci, s = total number of training samples)
Problem: computing P(X|Ci) directly is computationally expensive (may be infeasible)
Naïve Bayesian Classification
4. Naïve assumption: attribute independence (class conditional independence)
P(X|Ci) = P(x1, …, xn|Ci) = Π P(xk|Ci), the product taken over k = 1 to n
5. In order to classify an unknown sample X, evaluate P(X|Ci) P(Ci) for each class Ci. Sample X is assigned to the class Ci iff
P(X|Ci) P(Ci) > P(X|Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i
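Steps 1-5 can be condensed into a short Python sketch. This is a minimal illustration for categorical attributes, using simple frequency counts as the probability estimates (no smoothing); the function and variable names are placeholders, not taken from the slides.

from collections import Counter

def naive_bayes_classify(train, labels, x):
    # train: list of attribute tuples, labels: class label of each tuple, x: unseen sample
    n = len(labels)
    best_class, best_score = None, -1.0
    for c, count in Counter(labels).items():
        rows = [r for r, lab in zip(train, labels) if lab == c]
        score = count / n                      # P(Ci), estimated from the training data
        for k, value in enumerate(x):          # P(X|Ci) = product over k of P(xk|Ci)
            score *= sum(1 for r in rows if r[k] == value) / count
        if score > best_score:                 # keep the class with the highest P(X|Ci) P(Ci)
            best_class, best_score = c, score
    return best_class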
Naïve Bayesian Classification
Example
Age      Income   Student  Credit_rating  Class: Buys_comp
<=30     HIGH     N        FAIR           N
<=30     HIGH     N        EXCELLENT      N
31..40   HIGH     N        FAIR           Y
>40      MEDIUM   N        FAIR           Y
>40      LOW      Y        FAIR           Y
>40      LOW      Y        EXCELLENT      N
31..40   LOW      Y        EXCELLENT      Y
<=30     MEDIUM   N        FAIR           N
<=30     LOW      Y        FAIR           Y
>40      MEDIUM   Y        FAIR           Y
<=30     MEDIUM   Y        EXCELLENT      Y
31..40   MEDIUM   N        EXCELLENT      Y
31..40   HIGH     Y        FAIR           Y
>40      MEDIUM   N        EXCELLENT      N
Naïve Bayesian Classification
Example
X = (age <= 30, income = MEDIUM, student = Y, credit_rating = FAIR); class = ?
We need to maximize
P(X|Ci) P(Ci) for i = 1, 2.
P(Ci) is computed from the training samples:
P(buys_comp=Y) = 9/14 = 0.643
P(buys_comp=N) = 5/14 = 0.357
How do we calculate P(X|Ci) P(Ci) for i = 1, 2?
P(X|Ci) = P(x1, x2, x3, x4|Ci) = Π P(xk|Ci)
Naïve Bayesian Classification
Example
P(age<=30 | buys_comp=Y)=2/9=0.222
P(age<=30 | buys_comp=N)=3/5=0.600
P(income=medium | buys_comp=Y)=4/9=0.444
P(income=medium | buys_comp=N)=2/5=0.400
P(student=Y | buys_comp=Y)=6/9=0.667
P(student=Y | buys_comp=N)=1/5=0.200
P(credit_rating=FAIR | buys_comp=Y)=6/9=0.667
P(credit_rating=FAIR | buys_comp=N)=2/5=0.400
Naïve Bayesian Classification
Example
P(X | buys_comp=Y) = 0.222 * 0.444 * 0.667 * 0.667 = 0.044
P(X | buys_comp=N) = 0.600 * 0.400 * 0.200 * 0.400 = 0.019
P(X | buys_comp=Y) P(buys_comp=Y) = 0.044 * 0.643 = 0.028
P(X | buys_comp=N) P(buys_comp=N) = 0.019 * 0.357 = 0.007
CONCLUSION: the Bayesian classifier predicts buys_comp=Y for sample X, i.e. X buys a computer.
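The arithmetic above can be checked with a few lines of Python; the sketch below recomputes P(X|Ci) P(Ci) for both classes directly from the training table (variable names are illustrative).

# Attribute order: (age, income, student, credit_rating, buys_comp)
data = [
    ("<=30", "HIGH", "N", "FAIR", "N"),       ("<=30", "HIGH", "N", "EXCELLENT", "N"),
    ("31..40", "HIGH", "N", "FAIR", "Y"),     (">40", "MEDIUM", "N", "FAIR", "Y"),
    (">40", "LOW", "Y", "FAIR", "Y"),         (">40", "LOW", "Y", "EXCELLENT", "N"),
    ("31..40", "LOW", "Y", "EXCELLENT", "Y"), ("<=30", "MEDIUM", "N", "FAIR", "N"),
    ("<=30", "LOW", "Y", "FAIR", "Y"),        (">40", "MEDIUM", "Y", "FAIR", "Y"),
    ("<=30", "MEDIUM", "Y", "EXCELLENT", "Y"),("31..40", "MEDIUM", "N", "EXCELLENT", "Y"),
    ("31..40", "HIGH", "Y", "FAIR", "Y"),     (">40", "MEDIUM", "N", "EXCELLENT", "N"),
]
x = ("<=30", "MEDIUM", "Y", "FAIR")  # the unseen sample X

for c in ("Y", "N"):
    rows = [r for r in data if r[4] == c]
    prior = len(rows) / len(data)                      # P(Ci): 9/14 = 0.643 or 5/14 = 0.357
    likelihood = 1.0
    for k, value in enumerate(x):                      # P(X|Ci) = product of P(xk|Ci)
        likelihood *= sum(1 for r in rows if r[k] == value) / len(rows)
    print(f"buys_comp={c}: {likelihood * prior:.3f}")  # Y -> 0.028, N -> 0.007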
Bayesian Belief Networks
▪ Naïve BC assumes class conditional independence
▪ This assumption simplifies computations
▪ When this assumption holds true, the Naïve BC is the most accurate in comparison with all other classifiers
▪ In real problems, however, dependencies do exist between variables
▪ Two methods to overcome this limitation of the NBC:
▪ Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
▪ Decision trees, which reason on one attribute at a time, considering the most important attributes first
Bayesian Belief Networks
Also known as
▪ Belief Networks
▪ Bayesian Networks
▪ Probabilistic Networks
A belief network has two components:
▪ Directed Acyclic Graph (DAG)
▪ Conditional Probability Table (CPT)
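To make the two components concrete, the sketch below encodes a tiny hypothetical network (Smoker -> LungCancer) in Python. The structure and every probability in it are assumptions made only for illustration; they do not come from the slides.

# 1. Directed Acyclic Graph: each node maps to its children
dag = {"Smoker": ["LungCancer"], "LungCancer": []}

# 2. Conditional Probability Table: each node's distribution, given its parents' values
cpt = {
    "Smoker": {(): {"yes": 0.30, "no": 0.70}},               # no parents, so this is a prior
    "LungCancer": {("yes",): {"yes": 0.10, "no": 0.90},      # P(LungCancer | Smoker = yes)
                   ("no",):  {"yes": 0.01, "no": 0.99}},     # P(LungCancer | Smoker = no)
}

# The joint probability factorises along the DAG:
# P(Smoker = yes, LungCancer = yes) = P(Smoker = yes) * P(LungCancer = yes | Smoker = yes)
p = cpt["Smoker"][()]["yes"] * cpt["LungCancer"][("yes",)]["yes"]
print(p)  # 0.03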
The k-Nearest Neighbor Algorithm
▪ All instances correspond to points in the n-D space.
▪ The nearest neighbors are defined in terms of Euclidean distance.
▪ The Euclidean distance between two points X = (x1, x2, …, xn) and Y = (y1, y2, …, yn) is d(X, Y) = √( Σ i=1..n (xi − yi)² )
▪ The target function could be discrete- or real-valued.
▪ For a discrete-valued target, k-NN returns the most common value among the k training examples nearest to xq.
▪ The k-NN algorithm for continuous-valued target functions
▪ calculates the mean value of the k nearest neighbors
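A minimal Python sketch of the algorithm just described, assuming numeric attributes; the function and parameter names are illustrative placeholders.

import math
from collections import Counter

def knn_predict(train, targets, xq, k=3, discrete=True):
    # Euclidean distance between two points
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # indices of the k training points nearest to the query point xq
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], xq))[:k]
    if discrete:
        # discrete-valued target: return the most common value among the k neighbors
        return Counter(targets[i] for i in nearest).most_common(1)[0][0]
    # continuous-valued target: return the mean value of the k nearest neighbors
    return sum(targets[i] for i in nearest) / k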
The k-Nearest Neighbor Algorithm
▪ Distance-weighted nearest neighbor algorithm
▪ Weight the contribution of each of the k neighbors according to their distance from the query point xq (giving greater weight to closer neighbors)
▪ Nearest neighbor classifiers are lazy learners: they store all of the training samples and do not build a classifier until a new sample needs to be classified
▪ Robust to noisy data by averaging over the k nearest neighbors
▪ Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
▪ To overcome it, eliminate the least relevant attributes.
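A small variant of the sketch above showing the distance-weighted vote; the 1/d² weight is one common choice, assumed here for illustration.

import math
from collections import defaultdict

def weighted_knn_predict(train, targets, xq, k=3):
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], xq))[:k]
    votes = defaultdict(float)
    for i in nearest:
        d = dist(train[i], xq)
        if d == 0:
            return targets[i]          # an exact match decides immediately
        votes[targets[i]] += 1.0 / d ** 2  # closer neighbors get a larger vote
    return max(votes, key=votes.get)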
Case-Based Reasoning
▪ Also uses: lazy evaluation + analysis of similar instances
▪ Difference: Instances are not “points in a Euclidean space”
▪ Methodology
▪ Instances represented by rich symbolic descriptions
(e.g., function graphs)
▪ Multiple retrieved cases may be combined
▪ Tight coupling between case retrieval, knowledge-based
reasoning, and problem solving
▪ Research issues
▪ Indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases
Classifier Accuracy
▪ How can it be measured?
▪ Holdout Method (Random Subsampling)
▪ k-fold Cross Validation (a short sketch follows this slide)
▪ Bootstrapping
▪ How can we improve classifier accuracy?
▪ Bagging
▪ Boosting
▪ Is accuracy enough to judge a classifier?
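Of the measurement techniques listed above, k-fold cross validation is the most widely used. The sketch below is a minimal illustration; train_fn and predict_fn are placeholders for any classifier's training and prediction routines, assumed here for generality.

import random

def k_fold_accuracy(samples, labels, train_fn, predict_fn, k=10, seed=0):
    # shuffle the indices once, then slice them into k roughly equal folds
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for f, test_idx in enumerate(folds):
        # train on the other k-1 folds, test on the held-out fold
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = train_fn([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    # the k fold accuracies are averaged into a single estimate
    return sum(accuracies) / k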
Classification of Class-Imbalanced Data Sets
• Class-imbalance problem: Rare positive example but numerous negative
ones, e.g., medical diagnosis, fraud, oil-spill, fault, etc.
• Traditional methods assume a balanced distribution of classes and
equal error costs: not suitable for class-imbalanced data
• Typical methods for imbalanced data in 2-class classification:
– Oversampling: re-sample data from the positive (rare) class (a short sketch follows below)
– Under-sampling: randomly eliminate tuples from the negative class
– Threshold-moving: move the decision threshold, t, so that rare-class tuples are easier to classify and hence there is less chance of false negative errors
– Ensemble techniques: combine multiple classifiers, such as those introduced above
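As a concrete illustration of the first re-sampling technique, the sketch below randomly duplicates rare-class tuples until the two classes are balanced; the function and label names are assumptions for illustration only.

import random

def random_oversample(samples, labels, positive_label, seed=0):
    rng = random.Random(seed)
    pos = [(x, y) for x, y in zip(samples, labels) if y == positive_label]
    neg = [(x, y) for x, y in zip(samples, labels) if y != positive_label]
    # re-sample (with replacement) from the rare positive class until it matches the negative class
    while len(pos) < len(neg):
        pos.append(rng.choice(pos))
    balanced = pos + neg
    rng.shuffle(balanced)
    xs, ys = zip(*balanced)
    return list(xs), list(ys)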
