Presentation On Naïve Bayesian Classification
http://ashrafsau.blogspot.in/
Ashraf Uddin
Sujit Singh
Chetanya Pratap Singh
South Asian University
(Master of Computer Application)
OUTLINE
Classification Example
Classification constructs a model from the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
Typical Applications
credit approval
target marketing
medical diagnosis
treatment effectiveness analysis
A TWO-STEP PROCESS
Model construction: describing a set of predetermined
classes
Each tuple/sample is assumed to belong to a predefined
class, as determined by the class label attribute
The set of tuples used for model construction: training set
The model is represented as classification rules, decision
trees, or mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of test sample is compared with the
classified result from the model
Accuracy rate is the percentage of test set samples that
are correctly classified by the model
Test set is independent of training set, otherwise over-fitting
will occur
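As an illustration of the two steps, here is a minimal Python sketch; scikit-learn and its bundled iris data are assumptions made for the example, not tools named in the slides.

# Step 1: model construction from a training set;
# Step 2: model usage, with accuracy estimated on an independent test set.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = GaussianNB().fit(X_train, y_train)   # construction
predictions = model.predict(X_test)          # usage on unseen data

# Accuracy rate: percentage of test samples classified correctly.
print(accuracy_score(y_test, predictions))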
INTRODUCTION TO BAYESIAN CLASSIFICATION
What is it?
Statistical method for classification.
Supervised Learning Method.
Assumes an underlying probabilistic model based on the Bayes
theorem.
Can solve problems involving both categorical and
continuous valued attributes.
Named after Thomas Bayes, who proposed the Bayes
Theorem.
THE BAYES THEOREM
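In the notation of this slide, the theorem states:

P(H|X) = P(X|H) · P(H) / P(X)

where H is the hypothesis (the customer will buy a computer) and X is the observed evidence (age, credit rating, income).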
P(H|X) : Probability that the customer will buy a computer given that
we know his age, credit rating and income. (Posterior Probability of
H)
P(H) : Probability that the customer will buy a computer regardless
of age, credit rating, income (Prior Probability of H)
P(X|H) : Probability that the customer is 35 yrs old, has a fair credit
rating and earns $40,000, given that he has bought our computer
(Posterior Probability of X conditioned on H)
P(X) : Probability that a person from our set of customers is 35 yrs
old, has a fair credit rating and earns $40,000 (Prior Probability of X)
BAYESIAN CLASSIFIER
NAÏVE BAYESIAN CLASSIFIER
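In its standard form, a naïve Bayesian classifier assigns a sample X = (x1, …, xn) to the class Ci with the highest posterior probability P(Ci|X); since P(X) is the same for every class, this is the Ci that maximizes

P(Ci) · P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)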
“ZERO” PROBLEM
What if there is a class, Ci, and X has an attribute
value, xk, such that none of the samples in Ci has
that attribute value?
Because each conditional probability is small, the estimated
probability of a high-dimensional attribute vector is small.
This can lead to numerical underflow.
LOG SUM-EXP TRICK
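A minimal Python sketch of the trick (the scores are hypothetical): probabilities are kept as logarithms, and the normalizing sum is computed via log Σi exp(ai) = m + log Σi exp(ai - m) with m = max ai, so no exponential underflows.

import math

def log_sum_exp(log_values):
    # log(sum(exp(v))) computed without underflow: factor out the max.
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Hypothetical per-class scores log P(Ci) + sum_k log P(xk|Ci);
# math.exp() of either one alone underflows to 0.0.
log_scores = [-750.1, -755.6]
log_z = log_sum_exp(log_scores)                  # log of P(X)
posteriors = [math.exp(s - log_z) for s in log_scores]
print(posteriors)                                # ~[0.9959, 0.0041]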
BASIC ASSUMPTION
The Naïve Bayes assumption is that all the features
are conditionally independent given the class label.
Even though this assumption is usually false in practice
(features are usually dependent), the classifier often performs well regardless.
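Formally, writing X = (x1, …, xn) for the attribute vector and C for the class label, the assumption is the factorization

P(x1, x2, …, xn | C) = P(x1|C) · P(x2|C) · … · P(xn|C)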
EXAMPLE: CLASS-LABELED TRAINING TUPLES
FROM THE CUSTOMER DATABASE
EXAMPLE…
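If the example follows the AllElectronics customer data of Han, Kamber & Pei (listed in the references), the computation for the unseen customer X = (age ≤ 30, income = medium, student = yes, credit_rating = fair) can be sketched in Python as:

# Priors from the 14-tuple training set: 9 tuples buy, 5 do not.
p_yes, p_no = 9 / 14, 5 / 14

# Conditional probabilities P(xk|Ci) estimated from the same table.
likelihood_yes = (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)   # P(X|yes) ~ 0.044
likelihood_no = (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)    # P(X|no)  ~ 0.019

# Compare P(X|Ci)P(Ci); P(X) is identical for both classes.
print(likelihood_yes * p_yes)   # ~0.028
print(likelihood_no * p_no)     # ~0.007  -> X is classified as "yes"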
USES OF NAÏVE BAYES CLASSIFICATION
Text Classification
Spam Filtering
Hybrid Recommender System
Recommender systems apply machine learning and
data mining techniques to filter unseen
information and can predict whether a user would like
a given resource
Online Application
Simple Emotion Modeling
TEXT CLASSIFICATION – AN APPLICATION OF NAÏVE BAYES CLASSIFIER
WHY TEXT CLASSIFICATION?
Classify web pages by topic
Information extraction
Internet filters
EXAMPLES OF TEXT CLASSIFICATION
CLASSES = BINARY
“spam” / “not spam”
CLASSES = TOPICS
“finance” / “sports” / “politics”
CLASSES = OPINION
“like” / “hate” / “neutral”
CLASSES = TOPICS
“AI” / “Theory” / “Graphics”
CLASSES = AUTHOR
“Shakespeare” / “Marlowe” / “Ben Jonson”
EXAMPLES OF TEXT CLASSIFICATION
Classify news stories as
world, business, SciTech, sports, health, etc.
Classify email as spam / not spam
Classify business names by industry
Classify email to tech stuff as Mac, Windows, etc.
Classify PDF files as research / other
Classify movie reviews as
favorable, unfavorable, neutral
Classify documents
Classify technical papers as
interesting, uninteresting
Classify jokes as funny, not funny
Classify web sites of companies by Standard
Industrial Classification (SIC)
NAÏVE BAYES APPROACH
Build the Vocabulary as the list of all distinct words that
appear in all the documents of the training set.
Remove stop words and markup
The words in the vocabulary become the
attributes, assuming that classification is independent of
the positions of the words
Each document in the training set becomes a record
with frequencies for each word in the Vocabulary.
Train the classifier on the training data set by
computing the prior probability of each class and the
conditional probabilities of the attributes.
Evaluate the results on Test data
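A minimal Python sketch of these steps; scikit-learn and the toy two-document corpus are assumptions made for illustration, not named in the slides.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["Chinese Beijing Chinese", "Tokyo Japan Chinese"]  # toy corpus
train_labels = ["yes", "no"]

# Build the vocabulary and turn each document into word frequencies;
# stop_words="english" would drop stop words for real text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Train: priors per class plus Laplace-smoothed word probabilities.
classifier = MultinomialNB(alpha=1.0).fit(X_train, train_labels)

# Evaluate on test data.
X_test = vectorizer.transform(["Chinese Chinese Tokyo"])
print(classifier.predict(X_test))   # -> ['yes']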
REPRESENTING TEXT: A LIST OF WORDS
Common Refinements: Remove Stop
Words, Symbols
TEXT CLASSIFICATION ALGORITHM:
NAÏVE BAYES
Tct – number of occurrences of a particular word t in the documents of a particular class c
Σt′ Tct′ – total number of words in the documents of class c
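With add-one (Laplace) smoothing, these counts give the standard multinomial estimate

P(t|c) = (Tct + 1) / (Σt′ Tct′ + |V|)

where |V| is the size of the vocabulary.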
The probability of the Yes class is greater than that of the No
class; hence this document is assigned to the Yes class.
Advantages and Disadvantages of Naïve Bayes
Advantages :
Easy to implement
Requires a small amount of training data to estimate
the parameters
Good results are obtained in most cases
Disadvantages:
Assumption: class conditional independence, therefore
loss of accuracy
Practically, dependencies exist among variables
E.g., in hospitals a patient has a profile (age, family history, etc.),
symptoms (fever, cough, etc.) and diseases (lung cancer, diabetes, etc.)
Dependencies among these cannot be modelled by a naïve
Bayesian classifier
An extension of Naive Bayes for delivering robust
classifications
•NBC computes a single posterior distribution.
•However, the most probable class might depend on the
chosen prior, especially on small data sets.
•Prior-dependent classifications might be weak.
Solution via set of probabilities:
•Robust Bayes Classifier (Ramoni and Sebastiani, 2001)
•Naive Credal Classifier (Zaffalon, 2001)
•Violation of Independence Assumption
•Zero conditional probability problem
VIOLATION OF INDEPENDENCE ASSUMPTION
Naïve Bayesian classifiers assume that the effect of an
attribute value on a given class is independent
of the values of the other attributes. This
assumption is called class conditional
independence. It is made to simplify the
computations involved and, in this sense, is
considered “naive.”
IMPROVEMENT
Bayesian belief networks, unlike naïve Bayesian
classifiers, allow the representation of
dependencies among subsets of attributes.
If an attribute value never occurs with a given class in
the training data, the frequency-based probability estimate will be zero.
The Laplacian correction avoids this: add
one to each count
EXAMPLE
Suppose that for the class buys_computer = yes in some training
database, D, containing 1,000 tuples,
we have 0 tuples with income = low,
990 tuples with income = medium, and
10 tuples with income = high.
The probabilities of these events, without the Laplacian correction, are
0, 0.990 (from 990/1,000), and 0.010 (from 10/1,000), respectively.
Using the Laplacian correction for the three quantities, we pretend that
we have one more tuple for each income value. In this way, we instead
obtain the following probabilities: 1/1,003 ≈ 0.001, 991/1,003 ≈ 0.988,
and 11/1,003 ≈ 0.011, respectively.
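A quick Python check of this arithmetic (a sketch, not part of the slides):

# Laplacian correction: one extra tuple per income value, so the
# denominator grows from 1000 to 1003.
counts = {"low": 0, "medium": 990, "high": 10}
total = sum(counts.values())   # 1000
corrected = {k: (v + 1) / (total + len(counts)) for k, v in counts.items()}
print(corrected)   # {'low': ~0.001, 'medium': ~0.988, 'high': ~0.011}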
Studies comparing classification algorithms have found the naïve
Bayesian classifier to be comparable in performance with decision tree
and selected neural network classifiers.
Although the independence assumption rarely holds, in
its simplest form it is often surprisingly effective.
It is widely used in areas such as text classification and spam filtering.
A large number of modifications have been introduced, by the
statistical, data mining, machine learning, and pattern recognition
communities, in an attempt to make it more flexible.
But one has to recognize that such modifications are
necessarily complications, which detract from its basic simplicity.
REFERENCES
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/mlbook/ch6.pdf
Data Mining: Concepts and Techniques, 3rd Edition, Han, Kamber & Pei. ISBN: 9780123814791