Lecture - Naive Bayesian

The document discusses Naive Bayesian classifiers and logistic regression models for classification problems. It explains that Naive Bayesian classifiers make a strong independence assumption between features to simplify calculations. Logistic regression models map classification probabilities to real numbers using the log-odds or logit function to allow linear regression. Both methods are commonly used for problems like spam filtering, medical diagnosis, and political affiliation prediction.


Classifiers

• Where in the catalog should I place this product listing?
• Is this email spam?
• Is this politician Democrat/Republican/Green?

• Classification: assign labels to objects.
• Usually supervised: training set of pre-classified examples.
• Our examples:
  • Naïve Bayesian
  • Decision Trees
  • (and Logistic Regression)



Naïve Bayesian Classifier
• Determine the most probable class label for each object
• Based on the observed object attributes
• Naïvely assumed to be conditionally independent of each other
• Example:
• Based on the object's attributes {shape, color, weight}
• A given object that is {spherical, yellow, < 60 grams}
  may be classified (labeled) as a tennis ball
• Class label probabilities are determined using Bayes’ Law
• Input variables are discrete
• Output:
• Probability score – proportional to the true probability
• Class label – based on the highest probability score



Naïve Bayesian Classifier - Use Cases
• Preferred method for many text classification problems.
• Try this first; if it doesn't work, try something more complicated
• Use cases
• Spam filtering, other text classification tasks
• Fraud detection



Building a Training Dataset to Predict Good or Bad Credit

• Predict the credit behavior of a credit card applicant from the applicant's attributes:
  • Personal status
  • Job type
  • Housing type
  • Savings amount
• These are all categorical variables and are better suited to a Naïve Bayesian Classifier than to logistic regression.



Apply the Naïve Assumption and Remove a Constant

• For observed attributes A = (a1, a2, …, am), we want to compute

  $$P(C_i \mid A) = \frac{P(a_1, a_2, \ldots, a_m \mid C_i)\, P(C_i)}{P(a_1, a_2, \ldots, a_m)}, \qquad i = 1, 2, \ldots, n$$

  and assign the class label Ci with the largest P(Ci|A)

• Two simplifications to the calculations
  • Apply the naïve assumption: each aj is conditionally independent of the others, so

    $$P(a_1, a_2, \ldots, a_m \mid C_i) = P(a_1 \mid C_i)\, P(a_2 \mid C_i) \cdots P(a_m \mid C_i) = \prod_{j=1}^{m} P(a_j \mid C_i)$$

  • The denominator P(a1, a2, …, am) is a constant and can be ignored


Building a Naïve Bayesian Classifier
• Applying the two simplifications:

  $$P(C_i \mid a_1, a_2, \ldots, a_m) \propto \left(\prod_{j=1}^{m} P(a_j \mid C_i)\right) P(C_i), \qquad i = 1, 2, \ldots, n$$

• To build a Naïve Bayesian Classifier, collect the following statistics from the training data:
  • P(Ci) for all the class labels
  • P(aj | Ci) for all possible aj and Ci
• Assign the class label Ci that maximizes the value of

  $$\left(\prod_{j=1}^{m} P(a_j \mid C_i)\right) P(C_i), \qquad i = 1, 2, \ldots, n$$

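As a concrete illustration of collecting the two statistics listed above, here is a minimal Python sketch (not code from the course) that estimates P(Ci) and P(aj | Ci) by counting over a labeled training set; the function and variable names are illustrative.

```python
# Minimal sketch: estimate the naive Bayes statistics from labeled examples.
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """rows: list of attribute tuples (a1, ..., am); labels: class label per row."""
    n = len(labels)
    label_counts = Counter(labels)            # counts of each class label Ci
    attr_counts = defaultdict(Counter)        # attr_counts[j][(aj, Ci)] = count
    for attrs, c in zip(rows, labels):
        for j, a in enumerate(attrs):
            attr_counts[j][(a, c)] += 1

    prior = {c: cnt / n for c, cnt in label_counts.items()}       # P(Ci)
    cond = {(j, a, c): cnt / label_counts[c]                       # P(aj | Ci)
            for j, counter in attr_counts.items()
            for (a, c), cnt in counter.items()}
    return prior, cond
```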


Naïve Bayesian Classifiers for the Credit Example
• Class labels: {good, bad}
• P(good) = 0.7
• P(bad) = 0.3
• Conditional Probabilities
• P(own|bad) = 0.62
• P(own|good) = 0.75
• P(rent|bad) = 0.23
• P(rent|good) = 0.14
• … and so on



Naïve Bayesian Classifier for a Particular Applicant

• Given applicant attributes
  A = {female single, owns home, self-employed, savings > $1000}

  aj              Ci     P(aj | Ci)
  female single   good   0.28
  female single   bad    0.36
  own             good   0.75
  own             bad    0.62
  self emp        good   0.14
  self emp        bad    0.17
  savings>1K      good   0.06
  savings>1K      bad    0.02

  P(good|A) ∝ (0.28 * 0.75 * 0.14 * 0.06) * 0.7 = 0.0012
  P(bad|A)  ∝ (0.36 * 0.62 * 0.17 * 0.02) * 0.3 = 0.0002

• Since P(good|A) > P(bad|A), assign the applicant the label "good" credit
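To make the arithmetic concrete, here is a minimal Python sketch that reproduces the scoring step above. The priors and conditional probabilities are the values from the slide; the function and variable names are illustrative.

```python
# Minimal sketch of the naive Bayes scoring step for the credit example.
priors = {"good": 0.7, "bad": 0.3}
cond_prob = {
    ("female single", "good"): 0.28, ("female single", "bad"): 0.36,
    ("own", "good"): 0.75,           ("own", "bad"): 0.62,
    ("self emp", "good"): 0.14,      ("self emp", "bad"): 0.17,
    ("savings>1K", "good"): 0.06,    ("savings>1K", "bad"): 0.02,
}

applicant = ["female single", "own", "self emp", "savings>1K"]

def score(label):
    """Unnormalized P(label | A): product of P(aj | label) times P(label)."""
    s = priors[label]
    for attr in applicant:
        s *= cond_prob[(attr, label)]
    return s

scores = {label: score(label) for label in priors}
print(scores)                       # good ~ 0.0012, bad ~ 0.0002
print(max(scores, key=scores.get))  # 'good'
```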



Logistic Regression Model
The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values.

Some examples of classification problems:

• Email: Spam / Not spam
• Tumor: Malignant / Benign
Binary Logistic Regression

• We have a set of feature vectors X with corresponding binary outputs:

  $$X = \{x_1, x_2, \ldots, x_n\}^T, \qquad Y = \{y_1, y_2, \ldots, y_n\}^T, \qquad \text{where } y_i \in \{0, 1\}$$

• We want to model p(y|x).
• By definition, $p(y_i = 1 \mid x_i, \theta)$ lies in the range [0, 1], but the linear combination $x_i\theta = \sum_j \theta_j x_{ij}$ can take any real value, so we want to transform the probability to remove the range restrictions.
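The setup above can be made concrete with a short sketch. Assuming scikit-learn is available, the toy feature vectors and labels below are made up purely for illustration.

```python
# Minimal sketch of fitting a binary logistic regression model p(y | x).
# The feature vectors X and binary outputs y are made-up toy data.
from sklearn.linear_model import LogisticRegression

X = [[0.5, 1.2], [1.1, 0.3], [2.3, 2.0], [3.1, 2.8]]   # feature vectors x_i
y = [0, 0, 1, 1]                                        # binary outputs y_i

model = LogisticRegression()
model.fit(X, y)
print(model.predict_proba([[1.5, 1.5]]))  # estimated [P(y=0|x), P(y=1|x)]
```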
Odds

• p : probability of an event occurring
• 1 – p : probability of the event not occurring
• The odds for event i are then defined as

  $$\text{odds}_i = \frac{p_i}{1 - p_i}$$

• Taking the log of the odds removes the range restrictions. This way we map the probabilities from the [0, 1] range to the entire real number line.
• Setting the log-odds equal to the linear predictor and solving for pi:

  $$\log\left(\frac{p_i}{1 - p_i}\right) = x_i\theta$$

  $$\frac{p_i}{1 - p_i} = e^{x_i\theta}$$

  $$p_i = \frac{e^{x_i\theta}}{1 + e^{x_i\theta}} = \frac{1}{1 + e^{-x_i\theta}}$$

• Standard logistic (sigmoid) function:

  $$p_i = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
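As a small numerical illustration of the log-odds transform, the sketch below maps a linear score x·θ to a probability with the standard logistic sigmoid and recovers the log-odds from it. The coefficients and feature values are made up for the example.

```python
import math

def sigmoid(z):
    """Standard logistic function: maps any real value z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: maps a probability in (0, 1) back to the real line."""
    return math.log(p / (1.0 - p))

# Illustrative coefficients theta and one feature vector x (made-up numbers).
theta = [-1.5, 0.8, 2.0]          # intercept weight + two feature weights
x = [1.0, 0.5, 1.2]               # 1.0 is the intercept term

z = sum(t * xi for t, xi in zip(theta, x))   # linear score x . theta
p = sigmoid(z)                               # P(y = 1 | x, theta)
print(round(z, 3), round(p, 3), round(log_odds(p), 3))  # log_odds(p) recovers z
```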
