Lecture - Naive Bayesian

The document discusses Naive Bayesian classifiers and logistic regression models for classification problems. It explains that Naive Bayesian classifiers make a strong independence assumption between features to simplify calculations. Logistic regression models map classification probabilities to real numbers using the log-odds or logit function to allow linear regression. Both methods are commonly used for problems like spam filtering, medical diagnosis, and political affiliation prediction.


Classifiers

• Where in the catalog should I place this product listing?
• Is this email spam?
• Is this politician Democrat/Republican/Green?

• Classification: assign labels to objects.
• Usually supervised: training set of pre-classified examples.
• Our examples:
  • Naïve Bayesian
  • Decision Trees
  • (and Logistic Regression)



Naïve Bayesian Classifier
• Determine the most probable class label for each object
• Based on the observed object attributes
• Naïvely assumed to be conditionally independent of each other
• Example:
• Based on the object's attributes {shape, color, weight}
• A given object that is {spherical, yellow, < 60 grams}
  may be classified (labeled) as a tennis ball
• Class label probabilities are determined using Bayes’ Law
• Input variables are discrete
• Output:
• Probability score – proportional to the true probability
• Class label – based on the highest probability score



Naïve Bayesian Classifier - Use Cases
• Preferred method for many text classification problems.
• Try this first; if it doesn't work, try something more complicated
• Use cases
• Spam filtering, other text classification tasks
• Fraud detection



Building a Training Dataset to Predict Good or Bad Credit

• Predict the credit behavior of a credit card applicant from the applicant's attributes:
  • Personal status
  • Job type
  • Housing type
  • Savings amount
• These are all categorical variables and are better suited to a Naïve Bayesian Classifier than to logistic regression.



Apply the Naïve Assumption and Remove a Constant

• For observed attributes A = (a1, a2, …, am), we want to compute

  $$P(C_i \mid A) = \frac{P(a_1, a_2, \ldots, a_m \mid C_i)\, P(C_i)}{P(a_1, a_2, \ldots, a_m)}, \qquad i = 1, 2, \ldots, n$$

  and assign the class label Ci with the largest P(Ci|A)

• Two simplifications to the calculations
  • Apply the naïve assumption: each aj is conditionally independent of the others, so

    $$P(a_1, a_2, \ldots, a_m \mid C_i) = P(a_1 \mid C_i)\, P(a_2 \mid C_i) \cdots P(a_m \mid C_i) = \prod_{j=1}^{m} P(a_j \mid C_i)$$

  • The denominator P(a1, a2, …, am) is a constant and can be ignored


Building a Naïve Bayesian Classifier
• Applying the two simplifications:

  $$P(C_i \mid a_1, a_2, \ldots, a_m) \propto \left(\prod_{j=1}^{m} P(a_j \mid C_i)\right) P(C_i), \qquad i = 1, 2, \ldots, n$$

• To build a Naïve Bayesian Classifier, collect the following statistics from the training data:
  • P(Ci) for all the class labels
  • P(aj | Ci) for all possible aj and Ci
• Assign the class label Ci that maximizes the value of

  $$\left(\prod_{j=1}^{m} P(a_j \mid C_i)\right) P(C_i), \qquad i = 1, 2, \ldots, n$$

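As a concrete illustration of collecting the two statistics listed above, here is a minimal Python sketch (not code from the course) that estimates P(Ci) and P(aj | Ci) by counting over a labeled training set; the function and variable names are illustrative.

```python
# Minimal sketch: estimate the naive Bayes statistics from labeled examples.
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """rows: list of attribute tuples (a1, ..., am); labels: class label per row."""
    n = len(labels)
    label_counts = Counter(labels)            # counts of each class label Ci
    attr_counts = defaultdict(Counter)        # attr_counts[j][(aj, Ci)] = count
    for attrs, c in zip(rows, labels):
        for j, a in enumerate(attrs):
            attr_counts[j][(a, c)] += 1

    prior = {c: cnt / n for c, cnt in label_counts.items()}       # P(Ci)
    cond = {(j, a, c): cnt / label_counts[c]                       # P(aj | Ci)
            for j, counter in attr_counts.items()
            for (a, c), cnt in counter.items()}
    return prior, cond
```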


Naïve Bayesian Classifiers for the Credit Example
• Class labels: {good, bad}
• P(good) = 0.7
• P(bad) = 0.3
• Conditional Probabilities
• P(own|bad) = 0.62
• P(own|good) = 0.75
• P(rent|bad) = 0.23
• P(rent|good) = 0.14
• … and so on



Naïve Bayesian Classifier for a Particular Applicant

• Given applicant attributes
  A = {female single, owns home, self-employed, savings > $1000}

  aj              Ci     P(aj | Ci)
  female single   good   0.28
  female single   bad    0.36
  own             good   0.75
  own             bad    0.62
  self emp        good   0.14
  self emp        bad    0.17
  savings>1K      good   0.06
  savings>1K      bad    0.02

  P(good|A) ∝ (0.28 * 0.75 * 0.14 * 0.06) * 0.7 = 0.0012
  P(bad|A)  ∝ (0.36 * 0.62 * 0.17 * 0.02) * 0.3 = 0.0002

• Since P(good|A) > P(bad|A), assign the applicant the label "good" credit
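To make the arithmetic concrete, here is a minimal Python sketch that reproduces the scoring step above. The priors and conditional probabilities are the values from the slide; the function and variable names are illustrative.

```python
# Minimal sketch of the naive Bayes scoring step for the credit example.
priors = {"good": 0.7, "bad": 0.3}
cond_prob = {
    ("female single", "good"): 0.28, ("female single", "bad"): 0.36,
    ("own", "good"): 0.75,           ("own", "bad"): 0.62,
    ("self emp", "good"): 0.14,      ("self emp", "bad"): 0.17,
    ("savings>1K", "good"): 0.06,    ("savings>1K", "bad"): 0.02,
}

applicant = ["female single", "own", "self emp", "savings>1K"]

def score(label):
    """Unnormalized P(label | A): product of P(aj | label) times P(label)."""
    s = priors[label]
    for attr in applicant:
        s *= cond_prob[(attr, label)]
    return s

scores = {label: score(label) for label in priors}
print(scores)                       # good ~ 0.0012, bad ~ 0.0002
print(max(scores, key=scores.get))  # 'good'
```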



Logistic Regression Model
The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values.

Some examples of classification problems:

• Email: Spam / Not spam
• Tumor: Malignant / Benign
Binary Logistic Regression

• We have a set of feature vectors X with corresponding binary outputs:

  $$X = \{x_1, x_2, \ldots, x_n\}^T, \qquad Y = \{y_1, y_2, \ldots, y_n\}^T, \qquad \text{where } y_i \in \{0, 1\}$$

• We want to model p(y|x).
• By definition, $p(y_i = 1 \mid x_i, \theta)$ lies in the range [0, 1], but the linear combination $x_i\theta = \sum_j \theta_j x_{ij}$ can take any real value, so we want to transform the probability to remove the range restrictions.
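The setup above can be made concrete with a short sketch. Assuming scikit-learn is available, the toy feature vectors and labels below are made up purely for illustration.

```python
# Minimal sketch of fitting a binary logistic regression model p(y | x).
# The feature vectors X and binary outputs y are made-up toy data.
from sklearn.linear_model import LogisticRegression

X = [[0.5, 1.2], [1.1, 0.3], [2.3, 2.0], [3.1, 2.8]]   # feature vectors x_i
y = [0, 0, 1, 1]                                        # binary outputs y_i

model = LogisticRegression()
model.fit(X, y)
print(model.predict_proba([[1.5, 1.5]]))  # estimated [P(y=0|x), P(y=1|x)]
```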
Odds

• p : probability of an event occurring
• 1 – p : probability of the event not occurring
• The odds for event i are then defined as

  $$\text{odds}_i = \frac{p_i}{1 - p_i}$$

• Taking the log of the odds removes the range restrictions. This way we map the probabilities from the [0, 1] range to the entire real number line.
• Setting the log-odds equal to the linear predictor and solving for pi:

  $$\log\left(\frac{p_i}{1 - p_i}\right) = x_i\theta$$

  $$\frac{p_i}{1 - p_i} = e^{x_i\theta}$$

  $$p_i = \frac{e^{x_i\theta}}{1 + e^{x_i\theta}} = \frac{1}{1 + e^{-x_i\theta}}$$

• Standard logistic (sigmoid) function:

  $$p_i = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
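As a small numerical illustration of the log-odds transform, the sketch below maps a linear score x·θ to a probability with the standard logistic sigmoid and recovers the log-odds from it. The coefficients and feature values are made up for the example.

```python
import math

def sigmoid(z):
    """Standard logistic function: maps any real value z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: maps a probability in (0, 1) back to the real line."""
    return math.log(p / (1.0 - p))

# Illustrative coefficients theta and one feature vector x (made-up numbers).
theta = [-1.5, 0.8, 2.0]          # intercept weight + two feature weights
x = [1.0, 0.5, 1.2]               # 1.0 is the intercept term

z = sum(t * xi for t, xi in zip(theta, x))   # linear score x . theta
p = sigmoid(z)                               # P(y = 1 | x, theta)
print(round(z, 3), round(p, 3), round(log_odds(p), 3))  # log_odds(p) recovers z
```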
