Naive Bayes Classifier: Coin Toss and Fair Dice Example
1. Introduction
Naive Bayes is a probabilistic machine learning algorithm that can be
used in a wide variety of classification tasks. Typical applications include
filtering spam, classifying documents, and sentiment prediction. It is based
on the work of Rev. Thomas Bayes (1702-61), hence the name.
The name 'naive' is used because the algorithm assumes that the features
that go into the model are independent of each other. That is, changing the
value of one feature does not directly influence or change the value of any
of the other features used in the algorithm.
But before you go into Naive Bayes, you need to understand what
'Conditional Probability' is and what the 'Bayes Rule' is.
2. What is Conditional Probability?
When you flip a fair coin, there is an equal chance of getting either heads
or tails. So you can say the probability of getting heads is 50%.
Similarly, what would be the probability of getting a 1 when you roll a dice
with 6 faces? Assuming the dice is fair, the probability is 1/6 = 0.166.
These are plain, unconditional probabilities. Conditional probability is
different: when you say the conditional probability of A given B, it denotes
the probability of A occurring given that B has already occurred.
School Example
Consider a school with a total population of 100 persons. These 100
persons can be seen either as ‘Students’ and ‘Teachers’ or as a population
of ‘Males’ and ‘Females’.
Given a tabulation of the 100 people by role and gender, what is the
conditional probability that a certain member of the school is a 'Teacher'
given that he is a 'Man'?
This can be computed as the probability of the intersection of Teacher (A)
and Male (B) divided by the probability of Male (B). Likewise, the conditional
probability of B given A can be computed. The Bayes Rule that we use for
Naive Bayes can be derived from these two notations.
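In symbols, with A = Teacher and B = Male, the two conditional probabilities are:
P(A|B) = P(A ∩ B) / P(B)
P(B|A) = P(A ∩ B) / P(A)
Writing P(A ∩ B) from either expression and equating the two is what gives the Bayes Rule in the next section.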
3. The Bayes Rule
The Bayes Rule is a way of going from P(X|Y), known from the training
dataset, to find P(Y|X).
To do this, we replace A and B in the above formula with the feature X and
response Y.
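This gives the Bayes Rule in the form used for Naive Bayes:
P(Y|X) = P(X|Y) * P(Y) / P(X)
Here the likelihood P(X|Y) and the prior P(Y) can be estimated from the training data, while P(X) is the probability of the evidence.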
For observations in the test or scoring data, the X would be known while Y is
unknown. And for each row of the test dataset, you want to compute the
probability of Y given that X has already happened.
What happens if Y has more than 2 categories? We compute the
probability of each class of Y and let the highest win.
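With several features X1, ..., Xn, the naive independence assumption lets the likelihood factor into a product, so the score for a class k of Y is:
P(Y=k | X1, ..., Xn) = P(X1|Y=k) * P(X2|Y=k) * ... * P(Xn|Y=k) * P(Y=k) / P(X1, ..., Xn)
The denominator is the same for every class, so it is enough to compare the numerators and pick the class with the largest value. This is exactly the computation carried out by hand in the Overcast/Mild example later in this section.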
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import data
training = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/iris_train.csv')
test = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/iris_test.csv')

# Separate features (X) and target (y); 'Species' is assumed here to be the name of the target column
xtrain = training.drop('Species', axis=1)
ytrain = training['Species']
xtest = test.drop('Species', axis=1)
ytest = test['Species']

# Create and train a Gaussian Naive Bayes classifier
model = GaussianNB()
model.fit(xtrain, ytrain)

# Predict Output
pred = model.predict(xtest)

# Evaluate the predictions with a confusion matrix
print(confusion_matrix(ytest, pred))
Advantages
It is easy and fast to predict the class of the test data set. It
also performs well in multi-class prediction.
When the assumption of independence holds, a Naive Bayes
classifier performs better compared to other models
like logistic regression, and you need less training data.
Disadvantages
If a categorical variable has a category in the test data set
that was not observed in the training data set, the model will
assign it a zero probability and will be unable to make a
prediction. This is often known as the 'Zero Frequency' problem.
To solve it, we can use a smoothing technique; one of the
simplest is Laplace estimation, sketched below.
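As a minimal illustration of add-one (Laplace) smoothing for a categorical likelihood (the function name and numbers below are made up for this example), note that scikit-learn exposes the same idea through the alpha parameter of estimators such as MultinomialNB and CategoricalNB, where alpha=1.0 corresponds to Laplace smoothing:
# A sketch of Laplace (add-one) smoothing for one categorical likelihood.
# category_count: how often this category occurs within one class.
# class_count: how many training rows belong to that class.
# n_categories: how many distinct categories the feature can take.
def laplace_smoothed_likelihood(category_count, class_count, n_categories, alpha=1.0):
    # Without smoothing, an unseen category gives 0 / class_count = 0.
    # Adding alpha to every category keeps the probability strictly positive.
    return (category_count + alpha) / (class_count + alpha * n_categories)

# Example: a category never seen with this class (count = 0), 5 rows in the
# class, 3 possible categories -> (0 + 1) / (5 + 3) = 0.125 instead of 0.
print(laplace_smoothed_likelihood(0, 5, 3))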
Applications
Real time Prediction: Naive Bayes is an eager learning
classifier and it is very fast, so it can be used for making
predictions in real time.
Multi class Prediction: This algorithm is also well
known for its multi-class prediction capability; it can predict
the probability of each of multiple classes of the target variable.
When to use
Text Classification
Dataset
Iris dataset
Wine dataset
Adult dataset
7. Practice Exercise: Predict Human Activity Recognition (HAR)
The objective of this practice exercise is to predict the current human activity
based on physiological activity measurements across 53 different features
in the HAR dataset. The training and test datasets are provided.
Build a Naive Bayes model, predict on the test dataset and compute the
confusion matrix.
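A possible starting skeleton is sketched below; the file names 'har_train.csv' / 'har_test.csv' and the target column name 'activity' are placeholders, not the actual schema of the provided datasets.
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Placeholder file names -- substitute the paths of the provided HAR datasets.
train = pd.read_csv('har_train.csv')
test = pd.read_csv('har_test.csv')

# 'activity' is an assumed name for the target column; adjust it to the real one.
xtrain, ytrain = train.drop('activity', axis=1), train['activity']
xtest, ytest = test.drop('activity', axis=1), test['activity']

model = GaussianNB().fit(xtrain, ytrain)
pred = model.predict(xtest)
print(confusion_matrix(ytest, pred))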
Now suppose you want to calculate the probability of playing when the weather is overcast and the temperature
is mild.

Probability of playing:
P(Play=Yes | Weather=Overcast, Temp=Mild)
1. Calculate the prior probability: P(Yes) = 9/14 = 0.64
2. Calculate the likelihood probabilities: P(Overcast|Yes) = 4/9 = 0.44, P(Mild|Yes) = 4/9 = 0.44
3. Multiply the likelihoods: P(Weather=Overcast, Temp=Mild | Play=Yes) = 0.44 * 0.44 = 0.1936
4. Calculate the probabilities of the evidence: P(Overcast) = 4/14 = 0.2857, P(Mild) = 6/14 = 0.4285
5. Multiply them: P(Overcast) * P(Mild) = 0.2857 * 0.4285 = 0.122
6. P(Play=Yes | Weather=Overcast, Temp=Mild) = (0.1936 * 0.64) / 0.122 ≈ 1

Probability of not playing:
P(Play=No | Weather=Overcast, Temp=Mild)
1. Calculate the prior probability: P(No) = 5/14 = 0.36
2. Calculate the likelihood probabilities: P(Overcast|No) = 0/5 = 0, P(Mild|No) = 2/5 = 0.4
3. Multiply the likelihoods: P(Weather=Overcast, Temp=Mild | Play=No) = 0 * 0.4 = 0
4. Calculate the probabilities of the evidence: P(Overcast) = 4/14 = 0.2857, P(Mild) = 6/14 = 0.4285
5. Multiply them: P(Overcast) * P(Mild) = 0.2857 * 0.4285 = 0.122
6. P(Play=No | Weather=Overcast, Temp=Mild) = (0 * 0.36) / 0.122 = 0

The probability of the 'Yes' class is higher, so you can say that if the weather is overcast and the temperature is mild, the players will play the sport.
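As a quick check on the arithmetic, the same posterior can be recomputed directly from the 14 training rows (the same lists used in the code that follows); this is only a sketch of the hand calculation, not a library call:
# Recompute P(Play=Yes | Overcast, Mild) from raw counts.
weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
           'Rainy','Sunny','Overcast','Overcast','Rainy']
temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

n = len(play)
p_yes = play.count('Yes') / n                                             # prior, 9/14
p_overcast_yes = sum(w == 'Overcast' and p == 'Yes'
                     for w, p in zip(weather, play)) / play.count('Yes')  # 4/9
p_mild_yes = sum(t == 'Mild' and p == 'Yes'
                 for t, p in zip(temp, play)) / play.count('Yes')         # 4/9
evidence = (weather.count('Overcast') / n) * (temp.count('Mild') / n)     # (4/14) * (6/14)

posterior_yes = p_overcast_yes * p_mild_yes * p_yes / evidence
print(posterior_yes)  # roughly 1; not exactly 1 because independence is only an approximation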
# Assigning features and label variables
weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
           'Rainy','Sunny','Overcast','Overcast','Rainy']
temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
# Import LabelEncoder
from sklearn import preprocessing
# Creating labelEncoder
le = preprocessing.LabelEncoder()
# Converting string labels into numbers
weather_encoded = le.fit_transform(weather)
print("Weather:", weather_encoded)
# Converting string labels into numbers
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print("Temp:", temp_encoded)
print("Play:", label)
# Combining weather and temp into a single list of tuples
features = list(zip(weather_encoded, temp_encoded))
print(features)
# Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
# Create a Gaussian Classifier
model = GaussianNB()
# Train the model using the training sets
model.fit(features, label)
# Predict Output
predicted = model.predict([[0, 2]])  # 0: Overcast, 2: Mild
print("Predicted Value:", predicted)
Output
Weather: [2 2 0 1 1 1 0 2 2 1 2 0 0 1]
Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]
[(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]
Predicted Value: [1]
1. Naive Bayes classifier assumes that the features are
independent of each other.
2. A Naive Bayes classifier can be trained faster than most
other classification algorithms.
3. A Naive Bayes classifier can also predict faster than most
other classification algorithms.
4. A Naive Bayes classifier can be updated with new
training data without having to rebuild the model from
scratch (see the sketch after this list).
5. Training a Naive Bayes classifier does not involve
optimization of a cost function.
6. Training a Naive Bayes classifier does not involve epochs.
7. Training a Naive Bayes classifier does not involve solving
a matrix equation.
8. When the assumption of feature independence holds,
a Naive Bayes classifier performs better than many other
classifiers.
9. When the assumption of feature independence holds,
a Naive Bayes classifier needs less training data.
10. A Naive Bayes classifier performs well with categorical
input variables compared to numerical input variables.
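For point 4, scikit-learn's Naive Bayes estimators support this kind of incremental update through their partial_fit method; a minimal sketch with made-up toy data:
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data purely for illustration.
X1, y1 = np.array([[1.0], [2.0], [10.0], [11.0]]), np.array([0, 0, 1, 1])
X2, y2 = np.array([[1.5], [10.5]]), np.array([0, 1])

model = GaussianNB()
# The first call must list every class that will ever be seen.
model.partial_fit(X1, y1, classes=np.array([0, 1]))
# Later batches update the existing model without retraining from scratch.
model.partial_fit(X2, y2)
print(model.predict(np.array([[1.2], [9.8]])))  # expected: [0 1]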