Naive Bayes (Numerical Example)

The Naive Bayes algorithm is a supervised machine learning algorithm based on Bayes' Theorem, used mainly for classification problems.

The Naive Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.

It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to a class. Some popular applications of the Naive Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

Bayes Theorem:

Bayes' theorem, also known as Bayes' Rule or Bayes' Law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.

P(A|B) = (P(B|A) * P(A)) / P(B)

where:
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.

P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.

P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.

P(B) is the Marginal probability: the probability of the evidence.

P(A|B) = (P(B|A) * P(A)) / P(B) = P(A ∩ B) / P(B)

Similarly, P(B|A) = (P(A|B) * P(B)) / P(A) = P(B ∩ A) / P(A)

Since P(A ∩ B) = P(B ∩ A), it follows that P(B|A) * P(A) = P(A|B) * P(B).
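As a quick sanity check of the identity above, the following minimal sketch (plain Python, with illustrative probabilities chosen arbitrarily) computes a posterior from an assumed likelihood, prior, and evidence probability:

# Minimal sketch of Bayes' theorem, with made-up probabilities for illustration
def bayes_posterior(p_b_given_a, p_a, p_b):
    # P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

p_b_given_a = 0.8   # assumed likelihood P(B|A)
p_a = 0.3           # assumed prior P(A)
p_b = 0.5           # assumed evidence P(B)
p_a_given_b = bayes_posterior(p_b_given_a, p_a, p_b)
print(p_a_given_b)  # 0.48
# Symmetry check: P(B|A) * P(A) == P(A|B) * P(B)
print(abs(p_b_given_a * p_a - p_a_given_b * p_b) < 1e-12)  # True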

Types Of Naive Bayes:


There are three types of Naive Bayes models under the scikit-learn library:

• Gaussian: It is used for classification and assumes that the features follow a normal distribution.

• Multinomial: It is used for discrete counts. For example, in a text classification problem, it goes one step further than Bernoulli trials: instead of "word occurs in the document", we count how often the word occurs in the document. You can think of it as "the number of times outcome x_i is observed over the n trials".

• Bernoulli: The Bernoulli model is useful if your feature vectors are binary (i.e. zeros and ones). One application is text classification with a 'bag of words' model, where the 1s and 0s mean "word occurs in the document" and "word does not occur in the document" respectively. A small sketch comparing the three variants follows this list.
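As a rough illustration (not part of the original article), the sketch below fits each scikit-learn variant on a tiny made-up dataset; the feature matrices are invented purely to show which input type suits each model.

# Illustrative sketch: the three scikit-learn Naive Bayes variants on tiny, made-up data
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Continuous features -> GaussianNB
X_cont = np.array([[1.2, 3.4], [0.9, 2.8], [4.5, 7.1], [5.0, 6.6]])
print(GaussianNB().fit(X_cont, y).predict([[4.8, 7.0]]))

# Word-count features -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 2, 2]]))

# Binary (word present / absent) features -> BernoulliNB
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]]))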

Numerical Example:

Consider the following training data of 1200 fruits, described by three features (Yellow, Sweet, Long). Given a new fruit X that is {Yellow, Sweet, Long}, which class (Mango, Banana, or Others) does Naive Bayes assign?

Fruit     Yellow   Sweet   Long   Total
Mango       350     450      0     650
Banana      400     300    350     400
Others       50     100     50     150
Total       800     850    400    1200

Solution:

Each conditional probability is obtained with Bayes' theorem: P(A|B) = (P(B|A) * P(A)) / P(B)

1. Mango:

P(X | Mango) = P(Yellow | Mango) * P(Sweet | Mango) * P(Long | Mango)

1.a) P(Yellow | Mango) = (P(Mango | Yellow) * P(Yellow)) / P(Mango)

= ((350/800) * (800/1200)) / (650/1200)

P(Yellow | Mango) = 0.53 → (1)

1.b) P(Sweet | Mango) = (P(Mango | Sweet) * P(Sweet)) / P(Mango)

= ((450/850) * (850/1200)) / (650/1200)

P(Sweet | Mango) = 0.69 → (2)

1.c) P(Long | Mango) = (P(Mango | Long) * P(Long)) / P(Mango)

= ((0/400) * (400/1200)) / (650/1200)

P(Long | Mango) = 0 → (3)

Multiplying (1), (2) and (3): P(X | Mango) = 0.53 * 0.69 * 0

P(X | Mango) = 0
2. Banana:

P(X | Banana) = P(Yellow | Banana) * P(Sweet | Banana) * P(Long | Banana)

2.a) P(Yellow | Banana) = (P(Banana | Yellow) * P(Yellow)) / P(Banana)

= ((400/800) * (800/1200)) / (400/1200)

P(Yellow | Banana) = 1 → (4)

2.b) P(Sweet | Banana) = (P(Banana | Sweet) * P(Sweet)) / P(Banana)

= ((300/850) * (850/1200)) / (400/1200)

P(Sweet | Banana) = 0.75 → (5)

2.c) P(Long | Banana) = (P(Banana | Long) * P(Long)) / P(Banana)

= ((350/400) * (400/1200)) / (400/1200)

P(Long | Banana) = 0.875 → (6)

Multiplying (4), (5) and (6): P(X | Banana) = 1 * 0.75 * 0.875

P(X | Banana) = 0.65


3. Others:

P(X | Others) = P(Yellow | Others) * P(Sweet | Others) * P(Long | Others)

3.a) P(Yellow | Others) = (P(Others | Yellow) * P(Yellow)) / P(Others)

= ((50/800) * (800/1200)) / (150/1200)

P(Yellow | Others) = 0.33 → (7)

3.b) P(Sweet | Others) = (P(Others | Sweet) * P(Sweet)) / P(Others)

= ((100/850) * (850/1200)) / (150/1200)

P(Sweet | Others) = 0.67 → (8)

3.c) P(Long | Others) = (P(Others | Long) * P(Long)) / P(Others)

= ((50/400) * (400/1200)) / (150/1200)

P(Long | Others) = 0.33 → (9)

Multiplying (7), (8) and (9): P(X | Others) = 0.33 * 0.67 * 0.33

P(X | Others) ≈ 0.072


Finally, since P(X | Mango) = 0, P(X | Banana) = 0.65 and P(X | Others) = 0.072, the largest value is P(X | Banana).

We can conclude that the fruit {Yellow, Sweet, Long} is a Banana.
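The short sketch below (not part of the original article) reproduces the table-based calculation in plain Python, estimating each P(feature | class) directly from the counts in the table above:

# Reproduce the fruit example using the counts from the table above
counts = {
    #           Yellow  Sweet  Long  Total
    'Mango':   (350,    450,   0,    650),
    'Banana':  (400,    300,   350,  400),
    'Others':  (50,     100,   50,   150),
}

scores = {}
for fruit, (yellow, sweet, long_, total) in counts.items():
    # P(feature | class) = count(feature, class) / count(class)
    scores[fruit] = (yellow / total) * (sweet / total) * (long_ / total)

print(scores)                       # {'Mango': 0.0, 'Banana': 0.65625, 'Others': 0.0740...}
print(max(scores, key=scores.get))  # Banana

With exact fractions the Others score comes out to about 0.074; the 0.072 in the worked solution comes from rounding each factor to two decimals before multiplying.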

Code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Naive Bayes model on the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predicting a new result (Age = 30, Estimated Salary = 87000)
print(classifier.predict(sc.transform([[30, 87000]])))

# Predicting the Test set results
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

# Making the Confusion Matrix ('as' is a Python keyword, so use a different variable name)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_train), y_train
# A coarser salary step keeps the prediction grid small enough to evaluate quickly
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 250))
plt.contourf(X1, X2,
             classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 250))
plt.contourf(X1, X2,
             classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

Pros:

• It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.

• When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models like logistic regression, and it needs less training data.

• It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

Cons:

• If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the "zero frequency" problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace smoothing (see the sketch after this list).

• On the other side, Naive Bayes is also known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.

• Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
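As a rough sketch of how smoothing avoids the zero-frequency problem (the counts below are made up), scikit-learn's MultinomialNB exposes Laplace/Lidstone smoothing through its alpha parameter:

# Illustrative sketch: Laplace smoothing via the alpha parameter of MultinomialNB
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up word-count data; the third word never appears in class 0
X_train = np.array([[2, 1, 0], [3, 0, 0], [0, 2, 4], [1, 1, 3]])
y_train = np.array([0, 0, 1, 1])

# alpha=1.0 is Laplace (add-one) smoothing, so unseen words do not zero out a class
clf = MultinomialNB(alpha=1.0)
clf.fit(X_train, y_train)
print(clf.predict_proba([[2, 1, 1]]))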
