Dev ML Ex5
CS3EL15 (P): Machine Learning Laboratory, Experiment No. 5
Experiment: Execute the Naïve Bayes algorithm with a suitable dataset and do a proper analysis of the result. Also implement the Naïve Bayes algorithm using Python.
1. Objective:
a. Execute the Naïve Bayes algorithm with a suitable dataset and do a proper analysis of the result.
b. Implement the Naïve Bayes algorithm using Python.
2. Theory
2.1 Naïve Bayes Theorem
Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem. It is used in many real-world applications such as spam filtering, sentiment analysis, and medical diagnosis.
Bayes' Theorem:
Bayes' theorem describes how to update our beliefs based on new evidence and is defined as:

P(A | B) = [P(B | A) × P(A)] / P(B)

Where:
P(A | B) = Probability of event A occurring given that B has occurred (Posterior Probability).
P(B | A) = Probability of event B occurring given that A has occurred (Likelihood).
P(A) = Prior probability of event A occurring (Prior Probability).
P(B) = Total probability of event B occurring (Evidence).
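To make the formula concrete, here is a short Python sketch (not part of the original manual; the prior, likelihood, and evidence values are assumed purely for illustration) that computes a posterior probability directly from Bayes' theorem:

# A minimal sketch of Bayes' theorem; all numbers below are assumed for illustration.
def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    # P(A | B) = P(B | A) * P(A) / P(B)
    return likelihood_b_given_a * prior_a / evidence_b

# Example: P(A) = 0.3, P(B | A) = 0.8, P(B) = 0.5  ->  P(A | B) = 0.48
print(bayes_posterior(prior_a=0.3, likelihood_b_given_a=0.8, evidence_b=0.5))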
Let’s say we want to classify an email as spam or not spam based on words in the email.
Suppose we have two categories: Spam (S) and Not Spam (NS).
We see the word "Free" in an email.
We want to calculate:

P(Spam | Free) = [P(Free | Spam) × P(Spam)] / P(Free)
Breaking it down:
1. P(Spam) → Probability that any random email is spam.
2. P(Free | Spam) → Probability that the word "Free" appears in spam emails.
3. P(Free) → Probability that the word "Free" appears in all emails.
If P(Spam | Free) is high, we classify the email as spam; otherwise, it is not spam.
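The same calculation can be written in a few lines of Python; the probabilities below are hypothetical and only illustrate the arithmetic:

# Hypothetical probabilities, for illustration only.
p_spam = 0.4              # P(Spam): fraction of all emails that are spam
p_free_given_spam = 0.7   # P(Free | Spam): "Free" appears in 70% of spam emails
p_free = 0.35             # P(Free): "Free" appears in 35% of all emails

p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)  # 0.8 -> high, so classify the email as spam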
2.4 Advantages:
Simple and Fast:
o Naïve Bayes is easy to implement and computationally efficient.
o Works well even with small datasets.
Handles Missing Data:
o Since it uses probability calculations, missing values don't impact performance significantly.
2.5 Disadvantages:
Assumption of Feature Independence:
o The "naïve" assumption that features are independent is often unrealistic,
affecting accuracy.
Zero Probability Problem:
o If a feature value never appears with a class in the training data, Naïve Bayes assigns that combination zero probability, which forces the entire class probability to zero.
o This is handled using Laplace smoothing (see the sketch after this list).
Limited with Continuous Data:
o If data is continuous and does not follow a normal distribution, Gaussian
Naïve Bayes may not perform well.
Sensitive to Irrelevant Features:
o If unnecessary features are present, they can affect performance.
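Below is a minimal sketch of Laplace (add-one) smoothing for a word-likelihood estimate; the counts and the alpha value are illustrative assumptions, not data from this manual. In scikit-learn, the alpha parameter of MultinomialNB applies the same idea.

# Laplace (add-one) smoothing for a categorical likelihood P(word | class).
# The counts below are assumed purely for illustration.
def smoothed_likelihood(word_count, total_words_in_class, vocabulary_size, alpha=1.0):
    # Without smoothing, word_count == 0 would make the likelihood
    # (and hence the whole class probability) zero.
    return (word_count + alpha) / (total_words_in_class + alpha * vocabulary_size)

# A word that never appears with a class still gets a small non-zero likelihood:
print(smoothed_likelihood(word_count=0, total_words_in_class=200, vocabulary_size=1000))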
3. Code
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load a suitable dataset and split it into training and test sets.
# (The original listing omits this step; the Iris dataset is assumed here.)
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Initialize the Naïve Bayes classifier (Gaussian Naïve Bayes for continuous data)
nb_classifier = GaussianNB()

# Train the classifier and predict labels for the test set
nb_classifier.fit(X_train, y_train)
y_pred = nb_classifier.predict(X_test)

# Print accuracy and per-class precision, recall, and F1-score
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Print confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
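Objective (b) asks for a Python implementation of Naïve Bayes itself. The following is a minimal from-scratch Gaussian Naïve Bayes sketch, offered only as an illustration of the underlying calculation rather than the manual's official solution; it assumes the X_train/X_test split created in the listing above is still in scope.

# A minimal from-scratch Gaussian Naïve Bayes, for illustration only.
import numpy as np

class SimpleGaussianNB:
    def fit(self, X, y):
        self.classes = np.unique(y)
        # Per-class prior, feature means, and feature variances
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # Work in log space to avoid numerical underflow
        log_priors = np.log(self.priors)
        # Sum of log Gaussian densities over features, for every class
        log_likelihoods = np.array([
            -0.5 * np.sum(np.log(2 * np.pi * self.vars[i])
                          + (X - self.means[i]) ** 2 / self.vars[i], axis=1)
            for i in range(len(self.classes))
        ]).T
        return self.classes[np.argmax(log_likelihoods + log_priors, axis=1)]

# Example usage with the split created above (assumed to be in scope):
scratch_nb = SimpleGaussianNB().fit(X_train, y_train)
print("Scratch accuracy:", np.mean(scratch_nb.predict(X_test) == y_test))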