
Dev ML Ex5

The document outlines an experiment to execute the Naïve Bayes algorithm using a suitable dataset and analyze the results, including implementation in Python. It explains the Naïve Bayes theorem, assumptions, types of classifiers, advantages, disadvantages, and when to use the algorithm, along with a sample code using the Iris dataset. The model achieved a high accuracy of 97% with minimal misclassification, demonstrating its effectiveness in classification tasks.

Uploaded by

guptadevansh421

CS3EL15 (P): Machine learning Laboratory Experiment no- 5

Experiment: Execute the Naïve Bayes algorithm with a suitable data set and do
proper analysis on the result. Also implement the Naïve Bayes algorithm using
Python.
1. Objective:
a. Execute the Naïve Bayes algorithm with a suitable data set and analyze the
result.
b. Implement the Naïve Bayes algorithm using Python.

2. Theory:
2.1 Naïve Bayes Theorem
Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem, which is
used in many real-world applications like spam filtering, sentiment analysis, and medical
diagnosis.

Bayes' Theorem:
Bayes' theorem describes how to update our beliefs based on new evidence and is defined as:
P(A | B) = P(B | A) · P(A) / P(B)
Where:
• P(A | B) = Probability of event A occurring given that B has occurred
(Posterior Probability).
• P(B | A) = Probability of event B occurring given that A has occurred
(Likelihood).
• P(A) = Prior probability of event A occurring (Prior Probability).
• P(B) = Total probability of event B occurring (Evidence).

2.2 Naïve Bayes Assumption


The algorithm is called "Naïve" because it assumes that all features (variables) are
conditionally independent of each other given the class. In reality, this assumption is
often false, but the classifier still works well in many cases.
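
The independence assumption means the score for a class is just its prior multiplied by one likelihood term per feature. The sketch below illustrates this with a toy email-filtering case; the priors, the word likelihoods, and the helper names `naive_log_score` and `classify` are all hypothetical values and names chosen for illustration, not taken from the experiment.

```python
import math

# Hypothetical priors and per-word likelihoods (illustration only)
priors = {"spam": 0.4, "not_spam": 0.6}
likelihoods = {
    "spam":     {"free": 0.50, "meeting": 0.05},
    "not_spam": {"free": 0.05, "meeting": 0.30},
}

def naive_log_score(words, label):
    """log P(label) + sum of log P(word | label), one term per word."""
    score = math.log(priors[label])
    for word in words:
        score += math.log(likelihoods[label][word])
    return score

def classify(words):
    """Return the label with the highest naive log score."""
    return max(priors, key=lambda label: naive_log_score(words, label))

print(classify(["free"]))
print(classify(["meeting"]))
```

Summing logarithms instead of multiplying probabilities is a common choice because it avoids numerical underflow when there are many features.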

Example: Spam Email Classification

Department of Computer Science & Engineering


Student Name: Devansh Gupta Enrollment No : EN22CS301325

Let's say we want to classify an email as spam or not spam based on the words in the email.
• Suppose we have two categories: Spam (S) and Not Spam (NS).
• We see the word "Free" in an email.
• We want to calculate:
P(Spam | Free) = P(Free | Spam) · P(Spam) / P(Free)

Breaking it down:
1. P(Spam) → Probability that any random email is spam.
2. P(Free | Spam) → Probability that the word "Free" appears in a spam email.
3. P(Free) → Probability that the word "Free" appears in any email, spam or not.
If P(Spam | Free) is higher than P(Not Spam | Free), we classify the email as spam;
otherwise, we classify it as not spam.
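
To make the calculation concrete, here is a minimal sketch that plugs numbers into Bayes' theorem for the "Free" example. The three probabilities are hypothetical values chosen only for illustration, and `posterior` is a helper name introduced here; P(Free) is obtained from the law of total probability.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical probabilities (illustration only)
p_spam = 0.4                  # P(Spam)
p_free_given_spam = 0.5       # P(Free | Spam)
p_free_given_not_spam = 0.05  # P(Free | Not Spam)

# P(Free) via the law of total probability over both categories
p_free = p_free_given_spam * p_spam + p_free_given_not_spam * (1 - p_spam)

p_spam_given_free = posterior(p_spam, p_free_given_spam, p_free)
p_not_spam_given_free = 1 - p_spam_given_free

print(f"P(Spam | Free)     = {p_spam_given_free:.3f}")
print(f"P(Not Spam | Free) = {p_not_spam_given_free:.3f}")
```

With these assumed numbers the spam posterior is the larger of the two, so the email would be labelled spam.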

2.3 Types of Naïve Bayes Classifiers


1. Gaussian Naïve Bayes (used for continuous data; assumes each feature is normally
distributed within each class).
2. Multinomial Naïve Bayes (used for count features, e.g. word counts in text
classification such as spam filtering).
3. Bernoulli Naïve Bayes (used for binary features, e.g. word
presence/absence).
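
As a rough illustration of how the three variants map to feature types, the sketch below fits each scikit-learn class on a tiny made-up dataset. The arrays `X_cont`, `X_counts`, and `X_binary` are hypothetical toy data, not part of the experiment.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # toy class labels

# Continuous measurements -> Gaussian Naïve Bayes
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])
# Word counts -> Multinomial Naïve Bayes
X_counts = np.array([[3, 0], [2, 1], [0, 4], [1, 3]])
# Word presence/absence (0 or 1) -> Bernoulli Naïve Bayes
X_binary = (X_counts > 0).astype(int)

gaussian_pred = GaussianNB().fit(X_cont, y).predict(X_cont)
multinomial_pred = MultinomialNB().fit(X_counts, y).predict(X_counts)
bernoulli_pred = BernoulliNB().fit(X_binary, y).predict(X_binary)

print("GaussianNB:   ", gaussian_pred)
print("MultinomialNB:", multinomial_pred)
print("BernoulliNB:  ", bernoulli_pred)
```

Reducing counts to presence/absence discards information, so the Bernoulli variant may separate this toy data less cleanly than the multinomial one.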

2.4 Advantages:
• Simple and Fast:
o Naïve Bayes is easy to implement and computationally efficient.
o Works well even with small datasets.
• Performs Well with High-Dimensional Data:
o Suitable for text classification (e.g., spam filtering, sentiment analysis).
• Works Well with Small Data:
o Unlike deep learning, it does not require a large dataset to perform well.
• Handles Missing Data:
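o Because each feature contributes its own probability term, a missing feature value
can be left out of the calculation for that sample. Note that library implementations
such as scikit-learn's GaussianNB still require missing values to be imputed or
removed first.
• Performs Well for Multi-Class Classification:
o Works efficiently for problems with multiple output classes (e.g., the Iris dataset
with 3 classes).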

2.5 Disadvantages:
• Assumption of Feature Independence:
o The "naïve" assumption that features are independent is often unrealistic,
which can reduce accuracy when features are strongly correlated.
• Zero Probability Problem:
o If a feature value never appears with a class in the training data, its estimated
likelihood is zero, which forces the whole posterior for that class to zero.
o This is handled using Laplace Smoothing.
• Limited with Continuous Data:
o If data is continuous and does not follow a normal distribution, Gaussian
Naïve Bayes may not perform well.
• Sensitive to Irrelevant Features:
o If unnecessary features are present, they can affect performance.
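
The zero probability problem and its Laplace smoothing fix can be sketched in a few lines. The counts and vocabulary size below are hypothetical, and `word_likelihood` is a helper name introduced only for this illustration.

```python
def word_likelihood(count_in_class, total_in_class, vocab_size, alpha=1.0):
    """Estimate P(word | class) with additive smoothing; alpha=0 disables it."""
    return (count_in_class + alpha) / (total_in_class + alpha * vocab_size)

# Hypothetical case: "free" never appeared among 100 words of not-spam training text,
# and the vocabulary holds 50 distinct words.
unsmoothed = word_likelihood(0, 100, vocab_size=50, alpha=0.0)
smoothed = word_likelihood(0, 100, vocab_size=50, alpha=1.0)

print(f"Without smoothing: {unsmoothed}")
print(f"With Laplace smoothing (alpha=1): {smoothed:.5f}")
```

Adding alpha to every count keeps each likelihood strictly positive, so a single unseen word can no longer wipe out the score for a class.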

2.6 When to Use Naïve Bayes?
• Spam Filtering (Emails: Spam vs. Not Spam)
• Sentiment Analysis (Positive, Negative, Neutral reviews)
• Medical Diagnosis (Disease prediction based on symptoms)
• News Classification (Categorizing news articles)

3. Code
import numpy as np
import pandas as pd

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split the dataset into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naïve Bayes classifier (Gaussian Naïve Bayes for continuous data)
nb_classifier = GaussianNB()

# Train the model
nb_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = nb_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Naïve Bayes Model Accuracy: {accuracy:.2f}")

# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Print confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
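
The listing above relies on scikit-learn's GaussianNB. For the second objective, the following is a minimal from-scratch sketch of the same idea using NumPy. The class name `SimpleGaussianNB` and the fixed variance floor of 1e-9 are assumptions of this sketch, not scikit-learn's exact implementation; the library version is used here only as a comparison.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

class SimpleGaussianNB:
    """Minimal Gaussian Naïve Bayes: one mean and variance per class and feature."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors are the relative class frequencies in the training data
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps variances strictly positive (assumption of this sketch)
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Difference of every sample from every class mean: shape (n, classes, features)
        diff = X[:, None, :] - self.means_[None, :, :]
        # Gaussian log-likelihood summed over features (independence assumption)
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_)[None, :, :]
                          + diff ** 2 / self.vars_[None, :, :]).sum(axis=2)
        log_post = np.log(self.priors_)[None, :] + log_lik
        return self.classes_[np.argmax(log_post, axis=1)]

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

scratch_pred = SimpleGaussianNB().fit(X_train, y_train).predict(X_test)
sklearn_pred = GaussianNB().fit(X_train, y_train).predict(X_test)
agreement = np.mean(scratch_pred == sklearn_pred)
print(f"Agreement with scikit-learn GaussianNB: {agreement:.2f}")
```

Working with log probabilities keeps the per-feature products numerically stable, and the comparison line gives a quick check that the sketch behaves like the library classifier on the same split.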

3.1 Output of the Naïve Bayes model:


Summary of Model Performance

• Overall Accuracy: 97%.
• Misclassification: Only 1 wrong prediction out of 30 test samples.
• Setosa (Class 0) was classified perfectly.
• The one misclassification involved Versicolor (Class 1) and Virginica (Class 2);
both classes still had high accuracy.
• Naïve Bayes performed well because the classes in the Iris dataset are well separated
by their features, making it easy for the model to distinguish between them.
