
Name: Dhruv Jayant Tillu    Roll No.: 6107
Subject: 510302 - BDS

ASSIGNMENT: 01
Aim: Implement the Naive Bayes algorithm in Java/Python/R to classify a dataset from the UCI repository (do not
use built-in functions for Naive Bayes). Compare the performance of your implementation with the Naive Bayes
classifier from the Weka tool/R/Python. Present the confusion matrix for each classifier. For measuring
performance, use at least five metrics such as accuracy, precision, recall, F-measure, etc.

Requirements:
• Software: PyCharm Professional
• Libraries: Pandas, Scikit-Learn, Seaborn, Matplotlib, and NumPy
• Dataset: Iris dataset from UCI repository.

Theory: Naive Bayes is a probabilistic classifier based on Bayes' Theorem, assuming conditional independence
between features. It computes the posterior probability of each class by combining the class prior with the
likelihood of the observed feature values. Despite its simplicity, Naive Bayes is highly effective for
classification tasks, especially in text classification and medical diagnosis, due to its efficiency and
reasonable accuracy even on small datasets.
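
In symbols (a standard statement of the Gaussian Naive Bayes decision rule, added here for clarity; the notation is mine, not from the assignment brief), the predicted class for a sample x = (x_1, ..., x_n) is:

\hat{y} = \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu_{c,i}, \sigma_{c,i}^2)\Big],
\qquad
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)

where P(c) is the class prior and the per-class means and variances are estimated from the training data, exactly as the fit method below does (the code adds a small eps for numerical stability).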

Code:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Iris dataset from the UCI repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
dataset = pd.read_csv(url, names=names)

# Split into features/labels, then into training and test sets (80/20)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

import numpy as np
from collections import defaultdict

class NaiveBayes:
    def __init__(self):
        self.classes = None
        self.mean = defaultdict(list)
        self.variance = defaultdict(list)
        self.priors = {}

    def fit(self, X, y):
        # Estimate per-class Gaussian parameters and class priors
        self.classes = np.unique(y)
        for cls in self.classes:
            X_cls = X[y == cls]
            self.mean[cls] = X_cls.mean(axis=0)
            self.variance[cls] = X_cls.var(axis=0)
            self.priors[cls] = X_cls.shape[0] / X.shape[0]

    def calculate_likelihood(self, mean, var, x):
        # Gaussian probability density of each feature value
        eps = 1e-4  # small constant to avoid division by zero
        coeff = 1.0 / np.sqrt(2.0 * np.pi * var + eps)
        exponent = np.exp(-((x - mean) ** 2) / (2 * var + eps))
        return coeff * exponent

    def calculate_posterior(self, x):
        # Log-posterior for each class: log prior + sum of log-likelihoods
        posteriors = []
        for cls in self.classes:
            prior = np.log(self.priors[cls])
            likelihood = np.sum(np.log(self.calculate_likelihood(
                self.mean[cls], self.variance[cls], x)))
            posteriors.append(prior + likelihood)
        return self.classes[np.argmax(posteriors)]

    def predict(self, X):
        # Predict the most probable class for every sample
        y_pred = [self.calculate_posterior(x) for x in X]
        return np.array(y_pred)

# Train the manual implementation
nb_manual = NaiveBayes()
nb_manual.fit(X_train, y_train)
y_pred_manual = nb_manual.predict(X_test)
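
To see what fit estimated, the learned parameters can be printed (an illustrative check, not part of the original submission):

# Inspect the per-class priors and feature means learned by fit()
for cls in nb_manual.classes:
    print(cls, "prior:", round(nb_manual.priors[cls], 2),
          "means:", np.round(nb_manual.mean[cls], 2))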

from sklearn.naive_bayes import GaussianNB

# Train scikit-learn's Gaussian Naive Bayes for comparison
nb_sklearn = GaussianNB()
nb_sklearn.fit(X_train, y_train)
y_pred_sklearn = nb_sklearn.predict(X_test)
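
As a quick sanity check before computing metrics (an addition, not in the original code), the two prediction arrays can be compared directly:

# Fraction of test samples on which the two classifiers agree
agreement = np.mean(y_pred_manual == y_pred_sklearn)
print(f"Prediction agreement: {agreement:.2%}")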

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred, model_name):
    # Compute the headline metrics; 'weighted' averaging accounts for class sizes
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')
    f1 = f1_score(y_true, y_pred, average='weighted')
    cm = confusion_matrix(y_true, y_pred)

    print(f"Evaluation for {model_name}:")
    print(f"Accuracy: {accuracy:.2f}")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1 Score: {f1:.2f}")
    print(f"Confusion Matrix:\n{cm}\n")
    return accuracy, precision, recall, f1, cm

# Evaluate both models
metrics_manual = evaluate(y_test, y_pred_manual, "Manual Naive Bayes")
metrics_sklearn = evaluate(y_test, y_pred_sklearn, "Scikit-Learn Naive Bayes")
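
The aim calls for at least five metrics; one way to add a fifth (a sketch added here, not part of the original submission) is Cohen's kappa via scikit-learn's cohen_kappa_score, which measures agreement with the true labels corrected for chance:

from sklearn.metrics import cohen_kappa_score

# Cohen's kappa as a fifth performance metric for each classifier
kappa_manual = cohen_kappa_score(y_test, y_pred_manual)
kappa_sklearn = cohen_kappa_score(y_test, y_pred_sklearn)
print(f"Cohen's kappa (manual): {kappa_manual:.2f}")
print(f"Cohen's kappa (sklearn): {kappa_sklearn:.2f}")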

Evaluation for Manual Naive Bayes:
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
F1 Score: 1.00
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Evaluation for Scikit-Learn Naive Bayes:
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
F1 Score: 1.00
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

# Plot the confusion matrix for both models
import matplotlib.pyplot as plt
import seaborn as sns

sns.heatmap(metrics_manual[4], annot=True, cmap='Blues', fmt='g')
plt.title("Confusion Matrix for Manual Naive Bayes")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

plt.figure()  # start a fresh figure for the second plot
sns.heatmap(metrics_sklearn[4], annot=True, cmap='Blues', fmt='g')
plt.title("Confusion Matrix for Scikit-Learn Naive Bayes")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

Conclusion: Naive Bayes is a simple yet powerful probabilistic classifier that assumes conditional independence
between features. By combining class priors with feature likelihoods via Bayes' Theorem, it is particularly
effective in text classification and medical diagnosis thanks to its efficiency and reasonable accuracy. On the
Iris test set, the manual implementation and scikit-learn's GaussianNB produced identical confusion matrices and
perfect scores on all reported metrics, confirming that the from-scratch implementation matches the library
classifier.
