Assignment - 01
: 6107
Subject: 510302 - BDS
ASSIGNMENT: 01
Aim: Implement the Naive Bayes algorithm in Java, Python, or R to classify a dataset from the UCI repository,
without using built-in Naive Bayes functions. Compare the performance of your implementation with the Naive
Bayes classifier from the Weka tool, R, or Python. Present the confusion matrix for each classifier. To measure
performance, use at least five metrics, such as accuracy, precision, recall, and F-measure.
Requirements:
• Software: PyCharm Professional
• Libraries: Pandas, Scikit-Learn, Seaborn, Matplotlib, and NumPy
• Dataset: Iris dataset from UCI repository.
Theory: Naive Bayes is a probabilistic classifier based on Bayes' Theorem. It assumes that the features are
conditionally independent given the class. It calculates the posterior probability of each class by combining
the class prior with the likelihood of the observed feature values, then predicts the class with the highest
posterior. Despite its simplicity, Naive Bayes is widely used for classification tasks such as text
classification and medical diagnosis, because it is cheap to train and often gives reasonable accuracy even
with small datasets.
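The rule described above can be written out explicitly. The Gaussian form of the likelihood is an assumption here, suggested by the mean and variance attributes stored in the code; the symbols C_k (class), x_i (feature value), and mu, sigma (per-class feature mean and standard deviation) are notation introduced for this sketch.

```latex
% Posterior under the conditional-independence assumption
P(C_k \mid x_1, \dots, x_n)
  = \frac{P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)}{P(x_1, \dots, x_n)}

% The denominator is the same for every class, so the prediction is
\hat{y} = \arg\max_{k} \left[ \log P(C_k) + \sum_{i=1}^{n} \log P(x_i \mid C_k) \right]

% Assumed Gaussian likelihood for a continuous feature
P(x_i \mid C_k)
  = \frac{1}{\sqrt{2\pi\sigma_{k,i}^{2}}}
    \exp\!\left( -\frac{(x_i - \mu_{k,i})^{2}}{2\sigma_{k,i}^{2}} \right)
```

Working with log-probabilities turns the product into a sum, which avoids numerical underflow when many small likelihoods are multiplied.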
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
from collections import defaultdict


class NaiveBayes:
    def __init__(self):
        # Class labels seen during training
        self.classes = None
        # Per-class feature means and variances
        self.mean = defaultdict(list)
        self.variance = defaultdict(list)
        # Per-class prior probabilities
        self.priors = {}
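The listing above stops after the constructor. As a rough illustration of how training and prediction could be built on the same attributes, here is a minimal sketch of a Gaussian Naive Bayes classifier using only NumPy. The class name `GaussianNBSketch`, the helper `_log_posterior`, the toy data, and the `1e-9` variance floor are all hypothetical choices made for this example; this is not the original implementation.

```python
import numpy as np


class GaussianNBSketch:
    """Hypothetical from-scratch Gaussian Naive Bayes (not the original class)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        self.mean, self.variance, self.priors = {}, {}, {}
        for c in self.classes:
            rows = X[y == c]
            self.mean[c] = rows.mean(axis=0)
            # Assumed small floor so the variance is never exactly zero
            self.variance[c] = rows.var(axis=0) + 1e-9
            # Prior = fraction of training samples belonging to class c
            self.priors[c] = rows.shape[0] / X.shape[0]
        return self

    def _log_posterior(self, x, c):
        # log prior + sum of per-feature Gaussian log-likelihoods
        log_prior = np.log(self.priors[c])
        log_likelihood = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.variance[c])
            + (x - self.mean[c]) ** 2 / self.variance[c]
        )
        return log_prior + log_likelihood

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Pick the class with the highest (unnormalised) log posterior
        return np.array([
            max(self.classes, key=lambda c: self._log_posterior(x, c))
            for x in X
        ])


# Toy data (made up for illustration): two well-separated classes
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]])
y_toy = np.array([0, 0, 1, 1])
model = GaussianNBSketch().fit(X_toy, y_toy)
preds = model.predict(np.array([[1.1, 1.9], [5.1, 6.0]]))
print(preds)
```

Each query point sits close to the mean of one toy class, so the sketch assigns the first point to class 0 and the second to class 1.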
Precision: 1.00
Recall: 1.00
F1 Score: 1.00
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
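The metrics can be recomputed from the confusion matrix by hand. The sketch below assumes rows are true classes and columns are predicted classes (the scikit-learn convention) and uses macro averaging across the three classes; both are assumptions, since the report does not say which convention or averaging mode was used.

```python
import numpy as np

# Confusion matrix as reported (assumed: rows = actual, columns = predicted)
cm = np.array([[10, 0, 0],
               [0, 9, 0],
               [0, 0, 11]])

tp = np.diag(cm).astype(float)         # correctly classified samples per class
predicted_per_class = cm.sum(axis=0)   # column sums
actual_per_class = cm.sum(axis=1)      # row sums

accuracy = tp.sum() / cm.sum()
precision = tp / predicted_per_class   # per-class precision
recall = tp / actual_per_class         # per-class recall
f1 = 2 * precision * recall / (precision + recall)

# Macro average: unweighted mean over classes
macro_precision = precision.mean()
macro_recall = recall.mean()
macro_f1 = f1.mean()

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {macro_precision:.2f}")
print(f"Recall: {macro_recall:.2f}")
print(f"F1 Score: {macro_f1:.2f}")
```

Because every off-diagonal entry is zero, all 30 test samples are classified correctly and each metric comes out at 1.00. Note that the divisions would fail for a class with no predicted or no actual samples; this sketch does not guard against that case.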
Conclusion: Naive Bayes is a simple probabilistic classifier that assumes the features are conditionally
independent given the class. It uses Bayes' Theorem to calculate the probability of each class, and it is
commonly applied to text classification and medical diagnosis because it is efficient and reasonably accurate.
Despite its simplicity, it often performs well in practice: on the Iris test split used here, the manual
implementation scored 1.00 on accuracy, precision, recall, and F1 score.
Evaluation for Manual Naive Bayes:
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
F1 Score: 1.00
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
F1 Score: 1.00
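For the comparison with a library classifier, a reference run could look like the sketch below. It assumes scikit-learn's `GaussianNB` as the reference model, scikit-learn's bundled copy of Iris in place of a file downloaded from UCI, a 20% test split, `random_state=42`, and macro averaging. None of these settings are confirmed by the report.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Bundled Iris data (assumed stand-in for the UCI file)
X, y = load_iris(return_X_y=True)

# Assumed split: 20% of the 150 samples held out, fixed seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Library classifier used as the reference implementation
reference = GaussianNB().fit(X_train, y_train)
y_pred = reference.predict(X_test)

report = {
    "Accuracy": accuracy_score(y_test, y_pred),
    "Precision": precision_score(y_test, y_pred, average="macro"),
    "Recall": recall_score(y_test, y_pred, average="macro"),
    "F1 Score": f1_score(y_test, y_pred, average="macro"),
}
cm = confusion_matrix(y_test, y_pred)

for name, value in report.items():
    print(f"{name}: {value:.2f}")
print("Confusion Matrix:")
print(cm)
```

Running the manual classifier on the same `X_train` and `X_test` and printing the same four metrics gives a like-for-like comparison, since both models then see identical data.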