
EXPERIMENT NO. 03
Aim: To implement and evaluate the following using Python:
a) Classification Algorithm – Naïve Bayes

Date of Performance: Date of Submission:

THEORY
Naive Bayes Classifier Algorithm
The Naive Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems. It is mainly used in text classification, which involves high-dimensional training datasets. The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Popular applications of the Naïve Bayes algorithm include spam filtering, sentiment analysis, and article classification.
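To make the text-classification use case concrete, here is a minimal sketch of a spam filter built with scikit-learn's MultinomialNB; the example sentences, labels, and variable names are invented purely for illustration:

# A tiny, made-up spam-filtering sketch. MultinomialNB is commonly
# paired with word-count features from CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free offer click now", "project report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (invented labels)

vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)  # word-count features

model = MultinomialNB()
model.fit(X_counts, labels)
print(model.predict(vectorizer.transform(["free prize tomorrow"])))  # likely [1]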

Why is it called Naïve Bayes?


The name Naïve Bayes combines the two words Naïve and Bayes, which can be described as follows:
Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
Bayes: It is called Bayes because it relies on the principle of Bayes' theorem.
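To make the independence assumption concrete, here is a small numeric sketch for the apple example above; all the probability values are invented for illustration:

# Under the naive assumption, the class-conditional probability of a
# feature combination factorizes into per-feature probabilities.
# All numbers below are invented for illustration.
p_red_given_apple = 0.8
p_spherical_given_apple = 0.9
p_sweet_given_apple = 0.7
# P(red, spherical, sweet | apple) under the naive assumption:
p_features_given_apple = p_red_given_apple * p_spherical_given_apple * p_sweet_given_apple
print(p_features_given_apple)  # 0.504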

Bayes' Theorem
Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability. The formula for Bayes' theorem is given as:

P(A|B) = [P(B|A) × P(A)] / P(B)

Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.



P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
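As a quick numeric check of the formula, the sketch below computes a posterior directly; every probability value is invented for illustration:

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# All numbers below are invented for illustration.
p_a = 0.3           # prior P(A)
p_b_given_a = 0.8   # likelihood P(B|A)
p_b = 0.5           # marginal P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.48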

Python Implementation of the Naïve Bayes algorithm


Now we will implement the Naive Bayes algorithm using Python. For this, we will use the "user_data" dataset, which we have used in our other classification models, so we can easily compare the Naive Bayes model with the other models.
Steps to implement:
o Data Pre-processing step
o Fitting Naive Bayes to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.

Data Pre-processing step


In this step, we will pre-process/prepare the data so that we can use it efficiently in our code, similar to the data pre-processing done in earlier experiments. The code for this is given below:
# Data Preprocessing
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the dataset
try:
    user_data = pd.read_csv("userdata.csv")  # Change the file path accordingly
except FileNotFoundError:
    print("Error: File not found.")
    exit()

# Check if the 'target' column exists
if 'target' not in user_data.columns:
    print("Error: 'target' column not found in the dataset.")
    exit()

# Split dataset into features and labels
X = user_data.drop(columns=['target'])  # Features
y = user_data['target']                 # Labels

# Encode categorical labels
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
In the above code, we load the dataset with pd.read_csv("userdata.csv"), split it into the feature matrix X and the label vector y, encode the categorical labels with LabelEncoder, and then divide the data into training and test sets.
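GaussianNB does not require feature scaling, but if scaling is wanted (for example, to compare against scale-sensitive classifiers), a StandardScaler step could be added; the sketch below is an optional extension and not part of the pipeline above:

# Optional: feature scaling (not required by GaussianNB; shown only as a
# possible extension, using new variable names so the pipeline above is
# left unchanged).
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse training statistics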

The output for the dataset is given as:

Fitting Naive Bayes to the Training Set:


After the pre-processing step, we will now fit the Naive Bayes model to the training set. Below is the code for it:
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB

# Create a Naive Bayes classifier
classifier = GaussianNB()

# Train the classifier
classifier.fit(X_train, y_train)
In the above code, we have used the GaussianNB classifier and fitted it to the training dataset. We can also use other classifiers as per our requirement.
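For instance, scikit-learn provides other Naive Bayes variants that can be swapped in with no other change to the pipeline; the sketch below is illustrative, and which variant is appropriate depends on the feature types in the dataset:

# Illustrative sketch: swapping in another Naive Bayes variant.
# MultinomialNB suits non-negative count features (e.g., word counts);
# BernoulliNB suits binary features (it binarizes inputs at 0 by default).
from sklearn.naive_bayes import BernoulliNB

alt_classifier = BernoulliNB()
alt_classifier.fit(X_train, y_train)
print(alt_classifier.score(X_test, y_test))  # accuracy of the alternative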



Output:

Prediction of the test set result:


Now we will predict the test set results. For this, we will create a new variable y_pred to hold the predictions and use the predict function to make them.
# Predicting the test result
y_pred = classifier.predict(X_test)
Creating Confusion Matrix:
Now we will check the accuracy of the Naive Bayes classifier using the Confusion
matrix. Below is the code for it:
# Test accuracy of the result (Creation of Confusion matrix)
from sklearn.metrics import confusion_matrix, accuracy_score

# Calculate confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)

# Print confusion matrix and accuracy
print("Confusion Matrix:")
print(cm)
print("\nAccuracy:", accuracy)

Output:



As we can see in the above confusion matrix output, there are 7 + 3 = 10 incorrect predictions and 65 + 25 = 90 correct predictions, which corresponds to the 90% accuracy score.
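That accuracy can also be recomputed directly from the confusion matrix stored in cm, as a quick sanity check:

# Accuracy from the confusion matrix: correct predictions lie on the
# diagonal, so accuracy = trace(cm) / total predictions.
import numpy as np

correct = np.trace(cm)    # 65 + 25 = 90 correct predictions
total = cm.sum()          # 100 test samples in all
print(correct / total)    # 0.9, matching accuracy_score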

Visualizing the test set result


Next, we will visualize the test set results produced by the Naïve Bayes classifier. Below is the code for it:
# Visualizing the test set result
import matplotlib.pyplot as plt
import numpy as np

# Define function to plot decision regions
def plot_decision_regions(X, y, classifier, resolution=0.02):
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = plt.get_cmap('Pastel2')

    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=[colors[idx]],
                    marker=markers[idx], label=cl)

# Plot decision regions (assuming only two features)
if X_test.shape[1] == 2:
    plt.figure(figsize=(10, 6))
    plot_decision_regions(X_test.values, y_test, classifier=classifier)
    plt.title('Naive Bayes - Test set')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend(loc='upper right')
    plt.show()
else:
    print("Cannot visualize decision regions as the dataset has more than two features.")



Output:

In the above output, we can see that the Naïve Bayes classifier has separated the data points with a fine boundary. The boundary is smooth because we used the GaussianNB classifier, which models each feature with a Gaussian distribution per class.
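Because GaussianNB fits one Gaussian per feature and class, the learned parameters can be inspected directly; note that recent scikit-learn releases expose the variances as var_, while older releases used sigma_:

# Inspect the per-class Gaussians learned by GaussianNB.
print(classifier.theta_)  # per-class feature means
print(classifier.var_)    # per-class feature variances (sigma_ in older versions)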
CONCLUSION
