4/3/25, 6:24 PM Naïve Bayes Classification in Python | by Shuvrajyoti Debroy | Medium

Naïve Bayes Classification in Python

Shuvrajyoti Debroy
11 min read · Feb 9, 2023

Machine Learning Classification Algorithm

Background Image Source: Analytics Insight

Introduction
Naive Bayes is a classification algorithm that is based on Bayes’ theorem. Bayes’
theorem states that the probability of an event given some evidence (the posterior)
is equal to the prior probability of the event multiplied by the likelihood of the
evidence given the event, divided by the overall probability of the evidence. In the
context of classification, this means that we are trying to find the class that is
most likely given a set of features or attributes.
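
As a small worked illustration of the theorem, the sketch below computes a posterior
from a prior and two likelihoods. All of the numbers are hypothetical and chosen only
to make the arithmetic easy to follow; they are not taken from the dataset used later.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem.
p_buy = 0.3                 # prior P(buy)
p_high_given_buy = 0.8      # likelihood P(high salary | buy)
p_high_given_no_buy = 0.2   # likelihood P(high salary | no buy)

# Evidence P(high salary), via the law of total probability
p_high = p_high_given_buy * p_buy + p_high_given_no_buy * (1 - p_buy)

# Posterior P(buy | high salary) = likelihood * prior / evidence
p_buy_given_high = p_high_given_buy * p_buy / p_high
print(round(p_buy_given_high, 3))  # 0.632
```

Seeing a high salary raises the probability of a purchase from the prior of 0.3 to
roughly 0.63 in this made-up setting.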

https://medium.com/@shuv.sdr/naïve-bayes-classification-in-python-f869c2e0dbf1 1/24

Naive Bayes assumes that the features are conditionally independent of each other
given the class, meaning that, within a class, the value of one feature tells us
nothing about the value of another feature. This simplifies the calculation of the
likelihood of the features, as we can calculate the likelihood of each feature
separately and then multiply them together.
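
A minimal sketch of that multiplication, assuming each feature follows a normal
distribution within each class (the assumption behind Gaussian Naive Bayes). The
class labels, priors, means, and standard deviations below are hypothetical values
picked for illustration, not statistics fitted to real data.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical per-class statistics (mean, std) for two features; not fitted to real data.
stats = {
    "buy":    {"prior": 0.4, "age": (45.0, 8.0), "salary": (90000.0, 20000.0)},
    "no_buy": {"prior": 0.6, "age": (33.0, 8.0), "salary": (60000.0, 20000.0)},
}

def posterior(age, salary):
    # Unnormalized score per class: prior * product of per-feature likelihoods
    scores = {
        label: s["prior"] * gaussian_pdf(age, *s["age"]) * gaussian_pdf(salary, *s["salary"])
        for label, s in stats.items()
    }
    total = sum(scores.values())
    # Dividing by the total turns the scores into probabilities that sum to 1
    return {label: score / total for label, score in scores.items()}

probs = posterior(age=45, salary=97000)
print(max(probs, key=probs.get), probs)
```

Because each feature contributes its own factor, adding a feature only adds one more
term to the product; no joint distribution over all features has to be estimated.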

Image Source: Techleer

Implement Naïve Bayes Classification in Python

In this example, we will use the Social Network Ads data, which records the Gender,
Age, and Estimated Salary of several users, and based on these data we will classify
whether each user would purchase the insurance or not.

Step 1: Import libraries

We need Pandas for data manipulation, NumPy for mathematical calculations, and
Matplotlib and Seaborn for visualizations. Scikit-learn (sklearn) modules are used for
the machine learning operations.

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score

Step 2: Import data

Download the dataset, upload it to your notebook, and read it into a pandas
DataFrame.

# Read dataset
df_net = pd.read_csv('/content/Social_Network_Ads.csv')
df_net.head()

Step 3: Data Analysis / Preprocessing

Exploratory Data Analysis (EDA) is a process of analyzing and summarizing the
main characteristics of a dataset, with the goal of gaining insight into the underlying
structure, relationships, and patterns within the data. EDA helps to identify
important features, anomalies, and trends in the data that can inform further
analysis and modeling.

EDA typically involves several key steps, including:

Data cleaning and preparation involve removing missing or incorrect values,
transforming variables, and handling outliers.

Data visualization is the process of creating graphs, charts, and other visual
representations of the data to help identify patterns, relationships, and
anomalies.

Statistical analysis involves applying mathematical and statistical methods to
the data to identify important features and relationships.

Preprocessing aims to prepare the data in a way that will enable effective analysis
and modeling and remove any biases or errors that may affect the results.
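
A few of these checks can be sketched with pandas. The `toy` frame below is a
hypothetical stand-in with the same column names as the tutorial's dataset, and
filling the missing age with the median is just one possible cleaning choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "Age": [19, 35, np.nan, 27, 46],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 41000],
    "Purchased": [0, 0, 0, 0, 1],
})

# Data cleaning: count missing values per column, then fill with the median
missing_before = toy.isna().sum()
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# Statistical analysis: class balance of the target
class_counts = toy["Purchased"].value_counts()
print(missing_before, class_counts, sep="\n")
```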

Get required data

We don’t need the User ID column so we can drop it.

# Get required data

df_net.drop(columns = ['User ID'], inplace=True)
df_net.head()

Describe data
Get a statistical description of the data using the Pandas describe() function. It
shows the count, mean, standard deviation, minimum, quartiles, and maximum of each
numeric column.

# Describe data
df_net.describe()

Distribution of data
Check data distribution.

# Salary distribution (histplot replaces the deprecated sns.distplot)
sns.histplot(df_net['EstimatedSalary'], kde=True)

Label encoding
Label encoding is a preprocessing technique in machine learning and data analysis
where categorical data is converted into numerical values, to make it compatible
with mathematical operations and models.

The categorical data is assigned an integer value, typically starting from 0, and each
unique category in the data is given a unique integer value so that the categorical
data can be treated as numerical data.

# Label encoding
le = LabelEncoder()
df_net['Gender']= le.fit_transform(df_net['Gender'])
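
To see what the encoder does, here is a small sketch on a toy list of values (the
list is made up for illustration). `LabelEncoder` sorts the unique categories and
assigns integers in that order, and `inverse_transform` maps the integers back.

```python
from sklearn.preprocessing import LabelEncoder

genders = ["Male", "Female", "Female", "Male"]  # toy values for illustration
encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

print(list(encoder.classes_))  # categories in sorted order
print(list(codes))             # integer code assigned to each row
print(list(encoder.inverse_transform(codes)))
```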

Correlation matrix
A correlation matrix is a table that summarizes the relationship between multiple
variables in a dataset. It shows the correlation coefficients between each pair of
variables, which indicate the strength and direction of the relationship between the
variables. It is useful for identifying highly correlated variables and selecting a
subset of variables for further analysis.

The correlation coefficient can range from -1 to 1, where:

A correlation coefficient of -1 indicates a perfect negative linear relationship
between two variables

A correlation coefficient of 0 indicates no linear relationship between two variables

A correlation coefficient of 1 indicates a perfect positive linear relationship
between two variables

# Correlation matrix
df_net.corr()
sns.heatmap(df_net.corr())
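
For reference, the Pearson coefficient that `corr()` reports by default can be
computed by hand. The sketch below uses two made-up vectors and checks the manual
result against NumPy's `corrcoef`.

```python
import numpy as np

# Toy vectors for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Pearson r = sum of centered products / sqrt(product of centered sums of squares)
x_c, y_c = x - x.mean(), y - y.mean()
r_manual = (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())

r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```

The two values agree, and because `y` is close to `2 * x` the coefficient is very
close to 1.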

Drop insignificant data

From the correlation matrix, we see that Gender shows little correlation with the
other attributes, so we can drop that column as well.

# Drop Gender column

df_net.drop(columns=['Gender'], inplace=True)

Step 4: Split data

Splitting data into independent and dependent variables involves separating the
input features (independent variables) from the target variable (dependent variable).
The independent variables are used to predict the value of the dependent variable.

The data is then split into a training set and a test set, with the training set used to fit
the model and the test set used to evaluate its performance.

Independent / Dependent variables

In our data, Age and EstimatedSalary are the independent variables, assigned to X,
and Purchased is the dependent variable, y.

# Split data into dependent/independent variables

X = df_net.iloc[:, :-1].values
y = df_net.iloc[:, -1].values

Train / Test split

The data is usually divided into two parts, with the majority of the data used for
training the model and a smaller portion used for testing.

The training set is used to train the model and find the optimal parameters. The
model is then tested on the test set to evaluate its performance and determine its
accuracy. This is important because if the model is trained and tested on the same
data, it may over-fit the data and perform poorly on new, unseen data.

We have split the data into 75% for training and 25% for testing.

# Split data into test/train set (a fixed random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 1)
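
A quick way to confirm the proportions is to inspect the shapes after splitting. The
sketch below uses randomly generated stand-in arrays of 400 rows (a hypothetical
size), and adds `stratify`, which is not used in the tutorial's code, to keep the
class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 400 rows, 2 features, binary target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 2))
y_demo = rng.integers(0, 2, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1, stratify=y_demo
)
print(X_tr.shape, X_te.shape)  # (300, 2) (100, 2)
```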

 

Step 5: Feature scaling

Feature scaling is a method of transforming the values of numeric variables so that
they have a common scale as machine learning algorithms are sensitive to the scale
of the input features.

There are two common methods of feature scaling: normalization and standardization.

Normalization scales the values of the variables so that they fall between 0 and
1. This is done by subtracting the minimum value of the feature and dividing by
the range (max-min).

Standardization transforms the values of the variables so that they have a mean
of 0 and a standard deviation of 1. This is done by subtracting the mean and
dividing by the standard deviation.

Feature scaling is usually performed before training a model, as it can improve the
performance of the model and reduce the time required to train it, and helps to
ensure that the algorithm is not biased towards variables with larger values.

# Scale dataset
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
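
Both formulas are short enough to write out by hand. The sketch below applies them
to a toy column of ages (made-up values) and compares the results with
scikit-learn's `MinMaxScaler` and `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = np.array([[20.0], [30.0], [40.0], [60.0]])  # toy column

# Normalization: (x - min) / (max - min)
normalized = (ages - ages.min()) / (ages.max() - ages.min())

# Standardization: (x - mean) / std, using the population std (ddof=0)
standardized = (ages - ages.mean()) / ages.std()

print(normalized.ravel())
print(np.allclose(normalized, MinMaxScaler().fit_transform(ages)))
print(np.allclose(standardized, StandardScaler().fit_transform(ages)))
```

Note that the scaler in the tutorial is fitted on the training set only and then
reused on the test set, so no information from the test data leaks into training.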

Step 6: Train model

Training a machine learning model involves using a training dataset to estimate the
parameters of the model. Many learning algorithms do this iteratively, updating the
model parameters to minimize a loss function that measures the difference between
the predicted values and the actual values in the training data. Gaussian Naïve
Bayes is simpler: it estimates the prior probability of each class, and the mean and
variance of each feature within each class, directly from the training data.

It’s important to note that, unlike an SVM, Gaussian Naïve Bayes has no kernel
function or regularization strength to choose, and because each feature is modeled
separately, feature scaling matters far less here than it does for distance-based or
margin-based models.

Pass the X_train and y_train data into the Naïve Bayes classifier model by classifier.fit
to train the model with our training data.

# Classifier
classifier = GaussianNB()
classifier.fit(X_train, y_train)
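
What the fit actually learns can be inspected on the fitted estimator. The sketch
below trains `GaussianNB` on a tiny made-up dataset and prints the class priors
(`class_prior_`) and the per-class feature means (`theta_`).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data: one feature, two classes
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1, 1])

model = GaussianNB().fit(X_toy, y_toy)

print(model.class_prior_)  # fraction of training rows in each class
print(model.theta_)        # per-class mean of each feature
```

The priors are simply 3/7 and 4/7, and the means are 2.0 and 11.5, which is what
computing them by hand from the toy data gives.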

Step 7: Predict result / Score model

Once the likelihood of the features for each class is calculated, the algorithm
multiplies the likelihood by the prior probability of each class, which is estimated
from the training data. The class with the highest probability is then selected as the
predicted class.

The accuracy of the model can be evaluated on a test set, which was previously held
out from the training process.

# Prediction
y_pred = classifier.predict(X_test)
# Show predicted and actual labels side by side
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1))

 

Step 8: Evaluate model

Accuracy is a useful metric for assessing the performance of a model, but it can be
misleading in some cases. For example, in a highly imbalanced dataset, a model that
always predicts the majority class will have high accuracy, even though it may not be
performing well. Therefore, it is important to consider other metrics, such as
confusion matrix, precision, recall, F1-score, and ROC-AUC, along with accuracy, to
get a more complete picture of the performance of a model.

Accuracy
Accuracy is a commonly used metric for evaluating the performance of a machine
learning model. It measures the proportion of correct predictions made by the
model on a given dataset.

In a binary classification problem, accuracy is defined as the number of correct
predictions divided by the total number of predictions. The same definition applies
to a multi-class classification problem: the number of correctly classified instances
divided by the total number of instances.

# Accuracy
accuracy_score(y_test, y_pred)
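
The imbalance caveat is easy to demonstrate. In the sketch below the labels are
made up (95 negatives, 5 positives) and the "model" always predicts the majority
class; accuracy looks good while recall on the positive class is zero.

```python
import numpy as np

# Toy imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
y_majority = np.zeros_like(y_true)  # a "model" that always predicts the majority class

# Accuracy = correct predictions / total predictions
accuracy = (y_true == y_majority).mean()

# Recall on the positive class exposes the problem
recall_pos = (y_majority[y_true == 1] == 1).mean()
print(accuracy, recall_pos)  # 0.95 0.0
```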

Classification report
A classification report is a summary of the performance of a classification model. It
provides several metrics for evaluating the performance of the model on a
classification task, including precision, recall, f1-score, and support.

The classification report also provides a weighted average of the individual class
scores, which takes into account the imbalance in the distribution of classes in the
dataset.

# Classification report
print(f'Classification Report: \n{classification_report(y_test, y_pred)}')

F1 score
F1-score is the harmonic mean of precision and recall. It provides a single score that
balances precision and recall. Support is the number of instances of each class in
the evaluation dataset.

# F1 score
print(f"F1 Score : {f1_score(y_test, y_pred)}")
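
As a check on the harmonic-mean definition, the sketch below computes F1 from
precision and recall on a small set of made-up labels and compares it with
`f1_score`.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels for illustration only
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 0, 1, 0, 1, 0, 1, 0]

p = precision_score(y_true_demo, y_pred_demo)
r = recall_score(y_true_demo, y_pred_demo)

# Harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)
print(p, r, f1_manual, f1_score(y_true_demo, y_pred_demo))
```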

Confusion matrix
A confusion matrix is used to evaluate the performance of a classification model. It
summarizes the model’s performance by comparing the actual class labels of the
data to the predicted class labels generated by the model.

True Positives (TP): Correctly predicted positive instances.

False Positives (FP): Incorrectly predicted positive instances.
True Negatives (TN): Correctly predicted negative instances.
False Negatives (FN): Incorrectly predicted negative instances.

It provides a clear and detailed understanding of how well the model is performing
and helps to identify areas of improvement.

# Confusion matrix
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False)
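
For binary labels, the four counts can be read directly out of the matrix. The
sketch below uses made-up labels; with scikit-learn's layout, rows are actual classes
and columns are predicted classes, so `ravel()` yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only
y_true_cm = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_cm = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]

# For binary labels {0, 1} the matrix is laid out as:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_cm, y_pred_cm).ravel()

precision_cm = tp / (tp + fp)
recall_cm = tp / (tp + fn)
print(tn, fp, fn, tp, precision_cm, recall_cm)
```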

Precision-Recall curve
A precision-recall curve is a plot that summarizes the performance of a binary
classification model as a trade-off between precision and recall and is useful for
evaluating the model’s ability to make accurate positive predictions while finding as
many positive instances as possible. Precision and Recall are two common metrics
for evaluating the performance of a classification model.

Precision is the number of true positive predictions divided by the sum of true
positive and false positive predictions. It measures the accuracy of the positive
predictions made by the model.

Recall is the number of true positive predictions divided by the sum of true positive
and false negative predictions. It measures the ability of the model to find all positive
instances.

# Plot Precision-Recall Curve

y_pred_proba = classifier.predict_proba(X_test)[:,1]
precision, recall, thresholds = precision_recall_curve(y_test, y_pred_proba)

fig, ax = plt.subplots(figsize=(6,6))
ax.plot(recall, precision, label='Naive Bayes Classification', color = 'firebrick')
ax.set_title('Precision-Recall Curve')
ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
plt.box(False)
ax.legend();

 

AUC/ROC curve
The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC)
are commonly used metrics for evaluating the performance of a binary
classification model.

A ROC curve plots the True Positive Rate (TPR) versus the False Positive Rate (FPR) for
different thresholds of the model’s prediction probabilities. The TPR is the number
of true positive predictions divided by the number of actual positive instances, while
the FPR is the number of false positive predictions divided by the number of actual
negative instances.

The AUC is the area under the ROC curve and provides a single-number metric that
summarizes the performance of the model over the entire range of possible
thresholds.

A high AUC indicates that the model is able to distinguish positive instances from
negative instances well.

# Plot AUC/ROC curve

y_pred_proba = classifier.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred_proba)

fig, ax = plt.subplots(figsize=(6,6))
ax.plot(fpr, tpr, label='Naive Bayes Classification', color = 'firebrick')
ax.set_title('ROC Curve')
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
plt.box(False)
ax.legend();
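
The plot shows the curve, but the AUC itself is a number that can be computed
separately. A minimal sketch with made-up labels and scores: `roc_auc_score`
returns the value directly, and `auc` gives the same result from the points
returned by `roc_curve`.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and predicted probabilities, for illustration only
y_true_roc = [0, 0, 1, 1]
y_score_roc = [0.1, 0.4, 0.35, 0.8]

fpr_demo, tpr_demo, _ = roc_curve(y_true_roc, y_score_roc)
auc_from_curve = auc(fpr_demo, tpr_demo)  # trapezoidal area under the ROC points
auc_direct = roc_auc_score(y_true_roc, y_score_roc)
print(auc_from_curve, auc_direct)  # 0.75 0.75
```

In the tutorial's setting the equivalent call would take `y_test` and
`y_pred_proba` as arguments.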

Visualizing predictions

[Figure: prediction results on the training set]

[Figure: prediction results on the test set]

Example
Let’s take a user with an Age of 45 and a Salary of 97000 and check whether they are
likely to purchase the insurance or not.

# Predict purchase with Age(45) and Salary(97000)

print(classifier.predict(sc.transform([[45, 97000]])))
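
A new input has to be scaled with the same statistics as the training data before
it is passed to the classifier, which is why the call above goes through
`sc.transform`. One way to make that harder to forget is to bundle the scaler and
the classifier into a pipeline. The sketch below does this on synthetic data (the
group means and spreads are hypothetical, not taken from the real dataset) and also
prints the predicted class probabilities.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic (Age, EstimatedSalary) data; NOT the real Social_Network_Ads dataset
rng = np.random.default_rng(42)
non_buyers = np.column_stack([rng.normal(32, 6, 100), rng.normal(60000, 15000, 100)])
buyers = np.column_stack([rng.normal(46, 6, 100), rng.normal(95000, 15000, 100)])
X_syn = np.vstack([non_buyers, buyers])
y_syn = np.array([0] * 100 + [1] * 100)

# The pipeline scales new inputs with the training statistics before predicting
pipe = make_pipeline(StandardScaler(), GaussianNB()).fit(X_syn, y_syn)

sample = [[45, 97000]]
label = pipe.predict(sample)[0]
proba = pipe.predict_proba(sample)[0]
print(label, proba)
```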

A predicted value of [1] means the model expects the user to purchase the insurance.

Full Code at GitHub

You can get the full code in my GitHub repository.

Data-Science/Bayes_Theorem.ipynb at main · shuv50/Data-Science (github.com)

Conclusion
Naive Bayes is a fast and simple algorithm that is widely used for text classification,
spam filtering, and sentiment analysis. It is also easy to implement and can handle
