ML Lab1 PGM

This document outlines a Python program that implements the Naive Bayes algorithm for classifying Iris flowers using the Iris dataset. It details the steps involved, including importing libraries, loading the dataset, splitting it into training and test sets, feature scaling, training the model, and evaluating its accuracy with a confusion matrix. The model achieved an accuracy of 96.67% in classifying the flowers based on their features.

Uploaded by

vishnun2811


1.

Write a Python program to load iris data set and apply Naïve-Bayes algorithm for
classification of Iris flowers.

Overview of Naive Bayes Classification:

Naive Bayes is a classification algorithm that is hard to overlook because of its special
characteristic of being “naive”: it assumes that the features of a measurement are independent
of each other.

For example, an animal may be considered a cat if it has cat eyes, whiskers and a long tail.
Even if these features depend on each other or on the presence of the other features, Naive
Bayes treats each of them as contributing independently to the probability that the animal is a
cat, and that is why it is called ‘Naive’.

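Bayes’ Theorem itself makes no assumption about the features; the naive part is the extra
assumption that the features are conditionally independent given the class. For two independent
events, P(A,B) = P(A)P(B), so under this assumption the likelihood of the features factorizes as
P(x1, ..., xn | y) = P(x1 | y) * ... * P(xn | y). This assumption rarely holds exactly in practice,
which accounts for the “naive” part of Naive Bayes. Bayes’ Theorem is stated as:
P(a|b) = (P(b|a) * P(a)) / P(b), where P(a|b) is the probability of a given b.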
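The factorization above can be sketched in a few lines. This is a toy illustration under stated assumptions: the function name naive_bayes_posterior and all the prior and likelihood numbers for the cat example are hypothetical, chosen only to show how the per-feature probabilities are multiplied and then normalized by P(b).

```python
import math


def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed features) under the naive independence assumption.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}  (hypothetical toy numbers)
    observed:    list of feature names present in the sample
    """
    unnormalized = {}
    for cls, prior in priors.items():
        # Naive assumption: P(f1, f2, ... | class) = product of P(fi | class)
        unnormalized[cls] = prior * math.prod(likelihoods[cls][f] for f in observed)
    # P(b) in Bayes' theorem is the sum of the joint terms over all classes
    evidence = sum(unnormalized.values())
    return {cls: value / evidence for cls, value in unnormalized.items()}


# Hypothetical numbers for the cat example (not taken from any real dataset)
priors = {"cat": 0.5, "dog": 0.5}
likelihoods = {
    "cat": {"cat_eyes": 0.9, "whiskers": 0.9, "long_tail": 0.8},
    "dog": {"cat_eyes": 0.1, "whiskers": 0.6, "long_tail": 0.7},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["cat_eyes", "whiskers", "long_tail"])
print(posterior)  # cat ≈ 0.939, dog ≈ 0.061
```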
Let us understand this algorithm with a simple example: we want the probability that a student
passes the exam given that the student wears a “red” dress on exam day. We can solve this using
the posterior probability from Bayes’ Theorem discussed above.

By Bayes’ Theorem, P(Pass|Red) = P(Red|Pass) * P(Pass) / P(Red).

Suppose that, out of 14 students, 9 passed and 5 wore red, and 3 of the 9 students who passed wore
red. Then P(Red|Pass) = 3/9 = 0.33, P(Red) = 5/14 = 0.36 and P(Pass) = 9/14 = 0.64. Now,
P(Pass|Red) = (3/9 * 9/14) / (5/14) = 3/5 = 0.60. Since P(Fail|Red) = 1 - 0.60 = 0.40, passing is
the more probable outcome for a student who wears red.
In the same way, Naive Bayes computes the probability of each class from the various attributes
and predicts the class with the highest posterior probability.
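The worked example can be checked with exact fractions, which avoids the rounding of the intermediate values 0.33, 0.64 and 0.36. The counts (14 students, 9 passed, 5 wore red, 3 passed and wore red) are one reading of the fractions given in the example.

```python
from fractions import Fraction

# Counts implied by the example: 14 students, 9 passed, 5 wore red,
# and 3 of the 9 students who passed wore red.
p_red_given_pass = Fraction(3, 9)
p_pass = Fraction(9, 14)
p_red = Fraction(5, 14)

# Bayes' theorem: P(Pass | Red) = P(Red | Pass) * P(Pass) / P(Red)
p_pass_given_red = p_red_given_pass * p_pass / p_red
# With only two outcomes, P(Fail | Red) is the complement
p_fail_given_red = 1 - p_pass_given_red

print(p_pass_given_red, float(p_pass_given_red))  # 3/5 0.6
print(p_fail_given_red)                           # 2/5
```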

Problem Analysis:

To implement Naive Bayes classification, we use the well-known Iris Flower Dataset, which
consists of 3 classes of flowers. It has 4 independent variables, namely sepal_length,
sepal_width, petal_length and petal_width. The dependent variable is the species, which we will
predict from the four features of each flower.

There are 3 species: setosa, versicolor and virginica. This dataset was originally introduced
in 1936 by Ronald Fisher. Using the features of a flower (the independent variables), we have to
classify it with a Naive Bayes Classification model.

Step 1: Importing the Libraries

As always, the first step is to import the libraries: NumPy, Pandas and Matplotlib.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2: Importing the dataset

In this step, we import the Iris Flower dataset, stored as IrisDataset.csv in the
mk-gurucharan/Classification GitHub repository, and save it to the variable dataset. We then
assign the 4 independent variables to X and the dependent variable ‘species’ to y, and display
the first 5 rows of the dataset.

dataset = pd.read_csv(
    'https://raw.githubusercontent.com/mk-gurucharan/Classification/master/IrisDataset.csv')

X = dataset.iloc[:,:4].values
y = dataset['species'].values

dataset.head(5)

>>
sepal_length  sepal_width  petal_length  petal_width  species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa
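If the CSV URL in Step 2 cannot be reached, a similar DataFrame can be built from the copy of the Iris data bundled with scikit-learn. This is a swapped-in data source, not the author's: the column renaming mirrors the CSV's headers, and published copies of the Iris data are known to differ in a few values, so results may not match the CSV exactly.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data replaces the GitHub CSV
iris = load_iris(as_frame=True)
dataset = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length",
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})
# The bundled target is encoded as 0/1/2; map it back to species names
dataset["species"] = dataset.pop("target").map(dict(enumerate(iris.target_names)))

X = dataset.iloc[:, :4].values
y = dataset["species"].values
print(dataset.head(5))
```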

Step 3: Splitting the dataset into the Training set and Test set

Once we have loaded the dataset, we split it into a training set and a test set. The dataset has
150 rows, 50 for each of the 3 classes. Because the rows are ordered by class, the split must be
random; train_test_split shuffles the rows by default. Here, test_size=0.2 means that 20% of the
rows (30) are used as the test set and the remaining 80% (120) are used to train the Naive Bayes
classification model. Because the split is random, the exact accuracy obtained in Step 7 can vary
from run to run unless a fixed random_state is passed.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size = 0.2)
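A variant of this split that keeps the class balance and is repeatable is sketched below. The stratify and random_state arguments are real train_test_split parameters but are not used in the original program, and the seed 0 is an arbitrary choice; the bundled Iris data stands in for the CSV so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # stand-in for the CSV data loaded in Step 2

# stratify=y keeps the 50/50/50 class balance in both splits;
# random_state=0 is an arbitrary seed that makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```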

Step 4: Feature Scaling

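StandardScaler standardizes each feature to zero mean and unit variance. The mean and standard
deviation of each column are learned from X_train (fit_transform), and the same values are then
applied to X_test (transform), so no information from the test set leaks into training. Gaussian
Naive Bayes estimates a separate mean and variance for every feature, so it is largely
insensitive to this rescaling; the step is kept here as common preprocessing practice.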

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
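To see what the scaler actually computes, the sketch below reproduces its output by hand on toy random data (not the Iris values). It assumes StandardScaler's default behavior: subtract the training mean and divide by the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(120, 4))  # toy data, not Iris
X_test = rng.normal(loc=5.0, scale=2.0, size=(30, 4))

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# Manual equivalent: statistics come from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)               # population std (ddof=0)
manual_train = (X_train - mean) / std
manual_test = (X_test - mean) / std     # test data reuses training statistics

print(np.allclose(X_train_scaled, manual_train))  # True
print(np.allclose(X_test_scaled, manual_test))    # True
```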

Step 5: Training the Naive Bayes Classification model on the Training Set

In this step, we use the GaussianNB class from the sklearn.naive_bayes module. The Gaussian model
assumes that each feature follows a normal distribution within each class; scikit-learn also
provides other variants such as BernoulliNB, CategoricalNB and MultinomialNB. We assign a
GaussianNB instance to the variable classifier and fit it on X_train and y_train to train it.

from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(X_train, y_train)
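As a rough illustration of what GaussianNB learns, here is a minimal from-scratch sketch: per-class priors, means and variances, then a prediction by the largest log posterior. The helper names fit_gaussian_nb and predict_gaussian_nb are hypothetical, and the var_smoothing term is assumed to mirror scikit-learn's default of adding a small fraction of the largest feature variance; the snippet compares its predictions with GaussianNB on the bundled Iris data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    # Small variance floor, assumed to mirror GaussianNB's default smoothing
    epsilon = var_smoothing * np.var(X, axis=0).max()
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + epsilon
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Pick the class maximizing log P(class) + sum_j log N(x_j; mean, var)."""
    classes, priors, means, variances = model
    # Broadcast to shape (n_samples, n_classes, n_features), then sum over features
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :]
        + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    ).sum(axis=2)
    joint = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(joint, axis=1)]


X, y = load_iris(return_X_y=True)
model = fit_gaussian_nb(X, y)
manual_pred = predict_gaussian_nb(model, X)
sk_pred = GaussianNB().fit(X, y).predict(X)
print("agreement with GaussianNB:", np.mean(manual_pred == sk_pred))
```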

Step 6: Predicting the Test set results

Once the model is trained, we use classifier.predict() to predict the classes of the test set
and store the predictions in the variable y_pred.

y_pred = classifier.predict(X_test)
y_pred
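Besides hard labels, GaussianNB also exposes the posterior probabilities it compares. The sketch below, which uses the bundled Iris data and an arbitrary random_state=0 so that it runs on its own, checks that each row of predict_proba sums to 1 and that predict() returns the class with the largest posterior.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifier = GaussianNB().fit(X_train, y_train)
y_pred = classifier.predict(X_test)
proba = classifier.predict_proba(X_test)  # one column per entry of classifier.classes_

# Each row is a posterior distribution over the three species
print(classifier.classes_)
print(proba[:3].round(3))
# predict() picks the class with the largest posterior probability
print(np.array_equal(classifier.classes_[proba.argmax(axis=1)], y_pred))  # True
```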

Step 7: Confusion Matrix and Accuracy

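This is a standard step for classification models: we compute the accuracy of the trained model
and display the confusion matrix.

The confusion matrix is a table that shows the number of correct and incorrect predictions on a
classification problem when the true labels of the test set are known. In scikit-learn's
confusion_matrix, each row corresponds to an actual class and each column to a predicted class,
with the classes in sorted order (here setosa, versicolor, virginica).

The diagonal entries (the true predictions) count the correct predictions; every off-diagonal
entry counts a misclassification.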
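To make the row and column convention concrete, the sketch below builds a confusion matrix by hand (rows are actual classes, columns are predicted classes) and compares it with scikit-learn's confusion_matrix. The helper confusion_matrix_by_hand and the five labels are hypothetical toy values.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def confusion_matrix_by_hand(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[index[actual], index[predicted]] += 1
    return cm


labels = ["setosa", "versicolor", "virginica"]
y_true = ["setosa", "virginica", "versicolor", "virginica", "setosa"]  # toy labels
y_hat = ["setosa", "versicolor", "versicolor", "virginica", "setosa"]

manual = confusion_matrix_by_hand(y_true, y_hat, labels)
print(manual)
print(np.array_equal(manual, confusion_matrix(y_true, y_hat, labels=labels)))  # True
```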

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

from sklearn.metrics import accuracy_score

print ("Accuracy : ", accuracy_score(y_test, y_pred))
cm

>>Accuracy : 0.9666666666666667

>>array([[14,  0,  0],
       [ 0,  7,  0],
       [ 0,  1,  8]])

From the above confusion matrix, we infer that out of 30 test samples, 29 were correctly
classified and only 1 was misclassified (a virginica predicted as versicolor), which gives an
accuracy of 29/30 = 96.67%.
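The accuracy can be read directly off the confusion matrix: correct predictions sit on the diagonal, so accuracy is the trace divided by the total count. The sketch below hard-codes the matrix reported above; the per-class recall line is an extra illustration, not part of the original program.

```python
import numpy as np

# Confusion matrix reported above (rows: actual, columns: predicted)
cm = np.array([[14, 0, 0],
               [0, 7, 0],
               [0, 1, 8]])

correct = np.trace(cm)  # diagonal = correct predictions
total = cm.sum()        # all test samples
accuracy = correct / total
print(correct, total, accuracy)  # 29 30 0.9666666666666667

# Per-class recall: correct predictions divided by the row (actual class) totals
print(cm.diagonal() / cm.sum(axis=1))
```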

Step 8: Comparing the Real Values with Predicted Values

In this step, a Pandas DataFrame is created to compare the actual labels of the test set (y_test)
with the predicted labels (y_pred).

df = pd.DataFrame({'Real Values': y_test, 'Predicted Values': y_pred})
df

>>
Real Values  Predicted Values
setosa       setosa
setosa       setosa
virginica    virginica
versicolor   versicolor
setosa       setosa
setosa       setosa
...          ...
virginica    versicolor
virginica    virginica
setosa       setosa
setosa       setosa
versicolor   versicolor
versicolor   versicolor

This step is optional and less informative than the confusion matrix; a side-by-side comparison
like this is more commonly used in regression to inspect the predicted values.
As the table shows, there is one incorrect prediction, where the model predicted versicolor
instead of virginica.
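With 30 rows, scanning the table by eye works, but a boolean mask finds the misclassified rows directly. The sketch below uses short hypothetical label lists in place of the real y_test and y_pred arrays.

```python
import pandas as pd

# Toy stand-ins for the y_test / y_pred arrays from the steps above
y_test = ["setosa", "virginica", "versicolor", "virginica", "setosa"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica", "setosa"]

df = pd.DataFrame({"Real Values": y_test, "Predicted Values": y_pred})

# Keep only the rows where the prediction disagrees with the true label
mistakes = df[df["Real Values"] != df["Predicted Values"]]
print(mistakes)
```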

Conclusion

Thus, in this program, we have built a Naive Bayes Classification model that classifies a flower
from its 4 characteristic features. The same steps can be applied to, and tested on, other
classification datasets that are available online.
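One way to reuse the steps above on other datasets is to wrap the scaling, training and evaluation in a single function. This is a sketch, not the author's code: evaluate_gaussian_nb is a hypothetical helper, make_pipeline replaces the separate scaler and classifier objects, and the Wine and Breast Cancer datasets bundled with scikit-learn are used as examples of other classification datasets.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_gaussian_nb(X, y, seed=0):
    """Split, scale, fit GaussianNB and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # The pipeline fits the scaler on the training split only
    model = make_pipeline(StandardScaler(), GaussianNB())
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


loaders = {"iris": load_iris, "wine": load_wine, "breast_cancer": load_breast_cancer}
results = {}
for name, loader in loaders.items():
    X, y = loader(return_X_y=True)
    results[name] = evaluate_gaussian_nb(X, y)
    print(f"{name}: {results[name]:.3f}")
```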
