
PR 6

The document outlines a practical assignment using Python for data analysis on the Iris dataset, including data loading, preprocessing, and model training with Gaussian Naive Bayes. It details the steps taken to split the data, fit the model, and evaluate its performance using confusion matrices and classification reports. The results indicate perfect accuracy and performance metrics for the model.

Uploaded by

omkarmagdum818

Name : omkar magdum

Roll No : COT53

DSBDA Practical 6

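In [1]: import numpy as np
        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        import matplotlib.pyplot as plt
        import seaborn as sns
        # The export cuts this import off after "classificat"; classification_report
        # and accuracy_score are the two names the later cells call.
        from sklearn.metrics import (confusion_matrix, ConfusionMatrixDisplay,
                                     classification_report, accuracy_score)
        from sklearn.preprocessing import LabelEncoder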

In [2]: data = pd.read_csv("https://raw.githubusercontent.com/venky14/Machine-Learning-

In [3]: data

Out[3]:
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
0      1            5.1           3.5            1.4           0.2     Iris-setosa
1      2            4.9           3.0            1.4           0.2     Iris-setosa
2      3            4.7           3.2            1.3           0.2     Iris-setosa
3      4            4.6           3.1            1.5           0.2     Iris-setosa
4      5            5.0           3.6            1.4           0.2     Iris-setosa
...  ...            ...           ...            ...           ...             ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica
149  150            5.9           3.0            5.1           1.8  Iris-virginica

150 rows × 6 columns
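
The URL in cell 2 is cut off in this export, so the CSV cannot be fetched from it as written. As a stand-in, and assuming the file is the usual 150-row Iris data, a frame with the same layout can be built from the copy bundled with scikit-learn. The column renaming and the `Iris-` prefix below are assumptions made to mimic the table above, not the notebook's own loading code, and the bundled copy may differ from the CSV in a few entries.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Bundled copy of Iris; assumed (not confirmed) to match the notebook's CSV.
iris = load_iris(as_frame=True)
frame = iris.frame.copy()

# Rename to the column names shown in Out[3] (assumed mapping).
frame.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm",
                 "PetalWidthCm", "Species"]

# Map the integer targets to the "Iris-<name>" strings used in the notebook.
frame["Species"] = frame["Species"].map(
    lambda code: "Iris-" + iris.target_names[code])

# Add a 1-based Id column in front, as in the CSV.
frame.insert(0, "Id", range(1, len(frame) + 1))

print(frame.shape)
print(frame.head())
```

With a frame built this way, the later cells that index columns by position (`iloc[:,1:5]` and `iloc[:,5:]`) select the same four features and the species column.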

In [4]: data.head(5)

Out[4]:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

In [5]: data.tail()

Out[5]:
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica
149  150            5.9           3.0            5.1           1.8  Iris-virginica

In [6]: data.describe(include = 'all')

Out[6]:
                Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
count   150.000000     150.000000    150.000000     150.000000    150.000000      ...
unique         NaN            NaN           NaN            NaN           NaN      ...
top            NaN            NaN           NaN            NaN           NaN      ...
freq           NaN            NaN           NaN            NaN           NaN      ...
mean     75.500000       5.843333      3.054000       3.758667      1.198667      ...
std      43.445368       0.828066      0.433594       1.764420      0.763161      ...
min       1.000000       4.300000      2.000000       1.000000      0.100000      ...
25%      38.250000       5.100000      2.800000       1.600000      0.300000      ...
50%      75.500000       5.800000      3.000000       4.350000      1.300000      ...
75%     112.750000       6.400000      3.300000       5.100000      1.800000      ...
max     150.000000       7.900000      4.400000       6.900000      2.500000      ...

In [7]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

In [8]: print(data.shape)
        data['Species'].unique()

(150, 6)
Out[8]: array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [9]: data.isnull().sum()

Out[9]: Id               0
        SepalLengthCm    0
        SepalWidthCm     0
        PetalLengthCm    0
        PetalWidthCm     0
        Species          0
        dtype: int64

In [10]: x = data.iloc[:,1:5]
         y = data.iloc[:,5:]

In [11]: encode = LabelEncoder()
         y = encode.fit_transform(y)

C:\Users\coeco\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py:115: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
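
The DataConversionWarning above appears because `data.iloc[:,5:]` returns a one-column DataFrame of shape (n, 1), while `LabelEncoder` expects a 1-D array. A minimal sketch of two ways to pass 1-D labels instead, using a tiny hypothetical frame (the four rows below are illustrative, not taken from the dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny stand-in frame (hypothetical rows, same column name as the notebook).
demo = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Species": ["Iris-setosa", "Iris-versicolor",
                "Iris-virginica", "Iris-setosa"],
})

column_frame = demo.iloc[:, 1:]   # shape (4, 1): the form that triggers the warning
labels = demo["Species"]          # shape (4,): selecting by name gives a 1-D Series

# Option 1: encode the 1-D Series directly.
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# Option 2: keep the iloc slice but flatten it, as the warning text suggests.
flat = column_frame.to_numpy().ravel()
encoded_flat = LabelEncoder().fit_transform(flat)

print(column_frame.shape, labels.shape)
print(encoded, list(encoder.classes_))
```

`LabelEncoder` assigns integer codes in sorted order of the class names, which is why setosa, versicolor and virginica become 0, 1 and 2.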

In [12]: x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.3,random_stat
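
The argument list in cell 12 is cut off after `random_stat`, so the seed the notebook used is not recoverable from this export. The sketch below shows the same 70/30 split on placeholder arrays; `random_state=0` and the synthetic `X_demo` and `y_demo` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the notebook's shape: 150 rows, 4 features, 3 classes.
X_demo = np.arange(600).reshape(150, 4)
y_demo = np.repeat([0, 1, 2], 50)

# test_size=0.3 holds out 45 of the 150 rows; the seed value is hypothetical.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

# Fixing random_state makes the shuffle reproducible across runs.
X_train2, X_test2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test2))
```

A 45-row test set agrees with the total support of 45 shown in the classification report later in the notebook.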

In [13]: naive_bayes = GaussianNB()
         naive_bayes.fit(x_train,y_train)
         pred = naive_bayes.predict(x_test)
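
To make the fit-and-predict step less of a black box, here is a rough sketch of the calculation Gaussian Naive Bayes performs: for each class it adds the log prior to the sum of per-feature Gaussian log densities and picks the class with the highest total. The sketch uses scikit-learn's bundled Iris copy as a stand-in for the notebook's CSV (an assumption), and `log_joint` is a helper name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Bundled Iris copy; assumed stand-in for the notebook's data.
X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)

# Fitted per-class feature means and variances. Older scikit-learn
# releases expose the variances as sigma_ instead of var_.
means = model.theta_
variances = model.var_ if hasattr(model, "var_") else model.sigma_

def log_joint(samples):
    """Return log P(class) + sum_j log N(x_j | mean_j, var_j) per class."""
    scores = []
    for k in range(len(model.classes_)):
        log_prior = np.log(model.class_prior_[k])
        log_likelihood = -0.5 * np.sum(
            np.log(2.0 * np.pi * variances[k])
            + (samples - means[k]) ** 2 / variances[k],
            axis=1)
        scores.append(log_prior + log_likelihood)
    return np.column_stack(scores)

# The predicted class is the one with the largest log joint score.
manual_pred = model.classes_[np.argmax(log_joint(X), axis=1)]
agreement = np.mean(manual_pred == model.predict(X))
print(agreement)
```

The "naive" part is the sum over features: each feature is treated as independent of the others once the class is known.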

In [14]: pred

Out[14]: array([2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1,
0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1, 1, 1, 2, 0, 2, 0,
0])

In [15]: y_test

Out[15]: array([2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1,
0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1, 1, 1, 2, 0, 2, 0,
0])

In [16]: matrix = confusion_matrix(y_test,pred,labels = naive_bayes.classes_)
         print(matrix)
         # 2x2 matrix restricted to classes 1 and 0, with class 1 as the positive class;
         # samples of class 2 are not counted in tp, fn, fp, tn.
         tp, fn, fp, tn = confusion_matrix(y_test,pred,labels=[1,0]).reshape(-1)

[[16  0  0]
 [ 0 18  0]
 [ 0  0 11]]
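
The `labels=[1,0]` call in cell 16 builds a 2×2 matrix from classes 1 and 0 only, so class 2 does not enter tp, fn, fp or tn. One common alternative for a three-class problem is to derive one-vs-rest counts for every class from the full matrix. The sketch below does this for the matrix printed above; the per-class array names are introduced here and are not part of the notebook.

```python
import numpy as np

# Confusion matrix printed by cell 16 (rows: true class, columns: predicted).
matrix = np.array([[16, 0, 0],
                   [0, 18, 0],
                   [0, 0, 11]])

total = matrix.sum()
tp_per_class = np.diag(matrix)                      # class k predicted as k
fn_per_class = matrix.sum(axis=1) - tp_per_class    # class k predicted as another class
fp_per_class = matrix.sum(axis=0) - tp_per_class    # another class predicted as k
tn_per_class = total - tp_per_class - fn_per_class - fp_per_class

recall = tp_per_class / (tp_per_class + fn_per_class)
specificity = tn_per_class / (tn_per_class + fp_per_class)
precision = tp_per_class / (tp_per_class + fp_per_class)

for k in range(matrix.shape[0]):
    print(k, tp_per_class[k], fn_per_class[k], fp_per_class[k], tn_per_class[k],
          recall[k], specificity[k], precision[k])
```

Because the matrix is diagonal, every class has zero false positives and false negatives, so all three rates come out as 1.0 for each class.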

In [17]: conf_matrix = ConfusionMatrixDisplay(confusion_matrix=matrix,display_labels=nai
         conf_matrix.plot(cmap=plt.cm.YlGn)
         plt.show()

In [18]: print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        18
           2       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

In [19]: print('\nAccuracy: {:.2f}'.format(accuracy_score(y_test,pred)))
         print('Error Rate: ',(fp+fn)/(tp+tn+fn+fp))
         print('Sensitivity (Recall or True positive rate) :',tp/(tp+fn))
         print('Specificity (True negative rate) :',tn/(fp+tn))
         print('Precision (Positive predictive value) :',tp/(tp+fp))
         print('False Positive Rate :',fp/(tn+fp))

Accuracy: 1.00
Error Rate: 0.0
Sensitivity (Recall or True positive rate) : 1.0
Specificity (True negative rate) : 1.0
Precision (Positive predictive value) : 1.0
False Positive Rate : 0.0
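
The rates in cell 19 are computed from the two-class counts of cell 16. As a cross-check that covers all three classes, the overall accuracy, the error rate and the macro-averaged recall can also be read straight from scikit-learn. The sketch below retypes the label array shown in Out[14] and Out[15], which print identical values; the variable names `labels_pred` and `labels_true` are introduced here for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Values as printed in Out[14]; Out[15] (y_test) shows the same 45 labels.
labels_pred = np.array([
    2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1,
    0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1, 1, 1, 2, 0, 2, 0,
    0])
labels_true = labels_pred.copy()

accuracy = accuracy_score(labels_true, labels_pred)
error_rate = 1.0 - accuracy   # error rate is the complement of accuracy

# output_dict=True returns the report as nested dicts keyed by class label.
report = classification_report(labels_true, labels_pred, output_dict=True)

print(accuracy, error_rate)
print(report["macro avg"]["recall"], report["1"]["support"])
```

A perfect score on 45 held-out rows is plausible for Iris, but it reflects a single split; cross-validation would give a steadier estimate.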

In [ ]:

In [ ]:
