
Machine Learning Lab Manual (B.Tech. 7th Sem CSE)

CHOUKSEY ENGINEERING COLLEGE, BILASPUR (CG)


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

B.Tech. 7th Semester CSE

LAB MANUAL

Subject: Machine Learning Lab

Course Code: D022721(022)

Dept. of CSE Dr. Shanu K Rakesh



LIST OF EXPERIMENTS
Program / Semester: B.Tech 7th    Branch: Computer Science & Engineering
Subject: Machine Learning Lab    Course Code: D022721(022)

Sr. No.  Practicals  (Associated CO)

1. Write programs to understand the use of Matplotlib for Simple Interactive Chart, Set the Properties of the Plot, matplotlib and NumPy. (CO1)
2. Write programs to understand the use of Matplotlib for Working with Multiple Figures and Axes, Adding Text, Adding a Grid and Adding a Legend. (CO1)
3. Write programs to understand the use of Matplotlib for Working with Line Chart, Histogram, Bar Chart, Pie Charts. (CO1)
4. Write a program in Python to implement Linear Regression for house price prediction. (CO2)
5. Write a program in Python to implement K Nearest Neighbor classifier for diabetes classification. (CO2)
6. Build a Naive Bayes model in Python to tackle a spam classification problem. (CO2)
7. Write a Python code to tackle a multi-class classification problem where the challenge is to classify wine into three types using Decision Tree. (CO2)
8. Write a program in Python to implement Support Vector Machine for diabetes classification. (CO2)
9. Demonstrate the application of Artificial Neural Network using Python. (CO3, CO4)

Course Outcomes

1. Apply NumPy along with Matplotlib for visual analysis of data.

2. Apply Supervised Learning models for problem solving.

3. Apply Unsupervised Learning models for problem solving.

4. Apply Artificial Neural Network for problem solving.


Practical 1: Write programs to understand the use of Matplotlib for Simple Interactive Chart, Set the Properties of the Plot, matplotlib and NumPy.
Creating a Simple Plot
Import Matplotlib's pyplot module and the NumPy library, since most of the data we will be working with is in the form of arrays.
# Importing Libraries
import matplotlib.pyplot as plt
import numpy as np

# plotting the points


plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# giving chart title


plt.title('Plot Title')

# giving x and y axis title


plt.xlabel('x axis label')
plt.ylabel('y axis label')

# to show Grid
plt.grid(True)

# function to show the plot


plt.show()


We can also specify the size of the figure using the figure() method, passing the width and height (in inches) as a tuple to the figsize argument.
import matplotlib.pyplot as plt
import numpy as np

# Creating a new figure with width = 15 inches and height = 5 inches


plt.figure(figsize=(15,5))

# plotting the points


plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# To show plot
plt.show()

plt.plot([1, 2, 3, 4], [1, 4, 9, 16],'go') # go means green circles


plt.title('Plot Title') # chart title
plt.xlabel('x axis label') # x axis title
plt.ylabel('y axis label') # y axis title
plt.grid(True)
plt.show()


Basic functions (or methods) in matplotlib (a short example combining a few of them follows this list):


plot() - it creates the plot in the background; it does not display it. We can also pass a label as its argument, giving the name by which we refer to this plot - this is utilized in legend()

show() - it displays the created plots

xlabel() - it labels the x-axis

ylabel() - it labels the y-axis

title() - it gives the title to the graph

xticks() - it decides how the markings are to be made on the x-axis

yticks() - it decides how the markings are to be made on the y-axis

annotate() - it is used to write comments on the graph at the specified position

figure(figsize = (x, y)) - used whenever we want the result to be displayed in a separate window; the figsize argument decides the initial size of the window that will be displayed after the run

subplot(r, c, i) - it is used to create multiple plots in the same figure, where r signifies the number of rows in the figure, c signifies the number of columns, and i specifies the position of the particular plot

suptitle("string") — It adds a common title to the figure specified by the string

subplots(nrows, ncols, figsize) — a convenient way to create a figure and its subplots in a single call. It returns a tuple of a Figure and an Axes object (or an array of Axes objects).

set_xticks - it is used to set the range and the step size of the markings on the x-axis in a subplot

set_yticks - it is used to set the range and the step size of the markings on the y-axis in a subplot

bar(categorical variables, values, color) — used to create vertical bar graphs

barh(categorical variables, values, color) — used to create horizontal bar graphs

legend(loc) — used to make legend of the graph

xticks(index, categorical variables) — Get or set the current tick locations and labels of the x-axis

pie(value, categorical variables) — used to create a pie chart

hist(values, number of bins) — used to create a histogram

xlim(start value, end value) — used to set the limit of values of the x-axis

ylim(start value, end value) — used to set the limit of values of the y-axis

scatter(x-axis values, y-axis values) — plots a scatter plot with x-axis values against y-axis values

axes() — adds an axes to the current figure

set_xlabel("string") — axes level method used to set the x-label of the plot specified as a string

set_ylabel("string") — axes level method used to set the y-label of the plot specified as a string

scatter3D(x-axis values, y-axis values) — plots a three-dimensional scatter plot with x-axis values against y-axis
values

plot3D(x-axis values, y-axis values) — plots a three-dimensional line graph with x-axis values against y-axis values
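
The short sketch below is not part of the original manual; it combines a few of the functions listed above (xticks(), xlim(), annotate(), legend()) in one simple plot, with arbitrary example data.

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)                      # sample x values (arbitrary)
y = x ** 2                                # sample y values

plt.figure(figsize=(6, 4))
plt.plot(x, y, label='y = x^2')           # label is later picked up by legend()
plt.xticks(np.arange(0, 11, 2))           # tick marks on the x-axis every 2 units
plt.xlim(0, 10)                           # limit of values on the x-axis
plt.annotate('fast growth', xy=(8, 64))   # comment placed at the specified position
plt.legend(loc='best')                    # legend built from the plot labels
plt.title('Demo of basic Matplotlib functions')
plt.show()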


Practical 2: Write programs to understand the use of Matplotlib for Working with Multiple Figures and Axes, Adding Text, Adding a Grid and Adding a Legend.
Working with multiple figures
Matplotlib takes care of the creation of inbuilt defaults like Figure and Axes.
• Figure: This class is the top-level container for all the plots means it is the overall window or page
on which everything is drawn. A figure object can be considered as a box-like container that can
hold one or more axes.
• Axes: This class is the most basic and flexible component for creating sub-plots. You might confuse Axes with the plural of axis, but here it refers to an individual plot or graph. A given figure may contain many Axes, but a given Axes object can belong to only one figure.

Plot multiple sets of data in the plot()

# Importing Libraries
import matplotlib.pyplot as plt
import numpy as np

# Plot multiple sets of data by passing in multiple sets of X and Y arguments in the plot()
x = np.arange(1, 5)
y = x**3

plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'go', x, y, 'r^')  # 'go' means green circles, 'r^' means red triangles

plt.title('Plot Title') # chart title


plt.xlabel('x axis label') # x axis title
plt.ylabel('y axis label') # y axis title
plt.grid(True)
plt.show()


Multiple plots in one figure:


Method 1: using subplot() method
# Multiple plots in one figure
# If we want our sub-plots in two rows and a single column, we can pass the arguments (2,1,1) and (2,1,2)

plt.subplot(1,2,1)  # the figure has 1 row, 2 columns, and this is the 1st plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'go')
plt.title("1st Subplot (1,2,1)")

plt.subplot(1,2,2)  # the figure has 1 row, 2 columns, and this is the 2nd plot
plt.plot(x, y, 'r^')
plt.title("2nd Subplot (1,2,2)")

plt.suptitle("My subplots")
plt.show()

Note: the subplot() function has the following disadvantages:



• It does not allow adding multiple subplots at the same time.


• It deletes the preexisting plot of the figure.

# Multiple plots in one figure
# If we want our sub-plots in two rows and a single column, we can pass the arguments (2,1,1) and (2,1,2)

plt.subplot(2,1,1)  # the figure has 2 rows, 1 column, and this is the 1st plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'go')
plt.title("1st Subplot")
plt.grid()

plt.subplot(2,1,2)  # the figure has 2 rows, 1 column, and this is the 2nd plot
plt.plot(x, y, 'r^')
plt.title("2nd Subplot")
plt.grid()
plt.suptitle("My subplots")
plt.show()

Method 2: Using Figure and axes and subplots() method


 subplots() provides a solution to the above subplot() problems
 The subplots() function is used to create a figure and multiple subplots at the same time. It is a wrapper function that makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.
 This function returns a Figure (fig) and an Axes (ax) object or an array of Axes objects.


# Using Figure and ax using subplots() method


x=np.arange(1,5)
y=x**3

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(8,8))  # returns a Figure object (fig) and an array of Axes objects (ax)

ax[0,1].plot([1, 2, 3, 4], [1, 4, 9, 16],'go')


ax[1,0].plot(x, y,'r^')

ax[0,0].set_title("(ax[0,0])")
ax[0,1].set_title("Squares (ax[0,1])")

ax[1,0].set_title("Cubes (ax[1,0])")
ax[1,1].set_title("(ax[1,1])")

Text(0.5, 1.0, '(ax[1,1])')

Adding text in the Plot


import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-10, 10, 0.01)


y = x**2

#adding text inside the plot


plt.text(-5, 60, 'Parabola $Y = x^2$', fontsize = 22) # x,y,text, fontsize

plt.plot(x, y, c='g')


plt.xlabel("X-axis", fontsize = 15)


plt.ylabel("Y-axis",fontsize = 15)
plt.show()

Get the Figure and Axes all at once, adding text, legend and grid, save figure
import matplotlib.pyplot as plt
import numpy as np

# get the Figure and Axes all at once


fig, ax = plt.subplots(figsize=(8,6))

x1 = np.arange(0, 10, 0.1)


y1 = np.sin(x1)

x2 = np.arange(0, 10, 0.1)


y2 = np.cos(x2)

ax.plot(x1, y1, label='sin') #Label is required for setting legend


ax.plot(x2, y2, label='cos')

# Adding Text
ax.text(3.5, 0.9, 'Sin & Cos wave', fontsize = 15)

# X& y axis title


plt.xlabel('X-axis', fontsize = 15)
plt.ylabel('Y-axis', fontsize = 15)

# add grid, legend, title and save


ax.grid(True)
ax.legend(loc='best') # loc= best places the legend in best place

fig.suptitle('A Simple Multi Axis Plot')


fig.savefig('multiAxisPlot.png', dpi=125)
plt.show()


Practical 3: Write programs to understand the use of Matplotlib for Working with Line Chart, Histogram, Bar Chart, Pie Charts.
Line Graph
A line chart is used to represent a relationship between two sets of data, X and Y, plotted against different axes. It is drawn using the plot() function.
# Importing Libraries
import matplotlib.pyplot as plt
import numpy as np

# data to display on plots


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# This will plot a simple line chart


# with elements of x as x axis and y
# as y axis
plt.plot(x, y)
plt.title("Line Chart")
plt.grid()

# Adding the legends


plt.legend(["Line"])
plt.show()


Bar chart
A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths and heights are proportional to the values they represent. The bar plots can be plotted horizontally or vertically. A bar chart describes comparisons between discrete categories. It can be created using the bar() method.
import matplotlib.pyplot as plt
# data to display on plots
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 9, 10, 6, 3]

# This will plot a simple bar chart


plt.bar(x, y)

# Title to the plot


plt.title("Bar Chart")
plt.xlabel("x-axix")
plt.ylabel("y-axix")

# Adding the legends


plt.legend(["bar"])
plt.show()


Histograms
A histogram is basically used to represent data in the form of some groups. It is a type of bar plot where
the X-axis represents the bin ranges while the Y-axis gives information about frequency. To create a
histogram the first step is to create a bin of the ranges, then distribute the whole range of the values into
a series of intervals, and count the values which fall into each of the intervals. Bins are clearly identified
as consecutive, non-overlapping intervals of variables. The hist() function is used to compute and create
histogram of x.
import matplotlib.pyplot as plt

# data to display on plots


x = [1, 2, 3, 4, 5, 6, 7, 4, 2, 2, 3, 5, 7, 4, 4, 6]
bins = [1, 2, 3, 4, 5, 6, 7]

# This will plot a simple histogram


plt.hist(x, bins, rwidth=0.8)

# Title to the plot


plt.title("Histogram")
plt.xlabel("Bins")
plt.ylabel("count")

# Adding the legends


plt.legend(["bar"])

plt.show()


Scatter Plot
Scatter plots are used to observe the relationship between variables and use dots to represent the
relationship between them. The scatter() method in the matplotlib library is used to draw a scatter plot.
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]

# This will plot a simple scatter chart


plt.scatter(x, y)

# Adding legend to the plot


plt.legend("A")

# Title to the plot


plt.title("Scatter chart")

plt.show()


Pie Chart
A Pie Chart is a circular statistical plot that can display only one series of data. The area of the chart is the
total percentage of the given data. The area of slices of the pie represents the percentage of the parts of
the data. The slices of pie are called wedges. The area of the wedge is determined by the length of the arc
of the wedge. It can be created using the pie() method.
import matplotlib.pyplot as plt

# data to display on plots


x = [1, 2, 3, 4]

# this will explode the 1st wedge


# i.e. will separate the 1st wedge from the chart
e = (0.1, 0, 0, 0)

# This will plot a simple pie chart


plt.pie(x, explode = e)

# Title to the plot


plt.title("Pie chart")
plt.show()


Practical - 04: Write a program in Python to implement Linear Regression for house price prediction.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv(r"https://gitlab.com/scilab/forge/rdataset/-/raw/master/csv/
MASS/Boston.csv?ref_type=heads&inline=false",on_bad_lines = "skip")

df = df[df.columns[1:]]
df

crim zn indus chas nox rm age dis rad tax \


0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222
.. ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273


ptratio black lstat medv


0 15.3 396.90 4.98 24.0
1 17.8 396.90 9.14 21.6
2 17.8 392.83 4.03 34.7
3 18.7 394.63 2.94 33.4
4 18.7 396.90 5.33 36.2
.. ... ... ... ...
501 21.0 391.99 9.67 22.4
502 21.0 396.90 9.08 20.6
503 21.0 396.90 5.64 23.9
504 21.0 393.45 6.48 22.0
505 21.0 396.90 7.88 11.9

[506 rows x 14 columns]

# Data Description
 CRIM per capita crime rate by town
 ZN proportion of residential land zoned for lots over 25,000 sq.ft.
 INDUS proportion of non-retail business acres per town
 CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 NOX nitric oxides concentration (parts per 10 million)
 RM average number of rooms per dwelling
 AGE proportion of owner-occupied units built prior to 1940
 DIS weighted distances to five Boston employment centres
 RAD index of accessibility to radial highways
 TAX full-value property-tax rate per 10,000usd
 PTRATIO pupil-teacher ratio by town
 B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
 LSTAT % lower status of the population
 MEDV Median value of owner-occupied homes in $1000s
df.rename(columns={"medv":"price"},inplace=True)
df
crim zn indus chas nox rm age dis rad tax \
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222
.. ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273

ptratio black lstat price


0 15.3 396.90 4.98 24.0


1 17.8 396.90 9.14 21.6
2 17.8 392.83 4.03 34.7
3 18.7 394.63 2.94 33.4
4 18.7 396.90 5.33 36.2
.. ... ... ... ...
501 21.0 391.99 9.67 22.4
502 21.0 396.90 9.08 20.6
503 21.0 396.90 5.64 23.9
504 21.0 393.45 6.48 22.0
505 21.0 396.90 7.88 11.9

[506 rows x 14 columns]

# Information about the data


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 crim 506 non-null float64
1 zn 506 non-null float64
2 indus 506 non-null float64
3 chas 506 non-null int64
4 nox 506 non-null float64
5 rm 506 non-null float64
6 age 506 non-null float64
7 dis 506 non-null float64
8 rad 506 non-null int64
9 tax 506 non-null int64
10 ptratio 506 non-null float64
11 black 506 non-null float64
12 lstat 506 non-null float64
13 price 506 non-null float64
dtypes: float64(11), int64(3)
memory usage: 55.5 KB

# Viewing the data statistics


df.describe()

crim zn indus chas nox rm \


count 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000
mean 3.613524 11.363636 11.136779 0.069170 0.554695 6.284634
std 8.601545 23.322453 6.860353 0.253994 0.115878 0.702617
min 0.006320 0.000000 0.460000 0.000000 0.385000 3.561000
25% 0.082045 0.000000 5.190000 0.000000 0.449000 5.885500
50% 0.256510 0.000000 9.690000 0.000000 0.538000 6.208500
75% 3.677083 12.500000 18.100000 0.000000 0.624000 6.623500
max 88.976200 100.000000 27.740000 1.000000 0.871000 8.780000


age dis rad tax ptratio black \


count 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000
mean 68.574901 3.795043 9.549407 408.237154 18.455534 356.674032
std 28.148861 2.105710 8.707259 168.537116 2.164946 91.294864
min 2.900000 1.129600 1.000000 187.000000 12.600000 0.320000
25% 45.025000 2.100175 4.000000 279.000000 17.400000 375.377500
50% 77.500000 3.207450 5.000000 330.000000 19.050000 391.440000
75% 94.075000 5.188425 24.000000 666.000000 20.200000 396.225000
max 100.000000 12.126500 24.000000 711.000000 22.000000 396.900000

lstat price
count 506.000000 506.000000
mean 12.653063 22.532806
std 7.141062 9.197104
min 1.730000 5.000000
25% 6.950000 17.025000
50% 11.360000 21.200000
75% 16.955000 25.000000
max 37.970000 50.000000

# Finding out the correlation between the features


corr = df.corr()
corr.shape

(14, 14)

# Plotting the heatmap of correlation between features


plt.figure(figsize=(16,16))
sns.heatmap(corr, cbar=True, square= True, fmt='.1f', annot=True,
annot_kws={'size':15})
plt.show()


# Splitting target variable and independent variables


features_df = df[df.columns[:-1]]
target = df['price']
features_df.head()

crim zn indus chas nox rm age dis rad tax ptratio \


0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7

black lstat
0 396.90 4.98
1 396.90 9.14
2 392.83 4.03
3 394.63 2.94
4 396.90 5.33


# Splitting to training and testing data

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(features_df, target,
test_size = 0.3, random_state = 42)

print(f"The shape of X train datafame: {X_train.shape}")


print(f"The shape of X test datafame: {X_test.shape}")
print(f"The shape of y train datafame: {y_train.shape}")
print(f"The shape of y test datafame: {y_test.shape}")

The shape of X train datafame: (354, 13)


The shape of X test datafame: (152, 13)
The shape of y train datafame: (354,)
The shape of y test datafame: (152,)

Linear regression: Training the model


# Import library for Linear Regression
from sklearn.linear_model import LinearRegression

# Create a Linear regressor


lr = LinearRegression()
# Train the model using the training sets
lr.fit(X_train, y_train)
LinearRegression()

# Value of y intercept
lr.intercept_

31.631084035694286

#Converting the coefficient values to a dataframe


coeffcients = pd.DataFrame([X_train.columns,lr.coef_]).T
coeffcients = coeffcients.rename(columns={0: 'Attribute', 1: 'Coefficients'})
coeffcients

Attribute Coefficients
0 crim -0.13347
1 zn 0.035809
2 indus 0.049523
3 chas 3.119835
4 nox -15.417061
5 rm 4.057199
6 age -0.010821
7 dis -1.385998
8 rad 0.242727
9 tax -0.008702
10 ptratio -0.910685
11 black 0.011794
12 lstat -0.547113


# Model prediction on train data

y_pred = lr.predict(X_train)
y_pred.shape

(354,)

# Model Evaluation

from sklearn import metrics


print('R^2:',metrics.r2_score(y_train, y_pred))
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_train, y_pred))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))
print('MAE:',metrics.mean_absolute_error(y_train, y_pred))
print('MSE:',metrics.mean_squared_error(y_train, y_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_train, y_pred)))

R^2: 0.7434997532004697
Adjusted R^2: 0.7336923908228405
MAE: 3.356826782168208
MSE: 22.545481487421423
RMSE: 4.748208239685937

Evaluating Linear Model Performance:

R^2: It is a measure of the linear relationship between X and Y. It is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variables.

Adjusted R^2: The adjusted R-squared compares the explanatory power of regression models that contain different numbers of predictors.

MAE: It is the mean of the absolute value of the errors. It measures the difference between two continuous variables, here the actual and predicted values of y.

MSE: The mean squared error (MSE) is just like the MAE, but squares the differences before averaging them instead of using the absolute value.

RMSE: The root mean squared error (RMSE) is the square root of the MSE, which expresses the error in the same units as the target variable.

These metrics can also be computed by hand, as sketched below.
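
The following short sketch is not part of the original program; it shows how the same metrics can be computed directly with NumPy from the true and predicted values. The names y_true, y_hat and n_features are placeholders.

import numpy as np

def regression_metrics(y_true, y_hat, n_features):
    residuals = y_true - y_hat
    mae = np.mean(np.abs(residuals))                         # Mean Absolute Error
    mse = np.mean(residuals ** 2)                            # Mean Squared Error
    rmse = np.sqrt(mse)                                      # Root Mean Squared Error
    ss_res = np.sum(residuals ** 2)                          # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)         # total sum of squares
    r2 = 1 - ss_res / ss_tot                                 # coefficient of determination
    n = len(y_true)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)   # penalises extra predictors
    return mae, mse, rmse, r2, adj_r2

# Example usage with the arrays from this practical:
# print(regression_metrics(y_train.values, y_pred, X_train.shape[1]))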


# Visualizing the differences between actual prices and predicted values

plt.scatter(y_train, y_pred)
plt.xlabel("Prices")
plt.ylabel("Predicted prices")
plt.title("Prices vs Predicted prices")
plt.show()

# Checking residuals

plt.scatter(y_pred,y_train-y_pred)
plt.title("Predicted vs residuals")
plt.xlabel("Predicted")
plt.ylabel("Residuals")
plt.show()


There is no pattern visible in this plot and the values are distributed evenly around zero, so the linearity assumption is satisfied.

# Checking Normality of errors


sns.distplot(y_train-y_pred)
plt.title("Histogram of Residuals")
plt.xlabel("Residuals")
plt.ylabel("Frequency")
plt.show()

C:\Users\Akshay\AppData\Local\Temp\ipykernel_12956\3326403628.py:2: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(y_train-y_pred)


Here the residuals are normally distributed, so the normality assumption is satisfied.


For test data
# Predicting Test data with the model
y_test_pred = lr.predict(X_test)

# Model Evaluation
acc_linreg = metrics.r2_score(y_test, y_test_pred)
print('R^2:', acc_linreg)
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_test,
y_test_pred))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
print('MAE:',metrics.mean_absolute_error(y_test, y_test_pred))
print('MSE:',metrics.mean_squared_error(y_test, y_test_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)))

R^2: 0.7112260057484948
Adjusted R^2: 0.6840226584639327
MAE: 3.1627098714573947
MSE: 21.51744423117709
RMSE: 4.638689926172808
• R^2: 0.7112260057484948
• Adjusted R^2: 0.6840226584639327
• MAE: 3.1627098714573947
• MSE: 21.51744423117709
• RMSE: 4.638689926172808

Here the model evaluations scores are almost matching with that of train data. So the model is
not overfitting.


Practical - 05: Write a program in Python to implement K Nearest Neighbor classifier for diabetes classification.
from mlxtend.plotting import plot_decision_regions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

#plt.style.use('ggplot')
#ggplot is an R-based visualisation package that provides better graphics with a higher level of abstraction

#Loading the dataset


diabetes_data = pd.read_csv('diabetes.csv')

#Print the first 5 rows of the dataframe.


diabetes_data.head()

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \


0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1

DiabetesPedigreeFunction Age Outcome


0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1

Basic EDA and statistical analysis

## gives information about the data types, columns, null value counts, memory usage etc.

diabetes_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype


--- ------ -------------- -----


0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

## basic statistic details about the data (note: only numerical columns are displayed here unless the parameter include="all" is passed)

diabetes_data.describe()

Pregnancies Glucose BloodPressure SkinThickness Insulin \


count 768.000000 768.000000 768.000000 768.000000 768.000000
mean 3.845052 120.894531 69.105469 20.536458 79.799479
std 3.369578 31.972618 19.355807 15.952218 115.244002
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.000000 99.000000 62.000000 0.000000 0.000000
50% 3.000000 117.000000 72.000000 23.000000 30.500000
75% 6.000000 140.250000 80.000000 32.000000 127.250000
max 17.000000 199.000000 122.000000 99.000000 846.000000

BMI DiabetesPedigreeFunction Age Outcome


count 768.000000 768.000000 768.000000 768.000000
mean 31.992578 0.471876 33.240885 0.348958
std 7.884160 0.331329 11.760232 0.476951
min 0.000000 0.078000 21.000000 0.000000
25% 27.300000 0.243750 24.000000 0.000000
50% 32.000000 0.372500 29.000000 0.000000
75% 36.600000 0.626250 41.000000 1.000000
max 67.100000 2.420000 81.000000 1.000000

diabetes_data.describe().T

count mean std min 25% \


Pregnancies 768.0 3.845052 3.369578 0.000 1.00000
Glucose 768.0 120.894531 31.972618 0.000 99.00000
BloodPressure 768.0 69.105469 19.355807 0.000 62.00000
SkinThickness 768.0 20.536458 15.952218 0.000 0.00000
Insulin 768.0 79.799479 115.244002 0.000 0.00000
BMI 768.0 31.992578 7.884160 0.000 27.30000
DiabetesPedigreeFunction 768.0 0.471876 0.331329 0.078 0.24375
Age 768.0 33.240885 11.760232 21.000 24.00000


Outcome 768.0 0.348958 0.476951 0.000 0.00000

50% 75% max


Pregnancies 3.0000 6.00000 17.00
Glucose 117.0000 140.25000 199.00
BloodPressure 72.0000 80.00000 122.00
SkinThickness 23.0000 32.00000 99.00
Insulin 30.5000 127.25000 846.00
BMI 32.0000 36.60000 67.10
DiabetesPedigreeFunction 0.3725 0.62625 2.42
Age 29.0000 41.00000 81.00
Outcome 0.0000 1.00000 1.00

It is better to replace the zeros with NaN first, since counting them then becomes easier; these zeros later need to be replaced with suitable values.

diabetes_data_copy = diabetes_data.copy(deep = True)

cols = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI']   # columns where 0 is not a valid value
diabetes_data_copy[cols] = diabetes_data_copy[cols].replace(0, np.NaN)

## showing the count of Nans

print(diabetes_data_copy.isnull().sum())

Pregnancies 0
Glucose 5
BloodPressure 35
SkinThickness 227
Insulin 374
BMI 11
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64

To fill these Nan values the data distribution needs to be understood

p = diabetes_data.hist(figsize = (20,20))


Aiming to impute nan values for the columns in accordance with their distribution

diabetes_data_copy['Glucose'].fillna(diabetes_data_copy['Glucose'].mean(), inplace = True)
diabetes_data_copy['BloodPressure'].fillna(diabetes_data_copy['BloodPressure'].mean(), inplace = True)
diabetes_data_copy['SkinThickness'].fillna(diabetes_data_copy['SkinThickness'].median(), inplace = True)
diabetes_data_copy['Insulin'].fillna(diabetes_data_copy['Insulin'].median(), inplace = True)
diabetes_data_copy['BMI'].fillna(diabetes_data_copy['BMI'].median(), inplace = True)


Plotting after Nan removal


p = diabetes_data_copy.hist(figsize = (20,20))

Skewness
A left-skewed distribution has a long left tail. Left-skewed distributions are also called negatively-
skewed distributions. That’s because there is a long tail in the negative direction on the number line.
The mean is also to the left of the peak.
A right-skewed distribution has a long right tail. Right-skewed distributions are also called positive-
skew distributions. That’s because there is a long tail in the positive direction on the number line. The
mean is also to the right of the peak.
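
The skew described above can be quantified directly with pandas; the small check below is not part of the original notebook and assumes diabetes_data_copy has been prepared as in the previous steps.

# Positive values indicate right (positive) skew, negative values indicate left (negative) skew
print(diabetes_data_copy.skew(numeric_only=True).sort_values(ascending=False))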


## observing the shape of the data


diabetes_data.shape

(768, 9)

## data type analysis


#plt.figure(figsize=(5,5))
#sns.set(font_scale=2)
sns.countplot(y=diabetes_data.dtypes ,data=diabetes_data)
plt.xlabel("count of each data type")
plt.ylabel("data types")
plt.show()

## null count analysis


import missingno as msno
p=msno.bar(diabetes_data)


## checking the balance of the data by plotting the count of outcomes by their value

color_wheel = {1: "#0392cf",
               2: "#7bc043"}
colors = diabetes_data["Outcome"].map(lambda x: color_wheel.get(x + 1))
print(diabetes_data.Outcome.value_counts())
p=diabetes_data.Outcome.value_counts().plot(kind="bar")

Outcome
0 500
1 268
Name: count, dtype: int64

The above graph shows that the data is imbalanced towards datapoints having an outcome value of 0, i.e. cases where diabetes was actually not present. The number of non-diabetics is almost twice the number of diabetic patients.

# Scatter matrix of uncleaned data

from pandas.plotting import scatter_matrix


p=scatter_matrix(diabetes_data,figsize=(25, 25))


The pairs plot builds on two basic figures, the histogram and the scatter plot. The histogram on the diagonal
allows us to see the distribution of a single variable while the scatter plots on the upper and lower triangles
show the relationship (or lack thereof) between two variables.

#Pair plot for clean data

p=sns.pairplot(diabetes_data_copy, hue = 'Outcome')


Pearson's Correlation Coefficient: helps you find out the relationship between two quantities. It gives you the measure of the strength of association between two variables. The value of Pearson's Correlation Coefficient ranges between -1 and +1: +1 means a strong positive correlation, -1 a strong negative correlation, and 0 means no correlation.
A heat map is a two-dimensional representation of information with the help of colors. Heat maps can
help the user visualize simple or complex information.
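
For a single pair of columns, the same coefficient that the heatmap visualises can be obtained directly with pandas; this small check is not part of the original notebook and assumes diabetes_data is already loaded.

# Pearson correlation between one feature and the outcome (a value between -1 and +1)
print(diabetes_data['Glucose'].corr(diabetes_data['Outcome']))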
#Heatmap for unclean data

plt.figure(figsize=(12,10)) # set the size of the figure to 12 by 10
p=sns.heatmap(diabetes_data.corr(), annot=True, cmap='RdYlGn') # seaborn has a very simple solution for heatmaps


#Heatmap for clean data

plt.figure(figsize=(12,10)) # set the size of the figure to 12 by 10
p=sns.heatmap(diabetes_data_copy.corr(), annot=True, cmap='RdYlGn') # seaborn has a very simple solution for heatmaps


from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X = pd.DataFrame(sc_X.fit_transform(diabetes_data_copy.drop(["Outcome"], axis=1)),
                 columns=['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
                          'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age'])

X.head()

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \


0 0.639947 0.865108 -0.033518 0.670643 -0.181541 0.166619
1 -0.844885 -1.206162 -0.529859 -0.012301 -0.181541 -0.852200
2 1.233880 2.015813 -0.695306 -0.012301 -0.181541 -1.332500
3 -0.844885 -1.074652 -0.529859 -0.695245 -0.540642 -0.633881
4 -1.141852 0.503458 -2.680669 0.670643 0.316566 1.549303

DiabetesPedigreeFunction Age
0 0.468492 1.425995
1 -0.365061 -0.190672
2 0.604397 -0.105584


3 -0.920763 -1.041549
4 5.484909 -0.020496

#X = diabetes_data.drop("Outcome",axis = 1)
y = diabetes_data_copy.Outcome

Why Scaling the data for KNN?

It is always advisable to bring all the features to the same scale when applying distance-based algorithms like KNN.

Let's see an example of distance calculation using two features whose magnitudes/ranges vary greatly:
Euclidean Distance = [(100000 - 80000)^2 + (30 - 25)^2]^(1/2)

We can imagine how the feature with the greater range will overshadow or diminish the smaller feature completely. This will hurt the performance of any distance-based model, as it gives higher weightage to variables with higher magnitude. The small sketch below illustrates this effect.
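
The following sketch is not part of the original notebook; it contrasts the Euclidean distance between two example points before and after standardisation. The two points and their feature values are arbitrary illustrations.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Two samples: feature 1 is a large-magnitude value, feature 2 a small-magnitude value
points = np.array([[100000, 30],
                   [ 80000, 25]], dtype=float)

raw_dist = np.linalg.norm(points[0] - points[1])     # dominated entirely by the first feature
scaled = StandardScaler().fit_transform(points)      # each feature now has zero mean and unit variance
scaled_dist = np.linalg.norm(scaled[0] - scaled[1])  # both features contribute comparably

print("Distance before scaling:", raw_dist)
print("Distance after scaling :", scaled_dist)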

Test Train Split and Cross Validation methods


Train Test Split: We hold out unseen datapoints to test the model, rather than testing with the same points with which the model was trained. This helps capture the model's performance much better.
Cross Validation: When the data is split into a single training and testing portion, it is possible that a specific type of data point goes entirely into either the training or the testing portion, which would lead the model to perform poorly. Over-fitting and under-fitting problems can therefore be better avoided with cross-validation techniques, as sketched below.
About Stratify: The stratify parameter makes a split so that the proportion of values in the sample produced will be the same as the proportion of values provided to the stratify parameter.
For example, if variable y is a binary categorical variable with values 0 and 1 and there are 25% zeros and 75% ones, stratify=y will make sure that your random split also has 25% of 0's and 75% of 1's.
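
A minimal sketch of k-fold cross-validation, not part of the original notebook, assuming the scaled X and the label vector y defined above; cross_val_score fits the classifier on four folds and scores it on the held-out fold, repeating this for each of the five folds.

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 5-fold cross-validation of a KNN classifier on the full (scaled) data
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=11), X, y, cv=5)
print("Fold accuracies:", cv_scores)
print("Mean CV accuracy:", cv_scores.mean())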
#importing train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42, stratify=y)

from sklearn.neighbors import KNeighborsClassifier

test_scores = []
train_scores = []

for i in range(1,15):

    knn = KNeighborsClassifier(i)
    knn.fit(X_train,y_train)

    train_scores.append(knn.score(X_train,y_train))
    test_scores.append(knn.score(X_test,y_test))

## score that comes from testing on the same datapoints that were used for training
max_train_score = max(train_scores)
train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]
print('Max train score {} % and k = {}'.format(max_train_score*100, list(map(lambda x: x+1, train_scores_ind))))

Max train score 100.0 % and k = [1]

## score that comes from testing on the datapoints that were split in the beginning to be used solely for testing
max_test_score = max(test_scores)
test_scores_ind = [i for i, v in enumerate(test_scores) if v == max_test_score]
print('Max test score {} % and k = {}'.format(max_test_score*100, list(map(lambda x: x+1, test_scores_ind))))

Max test score 76.5625 % and k = [11]

Result Visualisation
plt.figure(figsize=(8,5))
p = sns.lineplot(train_scores,marker='*',label='Train Score')
p = sns.lineplot(test_scores,marker='o',label='Test Score')


The best result is captured at k = 11, hence k = 11 is used for the final model.

#Setup a knn classifier with k neighbors

knn = KNeighborsClassifier(11)

knn.fit(X_train,y_train)
knn.score(X_test,y_test)

0.765625

## trying to plot decision boundary

value = 20000
width = 20000
plot_decision_regions(X.values, y.values, clf=knn, legend=2,
                      filler_feature_values={2: value, 3: value, 4: value, 5: value, 6: value, 7: value},
                      filler_feature_ranges={2: width, 3: width, 4: width, 5: width, 6: width, 7: width},
                      X_highlight=X_test.values)

# Adding axes annotations


#plt.xlabel('sepal length [cm]')
#plt.ylabel('petal length [cm]')
plt.title('KNN with Diabetes Data')
plt.show()


Model Performance Analysis


1. Confusion Matrix
The confusion matrix is a technique used for summarizing the performance of a classification algorithm, here one with binary outputs.
#import confusion_matrix
from sklearn.metrics import confusion_matrix
#let us get the predictions using the classifier we had fit above
y_pred = knn.predict(X_test)

confusion_matrix(y_test,y_pred)
pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'],
margins=True)

Predicted 0 1 All
True
0 142 25 167
1 35 54 89
All 177 79 256

y_pred = knn.predict(X_test)


from sklearn import metrics


cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
p = sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap="YlGnBu" ,fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

Text(0.5, 20.049999999999997, 'Predicted label')

2. Classification Report
Report which includes Precision, Recall and F1-Score.
#import classification_report
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

3. ROC - AUC
The ROC (Receiver Operating Characteristic) curve tells us how well the model can distinguish between two classes (e.g. whether a patient has a disease or not). Better models can accurately distinguish between the two, whereas a poor model has difficulty doing so.
Well explained in this video: https://www.youtube.com/watch?v=OAl6eAyP-yo
from sklearn.metrics import roc_curve
y_pred_proba = knn.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)


plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='Knn')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('Knn(n_neighbors=11) ROC curve')
plt.show()

#Area under ROC curve


from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

Hyper Parameter optimization


Grid search is an approach to hyperparameter tuning that will methodically build and evaluate a
model for each combination of algorithm parameters specified in a grid.
#import GridSearchCV
from sklearn.model_selection import GridSearchCV
#In case of classifier like knn the parameter to be tuned is n_neighbors
param_grid = {'n_neighbors':np.arange(1,50)}
knn = KNeighborsClassifier()
knn_cv= GridSearchCV(knn,param_grid,cv=5)
knn_cv.fit(X,y)

print("Best Score:" + str(knn_cv.best_score_))


print("Best Parameters: " + str(knn_cv.best_params_))

Best Score:0.7721840251252015
Best Parameters: {'n_neighbors': 25}

Practical - 06: Build a Naive Bayes model in Python to tackle a spam classification problem.

#Loading the dataset


import pandas as pd
df = pd.read_csv('sms_spam.csv')
df

type text
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
... ... ...
5569 spam This is the 2nd time we have tried 2 contact u...


5570 ham Will ü b going to esplanade fr home?


5571 ham Pity, * was in mood for that. So...any other s...
5572 ham The guy did some bitching but I acted like i'd...
5573 ham Rofl. Its true to its name

[5574 rows x 2 columns]

df['type'].value_counts()

ham 4827
spam 747
Name: type, dtype: int64

# assigning labels to the dataset, 0 for ham and 1 for spam


df['label_num'] = df['type'].apply(lambda x: 1 if x == 'spam' else 0)

#splitting the dataset

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label_num'],
test_size=0.3, random_state=42)

print("train set:", X_train.shape) # rows in train set


print("test set:", X_test.shape) # rows in test set

train set: (3901,)


test set: (1673,)

from sklearn.feature_extraction.text import TfidfVectorizer


from sklearn.naive_bayes import MultinomialNB

# Applying TF-IDF Vectorization
# (fit_transform/transform accept the Series of raw messages directly,
# so no separate list conversion is needed)

vectorizer = TfidfVectorizer(lowercase = True, stop_words = "english")
train_transformed = vectorizer.fit_transform(X_train)
test_transformed = vectorizer.transform(X_test)

# Fit the transformed train data to the model.


model = MultinomialNB()
model.fit(train_transformed, y_train)

MultinomialNB()

prediction = model.predict(test_transformed)
actual = y_test

print("Prediction:", list(prediction))
print("Actual: ",list(actual))


Prediction: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,


0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]
Actual: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,


0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]

Evaluating the Model


from sklearn.metrics import confusion_matrix

# confusion_matrix expects (y_true, y_pred): rows are actual labels, columns are predicted labels
matrix = confusion_matrix(actual, prediction)
matrix

array([[1447,    0],
       [  45,  181]], dtype=int64)

# With this orientation: TP = matrix[1][1], FP = matrix[0][1], FN = matrix[1][0]
precision = matrix[1][1]/(matrix[1][1]+matrix[0][1])
recall = matrix[1][1]/(matrix[1][1]+matrix[1][0])
f1score = matrix[1][1]/(matrix[1][1]+(matrix[1][0]+matrix[0][1])/2)

print("precision score:", precision)
print("recall score:", recall)
print("f1_score:", f1score)

precision score: 1.0
recall score: 0.8008849557522124
f1_score: 0.8894348894348895
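
As a cross-check, not part of the original notebook, the same scores can be obtained directly from sklearn's metric functions, which take the actual and predicted labels rather than the confusion matrix.

from sklearn.metrics import precision_score, recall_score, f1_score

print("precision:", precision_score(actual, prediction))
print("recall   :", recall_score(actual, prediction))
print("f1       :", f1_score(actual, prediction))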

Let's predict some real messages. Here are some messages that I received in the past.

messages = ["Congragulations! You have won a $10,000. Go to https://bit.ly/23343 to claim now.",
            "Get $10 Amazon Gift Voucher on Completing the Demo:- va.pcb3.in/ click this link to claim now",
            "You have won a $500. Please register your account today itself to claim now https://imp.com",
            "Please dont respond to missed calls from unknown international numbers Call/ SMS on winning prize. lottery as this may be fraudulent call."
            ]

message_transformed = vectorizer.transform(messages)
new_prediction = model.predict(message_transformed)
for i in range(len(new_prediction)):
    if new_prediction[i] == 0:
        print("Ham.")
    else:
        print("Spam.")

Spam.
Spam.
Spam.
Ham.

Practical - 07: Write a Python code to tackle a multi-class classification problem where the challenge is to classify wine into three types using Decision Tree.

# Import necessary libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the wine dataset


wine = datasets.load_wine()
# Create a DataFrame from the features and target
wine_df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
wine_df


alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols \


0 14.23 1.71 2.43 15.6 127.0 2.80
1 13.20 1.78 2.14 11.2 100.0 2.65
2 13.16 2.36 2.67 18.6 101.0 2.80
3 14.37 1.95 2.50 16.8 113.0 3.85
4 13.24 2.59 2.87 21.0 118.0 2.80
.. ... ... ... ... ... ...
173 13.71 5.65 2.45 20.5 95.0 1.68
174 13.40 3.91 2.48 23.0 102.0 1.80
175 13.27 4.28 2.26 20.0 120.0 1.59
176 13.17 2.59 2.37 20.0 120.0 1.65
177 14.13 4.10 2.74 24.5 96.0 2.05

flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue \


0 3.06 0.28 2.29 5.64 1.04
1 2.76 0.26 1.28 4.38 1.05
2 3.24 0.30 2.81 5.68 1.03
3 3.49 0.24 2.18 7.80 0.86
4 2.69 0.39 1.82 4.32 1.04
.. ... ... ... ... ...
173 0.61 0.52 1.06 7.70 0.64
174 0.75 0.43 1.41 7.30 0.70
175 0.69 0.43 1.35 10.20 0.59
176 0.68 0.53 1.46 9.30 0.60
177 0.76 0.56 1.35 9.20 0.61

od280/od315_of_diluted_wines proline
0 3.92 1065.0
1 3.40 1050.0
2 3.17 1185.0
3 3.45 1480.0
4 2.93 735.0
.. ... ...
173 1.74 740.0
174 1.56 750.0
175 1.56 835.0
176 1.62 840.0
177 1.60 560.0

[178 rows x 13 columns]

wine_df['target'] = wine.target
wine_df

alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols \


0 14.23 1.71 2.43 15.6 127.0 2.80
1 13.20 1.78 2.14 11.2 100.0 2.65
2 13.16 2.36 2.67 18.6 101.0 2.80
3 14.37 1.95 2.50 16.8 113.0 3.85
4 13.24 2.59 2.87 21.0 118.0 2.80
.. ... ... ... ... ... ...

173 13.71 5.65 2.45 20.5 95.0 1.68


174 13.40 3.91 2.48 23.0 102.0 1.80
175 13.27 4.28 2.26 20.0 120.0 1.59
176 13.17 2.59 2.37 20.0 120.0 1.65
177 14.13 4.10 2.74 24.5 96.0 2.05

flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue \


0 3.06 0.28 2.29 5.64 1.04
1 2.76 0.26 1.28 4.38 1.05
2 3.24 0.30 2.81 5.68 1.03
3 3.49 0.24 2.18 7.80 0.86
4 2.69 0.39 1.82 4.32 1.04
.. ... ... ... ... ...
173 0.61 0.52 1.06 7.70 0.64
174 0.75 0.43 1.41 7.30 0.70
175 0.69 0.43 1.35 10.20 0.59
176 0.68 0.53 1.46 9.30 0.60
177 0.76 0.56 1.35 9.20 0.61

od280/od315_of_diluted_wines proline target


0 3.92 1065.0 0
1 3.40 1050.0 0
2 3.17 1185.0 0
3 3.45 1480.0 0
4 2.93 735.0 0
.. ... ... ...
173 1.74 740.0 2
174 1.56 750.0 2
175 1.56 835.0 2
176 1.62 840.0 2
177 1.60 560.0 2

[178 rows x 14 columns]

wine_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 alcohol 178 non-null float64
1 malic_acid 178 non-null float64
2 ash 178 non-null float64
3 alcalinity_of_ash 178 non-null float64
4 magnesium 178 non-null float64
5 total_phenols 178 non-null float64
6 flavanoids 178 non-null float64
7 nonflavanoid_phenols 178 non-null float64
8 proanthocyanins 178 non-null float64
9 color_intensity 178 non-null float64

10 hue 178 non-null float64


11 od280/od315_of_diluted_wines 178 non-null float64
12 proline 178 non-null float64
13 target 178 non-null int32
dtypes: float64(13), int32(1)
memory usage: 18.9 KB

wine_df.describe()

alcohol malic_acid ash alcalinity_of_ash magnesium \


count 178.000000 178.000000 178.000000 178.000000 178.000000
mean 13.000618 2.336348 2.366517 19.494944 99.741573
std 0.811827 1.117146 0.274344 3.339564 14.282484
min 11.030000 0.740000 1.360000 10.600000 70.000000
25% 12.362500 1.602500 2.210000 17.200000 88.000000
50% 13.050000 1.865000 2.360000 19.500000 98.000000
75% 13.677500 3.082500 2.557500 21.500000 107.000000
max 14.830000 5.800000 3.230000 30.000000 162.000000

total_phenols flavanoids nonflavanoid_phenols proanthocyanins \


count 178.000000 178.000000 178.000000 178.000000
mean 2.295112 2.029270 0.361854 1.590899
std 0.625851 0.998859 0.124453 0.572359
min 0.980000 0.340000 0.130000 0.410000
25% 1.742500 1.205000 0.270000 1.250000
50% 2.355000 2.135000 0.340000 1.555000
75% 2.800000 2.875000 0.437500 1.950000
max 3.880000 5.080000 0.660000 3.580000

color_intensity hue od280/od315_of_diluted_wines proline \


count 178.000000 178.000000 178.000000 178.000000
mean 5.058090 0.957449 2.611685 746.893258
std 2.318286 0.228572 0.709990 314.907474
min 1.280000 0.480000 1.270000 278.000000
25% 3.220000 0.782500 1.937500 500.500000
50% 4.690000 0.965000 2.780000 673.500000
75% 6.200000 1.120000 3.170000 985.000000
max 13.000000 1.710000 4.000000 1680.000000

target
count 178.000000
mean 0.938202
std 0.775035
min 0.000000
25% 0.000000
50% 1.000000
75% 2.000000
max 2.000000

# Display the pairplot for visualizing relationships between features


sns.pairplot(wine_df, hue='target', palette='viridis')


plt.title("Pairplot of Wine Dataset")


plt.show()

C:\Users\Akshay\anaconda3\lib\site-packages\seaborn\axisgrid.py:118:
UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

# Display a correlation heatmap


plt.figure(figsize=(10, 8))
sns.heatmap(wine_df.corr(), annot=True, linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()


X = wine.data
y = wine.target

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Create a Decision Tree classifier


dt_classifier = DecisionTreeClassifier(random_state=42)

# Train the classifier on the training data


dt_classifier.fit(X_train, y_train)

# Get feature importances


feature_importances = dt_classifier.feature_importances_

# Create a DataFrame to display feature importances


feature_importance_df = pd.DataFrame({'Feature': wine_df.columns[:-1],
'Importance': feature_importances})


# Sort the DataFrame by importance in descending order


feature_importance_df = feature_importance_df.sort_values(by='Importance',
ascending=False)

# Display the feature importances


print("Feature Importances:\n", feature_importance_df)

Feature Importances:
Feature Importance
6 flavanoids 0.411053
9 color_intensity 0.384934
12 proline 0.164075
2 ash 0.020942
0 alcohol 0.018995
1 malic_acid 0.000000
3 alcalinity_of_ash 0.000000
4 magnesium 0.000000
5 total_phenols 0.000000
7 nonflavanoid_phenols 0.000000
8 proanthocyanins 0.000000
10 hue 0.000000
11 od280/od315_of_diluted_wines 0.000000

# Visualize feature importances


plt.figure(figsize=(10, 6))
plt.bar(feature_importance_df['Feature'], feature_importance_df['Importance'],
color='teal')
plt.title('Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.xticks(rotation=45, ha='right')
plt.show()


• Five features carry non-zero importance for training the model:
– flavanoids
– color_intensity
– proline
– ash
– alcohol
# Make predictions on the test data
y_pred = dt_classifier.predict(X_test)

# Evaluate the performance


accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_rep)

Accuracy: 0.94
Classification Report:
precision recall f1-score support

0 0.93 0.93 0.93 14


1 0.93 1.00 0.97 14

2 1.00 0.88 0.93 8

accuracy 0.94 36
macro avg 0.95 0.93 0.94 36
weighted avg 0.95 0.94 0.94 36

• The accuracy of the Decision Tree Classifier model is 94%.

• The five features with higher importance values contribute the most to the decision-making process of the Decision Tree model.
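
As an optional check, the trained tree can also be drawn with scikit-learn's plot_tree to see which features drive the top splits; a minimal sketch, assuming the dt_classifier and wine objects created above:

from sklearn.tree import plot_tree

# Visualize the fitted decision tree with the dataset's feature and class names
plt.figure(figsize=(16, 8))
plot_tree(dt_classifier, feature_names=wine.feature_names,
          class_names=wine.target_names, filled=True)
plt.title("Decision Tree for the Wine Dataset")
plt.show()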

Practical - 08

Write a program in Python to implement Support Vector Machine for diabetes classification.

# Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import sklearn.preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
from sklearn import svm

df = pd.read_csv('diabetes.csv')

df.head(7)

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \


0 6 148.0 72.0 35.0 0 33.6
1 1 85.0 66.0 29.0 0 26.6
2 8 183.0 64.0 29.0 0 23.3
3 1 89.0 66.0 23.0 94 28.1
4 0 137.0 40.0 35.0 168 43.1
5 5 116.0 74.0 29.0 0 25.6


6 3 78.0 50.0 32.0 88 31.0

DiabetesPedigreeFunction Age Outcome


0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1
5 0.201 30 0
6 0.248 26 1

df.describe()

Pregnancies Glucose BloodPressure SkinThickness Insulin \


count 768.000000 768.000000 768.000000 768.000000 768.000000
mean 3.845052 121.682292 72.386719 29.108073 79.799479
std 3.369578 30.435999 12.096642 8.791221 115.244002
min 0.000000 44.000000 24.000000 7.000000 0.000000
25% 1.000000 99.750000 64.000000 25.000000 0.000000
50% 3.000000 117.000000 72.000000 29.000000 30.500000
75% 6.000000 140.250000 80.000000 32.000000 127.250000
max 17.000000 199.000000 122.000000 99.000000 846.000000

BMI DiabetesPedigreeFunction Age Outcome


count 768.000000 768.000000 768.000000 768.000000
mean 31.992578 0.471876 33.240885 0.348958
std 7.884160 0.331329 11.760232 0.476951
min 0.000000 0.078000 21.000000 0.000000
25% 27.300000 0.243750 24.000000 0.000000
50% 32.000000 0.372500 29.000000 0.000000
75% 36.600000 0.626250 41.000000 1.000000
max 67.100000 2.420000 81.000000 1.000000

# Checking for missing values.


df.isnull().values.any()

False

Feature Engineering

Although no missing values were found in the dataset, some feature engineering is still needed before implementing the SVM model. The three features Glucose, BloodPressure, and SkinThickness have minimum values of 0, which is physiologically impossible: a person cannot have zero glucose, blood pressure, or skin thickness. To handle this, all zero values in these three features are first converted to NaN and then imputed with the mean of the respective column.
zero_not_allowed = ["Glucose","BloodPressure","SkinThickness"]

for column in zero_not_allowed:
    df[column] = df[column].replace(0, np.NaN)
    mean = int(df[column].mean(skipna = True))
    df[column] = df[column].replace(np.NaN, mean)
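
The same idea can also be written with scikit-learn's SimpleImputer (note that, unlike the loop above, it fills in the exact column mean rather than its integer part); a minimal sketch:

from sklearn.impute import SimpleImputer

# Treat zeros as missing, then fill them with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
df[zero_not_allowed] = imputer.fit_transform(df[zero_not_allowed].replace(0, np.nan))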

SVM Model
# Splitting the dataset into training and testing sets.
x = df.iloc[:, :-2]
y = df.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state = 0,
test_size = 0.2)

# Creating the SVM model.


clf = svm.SVC(kernel='rbf')
clf.fit(x_train,y_train)
y_pred = clf.predict(x_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.7922077922077922

• The accuracy of the Support Vector Classifier model is: 79%.


from sklearn.metrics import confusion_matrix

confusion_matrix(y_test,y_pred)

array([[98, 9],
[23, 24]], dtype=int64)
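
The features were used here without scaling. Because an RBF-kernel SVM is sensitive to feature scale, standardizing the inputs with the StandardScaler imported earlier may improve the result; a minimal sketch, assuming the same x_train/x_test split (the resulting accuracy will generally differ from the 79% reported above):

from sklearn.pipeline import make_pipeline

# Standardize the features, then fit the same RBF-kernel SVM on the scaled data
scaled_clf = make_pipeline(StandardScaler(), svm.SVC(kernel='rbf'))
scaled_clf.fit(x_train, y_train)
y_pred_scaled = scaled_clf.predict(x_test)
print("Accuracy with scaling:", accuracy_score(y_test, y_pred_scaled))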

Practical - 09

Demonstrate the applications of Artificial Neural Network using Python.

Dataset Information

The training archive contains 25,000 images of dogs and cats. Train your algorithm on these files and predict the labels (1 = dog, 0 = cat).

Download Dataset
!wget https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip

--2021-05-06 16:04:20-- https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip
Resolving download.microsoft.com (download.microsoft.com)... 23.78.216.154,
2600:1417:8000:980::e59, 2600:1417:8000:9b2::e59
Connecting to download.microsoft.com (download.microsoft.com)|
23.78.216.154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 824894548 (787M) [application/octet-stream]


Saving to: ‘kagglecatsanddogs_3367a.zip’

kagglecatsanddogs_3 100%[===================>] 786.68M 187MB/s in 4.3s

2021-05-06 16:04:24 (183 MB/s) - ‘kagglecatsanddogs_3367a.zip’ saved


[824894548/824894548]

Unzip the Dataset


# !unzip kagglecatsanddogs_3367a.zip

Import Modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import os
import tqdm
import random
from keras.preprocessing.image import load_img
warnings.filterwarnings('ignore')

Create Dataframe for Input and Output


input_path = []
label = []

for class_name in os.listdir("PetImages"):
    for path in os.listdir("PetImages/"+class_name):
        if class_name == 'Cat':
            label.append(0)
        else:
            label.append(1)
        input_path.append(os.path.join("PetImages", class_name, path))
print(input_path[0], label[0])

PetImages/Dog/4253.jpg 1

df = pd.DataFrame()
df['images'] = input_path
df['label'] = label
df = df.sample(frac=1).reset_index(drop=True)
df.head()

images label
0 PetImages/Dog/67.jpg 1
1 PetImages/Dog/8273.jpg 1
2 PetImages/Dog/9117.jpg 1
3 PetImages/Cat/654.jpg 0
4 PetImages/Cat/6418.jpg 0


for i in df['images']:
    if '.jpg' not in i:
        print(i)

PetImages/Cat/Thumbs.db
PetImages/Dog/Thumbs.db

import PIL
l = []
for image in df['images']:
    try:
        img = PIL.Image.open(image)
    except:
        l.append(image)
l

['PetImages/Cat/666.jpg',
'PetImages/Cat/Thumbs.db',
'PetImages/Dog/Thumbs.db',
'PetImages/Dog/11702.jpg']

# delete db files
df = df[df['images']!='PetImages/Dog/Thumbs.db']
df = df[df['images']!='PetImages/Cat/Thumbs.db']
df = df[df['images']!='PetImages/Cat/666.jpg']
df = df[df['images']!='PetImages/Dog/11702.jpg']
len(df)

24998

Exploratory Data Analysis


# to display grid of images
plt.figure(figsize=(25,25))
temp = df[df['label']==1]['images']
start = random.randint(0, len(temp))
files = temp[start:start+25]

for index, file in enumerate(files):
    plt.subplot(5, 5, index+1)
    img = load_img(file)
    img = np.array(img)
    plt.imshow(img)
    plt.title('Dogs')
    plt.axis('off')


# to display grid of images


plt.figure(figsize=(25,25))
temp = df[df['label']==0]['images']
start = random.randint(0, len(temp))
files = temp[start:start+25]

for index, file in enumerate(files):
    plt.subplot(5, 5, index+1)
    img = load_img(file)
    img = np.array(img)
    plt.imshow(img)
    plt.title('Cats')
    plt.axis('off')


import seaborn as sns


sns.countplot(df['label'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f71e0c80190>


Create DataGenerator for the Images


df['label'] = df['label'].astype('str')

df.head()

images label
0 PetImages/Dog/67.jpg 1
1 PetImages/Dog/8273.jpg 1
2 PetImages/Dog/9117.jpg 1
3 PetImages/Cat/654.jpg 0
4 PetImages/Cat/6418.jpg 0

# input split
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, random_state=42)

from keras.preprocessing.image import ImageDataGenerator


train_generator = ImageDataGenerator(
rescale = 1./255, # normalization of images
rotation_range = 40, # augmention of images to avoid overfitting
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True,
fill_mode = 'nearest'
)

val_generator = ImageDataGenerator(rescale = 1./255)

train_iterator = train_generator.flow_from_dataframe(


train,
x_col='images',
y_col='label',
target_size=(128,128),
batch_size=512,
class_mode='binary'
)

val_iterator = val_generator.flow_from_dataframe(
test,
x_col='images',
y_col='label',
target_size=(128,128),
batch_size=512,
class_mode='binary'
)

Found 19998 validated image filenames belonging to 2 classes.


Found 5000 validated image filenames belonging to 2 classes.
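
Since class_mode='binary' derives the numeric classes from the string labels in the dataframe, it is worth confirming the mapping before training; a quick check, assuming the iterators created above:

# Confirm how the generator mapped the string labels '0' (cat) and '1' (dog)
print(train_iterator.class_indices)
print(val_iterator.class_indices)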

Model Creation
from keras import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential([
Conv2D(16, (3,3), activation='relu',
input_shape=(128,128,3)),
MaxPool2D((2,2)),
Conv2D(32, (3,3), activation='relu'),
MaxPool2D((2,2)),
Conv2D(64, (3,3), activation='relu'),
MaxPool2D((2,2)),
Flatten(),
Dense(512, activation='relu'),
Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_12 (Conv2D) (None, 126, 126, 16) 448
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 63, 63, 16) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 61, 61, 32) 4640
_________________________________________________________________

max_pooling2d_13 (MaxPooling (None, 30, 30, 32) 0


_________________________________________________________________
conv2d_14 (Conv2D) (None, 28, 28, 64) 18496
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 14, 14, 64) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 12544) 0
_________________________________________________________________
dense_8 (Dense) (None, 512) 6423040
_________________________________________________________________
dense_9 (Dense) (None, 1) 513
=================================================================
Total params: 6,447,137
Trainable params: 6,447,137
Non-trainable params: 0
_________________________________________________________________

history = model.fit(train_iterator, epochs=10, validation_data=val_iterator)

Epoch 1/10
40/40 [==============================] - 150s 4s/step - loss: 0.8679 - accuracy:
0.5187 - val_loss: 0.6399 - val_accuracy: 0.6238
Epoch 2/10
40/40 [==============================] - 147s 4s/step - loss: 0.6280 - accuracy:
0.6416 - val_loss: 0.5672 - val_accuracy: 0.7024
Epoch 3/10
40/40 [==============================] - 146s 4s/step - loss: 0.5737 - accuracy:
0.6980 - val_loss: 0.5493 - val_accuracy: 0.7148
Epoch 4/10
40/40 [==============================] - 146s 4s/step - loss: 0.5478 - accuracy:
0.7221 - val_loss: 0.5351 - val_accuracy: 0.7356
Epoch 5/10
40/40 [==============================] - 145s 4s/step - loss: 0.5276 - accuracy:
0.7338 - val_loss: 0.5104 - val_accuracy: 0.7494
Epoch 6/10
40/40 [==============================] - 144s 4s/step - loss: 0.5127 - accuracy:
0.7405 - val_loss: 0.4853 - val_accuracy: 0.7664
Epoch 7/10
40/40 [==============================] - 144s 4s/step - loss: 0.5059 - accuracy:
0.7544 - val_loss: 0.4586 - val_accuracy: 0.7868
Epoch 8/10
40/40 [==============================] - 143s 4s/step - loss: 0.4842 - accuracy:
0.7644 - val_loss: 0.5054 - val_accuracy: 0.7510
Epoch 9/10
40/40 [==============================] - 143s 4s/step - loss: 0.4971 - accuracy:
0.7530 - val_loss: 0.4647 - val_accuracy: 0.7894
Epoch 10/10
40/40 [==============================] - 142s 4s/step - loss: 0.4642 - accuracy:
0.7770 - val_loss: 0.4711 - val_accuracy: 0.7782
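
Training runs for a fixed 10 epochs here. If the number of epochs were increased, an EarlyStopping callback could stop training once the validation loss stops improving; a minimal sketch of that variant (not the run shown above):

from keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 3 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# history = model.fit(train_iterator, epochs=30,
#                     validation_data=val_iterator, callbacks=[early_stop])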


Visualization of Results
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(len(acc))

plt.plot(epochs, acc, 'b', label='Training Accuracy')


plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
plt.title('Accuracy Graph')
plt.legend()
plt.figure()

loss = history.history['loss']
val_loss = history.history['val_loss']
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.plot(epochs, val_loss, 'r', label='Validation Loss')
plt.title('Loss Graph')
plt.legend()
plt.show()


Test with Real Image


image_path = "test.jpg" # path of the image
img = load_img(image_path, target_size=(128, 128))
img = np.array(img)
img = img / 255.0 # normalize the image
img = img.reshape(1, 128, 128, 3) # reshape for prediction
pred = model.predict(img)
if pred[0] > 0.5:
    label = 'Dog'
else:
    label = 'Cat'
print(label)
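
Once the results are acceptable, the trained network can be saved and reloaded later without retraining; a minimal sketch (the file name cats_vs_dogs_cnn.h5 is only an illustrative choice):

from keras.models import load_model

# Persist the trained model to disk and reload it for later inference
model.save('cats_vs_dogs_cnn.h5')
restored_model = load_model('cats_vs_dogs_cnn.h5')
restored_model.summary()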

Prepared By: Akshay Saini (B.Tech CSE 7th sem)
