ML Lab Manual 7th CSE
LAB MANUAL
Subject: Machine Learning Lab
Course Code: D022721(022)
LIST OF EXPERIMENTS
Program / Semester: B.Tech 7th    Branch: Computer Science & Engineering
Subject: Machine Learning Lab    Course Code: D022721(022)
7. Write a Python code to tackle a multi-class classification problem where the challenge is to classify wine into three types using Decision Tree. (CO2)
8. Write a program in Python to implement Support Vector Machine for diabetes classification. (CO2)
9. Demonstrate the application of Artificial Neural Network using Python. (CO3, CO4)
Course Outcomes
# to show Grid
plt.grid(True)
We can also specify the size of the figure using the figure() method, passing a tuple of (width, height) in inches to the figsize argument.
import matplotlib.pyplot as plt
import numpy as np
# To show plot
plt.show()
figure(figsize=(x, y)) — used whenever we want the result to be displayed in a separate window; the figsize argument decides the initial size (width, height) of the window displayed after the run
subplot(r, c, i) — used to create multiple plots in the same figure, where r is the number of rows in the figure, c the number of columns, and i the position of the particular plot
subplots(nrows, ncols, figsize) — a convenient way to create a figure and a grid of subplots in a single call; it returns a tuple of a Figure and an array of Axes
set_xticks — used to set the locations of the tick marks on the x-axis of a subplot
set_yticks — used to set the locations of the tick marks on the y-axis of a subplot
xticks(index, categorical variables) — gets or sets the current tick locations and labels of the x-axis
xlim(start value, end value) — used to set the limits of the values of the x-axis
ylim(start value, end value) — used to set the limits of the values of the y-axis
scatter(x-axis values, y-axis values) — plots a scatter plot of x-axis values against y-axis values
set_xlabel("string") — Axes-level method used to set the x-label of the plot, specified as a string
set_ylabel("string") — Axes-level method used to set the y-label of the plot, specified as a string
scatter3D(x-axis values, y-axis values, z-axis values) — plots a three-dimensional scatter plot
plot3D(x-axis values, y-axis values, z-axis values) — plots a three-dimensional line graph
# Importing Libraries
import matplotlib.pyplot as plt
import numpy as np
# Plot multiple sets of data by passing multiple sets of X and Y arguments to plot()
x=np.arange(1,5)
y=x**3
plt.subplot(1,2,1) # the figure has 1 row, 2 columns, and this is the 1st plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16],'go')
plt.title("1st Subplot (1,2,1)")
plt.subplot(1,2,2) # the figure has 1 row, 2 columns, and this is the 2nd plot
plt.plot(x, y,'r^')
plt.title("2nd Subplot (1,2,2)")
plt.suptitle("My subplots")
plt.show()
plt.subplot(2,1,1) # the figure has 2 rows, 1 column, and this is the 1st plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16],'go')
plt.title("1st Subplot")
plt.grid()
plt.subplot(2,1,2) # the figure has 2 rows, 1 column, and this is the 2nd plot
plt.plot(x, y,'r^')
plt.title("2nd Subplot")
plt.grid()
plt.suptitle("My subplots")
plt.show()
fig, ax = plt.subplots(2, 2, figsize=(8, 6))  # 2x2 grid of Axes
ax[0,0].plot(x, x, c='g')
ax[0,0].set_title("(ax[0,0])")
ax[0,1].plot(x, x**2, c='g')
ax[0,1].set_title("Squares (ax[0,1])")
ax[1,0].plot(x, x**3, c='g')
ax[1,0].set_title("Cubes (ax[1,0])")
ax[1,1].plot(x, y, c='g')
ax[1,1].set_title("(ax[1,1])")
plt.show()
Get the Figure and Axes all at once; add text, a legend and a grid; save the figure
import matplotlib.pyplot as plt
import numpy as np
# Adding Text
ax.text(3.5, 0.9, 'Sin & Cos wave', fontsize = 15)
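Putting the pieces together, a minimal sketch showing the ax.text fragment above in context (sin/cos data is assumed from the annotation text, and the file name is illustrative):
x = np.linspace(0, 2*np.pi, 100)
fig, ax = plt.subplots(figsize=(8, 5))            # Figure and Axes in one call
ax.plot(x, np.sin(x), label='sin')
ax.plot(x, np.cos(x), label='cos')
ax.text(3.5, 0.9, 'Sin & Cos wave', fontsize=15)  # adding text
ax.legend()                                       # adding a legend
ax.grid(True)                                     # adding a grid
fig.savefig('sin_cos.png')                        # saving the figure
plt.show()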
Bar chart
A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths and heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically. A bar chart describes comparisons between discrete categories. It can be created using the bar() method.
import matplotlib.pyplot as plt
# data to display on plots
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 9, 10, 6, 3]
plt.bar(x, y)
plt.show()
Histograms
A histogram is used to represent data in the form of groups. It is a type of bar plot where the X-axis represents the bin ranges while the Y-axis gives information about frequency. To create a histogram, the first step is to create bins for the ranges, then distribute the whole range of the values into a series of intervals, and count the values which fall into each interval. Bins are defined as consecutive, non-overlapping intervals of a variable. The hist() function is used to compute and create a histogram of x.
import matplotlib.pyplot as plt
import numpy as np
plt.hist(np.random.randn(1000), bins=20)  # sample data assumed for illustration
plt.show()
Scatter Plot
Scatter plots are used to observe the relationship between variables and use dots to represent the
relationship between them. The scatter() method in the matplotlib library is used to draw a scatter plot.
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]
plt.scatter(x, y)
plt.show()
Pie Chart
A pie chart is a circular statistical plot that can display only one series of data. The whole chart represents 100% of the given data, and the area of each slice represents the percentage share of its part of the data. The slices of the pie are called wedges, and the area of a wedge is determined by the length of its arc. It can be created using the pie() method.
import matplotlib.pyplot as plt
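A minimal pie-chart sketch (the category data here is assumed for illustration):
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]  # hypothetical shares
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.show()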
import pandas as pd
df = pd.read_csv(r"https://gitlab.com/scilab/forge/rdataset/-/raw/master/csv/MASS/Boston.csv?ref_type=heads&inline=false", on_bad_lines="skip")
df = df[df.columns[1:]]
df
# Data Description
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centres
RAD index of accessibility to radial highways
TAX full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in $1000s
df.rename(columns={"medv":"price"},inplace=True)
df
crim zn indus chas nox rm age dis rad tax \
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222
.. ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 crim 506 non-null float64
1 zn 506 non-null float64
2 indus 506 non-null float64
3 chas 506 non-null int64
4 nox 506 non-null float64
5 rm 506 non-null float64
6 age 506 non-null float64
7 dis 506 non-null float64
8 rad 506 non-null int64
9 tax 506 non-null int64
10 ptratio 506 non-null float64
11 black 506 non-null float64
12 lstat 506 non-null float64
13 price 506 non-null float64
dtypes: float64(11), int64(3)
memory usage: 55.5 KB
lstat price
count 506.000000 506.000000
mean 12.653063 22.532806
std 7.141062 9.197104
min 1.730000 5.000000
25% 6.950000 17.025000
50% 11.360000 21.200000
75% 16.955000 25.000000
max 37.970000 50.000000
(14, 14)
black lstat
0 396.90 4.98
1 396.90 9.14
2 392.83 4.03
3 394.63 2.94
4 396.90 5.33
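The model-fitting steps are not shown in the source; a minimal sketch consistent with the shapes reported later (354 training rows out of 506 implies roughly a 70/30 split; the exact random_state is not recoverable, so reproduced numbers may differ slightly):
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

X = df.drop('price', axis=1)   # 13 predictor columns
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)
lr = LinearRegression()
lr.fit(X_train, y_train)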
# Value of y intercept
lr.intercept_
31.631084035694286
Attribute Coefficients
0 crim -0.13347
1 zn 0.035809
2 indus 0.049523
3 chas 3.119835
4 nox -15.417061
5 rm 4.057199
6 age -0.010821
7 dis -1.385998
8 rad 0.242727
9 tax -0.008702
10 ptratio -0.910685
11 black 0.011794
12 lstat -0.547113
y_pred = lr.predict(X_train)
y_pred.shape
(354,)
# Model Evaluation
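The code producing the training-set scores below is not shown; a sketch mirroring the test-set evaluation cell that appears further down:
print('R^2:', metrics.r2_score(y_train, y_pred))
print('Adjusted R^2:', 1 - (1 - metrics.r2_score(y_train, y_pred))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))
print('MAE:', metrics.mean_absolute_error(y_train, y_pred))
print('MSE:', metrics.mean_squared_error(y_train, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_train, y_pred)))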
R^2: 0.7434997532004697
Adjusted R^2: 0.7336923908228405
MAE: 3.356826782168208
MSE: 22.545481487421423
RMSE: 4.748208239685937
R²: a measure of the linear relationship between X and Y. It is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variables.
Adjusted R²: compares the explanatory power of regression models that contain different numbers of predictors, penalising R² for predictors that do not improve the model.
MAE: the mean of the absolute value of the errors. It measures the difference between two continuous variables, here the actual and predicted values of y.
MSE: the mean squared error is just like the MAE, but squares the differences before averaging them instead of taking the absolute value.
RMSE: the root mean squared error is the square root of the MSE, which brings the metric back to the same units as the target variable.
plt.scatter(y_train, y_pred)
plt.xlabel("Prices")
plt.ylabel("Predicted prices")
plt.title("Prices vs Predicted prices")
plt.show()
# Checking residuals
plt.scatter(y_pred,y_train-y_pred)
plt.title("Predicted vs residuals")
plt.xlabel("Predicted")
plt.ylabel("Residuals")
plt.show()
There is no pattern visible in this plot and the values are distributed evenly around zero, so the linearity assumption is satisfied.
import seaborn as sns
sns.distplot(y_train - y_pred)
Note: distplot is deprecated in recent seaborn releases; adapt your code to use either displot (a figure-level function with similar flexibility) or histplot (an axes-level function for histograms). A migration guide: https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
# Model Evaluation
y_test_pred = lr.predict(X_test)
acc_linreg = metrics.r2_score(y_test, y_test_pred)
print('R^2:', acc_linreg)
print('Adjusted R^2:', 1 - (1-metrics.r2_score(y_test, y_test_pred))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
print('MAE:', metrics.mean_absolute_error(y_test, y_test_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_test_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)))
R^2: 0.7112260057484948
Adjusted R^2: 0.6840226584639327
MAE: 3.1627098714573947
MSE: 21.51744423117709
RMSE: 4.638689926172808
• R²: 0.7112260057484948
• Adjusted R²: 0.6840226584639327
• MAE: 3.1627098714573947
• MSE: 21.51744423117709
• RMSE: 4.638689926172808
Here the model evaluation scores on the test data closely match those on the training data, so the model is not overfitting.
#plt.style.use('ggplot')
#ggplot is an R-based visualisation package that provides better graphics with a higher level of abstraction
diabetes_data = pd.read_csv('diabetes.csv')  # dataset load assumed; file name as used in Practical 08
## gives information about the data types, columns, null value counts, memory usage etc.
diabetes_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
## basic statistic details about the data (note: only numerical columns would be displayed here unless parameter include="all")
diabetes_data.describe()
diabetes_data.describe().T
It is better to replace the zeros with NaN: after that, counting them is easier, and the zeros need to be replaced with suitable values anyway.
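A minimal sketch of that replacement (the copy-and-replace step itself is not shown in the source; the column list is assumed from the null counts printed below):
import numpy as np
diabetes_data_copy = diabetes_data.copy(deep=True)
zero_cols = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
diabetes_data_copy[zero_cols] = diabetes_data_copy[zero_cols].replace(0, np.nan)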
print(diabetes_data_copy.isnull().sum())
Pregnancies 0
Glucose 5
BloodPressure 35
SkinThickness 227
Insulin 374
BMI 11
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
p = diabetes_data.hist(figsize = (20,20))
Aiming to impute the NaN values for the columns in accordance with their distributions:
diabetes_data_copy['Glucose'].fillna(diabetes_data_copy['Glucose'].mean(), inplace=True)
diabetes_data_copy['BloodPressure'].fillna(diabetes_data_copy['BloodPressure'].mean(), inplace=True)
diabetes_data_copy['SkinThickness'].fillna(diabetes_data_copy['SkinThickness'].median(), inplace=True)
diabetes_data_copy['Insulin'].fillna(diabetes_data_copy['Insulin'].median(), inplace=True)
diabetes_data_copy['BMI'].fillna(diabetes_data_copy['BMI'].median(), inplace=True)
Skewness
A left-skewed distribution has a long left tail. Left-skewed distributions are also called negatively-skewed distributions, because there is a long tail in the negative direction on the number line. The mean is also to the left of the peak.
A right-skewed distribution has a long right tail. Right-skewed distributions are also called positively-skewed distributions, because there is a long tail in the positive direction on the number line. The mean is also to the right of the peak.
(768, 9)
## checking the balance of the data by plotting the count of outcomes by their value
diabetes_data['Outcome'].value_counts()
Outcome
0 500
1 268
Name: count, dtype: int64
The above graph shows that the data is biased towards datapoints having an outcome value of 0, meaning that diabetes was not actually present. The number of non-diabetics is almost twice the number of diabetic patients.
The pairs plot builds on two basic figures, the histogram and the scatter plot. The histogram on the diagonal
allows us to see the distribution of a single variable while the scatter plots on the upper and lower triangles
show the relationship (or lack thereof) between two variables.
Pearson's Correlation Coefficient: helps you find out the relationship between two quantities. It gives you the measure of the strength of association between two variables. The value of Pearson's Correlation Coefficient ranges from -1 to +1, where +1 means a perfect positive correlation, -1 a perfect negative correlation, and 0 no correlation.
A heat map is a two-dimensional representation of information with the help of colors. Heat maps can
help the user visualize simple or complex information.
#Heatmap for unclean data
X.head()
DiabetesPedigreeFunction Age
0 0.468492 1.425995
1 -0.365061 -0.190672
2 0.604397 -0.105584
3 -0.920763 -1.041549
4 5.484909 -0.020496
#X = diabetes_data.drop("Outcome",axis = 1)
y = diabetes_data_copy.Outcome
It is always advisable to bring all the features to the same scale before applying distance-based algorithms like KNN.
Let's see an example of distance calculation using two features whose magnitudes/ranges vary greatly:
Euclidean Distance = [(100000 − 80000)² + (30 − 25)²]^(1/2) ≈ 20000.0006
We can see how the feature with the greater range will overshadow or diminish the smaller feature completely; this will impact the performance of all distance-based models, as they give higher weightage to variables with a higher magnitude. The sketch below scales the features before splitting the data.
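A sketch of the scaling and split (assumed: StandardScaler and a stratified 1/3 test split, which is consistent with the 256 test rows in the confusion matrix shown later):
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

sc_X = StandardScaler()
features = diabetes_data_copy.drop('Outcome', axis=1)
X = pd.DataFrame(sc_X.fit_transform(features), columns=features.columns)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42, stratify=y)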
test_scores = []
train_scores = []
for i in range(1, 15):
    knn = KNeighborsClassifier(i)
    knn.fit(X_train, y_train)
    train_scores.append(knn.score(X_train, y_train))
    test_scores.append(knn.score(X_test, y_test))
## score that comes from testing on the same datapoints that were used for training
max_train_score = max(train_scores)
train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]
print('Max train score {} % and k = {}'.format(max_train_score*100, list(map(lambda x: x+1, train_scores_ind))))
## score that comes from testing on the datapoints that were split in the beginning to be used solely for testing
max_test_score = max(test_scores)
test_scores_ind = [i for i, v in enumerate(test_scores) if v == max_test_score]
print('Max test score {} % and k = {}'.format(max_test_score*100, list(map(lambda x: x+1, test_scores_ind))))
Result Visualisation
plt.figure(figsize=(8,5))
p = sns.lineplot(train_scores,marker='*',label='Train Score')
p = sns.lineplot(test_scores,marker='o',label='Test Score')
The best result is captured at k = 11; hence 11 is used for the final model.
knn = KNeighborsClassifier(11)
knn.fit(X_train,y_train)
knn.score(X_test,y_test)
0.765625
from mlxtend.plotting import plot_decision_regions
value = 20000
width = 20000
plot_decision_regions(X.values, y.values, clf=knn, legend=2,
    filler_feature_values={2: value, 3: value, 4: value, 5: value, 6: value, 7: value},
    filler_feature_ranges={2: width, 3: width, 4: width, 5: width, 6: width, 7: width},
    X_highlight=X_test.values)
1. Confusion Matrix
from sklearn.metrics import confusion_matrix
y_pred = knn.predict(X_test)
confusion_matrix(y_test, y_pred)
pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
Predicted 0 1 All
True
0 142 25 167
1 35 54 89
All 177 79 256
2. Classification Report
Report which includes Precision, Recall and F1-Score.
#import classification_report
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
3. ROC - AUC
ROC (Receiver Operating Characteristic) Curve tells us how well the model can distinguish between two classes (e.g. whether a patient has a disease or not). Better models can accurately distinguish between the two, whereas a poor model will have difficulty distinguishing between them.
Well explained in this video: https://www.youtube.com/watch?v=OAl6eAyP-yo
from sklearn.metrics import roc_curve
y_pred_proba = knn.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='Knn')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('Knn(n_neighbors=11) ROC curve')
plt.show()
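The "Best Score" / "Best Parameters" output below comes from a hyper-parameter search that is not shown in the source; a minimal sketch, assuming a 5-fold GridSearchCV over n_neighbors:
import numpy as np
from sklearn.model_selection import GridSearchCV

param_grid = {'n_neighbors': np.arange(1, 50)}
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
knn_cv.fit(X, y)
print('Best Score:' + str(knn_cv.best_score_))
print('Best Parameters: ' + str(knn_cv.best_params_))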
Best Score:0.7721840251252015
Best Parameters: {'n_neighbors': 25}
type text
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
... ... ...
5569 spam This is the 2nd time we have tried 2 contact u...
df['type'].value_counts()
ham 4827
spam 747
Name: type, dtype: int64
MultinomialNB()
prediction = model.predict(test_transformed)
actual = y_test
print("Prediction:", list(prediction))
print("Actual: ",list(actual))
Prediction: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 1, ... (output truncated; 1673 predicted labels in total)]
Actual: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 1, ... (output truncated; 1673 true labels in total)]
matrix = confusion_matrix(actual, prediction)  # call assumed; the source shows only its output
matrix
array([[1447,   45],
       [   0,  181]], dtype=int64)
precision = matrix[1][1]/(matrix[1][1] + matrix[0][1])                 # TP / (TP + FP)
recall = matrix[1][1]/(matrix[1][1] + matrix[1][0])                    # TP / (TP + FN)
f1score = matrix[1][1]/(matrix[1][1] + (matrix[1][0] + matrix[0][1])/2)  # TP / (TP + (FN + FP)/2)
Let's predict some real messages. Here are some messages that I received in the past.
messages = ["Congragulations! You have won a $10,000. Go to https://bit.ly/23343 to claim now.",
            ...]  # remaining messages truncated
message_transformed = vectorizer.transform(messages)
new_prediction = model.predict(message_transformed)
for i in range(len(new_prediction)):
    if new_prediction[i] == 0:
        print("Ham.")
    else:
        print("Spam.")
Spam.
Spam.
Spam.
Ham.
Practical - 07
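The dataset loading is not shown in the source; a minimal sketch using scikit-learn's built-in wine dataset (variable names follow the later cells):
from sklearn.datasets import load_wine
import pandas as pd

wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)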
od280/od315_of_diluted_wines proline
0 3.92 1065.0
1 3.40 1050.0
2 3.17 1185.0
3 3.45 1480.0
4 2.93 735.0
.. ... ...
173 1.74 740.0
174 1.56 750.0
175 1.56 835.0
176 1.62 840.0
177 1.60 560.0
wine_df['target'] = wine.target
wine_df
wine_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 alcohol 178 non-null float64
1 malic_acid 178 non-null float64
2 ash 178 non-null float64
3 alcalinity_of_ash 178 non-null float64
4 magnesium 178 non-null float64
5 total_phenols 178 non-null float64
6 flavanoids 178 non-null float64
7 nonflavanoid_phenols 178 non-null float64
8 proanthocyanins 178 non-null float64
9 color_intensity 178 non-null float64
wine_df.describe()
target
count 178.000000
mean 0.938202
std 0.775035
min 0.000000
25% 0.000000
50% 1.000000
75% 2.000000
max 2.000000
X = wine.data
y = wine.target
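The split and classifier fitting are not shown in the source; a minimal sketch (an 80/20 split is assumed, consistent with the 36 test samples in the classification report below):
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)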
Feature Importances:
Feature Importance
6 flavanoids 0.411053
9 color_intensity 0.384934
12 proline 0.164075
2 ash 0.020942
0 alcohol 0.018995
1 malic_acid 0.000000
3 alcalinity_of_ash 0.000000
4 magnesium 0.000000
5 total_phenols 0.000000
7 nonflavanoid_phenols 0.000000
8 proanthocyanins 0.000000
10 hue 0.000000
11 od280/od315_of_diluted_wines 0.000000
• There are 5 features with non-zero importance for the trained model:
– flavanoids
– color_intensity
– proline
– ash
– alcohol
# Make predictions on the test data and evaluate
from sklearn.metrics import accuracy_score, classification_report
y_pred = dt_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_rep)
Accuracy: 0.94
Classification Report:
precision recall f1-score support
accuracy 0.94 36
macro avg 0.95 0.93 0.94 36
weighted avg 0.95 0.94 0.94 36
Practical - 08
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import sklearn.preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
from sklearn import svm
df = pd.read_csv('diabetes.csv')
df.head(7)
df.describe()
# checking for missing values (call assumed; the source shows only its output)
df.isnull().values.any()
False
Feature Engineering
Although no missing values were found in the dataset, some feature engineering is still needed before implementing the SVM model. The three features Glucose, BloodPressure, and SkinThickness have minimum values of 0. However, these features cannot be equal to zero, because humans can't survive with zero glucose, blood pressure, or skin thickness. So, to solve this issue, all values equal to zero in each of those three features were turned into null values, and the nulls were then simply ignored for simplicity.
zero_not_allowed = ["Glucose","BloodPressure","SkinThickness"]
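A minimal sketch of the replacement described above (assumed: the NaNs are ignored when computing each column's mean, which then fills them, since the SVM cannot handle missing values):
import numpy as np
for column in zero_not_allowed:
    df[column] = df[column].replace(0, np.nan)
    mean = int(df[column].mean(skipna=True))  # NaNs ignored while averaging
    df[column] = df[column].replace(np.nan, mean)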
SVM Model
# Splitting the dataset into training and testing sets.
x = df.iloc[:, :-2]  # note: :-2 drops the last two columns (Age and Outcome) from the features
y = df.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0, test_size=0.2)
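The classifier training that produces the accuracy below is not shown in the source; a minimal sketch (an RBF kernel is assumed):
clf = svm.SVC(kernel='rbf')
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print("Accuracy:", accuracy_score(y_test, y_pred))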
Accuracy: 0.7922077922077922
confusion_matrix(y_test,y_pred)
array([[98, 9],
[23, 24]], dtype=int64)
Practical : 09
Download Dataset
!wget https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip
Import Modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import os
import tqdm
import random
from keras.preprocessing.image import load_img
warnings.filterwarnings('ignore')
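The input_path and label lists used below are built by walking the extracted PetImages folder; a minimal sketch (assumed, since the loop is not shown in the source):
input_path = []
label = []
for class_name in os.listdir('PetImages'):
    for path in os.listdir('PetImages/' + class_name):
        input_path.append('PetImages/' + class_name + '/' + path)
        label.append(1 if class_name == 'Dog' else 0)  # Dog -> 1, Cat -> 0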
PetImages/Dog/4253.jpg 1
df = pd.DataFrame()
df['images'] = input_path
df['label'] = label
df = df.sample(frac=1).reset_index(drop=True)
df.head()
images label
0 PetImages/Dog/67.jpg 1
1 PetImages/Dog/8273.jpg 1
2 PetImages/Dog/9117.jpg 1
3 PetImages/Cat/654.jpg 0
4 PetImages/Cat/6418.jpg 0
for i in df['images']:
    if '.jpg' not in i:
        print(i)
PetImages/Cat/Thumbs.db
PetImages/Dog/Thumbs.db
import PIL
l = []
for image in df['images']:
    try:
        img = PIL.Image.open(image)
    except:
        l.append(image)
l
['PetImages/Cat/666.jpg',
'PetImages/Cat/Thumbs.db',
'PetImages/Dog/Thumbs.db',
'PetImages/Dog/11702.jpg']
# drop the Thumbs.db entries and the two unreadable images
df = df[df['images']!='PetImages/Dog/Thumbs.db']
df = df[df['images']!='PetImages/Cat/Thumbs.db']
df = df[df['images']!='PetImages/Cat/666.jpg']
df = df[df['images']!='PetImages/Dog/11702.jpg']
len(df)
24998
df.head()
images label
0 PetImages/Dog/67.jpg 1
1 PetImages/Dog/8273.jpg 1
2 PetImages/Dog/9117.jpg 1
3 PetImages/Cat/654.jpg 0
4 PetImages/Cat/6418.jpg 0
# input split
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, random_state=42)
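train_generator and val_generator below are Keras ImageDataGenerator objects; a minimal sketch of their assumed setup (note that flow_from_dataframe with class_mode='binary' expects string labels in recent Keras versions, so df['label'] may need to be cast with .astype(str)):
from keras.preprocessing.image import ImageDataGenerator

train_generator = ImageDataGenerator(rescale=1./255, horizontal_flip=True)  # light augmentation assumed
val_generator = ImageDataGenerator(rescale=1./255)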
train_iterator = train_generator.flow_from_dataframe(
train,
x_col='images',
y_col='label',
target_size=(128,128),
batch_size=512,
class_mode='binary'
)
val_iterator = val_generator.flow_from_dataframe(
test,
x_col='images',
y_col='label',
target_size=(128,128),
batch_size=512,
class_mode='binary'
)
Model Creation
from keras import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense
model = Sequential([
Conv2D(16, (3,3), activation='relu',
input_shape=(128,128,3)),
MaxPool2D((2,2)),
Conv2D(32, (3,3), activation='relu'),
MaxPool2D((2,2)),
Conv2D(64, (3,3), activation='relu'),
MaxPool2D((2,2)),
Flatten(),
Dense(512, activation='relu'),
Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_12 (Conv2D) (None, 126, 126, 16) 448
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 63, 63, 16) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 61, 61, 32) 4640
_________________________________________________________________
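The training call producing the log below is not shown in the source; a sketch consistent with the 10 epochs and the iterators defined above (assuming a Keras version whose fit accepts generators):
history = model.fit(train_iterator, epochs=10, validation_data=val_iterator)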
Epoch 1/10
40/40 [==============================] - 150s 4s/step - loss: 0.8679 - accuracy:
0.5187 - val_loss: 0.6399 - val_accuracy: 0.6238
Epoch 2/10
40/40 [==============================] - 147s 4s/step - loss: 0.6280 - accuracy:
0.6416 - val_loss: 0.5672 - val_accuracy: 0.7024
Epoch 3/10
40/40 [==============================] - 146s 4s/step - loss: 0.5737 - accuracy:
0.6980 - val_loss: 0.5493 - val_accuracy: 0.7148
Epoch 4/10
40/40 [==============================] - 146s 4s/step - loss: 0.5478 - accuracy:
0.7221 - val_loss: 0.5351 - val_accuracy: 0.7356
Epoch 5/10
40/40 [==============================] - 145s 4s/step - loss: 0.5276 - accuracy:
0.7338 - val_loss: 0.5104 - val_accuracy: 0.7494
Epoch 6/10
40/40 [==============================] - 144s 4s/step - loss: 0.5127 - accuracy:
0.7405 - val_loss: 0.4853 - val_accuracy: 0.7664
Epoch 7/10
40/40 [==============================] - 144s 4s/step - loss: 0.5059 - accuracy:
0.7544 - val_loss: 0.4586 - val_accuracy: 0.7868
Epoch 8/10
40/40 [==============================] - 143s 4s/step - loss: 0.4842 - accuracy:
0.7644 - val_loss: 0.5054 - val_accuracy: 0.7510
Epoch 9/10
40/40 [==============================] - 143s 4s/step - loss: 0.4971 - accuracy:
0.7530 - val_loss: 0.4647 - val_accuracy: 0.7894
Epoch 10/10
40/40 [==============================] - 142s 4s/step - loss: 0.4642 - accuracy:
0.7770 - val_loss: 0.4711 - val_accuracy: 0.7782
Visualization of Results
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(len(acc))
loss = history.history['loss']
val_loss = history.history['val_loss']
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.plot(epochs, val_loss, 'r', label='Validation Loss')
plt.title('Loss Graph')
plt.legend()
plt.show()
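acc and val_acc are computed above but never plotted; a matching accuracy graph (a small assumed addition mirroring the loss plot):
plt.plot(epochs, acc, 'b', label='Training Accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
plt.title('Accuracy Graph')
plt.legend()
plt.show()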