ML Lab Assignment2

This document is an ML lab assignment that implements the Find-S and Candidate Elimination algorithms on a synthetic Covid-19 dataset. It covers building and preprocessing the dataset, fitting the models, making predictions, and evaluating them with average precision, F1 score, and ROC AUC. On the test fold both algorithms reach an F1 score of about 0.45 and a ROC AUC of 0.473, slightly below the 0.500 no-skill baseline, which is consistent with the class labels being drawn at random.
09/09/2020 2175052_Safir_ML_Lab2

ML Lab Assignment 2

Team Members
Safir Motiwala (2175052)
Rajendra Kelwa (2175048)

Topic : Find-S Algorithm and Candidate Elimination Algorithm

Building the DataSet

In [14]:

import csv
import random

# Column names; the last column is the class label.
fields = ['State', 'Climate', 'Age', 'Body_Shape', 'Immune_Strength', 'Covid_19']
state = ['Maharashtra', 'Karnataka']
Age = ['Young', 'Old']
Immune_Strength = ['Weak', 'High']
Covid_19 = ['Yes', 'No']

filename = 'Covid-19_Data.csv'
# newline='' stops the csv module from writing blank rows on Windows.
with open(filename, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    # One fixed positive record followed by two fixed negative records.
    row = [random.choice(state), 'Sunny', random.choice(Age), 'Under-weight',
           random.choice(Immune_Strength), 'Yes']
    csvwriter.writerow(row)
    for i in range(2):
        row = [random.choice(state), 'Sunny', random.choice(Age), 'Under-weight',
               random.choice(Immune_Strength), 'No']
        csvwriter.writerow(row)
    # 390 'Sunny' / 'Under-weight' records with a random label.
    for i in range(0, 390):
        row = [random.choice(state), 'Sunny', random.choice(Age), 'Under-weight',
               random.choice(Immune_Strength), random.choice(Covid_19)]
        csvwriter.writerow(row)
    # 10 'Winter' / 'Over-weight' records with a random label.
    for i in range(0, 10):
        row = [random.choice(state), 'Winter', random.choice(Age), 'Over-weight',
               random.choice(Immune_Strength), random.choice(Covid_19)]
        csvwriter.writerow(row)
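
Because every attribute and most labels are drawn with `random.choice`, each run of the cell above writes a different file, so the scores reported later cannot be reproduced exactly. A minimal sketch of a seeded variant follows. The helper name `build_rows` and its `seed` parameter are illustrative assumptions, not part of the original notebook, and the rows are kept in memory instead of being written to disk.

```python
import random

STATES = ['Maharashtra', 'Karnataka']
AGES = ['Young', 'Old']
IMMUNE = ['Weak', 'High']
LABELS = ['Yes', 'No']

def build_rows(seed=0):
    """Hypothetical helper: same row layout as the cell above, with a fixed seed."""
    rng = random.Random(seed)  # private generator, leaves the global state alone

    def make(climate, body, label=None):
        # Draw order follows the original row: state, age, immune strength, label.
        return [rng.choice(STATES), climate, rng.choice(AGES), body,
                rng.choice(IMMUNE),
                label if label is not None else rng.choice(LABELS)]

    rows = [make('Sunny', 'Under-weight', 'Yes')]
    rows += [make('Sunny', 'Under-weight', 'No') for _ in range(2)]
    rows += [make('Sunny', 'Under-weight') for _ in range(390)]
    rows += [make('Winter', 'Over-weight') for _ in range(10)]
    return rows

rows = build_rows(seed=42)
print(len(rows))  # 1 + 2 + 390 + 10 rows
```

Calling `build_rows` twice with the same seed returns identical rows, which makes a later score comparison between the two algorithms repeatable.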

Importing the Dataset and Preprocessing

In [2]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import KFold

dataset = pd.read_csv('Covid-19_Data.csv')

# Encode the class label: 'Yes' -> 1, 'No' -> 0.
result = {'Yes': 1, 'No': 0}
dataset['Covid_19'] = dataset['Covid_19'].map(result)

X = dataset.iloc[:, 0:5].values   # the five categorical attributes
y = dataset.iloc[:, -1].values    # the encoded label

kf = KFold(n_splits=10)
#print(kf.get_n_splits(X))

# Each iteration overwrites the previous split, so only the last of the
# ten folds is kept for the training and test sets used below.
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
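
The split loop keeps only its final iteration, so it helps to see which rows that last fold contains. The sketch below reproduces the documented fold sizes of an unshuffled K-fold split in plain Python (the first `n_samples % n_splits` folds receive one extra row). The function name `kfold_test_ranges` is an assumption for illustration, and the figure of 403 rows assumes the dataset cell is run exactly as written above.

```python
def kfold_test_ranges(n_samples, n_splits):
    """Return the (start, stop) test range of each fold for an unshuffled split."""
    base, extra = divmod(n_samples, n_splits)
    ranges, start = [], 0
    for fold in range(n_splits):
        # The first `extra` folds get one more row than the others.
        size = base + (1 if fold < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

ranges = kfold_test_ranges(403, 10)
sizes = [stop - start for start, stop in ranges]
print(sizes)       # test-set size of each fold
print(ranges[-1])  # rows held out by the last fold
```

Under these assumptions the last fold holds out the final 40 rows of the file, which include all ten 'Winter' / 'Over-weight' records, so the training set never sees them.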

In [3]:

print(dataset.head())

         State Climate    Age    Body_Shape Immune_Strength  Covid_19
0  Maharashtra   Sunny  Young  Under-weight            Weak         1
1  Maharashtra   Sunny  Young  Under-weight            Weak         0
2    Karnataka   Sunny    Old  Under-weight            Weak         0
3  Maharashtra   Sunny    Old  Under-weight            High         1
4  Maharashtra   Sunny  Young  Under-weight            Weak         0

Writing the Code for Find-S Algorithm

In [4]:

class FindS:
    def __init__(self):
        self.Xtrain = ""
        self.ytrain = ""
        self.Xtest = ""
        self.ytest = ""
        self.specific_hypothesis = []

    def fit(self, X, y):
        count = 0
        self.Xtrain, self.ytrain = X, y
        S_hypothesis = []
        # Start from the first positive example (the most specific hypothesis).
        for i, val in enumerate(y):
            if val == 1:
                S_hypothesis = list(X[i].copy())
                print("Initial : ", S_hypothesis)
                break
        # Generalise every attribute that disagrees with a positive example.
        for i, val in enumerate(X):
            if y[i] == 1:
                count += 1
                for x in range(len(S_hypothesis)):
                    if val[x] != S_hypothesis[x]:
                        S_hypothesis[x] = '?'
        print("Total Positive Records : ", count)
        self.specific_hypothesis = S_hypothesis
        return S_hypothesis

    def predict(self, X):
        y = np.array([0 for i in range(len(X))])
        self.Xtest = X
        for i, val in enumerate(X):
            val = list(val)
            # Count the attributes the hypothesis accepts: '?' or an exact match.
            check = 0
            for x in range(len(self.specific_hypothesis)):
                if self.specific_hypothesis[x] in ('?', val[x]):
                    check += 1
            # Positive only when every attribute is accepted.
            if check == len(val):
                y[i] = 1
            else:
                y[i] = 0
        self.ytest = y
        return y
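
To make the generalisation step concrete, the sketch below runs the same idea on the five records printed by `dataset.head()` above. The standalone function `find_s` is an illustrative rewrite, not the notebook's class; it starts from the first positive record and replaces every attribute that a later positive record contradicts with '?'.

```python
def find_s(X, y):
    """Return the most specific hypothesis consistent with the positive records."""
    hypothesis = None
    for row, label in zip(X, y):
        if label != 1:
            continue  # Find-S ignores negative records
        if hypothesis is None:
            hypothesis = list(row)  # first positive record seeds the hypothesis
        else:
            # Keep matching attributes, generalise the rest to '?'.
            hypothesis = [h if h == v else '?' for h, v in zip(hypothesis, row)]
    return hypothesis

# The five records shown by dataset.head().
X_head = [
    ['Maharashtra', 'Sunny', 'Young', 'Under-weight', 'Weak'],
    ['Maharashtra', 'Sunny', 'Young', 'Under-weight', 'Weak'],
    ['Karnataka', 'Sunny', 'Old', 'Under-weight', 'Weak'],
    ['Maharashtra', 'Sunny', 'Old', 'Under-weight', 'High'],
    ['Maharashtra', 'Sunny', 'Young', 'Under-weight', 'Weak'],
]
y_head = [1, 0, 0, 1, 0]

hypothesis = find_s(X_head, y_head)
print(hypothesis)
```

Only records 0 and 3 are positive; they differ in Age and Immune_Strength, so those two attributes become '?' while State, Climate, and Body_Shape stay fixed.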

Fitting the Training Set to the Model

In [5]:

from FindSAlgorithm import FindS


fs = FindS()
S_hypothesis = fs.fit(X_train, y_train)

Initial : ['Maharashtra', 'Sunny', 'Young', 'Under-weight', 'Weak']
Total Positive Records : 180

In [6]:

print("Specific Hypothesis : ", S_hypothesis)

Specific Hypothesis : ['?', 'Sunny', '?', 'Under-weight', '?']

Predicting the Test Set Results

In [7]:

y_pred = fs.predict(X_test)
y_pred1 = fs.predict(X_train)

In [8]:

print(y_pred)

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
0 0 0
0 0 0]
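
The pattern of thirty ones followed by ten zeros follows from how a hypothesis is matched against a record: a record is classified positive only when every attribute of the hypothesis is either '?' or equal to the record's value. A minimal sketch of that test is shown below. The helper name `covers` and the two sample records are assumptions for illustration; the hypothesis is the one printed above.

```python
def covers(hypothesis, record):
    """True when every attribute is a wildcard or matches the record exactly."""
    return all(h == '?' or h == v for h, v in zip(hypothesis, record))

hypothesis = ['?', 'Sunny', '?', 'Under-weight', '?']

# Illustrative records in the two layouts written by the dataset cell.
sunny_record = ['Karnataka', 'Sunny', 'Old', 'Under-weight', 'High']
winter_record = ['Karnataka', 'Winter', 'Old', 'Over-weight', 'High']

print(covers(hypothesis, sunny_record))
print(covers(hypothesis, winter_record))
```

Every 'Sunny' / 'Under-weight' record is covered regardless of its true label, while the 'Winter' / 'Over-weight' records at the end of the file are rejected, which would explain the trailing zeros in the prediction vector.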

Finding the Average Precision Score

In [9]:

from sklearn.metrics import accuracy_score, average_precision_score, f1_score

Avg1 = average_precision_score(y_train, y_pred1, average='micro', pos_label=1)
Avg2 = average_precision_score(y_test, y_pred, average='micro', pos_label=1)

print("Average Precision Score (Training Data) : ", Avg1)
print("Average Precision Score (Test Data) : ", Avg2)

Average Precision Score (Training Data) : 0.5
Average Precision Score (Test Data) : 0.33809523809523806

Finding the F-1 Score

In [10]:

FS1 = f1_score(y_train, y_pred1, average='binary', pos_label=1)
FS2 = f1_score(y_test, y_pred, average='binary', pos_label=1)

print("F-1 Score (Training Data) : ", FS1)
print("F-1 Score (Test Data) : ", FS2)

F-1 Score (Training Data) : 0.6666666666666666
F-1 Score (Test Data) : 0.4545454545454545

Plotting Area Under the Curve

In [11]:

from sklearn import metrics


fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred, pos_label=1)
auc = metrics.auc(fpr, tpr)
print("Area under the Curve : ", auc)

Area under the Curve : 0.4725274725274725
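
The three test-fold scores can be reproduced by hand from a single confusion matrix, which also shows why they are low. The counts used below (TP=10, FP=20, FN=4, TN=6) are not printed anywhere in the notebook; they are inferred from the 30 positive predictions out of 40 and from the reported F1 score, so treat them as an assumption. Because the predictions are hard 0/1 labels, the ROC curve has one interior point and the precision-recall curve has two thresholds.

```python
# Inferred confusion counts for the 40-record test fold (an assumption).
tp, fp, fn, tn = 10, 20, 4, 6

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # true positive rate
fpr = fp / (fp + tn)         # false positive rate

f1 = 2 * precision * recall / (precision + recall)

# ROC through (0, 0), (fpr, recall), (1, 1): trapezoid area.
auc = (recall + 1 - fpr) / 2

# Average precision: sum of recall increments weighted by precision.
# Threshold 1 gives (recall, precision); threshold 0 predicts everything
# positive, so precision there equals the share of positives.
base_rate = (tp + fn) / (tp + fp + fn + tn)
ap = recall * precision + (1 - recall) * base_rate

print(round(f1, 4), round(auc, 4), round(ap, 4))
```

With these counts the hypothesis flags 20 of the 26 negative records as positive, so its true positive rate (about 0.71) is lower than its false positive rate (about 0.77), which puts the ROC AUC just under the 0.5 no-skill line.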

In [12]:

from sklearn.metrics import roc_auc_score, roc_curve

# Baseline that always predicts the negative class.
ns_probs = [0 for _ in range(len(y_test))]
# Find-S outputs hard 0/1 labels, so they are used directly as scores.
fs_probs = y_pred

ns_auc = roc_auc_score(y_test, ns_probs)
fs_auc = roc_auc_score(y_test, fs_probs)

print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('Find-S: ROC AUC=%.3f' % (fs_auc))

ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
fs_fpr, fs_tpr, _ = roc_curve(y_test, fs_probs)
plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
plt.plot(fs_fpr, fs_tpr, marker='.', label='Find-S')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

No Skill: ROC AUC=0.500
Find-S: ROC AUC=0.473

Writing the Code for Candidate Elimination Algorithm

In [13]:

class Candidate_Elimination:
    def __init__(self):
        self.Xtrain = ""
        self.ytrain = ""
        self.Xtest = ""
        self.ytest = ""
        self.specific_hypothesis = []
        self.general_hypothesis = []
        self.version_space = []

    def fit(self, X, y):
        count, count1 = 0, 0
        self.Xtrain, self.ytrain = X, y
        S_hypothesis, G_hypothesis = [], []
        # Initialise S from the first positive example and G as all '?'.
        for i, val in enumerate(y):
            if val == 1:
                S_hypothesis = list(X[i].copy())
                G_hypothesis = [['?' for _ in range(len(S_hypothesis))]]
                print("Initial Specific Hypothesis : ", S_hypothesis)
                print("Initial General Hypothesis : ", G_hypothesis)
                break
        for i, val in enumerate(X):
            if y[i] == 1:
                count += 1
                for x in range(len(S_hypothesis)):
                    if S_hypothesis[x] != '?' and val[x] != S_hypothesis[x]:
                        # A general hypothesis that requires the old value
                        # rejects this positive example, so drop it from G.
                        temp = ['?' for _ in range(len(S_hypothesis))]
                        temp[x] = S_hypothesis[x]
                        if temp in G_hypothesis:
                            G_hypothesis.remove(temp)
                        # Generalise S on the disagreeing attribute.
                        S_hypothesis[x] = '?'
            elif y[i] == 0:
                count1 += 1
                for x in range(len(S_hypothesis)):
                    if val[x] != S_hypothesis[x] and S_hypothesis[x] != '?':
                        # Specialise G on an attribute that excludes this
                        # negative example.
                        temp = ['?' for _ in range(len(S_hypothesis))]
                        temp[x] = S_hypothesis[x]
                        if temp not in G_hypothesis:
                            G_hypothesis.append(temp)

        # Build the version space: with an unchanged G it is just S;
        # otherwise pair each constraint in G with each constraint in S.
        if len(G_hypothesis) == 1:
            v1 = [S_hypothesis]
        else:
            v1 = []
            for i in G_hypothesis:
                for j in range(len(i)):
                    if i[j] != '?':
                        for z in range(len(S_hypothesis)):
                            if S_hypothesis[z] != '?':
                                temp = ['?' for _ in range(len(S_hypothesis))]
                                temp[z] = S_hypothesis[z]
                                temp[j] = i[j]
                                v1.append(temp)

        print('Final Specific Hypothesis : ', S_hypothesis)
        print('Final General Hypothesis : ', G_hypothesis)
        print('Final Version Space : ', v1)
        print(count, " ", count1)
        self.specific_hypothesis, self.general_hypothesis = S_hypothesis, G_hypothesis
        self.version_space = v1
        return 1

    def predict(self, X):
        y = np.array([0 for i in range(len(X))])
        self.Xtest = X
        for i, val in enumerate(X):
            val = list(val)
            # Count the hypotheses in the version space that cover this record.
            check = 0
            for z in self.version_space:
                if all(z[x] in ('?', val[x]) for x in range(len(val))):
                    check += 1
            # Positive only when the version space is non-empty and every
            # hypothesis in it covers the record.
            if check > 0 and check == len(self.version_space):
                y[i] = 1
            else:
                y[i] = 0
        self.ytest = y
        return y
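
The negative-example branch of `fit` is the part that distinguishes Candidate Elimination from Find-S, so a small standalone sketch may help. The function `specialize` below is an illustrative name, not part of the notebook. It mirrors the simplified rule used in the class: for each attribute where the specific hypothesis is constrained and disagrees with the negative record, emit a general hypothesis that fixes only that attribute. A full Candidate Elimination implementation would also remove the all-'?' hypothesis from G once it covers a negative example, which this simplified version does not do.

```python
def specialize(specific, negative):
    """Single-attribute hypotheses that reject `negative` yet stay above `specific`."""
    result = []
    for i, (s, v) in enumerate(zip(specific, negative)):
        if s != '?' and s != v:
            h = ['?'] * len(specific)
            h[i] = s  # require the value that the negative record lacks
            result.append(h)
    return result

# Record 0 (positive) and record 2 (negative) from dataset.head().
specific = ['Maharashtra', 'Sunny', 'Young', 'Under-weight', 'Weak']
negative = ['Karnataka', 'Sunny', 'Old', 'Under-weight', 'Weak']

general = specialize(specific, negative)
print(general)
```

The two records differ in State and Age, so the sketch yields one general hypothesis per differing attribute; a negative record identical to the specific hypothesis yields none.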

Fitting the Training Set to the Model

In [15]:

from CandidateEliminationAlgorithm import Candidate_Elimination


ce = Candidate_Elimination()
ce.fit(X_train, y_train)

Initial Specific Hypothesis : ['Maharashtra', 'Sunny', 'Young', 'Under-weight', 'Weak']
Initial General Hypothesis : [['?', '?', '?', '?', '?']]
Final Specific Hypothesis : ['?', 'Sunny', '?', 'Under-weight', '?']
Final General Hypothesis : [['?', '?', '?', '?', '?']]
Final Version Space : [['?', 'Sunny', '?', 'Under-weight', '?']]
180 180

Out[15]:

1

Predicting the Test Set Results

In [16]:

y_pred = ce.predict(X_test)
y_pred1 = ce.predict(X_train)

In [17]:

print(y_pred)

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
0 0 0
0 0 0]

Finding the Average Precision Score

In [18]:

from sklearn.metrics import accuracy_score, average_precision_score, f1_score

Avg1 = average_precision_score(y_train, y_pred1, average='micro', pos_label=1)
Avg2 = average_precision_score(y_test, y_pred, average='micro', pos_label=1)

print("Average Precision Score (Training Data) : ", Avg1)
print("Average Precision Score (Test Data) : ", Avg2)

Average Precision Score (Training Data) : 0.5
Average Precision Score (Test Data) : 0.33809523809523806

Finding the F-1 Score

In [19]:

FS1 = f1_score(y_train, y_pred1, average='binary', pos_label=1)
FS2 = f1_score(y_test, y_pred, average='binary', pos_label=1)

print("F-1 Score (Training Data) : ", FS1)
print("F-1 Score (Test Data) : ", FS2)

F-1 Score (Training Data) : 0.6666666666666666
F-1 Score (Test Data) : 0.4545454545454545

Plotting the Area Under the Curve

In [20]:

from sklearn import metrics


fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred, pos_label=1)
auc = metrics.auc(fpr, tpr)
print("Area under the Curve : ", auc)

Area under the Curve : 0.4725274725274725

In [21]:

from sklearn.metrics import roc_auc_score, roc_curve

# Baseline that always predicts the negative class.
ns_probs = [0 for _ in range(len(y_test))]
# Candidate Elimination outputs hard 0/1 labels, used directly as scores.
ce_probs = y_pred

ns_auc = roc_auc_score(y_test, ns_probs)
ce_auc = roc_auc_score(y_test, ce_probs)

print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('Candidate Elimination: ROC AUC=%.3f' % (ce_auc))

ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
ce_fpr, ce_tpr, _ = roc_curve(y_test, ce_probs)
plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
plt.plot(ce_fpr, ce_tpr, marker='.', label='Candidate Elimination')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

No Skill: ROC AUC=0.500
Candidate Elimination: ROC AUC=0.473
