
Fall Semester 2020-21

AI with Python
ECE-4031

Lab Digital Assignment - 2

Submitted To: Hemprasad Yashwant Patil


Name: Sejal Mittal
Reg No: 17BIS0011
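
Overview: The code reads income data from a text file, encodes the categorical features numerically, splits the data into training and test sets, and compares three classifiers (SVM, logistic regression, and naive Bayes) using the weighted F1 score from 3-fold cross validation. The scores are printed as a table and plotted as a bar chart.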
Code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsOneClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
#from utilities import visualize_classifier
import warnings
warnings.filterwarnings("ignore")

# Input file containing data
input_file = 'income_data.txt'
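
# A sample record from income_data.txt, assuming the UCI Adult (Census Income)
# dataset format in which each comma-separated line ends with the income label:
# 39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical,
# Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K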

# Read the data
X = []
y = []
count_class1 = 0
count_class2 = 0
max_datapoints = 25000

with open(input_file, 'r') as f:
    for line in f.readlines():
        if count_class1 >= max_datapoints and count_class2 >= max_datapoints:
            break

        if '?' in line:
            continue

        data = line[:-1].split(', ')

        if data[-1] == '<=50K' and count_class1 < max_datapoints:
            X.append(data)
            count_class1 += 1

        if data[-1] == '>50K' and count_class2 < max_datapoints:
            X.append(data)
            count_class2 += 1

# Convert to numpy array
X = np.array(X)

# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Create SVM classifier
#classifier = OneVsOneClassifier(LinearSVC(random_state=0))

# Train the classifier
#classifier.fit(X, y)

plt.figure()

# Cross validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)
w = []
v = []

# SVM classifier
classifier = OneVsOneClassifier(LinearSVC(random_state=0))
f1 = cross_val_score(classifier, X, y, scoring='f1_weighted', cv=3)
f1 = round(100 * f1.mean(), 2)
w.append(f1)
# Logistic regression classifier
classifier = linear_model.LogisticRegression(solver="liblinear", C=100)
f1 = cross_val_score(classifier, X, y, scoring='f1_weighted', cv=3)
f1 = round(100 * f1.mean(), 2)
w.append(f1)

# Naive Bayes classifier
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# Compute the F1 score of the naive Bayes classifier
f1 = cross_val_score(classifier, X, y, scoring='f1_weighted', cv=3)
f1 = round(100 * f1.mean(), 2)
w.append(f1)

v = ['SVM', 'Logistic regression', 'Naive Bayes']

dash = '-' * 40

for i in range(4):
    if i == 0:
        print(dash)
        print('{:20s}{:>12s}'.format("Classifier name", "F1 score"))
        print(dash)
    else:
        print('{:<20s}{:>12.1f}'.format(v[i-1], w[i-1]))

index = np.arange(len(v))
plt.bar(index, w, color=['gainsboro', 'gainsboro', 'gainsboro'], edgecolor='blue')
plt.xlabel('Classifier')
plt.ylabel('F1 Score')
plt.xticks(index, v)
plt.show()
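
Note: the naive Bayes block above also computes y_test_pred on the 30% held-out split but never scores it. If a test-set F1 score is wanted in addition to the cross-validation score, a minimal sketch (reusing y_test and y_test_pred from the code above) could be:

from sklearn.metrics import f1_score

# Weighted F1 score of the naive Bayes classifier on the held-out test split
test_f1 = round(100 * f1_score(y_test, y_test_pred, average='weighted'), 2)
print('Naive Bayes test-set F1 score: {:.2f}'.format(test_f1))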
