ML Lab

The document discusses several machine learning algorithms and provides Python code examples to implement them. It includes code to solve the water jug problem, implement the FIND-S algorithm for hypothesis generation, use candidate elimination for concept learning, build an artificial neural network using backpropagation, implement naive Bayes classification on text and medical data, use k-means and EM clustering, implement k-nearest neighbors classification on the iris dataset, and locally weighted regression. The code examples demonstrate applying these algorithms to sample datasets and evaluating algorithm accuracy.

Uploaded by

Sharan Patil

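# 1) Write a program to implement the Water-Jug problem using Python.

x = 0  # litres currently in the 4-litre jug
y = 0  # litres currently in the 3-litre jug
m = 4  # capacity of jug x
n = 3  # capacity of jug y
print("Initial State = (0,0)")
print("Capacities = (4,3)")
print("Goal State = (2,y)")
while x != 2:
    r = int(input("Enter the rule: "))
    if r == 1:      # fill jug x
        x = m
    elif r == 2:    # fill jug y
        y = n
    elif r == 3:    # empty jug x
        x = 0
    elif r == 4:    # empty jug y
        y = 0
    elif r == 5:    # top up jug y from jug x (no check that x holds enough)
        t = n - y
        y = n
        x -= t
    elif r == 6:    # top up jug x from jug y (no check that y holds enough)
        t = m - x
        x = m
        y -= t
    elif r == 7:    # pour all of jug x into jug y (no capacity check)
        y += x
        x = 0
    elif r == 8:    # pour all of jug y into jug x (no capacity check)
        x += y
        y = 0
    else:
        print("Invalid Rule")
        break
    print(x, y)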
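
The interactive program above relies on the user to choose a valid rule at each step. As a rough sketch of how the same state space could be searched automatically, the following uses breadth-first search over (x, y) states. The function name `solve_water_jug` and its parameters are illustrative assumptions, not part of the original lab code, and the pour rules here are capped at the jug capacities.

```python
from collections import deque

def solve_water_jug(m=4, n=3, goal=2):
    """Breadth-first search from (0, 0) to the first state with x == goal.

    Hypothetical helper: returns the list of visited states, or None.
    """
    start = (0, 0)
    parent = {start: None}          # maps each state to the state it came from
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            state = (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_xy = min(x, n - y)     # amount that fits when pouring x into y
        pour_yx = min(y, m - x)     # amount that fits when pouring y into x
        successors = [
            (m, y), (x, n),         # fill either jug
            (0, y), (x, 0),         # empty either jug
            (x - pour_xy, y + pour_xy),
            (x + pour_yx, y - pour_yx),
        ]
        for state in successors:
            if state not in parent:
                parent[state] = (x, y)
                queue.append(state)
    return None

path = solve_water_jug()
print(path)
```

Because breadth-first search explores states level by level, the returned path uses the fewest possible moves for the given capacities.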

# 3) Implement and demonstrate the FIND-S algorithm for finding the most
# specific hypothesis based on a given set of training data samples. Read the
# training data from a .CSV file.
import csv
num_attribute = 6
a = []
with open('enjoysport.csv', 'r') as file:
    reader = csv.reader(file)
    a = list(reader)
# Start from the first row; this assumes it is a positive example.
hypothesis = a[0][:-1]
for i in a:
    if i[-1] == 'yes':
        # Generalize every attribute that disagrees with a positive example.
        for j in range(num_attribute):
            if i[j] != hypothesis[j]:
                hypothesis[j] = '?'
    print(hypothesis)
print("\nThe maximally specific hypothesis for the given training examples:\n")
print(hypothesis)
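
The listing above seeds the hypothesis from the first row of the file, which only works if that row is a positive example. A minimal sketch that seeds from the first positive row instead is shown below. The function name `find_s` and the in-memory sample rows are assumptions for illustration (the rows follow the well-known EnjoySport example and are not read from `enjoysport.csv`).

```python
def find_s(rows):
    """Return the most specific hypothesis covering every positive row.

    Each row is a list of attribute values followed by a 'yes'/'no' label.
    """
    hypothesis = None
    for *attributes, label in rows:
        if label != 'yes':
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example
        else:
            # Keep matching values, generalize mismatches to '?'.
            hypothesis = [h if h == a else '?'
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Illustrative data, assumed for this sketch.
sample_rows = [
    ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'],
    ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'],
    ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'],
    ['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes'],
]
print(find_s(sample_rows))
```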

# 4) For a given set of training data examples stored in a .CSV file, implement
# and demonstrate the Candidate-Elimination algorithm to output a description of
# the set of all hypotheses consistent with the training examples.
import csv
with open("trainingdata.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)
s = data[1][:-1]
print(s)
g = [['?' for i in range(len(s))] for j in range(len(s))]
print(g)
for i in data:
    if i[-1] == "yes":
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'
    elif i[-1] == "no":
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = "?"
    print(s)
    print(g)
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break
print("\nFinal specific hypothesis:\n", s)

print("\nFinal general hypothesis:\n", gh)
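
One way to sanity-check the output of the listing above is to test whether each reported hypothesis is consistent with the training rows, that is, whether it covers every positive example and no negative one. The helpers `covers` and `is_consistent` and the sample rows below are hypothetical names and data introduced only for this sketch.

```python
def covers(hypothesis, example):
    """True if every attribute of the hypothesis is '?' or equals the example's value."""
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def is_consistent(hypothesis, rows):
    """True if the hypothesis covers exactly the rows labelled 'yes'."""
    for *attributes, label in rows:
        if covers(hypothesis, attributes) != (label == 'yes'):
            return False
    return True

# Illustrative rows, assumed for this sketch.
rows = [
    ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'],
    ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'],
    ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'],
    ['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes'],
]
print(is_consistent(['sunny', 'warm', '?', 'strong', '?', '?'], rows))
print(is_consistent(['?'] * 6, rows))
```

Every hypothesis in the version space should pass this check, so a failure would point to a problem in the boundary sets.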

# 6) Build an Artificial Neural Network by implementing the Backpropagation
# algorithm and test the same using appropriate data sets.
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # scale each column by its maximum
y = y/100

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def derivatives_sigmoid(x):
    # x is expected to be a sigmoid output, not the raw input
    return x * (1 - x)

epoch = 5
lr = 0.1

inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward pass
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1+bout
    output = sigmoid(outinp)

    # Backward pass: propagate the output error to the hidden layer
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # Weight updates (the biases bh and bout are left unchanged)
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

    print("\n-----------Epoch-", i+1, "Starts----------")
    print("Input: \n" + str(X))
    print("Actual Output: \n" + str(y))
    print("Predicted Output: \n", output)
    print("-----------Epoch-", i+1, "Ends----------\n")
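
The function `derivatives_sigmoid` in the listing above is applied to activations that have already passed through `sigmoid`, which is why it computes `x * (1 - x)` without calling `sigmoid` again. A small check of that identity against a numerical derivative might look like the following. The helper `numeric_derivative` and the test point 0.5 are assumptions made for this sketch, and it uses the standard `math` module instead of NumPy to stay self-contained.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def derivatives_sigmoid(s):
    # s must be a sigmoid OUTPUT; then s * (1 - s) equals sigmoid'(input)
    return s * (1 - s)

def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only as an independent check
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.5
analytic = derivatives_sigmoid(sigmoid(x))
numeric = numeric_derivative(sigmoid, x)
print(analytic, numeric)
```

The two printed values should agree to several decimal places; passing the raw input `x` to `derivatives_sigmoid` instead of `sigmoid(x)` would not.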

# 7) Write a program to implement the naïve Bayesian classifier for a sample
# training data set stored as a .CSV file. Compute the accuracy of the
# classifier, considering a few test data sets.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
iris = datasets.load_iris()
print("Features: ", iris.feature_names)
print("Labels: ", iris.target_names)
print(iris.data[0:5])
print(iris.target)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# 8) Assuming a set of documents that need to be classified, use the naïve
# Bayesian Classifier model to perform this task. Built-in Java classes/API can
# be used to write the program. Calculate the accuracy, precision, and recall for
# your data set.
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             confusion_matrix)
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
import pandas as pd

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print(msg)

# Map the text labels to numbers: pos -> 1, neg -> 0
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

x = msg.message
y = msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(x, y)

# Learn the vocabulary on the training messages only, then reuse it for the test set
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

clf = MultinomialNB()
clf.fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

print('\n Accuracy of the classifier is', accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(confusion_matrix(ytest, predicted))
print('\n The value of Precision', precision_score(ytest, predicted))
print('\n The value of Recall', recall_score(ytest, predicted))
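
To make the precision and recall figures printed above less opaque, here is a minimal sketch of how they follow from the counts of true positives, false positives, and false negatives. The function `precision_recall` and the two label lists are made up for illustration and are not taken from `naivetext.csv`.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return precision, recall

# Hypothetical labels: 1 = pos, 0 = neg
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here there are two true positives, one false positive, and one false negative, so both values come out to 2/3.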

##########VIVA##############
# The (train_test_split) function is for splitting a single dataset for two
# different purposes: training and testing. The training subset is for building
# your model. The testing subset is for using the model on unknown data to
# evaluate the performance of the model.
# CountVectorizer is a great tool provided by the scikit-learn library in
# Python. It is used to transform a given text into a vector on the basis of the
# frequency (count) of each word that occurs in the entire text.
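
The idea behind the count-based vectors described in the note above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: the function `count_vectorize` is a hypothetical name, and it splits on whitespace only, whereas `CountVectorizer` applies its own tokenization rules.

```python
from collections import Counter

def count_vectorize(docs):
    """Turn each document into a list of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is every distinct word, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

vocab, vectors = count_vectorize(["good good movie", "bad movie"])
print(vocab)
print(vectors)
```

Each column of the result corresponds to one vocabulary word, and each row to one document, which is the same layout as the document-term matrix passed to the classifier.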

# 9) Write a program to construct a Bayesian network considering medical data.
# Use this model to demonstrate the diagnosis of heart patients using standard
# Heart Disease Data Set. You can use Java/Python ML library classes/API
# OK
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination

data = pd.read_csv('heart1.csv')

model = BayesianNetwork([('age', 'heartdisease'), ('sex', 'heartdisease'),
                         ('exang', 'heartdisease'),
                         ('cp', 'heartdisease'), ('restecg', 'heartdisease'),
                         ('chol', 'heartdisease')])
model.fit(data, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)

q1 = infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)
q2 = infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

##################VIVA##########################
# .fit() is used to train the model. The trained model can later be used to
# make predictions using .predict()
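
With a maximum likelihood estimator, fitting a discrete Bayesian network amounts to estimating conditional probabilities from relative frequencies in the data. The sketch below illustrates that idea for a single parent and child. It is not pgmpy's code: the function `conditional_probability` and the four toy records are assumptions for illustration, and in the model above `heartdisease` is conditioned on all of its parents jointly, not on one at a time.

```python
from collections import Counter

def conditional_probability(records, parent, child):
    """Estimate P(child | parent) as count(parent, child) / count(parent)."""
    pair_counts = Counter((r[parent], r[child]) for r in records)
    parent_counts = Counter(r[parent] for r in records)
    return {pair: n / parent_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical toy records, not rows from heart1.csv
records = [
    {"cp": 1, "heartdisease": 0},
    {"cp": 1, "heartdisease": 1},
    {"cp": 2, "heartdisease": 1},
    {"cp": 2, "heartdisease": 1},
]
cpt = conditional_probability(records, "cp", "heartdisease")
print(cpt)
```

The keys of the returned dictionary are `(parent value, child value)` pairs, so `cpt[(2, 1)]` is the estimated probability of `heartdisease = 1` given `cp = 2` in the toy data.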

# 10) Apply EM algorithm to cluster a set of data stored in a .CSV file. Use
# the same data set for clustering using k-Means algorithm. Compare the results
# of these two algorithms and comment on the quality of clustering. You can add
# Java/Python ML library classes/API in the program.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
# print(dataset)

X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']
# print(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets])
plt.title('Real')

# K-PLOT
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0, 1, 2])
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY])
plt.title('KMeans')

# GMM PLOT
scaler = StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)

y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm])
plt.title('GMM Classification')

plt.show()
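
The plots above allow only a visual comparison of the two clusterings. One simple numeric measure that could complement them is cluster purity: the fraction of points that belong to the majority true class of their cluster. The function `purity` below is a hypothetical helper written for this sketch and does not depend on scikit-learn.

```python
from collections import Counter

def purity(true_labels, cluster_labels):
    """Fraction of points whose true class is the majority class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    # For each cluster, count the members of its most common true class.
    matched = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return matched / len(true_labels)

print(purity([0, 0, 1, 1], [1, 1, 0, 0]))  # cluster ids differ, grouping is identical
print(purity([0, 0, 1, 1], [0, 1, 0, 1]))  # each cluster mixes both classes
```

With the variables from the listing above, calls such as `purity(y.Targets, model.labels_)` and `purity(y.Targets, y_cluster_gmm)` would give one number per algorithm. Purity ignores the arbitrary numbering of clusters, which is also why the colours in the KMeans and GMM panels may not match those in the "Real" panel.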

# 11) Write a program to implement k-Nearest Neighbour algorithm to classify the
# iris data set. Print both correct and wrong predictions. Java/Python ML library
# classes can be used for this problem.
# OK
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

iris = datasets.load_iris()

x = iris.data
y = iris.target

print("Features: ", iris.feature_names)


print(x)
print("Labels: ", iris.target_names)
print(y)

x_train, x_test, y_train, y_test = train_test_split(x, y)

classifier = KNeighborsClassifier()
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)

print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
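
The task statement asks for the correct and wrong predictions to be printed, while the listing above reports only aggregate metrics. A minimal sketch of how the individual predictions could be separated is given below. The helper `split_predictions` and the small sample lists are assumptions for illustration.

```python
def split_predictions(samples, y_true, y_pred):
    """Separate (sample, actual, predicted) records into correct and wrong lists."""
    correct, wrong = [], []
    for sample, actual, predicted in zip(samples, y_true, y_pred):
        record = (sample, actual, predicted)
        if actual == predicted:
            correct.append(record)
        else:
            wrong.append(record)
    return correct, wrong

# Hypothetical samples and labels
samples = ['a', 'b', 'c']
correct, wrong = split_predictions(samples, [0, 1, 2], [0, 2, 2])
print("Correct:", correct)
print("Wrong:", wrong)
```

With the variables from the listing above, `split_predictions(x_test, y_test, y_pred)` would return the two lists for the iris test split.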

# 12) Implement the non-parametric Locally Weighted Regression algorithm in
# order to fit data points. Select appropriate data set for your experiment and
# draw graphs
import numpy as np
import math
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
n = 100
xpoints = np.linspace(0, 2*math.pi, n).reshape(-1, 1)
ypoints = np.sin(xpoints)
linreg = LinearRegression()
linreg.fit(xpoints, ypoints)
prediction = linreg.predict(xpoints)
plt.scatter(xpoints, ypoints, color='red')
plt.plot(xpoints, prediction)
plt.show()
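
The listing above fits a single global straight line with `LinearRegression`, which is ordinary linear regression and cannot follow the sine curve. A locally weighted fit instead solves a separate weighted least-squares problem around each query point. The sketch below assumes a Gaussian kernel with bandwidth `tau`; the function name `lwr_predict` and the value `tau=0.3` are illustrative choices, not taken from the original.

```python
import numpy as np

def lwr_predict(x0, x, y, tau):
    """Predict y at x0 with a locally weighted linear fit (Gaussian kernel)."""
    X = np.column_stack([np.ones_like(x), x])        # design matrix with a bias column
    w = np.exp(-((x - x0) ** 2) / (2 * tau ** 2))    # weight of each training point
    XtW = X.T * w                                     # same as X.T @ diag(w)
    theta = np.linalg.pinv(XtW @ X) @ (XtW @ y)      # weighted least-squares solution
    return theta[0] + theta[1] * x0

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
fitted = np.array([lwr_predict(x0, x, y, tau=0.3) for x0 in x])
print(float(np.max(np.abs(fitted - y))))
```

Plotting `fitted` against `x` with `plt.plot(x, fitted)` alongside the scatter of the original points would show the curve tracking the data. A smaller `tau` follows the data more closely, while a larger `tau` approaches the global linear fit.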
