ML LAB PROGRAMS
Find-S Algorithm
Introduction :
The Find-S algorithm is a basic concept-learning algorithm in machine learning.
It finds the most specific hypothesis that fits all the positive examples; note
that the algorithm considers only the positive training examples. Find-S starts
with the most specific hypothesis and generalizes it each time it fails to
classify an observed positive training example. Hence, the Find-S algorithm
moves from the most specific hypothesis toward the most general hypothesis.
Important Representation :
1. ? indicates that any value is acceptable for the attribute.
2. A single required value (e.g., GREEN) specifies that only that value is acceptable.
3. ϕ indicates that no value is acceptable.
4. The most general hypothesis is represented by { ?, ?, ?, ? }.
5. The most specific hypothesis is represented by { ϕ, ϕ, ϕ, ϕ }.
Steps Involved in Find-S :
1. Start with the most specific hypothesis: h = { ϕ, ϕ, ϕ, ϕ }.
2. Take the next example; if it is negative, no changes occur to the hypothesis.
3. If the example is positive and the current hypothesis is found to be too
specific, generalize the mismatching attributes of the hypothesis.
4. Keep repeating the above steps till all the training examples are complete.
5. After we have completed all the training examples, we have the final
hypothesis, which we can use to classify new examples.
Example :
Consider the following data set, containing data about which particular seeds
are poisonous.
Consider example 1 :
The data present in example 1 is { GREEN, HARD, NO, WRINKLED } and its outcome
is positive, so the initial hypothesis is set to this example:
h = { GREEN, HARD, NO, WRINKLED }
Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 4 :
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare
every single attribute with the current hypothesis and, wherever a mismatch is
found, we replace that particular attribute with the general case ( "?" ). After
doing this, the hypothesis becomes:
h = { ?, HARD, NO, WRINKLED }
Consider example 5 :
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We compare
every single attribute with the current hypothesis and, wherever a mismatch is
found, we replace that particular attribute with the general case ( "?" ). After
doing this, the hypothesis becomes:
h = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis have
the general condition, example 6 and example 7 would result in the same
hypothesis with all general attributes.
h = { ?, ?, ?, ? }
Hence, for the given data the final hypothesis would be:
Final Hypothesis: h = { ?, ?, ?, ? }
Algorithm :
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint a_i in h
      If the constraint a_i is satisfied by x
      Then do nothing
      Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
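To see these steps in action, here is a minimal sketch (separate from the lab
program below) that applies Find-S to the three positive training examples given
in the walkthrough above; the negative examples are skipped, since Find-S
ignores them:

# Positive examples from the seed walkthrough above (Find-S ignores negatives).
positives = [
    ["GREEN",  "HARD", "NO",  "WRINKLED"],   # example 1
    ["ORANGE", "HARD", "NO",  "WRINKLED"],   # example 4
    ["GREEN",  "SOFT", "YES", "SMOOTH"],     # example 5
]

h = positives[0].copy()          # initialize h with the first positive example
for example in positives[1:]:
    for i in range(len(h)):
        if example[i] != h[i]:
            h[i] = '?'           # generalize every mismatching attribute
print(h)                         # ['?', '?', '?', '?']

The printed hypothesis matches the final hypothesis derived step by step above.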
DATA SET
https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/1-Find-S%20Algorithm/LAB%201.ipynb
https://www.edureka.co/blog/find-s-algorithm-in-machine-learning/
PROGRAM
import pandas as pd
import numpy as np

data = pd.read_csv("trainingdata.csv")
print(data,"\n")

# separate the attribute columns from the target column
d = np.array(data)[:,:-1]
target = np.array(data)[:,-1]

def train(c,t):
    # initialize the specific hypothesis with the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalize the hypothesis on every further positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
                else:
                    pass
    return specific_hypothesis

#obtaining the final hypothesis
print("The final hypothesis is:", train(d,target))
CANDIDATE-ELIMINATION ALGORITHM
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('trainingdata.csv'))
print(data)

concepts = np.array(data.iloc[:,0:-1])
print(concepts)
target = np.array(data.iloc[:,-1])
print(target)

def learn(concepts, target):
    '''
    learn() implements the learning method of the Candidate-Elimination algorithm.
    Arguments:
        concepts - a data frame with all the feature values
        target - a data frame with the corresponding output values
    '''
    # Initialise S with the first instance from concepts
    # .copy() makes sure a new list is created instead of just pointing to the same
    # memory location
    specific_h = concepts[0].copy()
    print(specific_h)
    # Initialise G to the maximally general hypotheses
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    # The learning iterations
    for i, h in enumerate(concepts):
        # on a positive example, generalize the specific boundary
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        # on a negative example, specialize the general boundary
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(specific_h)
        print(general_h)
    # find indices where we have empty rows, meaning those that are unchanged
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        # remove those fully general rows from general_h
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

# obtaining the final hypotheses
s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Algorithm :
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
• Remove s from S
• Add to S all minimal generalizations h of s such that h is consistent with d
and some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in
S
If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that h is consistent with d
and some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in
G
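As a quick check of these update rules, here is a minimal sketch that traces
them on Mitchell's classic EnjoySport training examples, using the same diagonal
specific/general representation as the program above (like that program, it
assumes the first example is positive):

# Classic EnjoySport training data (Mitchell); the last value in each row is the target.
rows = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same",   "Yes"],
    ["Sunny", "Warm", "High",   "Strong", "Warm", "Same",   "Yes"],
    ["Rainy", "Cold", "High",   "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High",   "Strong", "Cool", "Change", "Yes"],
]

n = len(rows[0]) - 1
specific_h = rows[0][:-1]                   # S starts at the first (positive) example
general_h = [["?"] * n for _ in range(n)]   # G starts maximally general

for *h, t in rows:
    for x in range(n):
        if t == "Yes" and h[x] != specific_h[x]:
            specific_h[x] = "?"             # generalize S on a positive example
            general_h[x][x] = "?"
        if t == "No":
            # specialize G on a negative example
            general_h[x][x] = specific_h[x] if h[x] != specific_h[x] else "?"

print("Final S:", specific_h)               # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print("Final G:", [g for g in general_h if g != ["?"] * n])
# Final G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]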
For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the
set of all hypotheses consistent with the training examples.
PROGRAM
https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/2-Candidate-Elimination%20%20Algorithm/LAB%202.ipynb
DECISION TREE
A decision tree is a supervised learning algorithm used for
both classification and regression problems. It is represented as a tree structure
where each internal node represents a test on an attribute,
each branch represents the outcome of the test, and each leaf node represents a
class label or a predicted value. The goal of a decision tree is to split the dataset
into subsets based on the value of an attribute, repeating this process until each
subset contains only instances that belong to a single class or have similar
values.
Decision Tree Terminologies
Root Node: The root node is where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated
further after reaching a leaf node.
Parent/Child node: A node that splits into sub-nodes is called the parent node
of those sub-nodes, and the sub-nodes are its child nodes.
How does the Decision Tree algorithm Work?
In a decision tree, to predict the class of a given record, the algorithm starts
from the root node of the tree. It compares the value of the root attribute with
the corresponding attribute of the record (real dataset) and, based on the
comparison, follows the branch and jumps to the next node.
At the next node, the algorithm again compares the record's attribute value with
the node's test and moves further down the tree. It continues this process until
it reaches a leaf node. The complete process can be better understood using the
below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM) such as information gain (a short sketch of this measure follows
this list).
o Step-3: Divide S into subsets that contain the possible values of the best
attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where the
nodes cannot be classified further; these final nodes are called leaf nodes.
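To make Step-2 concrete, here is a small illustrative sketch of entropy and
information gain, the attribute selection measures used by ID3 (the toy labels
and outlook values below are made up for illustration; this is separate from the
lab program that follows):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # entropy of the parent minus the weighted entropy of the subsets
    # produced by splitting on one attribute
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Toy data: how well does "outlook" separate the class labels?
labels = ["Yes", "Yes", "No", "No", "Yes"]
outlook = ["Sunny", "Overcast", "Sunny", "Rain", "Overcast"]
print(round(information_gain(labels, outlook), 3))  # 0.571

The attribute with the highest information gain produces the purest subsets and
is chosen as the splitting attribute.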
EXAMPLE:
https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/3-ID3%20Algorithm/LAB%203.ipynb
PROGRAM
import numpy as np
import math
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    # build one subtable per distinct value of the given column
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    # if every example has the same class, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

# load the training data; the CSV file name is assumed here
metadata, traindata = read_data("tennisdata.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
OUTPUT:
Experiment-4:
Exercises to solve the real-world problems using the following machine
learning methods: a) Linear Regression b) Logistic Regression c) Binary
Classifier
Linear Regression
std_marks.csv
Program code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

data=pd.read_csv("std_marks.csv")
print(data.head())
dim=data.shape
print(dim)
print(data.describe())
print(data.corr())
x_set=data[['internal']]
print(x_set.head())
y_set=data[['external']]
print(y_set.head())
# split into training and test sets (an 80/20 split is assumed here)
x_train,x_test,y_train,y_test=train_test_split(x_set,y_set,test_size=0.2,random_state=0)
model=linear_model.LinearRegression()
model.fit(x_train,y_train)
y_pred=model.predict(x_test)
y_preds=[]
for i in y_pred:
    y_preds.append(float(i))
print(y_preds)
# plot the actual test points and the predicted values (plotting details assumed)
plt.scatter(x_test,y_test,label='actual')
plt.plot(x_test,y_pred,color='red',label='predicted')
plt.ylabel('y value')
plt.xlabel('x value')
plt.legend(loc='best')
plt.show()
Logistic Regression
Program code:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report, confusion_matrix

data=pd.read_csv("std_marks.csv")
print(data.head())
dim=data.shape
print(dim)
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls=data['target'].unique()
class_labels=[]
for x in class_lbls:
    class_labels.append(str(x))
print(class_labels)
# plot the class distribution of the target column
sns.countplot(x=data['target'])
plt.show()
col_names=data.columns
feature_names=col_names[:-1]
feature_names=list(feature_names)
print(feature_names)
x_set=data[feature_names]
print(x_set.head())
y_set=data[['target']]
print(y_set.head())
# split into training and test sets (an 80/20 split is assumed here)
x_train,x_test,y_train,y_test=train_test_split(x_set,y_set,test_size=0.2,random_state=0)
# standardize the features before fitting
scaler=StandardScaler()
scaler.fit(x_train)
x_train=scaler.transform(x_train)
model = LogisticRegression()
model.fit(x_train, y_train)
x_test=scaler.transform(x_test)
y_pred=model.predict(x_test)
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=class_labels))
# plot the confusion matrix as a heatmap
cm=confusion_matrix(y_test,y_pred)
df_cm=pd.DataFrame(cm,index=class_labels,columns=class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')
plt.show()
output :
Write a program to implement k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris data set
iris = load_iris()
X = iris.data
y = iris.target
# split into training and test sets (an 80/20 split is assumed here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
k = 5 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Make predictions
y_pred = knn.predict(X_test)
# Calculate accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Correct Predictions:")
for i in range(len(y_test)):
    if y_pred[i] == y_test[i]:
        print("Sample", i, "- predicted:", iris.target_names[y_pred[i]], "actual:", iris.target_names[y_test[i]])
print("\nWrong Predictions:")
for i in range(len(y_test)):
    if y_pred[i] != y_test[i]:
        print("Sample", i, "- predicted:", iris.target_names[y_pred[i]], "actual:", iris.target_names[y_test[i]])
output :