
ML LAB PROGRAMS

https://www.risekrishnasaiprakasam.edu.in/naac23-24/Files-23-24/joy%20of%20Computing%20using%20python_compressed.pdf
https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/2-Candidate-Elimination%20%20Algorithm/LAB%202.ipynb

Experiment-1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Find-S Algorithm

Introduction :
The Find-S algorithm is a basic concept-learning algorithm in machine learning. It finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples. Find-S starts with the most specific hypothesis and generalizes it each time it fails to cover an observed positive training example. Hence, the Find-S algorithm moves from the most specific hypothesis towards more general hypotheses.

Important Representation :

? indicates that any value is acceptable for the attribute.

A single specific value (e.g., Cold) specifies that exactly that value is required for the attribute.

Φ indicates that no value is acceptable.

The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}

The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
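
As an illustrative sketch (an addition for clarity, not part of the lab program), this representation can be written directly as Python lists, with ϕ modelled here as None and a helper that checks whether a hypothesis accepts a given instance:

# Illustrative sketch of the hypothesis representation:
# '?' accepts any value, a concrete string requires exactly that value,
# and phi (modelled here as None) accepts nothing.
def satisfies(hypothesis, instance):
    # True only if every attribute constraint in the hypothesis accepts the instance
    for constraint, value in zip(hypothesis, instance):
        if constraint is None:                      # phi: no value is acceptable
            return False
        if constraint != '?' and constraint != value:
            return False
    return True

most_general  = ['?', '?', '?', '?', '?', '?']
most_specific = [None, None, None, None, None, None]
example       = ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']

print(satisfies(most_general, example))   # True: '?' accepts anything
print(satisfies(most_specific, example))  # False: phi accepts nothing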

Steps Involved In Find-S :

Start with the most specific hypothesis.


h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

Take the next example and if it is negative, then no changes occur to the
hypothesis.

If the example is positive and we find that our current hypothesis is too specific for it, then we generalize the current hypothesis just enough to cover the example.

Keep repeating the above steps until all the training examples have been processed. Once all the training examples have been processed, we will have the final hypothesis, which we can use to classify new examples.

Example :

Consider the following data set, containing data about which particular seeds are poisonous.

First, we initialize the hypothesis to the most specific hypothesis. Since this data set has four attributes, our hypothesis would be :
h = {ϕ, ϕ, ϕ, ϕ}

Consider example 1 :

The data in example 1 is { GREEN, HARD, NO, WRINKLED }. We see that our initial hypothesis is more specific than this example, so we generalize it to cover the example. Hence, the hypothesis becomes :
h = { GREEN, HARD, NO, WRINKLED }

Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }

Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }

Consider example 4 :
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare every attribute with the current hypothesis, and wherever a mismatch is found we replace that particular attribute with the general case ( "?" ). After doing this, the hypothesis becomes :
h = { ?, HARD, NO, WRINKLED }

Consider example 5 :
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We again compare every attribute with the current hypothesis, and wherever a mismatch is found we replace that particular attribute with the general case ( "?" ). After doing this, the hypothesis becomes :
h = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis are general, example 6 and example 7 would result in the same hypothesis with all general attributes.
h = { ?, ?, ?, ? }
Hence, for the given data the final hypothesis would be :
Final Hypothesis: h = { ?, ?, ?, ? }

Algorithm :

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
      For each attribute constraint a_i in h
         If the constraint a_i is satisfied by x
            Then do nothing
         Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
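
A minimal Python sketch of these three steps (illustrative only; the full pandas-based program used in the lab follows in the PROGRAM section below) could look like this:

# Minimal Find-S sketch on plain Python lists.
def find_s(examples):
    # examples: list of rows, each row = attribute values followed by a Yes/No label
    hypothesis = None
    for row in examples:
        *attributes, label = row
        if label != "Yes":              # negative examples are ignored
            continue
        if hypothesis is None:          # first positive example initializes h
            hypothesis = list(attributes)
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:  # mismatch -> generalize that attribute to '?'
                hypothesis[i] = "?"
    return hypothesis

data = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same",   "Yes"],
    ["Sunny", "Warm", "High",   "Strong", "Warm", "Same",   "Yes"],
    ["Rainy", "Cold", "High",   "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High",   "Strong", "Cool", "Change", "Yes"],
]
print(find_s(data))   # expected: ['Sunny', 'Warm', '?', 'Strong', '?', '?']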

DATA SET

sky     airTemp   humidity   wind     water   forecast   enjoySport
Sunny   Warm      Normal     Strong   Warm    Same       Yes
Sunny   Warm      High       Strong   Warm    Same       Yes
Rainy   Cold      High       Strong   Warm    Change     No
Sunny   Warm      High       Strong   Cool    Change     Yes

PROGRAM:

https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/1-Find-S%20Algorithm/LAB%201.ipynb

https://www.edureka.co/blog/find-s-algorithm-in-machine-learning/

import pandas as pd
import numpy as np

# read the data in the csv file
data = pd.read_csv("trainingdata.csv")
print(data, "\n")

# making an array of all the attributes
d = np.array(data)[:, :-1]
print("\nThe attributes are: ", d)

# segregating the target column that has positive and negative examples
target = np.array(data)[:, -1]
print("\nThe target is: ", target)

# training function to implement the Find-S algorithm
def train(c, t):
    # start from the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break

    # generalize the hypothesis over the remaining positive examples
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'

    return specific_hypothesis

# obtaining the final hypothesis
print("\nThe final hypothesis is:", train(d, target))

import numpy as np
import pandas as pd

# Loading Data from a CSV File
data = pd.DataFrame(data=pd.read_csv('trainingdata.csv'))
print(data)

# Separating concept features from Target
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)

# Isolating target into a separate array (copying the last column)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    '''
    learn() implements the learning method of the Candidate-Elimination algorithm.

    Arguments:
        concepts - an array with all the feature values
        target   - an array with the corresponding output values
    '''
    # Initialise S0 with the first instance from concepts
    # .copy() makes sure a new array is created instead of just pointing
    # to the same memory location
    specific_h = concepts[0].copy()
    print("\nInitialization of specific_h and general_h")
    print(specific_h)

    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    # The learning iterations
    for i, h in enumerate(concepts):
        # If the instance is positive, generalize S (and prune G)
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                # Change values in S & G only if values change
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        # If the instance is negative, specialize G only
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("\nSteps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)

    # find indices of rows in general_h that are still fully general (unchanged)
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        # remove those rows from general_h
        general_h.remove(['?', '?', '?', '?', '?', '?'])

    # Return final values
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal Specific_h:", s_final, sep="\n")
print("\nFinal General_h:", g_final, sep="\n")

CANDIDATE-ELIMINATION Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do

• If d is a positive example
   • Remove from G any hypothesis inconsistent with d
   • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
      • Remove from S any hypothesis that is more general than another hypothesis in S

• If d is a negative example
   • Remove from S any hypothesis inconsistent with d
   • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
      • Remove from G any hypothesis that is less general than another hypothesis in G

For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the
set of all hypotheses consistent with the training examples.

Candidate Elimination Algorithm


The candidate elimination algorithm incrementally builds the version space
given a hypothesis space H and a set E of examples. The examples are added
one by one; each example possibly shrinks the version space by removing the
hypotheses that are inconsistent with the example. The candidate elimination
algorithm does this by updating the general and specific boundary for each new
example.

PROGRAM

https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/2-Candidate-Elimination%20%20Algorithm/LAB%202.ipynb
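
The linked notebook is essentially the learn() listing shown earlier. As a compact, self-contained sketch of the same boundary updates (assuming the same trainingdata.csv with the Yes/No label in the last column, and with the same simplification of initialising S from the first row), the program can also be written as:

import numpy as np
import pandas as pd

# Compact Candidate-Elimination sketch (illustrative; mirrors the learn() listing above).
data = pd.read_csv("trainingdata.csv")
concepts = np.array(data.iloc[:, :-1])
target = np.array(data.iloc[:, -1])

n = concepts.shape[1]
specific_h = concepts[0].copy()               # S: most specific boundary
general_h = [['?'] * n for _ in range(n)]     # G: most general boundary

for row, label in zip(concepts, target):
    if label == "Yes":                        # positive example: generalize S
        for x in range(n):
            if row[x] != specific_h[x]:
                specific_h[x] = '?'
                general_h[x][x] = '?'
    else:                                     # negative example: specialize G
        for x in range(n):
            if row[x] != specific_h[x]:
                general_h[x][x] = specific_h[x]
            else:
                general_h[x][x] = '?'

# drop rows of G that stayed fully general
general_h = [g for g in general_h if g != ['?'] * n]

print("Final Specific_h:", specific_h)
print("Final General_h:", general_h)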

DECISION TREE
A decision tree is a supervised learning algorithm used for
both classification and regression problems. It is represented as a tree structure
where each internal node represents a test on an attribute,
each branch represents the outcome of the test, and each leaf node represents a
class label or a predicted value. The goal of a decision tree is to split the dataset
into subsets based on the value of an attribute, repeating this process until each
subset contains only instances that belong to a single class or have similar
values.
Decision Tree Terminologies

 Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more homogeneous
sets.

 Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.

 Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.

 Branch/Sub Tree: A subtree formed by splitting a node of the tree.

 Pruning: Pruning is the process of removing unwanted branches from the tree.

 Parent/Child node: The root node of the tree is called the parent node, and
other nodes are called the child nodes.
How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the steps below:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

o Step-3: Divide S into subsets that contain the possible values of the best attribute.

o Step-4: Generate the decision tree node, which contains the best attribute.

o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot classify the nodes further; such a final node is called a leaf node. (A minimal end-to-end sketch follows below.)
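
Before the from-scratch ID3 program below, here is a minimal sketch of the same workflow using scikit-learn's DecisionTreeClassifier on the built-in Iris data (the dataset choice is only for illustration; the lab program itself builds the tree manually from tennisdata.csv):

# Minimal decision-tree sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# criterion="entropy" corresponds to the information-gain measure used by ID3
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
print(export_text(clf, feature_names=list(iris.feature_names)))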
EXAMPLE:
https://github.com/AbhishekMali21/VTU-CSE-LAB-SOLUTIONS/blob/master/7th%20SEM/MACHINE%20LEARNING%20LABORATORY/3-ID3%20Algorithm/LAB%203.ipynb

PROGRAM
import numpy as np
import math
import csv

def read_data(filename):
    # read the CSV file into a list of column names (metadata) and rows (traindata)
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

class Node:
    # a tree node: either an attribute to test (with children) or a leaf answer
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    # split the data into sub-tables, one per distinct value of column col
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict

def entropy(S):
    # entropy(S) = -sum p * log2(p) over the class distribution in S
    items = np.unique(S)
    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    # gain ratio = information gain of col / intrinsic value (split information)
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]

    return total_entropy / iv

def create_node(data, metadata):
    # if all rows have the same class label, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    # otherwise split on the attribute with the highest gain ratio
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node

def empty(size):
    # indentation helper for printing the tree
    s = ""
    for x in range(size):
        s += " "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return

    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennisdata.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

OUTPUT:

Experiment-4:
Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier

Linear Regression

Dataset: std_marks.csv, constructed on our own from students' lab internal and external marks.

std_marks.csv

sno   internal   external
1     30         50
2     25         35
3     19         43
4     25         45
5     30         50
6     23         48
7     21         42
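
As a quick cross-check sketch (not part of the lab program), the least-squares line for the seven rows above can also be computed in closed form, using slope = cov(internal, external) / var(internal):

# Closed-form simple linear regression on the std_marks data shown above.
import numpy as np

internal = np.array([30, 25, 19, 25, 30, 23, 21], dtype=float)
external = np.array([50, 35, 43, 45, 50, 48, 42], dtype=float)

slope = np.cov(internal, external, bias=True)[0, 1] / np.var(internal)
intercept = external.mean() - slope * internal.mean()
print("slope =", slope, "intercept =", intercept)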

Program code:

import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

data = pd.read_csv("std_marks.csv")
print('First 5 rows of the data set are:')
print(data.head())

dim = data.shape
print('Dimensions of the data set are', dim)

print('Statistics of the data are:')
print(data.describe())

print('Correlation matrix of the data set is:')
print(data.corr())

x_set = data[['internal']]
print('First 5 rows of the feature set are:')
print(x_set.head())

y_set = data[['external']]
print('First 5 rows of the target set are:')
print(y_set.head())

x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3)

model = linear_model.LinearRegression()
model.fit(x_train, y_train)
print('Regression coefficient is', float(model.coef_))
print('Regression intercept is', float(model.intercept_))

y_pred = model.predict(x_test)
y_preds = []
for i in y_pred:
    y_preds.append(float(i))
print('Predicted values for test data are:')
print(y_preds)

print('mean squared error is ', mean_squared_error(y_test, y_pred))

plt.scatter(x_test, y_test, color='blue', label='actual y values')
plt.plot(x_test, y_pred, color='red', linewidth=3, label='predicted regression line')
plt.ylabel('y value')
plt.xlabel('x value')
plt.title('simple linear regression')
plt.legend(loc='best')
plt.show()

Output screen shots:


Logistic Regression

Program code:

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.preprocessing import StandardScaler

# Note: this program expects a dataset that contains a binary class column named
# 'target'; std_marks.csv from the previous exercise does not have one, so
# substitute an appropriate classification dataset here.
data = pd.read_csv("std_marks.csv")
print('The first 5 rows of the data set are:')
print(data.head())

dim = data.shape
print('Dimensions of the data set are', dim)

print('Statistics of the data are:')
print(data.describe())

print('Correlation matrix of the data set is:')
print(data.corr())

class_lbls = data['target'].unique()
class_labels = []
for x in class_lbls:
    class_labels.append(str(x))
print('Class labels are:')
print(class_labels)

sns.countplot(data['target'])

col_names = data.columns
feature_names = col_names[:-1]
feature_names = list(feature_names)
print('Feature names are:')
print(feature_names)

x_set = data.drop(['target'], axis=1)
print('First 5 rows of the feature set are:')
print(x_set.head())

y_set = data[['target']]
print('First 5 rows of the target set are:')
print(y_set.head())

scaler = StandardScaler()
x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3)

scaler.fit(x_train)
x_train = scaler.transform(x_train)

model = LogisticRegression()
model.fit(x_train, y_train)

x_test = scaler.transform(x_test)
y_pred = model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=class_labels))

cm = confusion_matrix(y_test, y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index=class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True, cmap="Blues", fmt='d')

output :
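
Part (c) of Experiment-4 asks for a binary classifier. The logistic regression above is already a binary classifier, but a separate sketch using a linear classifier on scikit-learn's built-in breast-cancer data (the dataset is an assumption, since the manual does not name one for this part) could look like this:

# Sketch for part (c), Binary Classifier (illustrative; dataset choice is an assumption).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)   # binary target: malignant/benign
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = SGDClassifier(random_state=42)         # linear binary classifier (hinge loss)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))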
Write a program to implement k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # All features
y = iris.target  # Target classes

# Splitting dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (optional, but improves performance)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the k-NN classifier
k = 5  # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}\n")

# Print correct and wrong predictions
print("Correct Predictions:")
for i in range(len(y_test)):
    if y_pred[i] == y_test[i]:
        print(f"Sample {i}: Predicted = {iris.target_names[y_pred[i]]}, Actual = {iris.target_names[y_test[i]]}")

print("\nWrong Predictions:")
for i in range(len(y_test)):
    if y_pred[i] != y_test[i]:
        print(f"Sample {i}: Predicted = {iris.target_names[y_pred[i]]}, Actual = {iris.target_names[y_test[i]]}")

output :
