
BVC College of Engineering, Palacharla

III B.Tech II Semester CSE A&B Lab Manual

Subject: Machine Learning using Python Lab (R2032054)

S.NO    LIST OF EXPERIMENTS

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression, b) Logistic Regression, c) Binary Classifier.

5. Develop a program for Bias, Variance, Remove Duplicates, and Cross Validation.

6. Write a program to implement Categorical Encoding and One-hot Encoding.

7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

8. Write a program to implement the k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and wrong predictions.

9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

11. Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

12. Exploratory Data Analysis for Classification using Pandas or Matplotlib.

13. Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

14. Write a program to implement Support Vector Machines and Principal Component Analysis.

15. Write a program to implement Principal Component Analysis.


Experiment-1
Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file.

Aim: Demonstration of the FIND-S algorithm for finding the most specific hypothesis.

Program:
import csv

with open('tennis.csv', 'r') as f:

    reader = csv.reader(f)

    rows = list(reader)

# One '0' slot per attribute (Outlook, Temperature, Humidity, Wind)

h = [['0', '0', '0', '0']]

for row in rows:

    print(row)

    if row[-1] == "Yes":            # consider only positive examples

        j = 0

        for x in row[:-1]:          # skip the class label

            if x != h[0][j] and h[0][j] == '0':

                h[0][j] = x         # first positive value seen: adopt it

            elif x != h[0][j] and h[0][j] != '0':

                h[0][j] = '?'       # values disagree: generalize

            j = j + 1

print("Most specific hypothesis is")

print(h)

Output:
['Outlook', 'Temperature', 'Humidity', 'Wind', 'Play Tennis']
['Sunny', 'Hot', 'High', 'Weak', 'No']
['Sunny', 'Hot', 'High', 'Strong', 'No']
['Overcast', 'Hot', 'High', 'Weak', 'Yes']
['Rain', 'Mild', 'High', 'Weak', 'Yes']
['Rain', 'Cool', 'Normal', 'Weak', 'Yes']
['Rain', 'Cool', 'Normal', 'Strong', 'No']
['Overcast', 'Cool', 'Normal', 'Strong', 'Yes']
['Sunny', 'Mild', 'High', 'Weak', 'No']
['Sunny', 'Cool', 'Normal', 'Weak', 'Yes']
['Rain', 'Mild', 'Normal', 'Weak', 'Yes']
['Sunny', 'Mild', 'Normal', 'Strong', 'Yes']
['Overcast', 'Mild', 'High', 'Strong', 'Yes']
['Overcast', 'Hot', 'Normal', 'Weak', 'Yes']
['Rain', 'Mild', 'High', 'Strong', 'No']
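
The output above shows only the echoed rows; the program's final two print statements then report the learned hypothesis. For reference, a minimal self-contained trace of FIND-S over the nine positive rows listed above (the inline data is copied from that output; attribute order is Outlook, Temperature, Humidity, Wind):

positives = [
    ['Overcast', 'Hot', 'High', 'Weak'],
    ['Rain', 'Mild', 'High', 'Weak'],
    ['Rain', 'Cool', 'Normal', 'Weak'],
    ['Overcast', 'Cool', 'Normal', 'Strong'],
    ['Sunny', 'Cool', 'Normal', 'Weak'],
    ['Rain', 'Mild', 'Normal', 'Weak'],
    ['Sunny', 'Mild', 'Normal', 'Strong'],
    ['Overcast', 'Mild', 'High', 'Strong'],
    ['Overcast', 'Hot', 'Normal', 'Weak'],
]
h = list(positives[0])                                  # start from the first positive example
for row in positives[1:]:
    h = [a if a == b else '?' for a, b in zip(h, row)]  # generalize on disagreement
print(h)   # ['?', '?', '?', '?'] -- the positives vary in every attribute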


Experiment – 2:

For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Aim: Demonstration of Candidate-Elimination algorithm

Program:
import pandas as pd

# Load the training data from a CSV file

data = pd.read_csv('Tennies.csv')

# Extract the feature names

features = list(data.columns[:-1])

# Initialize the most specific hypothesis (S0) and the most general hypothesis (G0)

S = [['0'] * len(features)]  # Boundary set of most specific hypotheses

G = [['?'] * len(features)]  # Boundary set of most general hypotheses

def more_general(h1, h2):

""" Check if hypothesis h1 is more general than hypothesis h2 """

more_general_parts = []

for x, y in zip(h1, h2):

mg = x == '?' or (x != '0' and (x == y or y == '0'))

more_general_parts.append(mg)

return all(more_general_parts)

def fulfills(example, hypothesis):

""" Check if a hypothesis is consistent with an example """

return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def min_generalization(h, example):

    """ Minimally generalize hypothesis h so that it covers the example """

    new_h = list(h)

    for i, val in enumerate(h):

        if val == '0':

            new_h[i] = example[i]          # '0' matches nothing: adopt the example's value

        elif val != '?' and val != example[i]:

            new_h[i] = '?'                 # conflicting specific value: generalize to '?'

    return new_h

def min_specialization(h, example):

""" Minimize the specialization of a hypothesis """

specializations = []

for i, val in enumerate(h):

if val == '?':

for v in set(data.iloc[:, i]):

specialization = h[:i] + [v] + h[i+1:]

if not fulfills(example, specialization):

specializations.append(specialization)

elif val != '0':

specialization = h[:i] + ['0'] + h[i+1:]

if not fulfills(example, specialization):

specializations.append(specialization)

return specializations

for _, row in data.iterrows():

    example = list(row.iloc[:-1])

    label = row.iloc[-1]

    if label == 'Yes':

        # Positive example: drop inconsistent members of G, then generalize S to cover it

        G = [g for g in G if fulfills(example, g)]

        S = [min_generalization(s, example) for s in S]

    else:

        # Negative example: drop members of S that cover it, and replace
        # covering members of G by their minimal specializations

        S = [s for s in S if not fulfills(example, s)]

        new_G = [g for g in G if not fulfills(example, g)]

        for g in G:

            if fulfills(example, g):

                new_G.extend(min_specialization(g, example))

        G = new_G

print("Final specific boundary (S):", S)

print("Final general boundary (G):", G)

Output:
Final specific boundary (S): []
Final general boundary (G): [['?', '?', '?', '?'],
['Sunny', '?', '?', '?'],
['Overcast', '?', '?', '?'],
['?', 'Cool', '?', '?'],
['?', 'Hot', '?', '?'],
['?', '?', 'Normal', '?'],
['?', '?', '?', 'Weak']]


Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and
apply this knowledge to classify a new sample.

Aim: Demonstration of ID3 algorithm

Dataset: Tennis dataset

Program code:


import csv

import numpy as np

import math

# Data Loader Function


def read_data(filename):

with open(filename, 'r') as csvfile:

datareader = csv.reader(csvfile, delimiter=',')

headers = next(datareader)

metadata = []

traindata = []

for name in headers:

metadata.append(name)

for row in datareader:

traindata.append(row)

return (metadata, traindata)

# Decision Tree Classes and Functions

class Node:

def __init__(self, attribute):

self.attribute = attribute

self.children = []

self.answer = ""

def __str__(self):

return self.attribute

def subtables(data, col, delete):

dict = {}

items = np.unique(data[:, col])

count = np.zeros((items.shape[0],), dtype=np.int32)

for x in range(items.shape[0]):

for y in range(data.shape[0]):

if data[y, col] == items[x]:

count[x] += 1

for x in range(items.shape[0]):

dict[items[x]] = np.empty((count[x], data.shape[1]), dtype="|S32")

pos = np.zeros((items.shape[0],), dtype=np.int32)

for y in range(data.shape[0]):

for x in range(items.shape[0]):

if data[y, col] == items[x]:

dict[items[x]][pos[x], :] = data[y]

pos[x] += 1

if delete:

for key in dict.keys():

dict[key] = np.delete(dict[key], col, 1)

return items, dict

def entropy(S):

items = np.unique(S)

if items.size == 1:

return 0

counts = np.zeros((items.shape[0],))

sums = 0

for x in range(items.shape[0]):

counts[x] = sum(S == items[x]) / (S.size * 1.0)

for count in counts:



sums += -1 * count * math.log2(count)

return sums

def gain_ratio(data, col):

items, dict = subtables(data, col, delete=False)

total_size = data.shape[0]

entropies = np.zeros((items.shape[0],))

intrinsic = np.zeros((items.shape[0],))

for x in range(items.shape[0]):

ratio = dict[items[x]].shape[0] / (total_size * 1.0)

entropies[x] = ratio * entropy(dict[items[x]][:, -1].astype(str))

intrinsic[x] = ratio * math.log2(ratio)

total_entropy = entropy(data[:, -1].astype(str))

iv = -1 * sum(intrinsic)

for x in range(entropies.shape[0]):

total_entropy -= entropies[x]

if iv == 0:

return 0

return total_entropy / iv

def create_node(data, metadata):

if (np.unique(data[:, -1])).shape[0] == 1:

node = Node("")

node.answer = np.unique(data[:, -1])[0]

return node

gains = np.zeros((data.shape[1] - 1,))



for col in range(data.shape[1] - 1):

gains[col] = gain_ratio(data, col)

split = np.argmax(gains)

node = Node(metadata[split])

metadata = np.delete(metadata, split, 0)

items, dict = subtables(data, split, delete=True)

for x in range(items.shape[0]):

child = create_node(dict[items[x]], metadata)

node.children.append((items[x], child))

return node

def empty(size):

return " " * size

def print_tree(node, level):

if node.answer != "":

print(empty(level), node.answer)

return

print(empty(level), node.attribute)

for value, n in node.children:

print(empty(level + 1), value)

print_tree(n, level + 2)

# Load Data and Build Tree

metadata, traindata = read_data("Tennies.csv")

data = np.array(traindata, dtype="|S32")

node = create_node(data, metadata)



print_tree(node, 0)

Output:
Outlook
b'Overcast'
b'Yes'
b'Rain'
Wind
b'Strong'
b'No'
b'Weak'
b'Yes'
b'Sunny'
Humidity
b'High'
b'No'
b'Normal'
b'Yes'
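
The program prints the tree but stops short of the classification step named in the experiment. A hedged sketch of that last step, reusing the Node objects built above (the traversal helper and the sample day are illustrative additions, not part of the manual's code):

def classify(node, sample, attrs):
    # Walk down the tree, following the branch that matches the sample's value
    if node.answer != "":
        return node.answer
    idx = list(attrs).index(node.attribute)
    for value, child in node.children:
        if value == sample[idx].encode():      # the tree stores byte strings (dtype |S32)
            rest_attrs = [a for a in attrs if a != node.attribute]
            return classify(child, sample[:idx] + sample[idx + 1:], rest_attrs)
    return None                                # attribute value unseen in training

# e.g. a hypothetical new day: Sunny, Cool, High, Strong -> expected b'No'
print(classify(node, ['Sunny', 'Cool', 'High', 'Strong'], metadata))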


Experiment 4:
Exercises to solve the real-world problems using the following machine learning
methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier

Program:

a) Linear Regression

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('students_marks.csv')
# Prepare features and target
X = data[['Lab_Internal_Marks']]
y = data['External_Marks']
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict external marks
y_pred = model.predict(X_test)
# Visualize the results
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.title('Linear Regression - Lab Internal vs External Marks')
plt.xlabel('Lab Internal Marks')
plt.ylabel('External Marks')
plt.legend()
plt.show()
# Print the model's performance
print("Model Coefficient:", model.coef_)
print("Model Intercept:", model.intercept_)


Output: (scatter plot of actual vs. predicted external marks, with the model coefficient and intercept printed to the console)
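
The program reports only the fitted line's parameters; a hedged extension that also quantifies the fit on the test split (standard scikit-learn metrics; assumes the variables from the program above are still in scope):

from sklearn.metrics import mean_squared_error, r2_score

print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))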

b) Logistic Regression

Program:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, classification_report

# Load dataset

data = pd.read_csv('students_marks.csv')


# Create a binary target: 1 for Pass, 0 for Fail

data['Pass/Fail'] = (data['External_Marks'] >= 40).astype(int)

# Prepare features and target

X = data[['Lab_Internal_Marks']]

y = data['Pass/Fail']

# Split the data into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Logistic Regression model

model = LogisticRegression()

model.fit(X_train, y_train)

# Predict pass/fail status

y_pred = model.predict(X_test)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Classification Report:\n", classification_report(y_test, y_pred,


zero_division=1))

Output: (printed accuracy and classification report for the logistic regression model)

c) Binary Classifier:

Program:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC

from sklearn.metrics import accuracy_score, classification_report

# Load dataset

data = pd.read_csv('students_marks.csv')

# Create a binary target: 1 for Pass, 0 for Fail

data['Pass/Fail'] = (data['External_Marks'] >= 40).astype(int)

# Prepare features and target

X = data[['Lab_Internal_Marks']]

y = data['Pass/Fail']

# Split the data into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Binary Classifier (SVM)

model = SVC(kernel='linear')

model.fit(X_train, y_train)

# Predict pass/fail status

y_pred = model.predict(X_test)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Classification Report:\n", classification_report(y_test, y_pred,


zero_division=1))


Output: (printed accuracy and classification report for the SVM binary classifier)
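
Either trained classifier can also be queried for a new student; a hedged one-off prediction (the internal mark of 22 is illustrative; assumes model and the column name from the programs above):

new_student = pd.DataFrame({'Lab_Internal_Marks': [22]})
print("Predicted Pass(1)/Fail(0):", model.predict(new_student))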


Experiment 5:

Develop a program for Bias, Variance, Remove Duplicates, and Cross Validation.

Program:

import pandas as pd

from sklearn.model_selection import cross_val_score

from sklearn.linear_model import LinearRegression

import matplotlib.pyplot as plt

from statistics import mean, stdev

# Load the dataset

data = pd.read_csv(r"winequality-red.csv")

# Remove duplicate rows

data = data.drop_duplicates()

# Display data dimensions and the first 5 rows

dim = data.shape

print('Dimensions of the data set are:', dim)

print('First 5 rows of the data set are:')

print(data.head())

# Get column names and feature names

col_names = list(data.columns)

print('Attribute names are:')

print(col_names)

feature_names = col_names[:-1]

print('Feature names are:', feature_names)


# Prepare the feature set (X) and target variable (y)

X_set = data.drop('quality', axis=1)

Y_set = data['quality']

# Initialize the Linear Regression model

model = LinearRegression()

# Perform cross-validation; use the mean CV score as a proxy for bias and the spread (stdev) of the scores as a proxy for variance

k_list = range(2, 200)

bias = []

variance = []

for k in k_list:

scores = cross_val_score(model, X_set, Y_set, cv=k)

bias.append(mean(scores))

variance.append(stdev(scores))

# Plot the Bias-Variance trade-off

plt.plot(k_list, bias, 'b', label='Bias of model')

plt.plot(k_list, variance, 'r', label='Variance of model')

plt.xlabel('k value')

plt.title('Bias-Variance Trade-off')

plt.legend(loc='best')

plt.show()

# Based on the graph, choose the best value for k (e.g., 85)

optimal_k = 85

scores = cross_val_score(model, X_set, Y_set, cv=optimal_k)

bias = mean(scores)

variance = stdev(scores)

print('Bias of the model is:', bias)

print('Variance of the model is:', variance)

Output:

Dimensions of the data set are: (1359, 12)


First 5 rows of the data set are:
fixed acidity volatile acidity citric acid residual sugar chlorides
\
0 7.4 0.70 0.00 1.9 0.076
1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
5 7.4 0.66 0.00 1.8 0.075

free sulfur dioxide total sulfur dioxide density pH


sulphates \
0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
5 13.0 40.0 0.9978 3.51 0.56

alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
5 9.4 5
Attribute names are:
['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH',
'sulphates', 'alcohol', 'quality']


Feature names are: ['fixed acidity', 'volatile acidity', 'citric acid',


'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
'density', 'pH', 'sulphates', 'alcohol']
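
Note that cross_val_score uses a regressor's default score (R²) here; other metrics can be requested through the standard scoring parameter (a hedged one-liner, assuming the variables from the program above are in scope):

scores_mse = cross_val_score(model, X_set, Y_set, cv=optimal_k, scoring='neg_mean_squared_error')
print('Mean negative MSE across folds:', scores_mse.mean())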


Experiment 6:
Write a program to implement Categorical Encoding, One-hot Encoding.

Program:

import pandas as pd

from sklearn.preprocessing import OneHotEncoder

# Load the dataset

data = pd.read_csv('winequality-red.csv')

# Display the first few rows of the dataset

print('First 5 rows of the dataset:')

print(data.head())

# Check for categorical columns

categorical_columns = data.select_dtypes(include=['object']).columns

print('Categorical columns:', categorical_columns)

# If there are any categorical columns, proceed with one-hot encoding

if len(categorical_columns) > 0:

# Initialize the OneHotEncoder

    encoder = OneHotEncoder(sparse_output=False, drop='first')  # 'sparse' was renamed to 'sparse_output' in scikit-learn 1.2+

# Apply OneHotEncoder to the categorical columns

encoded_data = encoder.fit_transform(data[categorical_columns])


# Create a DataFrame with the encoded columns

encoded_df = pd.DataFrame(encoded_data,
columns=encoder.get_feature_names_out(categorical_columns))

# Concatenate the encoded columns back to the original dataset

    data = pd.concat([data.drop(categorical_columns, axis=1), encoded_df], axis=1)

# Display the first few rows of the transformed dataset

print('First 5 rows of the transformed dataset:')

print(data.head())

Output:
First 5 rows of the dataset:
fixed acidity volatile acidity citric acid residual sugar chlorides \
0 7.4 0.70 0.00 1.9 0.076
1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
4 7.4 0.70 0.00 1.9 0.076

free sulfur dioxide total sulfur dioxide density pH sulphates \


0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
4 11.0 34.0 0.9978 3.51 0.56

alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
4 9.4 5

Categorical columns: Index([], dtype='object')

First 5 rows of the transformed dataset:

fixed acidity volatile acidity citric acid residual sugar chlorides \


0 7.4 0.70 0.00 1.9 0.076


1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
4 7.4 0.70 0.00 1.9 0.076

free sulfur dioxide total sulfur dioxide density pH sulphates \


0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
4 11.0 34.0 0.9978 3.51 0.56

alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
4 9.4 5
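
Since the wine data has no object-typed columns, the encoder branch above never executes. A hedged toy example that actually exercises both encodings (the tiny frame and city names are illustrative):

toy = pd.DataFrame({'city': ['Hyderabad', 'Chennai', 'Hyderabad']})

# Categorical (label) encoding: each category becomes an integer code
toy['city_code'] = toy['city'].astype('category').cat.codes

# One-hot encoding with pandas: one 0/1 column per category
toy = pd.concat([toy, pd.get_dummies(toy['city'], prefix='city')], axis=1)
print(toy)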


Experiment 7:
Build an Artificial Neural Network by implementing the Back propagation
algorithm and test the same using appropriate data sets.

Program:
import numpy as np

# Input and output data

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)

y = np.array(([92], [86], [89]), dtype=float)

# Normalize data

X = X / np.amax(X, axis=0)

y = y / 100

# Sigmoid Function

def sigmoid(x):

return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function

def derivatives_sigmoid(x):

return x * (1 - x)

# Variable initialization

epoch = 7000 # Setting training iterations

lr = 0.1 # Setting learning rate

inputlayer_neurons = 2 # Number of features in the dataset

hiddenlayer_neurons = 3 # Number of hidden layer neurons

output_neurons = 1 # Number of neurons at output layer

# Weight and bias initialization


wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))

bh = np.random.uniform(size=(1, hiddenlayer_neurons))

wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))

bout = np.random.uniform(size=(1, output_neurons))

# Training algorithm

for i in range(epoch):

# Forward Propagation

hinp1 = np.dot(X, wh)

hinp = hinp1 + bh

hlayer_act = sigmoid(hinp)

outinp1 = np.dot(hlayer_act, wout)

outinp = outinp1 + bout

output = sigmoid(outinp)

# Backpropagation

EO = y - output

outgrad = derivatives_sigmoid(output)

d_output = EO * outgrad

EH = d_output.dot(wout.T)

hiddengrad = derivatives_sigmoid(hlayer_act)

d_hiddenlayer = EH * hiddengrad

# Updating weights and biases

wout += hlayer_act.T.dot(d_output) * lr

bout += np.sum(d_output, axis=0, keepdims=True) * lr

wh += X.T.dot(d_hiddenlayer) * lr

bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

# Display results

print("Input: \n" + str(X))

print("Actual Output: \n" + str(y))

print("Predicted Output: \n" + str(output))

Output:

Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89317944]
[0.88206035]
[0.89398854]]
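
A hedged helper for querying the trained network on a new input, reusing the learned weights (the input values and their normalization by the training maxima 3 and 9 are illustrative):

def predict(x_new):
    # One forward pass with the trained parameters
    hidden = sigmoid(np.dot(x_new, wh) + bh)
    return sigmoid(np.dot(hidden, wout) + bout)

print(predict(np.array([[2.5 / 3.0, 8.0 / 9.0]])))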


Experiment 8:
Write a program to implement k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions.

Program:
import csv

import random

import math

import operator

import os

def loadDataset(filename, split, trainingSet=[], testSet=[]):

if not os.path.exists(filename):

raise FileNotFoundError(f"Error: The file '{filename}' was not found.


Please check the file path.")

with open(filename, 'r') as csvfile:

lines = csv.reader(csvfile)

dataset = list(lines)

for x in range(len(dataset) - 1):

for y in range(4):

dataset[x][y] = float(dataset[x][y])

if random.random() < split:

trainingSet.append(dataset[x])

else:

testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):

distance = 0

for x in range(length):

distance += pow((instance1[x] - instance2[x]), 2)

return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):

distances = []

length = len(testInstance) - 1

for x in range(len(trainingSet)):

dist = euclideanDistance(testInstance, trainingSet[x], length)

distances.append((trainingSet[x], dist))

distances.sort(key=operator.itemgetter(1))

neighbors = []

for x in range(k):

neighbors.append(distances[x][0])

return neighbors

def getResponse(neighbors):

classVotes = {}

for x in range(len(neighbors)):

response = neighbors[x][-1]

if response in classVotes:

classVotes[response] += 1

else:

classVotes[response] = 1

    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)

return sortedVotes[0][0]

def getAccuracy(testSet, predictions):

correct = 0

for x in range(len(testSet)):

if testSet[x][-1] == predictions[x]:

correct += 1

return (correct / float(len(testSet))) * 100.0

def main():

# Prepare data

trainingSet = []

testSet = []

split = 0.67

filename = 'iris.data' # Ensure the file exists in the current directory

try:

loadDataset(filename, split, trainingSet, testSet)

print('Train set:', len(trainingSet))


print('Test set:', len(testSet))

except FileNotFoundError as e:

print(e)

return

# Generate predictions

predictions = []

k=3

for x in range(len(testSet)):

neighbors = getNeighbors(trainingSet, testSet[x], k)

result = getResponse(neighbors)

predictions.append(result)

print(f'> predicted={result}, actual={testSet[x][-1]}')

accuracy = getAccuracy(testSet, predictions)

print(f'Accuracy: {accuracy:.2f}%')

if __name__ == "__main__":

main()

Output:
Train set: 91
Test set: 59
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa


> predicted=Iris-setosa, actual=Iris-setosa


> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-virginica, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica


> predicted=Iris-virginica, actual=Iris-virginica


> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-versicolor, actual=Iris-virginica
Accuracy: 96.61%
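
For comparison, the same experiment takes only a few lines with scikit-learn (a hedged sketch; it loads iris from sklearn rather than the local iris.data file):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
for pred, actual in zip(knn.predict(X_te), y_te):
    print('correct' if pred == actual else 'WRONG', '-> predicted:', pred, 'actual:', actual)
print('Accuracy:', knn.score(X_te, y_te))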


Experiment 9:
Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment
and draw graphs.

Program:
import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

def kernel(point, X, k):

m, n = np.shape(X)

weights = np.eye(m)

for j in range(m):

diff = point - X[j]

weights[j, j] = np.exp(diff @ diff.T / (-2.0 * k**2))

    return weights  # return a plain ndarray; np.matrix is deprecated

def localWeight(point, X, y, k):

wei = kernel(point, X, k)

W = np.linalg.inv(X.T @ wei @ X) @ (X.T @ wei @ y)

return W

def localWeightRegression(X, y, k):

m, n = np.shape(X)

ypred = np.zeros(m)


for i in range(m):

        ypred[i] = (X[i] @ localWeight(X[i], X, y, k)).item()  # extract the scalar prediction

return ypred

# Load dataset

data = pd.read_csv('data10.csv')

bill = np.array(data.total_bill)

tip = np.array(data.tip)

# Preparing dataset and adding a column of ones

mbill = np.asarray(bill).reshape(-1, 1)

mtip = np.asarray(tip).reshape(-1, 1)

m = np.shape(mbill)[0]

one = np.ones((m, 1))

X = np.hstack((one, mbill))

# Set bandwidth parameter

k=2

ypred = localWeightRegression(X, mtip, k)

# Sort data for plotting

SortIndex = X[:, 1].argsort()

xsort = X[SortIndex][:, 1].flatten()

ypred_sorted = ypred[SortIndex]

# Plot results

plt.scatter(bill, tip, color='blue', label='Actual')

plt.plot(xsort, ypred_sorted, color='red', linewidth=2, label='Predicted')

plt.xlabel('Total Bill')

plt.ylabel('Tip')

plt.legend()

plt.show()

Output: (scatter of total_bill vs. tip with the locally weighted regression curve overlaid)
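
The bandwidth k controls how local the fit is; a hedged sweep that overlays several fits for comparison (assumes the arrays from the program above are in scope; the k values are illustrative):

for k_try in (0.5, 2, 10):
    ypred_k = localWeightRegression(X, mtip, k_try)
    plt.plot(xsort, ypred_k[SortIndex], label=f'k={k_try}')
plt.scatter(bill, tip, alpha=0.3)
plt.legend()
plt.show()   # smaller k follows the points closely; larger k approaches a straight line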


Experiment 10:
Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/API
can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.

Program:
import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB

from sklearn import metrics

import numpy as np

# Load dataset

try:

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])

print('The dimensions of the dataset:', msg.shape)

except FileNotFoundError:

print("Error: Dataset file 'naivetext1.csv' not found. Please ensure the file is in
the correct directory.")

exit()

# Map labels to numerical values and drop missing values

msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

msg = msg.dropna()


# Ensure labelnum is of correct type

msg['labelnum'] = msg['labelnum'].astype(int)

# Splitting the dataset into training and testing sets

X = msg.message

y = msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=42)

print(f'Train set size: {xtrain.shape[0]}')

print(f'Test set size: {xtest.shape[0]}')

# Convert text data into numerical vectors

count_vect = CountVectorizer()

xtrain_dtm = count_vect.fit_transform(xtrain)

xtest_dtm = count_vect.transform(xtest)

print('Feature names:', count_vect.get_feature_names_out())

# Train Naive Bayes classifier

clf = MultinomialNB()

clf.fit(xtrain_dtm, ytrain)

# Make predictions

predicted = clf.predict(xtest_dtm)

# Ensure ytest has no NaN values and correct type


ytest = ytest.dropna().astype(int).to_numpy()

# Validate test set before calculating metrics

if len(ytest) == 0 or len(predicted) == 0:

print("Error: Test set is empty. Adjust test_size parameter in


train_test_split.")

else:

# Calculate and print accuracy metrics

print('Accuracy of the classifier:', metrics.accuracy_score(ytest, predicted))

print('Confusion Matrix:')

print(metrics.confusion_matrix(ytest, predicted))

print('Recall:', metrics.recall_score(ytest, predicted, zero_division=1))

print('Precision:', metrics.precision_score(ytest, predicted, zero_division=1))

# Test new sample predictions

docs_new = ['I like this place', 'My boss is not my saviour']

X_new_counts = count_vect.transform(docs_new)

predictednew = clf.predict(X_new_counts)

for doc, category in zip(docs_new, predictednew):

label = 'pos' if category == 1 else 'neg'

print(f'{doc} -> {label}')

Output:
The dimensions of the dataset: (18, 2)


Train set size: 14


Test set size: 4
Feature names: ['about' 'am' 'an' 'and' 'awesome' 'bad' 'beers' 'best' 'boss' 'can'
'dance' 'deal' 'do' 'enemy' 'feel' 'fun' 'good' 'great' 'have' 'holiday'
'horrible' 'house' 'is' 'juice' 'like' 'locality' 'love' 'my' 'not' 'of'
'place' 'sick' 'stay' 'stuff' 'taste' 'that' 'the' 'these' 'this' 'tired'
'to' 'today' 'tomorrow' 'very' 'view' 'we' 'went' 'what' 'will' 'with'
'work']
Accuracy of the classifier: 1.0
Confusion Matrix:
[[2 0]
[0 2]]
Recall: 1.0
Precision: 1.0
I like this place -> neg
My boss is not my saviour -> neg
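
The program expects naivetext1.csv to hold one message and its pos/neg label per row, with no header. Two rows in that format for illustration (made-up examples built from words in the vocabulary printed above):

I love this place,pos
I am tired of this stuff,neg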


Experiment 11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data
set for clustering using k-Means algorithm. Compare the results of these
two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.

Program:
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs

from sklearn.mixture import GaussianMixture

from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score, adjusted_rand_score

from matplotlib.patches import Ellipse

# Generate synthetic data

X, y_true = make_blobs(n_samples=100, centers=4, cluster_std=0.60, random_state=0)

X = X[:, ::-1] # Flip axes for better plotting

# Function to draw an ellipse

def draw_ellipse(position, covariance, ax=None, **kwargs):

ax = ax or plt.gca()

if covariance.shape == (2, 2):

U, s, Vt = np.linalg.svd(covariance)


angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))

width, height = 2 * np.sqrt(s)

else:

angle = 0

width, height = 2 * np.sqrt(covariance)

for nsig in range(1, 4):

        ax.add_patch(Ellipse(xy=position, width=nsig * width, height=nsig * height, angle=angle, **kwargs))

# Function to plot GMM results

def plot_gmm(gmm, X, label=True, ax=None):

ax = ax or plt.gca()

labels = gmm.fit(X).predict(X)

if label:

ax.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis', zorder=2)

else:

ax.scatter(X[:, 0], X[:, 1], s=40, zorder=2)

ax.axis('equal')

w_factor = 0.2 / gmm.weights_.max()

for pos, covar, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):

draw_ellipse(pos, covar, alpha=w * w_factor)

# Apply k-Means algorithm

kmeans = KMeans(n_clusters=4, random_state=42)


kmeans_labels = kmeans.fit_predict(X)

kmeans_silhouette = silhouette_score(X, kmeans_labels)

kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)

# Apply EM algorithm (Gaussian Mixture Model)

gmm = GaussianMixture(n_components=4, random_state=42)

gmm_labels = gmm.fit_predict(X)

gmm_silhouette = silhouette_score(X, gmm_labels)

gmm_ari = adjusted_rand_score(y_true, gmm_labels)

# Print results

print('k-Means Clustering:')

print('Silhouette Score:', kmeans_silhouette)

print('Adjusted Rand Index:', kmeans_ari)

print('\nEM Algorithm (GMM) Clustering:')

print('Silhouette Score:', gmm_silhouette)

print('Adjusted Rand Index:', gmm_ari)

# Visualize clustering results

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)

plt.title('k-Means Clustering')

plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, s=40, cmap='viridis')

plt.subplot(1, 2, 2)

plt.title('EM Algorithm (GMM) Clustering')

plot_gmm(GaussianMixture(n_components=4, random_state=42), X)

plt.show()

Output:
k-Means Clustering:
Silhouette Score: 0.6486437837860929
Adjusted Rand Index: 0.9472597722581074

EM Algorithm (GMM) Clustering:


Silhouette Score: 0.6476255767693837
Adjusted Rand Index: 0.92264724951374
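
The program demonstrates the comparison on synthetic blobs; to run it on an actual Heart Disease dataset as the experiment statement asks, only the data-loading step changes, along these lines (a hedged sketch; heart.csv and its column names are assumptions about the standard UCI file):

# Hypothetical adaptation: cluster a real Heart Disease dataset instead of blobs
heart = pd.read_csv('heart.csv')              # assumed local copy of the UCI heart data
X = heart[['age', 'chol']].to_numpy()         # assumed numeric column names
y_true = heart['target'].to_numpy()           # assumed 0/1 disease label
# ...then reuse the KMeans / GaussianMixture code above unchanged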


Experiment 12:
Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Program:
import pandas as pd

import matplotlib.pyplot as plt

# Check Pandas version

print('Pandas version is', pd.__version__)

# Load dataset

data = pd.read_csv(r"tae.csv", header=None)

col_names = ['native_speaker', 'instructor', 'course', 'semester', 'class_size', 'score']

data.columns = col_names

# Convert target variable to categorical

print('Data type of target variable is:', data['score'].dtype)

data['score'] = data['score'].astype('category')

print('After conversion, data type of target variable is:', data['score'].dtype)


# Dataset information

print('Dimensions of the dataset:', data.shape)

print('The first 5 rows of the dataset:')

print(data.head())

print('The last 5 rows of the dataset:')

print(data.tail())

print('Randomly selected 5 rows of the dataset:')

print(data.sample(5))

print('Columns of the dataset:', data.columns.tolist())

print('Names and data types of attributes:')

print(data.dtypes)

# Convert 'native_speaker' to categorical

data['native_speaker'] = data['native_speaker'].astype('category')

print('After conversion, Names and data types of attributes:')

print(data.dtypes)

# Dataset information and statistics

print('Information of the dataset attributes:')

print(data.info())

print('Statistics of numerical attributes:')

print(data.describe())

print('Statistics of all attributes:')



print(data.describe(include='all'))

# Correlation matrix

print('Correlation matrix of numerical attributes:')

corr = data.corr(numeric_only=True)

print(corr)

# Distribution of the target variable

print('Distribution of target variable:')

print(data['score'].value_counts())

# Target class distribution w.r.t 'native_speaker'

print(pd.crosstab(data['native_speaker'], data['score']))

print(pd.crosstab(data['native_speaker'], data['score'], normalize='index'))

print('Target class distribution using groupby:')

print(data.groupby('native_speaker')['score'].value_counts())

# Check and handle missing values

print('Checking for null values:')

print(data.isnull().sum())

data.dropna(subset=['instructor'], inplace=True)

print('After removing rows with null values in column "instructor":')

print(data.isnull().sum())

# Unique values in 'score'

print('Unique values in column "score":', data['score'].unique())



# Visualization

plt.figure(figsize=(12, 6))

plt.scatter(data['semester'], data['class_size'], color='red')

plt.xlabel('Semester')

plt.ylabel('Class Size')

plt.title('Scatter Plot: Semester vs Class Size')

plt.show()

# Bar plot: Number of distinct courses per semester

data.groupby('semester')['course'].nunique().plot(kind='bar', title='Number of Distinct Courses per Semester')

plt.show()

# Histogram: Frequency of values in 'semester'

data['semester'].plot(kind='hist', title='Frequency of Semester Values')

plt.show()

# Line and bar plot

ax = data.plot(kind='bar', x='semester', y='course', color='red', title='Bar Plot: Semester vs Course')

data.plot(kind='line', x='semester', y='class_size', ax=ax, title='Line Plot: Semester vs Class Size')

plt.show()

Output:
Pandas version is 2.2.3
Data type of target variable is: int64
After conversion, data type of target variable is: category
Dimensions of the dataset: (151, 6)


The first 5 rows of the dataset:


native_speaker instructor course semester class_size score
0 1 23 3 1 19 3
1 2 15 3 1 17 3
2 1 23 3 2 49 3
3 1 5 2 2 33 3
4 2 7 11 2 55 3
The last 5 rows of the dataset:
native_speaker instructor course semester class_size score
146 2 3 2 2 26 1
147 2 10 3 2 12 1
148 1 18 7 2 48 1
149 2 22 1 2 51 1
150 2 2 10 2 27 1
Randomly selected 5 rows of the dataset:
native_speaker instructor course semester class_size score
82 2 13 3 1 11 3
38 2 14 15 2 38 1
31 2 18 5 2 19 1
149 2 22 1 2 51 1
112 1 14 15 2 32 1
Columns of the dataset: ['native_speaker', 'instructor', 'course', 'semester',
'class_size', 'score']
Names and data types of attributes:
native_speaker int64
instructor int64
course int64
semester int64
class_size int64
score category
dtype: object
After conversion, Names and data types of attributes:
native_speaker category
instructor int64
course int64
semester int64
class_size int64
score category
dtype: object
Information of the dataset attributes:
<class 'pandas.core.frame.DataFrame'>


RangeIndex: 151 entries, 0 to 150


Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 native_speaker 151 non-null category
1 instructor 151 non-null int64
2 course 151 non-null int64
3 semester 151 non-null int64
4 class_size 151 non-null int64
5 score 151 non-null category
dtypes: category(2), int64(4)
memory usage: 5.4 KB
None
Statistics of numerical attributes:
instructor course semester class_size
count 151.000000 151.000000 151.000000 151.000000
mean 13.642384 8.105960 1.847682 27.867550
std 6.825779 7.023914 0.360525 12.893758
min 1.000000 1.000000 1.000000 3.000000
25% 8.000000 3.000000 2.000000 19.000000
50% 13.000000 4.000000 2.000000 27.000000
75% 20.000000 15.000000 2.000000 37.000000
max 25.000000 26.000000 2.000000 66.000000



score 1 2 3
native_speaker
1 5 6 18
2 44 44 34
score 1 2 3
native_speaker
1 0.172414 0.206897 0.620690
2 0.360656 0.360656 0.278689
Target class distribution using groupby:
native_speaker score
1 3 18
2 6
1 5
2 1 44
2 44
3 34
Name: count, dtype: int64
Checking for null values:
native_speaker 0
instructor 0
course 0
semester 0
class_size 0
score 0
dtype: int64
After removing rows with null values in column "instructor":
native_speaker 0
instructor 0
course 0
semester 0
class_size 0
score 0
dtype: int64
Unique values in column "score": [3, 2, 1]
Categories (3, int64): [1, 2, 3]

(Output continues with the plots produced by the visualization code: the semester vs. class-size scatter plot, the bar chart of distinct courses per semester, the histogram of semester values, and the combined bar/line plot.)

Experiment 13:
Write a Python program to construct a Bayesian network considering
medical data. Use this model to demonstrate the diagnosis of heart patients
using standard Heart Disease Data Set.

Program:
import bayespy as bp

import numpy as np

import csv

from colorama import init, Fore, Back, Style

init()

# Define Parameter Enum values

# Age

ageEnum = {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}

# Gender

genderEnum = {'Male': 0, 'Female': 1}

# FamilyHistory

familyHistoryEnum = {'Yes': 0, 'No': 1}

# Diet (Calorie Intake)

dietEnum = {'High': 0, 'Medium': 1, 'Low': 2}

# LifeStyle

lifeStyleEnum = {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedentary': 3}

# Cholesterol


cholesterolEnum = {'High': 0, 'BorderLine': 1, 'Normal': 2}

# HeartDisease

heartDiseaseEnum = {'Yes': 0, 'No': 1}

# Load heart disease data from CSV

with open('heart_disease_data.csv') as csvfile:

lines = csv.reader(csvfile)

dataset = list(lines)

data = []

for x in dataset:

        data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]],
                     lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])

# Training data for machine learning

data = np.array(data)

N = len(data)

# Input data column assignment

p_age = bp.nodes.Dirichlet(1.0 * np.ones(5))

age = bp.nodes.Categorical(p_age, plates=(N,))

age.observe(data[:, 0])


p_gender = bp.nodes.Dirichlet(1.0 * np.ones(2))

gender = bp.nodes.Categorical(p_gender, plates=(N,))

gender.observe(data[:, 1])

p_familyhistory = bp.nodes.Dirichlet(1.0 * np.ones(2))

familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))

familyhistory.observe(data[:, 2])

p_diet = bp.nodes.Dirichlet(1.0 * np.ones(3))

diet = bp.nodes.Categorical(p_diet, plates=(N,))

diet.observe(data[:, 3])

p_lifestyle = bp.nodes.Dirichlet(1.0 * np.ones(4))

lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))

lifestyle.observe(data[:, 4])

p_cholesterol = bp.nodes.Dirichlet(1.0 * np.ones(3))

cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))

cholesterol.observe(data[:, 5])

# Prepare nodes and establish edges

p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))

heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
                                     bp.nodes.Categorical, p_heartdisease)

heartdisease.observe(data[:, 6])

# Update the network

p_heartdisease.update()

# Interactive Test

m=0

while m == 0:

print("\n")

    input_age = int(input('Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): '))

input_gender = int(input('Enter Gender (0-Male, 1-Female): '))

input_familyhistory = int(input('Enter FamilyHistory (0-Yes, 1-No): '))

input_diet = int(input('Enter Diet (0-High, 1-Medium, 2-Low): '))

    input_lifestyle = int(input('Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): '))

    input_cholesterol = int(input('Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): '))

    res = bp.nodes.MultiMixture([input_age, input_gender, input_familyhistory, input_diet,
                                 input_lifestyle, input_cholesterol],
                                bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]

print(Fore.RED + "Probability of Heart Disease = " + str(res))

print(Style.RESET_ALL)

m = int(input("Enter 0 to Continue or 1 to Exit: "))

Output:
Test case-0:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 0
Enter Gender (0-Male, 1-Female): 0
Enter FamilyHistory (0-Yes, 1-No): 1
Enter Diet (0-High, 1-Medium, 2-Low): 1
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 1
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Test case-1:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 0
Enter Gender (0-Male, 1-Female): 0
Enter FamilyHistory (0-Yes, 1-No): 3
Enter Diet (0-High, 1-Medium, 2-Low): 2
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 1
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Test case-2:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 2

Enter Gender (0-Male, 1-Female): 0


Enter FamilyHistory (0-Yes, 1-No): 0
Enter Diet (0-High, 1-Medium, 2-Low): 0
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 2
Probability of Heart Disease = 0.5


Enter 0 to Continue or 1 to Exit: 1

Test case-3:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 3

Enter Gender (0-Male, 1-Female): 1


Enter FamilyHistory (0-Yes, 1-No): 1
Enter Diet (0-High, 1-Medium, 2-Low): 1
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 2
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Test case-4:

Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 4


Enter Gender (0-Male, 1-Female): 1
Enter FamilyHistory (0-Yes, 1-No): 0
Enter Diet (0-High, 1-Medium, 2-Low): 2
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 3
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 0
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Experiment 14:
Write a program to implement Support Vector Machines and Principal Component Analysis.

Program:
import pandas as pd

import seaborn as sns

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, recall_score, precision_score, f1_score

from sklearn.svm import SVC

from sklearn.preprocessing import StandardScaler

# Load the dataset

data = pd.read_excel(r"D:\ML Lab\haberman.csv.xlsx", header=None)

# Add column names

col_names = ['age', 'year', 'pos_axil_nodes', 'survival_status']

data.columns = col_names

# Drop the year column

data = data.drop(['year'], axis=1)

# Check for missing data

print('Missing values in the dataset:')

print(data.isnull().sum())

# Display basic information

print('The first 5 rows of the data set are:')

print(data.head())

print('Dimensions of the data set:', data.shape)

print('Statistics of the data are:')

print(data.describe())

print('Correlation matrix of the data set:')

print(data.corr())

# Display class labels

class_labels = data['survival_status'].unique().astype(str).tolist()

print('Class labels are:')

print(class_labels)

# Plot class distribution

sns.countplot(data['survival_status'])
plt.title("Class Distribution")

plt.show()

# Separate features and target variable

x_set = data.drop(['survival_status'], axis=1)

y_set = data['survival_status']

# Display features and target variable

print('Feature names are:')

print(list(x_set.columns))

print('First 5 rows of the feature set are:')

print(x_set.head())

print('First 5 rows of the target variable are:')

print(y_set.head())

# Check target variable distribution

print('Distribution of Target variable is:')

print(y_set.value_counts())

# Standardize the feature set

scaler = StandardScaler()

x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3, random_state=42)

x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Train the SVM model

model = SVC()

print("Training the model with train dataset")

model.fit(x_train, y_train)

# Make predictions

y_pred = model.predict(x_test)

print('Predicted class labels for test data are:')

print(y_pred)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Precision:", precision_score(y_test, y_pred, average='binary'))

print("Recall:", recall_score(y_test, y_pred, average='binary'))

print("F1 Score:", f1_score(y_test, y_pred, average='binary'))

print(classification_report(y_test, y_pred, target_names=class_labels))

# Confusion Matrix

cm = confusion_matrix(y_test, y_pred)

df_cm = pd.DataFrame(cm, columns=class_labels, index=class_labels)

df_cm.index.name = 'Actual'

df_cm.columns.name = 'Predicted'

sns.heatmap(df_cm, annot=True, cmap="Blues", fmt='d')

plt.title("Confusion Matrix")
plt.show()

# Scatter plot of training data

plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=plt.cm.Paired)

plt.xlabel('Age')

plt.ylabel('Positive Axillary Nodes')

plt.title('Data points in Training Dataset')

plt.show()

# Decision boundary visualization (only works for 2D data)

plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=plt.cm.Paired)  # re-plot the points so the axis limits are meaningful

ax = plt.gca()

xlim = ax.get_xlim()

ylim = ax.get_ylim()

xx = np.linspace(xlim[0], xlim[1], 30)

yy = np.linspace(ylim[0], ylim[1], 30)

YY, XX = np.meshgrid(yy, xx)

xy = np.vstack([XX.ravel(), YY.ravel()]).T

Z = model.decision_function(xy).reshape(XX.shape)

ax.contour(XX, YY, Z, colors='red', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])

ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=30, facecolors='green')

plt.title('Support Vectors and Decision Boundary')

plt.show()
Output: (printed statistics and metrics, plus the class-distribution, confusion-matrix, training-scatter, and decision-boundary plots)
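
The program above covers the SVM half of the experiment; a hedged sketch of the PCA half on the same standardized features (scikit-learn's PCA; reducing the two features to one component is purely illustrative):

from sklearn.decomposition import PCA

pca = PCA(n_components=1)                     # project the 2 features onto 1 component
x_train_pca = pca.fit_transform(x_train)
x_test_pca = pca.transform(x_test)
print('Explained variance ratio:', pca.explained_variance_ratio_)

# The SVM can be retrained on the reduced features exactly as above:
model_pca = SVC().fit(x_train_pca, y_train)
print('Accuracy on PCA features:', accuracy_score(y_test, model_pca.predict(x_test_pca)))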
Experiment 15:
Write a program to implement Principal Component Analysis.

Program:
import numpy as nmp

import matplotlib.pyplot as mpltl

import pandas as pnd

DS = pnd.read_csv('Wine.csv')

# Now, we will distribute the dataset into two components "X" and "Y"

X = DS.iloc[: , 0:13].values

Y = DS.iloc[: , 13].values

from sklearn.model_selection import train_test_split as tts

X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2, random_state = 0)

from sklearn.preprocessing import StandardScaler as SS

SC = SS()

X_train = SC.fit_transform(X_train)

X_test = SC.transform(X_test)

from sklearn.decomposition import PCA

PCa = PCA(n_components=1)

X_train = PCa.fit_transform(X_train)

X_test = PCa.transform(X_test)

explained_variance = PCa.explained_variance_ratio_
from sklearn.linear_model import LogisticRegression as LR

classifier_1 = LR (random_state = 0)

classifier_1.fit(X_train, Y_train)
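
The program stops after fitting the classifier; a hedged completion that reports the captured variance and test accuracy (assumes the variables defined above are in scope):

from sklearn.metrics import accuracy_score

print('Explained variance ratio:', explained_variance)

Y_pred = classifier_1.predict(X_test)
print('Test accuracy with 1 principal component:', accuracy_score(Y_test, Y_pred))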

Output: (a LogisticRegression model fitted on the first principal component; see the evaluation sketch above for printed results)
