
BVC College of Engineering, Palacharla

III B.Tech II Semester CSE A&B Lab Manual

Subject: Machine Learning using Python Lab (R2032054)

S.NO    LIST OF EXPERIMENTS

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression, b) Logistic Regression, c) Binary Classifier.

5. Develop a program for Bias, Variance, Remove Duplicates, and Cross Validation.

6. Write a program to implement Categorical Encoding and One-hot Encoding.

7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

8. Write a program to implement the k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and wrong predictions.

9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

11. Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

12. Exploratory Data Analysis for Classification using Pandas or Matplotlib.

13. Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

14. Write a program to implement Support Vector Machines and Principal Component Analysis.

15. Write a program to implement Principal Component Analysis.


Experiment-1
Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file.

Aim: Demonstration of the FIND-S algorithm for finding the most specific hypothesis.

Program:
import csv

with open('tennis.csv', 'r') as f:

    reader = csv.reader(f)

    rows = list(reader)

# One '0' slot per attribute (Outlook, Temperature, Humidity, Wind)

h = [['0', '0', '0', '0']]

for row in rows:

    print(row)

    if row[-1] == "Yes":            # consider only positive examples

        j = 0

        for x in row[:-1]:          # skip the class label

            if x != h[0][j] and h[0][j] == '0':

                h[0][j] = x         # first positive value seen: adopt it

            elif x != h[0][j] and h[0][j] != '0':

                h[0][j] = '?'       # values disagree: generalize

            j = j + 1

print("Most specific hypothesis is")

print(h)

Output:
['Outlook', 'Temperature', 'Humidity', 'Wind', 'Play Tennis']
['Sunny', 'Hot', 'High', 'Weak', 'No']
['Sunny', 'Hot', 'High', 'Strong', 'No']
['Overcast', 'Hot', 'High', 'Weak', 'Yes']
['Rain', 'Mild', 'High', 'Weak', 'Yes']
['Rain', 'Cool', 'Normal', 'Weak', 'Yes']
['Rain', 'Cool', 'Normal', 'Strong', 'No']
['Overcast', 'Cool', 'Normal', 'Strong', 'Yes']
['Sunny', 'Mild', 'High', 'Weak', 'No']
['Sunny', 'Cool', 'Normal', 'Weak', 'Yes']
['Rain', 'Mild', 'Normal', 'Weak', 'Yes']
['Sunny', 'Mild', 'Normal', 'Strong', 'Yes']
['Overcast', 'Mild', 'High', 'Strong', 'Yes']
['Overcast', 'Hot', 'Normal', 'Weak', 'Yes']
['Rain', 'Mild', 'High', 'Strong', 'No']
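
The output above shows only the echoed rows; the program's final two print statements then report the learned hypothesis. For reference, a minimal self-contained trace of FIND-S over the nine positive rows listed above (the inline data is copied from that output; attribute order is Outlook, Temperature, Humidity, Wind):

positives = [
    ['Overcast', 'Hot', 'High', 'Weak'],
    ['Rain', 'Mild', 'High', 'Weak'],
    ['Rain', 'Cool', 'Normal', 'Weak'],
    ['Overcast', 'Cool', 'Normal', 'Strong'],
    ['Sunny', 'Cool', 'Normal', 'Weak'],
    ['Rain', 'Mild', 'Normal', 'Weak'],
    ['Sunny', 'Mild', 'Normal', 'Strong'],
    ['Overcast', 'Mild', 'High', 'Strong'],
    ['Overcast', 'Hot', 'Normal', 'Weak'],
]
h = list(positives[0])                                  # start from the first positive example
for row in positives[1:]:
    h = [a if a == b else '?' for a, b in zip(h, row)]  # generalize on disagreement
print(h)   # ['?', '?', '?', '?'] -- the positives vary in every attribute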


Experiment – 2:

For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Aim: Demonstration of Candidate-Elimination algorithm

Program:
import pandas as pd

# Load the training data from a CSV file

data = pd.read_csv('Tennies.csv')

# Extract the feature names

features = list(data.columns[:-1])

# Initialize the most specific hypothesis (S0) and the most general hypothesis (G0)

S = [['0'] * len(features)]  # Boundary set of most specific hypotheses

G = [['?'] * len(features)]  # Boundary set of most general hypotheses

def more_general(h1, h2):

""" Check if hypothesis h1 is more general than hypothesis h2 """

more_general_parts = []

for x, y in zip(h1, h2):

mg = x == '?' or (x != '0' and (x == y or y == '0'))

more_general_parts.append(mg)

return all(more_general_parts)

def fulfills(example, hypothesis):

""" Check if a hypothesis is consistent with an example """

return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def min_generalization(h, example):

    """ Minimally generalize hypothesis h so that it covers the example """

    new_h = list(h)

    for i, val in enumerate(h):

        if val == '0':

            new_h[i] = example[i]          # '0' matches nothing: adopt the example's value

        elif val != '?' and val != example[i]:

            new_h[i] = '?'                 # conflicting specific value: generalize to '?'

    return new_h

def min_specialization(h, example):

""" Minimize the specialization of a hypothesis """

specializations = []

for i, val in enumerate(h):

if val == '?':

for v in set(data.iloc[:, i]):

specialization = h[:i] + [v] + h[i+1:]

if not fulfills(example, specialization):

specializations.append(specialization)

elif val != '0':

specialization = h[:i] + ['0'] + h[i+1:]

if not fulfills(example, specialization):

specializations.append(specialization)

return specializations

for _, row in data.iterrows():

    example = list(row.iloc[:-1])

    label = row.iloc[-1]

    if label == 'Yes':

        # Positive example: drop inconsistent members of G, then generalize S to cover it

        G = [g for g in G if fulfills(example, g)]

        S = [min_generalization(s, example) for s in S]

    else:

        # Negative example: drop members of S that cover it, and replace
        # covering members of G by their minimal specializations

        S = [s for s in S if not fulfills(example, s)]

        new_G = [g for g in G if not fulfills(example, g)]

        for g in G:

            if fulfills(example, g):

                new_G.extend(min_specialization(g, example))

        G = new_G

print("Final specific boundary (S):", S)

print("Final general boundary (G):", G)

Output:
Final specific boundary (S): []
Final general boundary (G): [['?', '?', '?', '?'],
['Sunny', '?', '?', '?'],
['Overcast', '?', '?', '?'],
['?', 'Cool', '?', '?'],
['?', 'Hot', '?', '?'],
['?', '?', 'Normal', '?'],
['?', '?', '?', 'Weak']]


Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and
apply this knowledge to classify a new sample.

Aim: Demonstration of ID3 algorithm

Dataset: Tennis dataset

Program code:


import csv

import numpy as np

import math

# Data Loader Function


def read_data(filename):

with open(filename, 'r') as csvfile:

datareader = csv.reader(csvfile, delimiter=',')

headers = next(datareader)

metadata = []

traindata = []

for name in headers:

metadata.append(name)

for row in datareader:

traindata.append(row)

return (metadata, traindata)

# Decision Tree Classes and Functions

class Node:

def __init__(self, attribute):

self.attribute = attribute

self.children = []

self.answer = ""

def __str__(self):

return self.attribute

def subtables(data, col, delete):

dict = {}

items = np.unique(data[:, col])

count = np.zeros((items.shape[0],), dtype=np.int32)

for x in range(items.shape[0]):

for y in range(data.shape[0]):

if data[y, col] == items[x]:

count[x] += 1

for x in range(items.shape[0]):

dict[items[x]] = np.empty((count[x], data.shape[1]), dtype="|S32")

pos = np.zeros((items.shape[0],), dtype=np.int32)

for y in range(data.shape[0]):

for x in range(items.shape[0]):

if data[y, col] == items[x]:

dict[items[x]][pos[x], :] = data[y]

pos[x] += 1

if delete:

for key in dict.keys():

dict[key] = np.delete(dict[key], col, 1)

return items, dict

def entropy(S):

items = np.unique(S)

if items.size == 1:

return 0

counts = np.zeros((items.shape[0],))

sums = 0

for x in range(items.shape[0]):

counts[x] = sum(S == items[x]) / (S.size * 1.0)

for count in counts:



sums += -1 * count * math.log2(count)

return sums

def gain_ratio(data, col):

items, dict = subtables(data, col, delete=False)

total_size = data.shape[0]

entropies = np.zeros((items.shape[0],))

intrinsic = np.zeros((items.shape[0],))

for x in range(items.shape[0]):

ratio = dict[items[x]].shape[0] / (total_size * 1.0)

entropies[x] = ratio * entropy(dict[items[x]][:, -1].astype(str))

intrinsic[x] = ratio * math.log2(ratio)

total_entropy = entropy(data[:, -1].astype(str))

iv = -1 * sum(intrinsic)

for x in range(entropies.shape[0]):

total_entropy -= entropies[x]

if iv == 0:

return 0

return total_entropy / iv

def create_node(data, metadata):

if (np.unique(data[:, -1])).shape[0] == 1:

node = Node("")

node.answer = np.unique(data[:, -1])[0]

return node

gains = np.zeros((data.shape[1] - 1,))



for col in range(data.shape[1] - 1):

gains[col] = gain_ratio(data, col)

split = np.argmax(gains)

node = Node(metadata[split])

metadata = np.delete(metadata, split, 0)

items, dict = subtables(data, split, delete=True)

for x in range(items.shape[0]):

child = create_node(dict[items[x]], metadata)

node.children.append((items[x], child))

return node

def empty(size):

return " " * size

def print_tree(node, level):

if node.answer != "":

print(empty(level), node.answer)

return

print(empty(level), node.attribute)

for value, n in node.children:

print(empty(level + 1), value)

print_tree(n, level + 2)

# Load Data and Build Tree

metadata, traindata = read_data("Tennies.csv")

data = np.array(traindata, dtype="|S32")

node = create_node(data, metadata)



print_tree(node, 0)

Output:
Outlook
b'Overcast'
b'Yes'
b'Rain'
Wind
b'Strong'
b'No'
b'Weak'
b'Yes'
b'Sunny'
Humidity
b'High'
b'No'
b'Normal'
b'Yes'
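
The program prints the tree but stops short of the classification step named in the experiment. A hedged sketch of that last step, reusing the Node objects built above (the traversal helper and the sample day are illustrative additions, not part of the manual's code):

def classify(node, sample, attrs):
    # Walk down the tree, following the branch that matches the sample's value
    if node.answer != "":
        return node.answer
    idx = list(attrs).index(node.attribute)
    for value, child in node.children:
        if value == sample[idx].encode():      # the tree stores byte strings (dtype |S32)
            rest_attrs = [a for a in attrs if a != node.attribute]
            return classify(child, sample[:idx] + sample[idx + 1:], rest_attrs)
    return None                                # attribute value unseen in training

# e.g. a hypothetical new day: Sunny, Cool, High, Strong -> expected b'No'
print(classify(node, ['Sunny', 'Cool', 'High', 'Strong'], metadata))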


Experiment 4:
Exercises to solve the real-world problems using the following machine learning
methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier

Program:

a) Linear Regression

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('students_marks.csv')
# Prepare features and target
X = data[['Lab_Internal_Marks']]
y = data['External_Marks']
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict external marks
y_pred = model.predict(X_test)
# Visualize the results
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.title('Linear Regression - Lab Internal vs External Marks')
plt.xlabel('Lab Internal Marks')
plt.ylabel('External Marks')
plt.legend()
plt.show()
# Print the model's performance
print("Model Coefficient:", model.coef_)
print("Model Intercept:", model.intercept_)


Output: (scatter plot of actual vs. predicted external marks, with the model coefficient and intercept printed to the console)
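
The program reports only the fitted line's parameters; a hedged extension that also quantifies the fit on the test split (standard scikit-learn metrics; assumes the variables from the program above are still in scope):

from sklearn.metrics import mean_squared_error, r2_score

print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))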

b) Logistic Regression

Program:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, classification_report

# Load dataset

data = pd.read_csv('students_marks.csv')


# Create a binary target: 1 for Pass, 0 for Fail

data['Pass/Fail'] = (data['External_Marks'] >= 40).astype(int)

# Prepare features and target

X = data[['Lab_Internal_Marks']]

y = data['Pass/Fail']

# Split the data into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Logistic Regression model

model = LogisticRegression()

model.fit(X_train, y_train)

# Predict pass/fail status

y_pred = model.predict(X_test)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Classification Report:\n", classification_report(y_test, y_pred,


zero_division=1))

Output: (printed accuracy and classification report for the logistic regression model)

c) Binary Classifier:

Program:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC

from sklearn.metrics import accuracy_score, classification_report

# Load dataset

data = pd.read_csv('students_marks.csv')

# Create a binary target: 1 for Pass, 0 for Fail

data['Pass/Fail'] = (data['External_Marks'] >= 40).astype(int)

# Prepare features and target

X = data[['Lab_Internal_Marks']]

y = data['Pass/Fail']

# Split the data into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Binary Classifier (SVM)

model = SVC(kernel='linear')

model.fit(X_train, y_train)

# Predict pass/fail status

y_pred = model.predict(X_test)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Classification Report:\n", classification_report(y_test, y_pred,


zero_division=1))


Output: (printed accuracy and classification report for the SVM binary classifier)
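
Either trained classifier can also be queried for a new student; a hedged one-off prediction (the internal mark of 22 is illustrative; assumes model and the column name from the programs above):

new_student = pd.DataFrame({'Lab_Internal_Marks': [22]})
print("Predicted Pass(1)/Fail(0):", model.predict(new_student))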


Experiment 5:

Develop a program for Bias, Variance, Remove Duplicates, and Cross Validation.

Program:

import pandas as pd

from sklearn.model_selection import cross_val_score

from sklearn.linear_model import LinearRegression

import matplotlib.pyplot as plt

from statistics import mean, stdev

# Load the dataset

data = pd.read_csv(r"winequality-red.csv")

# Remove duplicate rows

data = data.drop_duplicates()

# Display data dimensions and the first 5 rows

dim = data.shape

print('Dimensions of the data set are:', dim)

print('First 5 rows of the data set are:')

print(data.head())

# Get column names and feature names

col_names = list(data.columns)

print('Attribute names are:')

print(col_names)

feature_names = col_names[:-1]

print('Feature names are:', feature_names)


# Prepare the feature set (X) and target variable (y)

X_set = data.drop('quality', axis=1)

Y_set = data['quality']

# Initialize the Linear Regression model

model = LinearRegression()

# Perform cross-validation; use the mean CV score as a proxy for bias and the spread (stdev) of the scores as a proxy for variance

k_list = range(2, 200)

bias = []

variance = []

for k in k_list:

scores = cross_val_score(model, X_set, Y_set, cv=k)

bias.append(mean(scores))

variance.append(stdev(scores))

# Plot the Bias-Variance trade-off

plt.plot(k_list, bias, 'b', label='Bias of model')

plt.plot(k_list, variance, 'r', label='Variance of model')

plt.xlabel('k value')

plt.title('Bias-Variance Trade-off')

plt.legend(loc='best')

plt.show()

# Based on the graph, choose the best value for k (e.g., 85)

optimal_k = 85

scores = cross_val_score(model, X_set, Y_set, cv=optimal_k)

bias = mean(scores)

variance = stdev(scores)

print('Bias of the model is:', bias)

print('Variance of the model is:', variance)

Output:

Dimensions of the data set are: (1359, 12)


First 5 rows of the data set are:
fixed acidity volatile acidity citric acid residual sugar chlorides
\
0 7.4 0.70 0.00 1.9 0.076
1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
5 7.4 0.66 0.00 1.8 0.075

free sulfur dioxide total sulfur dioxide density pH


sulphates \
0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
5 13.0 40.0 0.9978 3.51 0.56

alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
5 9.4 5
Attribute names are:
['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH',
'sulphates', 'alcohol', 'quality']


Feature names are: ['fixed acidity', 'volatile acidity', 'citric acid',


'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
'density', 'pH', 'sulphates', 'alcohol']
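
Note that cross_val_score uses a regressor's default score (R²) here; other metrics can be requested through the standard scoring parameter (a hedged one-liner, assuming the variables from the program above are in scope):

scores_mse = cross_val_score(model, X_set, Y_set, cv=optimal_k, scoring='neg_mean_squared_error')
print('Mean negative MSE across folds:', scores_mse.mean())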


Experiment 6:
Write a program to implement Categorical Encoding, One-hot Encoding.

Program:

import pandas as pd

from sklearn.preprocessing import OneHotEncoder

# Load the dataset

data = pd.read_csv('winequality-red.csv')

# Display the first few rows of the dataset

print('First 5 rows of the dataset:')

print(data.head())

# Check for categorical columns

categorical_columns = data.select_dtypes(include=['object']).columns

print('Categorical columns:', categorical_columns)

# If there are any categorical columns, proceed with one-hot encoding

if len(categorical_columns) > 0:

# Initialize the OneHotEncoder

    encoder = OneHotEncoder(sparse_output=False, drop='first')  # 'sparse' was renamed to 'sparse_output' in scikit-learn 1.2+

# Apply OneHotEncoder to the categorical columns

encoded_data = encoder.fit_transform(data[categorical_columns])


# Create a DataFrame with the encoded columns

encoded_df = pd.DataFrame(encoded_data,
columns=encoder.get_feature_names_out(categorical_columns))

# Concatenate the encoded columns back to the original dataset

    data = pd.concat([data.drop(categorical_columns, axis=1), encoded_df], axis=1)

# Display the first few rows of the transformed dataset

print('First 5 rows of the transformed dataset:')

print(data.head())

Output:
First 5 rows of the dataset:
fixed acidity volatile acidity citric acid residual sugar chlorides \
0 7.4 0.70 0.00 1.9 0.076
1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
4 7.4 0.70 0.00 1.9 0.076

free sulfur dioxide total sulfur dioxide density pH sulphates \


0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
4 11.0 34.0 0.9978 3.51 0.56

alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
4 9.4 5

Categorical columns: Index([], dtype='object')

First 5 rows of the transformed dataset:

fixed acidity volatile acidity citric acid residual sugar chlorides \


0 7.4 0.70 0.00 1.9 0.076


1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
4 7.4 0.70 0.00 1.9 0.076

free sulfur dioxide total sulfur dioxide density pH sulphates \


0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
4 11.0 34.0 0.9978 3.51 0.56

alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
4 9.4 5
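
Since the wine data has no object-typed columns, the encoder branch above never executes. A hedged toy example that actually exercises both encodings (the tiny frame and city names are illustrative):

toy = pd.DataFrame({'city': ['Hyderabad', 'Chennai', 'Hyderabad']})

# Categorical (label) encoding: each category becomes an integer code
toy['city_code'] = toy['city'].astype('category').cat.codes

# One-hot encoding with pandas: one 0/1 column per category
toy = pd.concat([toy, pd.get_dummies(toy['city'], prefix='city')], axis=1)
print(toy)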


Experiment 7:
Build an Artificial Neural Network by implementing the Back propagation
algorithm and test the same using appropriate data sets.

Program:
import numpy as np

# Input and output data

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)

y = np.array(([92], [86], [89]), dtype=float)

# Normalize data

X = X / np.amax(X, axis=0)

y = y / 100

# Sigmoid Function

def sigmoid(x):

return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function

def derivatives_sigmoid(x):

return x * (1 - x)

# Variable initialization

epoch = 7000 # Setting training iterations

lr = 0.1 # Setting learning rate

inputlayer_neurons = 2 # Number of features in the dataset

hiddenlayer_neurons = 3 # Number of hidden layer neurons

output_neurons = 1 # Number of neurons at output layer

# Weight and bias initialization


wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))

bh = np.random.uniform(size=(1, hiddenlayer_neurons))

wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))

bout = np.random.uniform(size=(1, output_neurons))

# Training algorithm

for i in range(epoch):

# Forward Propagation

hinp1 = np.dot(X, wh)

hinp = hinp1 + bh

hlayer_act = sigmoid(hinp)

outinp1 = np.dot(hlayer_act, wout)

outinp = outinp1 + bout

output = sigmoid(outinp)

# Backpropagation

EO = y - output

outgrad = derivatives_sigmoid(output)

d_output = EO * outgrad

EH = d_output.dot(wout.T)

hiddengrad = derivatives_sigmoid(hlayer_act)

d_hiddenlayer = EH * hiddengrad

# Updating weights and biases

wout += hlayer_act.T.dot(d_output) * lr

bout += np.sum(d_output, axis=0, keepdims=True) * lr

wh += X.T.dot(d_hiddenlayer) * lr

bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

# Display results

print("Input: \n" + str(X))

print("Actual Output: \n" + str(y))

print("Predicted Output: \n" + str(output))

Output:

Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89317944]
[0.88206035]
[0.89398854]]
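
A hedged helper for querying the trained network on a new input, reusing the learned weights (the input values and their normalization by the training maxima 3 and 9 are illustrative):

def predict(x_new):
    # One forward pass with the trained parameters
    hidden = sigmoid(np.dot(x_new, wh) + bh)
    return sigmoid(np.dot(hidden, wout) + bout)

print(predict(np.array([[2.5 / 3.0, 8.0 / 9.0]])))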


Experiment 8:
Write a program to implement k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions.

Program:
import csv

import random

import math

import operator

import os

def loadDataset(filename, split, trainingSet=[], testSet=[]):

if not os.path.exists(filename):

raise FileNotFoundError(f"Error: The file '{filename}' was not found.


Please check the file path.")

with open(filename, 'r') as csvfile:

lines = csv.reader(csvfile)

dataset = list(lines)

for x in range(len(dataset) - 1):

for y in range(4):

dataset[x][y] = float(dataset[x][y])

if random.random() < split:

trainingSet.append(dataset[x])

else:

testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):

distance = 0

for x in range(length):

distance += pow((instance1[x] - instance2[x]), 2)

return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):

distances = []

length = len(testInstance) - 1

for x in range(len(trainingSet)):

dist = euclideanDistance(testInstance, trainingSet[x], length)

distances.append((trainingSet[x], dist))

distances.sort(key=operator.itemgetter(1))

neighbors = []

for x in range(k):

neighbors.append(distances[x][0])

return neighbors

def getResponse(neighbors):

classVotes = {}

for x in range(len(neighbors)):

response = neighbors[x][-1]

if response in classVotes:

classVotes[response] += 1

else:

classVotes[response] = 1

    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)

return sortedVotes[0][0]

def getAccuracy(testSet, predictions):

correct = 0

for x in range(len(testSet)):

if testSet[x][-1] == predictions[x]:

correct += 1

return (correct / float(len(testSet))) * 100.0

def main():

# Prepare data

trainingSet = []

testSet = []

split = 0.67

filename = 'iris.data' # Ensure the file exists in the current directory

try:

loadDataset(filename, split, trainingSet, testSet)

print('Train set:', len(trainingSet))


print('Test set:', len(testSet))

except FileNotFoundError as e:

print(e)

return

# Generate predictions

predictions = []

k=3

for x in range(len(testSet)):

neighbors = getNeighbors(trainingSet, testSet[x], k)

result = getResponse(neighbors)

predictions.append(result)

print(f'> predicted={result}, actual={testSet[x][-1]}')

accuracy = getAccuracy(testSet, predictions)

print(f'Accuracy: {accuracy:.2f}%')

if __name__ == "__main__":

main()

Output:
Train set: 91
Test set: 59
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa


> predicted=Iris-setosa, actual=Iris-setosa


> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-virginica, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-versicolor, actual=Iris-versicolor
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica


> predicted=Iris-virginica, actual=Iris-virginica


> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-virginica, actual=Iris-virginica
> predicted=Iris-versicolor, actual=Iris-virginica
Accuracy: 96.61%
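
For comparison, the same experiment takes only a few lines with scikit-learn (a hedged sketch; it loads iris from sklearn rather than the local iris.data file):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
for pred, actual in zip(knn.predict(X_te), y_te):
    print('correct' if pred == actual else 'WRONG', '-> predicted:', pred, 'actual:', actual)
print('Accuracy:', knn.score(X_te, y_te))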


Experiment 9:
Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment
and draw graphs.

Program:
import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

def kernel(point, X, k):

m, n = np.shape(X)

weights = np.eye(m)

for j in range(m):

diff = point - X[j]

weights[j, j] = np.exp(diff @ diff.T / (-2.0 * k**2))

    return weights  # return a plain ndarray; np.matrix is deprecated

def localWeight(point, X, y, k):

wei = kernel(point, X, k)

W = np.linalg.inv(X.T @ wei @ X) @ (X.T @ wei @ y)

return W

def localWeightRegression(X, y, k):

m, n = np.shape(X)

ypred = np.zeros(m)


for i in range(m):

        ypred[i] = (X[i] @ localWeight(X[i], X, y, k)).item()  # extract the scalar prediction

return ypred

# Load dataset

data = pd.read_csv('data10.csv')

bill = np.array(data.total_bill)

tip = np.array(data.tip)

# Preparing dataset and adding a column of ones

mbill = np.asarray(bill).reshape(-1, 1)

mtip = np.asarray(tip).reshape(-1, 1)

m = np.shape(mbill)[0]

one = np.ones((m, 1))

X = np.hstack((one, mbill))

# Set bandwidth parameter

k=2

ypred = localWeightRegression(X, mtip, k)

# Sort data for plotting

SortIndex = X[:, 1].argsort()

xsort = X[SortIndex][:, 1].flatten()

ypred_sorted = ypred[SortIndex]

# Plot results

plt.scatter(bill, tip, color='blue', label='Actual')

plt.plot(xsort, ypred_sorted, color='red', linewidth=2, label='Predicted')

plt.xlabel('Total Bill')

plt.ylabel('Tip')

plt.legend()

plt.show()

Output: (scatter of total_bill vs. tip with the locally weighted regression curve overlaid)
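
The bandwidth k controls how local the fit is; a hedged sweep that overlays several fits for comparison (assumes the arrays from the program above are in scope; the k values are illustrative):

for k_try in (0.5, 2, 10):
    ypred_k = localWeightRegression(X, mtip, k_try)
    plt.plot(xsort, ypred_k[SortIndex], label=f'k={k_try}')
plt.scatter(bill, tip, alpha=0.3)
plt.legend()
plt.show()   # smaller k follows the points closely; larger k approaches a straight line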


Experiment 10:
Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/API
can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.

Program:
import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB

from sklearn import metrics

import numpy as np

# Load dataset

try:

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])

print('The dimensions of the dataset:', msg.shape)

except FileNotFoundError:

print("Error: Dataset file 'naivetext1.csv' not found. Please ensure the file is in
the correct directory.")

exit()

# Map labels to numerical values and drop missing values

msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

msg = msg.dropna()


# Ensure labelnum is of correct type

msg['labelnum'] = msg['labelnum'].astype(int)

# Splitting the dataset into training and testing sets

X = msg.message

y = msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=42)

print(f'Train set size: {xtrain.shape[0]}')

print(f'Test set size: {xtest.shape[0]}')

# Convert text data into numerical vectors

count_vect = CountVectorizer()

xtrain_dtm = count_vect.fit_transform(xtrain)

xtest_dtm = count_vect.transform(xtest)

print('Feature names:', count_vect.get_feature_names_out())

# Train Naive Bayes classifier

clf = MultinomialNB()

clf.fit(xtrain_dtm, ytrain)

# Make predictions

predicted = clf.predict(xtest_dtm)

# Ensure ytest has no NaN values and correct type


ytest = ytest.dropna().astype(int).to_numpy()

# Validate test set before calculating metrics

if len(ytest) == 0 or len(predicted) == 0:

print("Error: Test set is empty. Adjust test_size parameter in


train_test_split.")

else:

# Calculate and print accuracy metrics

print('Accuracy of the classifier:', metrics.accuracy_score(ytest, predicted))

print('Confusion Matrix:')

print(metrics.confusion_matrix(ytest, predicted))

print('Recall:', metrics.recall_score(ytest, predicted, zero_division=1))

print('Precision:', metrics.precision_score(ytest, predicted, zero_division=1))

# Test new sample predictions

docs_new = ['I like this place', 'My boss is not my saviour']

X_new_counts = count_vect.transform(docs_new)

predictednew = clf.predict(X_new_counts)

for doc, category in zip(docs_new, predictednew):

label = 'pos' if category == 1 else 'neg'

print(f'{doc} -> {label}')

Output:
The dimensions of the dataset: (18, 2)


Train set size: 14


Test set size: 4
Feature names: ['about' 'am' 'an' 'and' 'awesome' 'bad' 'beers' 'best' 'boss' 'can'
'dance' 'deal' 'do' 'enemy' 'feel' 'fun' 'good' 'great' 'have' 'holiday'
'horrible' 'house' 'is' 'juice' 'like' 'locality' 'love' 'my' 'not' 'of'
'place' 'sick' 'stay' 'stuff' 'taste' 'that' 'the' 'these' 'this' 'tired'
'to' 'today' 'tomorrow' 'very' 'view' 'we' 'went' 'what' 'will' 'with'
'work']
Accuracy of the classifier: 1.0
Confusion Matrix:
[[2 0]
[0 2]]
Recall: 1.0
Precision: 1.0
I like this place -> neg
My boss is not my saviour -> neg
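
The program expects naivetext1.csv to hold one message and its pos/neg label per row, with no header. Two rows in that format for illustration (made-up examples built from words in the vocabulary printed above):

I love this place,pos
I am tired of this stuff,neg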


Experiment 11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data
set for clustering using k-Means algorithm. Compare the results of these
two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.

Program:
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs

from sklearn.mixture import GaussianMixture

from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score, adjusted_rand_score

from matplotlib.patches import Ellipse

# Generate synthetic data

X, y_true = make_blobs(n_samples=100, centers=4, cluster_std=0.60, random_state=0)

X = X[:, ::-1] # Flip axes for better plotting

# Function to draw an ellipse

def draw_ellipse(position, covariance, ax=None, **kwargs):

ax = ax or plt.gca()

if covariance.shape == (2, 2):

U, s, Vt = np.linalg.svd(covariance)


angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))

width, height = 2 * np.sqrt(s)

else:

angle = 0

width, height = 2 * np.sqrt(covariance)

for nsig in range(1, 4):

        ax.add_patch(Ellipse(xy=position, width=nsig * width, height=nsig * height, angle=angle, **kwargs))

# Function to plot GMM results

def plot_gmm(gmm, X, label=True, ax=None):

ax = ax or plt.gca()

labels = gmm.fit(X).predict(X)

if label:

ax.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis', zorder=2)

else:

ax.scatter(X[:, 0], X[:, 1], s=40, zorder=2)

ax.axis('equal')

w_factor = 0.2 / gmm.weights_.max()

for pos, covar, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):

draw_ellipse(pos, covar, alpha=w * w_factor)

# Apply k-Means algorithm

kmeans = KMeans(n_clusters=4, random_state=42)


kmeans_labels = kmeans.fit_predict(X)

kmeans_silhouette = silhouette_score(X, kmeans_labels)

kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)

# Apply EM algorithm (Gaussian Mixture Model)

gmm = GaussianMixture(n_components=4, random_state=42)

gmm_labels = gmm.fit_predict(X)

gmm_silhouette = silhouette_score(X, gmm_labels)

gmm_ari = adjusted_rand_score(y_true, gmm_labels)

# Print results

print('k-Means Clustering:')

print('Silhouette Score:', kmeans_silhouette)

print('Adjusted Rand Index:', kmeans_ari)

print('\nEM Algorithm (GMM) Clustering:')

print('Silhouette Score:', gmm_silhouette)

print('Adjusted Rand Index:', gmm_ari)

# Visualize clustering results

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)

plt.title('k-Means Clustering')

plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, s=40, cmap='viridis')

plt.subplot(1, 2, 2)

plt.title('EM Algorithm (GMM) Clustering')

plot_gmm(GaussianMixture(n_components=4, random_state=42), X)

plt.show()

Output:
k-Means Clustering:
Silhouette Score: 0.6486437837860929
Adjusted Rand Index: 0.9472597722581074

EM Algorithm (GMM) Clustering:


Silhouette Score: 0.6476255767693837
Adjusted Rand Index: 0.92264724951374
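
The program demonstrates the comparison on synthetic blobs; to run it on an actual Heart Disease dataset as the experiment statement asks, only the data-loading step changes, along these lines (a hedged sketch; heart.csv and its column names are assumptions about the standard UCI file):

# Hypothetical adaptation: cluster a real Heart Disease dataset instead of blobs
heart = pd.read_csv('heart.csv')              # assumed local copy of the UCI heart data
X = heart[['age', 'chol']].to_numpy()         # assumed numeric column names
y_true = heart['target'].to_numpy()           # assumed 0/1 disease label
# ...then reuse the KMeans / GaussianMixture code above unchanged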


Experiment 12:
Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Program:
import pandas as pd

import matplotlib.pyplot as plt

# Check Pandas version

print('Pandas version is', pd.__version__)

# Load dataset

data = pd.read_csv(r"tae.csv", header=None)

col_names = ['native_speaker', 'instructor', 'course', 'semester', 'class_size', 'score']

data.columns = col_names

# Convert target variable to categorical

print('Data type of target variable is:', data['score'].dtype)

data['score'] = data['score'].astype('category')

print('After conversion, data type of target variable is:', data['score'].dtype)


# Dataset information

print('Dimensions of the dataset:', data.shape)

print('The first 5 rows of the dataset:')

print(data.head())

print('The last 5 rows of the dataset:')

print(data.tail())

print('Randomly selected 5 rows of the dataset:')

print(data.sample(5))

print('Columns of the dataset:', data.columns.tolist())

print('Names and data types of attributes:')

print(data.dtypes)

# Convert 'native_speaker' to categorical

data['native_speaker'] = data['native_speaker'].astype('category')

print('After conversion, Names and data types of attributes:')

print(data.dtypes)

# Dataset information and statistics

print('Information of the dataset attributes:')

print(data.info())

print('Statistics of numerical attributes:')

print(data.describe())

print('Statistics of all attributes:')



print(data.describe(include='all'))

# Correlation matrix

print('Correlation matrix of numerical attributes:')

corr = data.corr(numeric_only=True)

print(corr)

# Distribution of the target variable

print('Distribution of target variable:')

print(data['score'].value_counts())

# Target class distribution w.r.t 'native_speaker'

print(pd.crosstab(data['native_speaker'], data['score']))

print(pd.crosstab(data['native_speaker'], data['score'], normalize='index'))

print('Target class distribution using groupby:')

print(data.groupby('native_speaker')['score'].value_counts())

# Check and handle missing values

print('Checking for null values:')

print(data.isnull().sum())

data.dropna(subset=['instructor'], inplace=True)

print('After removing rows with null values in column "instructor":')

print(data.isnull().sum())

# Unique values in 'score'

print('Unique values in column "score":', data['score'].unique())



# Visualization

plt.figure(figsize=(12, 6))

plt.scatter(data['semester'], data['class_size'], color='red')

plt.xlabel('Semester')

plt.ylabel('Class Size')

plt.title('Scatter Plot: Semester vs Class Size')

plt.show()

# Bar plot: Number of distinct courses per semester

data.groupby('semester')['course'].nunique().plot(kind='bar', title='Number of Distinct Courses per Semester')

plt.show()

# Histogram: Frequency of values in 'semester'

data['semester'].plot(kind='hist', title='Frequency of Semester Values')

plt.show()

# Line and bar plot

ax = data.plot(kind='bar', x='semester', y='course', color='red', title='Bar Plot: Semester vs Course')

data.plot(kind='line', x='semester', y='class_size', ax=ax, title='Line Plot: Semester vs Class Size')

plt.show()

Output:
Pandas version is 2.2.3
Data type of target variable is: int64
After conversion, data type of target variable is: category
Dimensions of the dataset: (151, 6)


The first 5 rows of the dataset:


native_speaker instructor course semester class_size score
0 1 23 3 1 19 3
1 2 15 3 1 17 3
2 1 23 3 2 49 3
3 1 5 2 2 33 3
4 2 7 11 2 55 3
The last 5 rows of the dataset:
native_speaker instructor course semester class_size score
146 2 3 2 2 26 1
147 2 10 3 2 12 1
148 1 18 7 2 48 1
149 2 22 1 2 51 1
150 2 2 10 2 27 1
Randomly selected 5 rows of the dataset:
native_speaker instructor course semester class_size score
82 2 13 3 1 11 3
38 2 14 15 2 38 1
31 2 18 5 2 19 1
149 2 22 1 2 51 1
112 1 14 15 2 32 1
Columns of the dataset: ['native_speaker', 'instructor', 'course', 'semester',
'class_size', 'score']
Names and data types of attributes:
native_speaker int64
instructor int64
course int64
semester int64
class_size int64
score category
dtype: object
After conversion, Names and data types of attributes:
native_speaker category
instructor int64
course int64
semester int64
class_size int64
score category
dtype: object
Information of the dataset attributes:
<class 'pandas.core.frame.DataFrame'>


RangeIndex: 151 entries, 0 to 150


Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 native_speaker 151 non-null category
1 instructor 151 non-null int64
2 course 151 non-null int64
3 semester 151 non-null int64
4 class_size 151 non-null int64
5 score 151 non-null category
dtypes: category(2), int64(4)
memory usage: 5.4 KB
None
Statistics of numerical attributes:
instructor course semester class_size
count 151.000000 151.000000 151.000000 151.000000
mean 13.642384 8.105960 1.847682 27.867550
std 6.825779 7.023914 0.360525 12.893758
min 1.000000 1.000000 1.000000 3.000000
25% 8.000000 3.000000 2.000000 19.000000
50% 13.000000 4.000000 2.000000 27.000000
75% 20.000000 15.000000 2.000000 37.000000
max 25.000000 26.000000 2.000000 66.000000



score 1 2 3
native_speaker
1 5 6 18
2 44 44 34
score 1 2 3
native_speaker
1 0.172414 0.206897 0.620690
2 0.360656 0.360656 0.278689
Target class distribution using groupby:
native_speaker score
1 3 18
2 6
1 5
2 1 44
2 44
3 34
Name: count, dtype: int64
Checking for null values:
native_speaker 0
instructor 0
course 0
semester 0
class_size 0
score 0
dtype: int64
After removing rows with null values in column "instructor":
native_speaker 0
instructor 0
course 0
semester 0
class_size 0
score 0
dtype: int64
Unique values in column "score": [3, 2, 1]
Categories (3, int64): [1, 2, 3]

(Output continues with the plots produced by the visualization code: the semester vs. class-size scatter plot, the bar chart of distinct courses per semester, the histogram of semester values, and the combined bar/line plot.)

Experiment 13:
Write a Python program to construct a Bayesian network considering
medical data. Use this model to demonstrate the diagnosis of heart patients
using standard Heart Disease Data Set.

Program:
import bayespy as bp

import numpy as np

import csv

from colorama import init, Fore, Back, Style

init()

# Define Parameter Enum values

# Age

ageEnum = {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}

# Gender

genderEnum = {'Male': 0, 'Female': 1}

# FamilyHistory

familyHistoryEnum = {'Yes': 0, 'No': 1}

# Diet (Calorie Intake)

dietEnum = {'High': 0, 'Medium': 1, 'Low': 2}

# LifeStyle

lifeStyleEnum = {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedentary': 3}

# Cholesterol


cholesterolEnum = {'High': 0, 'BorderLine': 1, 'Normal': 2}

# HeartDisease

heartDiseaseEnum = {'Yes': 0, 'No': 1}

# Load heart disease data from CSV

with open('heart_disease_data.csv') as csvfile:

lines = csv.reader(csvfile)

dataset = list(lines)

data = []

for x in dataset:

        data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]],
                     lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])

# Training data for machine learning

data = np.array(data)

N = len(data)

# Input data column assignment

p_age = bp.nodes.Dirichlet(1.0 * np.ones(5))

age = bp.nodes.Categorical(p_age, plates=(N,))

age.observe(data[:, 0])


p_gender = bp.nodes.Dirichlet(1.0 * np.ones(2))

gender = bp.nodes.Categorical(p_gender, plates=(N,))

gender.observe(data[:, 1])

p_familyhistory = bp.nodes.Dirichlet(1.0 * np.ones(2))

familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))

familyhistory.observe(data[:, 2])

p_diet = bp.nodes.Dirichlet(1.0 * np.ones(3))

diet = bp.nodes.Categorical(p_diet, plates=(N,))

diet.observe(data[:, 3])

p_lifestyle = bp.nodes.Dirichlet(1.0 * np.ones(4))

lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))

lifestyle.observe(data[:, 4])

p_cholesterol = bp.nodes.Dirichlet(1.0 * np.ones(3))

cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))

cholesterol.observe(data[:, 5])

# Prepare nodes and establish edges

p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))

heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
                                     bp.nodes.Categorical, p_heartdisease)

heartdisease.observe(data[:, 6])

# Update the network

p_heartdisease.update()

# Interactive Test

m=0

while m == 0:

print("\n")

    input_age = int(input('Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): '))

input_gender = int(input('Enter Gender (0-Male, 1-Female): '))

input_familyhistory = int(input('Enter FamilyHistory (0-Yes, 1-No): '))

input_diet = int(input('Enter Diet (0-High, 1-Medium, 2-Low): '))

    input_lifestyle = int(input('Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): '))

    input_cholesterol = int(input('Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): '))

    res = bp.nodes.MultiMixture([input_age, input_gender, input_familyhistory, input_diet,
                                 input_lifestyle, input_cholesterol],
                                bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]

print(Fore.RED + "Probability of Heart Disease = " + str(res))

print(Style.RESET_ALL)

m = int(input("Enter 0 to Continue or 1 to Exit: "))

Output:
Test case-0:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 0
Enter Gender (0-Male, 1-Female): 0
Enter FamilyHistory (0-Yes, 1-No): 1
Enter Diet (0-High, 1-Medium, 2-Low): 1
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 1
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Test case-1:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 0
Enter Gender (0-Male, 1-Female): 0
Enter FamilyHistory (0-Yes, 1-No): 3
Enter Diet (0-High, 1-Medium, 2-Low): 2
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 1
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Test case-2:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 2

Enter Gender (0-Male, 1-Female): 0


Enter FamilyHistory (0-Yes, 1-No): 0
Enter Diet (0-High, 1-Medium, 2-Low): 0
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 2
Probability of Heart Disease = 0.5


Enter 0 to Continue or 1 to Exit: 1

Test case-3:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 3

Enter Gender (0-Male, 1-Female): 1


Enter FamilyHistory (0-Yes, 1-No): 1
Enter Diet (0-High, 1-Medium, 2-Low): 1
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 2
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Test case-4:

Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 4


Enter Gender (0-Male, 1-Female): 1
Enter FamilyHistory (0-Yes, 1-No): 0
Enter Diet (0-High, 1-Medium, 2-Low): 2
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 3
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 0
Probability of Heart Disease = 0.5

Enter 0 to Continue or 1 to Exit: 1

Experiment 14:
Write a program to implement Support Vector Machines and Principal Component Analysis.

Program:
import pandas as pd

import seaborn as sns

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, recall_score, precision_score, f1_score

from sklearn.svm import SVC

from sklearn.preprocessing import StandardScaler

# Load the dataset

data = pd.read_excel(r"D:\ML Lab\haberman.csv.xlsx", header=None)

# Add column names

col_names = ['age', 'year', 'pos_axil_nodes', 'survival_status']

data.columns = col_names

# Drop the year column

data = data.drop(['year'], axis=1)

# Check for missing data

print('Missing values in the dataset:')

print(data.isnull().sum())

# Display basic information

print('The first 5 rows of the data set are:')

print(data.head())

print('Dimensions of the data set:', data.shape)

print('Statistics of the data are:')

print(data.describe())

print('Correlation matrix of the data set:')

print(data.corr())

# Display class labels

class_labels = data['survival_status'].unique().astype(str).tolist()

print('Class labels are:')

print(class_labels)

# Plot class distribution

sns.countplot(data['survival_status'])
plt.title("Class Distribution")

plt.show()

# Separate features and target variable

x_set = data.drop(['survival_status'], axis=1)

y_set = data['survival_status']

# Display features and target variable

print('Feature names are:')

print(list(x_set.columns))

print('First 5 rows of the feature set are:')

print(x_set.head())

print('First 5 rows of the target variable are:')

print(y_set.head())

# Check target variable distribution

print('Distribution of Target variable is:')

print(y_set.value_counts())

# Standardize the feature set

scaler = StandardScaler()

x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3, random_state=42)

x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Train the SVM model

model = SVC()

print("Training the model with train dataset")

model.fit(x_train, y_train)

# Make predictions

y_pred = model.predict(x_test)

print('Predicted class labels for test data are:')

print(y_pred)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Precision:", precision_score(y_test, y_pred, average='binary'))

print("Recall:", recall_score(y_test, y_pred, average='binary'))

print("F1 Score:", f1_score(y_test, y_pred, average='binary'))

print(classification_report(y_test, y_pred, target_names=class_labels))

# Confusion Matrix

cm = confusion_matrix(y_test, y_pred)

df_cm = pd.DataFrame(cm, columns=class_labels, index=class_labels)

df_cm.index.name = 'Actual'

df_cm.columns.name = 'Predicted'

sns.heatmap(df_cm, annot=True, cmap="Blues", fmt='d')

plt.title("Confusion Matrix")
plt.show()

# Scatter plot of training data

plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=plt.cm.Paired)

plt.xlabel('Age')

plt.ylabel('Positive Axillary Nodes')

plt.title('Data points in Training Dataset')

plt.show()

# Decision boundary visualization (only works for 2D data)

plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=plt.cm.Paired)  # re-plot the points so the axis limits are meaningful

ax = plt.gca()

xlim = ax.get_xlim()

ylim = ax.get_ylim()

xx = np.linspace(xlim[0], xlim[1], 30)

yy = np.linspace(ylim[0], ylim[1], 30)

YY, XX = np.meshgrid(yy, xx)

xy = np.vstack([XX.ravel(), YY.ravel()]).T

Z = model.decision_function(xy).reshape(XX.shape)

ax.contour(XX, YY, Z, colors='red', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])

ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=30, facecolors='green')

plt.title('Support Vectors and Decision Boundary')

plt.show()
Output: (printed statistics and metrics, plus the class-distribution, confusion-matrix, training-scatter, and decision-boundary plots)
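
The program above covers the SVM half of the experiment; a hedged sketch of the PCA half on the same standardized features (scikit-learn's PCA; reducing the two features to one component is purely illustrative):

from sklearn.decomposition import PCA

pca = PCA(n_components=1)                     # project the 2 features onto 1 component
x_train_pca = pca.fit_transform(x_train)
x_test_pca = pca.transform(x_test)
print('Explained variance ratio:', pca.explained_variance_ratio_)

# The SVM can be retrained on the reduced features exactly as above:
model_pca = SVC().fit(x_train_pca, y_train)
print('Accuracy on PCA features:', accuracy_score(y_test, model_pca.predict(x_test_pca)))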
Experiment 15:
Write a program to implement Principal Component Analysis.

Program:
import numpy as nmp

import matplotlib.pyplot as mpltl

import pandas as pnd

DS = pnd.read_csv('Wine.csv')

# Now, we will distribute the dataset into two components "X" and "Y"

X = DS.iloc[: , 0:13].values

Y = DS.iloc[: , 13].values

from sklearn.model_selection import train_test_split as tts

X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2, random_state = 0)

from sklearn.preprocessing import StandardScaler as SS

SC = SS()

X_train = SC.fit_transform(X_train)

X_test = SC.transform(X_test)

from sklearn.decomposition import PCA

PCa = PCA(n_components=1)

X_train = PCa.fit_transform(X_train)

X_test = PCa.transform(X_test)

explained_variance = PCa.explained_variance_ratio_
from sklearn.linear_model import LogisticRegression as LR

classifier_1 = LR (random_state = 0)

classifier_1.fit(X_train, Y_train)
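
The program stops after fitting the classifier; a hedged completion that reports the captured variance and test accuracy (assumes the variables defined above are in scope):

from sklearn.metrics import accuracy_score

print('Explained variance ratio:', explained_variance)

Y_pred = classifier_1.predict(X_test)
print('Test accuracy with 1 principal component:', accuracy_score(Y_test, Y_pred))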

Output: (a LogisticRegression model fitted on the first principal component; see the evaluation sketch above for printed results)
