ML Lab Manual
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
   Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
   training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
   appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Exercises to solve the real-world problems using the following machine learning methods:
   a) Linear Regression
   b) Logistic Regression
   c) Binary Classifier
5. Develop a program for Bias, Variance, Remove duplicates, Cross Validation.
8. Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both
   correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
   Select appropriate data set for your experiment and draw graphs.
10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to
    perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
    precision, and recall for your data set.
11. Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using
    k-Means algorithm. Compare the results of these two algorithms and comment on the quality of
    clustering. You can add Java/Python ML library classes/API in the program.
14. Write a program to implement Support Vector Machines and Principal Component Analysis.
BVC College of Engineering, Palacharla,
IIIrd B.Tech II Semester CSE A&B Lab Manual
Subject: Machine Learning using Python Lab(R2032054)
Experiment 1:
Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file.
Program:
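import csv
# Assumption: this copy of the manual does not preserve the file name;
# 'Tennies.csv' (the file read in Experiment 2) is used here as a placeholder.
with open('Tennies.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
# Start from the most specific hypothesis: '0' means "no value seen yet"
h = [['0', '0', '0', '0']]
for i in your_list:
    print(i)
    # FIND-S only looks at positive examples
    if i[-1] == "Yes":
        j = 0
        for x in i:
            if x != "Yes":
                if x != h[0][j] and h[0][j] == '0':
                    # First value seen for this attribute: copy it
                    h[0][j] = x
                elif x != h[0][j] and h[0][j] != '0':
                    # Conflicting value: generalize the attribute
                    h[0][j] = '?'
                else:
                    pass
            j = j + 1
print(h)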
Output:
['Outlook', 'Temperature', 'Humidity', 'Wind', 'Play Tennis']
['Sunny', 'Hot', 'High', 'Weak', 'No']
['Sunny', 'Hot', 'High', 'Strong', 'No']
['Overcast', 'Hot', 'High', 'Weak', 'Yes']
['Rain', 'Mild', 'High', 'Weak', 'Yes']
['Rain', 'Cool', 'Normal', 'Weak', 'Yes']
['Rain', 'Cool', 'Normal', 'Strong', 'No']
['Overcast', 'Cool', 'Normal', 'Strong', 'Yes']
['Sunny', 'Mild', 'High', 'Weak', 'No']
['Sunny', 'Cool', 'Normal', 'Weak', 'Yes']
['Rain', 'Mild', 'Normal', 'Weak', 'Yes']
['Sunny', 'Mild', 'Normal', 'Strong', 'Yes']
['Overcast', 'Mild', 'High', 'Strong', 'Yes']
['Overcast', 'Hot', 'Normal', 'Weak', 'Yes']
['Rain', 'Mild', 'High', 'Strong', 'No']
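The FIND-S idea can also be tried without a CSV file. The following is a minimal sketch, not the manual's program: the function name `find_s` and the four in-memory training rows are assumptions made for illustration.

```python
def find_s(examples, positive_label="Yes"):
    """Return the most specific hypothesis that covers every positive example."""
    hypothesis = None
    for *attributes, label in examples:
        if label != positive_label:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # the first positive example is copied as-is
            continue
        for j, value in enumerate(attributes):
            if hypothesis[j] != value:
                hypothesis[j] = "?"  # generalize an attribute that disagrees
    return hypothesis


# Hypothetical training rows (attributes followed by the class label)
training = [
    ("Sunny", "Warm", "Normal", "Strong", "Yes"),
    ("Sunny", "Warm", "High", "Strong", "Yes"),
    ("Rainy", "Cold", "High", "Strong", "No"),
    ("Sunny", "Warm", "High", "Strong", "Yes"),
]
print(find_s(training))
```

Only the third attribute differs between the positive rows, so it is the only one generalized to '?'.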
Experiment 2:
For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training
examples.
Program:
import pandas as pd
data = pd.read_csv('Tennies.csv')
features = list(data.columns[:-1])
# Initialize the most specific hypothesis (S0) and the most general hypothesis (G0)
more_general_parts = []
more_general_parts.append(mg)
return all(more_general_parts)
new_h = list(h)
return new_h
specializations = []
if val == '?':
specializations.append(specialization)
specializations.append(specialization)
return specializations
example = row.iloc[:-1]
label = row.iloc[-1]
if label == 'Yes':
S = [min_generalization(S, example)]
else:
G += min_specialization(G[0], example)
Output:
Final specific boundary (S): []
Final general boundary (G): [['?', '?', '?', '?'],
['Sunny', '?', '?', '?'],
['Overcast', '?', '?', '?'],
['?', 'Cool', '?', '?'],
['?', 'Hot', '?', '?'],
['?', '?', 'Normal', '?'],
['?', '?', '?', 'Weak']]
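For comparison, here is a compact sketch of Candidate-Elimination for conjunctive hypotheses. It is an illustration under stated assumptions, not the program above: the function names, the `domains` argument (the allowed values of each attribute) and the in-memory sample are hypothetical, and noisy or inconsistent data is not handled.

```python
def covers(h, x):
    """True if hypothesis h matches instance x ('?' matches any value)."""
    return all(a == "?" or a == b for a, b in zip(h, x))


def at_least_as_general(h1, h2):
    """True if h1 is at least as general as h2, attribute by attribute."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples, domains, positive="Yes"):
    S = None                     # specific boundary: covers nothing yet
    G = [["?"] * len(domains)]   # general boundary: covers everything
    for *x, label in examples:
        if label == positive:
            # Drop general hypotheses that miss the positive example,
            # then minimally generalize S so that it covers the example
            G = [g for g in G if covers(g, x)]
            S = list(x) if S is None else [s if s == v else "?" for s, v in zip(S, x)]
        else:
            candidates = []
            for g in G:
                if not covers(g, x):
                    candidates.append(g)  # already excludes the negative example
                    continue
                # Minimal specializations: fix one '?' to a value the example lacks
                for i, values in enumerate(domains):
                    if g[i] != "?":
                        continue
                    for v in values:
                        spec = g[:i] + [v] + g[i + 1:]
                        if v != x[i] and (S is None or at_least_as_general(spec, S)):
                            candidates.append(spec)
            # Keep each hypothesis once and drop any that a more general one subsumes
            unique = []
            for c in candidates:
                if c not in unique:
                    unique.append(c)
            G = [g for g in unique
                 if not any(h != g and at_least_as_general(h, g) for h in unique)]
    return S, G


# Hypothetical sample (six attributes followed by the class label)
domains = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
examples = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"),
    ("Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"),
    ("Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"),
]
S, G = candidate_elimination(examples, domains)
print("Final specific boundary (S):", S)
print("Final general boundary (G):", G)
```

Because the hypotheses are conjunctions, the specific boundary is kept as a single hypothesis rather than a list.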
Experiment 3:
Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and
apply this knowledge to classify a new sample.
Program code:
import csv
import numpy as np
import math
def read_data(filename):
headers = next(datareader)
metadata = []
traindata = []
metadata.append(name)
traindata.append(row)
class Node:
self.attribute = attribute
self.children = []
self.answer = ""
def __str__(self):
return self.attribute
dict = {}
for x in range(items.shape[0]):
for y in range(data.shape[0]):
count[x] += 1
for x in range(items.shape[0]):
for y in range(data.shape[0]):
for x in range(items.shape[0]):
dict[items[x]][pos[x], :] = data[y]
pos[x] += 1
if delete:
def entropy(S):
items = np.unique(S)
if items.size == 1:
return 0
counts = np.zeros((items.shape[0],))
sums = 0
for x in range(items.shape[0]):
return sums
total_size = data.shape[0]
entropies = np.zeros((items.shape[0],))
intrinsic = np.zeros((items.shape[0],))
for x in range(items.shape[0]):
iv = -1 * sum(intrinsic)
for x in range(entropies.shape[0]):
total_entropy -= entropies[x]
if iv == 0:
return 0
return total_entropy / iv
if (np.unique(data[:, -1])).shape[0] == 1:
node = Node("")
return node
split = np.argmax(gains)
node = Node(metadata[split])
for x in range(items.shape[0]):
node.children.append((items[x], child))
return node
def empty(size):
if node.answer != "":
print(empty(level), node.answer)
return
print(empty(level), node.attribute)
print_tree(n, level + 2)
print_tree(node, 0)
Output:
Outlook
  b'Overcast'
    b'Yes'
  b'Rain'
    Wind
      b'Strong'
        b'No'
      b'Weak'
        b'Yes'
  b'Sunny'
    Humidity
      b'High'
        b'No'
      b'Normal'
        b'Yes'
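The choice of Outlook as the root can be checked by hand with entropy and information gain. The sketch below is an illustration, not the manual's program: the helper names are assumptions, the rows are the fourteen Play Tennis records printed in Experiment 1, and it uses plain information gain, whereas the fragment above appears to divide by an intrinsic value (gain ratio).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


header = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
gains = [information_gain(rows, i) for i in range(len(header))]
best = max(range(len(header)), key=lambda i: gains[i])
for name, gain in zip(header, gains):
    print(f"{name}: {gain:.3f}")
print("Best attribute for the root:", header[best])
```

ID3 repeats the same computation on each subset of rows to pick the attribute for every child node.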
Experiment 4:
Exercises to solve the real-world problems using the following machine learning
methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier
Program:
a) Linear Regression
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('students_marks.csv')
# Prepare features and target
X = data[['Lab_Internal_Marks']]
y = data['External_Marks']
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict external marks
y_pred = model.predict(X_test)
# Visualize the results
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.title('Linear Regression - Lab Internal vs External Marks')
plt.xlabel('Lab Internal Marks')
plt.ylabel('External Marks')
plt.legend()
plt.show()
# Print the model's performance
print("Model Coefficient:", model.coef_)
print("Model Intercept:", model.intercept_)
Output:
b) Logistic Regression
Program:
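import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
data = pd.read_csv('students_marks.csv')
# Prepare features and target
X = data[['Lab_Internal_Marks']]
y = data['Pass/Fail']
# Split the data into train and test sets (split settings assumed to match part a)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Create and train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict pass/fail and report accuracy on the test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))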
Output:
c) Binary Classifier:
Program:
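import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load dataset
data = pd.read_csv('students_marks.csv')
# Prepare features and target
X = data[['Lab_Internal_Marks']]
y = data['Pass/Fail']
# Split the data into train and test sets (split settings assumed to match part a)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Create and train a linear support vector classifier
model = SVC(kernel='linear')
model.fit(X_train, y_train)
# Predict pass/fail and report accuracy on the test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))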
Output:
Experiment 5:
Develop a program for Bias, Variance, Remove duplicates, Cross Validation.
Program:
import pandas as pd
data = pd.read_csv(r"winequality-red.csv")
data = data.drop_duplicates()
dim = data.shape
print(data.head())
col_names = list(data.columns)
print(col_names)
feature_names = col_names[:-1]
Y_set = data['quality']
model = LinearRegression()
bias = []
variance = []
for k in k_list:
bias.append(mean(scores))
variance.append(stdev(scores))
plt.xlabel('k value')
plt.title('Bias-Variance Trade-off')
plt.legend(loc='best')
plt.show()
# Based on the graph, choose the best value for k (e.g., 85)
optimal_k = 85
bias = mean(scores)
variance = stdev(scores)
Output:
alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
5 9.4 5
Attribute names are:
['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH',
'sulphates', 'alcohol', 'quality']
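The cross-validation step can be illustrated without scikit-learn. The sketch below is an assumption-laden illustration rather than the manual's program: the helper names and the synthetic data are hypothetical, and it follows the fragment above only in summarizing the fold scores by their mean and standard deviation (which the fragment stores under the names bias and variance).

```python
from statistics import mean, stdev


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mse(xs, ys, k):
    """Fit a least-squares line on k-1 folds and return the held-out MSE of each fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in held_out]
        mx = mean(x for x, _ in train)
        my = mean(y for _, y in train)
        slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
        intercept = my - slope * mx
        scores.append(mean((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx))
    return scores


# Hypothetical data: a straight line with a small alternating disturbance
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
scores = cross_val_mse(xs, ys, k=5)
print("Mean of fold scores:", mean(scores))
print("Standard deviation of fold scores:", stdev(scores))
```

Repeating this for several values of k and plotting both summaries against k gives the kind of graph the program above draws.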
Experiment 6:
Write a program to implement Categorical Encoding, One-hot Encoding.
Program:
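import pandas as pd
from sklearn.preprocessing import OneHotEncoder
data = pd.read_csv('winequality-red.csv')
print("First 5 rows of the dataset:")
print(data.head())
# Columns of dtype object hold text categories and need encoding
categorical_columns = data.select_dtypes(include=['object']).columns
if len(categorical_columns) > 0:
    # sparse_output requires scikit-learn 1.2 or later
    encoder = OneHotEncoder(sparse_output=False)
    encoded_data = encoder.fit_transform(data[categorical_columns])
    encoded_df = pd.DataFrame(encoded_data,
        columns=encoder.get_feature_names_out(categorical_columns),
        index=data.index)
    # Replace the original categorical columns with their one-hot columns
    data = pd.concat([data.drop(columns=categorical_columns), encoded_df], axis=1)
print(data.head())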
Output:
First 5 rows of the dataset:
fixed acidity volatile acidity citric acid residual sugar chlorides \
0 7.4 0.70 0.00 1.9 0.076
1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.76 0.04 2.3 0.092
3 11.2 0.28 0.56 1.9 0.075
4 7.4 0.70 0.00 1.9 0.076
alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
4 9.4 5
alcohol quality
0 9.4 5
1 9.8 5
2 9.8 5
3 9.8 6
4 9.4 5
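The two printouts above look the same, which is what one would expect if the file contains no text columns. To see what the encoders do on categorical data, here is a small plain-Python sketch; the function names and the sample column are hypothetical and are not part of the manual's program.

```python
def label_encode(values):
    """Categorical (label) encoding: map each category to an integer code."""
    categories = sorted(set(values))
    codes = {category: i for i, category in enumerate(categories)}
    return categories, [codes[v] for v in values]


def one_hot_encode(values):
    """One-hot encoding: one 0/1 column per category, a single 1 per row."""
    categories, codes = label_encode(values)
    rows = []
    for code in codes:
        row = [0] * len(categories)
        row[code] = 1
        rows.append(row)
    return categories, rows


# Hypothetical categorical column
wine_type = ["red", "white", "rose", "red"]
print(label_encode(wine_type))
print(one_hot_encode(wine_type))
```

Label encoding implies an order between categories, while one-hot encoding does not, which is why the latter is usually preferred for unordered categories.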
Experiment 7:
Build an Artificial Neural Network by implementing the Back propagation
algorithm and test the same using appropriate data sets.
Program:
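import numpy as np
# Training data, chosen so that normalization reproduces the
# Input and Actual Output shown below
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
# Normalize data
X = X / np.amax(X, axis=0)
y = y / 100
# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Derivative of the sigmoid, written in terms of the sigmoid output
def derivatives_sigmoid(x):
    return x * (1 - x)
# Variable initialization
# Assumption: epoch, lr and hiddenlayer_neurons are not preserved in this copy
# of the manual; the values below are placeholders
epoch = 5000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))
# Training algorithm
for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # Update the weights of both layers
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
# Display results
print("Input:\n" + str(X))
print("Actual Output:\n" + str(y))
print("Predicted Output:\n" + str(output))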
Output:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89317944]
[0.88206035]
[0.89398854]]
Experiment 8:
Write a program to implement k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions.
Program:
import csv
import random
import math
import operator
import os
if not os.path.exists(filename):
lines = csv.reader(csvfile)
dataset = list(lines)
for y in range(4):
dataset[x][y] = float(dataset[x][y])
trainingSet.append(dataset[x])
else:
testSet.append(dataset[x])
distance = 0
for x in range(length):
return math.sqrt(distance)
distances = []
length = len(testInstance) - 1
for x in range(len(trainingSet)):
distances.append((trainingSet[x], dist))
distances.sort(key=operator.itemgetter(1))
neighbors = []
for x in range(k):
neighbors.append(distances[x][0])
return neighbors
def getResponse(neighbors):
classVotes = {}
for x in range(len(neighbors)):
response = neighbors[x][-1]
if response in classVotes:
classVotes[response] += 1
else:
classVotes[response] = 1
return sortedVotes[0][0]
correct = 0
for x in range(len(testSet)):
if testSet[x][-1] == predictions[x]:
correct += 1
def main():
# Prepare data
trainingSet = []
testSet = []
split = 0.67
try:
except FileNotFoundError as e:
print(e)
return
# Generate predictions
predictions = []
k=3
for x in range(len(testSet)):
result = getResponse(neighbors)
predictions.append(result)
print(f'Accuracy: {accuracy:.2f}%')
if __name__ == "__main__":
main()
Output:
Train set: 91
Test set: 59
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
> predicted=Iris-setosa, actual=Iris-setosa
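The core of the k-Nearest Neighbor classifier fits in a few lines. The following is a minimal sketch, assuming a handful of iris-style measurements typed in by hand instead of the full data file; the function names are hypothetical and differ from those in the program above.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train, features, k=3):
    """Majority vote among the k training rows closest to the query features."""
    neighbors = sorted(train, key=lambda row: euclidean(row[:-1], features))[:k]
    votes = Counter(row[-1] for row in neighbors)
    return votes.most_common(1)[0][0]


# Small hand-typed sample: four measurements followed by the class label
train = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (5.0, 3.4, 1.5, 0.2, "Iris-setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
    (6.4, 3.2, 4.5, 1.5, "Iris-versicolor"),
    (6.9, 3.1, 4.9, 1.5, "Iris-versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris-virginica"),
    (5.8, 2.7, 5.1, 1.9, "Iris-virginica"),
    (7.1, 3.0, 5.9, 2.1, "Iris-virginica"),
]
test = [
    (5.0, 3.6, 1.4, 0.2, "Iris-setosa"),
    (6.5, 2.8, 4.6, 1.5, "Iris-versicolor"),
    (6.5, 3.0, 5.8, 2.2, "Iris-virginica"),
]
predictions = [knn_predict(train, row[:-1], k=3) for row in test]
for row, predicted in zip(test, predictions):
    status = "correct" if predicted == row[-1] else "wrong"
    print(f"> predicted={predicted}, actual={row[-1]} ({status})")
correct = sum(p == row[-1] for p, row in zip(predictions, test))
print(f"Accuracy: {100.0 * correct / len(test):.2f}%")
```

Printing the status next to each prediction covers the requirement to show both correct and wrong predictions.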
Experiment 9:
Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment
and draw graphs.
Program:
import numpy as np
import pandas as pd
m, n = np.shape(X)
weights = np.eye(m)
for j in range(m):
return np.matrix(weights)
wei = kernel(point, X, k)
return W
m, n = np.shape(X)
ypred = np.zeros(m)
for i in range(m):
return ypred
# Load dataset
data = pd.read_csv('data10.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.asarray(bill).reshape(-1, 1)
mtip = np.asarray(tip).reshape(-1, 1)
m = np.shape(mbill)[0]
X = np.hstack((one, mbill))
k=2
ypred_sorted = ypred[SortIndex]
# Plot results
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.legend()
plt.show()
Output:
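Locally Weighted Regression fits a separate weighted least-squares line for every query point. The sketch below is an illustration under assumptions: it uses synthetic data rather than the tips file, the function name `lwr_predict` is hypothetical, and the bandwidth `tau` plays the role of the parameter k in the program above.

```python
import numpy as np


def lwr_predict(x_query, X, y, tau):
    """Predict y at x_query using Gaussian-weighted least squares around x_query."""
    # Training points close to the query get weights near 1, distant ones near 0
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return x_query @ theta


# Synthetic data: a sine curve, with a column of ones for the intercept
x = np.linspace(0, 10, 50)
X = np.column_stack((np.ones_like(x), x))
y = np.sin(x)
y_pred = np.array([lwr_predict(row, X, y, tau=0.5) for row in X])
print("Mean absolute error:", np.mean(np.abs(y_pred - y)))
```

A smaller `tau` follows the data more closely, while a larger one approaches an ordinary straight-line fit.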
Experiment 10:
Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/API
can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.
Program:
import pandas as pd
import numpy as np
# Load dataset
try:
except FileNotFoundError:
print("Error: Dataset file 'naivetext1.csv' not found. Please ensure the file is in the correct directory.")
exit()
msg = msg.dropna()
msg['labelnum'] = msg['labelnum'].astype(int)
X = msg.message
y = msg.labelnum
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
clf = MultinomialNB()
clf.fit(xtrain_dtm, ytrain)
# Make predictions
predicted = clf.predict(xtest_dtm)
ytest = ytest.dropna().astype(int).to_numpy()
if len(ytest) == 0 or len(predicted) == 0:
else:
print('Confusion Matrix:')
print(metrics.confusion_matrix(ytest, predicted))
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
Output:
The dimensions of the dataset: (1,2)
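What CountVectorizer and MultinomialNB do together can be sketched in plain Python. This is a minimal illustration, not the manual's program: the function names and the tiny review corpus are hypothetical, and the classifier is a multinomial naive Bayes with add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical corpus: 1 = positive review, 0 = negative review
train_docs = ["great food and great service", "loved the friendly staff",
              "amazing place great taste", "terrible food and rude staff",
              "awful service never again", "bad taste and awful smell"]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = ["great friendly service", "awful rude staff", "amazing food",
             "bad service", "food and staff"]
test_labels = [1, 0, 1, 0, 1]

model = train_nb(train_docs, train_labels)
predicted = [predict_nb(model, doc) for doc in test_docs]
tp = sum(p == 1 and a == 1 for p, a in zip(predicted, test_labels))
fp = sum(p == 1 and a == 0 for p, a in zip(predicted, test_labels))
fn = sum(p == 0 and a == 1 for p, a in zip(predicted, test_labels))
accuracy = sum(p == a for p, a in zip(predicted, test_labels)) / len(test_labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall)
```

The last test document uses only words that are slightly more frequent in the negative class, so it is misclassified and the recall drops below the precision.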
Experiment 11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data
set for clustering using k-Means algorithm. Compare the results of these
two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.
Program:
import numpy as np
import pandas as pd
ax = ax or plt.gca()
U, s, Vt = np.linalg.svd(covariance)
else:
angle = 0
ax = ax or plt.gca()
labels = gmm.fit(X).predict(X)
if label:
else:
ax.axis('equal')
kmeans_labels = kmeans.fit_predict(X)
gmm_labels = gmm.fit_predict(X)
# Print results
print('k-Means Clustering:')
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title('k-Means Clustering')
plt.subplot(1, 2, 2)
plot_gmm(GaussianMixture(n_components=4, random_state=42), X)
plt.show()
Output:
k-Means Clustering:
Silhouette Score: 0.6486437837860929
Adjusted Rand Index: 0.9472597722581074
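The difference between the two algorithms is easiest to see in one dimension: k-Means assigns every point to exactly one cluster, while EM for a Gaussian mixture gives each point a soft responsibility for every component. The sketch below is an illustration on hypothetical data with hypothetical function names; it is not the library code used above.

```python
import math


def kmeans_1d(xs, centers, iters=20):
    """Lloyd's algorithm in 1-D: assign to the nearest center, then recompute means."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, means, iters=50):
    """EM for a 1-D Gaussian mixture with one variance and one weight per component."""
    means = list(means)
    k = len(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate the parameters from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
            weights[j] = nj / len(xs)
    return means, variances, weights


# Hypothetical data: two well-separated groups
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
centers = kmeans_1d(xs, centers=[0.0, 6.0])
means, variances, weights = em_gmm_1d(xs, means=[0.0, 6.0])
print("k-Means centers:", centers)
print("EM means:", means, "variances:", variances, "weights:", weights)
```

On well-separated groups both methods find nearly the same centers; the mixture model additionally reports a spread and a weight for each cluster.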
Experiment 12:
Exploratory Data Analysis for Classification using Pandas or Matplotlib.
Program:
import pandas as pd
# Load dataset
data.columns = col_names
data['score'] = data['score'].astype('category')
# Dataset information
print(data.head())
print(data.tail())
print(data.sample(5))
print(data.dtypes)
data['native_speaker'] = data['native_speaker'].astype('category')
print(data.dtypes)
print(data.info())
print(data.describe())
print(data.describe(include='all'))
# Correlation matrix
corr = data.corr(numeric_only=True)
print(corr)
print(data['score'].value_counts())
print(pd.crosstab(data['native_speaker'], data['score']))
print(data.groupby('native_speaker')['score'].value_counts())
print(data.isnull().sum())
data.dropna(subset=['instructor'], inplace=True)
print(data.isnull().sum())
# Visualization
plt.figure(figsize=(12, 6))
plt.xlabel('Semester')
plt.ylabel('Class Size')
plt.show()
data.groupby('semester')['course'].nunique().plot(kind='bar', title='Number of Distinct Courses per Semester')
plt.show()
plt.show()
plt.show()
Output:
Pandas version is 2.2.3
Data type of target variable is: int64
After conversion, data type of target variable is: category
Dimensions of the dataset: (151, 6)
score            1   2   3
native_speaker
1                5   6  18
2               44  44  34
score                  1         2         3
native_speaker
1               0.172414  0.206897  0.620690
2               0.360656  0.360656  0.278689
Target class distribution using groupby:
native_speaker  score
1               3        18
                2         6
                1         5
2               1        44
                2        44
                3        34
Name: count, dtype: int64
Checking for null values:
native_speaker 0
instructor 0
course 0
semester 0
class_size 0
score 0
dtype: int64
After removing rows with null values in column "instructor":
native_speaker 0
instructor 0
course 0
semester 0
class_size 0
score 0
dtype: int64
Unique values in column "score": [3, 2, 1]
Categories (3, int64): [1, 2, 3]
Experiment 13:
Write a Python program to construct a Bayesian network considering
medical data. Use this model to demonstrate the diagnosis of heart patients
using standard Heart Disease Data Set.
Program:
import bayespy as bp
import numpy as np
import csv
init()
# Age
# Gender
# FamilyHistory
# LifeStyle
# Cholesterol
# HeartDisease
lines = csv.reader(csvfile)
dataset = list(lines)
data = []
for x in dataset:
lifeStyleEnum[x[4]], cholesterolEnum[x[5]],
heartDiseaseEnum[x[6]]])
data = np.array(data)
N = len(data)
age.observe(data[:, 0])
gender.observe(data[:, 1])
familyhistory.observe(data[:, 2])
diet.observe(data[:, 3])
lifestyle.observe(data[:, 4])
cholesterol.observe(data[:, 5])
bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:, 6])
p_heartdisease.update()
# Interactive Test
m=0
while m == 0:
print("\n")
res = bp.nodes.MultiMixture([
input_age,
input_gender,
input_familyhistory,
input_diet,
input_lifestyle,
input_cholesterol
], bp.nodes.Categorical,
p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
print(Style.RESET_ALL)
Output:
Test case-0:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 0
Enter Gender (0-Male, 1-Female): 0
Enter FamilyHistory (0-Yes, 1-No): 1
Enter Diet (0-High, 1-Medium, 2-Low): 1
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 1
Probability of Heart Disease = 0.5
Test case-1:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 0
Enter Gender (0-Male, 1-Female): 0
Enter FamilyHistory (0-Yes, 1-No): 3
Enter Diet (0-High, 1-Medium, 2-Low): 2
Enter LifeStyle (0-Athlete, 1-Active, 2-Moderate, 3-Sedentary): 2
Enter Cholesterol (0-High, 1-BorderLine, 2-Normal): 1
Probability of Heart Disease = 0.5
Test case-2:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 2
Test case-3:
Enter Age (0-SuperSeniorCitizen, 1-SeniorCitizen, 2-MiddleAged, 3-Youth, 4-Teen): 3
Test case-4:
Experiment 14:
Write a program to Implement Support Vector Machines and Principal
Component Analysis.
Program:
import pandas as pd
import numpy as np
data.columns = col_names
print(data.isnull().sum())
print(data.head())
print(data.describe())
print(data.corr())
print(class_labels)
sns.countplot(x=data['survival_status'])
plt.title("Class Distribution")
plt.show()
print(list(x_set.columns))
print(x_set.head())
print(y_set.head())
print(y_set.value_counts())
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
model = SVC()
model.fit(x_train, y_train)
# Make predictions
y_pred = model.predict(x_test)
print(y_pred)
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
plt.title("Confusion Matrix")
plt.show()
plt.xlabel('Age')
plt.show()
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = model.decision_function(xy).reshape(XX.shape)
plt.show()
Output:
Experiment 15:
Write a program to Implement Principal Component Analysis.
Program:
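import numpy as nmp
import pandas as pnd
from sklearn.model_selection import train_test_split as tts
from sklearn.preprocessing import StandardScaler as SS
from sklearn.decomposition import PCA
DS = pnd.read_csv('Wine.csv')
# Now, we will distribute the dataset into two components "X" and "Y"
X = DS.iloc[:, 0:13].values
Y = DS.iloc[:, 13].values
# Split into training and test sets
# Assumption: the split line is missing from this copy; test_size and random_state are placeholders
X_train, X_test, Y_train, Y_test = tts(X, Y, test_size=0.2, random_state=0)
# Standardize the features (fit on the training set only)
SC = SS()
X_train = SC.fit_transform(X_train)
X_test = SC.transform(X_test)
# Apply PCA
# Assumption: the number of components is not preserved in this copy; 2 is a placeholder
PCa = PCA(n_components=2)
X_train = PCa.fit_transform(X_train)
X_test = PCa.transform(X_test)
explained_variance = PCa.explained_variance_ratio_
# Fit Logistic Regression on the reduced features
from sklearn.linear_model import LogisticRegression as LR
classifier_1 = LR(random_state=0)
classifier_1.fit(X_train, Y_train)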
Output:
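To see what the PCA step computes, the projection can be reproduced with NumPy alone. The sketch below is an illustration under assumptions: the function name `pca` and the synthetic three-column data are hypothetical, and it uses the singular value decomposition of the centered data, which is one standard way to obtain the principal components.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance_ratio = (s ** 2) / np.sum(s ** 2)
    return X_centered @ components.T, components, explained_variance_ratio[:n_components]


# Synthetic data: the second column is almost a multiple of the first,
# the third column is low-variance noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack((t,
                     2 * t + rng.normal(scale=0.1, size=200),
                     rng.normal(scale=0.1, size=200)))
Z, components, ratio = pca(X, n_components=2)
print("Reduced shape:", Z.shape)
print("Explained variance ratio:", ratio)
```

Because two of the three columns are strongly correlated, nearly all of the variance ends up in the first component.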