Machine Learning Lab Manual

The document outlines the implementation of various machine learning algorithms over ten weeks, including FIND-S, Candidate-Elimination, the ID3 decision tree, Backpropagation for neural networks, naïve Bayesian classification, Bayesian networks, EM and k-Means clustering, k-Nearest Neighbour, and Locally Weighted Regression. Each section provides a programmatic approach to the respective algorithm, along with a sample dataset and output. The algorithms are demonstrated mainly through Python code, showcasing their functionality and results.


WEEK-1

Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.

PROGRAM:
import csv

# Function to load a CSV file
def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    return dataset

# Define attributes
attributes = ['Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast']
print('Attributes =', attributes)
num_attributes = len(attributes)

# Load dataset
filename = "finds.csv"
dataset = loadCsv(filename)
print("Dataset:", dataset)

# Initialize the hypothesis
hypothesis = ['0'] * num_attributes
print("Initial Hypothesis:", hypothesis)

# Apply the Find-S algorithm
print("The Hypotheses are:")
for i in range(len(dataset)):
    target = dataset[i][-1]  # The target value (Yes/No)
    if target == 'Yes':  # Only consider positive examples
        for j in range(num_attributes):
            if hypothesis[j] == '0':
                hypothesis[j] = dataset[i][j]
            elif hypothesis[j] != dataset[i][j]:
                hypothesis[j] = '?'
        print(f"After example {i+1}:", hypothesis)

# Print the final hypothesis
print("Final Hypothesis:", hypothesis)

DATASET("finds.csv"):
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
OUTPUT:
Attributes = ['Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast']
Dataset: [['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'], ['Sunny', 'Warm', 'High',
'Strong', 'Warm', 'Same', 'Yes'], ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']]
Initial Hypothesis: ['0', '0', '0', '0', '0', '0']
The Hypotheses are:
After example 1: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
After example 2: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
After example 4: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
Final Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
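
A minimal follow-up sketch (not part of the original program) showing how the final hypothesis could be used to label a new, hypothetical example; a '?' in the hypothesis matches any attribute value:

# Hypothetical new example, in the same attribute order as the training data
new_example = ['Sunny', 'Warm', 'High', 'Weak', 'Warm', 'Same']

def matches(hypothesis, example):
    # Positive only if every non-'?' attribute of the hypothesis equals the example's value
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

print("Prediction:", "Yes" if matches(hypothesis, new_example) else "No")
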
WEEK-2
For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

PROGRAM:
import numpy as np
import pandas as pd

# Load the dataset
# Note: read_csv treats the first line of finds1.csv as column names,
# so that record is not used as a training example.
data = pd.read_csv('finds1.csv')

# Concepts and target variables
concepts = np.array(data.iloc[:, 0:-1])  # All columns except the last
target = np.array(data.iloc[:, -1])      # Last column is the target variable

def learn(concepts, target):
    # Initialization of specific_h and general_h
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":  # Positive example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
        if target[i] == "No":  # Negative example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(f"Step {i+1} of Candidate Elimination Algorithm")
        print(f"Specific_h after example {i+1}: {specific_h}")
        print(f"General_h after example {i+1}: {general_h}")
    # Remove irrelevant generalizations (rows of ['?', '?', '?', '?', '?'] in general_h)
    general_h = [g for g in general_h if g != ['?' for _ in range(len(specific_h))]]
    return specific_h, general_h

# Call the learn function
s_final, g_final = learn(concepts, target)

# Output the final hypothesis
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

DATASET('finds1.csv'):
Sunny,Warm,Normal,Strong,Warm,Yes
Sunny,Warm,High,Strong,Warm,Yes
Rainy,Cold,High,Strong,Warm,No
Sunny,Warm,High,Strong,Cool,Yes

OUTPUT:
Initialization of specific_h and general_h
['Sunny' 'Warm' 'High' 'Strong' 'Warm']
[['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?']]
Step 1 of Candidate Elimination Algorithm
Specific_h after example 1: ['Sunny' 'Warm' 'High' 'Strong' 'Warm']
General_h after example 1: [['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?']]
Step 2 of Candidate Elimination Algorithm
Specific_h after example 2: ['Sunny' 'Warm' 'High' 'Strong' 'Warm']
General_h after example 2: [['Sunny', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?'], ['?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]
Step 3 of Candidate Elimination Algorithm
Specific_h after example 3: ['Sunny' 'Warm' 'High' 'Strong' '?']
General_h after example 3: [['Sunny', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?'], ['?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]
Final Specific_h:
['Sunny' 'Warm' 'High' 'Strong' '?']
Final General_h:
[['Sunny', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?']]
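
A small sketch (not in the original program) showing how the learned version-space boundaries could be applied to a new, hypothetical example: a hypothesis covers an example when every non-'?' attribute matches.

# Hypothetical new example with the same five attributes as finds1.csv
new_example = ['Sunny', 'Warm', 'Normal', 'Strong', 'Cool']

def covers(hypothesis, example):
    # True if every non-'?' attribute of the hypothesis matches the example
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

print("Covered by Final Specific_h:", covers(s_final, new_example))
print("Covered by each Final General_h:", [covers(g, new_example) for g in g_final])
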
WEEK-3
Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.

PROGRAM:
import pandas as pd
import numpy as np

# Load the dataset
dataset = pd.read_csv('playtennis.csv',
                      names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

# Function to calculate entropy
def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

# Function to calculate information gain
def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    # Weighted entropy
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

# ID3 algorithm function
def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # If all target values are the same, return that value
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # If the dataset is empty, return the mode target feature value from the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # If the feature space is empty, return the parent node's class
    elif len(features) == 0:
        return parent_node_class
    else:
        # Set the default value for the parent node class
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # Calculate information gain for each feature
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        # Select the feature with the maximum information gain
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        # Create the tree structure
        tree = {best_feature: {}}
        # Remove the best feature from the feature list
        features = [i for i in features if i != best_feature]
        # Grow a branch for each value of the best feature
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

# Generate the decision tree
tree = ID3(dataset, dataset, dataset.columns[:-1])
print('Decision Tree:\n', tree)

DATASET('playtennis.csv'):
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rainy,Mild,High,Weak,Yes
Rainy,Cool,Normal,Weak,Yes
Rainy,Cool,Normal,Strong,No
Overcast,Cool,Normal,Strong,Yes
Sunny,Mild,High,Weak,No
Sunny,Cool,Normal,Weak,Yes
Rainy,Mild,Normal,Weak,Yes
Sunny,Mild,Normal,Strong,Yes
Overcast,Mild,High,Strong,Yes
Overcast,Hot,Normal,Weak,Yes
Rainy,Mild,High,Strong,No

OUTPUT:
Decision Tree:
{'outlook': {'Overcast': 'Yes', 'Rainy': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny':
{'humidity': {'High': 'No', 'Normal': 'Yes'}}}}
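
To apply the tree to a new sample, as the exercise asks, a small helper (not part of the original listing) can walk the nested dictionary; the sample values below are hypothetical and must use attribute values seen during training:

def classify(sample, tree):
    # Walk the nested dictionary until a leaf label ('Yes'/'No') is reached
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))
    return classify(sample, tree[attribute][sample[attribute]])

new_sample = {'outlook': 'Sunny', 'temperature': 'Cool', 'humidity': 'High', 'wind': 'Strong'}
print('Predicted class:', classify(new_sample, tree))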

WEEK-4
Build an Artificial Neural Network by implementing the Backpropagation algorithm
and test the same using appropriate data sets.
PROGRAM:
import numpy as np

# Input data
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)

# Normalizing data
X = X / np.amax(X, axis=0)  # Normalize X
y = y / 100                 # Normalize y to be in range [0, 1]

# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000          # Number of training iterations
lr = 0.1              # Learning rate
input_neurons = 2     # Number of features in the dataset
hidden_neurons = 3    # Number of hidden layer neurons
output_neurons = 1    # Number of output layer neurons

# Weight and bias initialization
wh = np.random.uniform(size=(input_neurons, hidden_neurons))     # Weights for the hidden layer
bh = np.random.uniform(size=(1, hidden_neurons))                 # Bias for the hidden layer
wout = np.random.uniform(size=(hidden_neurons, output_neurons))  # Weights for the output layer
bout = np.random.uniform(size=(1, output_neurons))               # Bias for the output layer

# Training the neural network
for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output                               # Error at the output
    outgrad = derivatives_sigmoid(output)         # Derivative of output
    d_output = EO * outgrad                       # Delta for output layer
    EH = np.dot(d_output, wout.T)                 # Error at the hidden layer
    hiddengrad = derivatives_sigmoid(hlayer_act)  # Derivative of hidden layer
    d_hiddenlayer = EH * hiddengrad               # Delta for hidden layer

    # Update weights and biases
    wout += np.dot(hlayer_act.T, d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += np.dot(X.T, d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

# Output the results
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(output))

OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89576732]
[0.87939157]
[0.89348727]]
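
A brief sketch (not in the original program) of how the trained weights could be used on a new, hypothetical input; the raw values are scaled with the same column maxima ([3, 9]) used to normalize the training X:

# Hypothetical new input with the same two features as X
X_new = np.array([[3, 8]], dtype=float) / np.array([3, 9], dtype=float)

h_new = sigmoid(np.dot(X_new, wh) + bh)              # Forward pass through the hidden layer
y_new = sigmoid(np.dot(h_new, wout) + bout)          # Output layer activation
print("Predicted output (rescaled):", y_new * 100)   # Undo the /100 scaling applied to y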

WEEK-5
Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test
data sets.

PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    """Load CSV file into a list of lists."""
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    """Split the dataset into training and test sets based on the split ratio."""
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    """Separate the dataset into classes."""
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    """Calculate the mean of a list of numbers."""
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    """Calculate the standard deviation of a list of numbers."""
    n = len(numbers)
    if n == 1:
        return 0  # Return 0 if there's only one number in the dataset (no variation)
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(n - 1)
    return math.sqrt(variance)

def summarize(dataset):
    """Summarize the dataset, providing mean and standard deviation for each attribute."""
    summaries = []
    for attribute in zip(*dataset):
        mean_value = mean(attribute)
        stdev_value = stdev(attribute)
        print(f"Attribute mean: {mean_value}, stdev: {stdev_value}")  # Debugging
        summaries.append((mean_value, stdev_value))
    del summaries[-1]  # Remove the class column summary
    return summaries

def summarizeByClass(dataset):
    """Summarize the dataset by class."""
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    """Calculate the Gaussian probability distribution function for x."""
    # Check for zero standard deviation and handle it
    if stdev == 0:
        return 1e-10  # Small probability value to avoid division by zero
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    """Calculate the class probabilities for a given input vector."""
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    """Predict the class label for a given input vector."""
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    """Get predictions for all instances in the test set."""
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    """Calculate the accuracy of the predictions."""
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    """Main function to load dataset, train model, and calculate accuracy."""
    filename = 'data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print(f'Split {len(dataset)} rows into train={len(trainingSet)} and test={len(testSet)} rows')
    # Prepare model
    summaries = summarizeByClass(trainingSet)
    # Test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print(f'Accuracy: {accuracy}%')

if __name__ == "__main__":
    main()

DATASET('data.csv'):
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
6.2,3.4,5.4,2.3,1
5.9,3.0,5.1,1.8,1
6.7,3.1,4.7,1.5,2

OUTPUT:
Split 5 rows into train=3 and test=2 rows
Attribute mean: 5.1, stdev: 0
Attribute mean: 3.5, stdev: 0
Attribute mean: 1.4, stdev: 0
Attribute mean: 0.2, stdev: 0
Attribute mean: 0.0, stdev: 0
Attribute mean: 5.9, stdev: 0
Attribute mean: 3.0, stdev: 0
Attribute mean: 5.1, stdev: 0
Attribute mean: 1.8, stdev: 0
Attribute mean: 1.0, stdev: 0
Attribute mean: 6.7, stdev: 0
Attribute mean: 3.1, stdev: 0
Attribute mean: 4.7, stdev: 0
Attribute mean: 1.5, stdev: 0
Attribute mean: 2.0, stdev: 0
Accuracy: 50.0%
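
Because the trained model is just the per-class mean/stdev summaries, a new (hypothetical) instance can be classified directly. A minimal sketch that rebuilds the summaries on the full sample file and predicts one unlabeled measurement; note that with only one or two rows per class in this toy dataset, the zero standard deviations and the reported accuracy are illustrative only:

# Rebuild summaries on the full dataset and classify one hypothetical instance
dataset = loadCsv('data.csv')
summaries = summarizeByClass(dataset)
new_instance = [5.0, 3.4, 1.5, 0.2]  # Hypothetical measurements, no class label
print("Predicted class:", predict(summaries, new_instance))
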
WEEK-6
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.

PROGRAM(JAVA):
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import java.util.Random;

public class NaiveBayesClassifier {

    public static void main(String[] args) {
        try {
            // Load dataset
            DataSource source = new DataSource("naivetext1.csv");
            Instances dataset = source.getDataSet();
            // Set class index to the last attribute
            if (dataset.classIndex() == -1) {
                dataset.setClassIndex(dataset.numAttributes() - 1);
            }
            // Randomize the dataset
            dataset.randomize(new Random(1));
            // 70% training, 30% testing split
            int trainSize = (int) Math.round(dataset.numInstances() * 0.7);
            int testSize = dataset.numInstances() - trainSize;
            Instances trainSet = new Instances(dataset, 0, trainSize);
            Instances testSet = new Instances(dataset, trainSize, testSize);
            // Build classifier
            NaiveBayes nb = new NaiveBayes();
            nb.buildClassifier(trainSet);
            // Evaluate
            Evaluation eval = new Evaluation(trainSet);
            eval.evaluateModel(nb, testSet);
            // Print results
            System.out.println("Evaluation Results (Train/Test Split):");
            System.out.println("Accuracy: " + eval.pctCorrect() + "%");
            System.out.println("Precision: " + eval.weightedPrecision());
            System.out.println("Recall: " + eval.weightedRecall());
            System.out.println("F-Measure: " + eval.weightedFMeasure());
            System.out.println(eval.toMatrixString("=== Confusion Matrix ==="));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

DATASET('naivetext1.csv'):
"I absolutely love this product! It's amazing.",pos
"Terrible experience, would never buy again.",neg
"Great quality and fast delivery. Very happy.",pos
"Not worth the price, very disappointed.",neg
"Highly recommend! Will buy again.",pos
"Awful customer service. I'm never coming back.",neg
"Fantastic! This is exactly what I was looking for.",pos
"Very poor quality, broke after one use.",neg
"Exceeded my expectations! Five stars.",pos
"Bad experience, will not buy from here again.",neg

COMMANDS TO RUN:
 To compile: javac -cp ".;C:\Program Files\Weka-3-8-6\weka.jar" NaiveBayesClassifier.java
 To run: java --add-opens java.base/java.lang=ALL-UNNAMED -cp ".;C:\Program Files\Weka-3-8-6\weka.jar" NaiveBayesClassifier

OUTPUT:
Evaluation Results (Train/Test Split):
Accuracy: 66.66666666666667%
Precision: NaN
Recall: 0.6666666666666666
F-Measure: NaN
=== Confusion Matrix ===
 a b   <-- classified as
 2 0 | a = neg
 1 0 | b = pos

(Note: Precision and F-Measure are NaN because the classifier assigned no test instances to the pos class on this split, so per-class precision for pos is 0/0 and the NaN propagates into the weighted averages.)

WEEK-7
Write a program to construct a Bayesian network considering medical data. Use this model
to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You
can use Java/Python ML library classes/API.

PROGRAM:
import numpy as np
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Define column names
names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
         'exang', 'oldpeak', 'slope', 'ca', 'thal', 'heartdisease']

# Load the dataset and replace '?' with NaN
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)

# Define the Bayesian network structure
model = BayesianNetwork([
    ('age', 'trestbps'),
    ('age', 'fbs'),
    ('sex', 'trestbps'),
    ('exang', 'trestbps'),
    ('trestbps', 'heartdisease'),
    ('fbs', 'heartdisease'),
    ('heartdisease', 'restecg'),
    ('heartdisease', 'thalach'),
    ('heartdisease', 'chol')
])

# Fit the model using the Maximum Likelihood Estimator
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Perform inference using Variable Elimination
heartDisease_infer = VariableElimination(model)

# Query the model for 'heartdisease' given age = 37 and sex = 0
q = heartDisease_infer.query(variables=['heartdisease'], evidence={'age': 37, 'sex': 0})

# Print the probability distribution for 'heartdisease'
print(q)

DATASET('heart.csv'):
63, 1, 3, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 6, 1
67, 1, 2, 160, 286, 0, 2, 108, 1, 1.5, 2, 3, 3, 1
67, 1, 3, 120, 229, 0, 0, 129, 1, 2.6, 2, 2, 7, 1
37, 1, 1, 130, 250, 0, 1, 187, 0, 3.5, 1, 0, 3, 0
41, 0, 1, 130, 204, 0, 1, 172, 0, 1.4, 1, 0, 3, 0
56, 1, 2, 140, 239, 0, 1, 178, 0, 0.8, 1, 0, 7, 1

OUTPUT:
INFO:pgmpy: Datatype (N=numerical, C=Categorical Unordered, O=Categorical Ordered)
inferred from data:
{'age': 'N', 'sex': 'N', 'cp': 'N', 'trestbps': 'N', 'chol': 'N', 'fbs': 'N', 'restecg': 'N', 'thalach': 'N',
'exang': 'N', 'oldpeak': 'N', 'slope': 'N', 'ca': 'N', 'thal': 'N', 'heartdisease': 'N'}
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.3000 |
+-----------------+---------------------+
| heartdisease(1) |              0.7000 |
+-----------------+---------------------+
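
A short follow-up sketch (not in the original listing) showing that the same inference object can answer other diagnostic queries; the evidence value here is hypothetical:

# Hypothetical query: distribution of heartdisease given a resting ECG reading of 1
q2 = heartDisease_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q2)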

WEEK-8
Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can use Java/Python ML library classes/API in the program.

PROGRAM:
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Read dataset from CSV file
df = pd.read_csv('kmeansdata.csv')
# Display the first few rows to verify the data
print(df.head())
# Visualize the dataset
plt.figure(figsize=(8, 6))
plt.scatter(df['Distance_Feature'], df['Speeding_Feature'], c='blue', s=50)
plt.title('Dataset: Distance vs Speeding')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()
# Convert to NumPy array for clustering
X = df[['Distance_Feature', 'Speeding_Feature']].values
# Expectation Maximization (Gaussian Mixture Model)
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
# Display EM results
print("\nEM predictions:", em_predictions)
print("Mean of clusters:\n", gmm.means_)
print("\nCovariances:\n", gmm.covariances_)
# Visualize the EM results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=em_predictions, cmap='viridis', s=50)
plt.title('Expectation Maximization (GMM) Clustering')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.show()
# KMeans Clustering
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
# Display KMeans results
print("KMeans cluster centers:\n", kmeans.cluster_centers_)
print("KMeans labels:\n", kmeans.labels_)
# Visualize the KMeans results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow', s=50)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black',
marker='X', s=100)
plt.title('KMeans Clustering')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.show()

DATASET('kmeansdata.csv'):
Distance_Feature,Speeding_Feature
37.454012,1.571459
95.071431,31.820521
73.199394,15.717799
59.865848,25.428535
15.601864,45.378324

OUTPUT:
Distance_Feature Speeding_Feature
0 37.454012 1.571459
1 95.071431 31.820521
2 73.199394 15.717799
3 59.865848 25.428535
4 15.601864 45.378324

EM predictions: [2 1 2 2 0]
Mean of clusters:
[[15.601864 45.378324 ]
[95.071431 31.820521 ]
[56.83975135 14.23926434]]

Covariances:
[[[1.00000000e-06 3.53409686e-27]
[3.53409686e-27 1.00000000e-06]]

[[1.00000000e-06 1.51461294e-26]
[1.51461294e-26 1.00000000e-06]]

[[2.17534021e+02 1.01207629e+02]
[1.01207629e+02 9.59530460e+01]]]
KMeans cluster centers:
[[56.83975133 14.23926433]
[15.601864 45.378324 ]
[95.071431 31.820521 ]]
KMeans labels:
[0 2 0 0 1]
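
To make the comparison asked for in the exercise more concrete, one option (a sketch, not in the original program) is to compute scikit-learn's adjusted Rand index between the two label sets and the silhouette score of each clustering; with only five sample rows these numbers are illustrative only:

from sklearn.metrics import adjusted_rand_score, silhouette_score

# Agreement between the two clusterings (1.0 means identical up to label permutation)
print("Adjusted Rand index (EM vs k-Means):", adjusted_rand_score(em_predictions, kmeans.labels_))

# Cohesion/separation of each clustering (higher is better, range -1 to 1)
print("Silhouette (EM):     ", silhouette_score(X, em_predictions))
print("Silhouette (k-Means):", silhouette_score(X, kmeans.labels_))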

WEEK-9
Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be
used for this problem.

PROGRAM:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import pandas as pd
# Load dataset
dataset = pd.read_csv("iris.csv")
# Assuming 'Species' is the target variable and other columns are features
X = dataset.drop(columns=["Species"])
y = dataset["Species"]
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
# Create the k-NN classifier (distance-weighted voting)
classifier = KNeighborsClassifier(n_neighbors=8, p=3, metric='euclidean', weights='distance')
# Train the classifier
classifier.fit(X_train, y_train)
# Predict the results
y_pred = classifier.predict(X_test)
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix is as follows\n", cm)
# Accuracy Metrics with zero_division=1 to handle warnings
print("Accuracy Metrics:")
print(classification_report(y_test, y_pred, zero_division=1))
# Accuracy score
print("Correct Predictions:", accuracy_score(y_test, y_pred))
print("Wrong Predictions:", (1 - accuracy_score(y_test, y_pred)))

DATASET("iris.csv"):
SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4.0,1.3,versicolor
6.5,2.8,4.6,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
6.5,3.0,5.8,2.2,virginica

OUTPUT:
Confusion matrix is as follows
[[2 0 0]
[0 1 0]
[0 0 1]]
Accuracy Metrics:
precision recall f1-score support

setosa 1.00 1.00 1.00 2


versicolor 1.00 1.00 1.00 1
virginica 1.00 1.00 1.00 1

accuracy 1.00 4
macro avg 1.00 1.00 1.00 4
weighted avg 1.00 1.00 1.00 4

Correct Predictions: 1.0
Wrong Predictions: 0.0
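
The exercise also asks to print the correct and wrong predictions themselves. A small addition (not in the original listing) that lists each test sample with its actual and predicted label:

# List each test prediction and whether it was correct or wrong
for features, actual, predicted in zip(X_test.values, y_test, y_pred):
    status = "CORRECT" if actual == predicted else "WRONG"
    print(f"{status}: sample={list(features)} actual={actual} predicted={predicted}")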

WEEK-10
Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.
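
For reference, the fit computed for each query point x in the program below is the standard locally weighted least-squares solution with a Gaussian kernel of bandwidth k (this matches the kernel and localWeight functions, apart from the small regularization term added for numerical stability):

w_j(x) = \exp\!\left(-\frac{\lVert x - x_j \rVert^{2}}{2k^{2}}\right), \qquad
\hat{\beta}(x) = \left(X^{\top} W(x)\, X\right)^{-1} X^{\top} W(x)\, y, \qquad
\hat{y}(x) = x\,\hat{\beta}(x)

where W(x) is the diagonal matrix whose entries are the weights w_j(x).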

PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Kernel function to compute weights
def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.eye(m)  # Identity matrix
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp((diff @ diff.T).item() / (-2.0 * k**2))  # Use .item() to extract scalar
    return weights

# Local weight computation with regularization
def localWeight(point, xmat, ymat, k, regularization=1e-5):
    wei = kernel(point, xmat, k)
    # Regularization for numerical stability
    XTWX = xmat.T @ (wei @ xmat) + regularization * np.eye(xmat.shape[1])
    W = np.linalg.inv(XTWX) @ (xmat.T @ (wei @ ymat))
    return W

# Local regression predictions
def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = (xmat[i] @ localWeight(xmat[i], xmat, ymat, k)).item()  # Use .item() to extract scalar
    return ypred

# Load dataset
data = pd.read_csv('tips.csv')
bill = np.array(data['total_bill'])
tip = np.array(data['tip'])

# Prepare dataset for regression
mbill = np.asmatrix(bill).T          # Convert to column vector
mtip = np.asmatrix(tip).T            # Convert to column vector
m = np.shape(mbill)[0]
one = np.asmatrix(np.ones(m)).T      # Add ones column for intercept
X = np.hstack((one, mbill))          # Combine ones and bill

# Set kernel bandwidth
k = 0.2

# Perform regression
ypred = localWeightRegression(X, mtip, k)

# Sort for plotting
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

# Plot results
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')                              # Scatter plot of data
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)  # Fitted curve
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()

DATASET('tips.csv'):
total_bill,tip
16.99,1.01
10.34,1.66
21.01,3.50
23.68,3.31
24.59,3.61
25.29,4.71
8.77,2.00
26.88,3.53
15.04,1.96
14.78,3.00

OUTPUT:
