Machine Learning Lab Manual

The document outlines the implementation of various machine learning algorithms over ten weeks, including FIND-S, Candidate-Elimination, the ID3 decision tree, Backpropagation for neural networks, naïve Bayesian classification, Bayesian networks, EM and k-Means clustering, k-Nearest Neighbour, and Locally Weighted Regression. Each section provides a programmatic approach to the respective algorithm, along with a sample dataset and output. The algorithms are demonstrated mainly through Python code, showcasing their functionality and results.


WEEK-1

Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.

PROGRAM:
import csv

# Function to load a CSV file
def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    return dataset

# Define attributes
attributes = ['Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast']
print('Attributes =', attributes)
num_attributes = len(attributes)

# Load dataset
filename = "finds.csv"
dataset = loadCsv(filename)
print("Dataset:", dataset)

# Initialize the hypothesis
hypothesis = ['0'] * num_attributes
print("Initial Hypothesis:", hypothesis)

# Apply the Find-S algorithm
print("The Hypotheses are:")
for i in range(len(dataset)):
    target = dataset[i][-1]  # The target value (Yes/No)
    if target == 'Yes':  # Only consider positive examples
        for j in range(num_attributes):
            if hypothesis[j] == '0':
                hypothesis[j] = dataset[i][j]
            elif hypothesis[j] != dataset[i][j]:
                hypothesis[j] = '?'
        print(f"After example {i+1}:", hypothesis)

# Print the final hypothesis
print("Final Hypothesis:", hypothesis)

DATASET("finds.csv"):
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
OUTPUT:
Attributes = ['Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast']
Dataset: [['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'], ['Sunny', 'Warm', 'High',
'Strong', 'Warm', 'Same', 'Yes'], ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']]
Initial Hypothesis: ['0', '0', '0', '0', '0', '0']
The Hypotheses are:
After example 1: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
After example 2: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
After example 4: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
Final Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
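
A minimal follow-up sketch (not part of the original program) showing how the final hypothesis could be used to label a new, hypothetical example; a '?' in the hypothesis matches any attribute value:

# Hypothetical new example, in the same attribute order as the training data
new_example = ['Sunny', 'Warm', 'High', 'Weak', 'Warm', 'Same']

def matches(hypothesis, example):
    # Positive only if every non-'?' attribute of the hypothesis equals the example's value
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

print("Prediction:", "Yes" if matches(hypothesis, new_example) else "No")
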
WEEK-2
For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

PROGRAM:
import numpy as np
import pandas as pd

# Load the dataset
# Note: read_csv treats the first line of finds1.csv as column names,
# so that record is not used as a training example.
data = pd.read_csv('finds1.csv')

# Concepts and target variables
concepts = np.array(data.iloc[:, 0:-1])  # All columns except the last
target = np.array(data.iloc[:, -1])      # Last column is the target variable

def learn(concepts, target):
    # Initialization of specific_h and general_h
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":  # Positive example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
        if target[i] == "No":  # Negative example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(f"Step {i+1} of Candidate Elimination Algorithm")
        print(f"Specific_h after example {i+1}: {specific_h}")
        print(f"General_h after example {i+1}: {general_h}")
    # Remove irrelevant generalizations (rows of ['?', '?', '?', '?', '?'] in general_h)
    general_h = [g for g in general_h if g != ['?' for _ in range(len(specific_h))]]
    return specific_h, general_h

# Call the learn function
s_final, g_final = learn(concepts, target)

# Output the final hypothesis
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

DATASET('finds1.csv'):
Sunny,Warm,Normal,Strong,Warm,Yes
Sunny,Warm,High,Strong,Warm,Yes
Rainy,Cold,High,Strong,Warm,No
Sunny,Warm,High,Strong,Cool,Yes

OUTPUT:
Initialization of specific_h and general_h
['Sunny' 'Warm' 'High' 'Strong' 'Warm']
[['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?']]
Step 1 of Candidate Elimination Algorithm
Specific_h after example 1: ['Sunny' 'Warm' 'High' 'Strong' 'Warm']
General_h after example 1: [['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?']]
Step 2 of Candidate Elimination Algorithm
Specific_h after example 2: ['Sunny' 'Warm' 'High' 'Strong' 'Warm']
General_h after example 2: [['Sunny', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?'], ['?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]
Step 3 of Candidate Elimination Algorithm
Specific_h after example 3: ['Sunny' 'Warm' 'High' 'Strong' '?']
General_h after example 3: [['Sunny', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?'], ['?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]
Final Specific_h:
['Sunny' 'Warm' 'High' 'Strong' '?']
Final General_h:
[['Sunny', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?']]
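
A small sketch (not in the original program) showing how the learned version-space boundaries could be applied to a new, hypothetical example: a hypothesis covers an example when every non-'?' attribute matches.

# Hypothetical new example with the same five attributes as finds1.csv
new_example = ['Sunny', 'Warm', 'Normal', 'Strong', 'Cool']

def covers(hypothesis, example):
    # True if every non-'?' attribute of the hypothesis matches the example
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

print("Covered by Final Specific_h:", covers(s_final, new_example))
print("Covered by each Final General_h:", [covers(g, new_example) for g in g_final])
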
WEEK-3
Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.

PROGRAM:
import pandas as pd
import numpy as np

# Load the dataset
dataset = pd.read_csv('playtennis.csv',
                      names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

# Function to calculate entropy
def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

# Function to calculate information gain
def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    # Weighted entropy
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

# ID3 algorithm function
def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # If all target values are the same, return that value
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # If the dataset is empty, return the mode target feature value from the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # If the feature space is empty, return the parent node's class
    elif len(features) == 0:
        return parent_node_class
    else:
        # Set the default value for the parent node class
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # Calculate information gain for each feature
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        # Select the feature with the maximum information gain
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        # Create the tree structure
        tree = {best_feature: {}}
        # Remove the best feature from the feature list
        features = [i for i in features if i != best_feature]
        # Grow a branch for each value of the best feature
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

# Generate the decision tree
tree = ID3(dataset, dataset, dataset.columns[:-1])
print('Decision Tree:\n', tree)

DATASET('playtennis.csv'):
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rainy,Mild,High,Weak,Yes
Rainy,Cool,Normal,Weak,Yes
Rainy,Cool,Normal,Strong,No
Overcast,Cool,Normal,Strong,Yes
Sunny,Mild,High,Weak,No
Sunny,Cool,Normal,Weak,Yes
Rainy,Mild,Normal,Weak,Yes
Sunny,Mild,Normal,Strong,Yes
Overcast,Mild,High,Strong,Yes
Overcast,Hot,Normal,Weak,Yes
Rainy,Mild,High,Strong,No

OUTPUT:
Decision Tree:
{'outlook': {'Overcast': 'Yes', 'Rainy': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny':
{'humidity': {'High': 'No', 'Normal': 'Yes'}}}}
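
To apply the tree to a new sample, as the exercise asks, a small helper (not part of the original listing) can walk the nested dictionary; the sample values below are hypothetical and must use attribute values seen during training:

def classify(sample, tree):
    # Walk the nested dictionary until a leaf label ('Yes'/'No') is reached
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))
    return classify(sample, tree[attribute][sample[attribute]])

new_sample = {'outlook': 'Sunny', 'temperature': 'Cool', 'humidity': 'High', 'wind': 'Strong'}
print('Predicted class:', classify(new_sample, tree))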

WEEK-4
Build an Artificial Neural Network by implementing the Backpropagation algorithm
and test the same using appropriate data sets.
PROGRAM:
import numpy as np

# Input data
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)

# Normalizing data
X = X / np.amax(X, axis=0)  # Normalize X
y = y / 100                 # Normalize y to be in range [0, 1]

# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000          # Number of training iterations
lr = 0.1              # Learning rate
input_neurons = 2     # Number of features in the dataset
hidden_neurons = 3    # Number of hidden layer neurons
output_neurons = 1    # Number of output layer neurons

# Weight and bias initialization
wh = np.random.uniform(size=(input_neurons, hidden_neurons))     # Weights for the hidden layer
bh = np.random.uniform(size=(1, hidden_neurons))                 # Bias for the hidden layer
wout = np.random.uniform(size=(hidden_neurons, output_neurons))  # Weights for the output layer
bout = np.random.uniform(size=(1, output_neurons))               # Bias for the output layer

# Training the neural network
for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output                               # Error at the output
    outgrad = derivatives_sigmoid(output)         # Derivative of output
    d_output = EO * outgrad                       # Delta for output layer
    EH = np.dot(d_output, wout.T)                 # Error at the hidden layer
    hiddengrad = derivatives_sigmoid(hlayer_act)  # Derivative of hidden layer
    d_hiddenlayer = EH * hiddengrad               # Delta for hidden layer

    # Update weights and biases
    wout += np.dot(hlayer_act.T, d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += np.dot(X.T, d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

# Output the results
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(output))

OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89576732]
[0.87939157]
[0.89348727]]
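
A brief sketch (not in the original program) of how the trained weights could be used on a new, hypothetical input; the raw values are scaled with the same column maxima ([3, 9]) used to normalize the training X:

# Hypothetical new input with the same two features as X
X_new = np.array([[3, 8]], dtype=float) / np.array([3, 9], dtype=float)

h_new = sigmoid(np.dot(X_new, wh) + bh)              # Forward pass through the hidden layer
y_new = sigmoid(np.dot(h_new, wout) + bout)          # Output layer activation
print("Predicted output (rescaled):", y_new * 100)   # Undo the /100 scaling applied to y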

WEEK-5
Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test
data sets.

PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    """Load CSV file into a list of lists."""
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    """Split the dataset into training and test sets based on the split ratio."""
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    """Separate the dataset into classes."""
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    """Calculate the mean of a list of numbers."""
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    """Calculate the standard deviation of a list of numbers."""
    n = len(numbers)
    if n == 1:
        return 0  # Return 0 if there's only one number in the dataset (no variation)
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(n - 1)
    return math.sqrt(variance)

def summarize(dataset):
    """Summarize the dataset, providing mean and standard deviation for each attribute."""
    summaries = []
    for attribute in zip(*dataset):
        mean_value = mean(attribute)
        stdev_value = stdev(attribute)
        print(f"Attribute mean: {mean_value}, stdev: {stdev_value}")  # Debugging
        summaries.append((mean_value, stdev_value))
    del summaries[-1]  # Remove the class column summary
    return summaries

def summarizeByClass(dataset):
    """Summarize the dataset by class."""
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    """Calculate the Gaussian probability distribution function for x."""
    # Check for zero standard deviation and handle it
    if stdev == 0:
        return 1e-10  # Small probability value to avoid division by zero
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    """Calculate the class probabilities for a given input vector."""
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    """Predict the class label for a given input vector."""
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    """Get predictions for all instances in the test set."""
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    """Calculate the accuracy of the predictions."""
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    """Main function to load dataset, train model, and calculate accuracy."""
    filename = 'data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print(f'Split {len(dataset)} rows into train={len(trainingSet)} and test={len(testSet)} rows')
    # Prepare model
    summaries = summarizeByClass(trainingSet)
    # Test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print(f'Accuracy: {accuracy}%')

if __name__ == "__main__":
    main()

DATASET('data.csv'):
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
6.2,3.4,5.4,2.3,1
5.9,3.0,5.1,1.8,1
6.7,3.1,4.7,1.5,2

OUTPUT:
Split 5 rows into train=3 and test=2 rows
Attribute mean: 5.1, stdev: 0
Attribute mean: 3.5, stdev: 0
Attribute mean: 1.4, stdev: 0
Attribute mean: 0.2, stdev: 0
Attribute mean: 0.0, stdev: 0
Attribute mean: 5.9, stdev: 0
Attribute mean: 3.0, stdev: 0
Attribute mean: 5.1, stdev: 0
Attribute mean: 1.8, stdev: 0
Attribute mean: 1.0, stdev: 0
Attribute mean: 6.7, stdev: 0
Attribute mean: 3.1, stdev: 0
Attribute mean: 4.7, stdev: 0
Attribute mean: 1.5, stdev: 0
Attribute mean: 2.0, stdev: 0
Accuracy: 50.0%
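
Because the trained model is just the per-class mean/stdev summaries, a new (hypothetical) instance can be classified directly. A minimal sketch that rebuilds the summaries on the full sample file and predicts one unlabeled measurement; note that with only one or two rows per class in this toy dataset, the zero standard deviations and the reported accuracy are illustrative only:

# Rebuild summaries on the full dataset and classify one hypothetical instance
dataset = loadCsv('data.csv')
summaries = summarizeByClass(dataset)
new_instance = [5.0, 3.4, 1.5, 0.2]  # Hypothetical measurements, no class label
print("Predicted class:", predict(summaries, new_instance))
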
WEEK-6
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.

PROGRAM(JAVA):
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import java.util.Random;

public class NaiveBayesClassifier {

    public static void main(String[] args) {
        try {
            // Load dataset
            DataSource source = new DataSource("naivetext1.csv");
            Instances dataset = source.getDataSet();
            // Set class index to the last attribute
            if (dataset.classIndex() == -1) {
                dataset.setClassIndex(dataset.numAttributes() - 1);
            }
            // Randomize the dataset
            dataset.randomize(new Random(1));
            // 70% training, 30% testing split
            int trainSize = (int) Math.round(dataset.numInstances() * 0.7);
            int testSize = dataset.numInstances() - trainSize;
            Instances trainSet = new Instances(dataset, 0, trainSize);
            Instances testSet = new Instances(dataset, trainSize, testSize);
            // Build classifier
            NaiveBayes nb = new NaiveBayes();
            nb.buildClassifier(trainSet);
            // Evaluate
            Evaluation eval = new Evaluation(trainSet);
            eval.evaluateModel(nb, testSet);
            // Print results
            System.out.println("Evaluation Results (Train/Test Split):");
            System.out.println("Accuracy: " + eval.pctCorrect() + "%");
            System.out.println("Precision: " + eval.weightedPrecision());
            System.out.println("Recall: " + eval.weightedRecall());
            System.out.println("F-Measure: " + eval.weightedFMeasure());
            System.out.println(eval.toMatrixString("=== Confusion Matrix ==="));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

DATASET('naivetext1.csv'):
"I absolutely love this product! It's amazing.",pos
"Terrible experience, would never buy again.",neg
"Great quality and fast delivery. Very happy.",pos
"Not worth the price, very disappointed.",neg
"Highly recommend! Will buy again.",pos
"Awful customer service. I'm never coming back.",neg
"Fantastic! This is exactly what I was looking for.",pos
"Very poor quality, broke after one use.",neg
"Exceeded my expectations! Five stars.",pos
"Bad experience, will not buy from here again.",neg

COMMANDS TO RUN:
 To compile: javac -cp ".;C:\Program Files\Weka-3-8-6\weka.jar" NaiveBayesClassifier.java
 To run: java --add-opens java.base/java.lang=ALL-UNNAMED -cp ".;C:\Program Files\Weka-3-8-6\weka.jar" NaiveBayesClassifier

OUTPUT:
Evaluation Results (Train/Test Split):
Accuracy: 66.66666666666667%
Precision: NaN
Recall: 0.6666666666666666
F-Measure: NaN
=== Confusion Matrix ===
 a b   <-- classified as
 2 0 | a = neg
 1 0 | b = pos

(Note: Precision and F-Measure are NaN because the classifier assigned no test instances to the pos class on this split, so per-class precision for pos is 0/0 and the NaN propagates into the weighted averages.)

WEEK-7
Write a program to construct a Bayesian network considering medical data. Use this model
to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You
can use Java/Python ML library classes/API.

PROGRAM:
import numpy as np
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Define column names
names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
         'exang', 'oldpeak', 'slope', 'ca', 'thal', 'heartdisease']

# Load the dataset and replace '?' with NaN
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)

# Define the Bayesian network structure
model = BayesianNetwork([
    ('age', 'trestbps'),
    ('age', 'fbs'),
    ('sex', 'trestbps'),
    ('exang', 'trestbps'),
    ('trestbps', 'heartdisease'),
    ('fbs', 'heartdisease'),
    ('heartdisease', 'restecg'),
    ('heartdisease', 'thalach'),
    ('heartdisease', 'chol')
])

# Fit the model using the Maximum Likelihood Estimator
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Perform inference using Variable Elimination
heartDisease_infer = VariableElimination(model)

# Query the model for 'heartdisease' given age = 37 and sex = 0
q = heartDisease_infer.query(variables=['heartdisease'], evidence={'age': 37, 'sex': 0})

# Print the probability distribution for 'heartdisease'
print(q)

DATASET('heart.csv'):
63, 1, 3, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 6, 1
67, 1, 2, 160, 286, 0, 2, 108, 1, 1.5, 2, 3, 3, 1
67, 1, 3, 120, 229, 0, 0, 129, 1, 2.6, 2, 2, 7, 1
37, 1, 1, 130, 250, 0, 1, 187, 0, 3.5, 1, 0, 3, 0
41, 0, 1, 130, 204, 0, 1, 172, 0, 1.4, 1, 0, 3, 0
56, 1, 2, 140, 239, 0, 1, 178, 0, 0.8, 1, 0, 7, 1

OUTPUT:
INFO:pgmpy: Datatype (N=numerical, C=Categorical Unordered, O=Categorical Ordered)
inferred from data:
{'age': 'N', 'sex': 'N', 'cp': 'N', 'trestbps': 'N', 'chol': 'N', 'fbs': 'N', 'restecg': 'N', 'thalach': 'N',
'exang': 'N', 'oldpeak': 'N', 'slope': 'N', 'ca': 'N', 'thal': 'N', 'heartdisease': 'N'}
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.3000 |
+-----------------+---------------------+
| heartdisease(1) |              0.7000 |
+-----------------+---------------------+
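
A short follow-up sketch (not in the original listing) showing that the same inference object can answer other diagnostic queries; the evidence value here is hypothetical:

# Hypothetical query: distribution of heartdisease given a resting ECG reading of 1
q2 = heartDisease_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q2)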

WEEK-8
Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can use Java/Python ML library classes/API in the program.

PROGRAM:
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Read dataset from CSV file
df = pd.read_csv('kmeansdata.csv')
# Display the first few rows to verify the data
print(df.head())
# Visualize the dataset
plt.figure(figsize=(8, 6))
plt.scatter(df['Distance_Feature'], df['Speeding_Feature'], c='blue', s=50)
plt.title('Dataset: Distance vs Speeding')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()
# Convert to NumPy array for clustering
X = df[['Distance_Feature', 'Speeding_Feature']].values
# Expectation Maximization (Gaussian Mixture Model)
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
# Display EM results
print("\nEM predictions:", em_predictions)
print("Mean of clusters:\n", gmm.means_)
print("\nCovariances:\n", gmm.covariances_)
# Visualize the EM results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=em_predictions, cmap='viridis', s=50)
plt.title('Expectation Maximization (GMM) Clustering')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.show()
# KMeans Clustering
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
# Display KMeans results
print("KMeans cluster centers:\n", kmeans.cluster_centers_)
print("KMeans labels:\n", kmeans.labels_)
# Visualize the KMeans results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow', s=50)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black',
marker='X', s=100)
plt.title('KMeans Clustering')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.show()

DATASET('kmeansdata.csv'):
Distance_Feature,Speeding_Feature
37.454012,1.571459
95.071431,31.820521
73.199394,15.717799
59.865848,25.428535
15.601864,45.378324

OUTPUT:
Distance_Feature Speeding_Feature
0 37.454012 1.571459
1 95.071431 31.820521
2 73.199394 15.717799
3 59.865848 25.428535
4 15.601864 45.378324

EM predictions: [2 1 2 2 0]
Mean of clusters:
[[15.601864 45.378324 ]
[95.071431 31.820521 ]
[56.83975135 14.23926434]]

Covariances:
[[[1.00000000e-06 3.53409686e-27]
[3.53409686e-27 1.00000000e-06]]

[[1.00000000e-06 1.51461294e-26]
[1.51461294e-26 1.00000000e-06]]

[[2.17534021e+02 1.01207629e+02]
[1.01207629e+02 9.59530460e+01]]]
KMeans cluster centers:
[[56.83975133 14.23926433]
[15.601864 45.378324 ]
[95.071431 31.820521 ]]
KMeans labels:
[0 2 0 0 1]
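
To make the comparison asked for in the exercise more concrete, one option (a sketch, not in the original program) is to compute scikit-learn's adjusted Rand index between the two label sets and the silhouette score of each clustering; with only five sample rows these numbers are illustrative only:

from sklearn.metrics import adjusted_rand_score, silhouette_score

# Agreement between the two clusterings (1.0 means identical up to label permutation)
print("Adjusted Rand index (EM vs k-Means):", adjusted_rand_score(em_predictions, kmeans.labels_))

# Cohesion/separation of each clustering (higher is better, range -1 to 1)
print("Silhouette (EM):     ", silhouette_score(X, em_predictions))
print("Silhouette (k-Means):", silhouette_score(X, kmeans.labels_))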

WEEK-9
Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be
used for this problem.

PROGRAM:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import pandas as pd
# Load dataset
dataset = pd.read_csv("iris.csv")
# Assuming 'Species' is the target variable and other columns are features
X = dataset.drop(columns=["Species"])
y = dataset["Species"]
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
# Create the k-NN classifier (distance-weighted voting)
classifier = KNeighborsClassifier(n_neighbors=8, p=3, metric='euclidean', weights='distance')
# Train the classifier
classifier.fit(X_train, y_train)
# Predict the results
y_pred = classifier.predict(X_test)
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix is as follows\n", cm)
# Accuracy Metrics with zero_division=1 to handle warnings
print("Accuracy Metrics:")
print(classification_report(y_test, y_pred, zero_division=1))
# Accuracy score
print("Correct Predictions:", accuracy_score(y_test, y_pred))
print("Wrong Predictions:", (1 - accuracy_score(y_test, y_pred)))

DATASET("iris.csv"):
SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4.0,1.3,versicolor
6.5,2.8,4.6,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
6.5,3.0,5.8,2.2,virginica

OUTPUT:
Confusion matrix is as follows
[[2 0 0]
[0 1 0]
[0 0 1]]
Accuracy Metrics:
precision recall f1-score support

setosa 1.00 1.00 1.00 2


versicolor 1.00 1.00 1.00 1
virginica 1.00 1.00 1.00 1

accuracy 1.00 4
macro avg 1.00 1.00 1.00 4
weighted avg 1.00 1.00 1.00 4

Correct Predictions: 1.0
Wrong Predictions: 0.0
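
The exercise also asks to print the correct and wrong predictions themselves. A small addition (not in the original listing) that lists each test sample with its actual and predicted label:

# List each test prediction and whether it was correct or wrong
for features, actual, predicted in zip(X_test.values, y_test, y_pred):
    status = "CORRECT" if actual == predicted else "WRONG"
    print(f"{status}: sample={list(features)} actual={actual} predicted={predicted}")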

WEEK-10
Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.
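
For reference, the fit computed for each query point x in the program below is the standard locally weighted least-squares solution with a Gaussian kernel of bandwidth k (this matches the kernel and localWeight functions, apart from the small regularization term added for numerical stability):

w_j(x) = \exp\!\left(-\frac{\lVert x - x_j \rVert^{2}}{2k^{2}}\right), \qquad
\hat{\beta}(x) = \left(X^{\top} W(x)\, X\right)^{-1} X^{\top} W(x)\, y, \qquad
\hat{y}(x) = x\,\hat{\beta}(x)

where W(x) is the diagonal matrix whose entries are the weights w_j(x).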

PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Kernel function to compute weights
def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.eye(m)  # Identity matrix
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp((diff @ diff.T).item() / (-2.0 * k**2))  # Use .item() to extract scalar
    return weights

# Local weight computation with regularization
def localWeight(point, xmat, ymat, k, regularization=1e-5):
    wei = kernel(point, xmat, k)
    # Regularization for numerical stability
    XTWX = xmat.T @ (wei @ xmat) + regularization * np.eye(xmat.shape[1])
    W = np.linalg.inv(XTWX) @ (xmat.T @ (wei @ ymat))
    return W

# Local regression predictions
def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = (xmat[i] @ localWeight(xmat[i], xmat, ymat, k)).item()  # Use .item() to extract scalar
    return ypred

# Load dataset
data = pd.read_csv('tips.csv')
bill = np.array(data['total_bill'])
tip = np.array(data['tip'])

# Prepare dataset for regression
mbill = np.asmatrix(bill).T          # Convert to column vector
mtip = np.asmatrix(tip).T            # Convert to column vector
m = np.shape(mbill)[0]
one = np.asmatrix(np.ones(m)).T      # Add ones column for intercept
X = np.hstack((one, mbill))          # Combine ones and bill

# Set kernel bandwidth
k = 0.2

# Perform regression
ypred = localWeightRegression(X, mtip, k)

# Sort for plotting
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

# Plot results
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')                              # Scatter plot of data
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)  # Fitted curve
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()

DATASET('tips.csv'):
total_bill,tip
16.99,1.01
10.34,1.66
21.01,3.50
23.68,3.31
24.59,3.61
25.29,4.71
8.77,2.00
26.88,3.53
15.04,1.96
14.78,3.00

OUTPUT:
