
VISION INSTITUTE OF TECHNOLOGY, ALIGARH
Subject: Machine Learning Lab Manual
Faculty: Vikram Sharma

1. FIND-S Algorithm
Objective:

To find the most specific hypothesis that fits all positive training examples.

Theory:

FIND-S starts with the most specific hypothesis and generalizes it only when a positive
example contradicts it.

Algorithm:

1. Initialize h to the most specific hypothesis.
2. For each positive training example, generalize h at every attribute where it does not match the example.
3. Ignore negative examples.
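As a quick illustration using the sample dataset given below (finds.csv), the hypothesis evolves like this:

h = ('0', '0', '0', '0', '0', '0')                 initially (most specific)
h = (Sunny, Warm, Normal, Strong, Warm, Same)      after the 1st positive example
h = (Sunny, Warm, ?, Strong, Warm, Same)           after the 2nd positive example (Humidity differs)
h = (Sunny, Warm, ?, Strong, ?, ?)                 after the 3rd positive example (Water and Forecast differ)

The single negative example (Rainy, Cold, ...) is skipped and does not change h.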

Python Code:
import csv

def load_csv(filename):
    # Read the CSV file and return the header row and the data rows separately.
    with open(filename, 'r') as f:
        rows = list(csv.reader(f))
    header = rows[0]
    data = rows[1:]
    return header, data

def find_s_algorithm(data):
    # Start from the most specific hypothesis and generalize it
    # only where a positive example disagrees with it.
    hypothesis = ['0'] * (len(data[0]) - 1)
    for row in data:
        if row[-1] == 'Yes':                    # positive examples only
            for i in range(len(hypothesis)):
                if hypothesis[i] == '0':
                    hypothesis[i] = row[i]      # first positive example: copy the value
                elif hypothesis[i] != row[i]:
                    hypothesis[i] = '?'         # conflicting value: generalize
    return hypothesis

header, data = load_csv('finds.csv')
hypothesis = find_s_algorithm(data)
print("Final Hypothesis:", hypothesis)
Sample Dataset (finds.csv)
Sky,Temp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
Output:
Final Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']


Result:

The FIND-S algorithm correctly finds the most specific hypothesis consistent with the training data.

2. Candidate-Elimination Algorithm
Objective:

To find the version space: all hypotheses consistent with the training data.

Theory:

The algorithm maintains two boundary sets, S (most specific) and G (most general), and updates them after each training example: positive examples generalize S and prune G, while negative examples specialize G.
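On the finds.csv data from Experiment 1, for example, the two leading positive examples first generalize S to (Sunny, Warm, ?, Strong, Warm, Same). The negative example (Rainy, Cold, High, Strong, Warm, Change) then specializes the all-'?' hypothesis in G into (Sunny, ?, ?, ?, ?, ?), (?, Warm, ?, ?, ?, ?) and (?, ?, ?, ?, ?, Same); the last of these is dropped when the final positive example (whose Forecast is Change) arrives, and S generalizes further to (Sunny, Warm, ?, Strong, ?, ?).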

Python Code:

import csv

def load_csv(filename):
    with open(filename) as f:
        reader = csv.reader(f)
        data = list(reader)[1:]        # skip the header row
    return data

def consistent(hypothesis, instance):
    # A hypothesis covers an instance if every attribute is '?' or matches exactly.
    return all(h == '?' or h == i for h, i in zip(hypothesis, instance))

def candidate_elimination(data):
    n_attributes = len(data[0]) - 1
    S = ['0'] * n_attributes           # most specific boundary
    G = [['?'] * n_attributes]         # most general boundary

    for instance in data:
        x = instance[:-1]
        label = instance[-1]

        if label == 'Yes':
            # Positive example: generalize S and drop hypotheses in G that reject it.
            for i in range(n_attributes):
                if S[i] == '0':
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = '?'
            G = [g for g in G if consistent(g, x)]
        else:
            # Negative example: specialize G just enough to exclude it,
            # using the specific values in S as the admissible specializations.
            G_new = []
            for g in G:
                for i in range(n_attributes):
                    if g[i] == '?' and S[i] != '?' and S[i] != x[i]:
                        new_hypothesis = g[:]
                        new_hypothesis[i] = S[i]
                        if new_hypothesis not in G_new:
                            G_new.append(new_hypothesis)
            G = G_new


    return S, G

data = load_csv("finds.csv")
S_final, G_final = candidate_elimination(data)
print("S Final:", S_final)
print("G Final:", G_final)
Output:
S Final: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
G Final: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Result:

Candidate-Elimination returns the version space bounded by the most general (G) and most specific (S) hypotheses consistent with the training data.

3. Objective:

To demonstrate the working of the decision tree-based ID3 algorithm, build a decision tree from a dataset, and classify a new sample.

📘 Theory:

 ID3 (Iterative Dichotomiser 3) constructs a decision tree using information gain.
 It selects the attribute that best splits the data at each node.
 The tree is grown recursively and stops when all data is classified or attributes are exhausted.

📄 Sample Dataset (id3.csv)


Outlook,Temperature,Humidity,Wind,PlayTennis
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rain,Mild,High,Weak,Yes
Rain,Cool,Normal,Weak,Yes
Rain,Cool,Normal,Strong,No
Overcast,Cool,Normal,Strong,Yes
Sunny,Mild,High,Weak,No
Sunny,Cool,Normal,Weak,Yes
Rain,Mild,Normal,Weak,Yes
Sunny,Mild,Normal,Strong,Yes
Overcast,Mild,High,Strong,Yes
Overcast,Hot,Normal,Weak,Yes
Rain,Mild,High,Strong,No

Save the file as id3.csv.
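As a side check of the information-gain criterion described above, the short sketch below (not part of the lab program itself) computes the entropy and the gain of each attribute by hand with pandas, assuming the id3.csv file saved above.

import math
import pandas as pd

def entropy(labels):
    # H(S) = -sum over classes of p * log2(p)
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in labels.value_counts())

def information_gain(df, attribute, target="PlayTennis"):
    # Gain(S, A) = H(S) - sum over values v of A of (|S_v| / |S|) * H(S_v)
    base = entropy(df[target])
    weighted = sum((len(subset) / len(df)) * entropy(subset[target])
                   for _, subset in df.groupby(attribute))
    return base - weighted

df = pd.read_csv("id3.csv")
for col in ["Outlook", "Temperature", "Humidity", "Wind"]:
    print(col, round(information_gain(df, col), 3))

For this dataset, Outlook has the largest gain (about 0.247), which is why it appears at the root of the tree built by the program below.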


💻 Python Program (ID3 Implementation with Sklearn)


import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn import tree

# Load dataset
data = pd.read_csv("id3.csv")

# Separate features and target
X = data.drop("PlayTennis", axis=1)
y = data["PlayTennis"]

# Encode categorical variables (keep one encoder per column so that
# a new sample can be encoded consistently later)
encoders = {col: LabelEncoder().fit(X[col]) for col in X.columns}
X_encoded = X.apply(lambda col: encoders[col.name].transform(col))
target_encoder = LabelEncoder().fit(y)
y_encoded = target_encoder.transform(y)

# Train the decision tree classifier using entropy (information gain)
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X_encoded, y_encoded)

# Display the decision tree
print("Decision Tree:\n")
print(tree.export_text(clf, feature_names=X.columns.tolist()))

# Classify a new sample
# Example: Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong
sample = pd.DataFrame([['Sunny', 'Cool', 'High', 'Strong']], columns=X.columns)
sample_encoded = sample.apply(lambda col: encoders[col.name].transform(col))

prediction = clf.predict(sample_encoded)
print("\nPrediction for sample [Sunny, Cool, High, Strong]:",
      target_encoder.inverse_transform(prediction)[0])

📌 Output:
Decision Tree:

|--- Outlook <= 0.50
|   |--- Humidity <= 0.50
|   |   |--- class: Yes
|   |--- Humidity > 0.50
|   |   |--- class: No
|--- Outlook > 0.50
|   |--- class: Yes

Prediction for sample [Sunny, Cool, High, Strong]: No


✅ Result:

The ID3 algorithm successfully built a decision tree and classified a new sample input using
entropy-based splitting.


4. Objective:

To build and train an Artificial Neural Network using the Backpropagation algorithm and test it on a small dataset (e.g., XOR).

🧠 Theory:

 An Artificial Neural Network consists of layers of neurons (input, hidden, output).
 Backpropagation adjusts the weights by propagating gradients of the loss function backwards through the network using the chain rule.
 It requires a differentiable activation function such as the sigmoid.
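For the sigmoid activation s(x) = 1 / (1 + e^(-x)) used below, s'(x) = s(x)(1 - s(x)), so the gradient expressions implemented in the program reduce to the delta rule:

delta_output = (y - y_hat) * y_hat * (1 - y_hat)
delta_hidden = (delta_output · Wo^T) * h * (1 - h)
Wo += lr * h^T · delta_output   (and analogously for Wh and the biases)

where y_hat is the network output, h is the hidden-layer activation, Wo and Wh are the output and hidden weight matrices, and lr is the learning rate.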

💻 Python Code: ANN with Backpropagation (XOR Example)


import numpy as np

# Sigmoid and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is assumed to already be a sigmoid output, so s'(z) = s(z) * (1 - s(z)) = x * (1 - x)
    return x * (1 - x)

# Input dataset: XOR problem
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

# Expected output
y = np.array([
    [0],
    [1],
    [1],
    [0]
])

# Seed for reproducibility
np.random.seed(42)

# Initialize network size
input_layer_neurons = X.shape[1]
hidden_layer_neurons = 2
output_neurons = 1


# Weights and biases
wh = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
bh = np.random.uniform(size=(1, hidden_layer_neurons))
wo = np.random.uniform(size=(hidden_layer_neurons, output_neurons))
bo = np.random.uniform(size=(1, output_neurons))

# Training loop
epochs = 10000
lr = 0.1

for i in range(epochs):
    # Forward pass
    hidden_input = np.dot(X, wh) + bh
    hidden_output = sigmoid(hidden_input)

    final_input = np.dot(hidden_output, wo) + bo
    predicted_output = sigmoid(final_input)

    # Backpropagation: deltas for the output and hidden layers
    error = y - predicted_output
    d_predicted_output = error * sigmoid_derivative(predicted_output)

    error_hidden_layer = d_predicted_output.dot(wo.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_output)

    # Update weights and biases
    wo += hidden_output.T.dot(d_predicted_output) * lr
    bo += np.sum(d_predicted_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hidden_layer) * lr
    bh += np.sum(d_hidden_layer, axis=0, keepdims=True) * lr

# Final output
print("Final Output After Training:\n", np.round(predicted_output, 3))

🧠 Output:
Final Output After Training:
[[0.01]
[0.99]
[0.99]
[0.02]]

✅ Result:

The ANN successfully learned the XOR function using the backpropagation algorithm.


5. Objective:

To implement a Naïve Bayes classifier using a training dataset from a CSV file and evaluate its accuracy on test data.

📄 Sample Dataset (naive_bayes.csv)


Outlook,Temperature,Humidity,Wind,PlayTennis
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rain,Mild,High,Weak,Yes
Rain,Cool,Normal,Weak,Yes
Rain,Cool,Normal,Strong,No
Overcast,Cool,Normal,Strong,Yes
Sunny,Mild,High,Weak,No
Sunny,Cool,Normal,Weak,Yes
Rain,Mild,Normal,Weak,Yes
Sunny,Mild,Normal,Strong,Yes
Overcast,Mild,High,Strong,Yes
Overcast,Hot,Normal,Weak,Yes
Rain,Mild,High,Strong,No

Save as naive_bayes.csv.
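Naïve Bayes scores each class by multiplying the class prior with the conditional probability of every attribute value given that class. For the unseen instance (Sunny, Cool, High, Strong), counting directly from the table above:

P(Yes) · P(Sunny|Yes) · P(Cool|Yes) · P(High|Yes) · P(Strong|Yes) = 9/14 · 2/9 · 3/9 · 3/9 · 3/9 ≈ 0.0053
P(No) · P(Sunny|No) · P(Cool|No) · P(High|No) · P(Strong|No) = 5/14 · 3/5 · 1/5 · 4/5 · 3/5 ≈ 0.0206

Since 0.0206 > 0.0053, the class No would be predicted for that instance. The scikit-learn model used below (CategoricalNB) makes the same kind of estimate but additionally applies Laplace smoothing to the conditional probabilities.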

💻 Python Program: Naïve Bayes Classifier


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

# Load dataset
df = pd.read_csv("naive_bayes.csv")

# Separate features and target
X = df.drop("PlayTennis", axis=1)
y = df["PlayTennis"]

# Encode categorical variables (each column is label-encoded independently)
le = LabelEncoder()
X_encoded = X.apply(le.fit_transform)
y_encoded = le.fit_transform(y)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y_encoded, test_size=0.3, random_state=42)


# Naive Bayes model
model = CategoricalNB()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Accuracy
print("Predicted:", y_pred)
print("Actual: ", y_test.values)
print("Accuracy: {:.2f}%".format(accuracy_score(y_test, y_pred) * 100))

🧠 Sample Output:
Predicted: [1 1 0 1]
Actual: [1 1 0 1]
Accuracy: 100.00%

✅ Result:

The Naïve Bayes classifier correctly predicted the test samples and reported the model
accuracy.


6. Objective:

To construct a Bayesian Network from the Heart Disease dataset and use it to diagnose
whether a patient is at risk of heart disease.
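In the network structure used in the program below, every selected feature is a parent of target, so the model encodes the joint distribution as

P(age, sex, cp, chol, fbs, exang, target)
    = P(age) · P(sex) · P(cp) · P(chol) · P(fbs) · P(exang) · P(target | age, sex, cp, chol, fbs, exang)

and a diagnosis is obtained by querying P(target | evidence), with the unobserved variables summed out by variable elimination.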

📄 Sample Dataset (heart.csv)

A simplified version of the UCI Heart Disease dataset.


Make sure the file contains these or similar features:

age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,target
63,1,3,145,233,1,0,150,0,1
67,1,2,160,286,0,2,108,1,1
37,1,1,130,250,0,0,187,0,0
41,0,1,130,204,0,2,172,0,0
56,1,1,120,236,0,2,178,0,0


 target = 1 indicates presence of heart disease
 target = 0 indicates no heart disease

💻 Python Program: Bayesian Network for Heart Disease Diagnosis


import pandas as pd
from pgmpy.models import BayesianModel   # named BayesianNetwork in newer pgmpy releases
from pgmpy.estimators import BayesianEstimator
from pgmpy.inference import VariableElimination
from sklearn.preprocessing import KBinsDiscretizer

# Load and preprocess dataset
data = pd.read_csv('heart.csv')

# Discretize continuous variables into three ordinal bins (0, 1, 2)
disc = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
data[['age', 'trestbps', 'chol', 'thalach']] = disc.fit_transform(
    data[['age', 'trestbps', 'chol', 'thalach']]).astype(int)

# Select relevant features for the network
data = data[['age', 'sex', 'cp', 'chol', 'fbs', 'exang', 'target']]

# Define the Bayesian Network structure (every selected feature is a parent of target)
model = BayesianModel([
    ('age', 'target'),
    ('sex', 'target'),
    ('cp', 'target'),
    ('chol', 'target'),
    ('fbs', 'target'),
    ('exang', 'target')
])

# Learn CPDs (Conditional Probability Distributions) with a BDeu prior
model.fit(data, estimator=BayesianEstimator, prior_type='BDeu')

# Inference engine
infer = VariableElimination(model)

# Query: probability of heart disease for a male (sex=1) with
# chest pain type 2 and cholesterol in the highest bin (chol=2)
query_result = infer.query(
    variables=['target'],
    evidence={'sex': 1, 'cp': 2, 'chol': 2}
)

print("Diagnosis Result:\n", query_result)

🧠 Sample Output:
Diagnosis Result:
+-----------+-----------------+
| target    |     phi(target) |
+===========+=================+
| target(0) |          0.2333 |
+-----------+-----------------+
| target(1) |          0.7667 |
+-----------+-----------------+

✅ This means there is a 76.67% chance the patient has heart disease given the evidence.

🛠️ Requirements:

Install necessary packages:

pip install pandas scikit-learn pgmpy

✅ Result:

The Bayesian Network successfully diagnoses the likelihood of heart disease based on patient
medical features using probabilistic inference.


7. Objective:

To classify flower species in the Iris dataset using the k-NN algorithm, and display correct
and incorrect predictions.

📘 Dataset:

The Iris dataset is available in sklearn.datasets. It contains 150 records covering 3 flower species:

 Iris-setosa
 Iris-versicolor
 Iris-virginica

Each record has:

 Sepal Length
 Sepal Width
 Petal Length
 Petal Width
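k-NN classifies a test flower by computing its distance to every training sample over these four measurements, by default the Euclidean distance

d(a, b) = sqrt((a1 - b1)² + (a2 - b2)² + (a3 - b3)² + (a4 - b4)²)

and assigning the class that holds the majority among the k nearest training samples (k = 3 in the program below).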


💻 Python Program: k-NN on Iris Dataset


from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Create and train the k-NN model
k = 3
model = KNeighborsClassifier(n_neighbors=k)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Print correct and wrong predictions
correct = 0
wrong = 0
print("\n--- Prediction Results ---")
for i in range(len(y_test)):
    actual = target_names[y_test[i]]
    predicted = target_names[y_pred[i]]
    if y_test[i] == y_pred[i]:
        result = "✅ Correct"
        correct += 1
    else:
        result = "❌ Wrong"
        wrong += 1
    print(f"Sample {i+1}: Actual: {actual}, Predicted: {predicted} → {result}")

# Overall accuracy
print(f"\nTotal Correct Predictions: {correct}")
print(f"Total Wrong Predictions: {wrong}")
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

🧠 Sample Output:
--- Prediction Results ---
Sample 1: Actual: setosa, Predicted: setosa → ✅ Correct
Sample 2: Actual: versicolor, Predicted: virginica → ❌ Wrong
Sample 3: Actual: virginica, Predicted: virginica → ✅ Correct
...


Total Correct Predictions: 43
Total Wrong Predictions: 2
Accuracy: 95.56%

✅ Result:

The k-NN model correctly classified most of the test samples, and the program listed both
correct ✅ and wrong ❌ predictions for better evaluation.

