Machine Learning Lab Manual
1. FIND-S Algorithm
Objective:
To find the most specific hypothesis that fits all positive training examples.
Theory:
FIND-S starts with the most specific hypothesis and generalizes it only when a positive
example contradicts it.
Algorithm:
1. Initialize h to the most specific hypothesis: ['0', '0', ..., '0'].
2. For each positive training example x: for each attribute i, if h[i] is '0', set h[i] to x[i]; otherwise, if h[i] differs from x[i], generalize it to '?'.
3. Ignore negative examples.
4. Output the final hypothesis h.
Python Code:
import csv

def load_csv(filename):
    with open(filename, 'r') as f:
        rows = list(csv.reader(f))
    header = rows[0]
    data = rows[1:]
    return header, data

def find_s_algorithm(data):
    # Start with the most specific hypothesis: '0' for every attribute
    hypothesis = ['0'] * (len(data[0]) - 1)
    for row in data:
        # Only positive examples ('Yes') can generalize the hypothesis
        if row[-1] == 'Yes':
            for i in range(len(hypothesis)):
                if hypothesis[i] == '0':
                    hypothesis[i] = row[i]   # first positive example: copy the value
                elif hypothesis[i] != row[i]:
                    hypothesis[i] = '?'      # conflicting value: generalize to '?'
    return hypothesis
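A minimal usage sketch (the rows below are the classic EnjoySport training examples, assumed here only for illustration; they are consistent with the output shown in the next experiment):

# Hypothetical training data; equivalently: header, data = load_csv("finds.csv")
training_data = [
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes'],
]
print("Most specific hypothesis:", find_s_algorithm(training_data))
# Expected: ['Sunny', 'Warm', '?', 'Strong', '?', '?']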
Result:
FIND-S algorithm correctly finds the most specific hypothesis consistent with the data.
2. Candidate-Elimination Algorithm
Objective:
To find the version space: all hypotheses consistent with the training data.
Theory:
Maintains S (most specific) and G (most general) boundaries and shrinks them based on
training examples.
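The program below reads its training examples from finds.csv. A minimal snippet to create such a file, assuming the same EnjoySport data as in the previous experiment (hypothetical, but consistent with the output shown below):

import csv

# Hypothetical helper: write a small finds.csv with the EnjoySport data
rows = [
    ['Sky', 'AirTemp', 'Humidity', 'Wind', 'Water', 'Forecast', 'EnjoySport'],
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes'],
]
with open("finds.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)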
Python Code:
import csv

def load_csv(filename):
    with open(filename) as f:
        reader = csv.reader(f)
        data = list(reader)[1:]   # skip the header row
    return data

def consistent(hypothesis, example):
    # A hypothesis covers an example if every non-'?' constraint matches
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def candidate_elimination(data):
    n_attributes = len(data[0]) - 1
    S = ['0'] * n_attributes        # most specific boundary
    G = [['?'] * n_attributes]      # most general boundary
    for row in data:
        x, label = row[:-1], row[-1]
        if label == 'Yes':
            # Positive example: generalize S, drop members of G that do not cover it
            for i in range(n_attributes):
                if S[i] == '0':
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = '?'
            G = [g for g in G if consistent(g, x)]
        else:
            # Negative example: specialize members of G using values from S
            G_new = []
            for g in G:
                for i in range(n_attributes):
                    if g[i] == '?' and S[i] != '?' and S[i] != x[i]:
                        new_hypothesis = g[:]
                        new_hypothesis[i] = S[i]
                        if new_hypothesis not in G_new:
                            G_new.append(new_hypothesis)
            G = G_new
    return S, G
data = load_csv("finds.csv")
S_final, G_final = candidate_elimination(data)
print("S Final:", S_final)
print("G Final:", G_final)
Output:
S Final: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
G Final: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Result:
Candidate-Elimination returns the boundary version space between general and specific
hypotheses.
3. ID3 Decision Tree Algorithm
Objective:
To demonstrate the working of the decision tree-based ID3 algorithm, build a tree from a dataset, and classify a new sample.
📘 Theory:
ID3 builds a decision tree top-down: at each node it computes the information gain of every remaining attribute, splits on the attribute with the highest gain (the largest reduction in entropy), and recurses until the examples at a node all belong to one class.
Python Code:
# Load dataset
data = pd.read_csv("id3.csv")
# ... (the encoding of the categorical attributes and the training of the
#      classifier clf are not preserved in this copy; see the sketch below)
prediction = clf.predict(sample_encoded)
print("\nPrediction for sample [Sunny, Cool, High, Strong]:",
      "Yes" if prediction[0] else "No")
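Only fragments of the original program survive above. A minimal sketch of one complete version is given below; it assumes an id3.csv file with Play-Tennis style columns (Outlook, Temperature, Humidity, Wind, PlayTennis) and uses scikit-learn's DecisionTreeClassifier with the entropy criterion as a practical stand-in for ID3:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.preprocessing import LabelEncoder

# Load dataset (assumed columns: Outlook, Temperature, Humidity, Wind, PlayTennis)
data = pd.read_csv("id3.csv")

# Label-encode every categorical column, remembering each column's encoder
encoders = {col: LabelEncoder().fit(data[col]) for col in data.columns}
encoded = data.apply(lambda col: encoders[col.name].transform(col))

X = encoded.iloc[:, :-1]
y = encoded.iloc[:, -1]

# "entropy" makes the tree split on information gain, as ID3 does
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)

print("Decision Tree:")
print(export_text(clf, feature_names=list(X.columns)))

# Classify a new sample: [Sunny, Cool, High, Strong]
sample = ["Sunny", "Cool", "High", "Strong"]
sample_encoded = pd.DataFrame(
    [[encoders[col].transform([val])[0] for col, val in zip(X.columns, sample)]],
    columns=X.columns)
prediction = clf.predict(sample_encoded)
predicted_label = encoders[data.columns[-1]].inverse_transform(prediction)[0]
print("\nPrediction for sample [Sunny, Cool, High, Strong]:", predicted_label)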
📌 Output:
Decision Tree:
✅ Result:
The ID3 algorithm successfully built a decision tree and classified a new sample input using
entropy-based splitting.
4. Backpropagation Algorithm (Artificial Neural Network)
Objective:
To build an Artificial Neural Network, train it using the Backpropagation algorithm, and test it on the XOR problem.
🧠 Theory:
Backpropagation trains a feed-forward network by repeating three steps: a forward pass that computes the output, an error computation at the output layer, and a backward pass that propagates the error through the hidden layer and adjusts every weight in proportion to its contribution to that error.
Python Code:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is already a sigmoid activation, so the derivative is x * (1 - x)
    return x * (1 - x)

# XOR input patterns and expected output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Network architecture (hidden layer size is assumed; the original does not specify it)
input_layer_neurons = 2
hidden_layer_neurons = 4
output_neurons = 1

# Weights and biases, initialised randomly
wh = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
bh = np.random.uniform(size=(1, hidden_layer_neurons))
wo = np.random.uniform(size=(hidden_layer_neurons, output_neurons))
bo = np.random.uniform(size=(1, output_neurons))

# Training algorithm
epochs = 10000
lr = 0.1
for i in range(epochs):
    # Forward pass
    hidden_input = np.dot(X, wh) + bh
    hidden_output = sigmoid(hidden_input)
    output_input = np.dot(hidden_output, wo) + bo
    predicted_output = sigmoid(output_input)
    # Backpropagation
    error = y - predicted_output
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(wo.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_output)
    # Weight and bias updates (simple gradient descent step)
    wo += hidden_output.T.dot(d_predicted_output) * lr
    bo += np.sum(d_predicted_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hidden_layer) * lr
    bh += np.sum(d_hidden_layer, axis=0, keepdims=True) * lr

# Final output
print("Final Output After Training:\n", np.round(predicted_output, 3))
🧠 Output:
Final Output After Training:
[[0.01]
[0.99]
[0.99]
[0.02]]
✅ Result:
The ANN successfully learned the XOR function using the backpropagation algorithm.
5. Naïve Bayes Classifier
Objective:
To implement a Naïve Bayes classifier using a training dataset from a CSV file and evaluate its accuracy on test data.
Prepare a small categorical training dataset and save it as naive_bayes.csv.
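The dataset itself is not reproduced in this copy of the manual. A minimal snippet that writes a hypothetical Play-Tennis style naive_bayes.csv (any small categorical dataset with a Yes/No class column would do):

import csv

# Hypothetical Play-Tennis style data; replace with your own training set
rows = [
    ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"],
    ["Sunny", "Hot", "High", "Weak", "No"],
    ["Sunny", "Hot", "High", "Strong", "No"],
    ["Overcast", "Hot", "High", "Weak", "Yes"],
    ["Rain", "Mild", "High", "Weak", "Yes"],
    ["Rain", "Cool", "Normal", "Weak", "Yes"],
    ["Rain", "Cool", "Normal", "Strong", "No"],
    ["Overcast", "Cool", "Normal", "Strong", "Yes"],
    ["Sunny", "Mild", "High", "Weak", "No"],
    ["Sunny", "Cool", "Normal", "Weak", "Yes"],
    ["Rain", "Mild", "Normal", "Weak", "Yes"],
    ["Sunny", "Mild", "Normal", "Strong", "Yes"],
    ["Overcast", "Mild", "High", "Strong", "Yes"],
    ["Overcast", "Hot", "Normal", "Weak", "Yes"],
    ["Rain", "Mild", "High", "Strong", "No"],
]
with open("naive_bayes.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)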
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score
# Load dataset
df = pd.read_csv("naive_bayes.csv")
# Encode the categorical attributes and the class label as integers
X_encoded = df.iloc[:, :-1].apply(LabelEncoder().fit_transform)
y_encoded = LabelEncoder().fit_transform(df.iloc[:, -1])
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y_encoded,
    test_size=0.3, random_state=42)
# Train the model (CategoricalNB is an assumed choice for label-encoded data;
# min_categories guards against values that appear only in the test split)
model = CategoricalNB(min_categories=X_encoded.nunique().max())
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
print("Predicted:", y_pred)
print("Actual:   ", y_test)
print("Accuracy: {:.2f}%".format(accuracy_score(y_test, y_pred) * 100))
🧠 Sample Output:
Predicted: [1 1 0 1]
Actual: [1 1 0 1]
Accuracy: 100.00%
✅ Result:
The Naïve Bayes classifier correctly predicted the test samples and reported the model
accuracy.
6. Bayesian Network for Heart Disease Diagnosis
Objective:
To construct a Bayesian Network from the Heart Disease dataset and use it to diagnose
whether a patient is at risk of heart disease.
📘 Dataset (first rows):
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,target
63,1,3,145,233,1,0,150,0,1
67,1,2,160,286,0,2,108,1,1
37,1,1,130,250,0,0,187,0,0
41,0,1,130,204,0,2,172,0,0
56,1,1,120,236,0,2,178,0,0
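The code that follows jumps straight to inference, and the query uses small integer values for chol, so the continuous columns were evidently discretized beforehand. A minimal sketch of that preprocessing, assuming the rows above are saved as heart.csv (the filename and bin counts are assumptions):

import pandas as pd

# Load the dataset (filename assumed)
df = pd.read_csv("heart.csv")

# Discretize the continuous columns into three bins (0, 1, 2) so the
# Bayesian Network works with a small number of discrete states
for col in ["age", "trestbps", "chol", "thalach"]:
    df[col] = pd.cut(df[col], bins=3, labels=[0, 1, 2]).astype(int)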
# Inference (model is the fitted Bayesian Network; see the sketch below)
infer = VariableElimination(model)
# Query: what is the probability of heart disease for a male (sex=1),
# chest pain type 2, and (discretized) cholesterol level 2?
query_result = infer.query(
    variables=['target'],
    evidence={'sex': 1, 'cp': 2, 'chol': 2}
)
print("Diagnosis Result:")
print(query_result)
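Only the inference step is preserved above; the construction of model is missing. A minimal sketch of one way to build and fit the network with pgmpy is shown below. The network structure, the estimator, and the choice of variables are assumptions, not the original design; in older pgmpy releases the model class is named BayesianModel.

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Load and discretize, as in the preprocessing snippet above
df = pd.read_csv("heart.csv")
for col in ["age", "trestbps", "chol", "thalach"]:
    df[col] = pd.cut(df[col], bins=3, labels=[0, 1, 2]).astype(int)

# Assumed naive-Bayes-like structure: target is the parent of each observed feature
model = BayesianNetwork([('target', 'age'), ('target', 'sex'),
                         ('target', 'cp'), ('target', 'chol')])

# Learn the conditional probability tables from the data
model.fit(df[['age', 'sex', 'cp', 'chol', 'target']],
          estimator=MaximumLikelihoodEstimator)

# Inference, exactly as in the snippet above
infer = VariableElimination(model)
query_result = infer.query(variables=['target'],
                           evidence={'sex': 1, 'cp': 2, 'chol': 2})
print("Diagnosis Result:")
print(query_result)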
🧠 Sample Output:
Diagnosis Result:
+-----------+-----------------+
| target | phi(target) |
+===========+=================+
| target(0) | 0.2333 |
+-----------+-----------------+
| target(1) | 0.7667 |
+-----------+-----------------+
✅ This means there is a 76.67% chance the patient has heart disease given the evidence.
🛠️ Requirements:
pip install pandas scikit-learn pgmpy
✅ Result:
The Bayesian Network successfully diagnoses the likelihood of heart disease based on patient
medical features using probabilistic inference.
7. k-Nearest Neighbour (k-NN) Classification of the Iris Dataset
Objective:
To classify flower species in the Iris dataset using the k-NN algorithm and to display correct and incorrect predictions.
📘 Dataset: the Iris dataset.
Classes: Iris-setosa, Iris-versicolor, Iris-virginica
Features: Sepal Length, Sepal Width, Petal Length, Petal Width
# Make predictions (model, X_test and y_test come from the training steps; see the sketch below)
y_pred = model.predict(X_test)
# Count correct and wrong predictions
correct = int((y_pred == y_test).sum())
wrong = int((y_pred != y_test).sum())
# Overall accuracy
print(f"\nTotal Correct Predictions: {correct}")
print(f"Total Wrong Predictions: {wrong}")
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
🧠 Sample Output:
--- Prediction Results ---
Sample 1: Actual: setosa, Predicted: setosa → ✅ Correct
Sample 2: Actual: versicolor, Predicted: virginica → ❌ Wrong
Sample 3: Actual: virginica, Predicted: virginica → ✅ Correct
...
✅ Result:
The k-NN model correctly classified most of the test samples, and the program labelled each prediction as correct ✅ or wrong ❌ for easier evaluation.