Implement the ID3 Algorithm

Step 1: Prepare the PlayTennis Dataset

The full dataset (14 examples) looks like this:


    Outlook Temperature Humidity   Wind PlayTennis
 0    Sunny         Hot     High   Weak         No
 1    Sunny         Hot     High Strong         No
 2 Overcast         Hot     High   Weak        Yes
 3     Rain        Mild     High   Weak        Yes
 4     Rain        Cool   Normal   Weak        Yes
 5     Rain        Cool   Normal Strong         No
 6 Overcast        Cool   Normal Strong        Yes
 7    Sunny        Mild     High   Weak         No
 8    Sunny        Cool   Normal   Weak        Yes
 9     Rain        Mild   Normal   Weak        Yes
10    Sunny        Mild   Normal Strong        Yes
11 Overcast        Mild     High Strong        Yes
12 Overcast         Hot   Normal   Weak        Yes
13     Rain        Mild     High Strong         No

Step 2: Implement the ID3 Algorithm


In [8]: import pandas as pd
import numpy as np

# PlayTennis dataset (the full 14 examples shown in the table above)
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
                    'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong',
             'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
                   'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)

In [9]: # Step 1: Calculate Entropy

def entropy(target_column):
    # Entropy of a target column: -sum over classes of p_i * log2(p_i)
    elements, counts = np.unique(target_column, return_counts=True)
    entropy_value = sum(
        (-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
        for i in range(len(elements))
    )
    return entropy_value
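
As a quick illustrative check (an added cell, not in the original notebook): the PlayTennis column has 9 'Yes' and 5 'No' examples, so its entropy should come out to about 0.940 bits:

In [ ]: # Illustrative check: -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940
print(round(entropy(df['PlayTennis']), 3))  # expected: 0.94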

In [10]: # Step 2: Calculate Information Gain

def information_gain(data, feature, target):
    # Information gain = total entropy of the target minus the
    # weighted entropy of the subsets produced by splitting on the feature
    total_entropy = entropy(data[target])
    values, counts = np.unique(data[feature], return_counts=True)
    weighted_entropy = 0
    for i in range(len(values)):
        subset = data[data[feature] == values[i]]
        weighted_entropy += (counts[i] / np.sum(counts)) * entropy(subset[target])

    return total_entropy - weighted_entropy
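
To see which attribute ID3 will pick first, the gain of every feature can be printed directly (an added illustrative cell; the expected values are the well-known ones for this dataset):

In [ ]: # Illustrative check: information gain of each feature on the full dataset
for feature in ['Outlook', 'Temperature', 'Humidity', 'Wind']:
    print(feature, round(information_gain(df, feature, 'PlayTennis'), 3))
# Expected: Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048,
# so Outlook is chosen at the root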

In [15]: # Step 3: ID3 Algorithm to Build Tree

def id3(data, target, features):
    # Base case 1: all examples share one class -> return that class
    if len(np.unique(data[target])) == 1:
        return np.unique(data[target])[0]
    # Base case 2: no features left -> return the majority class
    if len(features) == 0:
        values, counts = np.unique(data[target], return_counts=True)
        return values[np.argmax(counts)]

    # Select the feature with the highest information gain
    gains = [information_gain(data, feature, target) for feature in features]
    best_feature = features[np.argmax(gains)]

    # Create a new decision tree node
    tree = {best_feature: {}}

    # Recurse for each value of the best feature
    remaining_features = [f for f in features if f != best_feature]
    for value in np.unique(data[best_feature]):
        subset = data[data[best_feature] == value]
        tree[best_feature][value] = id3(subset, target, remaining_features)

    return tree

# Train the decision tree using ID3
target_column = 'PlayTennis'
features = [col for col in df.columns if col != target_column]
tree = id3(df, target_column, features)
print("Decision Tree:", tree)

Decision Tree: {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
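
The nested dictionary is hard to read on one line; the standard-library pprint module (a cosmetic addition, not in the original notebook) prints it with indentation:

In [ ]: # Optional: a more readable, indented view of the nested-dict tree
from pprint import pprint
pprint(tree)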

Step 3: Testing the Algorithm

Once the tree is built, we can evaluate its performance and verify its inductive bias.

Step 4: Inductive Bias of ID3

The inductive bias of the ID3 algorithm is its preference for splitting on the feature that maximizes information gain. Because high-gain attributes are placed near the root, the algorithm favors shorter trees over deeper ones that fit the data equally well. On the PlayTennis dataset, ID3 chooses Outlook at the root because it has the highest information gain, and uses Humidity and Wind only in the subtrees where they remain informative.

Explanation:

Entropy: Measures the disorder or impurity of the dataset. Higher entropy means more disorder.

Information Gain: Measures the effectiveness of an attribute in classifying the dataset. The attribute with the highest information gain is chosen for the decision tree node.
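
In symbols, the two quantities computed by the code above are the standard definitions:

$$\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i, \qquad \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where $p_i$ is the fraction of examples in class $i$, and $S_v$ is the subset of $S$ on which attribute $A$ takes the value $v$.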

Step 5: Testing the Model

Once the tree is built, you can make predictions on new data by traversing the tree according to the sample's feature values. Here's an example of how you can test the model on a new sample:
In [2]: def predict(tree, sample):
    # Traverse the decision tree: a dict node tests a feature,
    # anything else is a leaf holding the predicted class
    if not isinstance(tree, dict):
        return tree
    feature = list(tree.keys())[0]
    feature_value = sample[feature]
    # Note: a feature value never seen during training would raise a KeyError here
    return predict(tree[feature][feature_value], sample)

# Test the model with a sample
test_sample = {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High',
               'Wind': 'Weak'}  # 'Wind' value assumed; the Sunny/High branch never tests Wind
prediction = predict(tree, test_sample)
print(f"Prediction for test sample: {prediction}")

Prediction for test sample: No
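
As a final sanity check (an added sketch, not in the original notebook), the tree can be applied back to the training data; because ID3 grows the tree until every leaf is pure and this dataset contains no contradictory rows, it should reproduce all 14 training labels:

In [ ]: # Sanity check: training-set accuracy (shows consistency, not generalization)
correct = sum(
    predict(tree, row.to_dict()) == row['PlayTennis']
    for _, row in df.iterrows()
)
print(f"Training accuracy: {correct}/{len(df)}")  # expected: 14/14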

Conclusion

This implementation provides a basic ID3 algorithm that builds a decision tree using information gain and can be used to predict the target variable (PlayTennis) based on the features. The inductive bias of the ID3 algorithm is revealed through its preference for attributes that maximize information gain, which influences the structure of the tree and the model's generalization.
