dev id3.ipynb - Colab

The document outlines the ID3 algorithm for building decision trees, emphasizing the calculation of entropy and information gain to select the best attributes. It provides a step-by-step guide to manually create a decision tree using a dataset and includes a Python implementation using the sklearn library to train and visualize the tree. Additionally, it demonstrates how to test the decision tree with a sample query and presents the classification report for model evaluation.

Experiment-8

1. State the ID3 algorithm for decision trees
2. Create a decision tree for the given dataset on paper using ID3
3. Implement the ID3 algorithm in Python for the same dataset
4. Visualise the decision tree
5. Test/validate the tree on a sample query

1. ID3 Algorithm (Conceptual Summary)

ID3 (Iterative Dichotomiser 3) builds a decision tree by selecting the feature with the highest Information Gain at each node.

Steps:

1. Calculate Entropy of the dataset.

2. For each attribute, calculate the Information Gain:

Information Gain = Entropy(Parent) − Σ ( |Subset| / |Total| × Entropy(Subset) )

3. Choose the attribute with the highest Information Gain.

4. Repeat recursively for each subset until:

- All samples belong to one class, or
- No attributes are left
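
The following is a minimal Python sketch of steps 1 and 2, assuming the data lives in a pandas DataFrame of categorical columns; entropy and information_gain are illustrative helper names, not library functions:

import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    # Entropy of a class column: -sum(p * log2(p)) over the class proportions
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    # Parent entropy minus the size-weighted entropy of each attribute subset
    weighted = sum(
        len(subset) / len(df) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return entropy(df[target]) - weighted

Called on the raw (string-valued) PlayTennis DataFrame loaded below, information_gain(df, 'Outlook', 'Play Tennis') reproduces the hand-computed gain for Outlook.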

2. Manually Solving (on Paper)

You can build the decision tree on paper using the following steps:

- Start with the entropy of the full dataset (target = "PlayTennis")
- For each attribute (Outlook, Humidity, etc.), compute the information gain
- Choose the attribute with the maximum gain as the root
- Repeat for each branch (subset) until every subset is fully classified
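
For the 14-row PlayTennis dataset used below (9 Yes, 5 No), the root-level calculation works out as:

Entropy(S) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.940

Gain(Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0 − (5/14)·0.971 ≈ 0.247

(Sunny has 2 Yes/3 No, Overcast 4 Yes/0 No, Rain 3 Yes/2 No.) The remaining gains are smaller (Humidity ≈ 0.152, Wind ≈ 0.048, Temperature ≈ 0.029), so Outlook becomes the root.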

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv("/content/PlayTennis.csv")
print("Dataset:\n", df)

# Encode categorical features as integers
label_encoders = {}
for col in df.columns:
    if df[col].dtype == 'object':
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col])
        label_encoders[col] = le

# Features and target
X = df.drop('Play Tennis', axis=1)
y = df['Play Tennis']

Dataset:
Outlook Temperature Humidity Wind Play Tennis
0 Sunny Hot High Weak No
1 Sunny Hot High Strong No
2 Overcast Hot High Weak Yes
3 Rain Mild High Weak Yes
4 Rain Cool Normal Weak Yes
5 Rain Cool Normal Strong No
6 Overcast Cool Normal Strong Yes
7 Sunny Mild High Weak No
8 Sunny Cool Normal Weak Yes
9 Rain Mild Normal Weak Yes
10 Sunny Mild Normal Strong Yes
11 Overcast Mild High Strong Yes
12 Overcast Hot Normal Weak Yes
13 Rain Mild High Strong No
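
As an optional sanity check (not part of the original notebook), you can inspect how each column was encoded; LabelEncoder assigns integers alphabetically, e.g. Outlook becomes Overcast → 0, Rain → 1, Sunny → 2:

for col, le in label_encoders.items():
    # Map each original category to its integer code
    print(col, dict(zip(le.classes_, le.transform(le.classes_))))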

# Train the Decision Tree using ID3-style splits (entropy criterion)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=X.columns, class_names=label_encoders['Play Tennis'].classes_)
plt.title("Decision Tree using ID3 (Entropy)")
plt.show()

# Predict on the training data (or a query below)
y_pred = clf.predict(X)
print("\nClassification Report:\n", classification_report(y, y_pred, target_names=label_encoders['Play Tennis'].classes_))

Classification Report:
              precision    recall  f1-score   support

          No       1.00      1.00      1.00         5
         Yes       1.00      1.00      1.00         9

    accuracy                           1.00        14
   macro avg       1.00      1.00      1.00        14
weighted avg       1.00      1.00      1.00        14

The scores are perfect because the model is evaluated on the same data it was trained on; unseen queries would not necessarily be classified correctly.
# A sample query to test the tree
query = {
    'Outlook': 'Sunny',
    'Temperature': 'Cool',
    'Humidity': 'High',
    'Wind': 'Strong'
}

# Encode the query with the same label encoders, then predict
query_encoded = [label_encoders[col].transform([query[col]])[0] for col in X.columns]
prediction = clf.predict([query_encoded])
predicted_label = label_encoders['Play Tennis'].inverse_transform(prediction)

print("Prediction for query:", predicted_label[0])

Prediction for query: No


/usr/local/lib/python3.11/dist-packages/sklearn/utils/validation.py:2739: UserWarning
  warnings.warn(

(This UserWarning appears because clf was fitted on a DataFrame with named columns, while predict received a plain Python list without feature names; it does not affect the prediction.)
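
One optional way to avoid the warning (not part of the original notebook) is to pass the encoded query as a one-row DataFrame so the feature names match those seen during fit:

query_df = pd.DataFrame([query_encoded], columns=X.columns)  # align column names with the training data
print("Prediction for query:", label_encoders['Play Tennis'].inverse_transform(clf.predict(query_df))[0])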

