
Experiment 4

Aim: To create a decision tree for the given dataset using the ID3 algorithm.

Theory:
Decision Tree:
A decision tree is a supervised learning algorithm used for classification and regression tasks. It is structured like a
flowchart, consisting of a root node, internal decision nodes, branches, and leaf nodes.

• Root Node: The starting point of the tree, representing the entire dataset.
• Internal Nodes: Represent tests or decisions based on attributes.
• Branches: Indicate the outcomes of these tests.
• Leaf Nodes: Represent final decisions or predictions.

The tree works by recursively splitting the dataset into subsets based on attribute values, aiming to create homogeneous
groups at the leaf nodes. Algorithms like ID3, C4.5, and CART use metrics such as information gain, gain ratio, or the Gini
index to decide the best splits. Decision trees are popular for their interpretability and their ability to handle both
categorical and continuous data.

ID3:
The ID3 algorithm is a decision tree-building method that uses a greedy, top-down approach to classify data. At each node
it selects the attribute with the highest information gain, a measure derived from entropy, so that each split reduces
class impurity as much as possible.
Steps of the ID3 Algorithm:
1. Start with the Root Node:
• Begin with the entire dataset as the root node.
2. Calculate Entropy and Information Gain:
• Compute the entropy of the dataset to measure impurity: Entropy(S) = -Σ p_i log2(p_i), where p_i is the
proportion of examples in S belonging to class i.
• For each attribute A, calculate the information gain from splitting on its values:
Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v), where S_v is the subset of S with value v for A
(a worked sketch follows the steps below).
3. Select Best Attribute:
• Choose the attribute with the highest information gain as the node for splitting.
4. Partition Data:
• Split the dataset into subsets based on the selected attribute's values.
5. Create Child Nodes:
• For each subset, create a child node and repeat steps 2–4 recursively.
6. Stop Recursion:
• Stop when all instances in a subset belong to the same class, no attributes remain, or no examples are
left in a subset.
• Label leaf nodes with the majority class when a stopping condition is met before the subset is pure.
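
The entropy and gain computations in step 2 can be written out directly. The sketch below is not part of the experiment
code; it is a minimal illustration using the Outlook column of the Play Tennis dataset from the second listing.

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    # Gain(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v)) over attribute values v
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum((len(subset) / total) * entropy(subset) for subset in subsets.values())
    return entropy(labels) - weighted

outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
           'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

print(information_gain(outlook, play))  # about 0.247, the highest gain of the four attributes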
Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build Decision Tree using ID3 (criterion='entropy')
model = DecisionTreeClassifier(criterion='entropy')
model.fit(X_train, y_train)

# Predict a sample
sample = [X_test[0]]
predicted = model.predict(sample)
print("Predicted class for sample:", data.target_names[predicted[0]])

# Accuracy on test data
accuracy = model.score(X_test, y_test)
print("Model accuracy:", accuracy)

Output:

Play Tennis:

Code:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
import pandas as pd

# Sample Dataset: Play Tennis
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
                    'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong',
             'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
                   'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)

# Encode categorical features
encoders = {}
for column in df.columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column])

# Features and target
X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis']

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train using entropy (C4.5-like)
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X_train, y_train)

# Accuracy on test set
y_pred = clf.predict(X_test)
print("Accuracy on test set:", accuracy_score(y_test, y_pred))

# Classify a new sample
# Sample: Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong
sample = pd.DataFrame([{
    'Outlook': encoders['Outlook'].transform(['Sunny'])[0],
    'Temperature': encoders['Temperature'].transform(['Cool'])[0],
    'Humidity': encoders['Humidity'].transform(['High'])[0],
    'Wind': encoders['Wind'].transform(['Strong'])[0]
}])
prediction = clf.predict(sample)[0]
result = encoders['PlayTennis'].inverse_transform([prediction])[0]

print("New Sample: Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong")
print("Prediction:", result)

print("Prediction:", result) Output:
