Decision Tree
Since the information gain of f2 is greater than that of f1, the split will
be made on f2.
Mini version of Play Tennis dataset
Information Gain for the attribute “Outlook”
• Step 1: Calculate the Gini impurity of the dataset
G(X) = 1 − [(9/14)² + (5/14)²] = (9/14)(1 − 9/14) + (5/14)(1 − 5/14) ≈ 0.45
• Numerically, Gini impurity and entropy turn out to be very similar. Gini is often preferred because it avoids computing logarithms, which speeds up the implementation (see the sketch below).
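As an illustrative sketch (not from the slides), the two impurity measures can be compared numerically for a binary node with class probability p, using NumPy:

import numpy as np

p = np.linspace(0.05, 0.95, 10)                          # class-1 probability at a binary node
gini = 2 * p * (1 - p)                                   # Gini impurity: 1 - p^2 - (1 - p)^2
entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # entropy in bits
print(np.round(gini, 3))
print(np.round(entropy / 2, 3))                          # halved entropy tracks Gini closely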
• Evaluate Splits for each feature: Outlook, Temperature, Humidity, and
Wind.
IG(Outlook) = 0.45 − 0.34 = 0.11
IG(Temperature) = 0.04
IG(Humidity) = 0.09
IG(Wind) = 0.03
IG is maximized for Outlook, so we select Outlook as the root node (a sketch of this computation follows).
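A minimal sketch of the root-split calculation in Python. The per-branch (Yes, No) counts for Sunny and Rain come from the slides; the Overcast counts (4 Yes, 0 No) are an assumption taken from the standard Play Tennis table.

def gini(yes, no):
    """Gini impurity of a node with the given class counts."""
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

parent = gini(9, 5)                                     # ~0.459 (rounded to 0.45 in the slides)
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}   # Overcast counts assumed
n = sum(y + no for y, no in outlook.values())           # 14 samples in total

weighted = sum((y + no) / n * gini(y, no) for y, no in outlook.values())   # ~0.343
print("IG(Outlook) =", parent - weighted)               # ~0.116, reported as 0.11 above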
Information Gain for the Outlook = Sunny branch
Now focusing on the left branch, Outlook = Sunny:
G(X | O = Sunny) = (2/5)(1 − 2/5) + (3/5)(1 − 3/5) = 1 − [(2/5)² + (3/5)²] = 0.48
• Choosing Humidity (H) as our next node, the Gini impurity of Play Tennis given Humidity, with Outlook being Sunny, is:
• Subset 1: High (3 Samples: 0 Yes, 3 No)
• Subset 2: Normal (2 Samples: 2 Yes, 0 No)
• G(X | H, O = Sunny) = (3/5)·[1 − (3/3)²] + (2/5)·[1 − (2/2)²] = 0
• IG(X | H, O = Sunny) = 0.48 − 0 = 0.48
• Similarly we can compute,
• IG(X | O = Sunny, Humidity) = 0.48
• IG(X | O = Sunny, Temperature) = 0.28
• IG(X | O = Sunny, Wind) = 0.013
• IG is maximized for Humidity, so we select Humidity as the node at this level (the calculation is sketched below).
• Our tree so far: Outlook at the root, with Humidity as the split under the Sunny branch.
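To double-check the Sunny-branch numbers, here is a small sketch reusing the gini helper defined in the earlier sketch; the per-Humidity counts are exactly the two subsets listed above.

parent_sunny = gini(2, 3)                               # 0.48
humidity = {"High": (0, 3), "Normal": (2, 0)}           # counts within Outlook = Sunny
n = sum(y + no for y, no in humidity.values())          # 5 samples

weighted = sum((y + no) / n * gini(y, no) for y, no in humidity.values())   # 0.0
print("IG(Humidity | Outlook = Sunny) =", parent_sunny - weighted)          # 0.48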
Information Gain for the Outlook = Rain branch
Day   Wind   Decision
4     Weak   Yes
5     Weak   Yes
10    Weak   Yes
• G(X | O = Rain) = 1 − (p₁² + p₂²) = 1 − [(3/5)² + (2/5)²] = 0.48
• Choosing Wind (W) as our next node, the Gini impurity of Play Tennis given Wind, with Outlook being Rain, is:
• Subset 1: Weak (3 Samples: 3 Yes, 0 No)
• Subset 2: Strong (2 Samples: 0 Yes, 2 No)
• G(X | W, O = Rain) = (3/5)·[1 − (3/3)²] + (2/5)·[1 − (2/2)²] = 0
• IG(X | W, O = Rain) = 0.48 − 0 = 0.48
Information Gain for Humidity (H) and Temperature (T) with Outlook being Rain
• Similarly we can compute,
• IG(X | O = Rain, Humidity) = 0.48 − 0.46 = 0.02
• IG(X | O = Rain, Temperature) = 0.48 − 0.46 = 0.02
• IG(X | O = Rain, Wind) = 0.48 − 0 = 0.48
• IG is maximized for Wind, so we select Wind as the node at this level (see the sketch below).
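A short sketch of the Rain-branch comparison, again reusing the gini helper. The Wind counts are the two subsets above; the per-value Humidity counts (High: 1 Yes, 1 No; Normal: 2 Yes, 1 No) are assumptions taken from the standard Play Tennis table, and they reproduce the 0.46 used above.

parent_rain = gini(3, 2)                                # 0.48
wind = {"Weak": (3, 0), "Strong": (0, 2)}               # counts within Outlook = Rain
humidity = {"High": (1, 1), "Normal": (2, 1)}           # assumed from the standard table

def weighted_gini(branches):
    n = sum(y + no for y, no in branches.values())
    return sum((y + no) / n * gini(y, no) for y, no in branches.values())

print("IG(Wind | Rain) =", parent_rain - weighted_gini(wind))          # 0.48
print("IG(Humidity | Rain) =", parent_rain - weighted_gini(humidity))  # ~0.013, reported as 0.02 above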
Final Decision Tree
Test Instance
• For test instance, we just need to start from the root node and traverse
to a leaf node.
• For example, if we had a sample like: Outlook = Sunny; Humidity =
Normal; Wind = Strong; Temperature = Mild.
• We first start from Outlook; since it is Sunny, we go to the left branch. We then check Humidity; since it is Normal, we go to the right branch, reach a leaf node, and the output is Yes!
• Final decision: we will play tennis (the traversal is sketched below).
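A hand-written sketch of this traversal. The tree structure follows the splits derived above (Outlook at the root, Humidity under Sunny, Wind under Rain); the Overcast branch predicting Yes is an assumption based on the standard Play Tennis table (4 Yes, 0 No).

def predict(sample):
    """Traverse the hand-built Play Tennis tree for one sample (a dict of attribute values)."""
    if sample["Outlook"] == "Sunny":
        # Under Sunny the split is on Humidity: Normal -> Yes, High -> No
        return "Yes" if sample["Humidity"] == "Normal" else "No"
    if sample["Outlook"] == "Overcast":
        return "Yes"                                    # assumed leaf: all Overcast days are Yes
    # Outlook == "Rain": the split is on Wind: Weak -> Yes, Strong -> No
    return "Yes" if sample["Wind"] == "Weak" else "No"

test = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Strong", "Temperature": "Mild"}
print(predict(test))                                    # -> Yes, so we play tennis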
Overfitting of Decision Tree
• Decision trees can easily overfit: if there are too few examples near the bottom, or the tree is too deep, we can end up fitting noise in the training data.
• In such situations we can use stopping criteria such as a maximum depth.
• We can also use pruning, where we remove a subtree and replace it with a leaf that predicts the majority class at that node (see the sketch below).
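A minimal sketch of these controls with scikit-learn's DecisionTreeClassifier: max_depth and min_samples_leaf act as stopping criteria, and ccp_alpha enables cost-complexity (post-)pruning. X_train and y_train are placeholder names for whatever training split is available.

from sklearn.tree import DecisionTreeClassifier

# Stopping criteria: cap the depth and require a minimum number of samples per leaf
shallow_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=2)

# Post-pruning: a larger ccp_alpha prunes subtrees more aggressively
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01)

# shallow_tree.fit(X_train, y_train)   # placeholder training data
# pruned_tree.fit(X_train, y_train)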
• Intuitively, decision trees are easier to interpret than most other classification algorithms.
• That is why they are widely used in industry.
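The code below follows the slides. Since the slides' `data` variable is not shown in the text, it is assumed here to be the classic 14-row Play Tennis table (9 Yes, 5 No); the train/test split, model fitting, and prediction steps that the slides omit are filled in with standard scikit-learn calls.

# Assumed: the classic 14-row Play Tennis table used throughout these slides
data = {
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast", "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind": ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Play Tennis": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
}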
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.DataFrame(data)

# Encode categorical variables
df_encoded = pd.get_dummies(df, columns=["Outlook", "Temperature", "Humidity", "Wind"], drop_first=True)
X = df_encoded.drop("Play Tennis", axis=1)
y = df["Play Tennis"]

# Train/test split (settings assumed), fit a Gini-based tree, predict on the held-out rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of the Decision Tree: {:.2f}%".format(accuracy * 100))
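To inspect the learned splits and compare them with the tree built by hand above, scikit-learn's export_text can print the fitted tree; this assumes clf and X from the snippet above.

from sklearn.tree import export_text
print(export_text(clf, feature_names=list(X.columns)))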