C4.5 Algorithm in Decision Trees
Explanation with Numerical Example
C4.5 Algorithm in Decision Trees
• The C4.5 algorithm, developed by Ross Quinlan, is an enhanced
version of the ID3 algorithm for building decision trees.
• It introduces advanced features to handle continuous data,
missing values, and overfitting.
Key Features of C4.5
• Supports both categorical and numerical attributes
• Uses the Information Gain Ratio instead of raw Information Gain, avoiding the bias toward attributes with many distinct values (formulas after this list)
• Employs post-pruning for better generalization
• Handles missing values
• Can generate decision rules from the tree
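The gain-ratio criterion mentioned above is the standard one, where S is the current set of instances, A a candidate attribute, and p_c the fraction of instances in S belonging to class c:

```latex
\begin{align*}
\mathrm{Entropy}(S)     &= -\textstyle\sum_{c} p_c \log_2 p_c \\
\mathrm{Gain}(S,A)      &= \mathrm{Entropy}(S) - \textstyle\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \\
\mathrm{SplitInfo}(S,A) &= -\textstyle\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S,A) &= \frac{\mathrm{Gain}(S,A)}{\mathrm{SplitInfo}(S,A)}
\end{align*}
```

Dividing the gain by the split information penalizes attributes that fragment the data into many small subsets.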
C4.5 Algorithm Steps
1. Select the attribute with the highest gain ratio
2. Create a decision node for the chosen attribute
3. Split the dataset accordingly
4. Recurse on each subset (a runnable sketch follows these steps) until:
   • All instances belong to the same class
   • No attributes remain
   • No instances remain
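A minimal Python sketch of this recursion, assuming categorical attributes only and ignoring the numeric-split, pruning, and missing-value machinery; the function names are illustrative, not taken from Quinlan's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain divided by split information for one attribute."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    if not labels:                       # no instances remain
        return None                      # (a caller could fall back to the parent's majority class)
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]   # leaf: single or majority class
    # Step 1: pick the attribute with the highest gain ratio.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    # Steps 2-4: create a decision node, split the data, recurse on each subset.
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attrs if a != best]
    for value in set(row[best] for row in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        node["branches"][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return node

# Usage (rows as dicts of categorical attributes):
# tree = build_tree([{"Outlook": "Sunny", "Windy": "No"}, ...], ["No", ...], ["Outlook", "Windy"])
```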
Numerical example: the Play Tennis training data (Temperature in °F, Humidity in %). A script that reproduces the key calculations follows the table.

Outlook   Temperature (°F)  Humidity (%)  Windy  Play Tennis
Sunny     85                85            No     No
Sunny     80                90            Yes    No
Overcast  83                78            No     Yes
Rainy     70                96            No     Yes
Rainy     68                80            No     Yes
Rainy     65                70            Yes    No
Overcast  64                65            Yes    Yes
Sunny     72                95            No     No
Sunny     69                70            No     Yes
Rainy     75                80            No     Yes
Sunny     75                70            Yes    Yes
Overcast  72                90            Yes    Yes
Overcast  81                75            No     Yes
Rainy     71                80            Yes    No
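A short script reproducing the standard numbers for this dataset (9 "Yes" vs. 5 "No" gives an overall entropy of about 0.940, Gain(Outlook) ≈ 0.247, GainRatio(Outlook) ≈ 0.156). It also scans binary thresholds for the continuous Humidity attribute, which is how C4.5 converts numeric data into a test of the form "Humidity <= t"; taking candidate thresholds at observed values is one common convention.

```python
import math
from collections import Counter

data = [
    ("Sunny", 85, 85, "No", "No"),     ("Sunny", 80, 90, "Yes", "No"),
    ("Overcast", 83, 78, "No", "Yes"), ("Rainy", 70, 96, "No", "Yes"),
    ("Rainy", 68, 80, "No", "Yes"),    ("Rainy", 65, 70, "Yes", "No"),
    ("Overcast", 64, 65, "Yes", "Yes"), ("Sunny", 72, 95, "No", "No"),
    ("Sunny", 69, 70, "No", "Yes"),    ("Rainy", 75, 80, "No", "Yes"),
    ("Sunny", 75, 70, "Yes", "Yes"),   ("Overcast", 72, 90, "Yes", "Yes"),
    ("Overcast", 81, 75, "No", "Yes"), ("Rainy", 71, 80, "Yes", "No"),
]
labels = [row[-1] for row in data]
n = len(labels)

def entropy(ys):
    m = len(ys)
    return -sum(c / m * math.log2(c / m) for c in Counter(ys).values())

# Gain and gain ratio for the categorical attribute Outlook (column 0).
groups = {}
for row, y in zip(data, labels):
    groups.setdefault(row[0], []).append(y)
gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
print(f"Entropy(S)         = {entropy(labels):.3f}")    # ~0.940 (9 Yes / 5 No)
print(f"Gain(Outlook)      = {gain:.3f}")               # ~0.247
print(f"GainRatio(Outlook) = {gain / split_info:.3f}")  # ~0.156

# Continuous attribute Humidity (column 2): scan thresholds "Humidity <= t".
best_gain, best_t = -1.0, None
for t in sorted({row[2] for row in data})[:-1]:          # skip the maximum (empty right side)
    left = [y for r, y in zip(data, labels) if r[2] <= t]
    right = [y for r, y in zip(data, labels) if r[2] > t]
    g = entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)
    if g > best_gain:
        best_gain, best_t = g, t
print(f"Best Humidity split: <= {best_t} (gain {best_gain:.3f})")  # <= 80 with gain ~0.102 here
```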
C4.5 vs ID3
Feature                 ID3                C4.5
Splitting Criterion     Information Gain   Gain Ratio
Attribute Types         Categorical only   Categorical + Numerical
Pruning                 Not available      Post-pruning
Missing Data Handling   Not supported      Supported
Rule Extraction         Not supported      Supported
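scikit-learn does not implement C4.5 itself (its DecisionTreeClassifier is CART-based), but a short sketch on the example data illustrates the same workflow from the comparison above: numeric attributes handled directly, pruning via cost-complexity, and a readable rule-style printout of the tree. The one-hot encoding of Outlook, the 0/1 encoding of Windy, and the parameter choices are illustrative.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
                    "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"],
    "Temperature": [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71],
    "Humidity":    [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80],
    "Windy":       [0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1],   # Yes = 1, No = 0
    "PlayTennis":  ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})
X = pd.get_dummies(df.drop(columns="PlayTennis"), columns=["Outlook"])  # CART needs numeric inputs
y = df["PlayTennis"]

# criterion="entropy" uses information gain; ccp_alpha enables cost-complexity pruning.
clf = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01, random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))  # textual tree; each root-to-leaf path is a rule
```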
Summary
• C4.5 is a robust and practical decision tree algorithm
• Handles complex data types and overfitting
• Builds accurate and interpretable decision trees