We have used J48 trees in our experiments. J48 is an open-source Java implementation of the C4.5 algorithm, available in the Weka data mining tool.
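As a concrete illustration, the minimal sketch below trains a J48 tree through the Weka Java API and prints the resulting model. The file name `training.arff` is a placeholder, and the options shown are simply Weka's defaults (pruning confidence 0.25, minimum of 2 instances per leaf), not the settings of our experiments:

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Example {
    public static void main(String[] args) throws Exception {
        // Load a training set in ARFF format (placeholder file name)
        Instances data = new DataSource("training.arff").getDataSet();
        // By convention, treat the last attribute as the class label
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        // Weka's default options: pruning confidence 0.25, at least 2 instances per leaf
        tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});
        tree.buildClassifier(data);

        // Print the learned tree in human-readable form
        System.out.println(tree);
    }
}
```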
C4.5 builds decision trees from a set of training data using the concept of information entropy. The training data is a set of already classified samples; each sample consists of a feature vector together with the class to which the sample belongs. At each node of the tree, C4.5 chooses the attribute that most effectively splits the set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (gain ratio): the reduction in entropy achieved by the split, normalized by the entropy of the split itself. The attribute with the highest gain ratio is chosen to make the decision, and the algorithm then recurses on the resulting partitions.
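To make the splitting criterion concrete, the following self-contained sketch computes the entropy, information gain, and gain ratio for one hypothetical three-way split; the class counts are the classic "play tennis" toy numbers, used purely for illustration:

```java
public class GainRatioExample {
    // Shannon entropy (in bits) of a class distribution given as counts
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Toy parent node: 14 samples, 9 positive / 5 negative
        int[] parent = {9, 5};
        // A hypothetical three-valued attribute partitions the samples as follows
        int[][] children = {{2, 3}, {4, 0}, {3, 2}};
        int total = 14;

        double weighted = 0.0;   // expected entropy after the split
        double splitInfo = 0.0;  // entropy of the split itself
        for (int[] child : children) {
            int n = child[0] + child[1];
            double w = (double) n / total;
            weighted += w * entropy(child);
            splitInfo -= w * (Math.log(w) / Math.log(2));
        }
        double infoGain = entropy(parent) - weighted; // reduction in entropy
        double gainRatio = infoGain / splitInfo;      // C4.5's normalized criterion
        System.out.printf("gain=%.3f  gainRatio=%.3f%n", infoGain, gainRatio);
    }
}
```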
Decision trees have a number of advantages:
- Simple to understand and interpret
- Can generate important insights about the data
- Help determine worst, best, and expected values for different scenarios
- Use a white-box model, so the decision logic is directly visible
- Can be combined with other decision techniques
They also have disadvantages:
- They are unstable: a small change in the data can lead to a large change in the structure of the optimal decision tree
- They are often relatively inaccurate on their own; random forests of multiple decision trees typically give better results
- Calculations can become very complex, particularly when many values are uncertain or many outcomes are linked