Classification by Decision Tree Induction
Decision tree induction is the learning of decision trees from class-labeled training tuples.
A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute,
each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label.
(Figure: an example decision tree for the concept buys_computer, indicating whether a customer is likely to purchase a computer; each internal node denotes a test on an attribute and each leaf holds a class label.)
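To make the flowchart analogy concrete, here is a minimal Python sketch of how a tuple is classified by following attribute tests from the root to a leaf. The dict-based node layout and the helper name classify are illustrative assumptions, and the small hand-built tree is simply consistent with the example worked out later in this section rather than quoted from it:

```python
# Minimal sketch: a node is either a class label (leaf) or a dict of the form
# {"attribute": name, "branches": {outcome: subtree}}.
# This layout is an illustrative assumption, not the text's notation.

def classify(node, tuple_):
    """Follow attribute tests from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):             # internal node: test an attribute
        value = tuple_[node["attribute"]]     # outcome of the test
        node = node["branches"][value]        # follow the matching branch
    return node                               # leaf node: the class label

# A tiny hand-built tree in the spirit of the buys_computer example.
tree = {
    "attribute": "age",
    "branches": {
        "youth":       {"attribute": "student",
                        "branches": {"yes": "yes", "no": "no"}},
        "middle_aged": "yes",
        "senior":      {"attribute": "credit_rating",
                        "branches": {"fair": "yes", "excellent": "no"}},
    },
}

print(classify(tree, {"age": "youth", "student": "yes"}))   # -> yes
```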
During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).
This work expanded on earlier work on concept learning systems, described by E. B. Hunt, J. Marin, and P. T. Stone.
In 1984, a group of statisticians (L. Breiman, J. Friedman, R. Olshen, and C. Stone) published the book Classification
and Regression Trees (CART), which described the generation of binary decision trees.
Let A be the splitting attribute at node N. The tuples are partitioned according to one of three scenarios (see the sketch after this list):
1. A is discrete-valued: In this case, the outcomes of the test at node N correspond directly to the known values of A. A branch is created for each known value, aj, of A and labeled with that value. Because all of the tuples in a given partition have the same value for A, A need not be considered in any future partitioning of the tuples.
2. A is continuous-valued: In this case, the test at node N has two possible outcomes, corresponding to the conditions A <= split_point and A > split_point, where split_point is returned by Attribute selection method as part of the splitting criterion.
3. A is discrete-valued and a binary tree must be produced: The test at node N is of the form "A ∈ SA?", where SA is the splitting subset for A, returned by Attribute selection method as part of the splitting criterion. It is a subset of the known values of A. If a given tuple has value aj of A and aj ∈ SA, then the test at node N is satisfied.
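The three kinds of tests can be sketched as follows; the function names, the dict-based tuple layout, and the continuous attribute income_value are assumptions made for illustration only:

```python
# Illustrative sketch of the three test types described above.

def multiway_branch(tuple_, attribute):
    """Discrete-valued A: one branch per known value of A."""
    return tuple_[attribute]                    # e.g. "youth" / "middle_aged" / "senior"

def threshold_branch(tuple_, attribute, split_point):
    """Continuous-valued A: two branches, A <= split_point and A > split_point."""
    return "<=" if tuple_[attribute] <= split_point else ">"

def subset_branch(tuple_, attribute, splitting_subset):
    """Discrete-valued A with a binary tree: test whether A is in S_A."""
    return "yes" if tuple_[attribute] in splitting_subset else "no"

t = {"age": "youth", "income_value": 42_000}    # income_value is a made-up continuous attribute
print(multiway_branch(t, "age"))                          # -> youth
print(threshold_branch(t, "income_value", 50_000))        # -> <=
print(subset_branch(t, "age", {"youth", "middle_aged"}))  # -> yes
```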
An attribute selection measure is a heuristic for selecting the splitting criterion that "best" separates a given data partition, D, of class-labeled training tuples into individual classes.
Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split. Three popular attribute selection measures are:
1. Information gain
2. Gain ratio
3. Gini index
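Only information gain is worked through in detail below. For reference, the sketch that follows gives the standard formulas behind the impurity side of each measure (entropy for information gain, split information for gain ratio, and Gini impurity for the Gini index); these definitions are assumed from the usual literature rather than quoted from this text:

```python
from math import log2

# p is a list of class proportions for a data partition D (they should sum to 1).

def entropy(p):
    """Info(D), the quantity used by information gain."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def split_info(partition_sizes):
    """SplitInfo_A(D), the normaliser used by gain ratio."""
    total = sum(partition_sizes)
    return -sum((n / total) * log2(n / total) for n in partition_sizes if n > 0)

def gini(p):
    """Gini(D), the impurity used by the Gini index measure."""
    return 1 - sum(pi ** 2 for pi in p)

print(round(entropy([9/14, 5/14]), 3))   # -> 0.94 (matches the 0.940 bits used below)
print(round(gini([9/14, 5/14]), 3))      # -> 0.459
```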
Information gain
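For reference, the quantities used in the worked example below follow the standard definitions (the notation is the usual one and is assumed here, since it is not introduced elsewhere in this excerpt): p_i is the proportion of tuples in D belonging to class C_i, m is the number of classes, and D_1, ..., D_v are the partitions produced by splitting D on the v distinct values of attribute A.

\[
\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i,
\qquad
\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j),
\qquad
\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)
\]

The attribute with the highest Gain(A) is chosen as the splitting attribute.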
The training set, D, consists of class-labeled tuples randomly selected from the AllElectronics customer database; each tuple is described by the attributes age, income, student, and credit_rating. In this example, each attribute is discrete-valued.
The class label attribute, buys_computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes.
There are nine tuples of class 'yes' and five tuples of class 'no', so the expected information needed to classify a tuple in D is Info(D) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940 bits.
Calculate information gain for age:
For age = youth there are 2 'yes' and 3 'no' tuples, for age = middle_aged there are 4 'yes' and 0 'no' tuples, and for age = senior there are 3 'yes' and 2 'no' tuples.
Info_age(D) = (5/14) × (-(2/5)log2(2/5) - (3/5)log2(3/5))
            + (4/14) × 0 (the middle_aged partition is pure)
            + (5/14) × (-(3/5)log2(3/5) - (2/5)log2(2/5))
            = 0.694 bits
Gain(age) = 0.940 - 0.694 = 0.246 bits
Calculate information gain for income:
For income = low there are 3 'yes' and 1 'no' tuples, for income = medium there are 4 'yes' and 2 'no' tuples, and for income = high there are 2 'yes' and 2 'no' tuples.
Info_income(D) = (4/14) × (-(3/4)log2(3/4) - (1/4)log2(1/4))
              + (6/14) × (-(4/6)log2(4/6) - (2/6)log2(2/6))
              + (4/14) × (-(2/4)log2(2/4) - (2/4)log2(2/4))
              = 0.911 bits
Gain(income) = 0.940 - 0.911 = 0.029 bits
Calculate information gain for student:
For student = yes there are 6 'yes' and 1 'no' tuples, and for student = no there are 3 'yes' and 4 'no' tuples.
Info_student(D) = (7/14) × (-(6/7)log2(6/7) - (1/7)log2(1/7))
               + (7/14) × (-(3/7)log2(3/7) - (4/7)log2(4/7))
               = 0.789 bits
Gain(student) = 0.940 - 0.789 = 0.151 bits
Calculate information gain for credit_rating:
For credit_rating = fair there are 6 'yes' and 2 'no' tuples, and for credit_rating = excellent there are 3 'yes' and 3 'no' tuples.
Info_credit_rating(D) = (8/14) × (-(6/8)log2(6/8) - (2/8)log2(2/8))
                      + (6/14) × (-(3/6)log2(3/6) - (3/6)log2(3/6))
                      = 0.892 bits
Gain(credit_rating) = 0.940 - 0.892 = 0.048 bits
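The arithmetic above is easy to check programmatically. The sketch below recomputes the four gains from the per-value class counts used in the calculations (the helper names are mine, not from the text):

```python
from math import log2

def info(counts):
    """Entropy Info(D) of a partition given its class counts, e.g. (9, 5)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(partitions):
    """Information gain of an attribute whose values split D into `partitions`
    (a list of per-value class-count tuples, here (yes, no))."""
    overall = tuple(map(sum, zip(*partitions)))   # class counts for all of D
    total = sum(overall)
    expected = sum(sum(p) / total * info(p) for p in partitions)
    return info(overall) - expected

# (yes, no) counts per attribute value, as used in the calculations above.
print(round(gain([(2, 3), (4, 0), (3, 2)]), 3))   # age           -> 0.247
print(round(gain([(3, 1), (4, 2), (2, 2)]), 3))   # income        -> 0.029
print(round(gain([(6, 1), (3, 4)]), 3))           # student       -> 0.152
print(round(gain([(6, 2), (3, 3)]), 3))           # credit_rating -> 0.048
# The small differences from 0.246 and 0.151 above come from rounding the
# intermediate entropies to three decimals in the hand calculation.
```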
Because age has the highest information gain among the attributes, it is selected as the splitting attribute: node N is labeled with age, and a branch is grown for each of the attribute's values, with the tuples partitioned accordingly.
Notice that the tuples falling into the partition for age = middle_aged all belong to the same class. Because they all belong to class "yes," a leaf is created at the end of this branch and labeled "yes."
The final decision tree returned by the algorithm is shown in the figure below.
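The construction just described (choose the attribute with the highest information gain, grow a branch for each of its values, and stop with a leaf when a partition is pure) is naturally recursive. The following is a minimal, illustrative Python sketch of that idea, not the exact algorithm from the text; it assumes tuples are plain dicts keyed by attribute name, labels are strings, and majority voting handles the case where no attributes remain:

```python
from collections import Counter
from math import log2

def info(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting (rows, labels) on a discrete attribute."""
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    expected = sum(len(ls) / len(labels) * info(ls) for ls in parts.values())
    return info(labels) - expected

def build_tree(rows, labels, attrs):
    """Top-down, recursive, greedy induction using information gain."""
    if len(set(labels)) == 1:                   # pure partition -> leaf
        return labels[0]
    if not attrs:                               # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {"attribute": best, "branches": {}}
    for value in {row[best] for row in rows}:   # one branch per known value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return node
```

Called on the fourteen AllElectronics tuples with attrs = ["age", "income", "student", "credit_rating"], this sketch would place age at the root, in line with the hand calculation above.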