
CLASSIFICATION BY DECISION TREE INDUCTION

Decision tree induction is the learning of decision trees from class-labeled training tuples.

A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label.

The topmost node in a tree is the root node.


Figure: A decision tree for the concept buys_computer, indicating whether a customer at AllElectronics is likely to purchase a computer. Each internal (non-leaf) node represents a test on an attribute. Each leaf node represents a class (either buys_computer = yes or buys_computer = no).
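To make the structure concrete, the tree in the figure can be read as a chain of attribute tests. The sketch below is only an illustration: it assumes the usual AllElectronics tree shape (age at the root, student tested under youth, credit_rating tested under senior), which matches the tree derived at the end of this document.

def classify(t):
    """Walk the assumed buys_computer tree for one tuple t (a dict of attribute values)."""
    if t["age"] == "youth":                                   # internal node: test on age
        return "yes" if t["student"] == "yes" else "no"       # leaf nodes
    elif t["age"] == "middle_aged":
        return "yes"                                          # every middle_aged tuple buys a computer
    else:                                                     # senior
        return "yes" if t["credit_rating"] == "fair" else "no"

print(classify({"age": "youth", "student": "yes", "credit_rating": "fair"}))  # yes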


Decision Tree Induction

During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).

This work expanded on earlier work on concept learning systems, described by E. B. Hunt, J. Marin, and P. T. Stone.

Quinlan later presented C4.5, a successor of ID3.

In 1984, a group of statisticians (L. Breiman, J. Friedman, R. Olshen, and C. Stone) published the book Classification and Regression Trees (CART), which described the generation of binary decision trees.
Let A be the splitting attribute selected for node N. There are three possible scenarios for partitioning the tuples in D on A.

A is discrete-valued:

In this case, the outcomes of the test at node N correspond directly to the known values of A. A branch is created for each known value, aj, of A and labeled with that value. Partition Dj is the subset of class-labeled tuples in D having value aj of A. Because all of the tuples in a given partition have the same value for A, A need not be considered in any future partitioning of the tuples.


A is continuous-valued:

In this case, the test at node N has two possible outcomes, corresponding to the conditions A <= split_point and A > split_point, respectively.


A is discrete-valued and a binary tree must be produced:

The test at node N is of the form "A ∈ SA?", where SA is the splitting subset for A, returned by the attribute selection method as part of the splitting criterion. It is a subset of the known values of A. If a given tuple has value aj of A and aj ∈ SA, then the test at node N is satisfied.
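A minimal Python sketch of these three partitioning scenarios, assuming each tuple is represented as a dict of attribute values (the helper names are illustrative, not part of any standard API):

from collections import defaultdict

def split_discrete(D, A):
    """One branch (partition Dj) per known value aj of discrete attribute A."""
    partitions = defaultdict(list)
    for t in D:
        partitions[t[A]].append(t)
    return dict(partitions)

def split_continuous(D, A, split_point):
    """Two outcomes: A <= split_point and A > split_point."""
    return ([t for t in D if t[A] <= split_point],
            [t for t in D if t[A] > split_point])

def split_binary_subset(D, A, SA):
    """Binary test 'A in SA?' on a discrete attribute A with splitting subset SA."""
    return ([t for t in D if t[A] in SA],
            [t for t in D if t[A] not in SA])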


Attribute Selection Measures

An attribute selection measure is a heuristic for selecting the splitting criterion that "best" separates a given data partition, D, of class-labeled training tuples into individual classes.

Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split.

This section describes three popular attribute selection measures—

1. Information gain

2. Gain ratio

3. Gini index.
Information gain

The training set, D, in this example consists of 14 class-labeled tuples randomly selected from the AllElectronics customer database.
 In the above example, each attribute is discrete-valued.

 Continuous-valued attributes have been generalized.

 The class label attribute, buys_computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (that is, m = 2).

 Let class c1 correspond to 'yes' and class c2 correspond to 'no'.

 There are nine tuples of class 'yes' and five tuples of class 'no'.

 A (root) node N is created for the tuples in D.


To find the splitting criterion for these tuples, we must compute the information gain of each attribute.

The expected information needed to classify a tuple in D is given by

Info(D) = -Σ pi log2(pi), summed over the m classes,

where pi is the probability that an arbitrary tuple in D belongs to class ci, estimated by |ci,D|/|D|. For this training set,

Info(D) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940 bits
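As a quick numeric check of this formula, here is a small Python sketch (the function name info is illustrative) that computes Info(D) from a list of class counts and reproduces 0.940 bits for the 9 'yes' / 5 'no' split:

from math import log2

def info(counts):
    """Expected information (entropy) Info(D) for a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(info([9, 5]), 3))  # 0.94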


Calculate information gain for age:

Age can be:

 youth

 middle_aged

 senior

Calculate entropy for youth:

YOUTH (age = youth)
buys_computer   count
yes             2
no              3

entropy for youth = -(2/5)log2(2/5) - (3/5)log2(3/5)


Calculate entropy for middle_aged:

MIDDLE_AGED (age = middle_aged)
buys_computer   count
yes             4
no              0

entropy for middle_aged = -(4/4)log2(4/4) - (0/4)log2(0/4) = 0 (taking 0 log2 0 = 0)

Calculate entropy for senior:

SENIOR (age = senior)
buys_computer   count
yes             3
no              2

entropy for senior = -(3/5)log2(3/5) - (2/5)log2(2/5)


the expected information needed to classify a tuple in D if the tuples are partitioned according to age is

= (5/14)X(-(2/5)log2(2/5)-(3/5)log2(3/5))

+(4/14)X(0)

+(5/14)X(-(3/5)log2(3/5)-(2/5)log2(2/5))

= 0.694 bits

Hence, the gain in information from such a partitioning would be

Gain(age) = 0.94 - 0.694 = 0.246
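Continuing the numeric check, the weighted sum and the gain for age can be verified with the sketch below (again only an illustration; the helper names are ours). Exact arithmetic gives 0.247 rather than the 0.246 obtained above by first rounding Info(D) to 0.94:

from math import log2

def info(counts):
    """Entropy of a class-count list (same illustrative helper as before)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def expected_info(branches):
    """Info_age(D): weighted average of branch entropies; branches holds per-value [yes, no] counts."""
    total = sum(sum(b) for b in branches)
    return sum((sum(b) / total) * info(b) for b in branches)

info_age = expected_info([[2, 3], [4, 0], [3, 2]])  # youth, middle_aged, senior
print(round(info_age, 3))                 # 0.694
print(round(info([9, 5]) - info_age, 3))  # 0.247 (0.246 in the slides after rounding)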


Calculate information gain for income:

Income can be:

 low

 medium

 high

Calculate entropy for low:

LOW (income = low)
buys_computer   count
yes             3
no              1

entropy for low = -(3/4)log2(3/4) - (1/4)log2(1/4)


Calculate entropy for medium:

MEDIUM (income = medium)
buys_computer   count
yes             4
no              2

entropy for medium = -(4/6)log2(4/6) - (2/6)log2(2/6)

Calculate entropy for high:

HIGH (income = high)
buys_computer   count
yes             2
no              2

entropy for high = -(2/4)log2(2/4) - (2/4)log2(2/4)


the expected information needed to classify a tuple in D if the tuples are partitioned according to income is

= (4/14)X(-(3/4)log2(3/4)-(1/4)log2(1/4))

+(6/14)X(-(4/6)log2(4/6)-(2/6)log2(2/6))

+(4/14)X(-(2/4)log2(2/4)-(2/4)log2(2/4))

= 0.911 bits

Hence, the gain in information from such a partitioning would be

Gain(income) = 0.94 - 0.911 = 0.029
Calculate information gain for student:

student can be:

 yes

 no
Calculate entropy for yes:

YES (student = yes)
buys_computer   count
yes             6
no              1

entropy for yes = -(6/7)log2(6/7) - (1/7)log2(1/7)

Calculate entropy for no:

NO (student = no)
buys_computer   count
yes             3
no              4

entropy for no = -(3/7)log2(3/7) - (4/7)log2(4/7)


the expected information needed to classify a tuple in D if the tuples are partitioned according to student is

= (7/14)X(-(6/7)log2(6/7)-(1/7)log2(1/7))

+(7/14)X(-(3/7)log2(3/7)-(4/7)log2(4/7))

= 0.789 bits

Hence, the gain in information from such a partitioning would be

Gain(student) = 0.94 - 0.789 = 0.151
Calculate information gain for credit_rating:

Credit_rating can be:

 fair

 excellent
Calculate entropy for fair:

FAIR (credit_rating = fair)
buys_computer   count
yes             6
no              2

entropy for fair = -(6/8)log2(6/8) - (2/8)log2(2/8)

Calculate entropy for excellent:

EXCELLENT (credit_rating = excellent)
buys_computer   count
yes             3
no              3

entropy for excellent = -(3/6)log2(3/6) - (3/6)log2(3/6)


the expected information needed to classify a tuple in D if the tuples are partitioned according to credit_rating is

= (8/14)X(-(6/8)log2(6/8)-(2/8)log2(2/8))

+(6/14)X(-(3/6)log2(3/6)-(3/6)log2(3/6))

= 0.892 bits

Hence, the gain in information from such a partitioning would be

Gain(credit_rating) = 0.94 - 0.892 = 0.048
Because age has the highest information gain among the attributes, it is selected as the splitting attribute.

Node N is labeled with age, and branches are grown for each of the attribute’s values.

The tuples are then partitioned accordingly.

Notice that the tuples falling into the partition for age = middle_aged all belong to the same class.

Because they all belong to class “yes,” a leaf should therefore be created at the end of this branch and labeled with “yes.”
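The selection step can also be checked numerically. The sketch below (reusing the illustrative helpers; per-value counts are taken from the tables above) recomputes all four gains and picks the attribute with the maximum; small third-decimal differences from the slides come from rounding Info(D) to 0.94 there:

from math import log2

def info(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(class_counts, branches):
    """Information gain of splitting D (class_counts) into the given branches."""
    total = sum(class_counts)
    expected = sum((sum(b) / total) * info(b) for b in branches)
    return info(class_counts) - expected

splits = {
    "age":           [[2, 3], [4, 0], [3, 2]],   # youth, middle_aged, senior
    "income":        [[3, 1], [4, 2], [2, 2]],   # low, medium, high
    "student":       [[6, 1], [3, 4]],           # yes, no
    "credit_rating": [[6, 2], [3, 3]],           # fair, excellent
}
gains = {a: gain([9, 5], b) for a, b in splits.items()}
print(max(gains, key=gains.get))  # age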
The attribute age has the highest information gain and therefore becomes the splitting attribute at the root node of the decision tree. Branches are grown for each outcome of age. The tuples are shown partitioned accordingly.
The final decision tree returned by the algorithm is shown in the figure below.
