
Random Forest & Decision Tree

Basic Concept
 A decision tree is an important data structure that can be used to solve many computational problems.

Example 9.1: Binary Decision Tree

A B C f
0 0 0 m0
0 0 1 m1
0 1 0 m2
0 1 1 m3
1 0 0 m4
1 0 1 m5
1 1 0 m6
1 1 1 m7
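
As a rough, illustrative sketch (not part of the original slides), the truth table of Example 9.1 can be evaluated by a binary decision tree that tests A at the root, then B, then C; the minterm labels m0–m7 are kept as opaque outcomes.

```python
# A minimal sketch: the truth table of Example 9.1 as a binary decision tree.
# Each internal node tests one attribute (A, B, or C); each leaf holds an outcome m0..m7.

def classify(a: int, b: int, c: int) -> str:
    """Walk the tree root (A) -> B -> C and return the minterm label."""
    if a == 0:
        if b == 0:
            return "m0" if c == 0 else "m1"
        return "m2" if c == 0 else "m3"
    if b == 0:
        return "m4" if c == 0 else "m5"
    return "m6" if c == 0 else "m7"

# Example: the tuple (A=1, B=0, C=1) reaches the leaf labeled m5.
print(classify(1, 0, 1))  # m5
```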

Basic Concept
 In Example 9.1, we considered a decision tree in which the value of every attribute is binary. A decision tree is also possible where attributes are of a continuous data type.

Example 9.2: Decision Tree with numeric data

Some Characteristics
 Decision tree may be n-ary, n ≥ 2.

 There is a special node called root node.

 All nodes drawn as circles (ellipses) are called internal nodes.

 All nodes drawn as rectangular boxes are called terminal nodes or leaf nodes.

 The edges of a node represent the outcomes for the values of the attribute tested at that node.

 In a path, a node with the same label is never repeated.

 A decision tree is not unique, as different orderings of the internal nodes can give different decision trees; a node structure reflecting these properties is sketched below.
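
As a rough, hypothetical sketch (not from the slides), the characteristics above can be captured by a small node structure: internal nodes carry an attribute and one labeled edge per outcome, while leaf nodes carry only a class label.

```python
# A minimal sketch of an n-ary decision tree node, assuming the structure
# described above: internal nodes test an attribute, edges are labeled with
# attribute values, and leaves hold a class label.
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str                               # class label, e.g. "Medium"


@dataclass
class Internal:
    attribute: str                           # attribute tested at this node, e.g. "Gender"
    children: Dict[object, Union["Internal", Leaf]] = field(default_factory=dict)
    # keys are the edge labels (attribute values), values are the subtrees


def classify(node, record: Dict[str, object]) -> str:
    """Follow the edge matching the record's value at each internal node until a leaf is reached."""
    while isinstance(node, Internal):
        node = node.children[record[node.attribute]]
    return node.label
```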

Decision Tree and Classification Task
 A decision tree helps us classify data.

 Internal nodes correspond to attributes

 Edges correspond to the values of those attributes

 External (leaf) nodes are the outcomes of the classification

 Such a classification is, in fact, made by posing questions, starting from the root node and following a path down to a terminal node.

Decision Tree and Classification Task
 Example 9.3 illustrates how we can solve a classification
problem by asking a series of questions about the attributes.
 Each time we receive an answer, a follow-up question is asked until we
reach a conclusion about the class label of the test record.

 The series of questions and their answers can be organized in
the form of a decision tree
 As a hierarchical structure consisting of nodes and edges

 Once a decision tree is built, it can be applied to any test record to
classify it.

Definition of Decision Tree

Definition 9.1: Decision Tree

Given a database D = {t1, t2, …, tn}, where ti denotes a tuple defined by a set of
attributes A = {A1, A2, …, Am}, and given a set of classes C = {c1, c2, …, ck}.

A decision tree T is a tree associated with D that has the following properties:
• Each internal node is labeled with an attribute Ai
• Each edge is labeled with a predicate that can be applied to the attribute
associated with the parent node of that edge
• Each leaf node is labeled with a class cj

Building Decision Tree
 In principle, there are exponentially many decision trees that can be
constructed from a given database (also called training data).
 Some of the trees may not be optimal

 Some of them may give inaccurate results

 Two approaches are known

 Greedy strategy
 A top-down recursive divide-and-conquer

 Modification of greedy strategy
 ID3
 C4.5
 CART, etc.

Build Decision Tree Algorithm
 Algorithm BuildDT
 Input: D : Training data set
 Output: T : Decision tree
Steps
1. If all tuples in D belong to the same class Cj
Add a leaf node labeled as Cj
Return // Termination condition

2. Select an attribute Ai (so that it is not selected twice in the same branch)
3. Partition D = { D1, D2, …, Dp} based on the p different values of Ai in D
4. For each Dk ∈ D
Create a node for Dk and add an edge between D and Dk labeled with the value of Ai in Dk

5. For each Dk ∈ D
BuildDT(Dk) // Recursive call
6. Stop
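
For concreteness, here is a minimal Python sketch of the BuildDT steps above. It assumes categorical attributes and tuples represented as dictionaries with a "class" key; the representation and helper names are illustrative, not prescribed by the slides.

```python
# A minimal sketch of BuildDT, assuming categorical attributes and a training
# set given as a list of dicts, each with a "class" key.
from collections import defaultdict


def build_dt(D, attributes):
    classes = {t["class"] for t in D}
    # Step 1: if all tuples belong to the same class, return a leaf node.
    if len(classes) == 1:
        return {"leaf": classes.pop()}
    if not attributes:
        # No attribute left on this branch: fall back to the majority class
        # (a common tie-breaking choice, not spelled out in the slides).
        majority = max(classes, key=lambda c: sum(t["class"] == c for t in D))
        return {"leaf": majority}

    # Step 2: select an attribute not yet used on this branch (here simply the
    # first one; ID3, C4.5 and CART use a purity measure instead).
    A = attributes[0]

    # Step 3: partition D by the distinct values of A.
    partitions = defaultdict(list)
    for t in D:
        partitions[t[A]].append(t)

    # Steps 4-5: create a child node per partition, label the edge with the
    # attribute value, and recurse.
    node = {"attribute": A, "children": {}}
    for value, Dk in partitions.items():
        node["children"][value] = build_dt(Dk, attributes[1:])
    return node


# Tiny usage example with hypothetical records:
data = [{"Gender": "F", "class": "S"},
        {"Gender": "M", "class": "M"},
        {"Gender": "M", "class": "M"}]
print(build_dt(data, ["Gender"]))
```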

Node Splitting in BuildDT Algorithm
 The BuildDT algorithm must provide a method for expressing an attribute test
condition and the corresponding outcomes for different attribute types

 Case: Binary attribute


 This is the simplest case of node splitting

 The test condition for a binary attribute generates only two outcomes

Node Splitting in BuildDT Algorithm
 Case: Nominal attribute
 Since a nominal attribute can have many values, its test condition can be expressed
in two ways:
 A multi-way split
 A binary split

 Multi-way split: the number of outcomes depends on the number of distinct values of the
corresponding attribute

 Binary splitting by grouping attribute values (see the sketch below)
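
A small, hypothetical illustration of the two options for a nominal attribute (the attribute "Marital Status" and its values are assumed here, not taken from the slides):

```python
# Multi-way vs. binary split for a nominal attribute (values are illustrative).
values = ["Single", "Married", "Divorced"]

# Multi-way split: one outcome (child branch) per distinct value -> 3 outcomes.
multi_way = {v: f"branch_{v}" for v in values}

# Binary split: group the values into two subsets; for a nominal attribute any
# 2-way grouping is allowed, since there is no order to preserve.
binary_split = ({"Single"}, {"Married", "Divorced"})


def follow_binary(value):
    """Return which side of the binary split a record's value falls on."""
    left, _right = binary_split
    return "left" if value in left else "right"


print(multi_way)
print(follow_binary("Married"))   # right
```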

Node Splitting in BuildDT Algorithm
 Case: Ordinal attribute
 It also can be expressed in two ways:

 A multi-way split
 A binary split

 Multi-way split: it is the same as in the case of a nominal attribute

 Binary split: attribute values should be grouped while maintaining the order property
of the attribute values (see the check sketched below)
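
For example, for an ordinal attribute such as Shirt Size with the order Small < Medium < Large (attribute and values assumed for illustration), the grouping {Small, Medium} vs {Large} respects the order, while {Small, Large} vs {Medium} does not:

```python
# Check whether a binary grouping of an ordinal attribute preserves its order.
order = ["Small", "Medium", "Large"]      # assumed ordering, for illustration


def is_contiguous(group):
    """True if the group occupies consecutive positions in the ordering."""
    ranks = sorted(order.index(v) for v in group)
    return ranks == list(range(ranks[0], ranks[0] + len(ranks)))


def respects_order(left, right):
    """A valid binary split keeps each group contiguous in the ordering."""
    return is_contiguous(left) and is_contiguous(right)


print(respects_order({"Small", "Medium"}, {"Large"}))   # True
print(respects_order({"Small", "Large"}, {"Medium"}))   # False
```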

Node Splitting in BuildDT Algorithm
 Case: Numerical attribute
 For a numeric attribute (with discrete or continuous values), a test condition can be
expressed as a comparison test

 Binary outcome: A > v or A ≤ v

 In this case, decision tree induction must consider all possible split positions

 Range query: vi ≤ A < vi+1 for i = 1, 2, …, q (if q ranges are chosen)

 Here, q should be decided a priori

 For a numeric attribute, decision tree induction is therefore a combinatorial
optimization problem
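
One common way to enumerate candidate positions for a binary split A ≤ v (a sketch; the slides do not prescribe a particular method, and the values below are illustrative) is to sort the distinct values of the attribute and take the midpoints between consecutive ones:

```python
# Candidate thresholds v for a binary test A <= v on a numeric attribute:
# midpoints between consecutive distinct values observed in the training data.
values = [1.6, 1.7, 1.75, 1.8, 1.85, 1.9, 2.0, 2.1]   # illustrative sample

distinct = sorted(set(values))
candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
print(candidates)   # prints the candidate thresholds

# Induction would evaluate the purity of the split A <= v for every candidate v
# and keep the best one; with a range query, q ranges must be fixed a priori.
```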

Illustration : BuildDT Algorithm
Example 9.4: Illustration of BuildDT Algorithm
 Consider a training data set as shown.

Person   Gender   Height   Class
1        F        1.6      S
2        M        2.0      M
3        F        1.9      M
4        F        1.88     M
5        F        1.7      S
6        M        1.85     M
7        F        1.6      S
8        M        1.7      S
9        M        2.2      T
10       M        2.1      T
11       F        1.8      M
12       M        1.95     M
13       F        1.9      M
14       F        1.8      M
15       F        1.75     S

Attributes:
Gender = {Male (M), Female (F)}              // Binary attribute
Height = {1.5, …, 2.5}                       // Continuous attribute
Class  = {Short (S), Medium (M), Tall (T)}

Given a person, we are to test in which class s/he belongs

Illustration : BuildDT Algorithm
 To build a decision tree, we can select the attributes in two different orderings:
<Gender, Height> or <Height, Gender>

 Further, for each ordering, we can choose different ways of splitting

 Different instances are shown in the following.

 Approach 1 : <Gender, Height>
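
The original slides present the resulting trees as figures. As a rough reconstruction from the training data above (the thresholds are read off Example 9.4, and the exact tree drawn in the slides may differ), selecting Gender first and then splitting on Height gives a tree such as:

```python
# One possible tree for Approach 1 <Gender, Height>; the Height thresholds are
# inferred from the Example 9.4 data and are only one consistent choice.

def classify(gender: str, height: float) -> str:
    if gender == "F":
        # Female records: heights <= 1.75 are Short, the rest are Medium.
        return "S" if height <= 1.75 else "M"
    # Male records: <= 1.7 Short, <= 2.0 Medium, above 2.0 Tall.
    if height <= 1.7:
        return "S"
    if height <= 2.0:
        return "M"
    return "T"


# Person 9 (M, 2.2) -> T and Person 15 (F, 1.75) -> S, matching the table.
print(classify("M", 2.2), classify("F", 1.75))
```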

Illustration : BuildDT Algorithm
 Approach 2 : <Height, Gender>

