21 - Data Clustering (K-Means Clustering Algorithm), Predictive Analytics & Decision Tree (11-04-2023)

Decision Tree
Basic Concept
A decision tree is an important data structure that can be used to solve many
computational problems.
Example 9.1: The truth table below defines a Boolean function f(A, B, C); such a function can be represented by a decision tree that tests one variable at each internal node.
A B C f
0 0 0 m0
0 0 1 m1
0 1 0 m2
0 1 1 m3
1 0 0 m4
1 0 1 m5
1 1 0 m6
1 1 1 m7
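For instance (an illustrative rendering, not taken from the slides), such a tree can be written as nested tests over A, B and C; each root-to-leaf path corresponds to one row of the truth table:

```python
# Illustrative sketch: a decision tree for the 3-variable Boolean function
# above, written as nested tests. Each internal node tests one variable;
# each leaf returns the output (minterm) reached for that input combination.
def f(a: int, b: int, c: int) -> str:
    if a == 0:
        if b == 0:
            return "m0" if c == 0 else "m1"
        return "m2" if c == 0 else "m3"
    if b == 0:
        return "m4" if c == 0 else "m5"
    return "m6" if c == 0 else "m7"

# Every input follows exactly one root-to-leaf path, e.g. A=1, B=0, C=1 -> m5
assert f(1, 0, 1) == "m5"
```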
Basic Concept
In Example 9.1, we considered a decision tree where the values of every
attribute are binary only. Decision trees are also possible where attributes are
of a continuous data type.
Some Characteristics
A decision tree may be n-ary, n ≥ 2.
All nodes drawn with a circle (ellipse) are called internal nodes.
All nodes drawn with a rectangular box are called terminal nodes or leaf
nodes.
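As a concrete data structure (an assumed sketch, not notation from the slides), such an n-ary tree can be represented by nodes that either carry an attribute and one child per outgoing edge, or a class label when they are leaves:

```python
# Minimal sketch of an n-ary decision tree node (names are illustrative):
# internal nodes carry an attribute and one child per edge label,
# leaf nodes carry only a class label.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None                             # set on internal nodes
    children: Dict[str, "Node"] = field(default_factory=dict)   # edge label -> child
    label: Optional[str] = None                                  # set on leaf (terminal) nodes

    def is_leaf(self) -> bool:
        return self.label is not None
```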
Decision Tree and Classification Task
A decision tree helps us classify data.
Example 9.3 illustrates how we can solve a classification
problem by asking a series of questions about the attributes.
Each time we receive an answer, a follow-up question is asked until we
reach a conclusion about the class label of the test record.
Definition of Decision Tree
A decision tree T is a tree associated with a data set D that has the following properties:
• Each internal node is labeled with an attribute Ai
• Each edge is labeled with a predicate that can be applied to the attribute
associated with its parent node
• Each leaf node is labeled with a class cj
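Under this definition, classifying a record is a walk from the root to a leaf: at each internal node the attribute Ai is tested, the edge whose predicate holds is followed, and the class label of the leaf reached is returned. A minimal sketch, reusing the Node class sketched above and using made-up attribute and class names:

```python
# Hedged sketch: classification as a root-to-leaf walk, reusing the Node class
# sketched above. The tree, attribute names (A1, A2) and classes (c1, c2) are
# all hypothetical.
def classify(node: Node, record: dict) -> str:
    while not node.is_leaf():
        value = record[node.attribute]     # apply the test on attribute Ai
        node = node.children[value]        # follow the edge whose predicate holds
    return node.label                      # the leaf gives the class cj

leaf_c1, leaf_c2 = Node(label="c1"), Node(label="c2")
root = Node(attribute="A1",
            children={"low": leaf_c1,
                      "high": Node(attribute="A2",
                                   children={"yes": leaf_c2, "no": leaf_c1})})
print(classify(root, {"A1": "high", "A2": "yes"}))     # -> c2
```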
Building Decision Tree
In principle, there are exponentially many decision trees that can be
constructed from a given database (also called training data).
Some of these trees may not be optimal.
Practical induction algorithms therefore use a greedy strategy:
a top-down, recursive, divide-and-conquer approach,
e.g. C4.5, CART, etc. (a small example using a CART-style learner is sketched below).
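The slides do not prescribe any particular library; as one hedged illustration, scikit-learn's DecisionTreeClassifier implements a CART-style greedy, top-down induction (C4.5 itself is not provided by scikit-learn). The tiny data set below is invented purely for demonstration:

```python
# Hedged example: a CART-style tree built greedily, top-down, with scikit-learn.
# The data set here is made up for illustration.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 158], [0, 172], [1, 180], [1, 165], [0, 150], [1, 190]]  # [gender_code, height_cm]
y = ["Short", "Medium", "Tall", "Medium", "Short", "Tall"]

clf = DecisionTreeClassifier(criterion="gini", max_depth=3)  # greedy impurity-based splits
clf.fit(X, y)
print(clf.predict([[0, 185]]))   # class predicted by following the induced tree
```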
Build Decision Tree Algorithm
Algorithm BuildDT
Input: D : Training data set
Output: T : Decision tree
Steps
1. If all tuples in D belong to the same class Cj
   Add a leaf node labeled as Cj
   Return // Termination condition
2. Select an attribute Ai (so that it is not selected twice in the same branch)
3. Partition D = { D1, D2, …, Dp } based on the p different values of Ai in D
4. For each Dk ∈ D
   Create a node and add an edge between D and Dk labeled with the value of Ai found in Dk
5. For each Dk ∈ D
   BuildDT(Dk) // Recursive call
6. Stop
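A runnable Python sketch of this procedure follows, under stated assumptions: each tuple is a dict whose class label is stored under the key "class", the attribute to split on is chosen naively (first unused attribute, since the slides leave the selection criterion open here), and a majority-class leaf is added when no attribute remains in a branch:

```python
# Hedged sketch of BuildDT. Assumptions: tuples are dicts, the class label is
# stored under "class", and the split attribute is chosen naively (first unused).
# The majority-class fallback is added only to make the sketch runnable.
def build_dt(D, attributes):
    classes = {t["class"] for t in D}
    if len(classes) == 1:                             # Step 1: all tuples in one class Cj
        return {"label": classes.pop()}               # add a leaf labeled Cj and return
    if not attributes:                                # no attribute left in this branch
        majority = max(classes, key=lambda c: sum(t["class"] == c for t in D))
        return {"label": majority}
    Ai = attributes[0]                                # Step 2: select an attribute Ai
    partitions = {}                                   # Step 3: partition D by the values of Ai
    for t in D:
        partitions.setdefault(t[Ai], []).append(t)
    node = {"attribute": Ai, "children": {}}
    for value, Dk in partitions.items():              # Steps 4-5: one labeled edge and one
        node["children"][value] = build_dt(Dk, attributes[1:])  # recursive call per Dk
    return node

# Tiny invented data set, only to show the call:
D = [{"Gender": "M", "Height": "Tall",  "class": "A"},
     {"Gender": "F", "Height": "Short", "class": "B"},
     {"Gender": "M", "Height": "Short", "class": "B"}]
tree = build_dt(D, ["Gender", "Height"])
```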
Node Splitting in BuildDT Algorithm
The BuildDT algorithm must provide a method for expressing an attribute test
condition and the corresponding outcomes for different attribute types.
Case: Binary attribute
The test condition for a binary attribute generates only two outcomes.
Node Splitting in BuildDT Algorithm
Case: Nominal attribute
Since a nominal attribute can have many values, its test condition can be expressed
in two ways:
A multi-way split
A binary split
Multi-way split: the number of outcomes depends on the number of distinct values of the
corresponding attribute
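For example, for a nominal attribute with values such as {Family, Sports, Luxury} (a hypothetical attribute, not from the slides), the multi-way split has one outcome per value, while the binary splits are all ways of grouping the values into two non-empty sets:

```python
# Sketch (the attribute and its values are hypothetical): enumerating the
# possible splits of a nominal attribute with values {Family, Sports, Luxury}.
from itertools import combinations

values = ["Family", "Sports", "Luxury"]

# Multi-way split: one outcome per distinct attribute value.
multi_way = [{v} for v in values]

# Binary split: every grouping of the values into two non-empty sets.
# For k distinct values there are 2**(k - 1) - 1 such groupings.
binary_splits = []
first, rest = values[0], values[1:]
for r in range(len(rest) + 1):
    for extra in combinations(rest, r):
        left = {first, *extra}
        right = set(values) - left
        if right:                      # skip the degenerate split with an empty side
            binary_splits.append((left, right))
# e.g. ({'Family'}, {'Sports', 'Luxury'}), ({'Family', 'Sports'}, {'Luxury'}), ...
```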
Node Splitting in BuildDT Algorithm
Case: Ordinal attribute
It also can be expressed in two ways:
A multi-way split
A binary split
For a binary split, attribute values should be grouped so that the order property
of the attribute values is maintained
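For instance, for an assumed ordinal attribute Shirt Size with values Small < Medium < Large < Extra Large, only groupings that correspond to a cut point in the ordering are valid binary splits; a grouping such as {Small, Large} vs. {Medium, Extra Large} breaks the order and is not allowed:

```python
# Sketch (Shirt Size is an assumed example): for an ordinal attribute, a valid
# binary split corresponds to a cut point in the ordered list of values.
sizes = ["Small", "Medium", "Large", "Extra Large"]   # already in order

order_preserving = [(set(sizes[:i]), set(sizes[i:])) for i in range(1, len(sizes))]
# valid:   ({Small}, {Medium, Large, Extra Large}), ({Small, Medium}, {Large, Extra Large}), ...
# invalid: ({Small, Large}, {Medium, Extra Large}) -- it mixes non-adjacent values
```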
Node Splitting in BuildDT Algorithm
Case: Numerical attribute
For a numeric attribute (with discrete or continuous values), a test condition can be
expressed as a comparison test.
In this case, decision tree induction must consider all possible split positions.
Range query: vi ≤ A < vi+1 for i = 1, 2, …, q (if q ranges are chosen)
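One common way (an assumption here, not prescribed by the slides) to enumerate the candidate split positions is to take the midpoints between consecutive distinct sorted values and evaluate the binary test A < v versus A ≥ v at each candidate v:

```python
# Sketch (the values are invented): candidate split positions for a numeric
# attribute A, taken as midpoints between consecutive distinct sorted values.
heights = [150, 155, 160, 160, 170, 182]

distinct = sorted(set(heights))
candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
# candidates: [152.5, 157.5, 165.0, 176.0]

v = candidates[2]                                  # one candidate split position
left  = [h for h in heights if h < v]              # outcome of the test A < v
right = [h for h in heights if h >= v]             # outcome of the test A >= v
```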
Illustration : BuildDT Algorithm
Example 9.4: Illustration of BuildDT Algorithm
Consider a training data set as shown.
To build a decision tree, we can select an attribute in two different orderings:
<Gender, Height> or <Height, Gender>
Approach 1 : <Gender, Height>
Approach 2 : <Height, Gender>