
Decision Tree

Decision trees are a popular classification method that represent rules for classifying data. They work by splitting the data into purer subsets based on attribute values, starting with the root node and moving through the tree until a leaf node with a class prediction is reached. The splits are determined by choosing the attribute that creates the largest information gain or reduction in entropy at each node. Decision trees are easy to understand but can overfit data and have difficulty with continuous attributes.

Uploaded by

ricardo_g1973


Decision Tree Algorithm

Comp328 tutorial 1 Kai Zhang

Outline

Introduction
Example
Principles
    Entropy
    Information gain
Evaluations
Demo

The problem

Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples.

Classification Prediction

Why decision tree?

Decision trees are powerful and popular tools for classification and prediction. Decision trees represent rules, which can be understood by humans and used in knowledge systems such as databases.

Key requirements

Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).

Predefined classes (target values): the target function has discrete output values (boolean or multiclass).

Sufficient data: enough training cases should be provided to learn the model.

A simple example

You want to guess the outcome of next week's game between the MallRats and the Chinooks.

Available knowledge / Attribute

Was the game at Home or Away?
Was the starting time 5pm, 7pm or 9pm?
Did Joe play center or forward?
Was the opponent's center tall or not?
...

Basketball data

What we know

The game will be away, at 9pm, and Joe will play center on offense.

A classification problem
Generalizing the learned rule to new examples

Definition

Decision tree is a classifier in the form of a tree structure

Decision node: specifies a test on a single attribute
Leaf node: indicates the value of the target attribute
Arc/edge: one outcome of the split on an attribute
Path: a conjunction of tests that leads to the final decision

Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.
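Walking from the root to a leaf can be sketched in a few lines. The nested-dictionary representation, the attribute names and the toy tree below are hypothetical choices made for illustration; this is not the tree from the tutorial's demo.

```python
# A decision node is a dict {"attribute": name, "branches": {value: subtree}};
# a leaf is simply the predicted class label (a string).
# Hypothetical toy tree, not learned from the tutorial's data.
toy_tree = {
    "attribute": "when",
    "branches": {
        "5pm": "lose",
        "7pm": {
            "attribute": "where",
            "branches": {"home": "win", "away": "lose"},
        },
        "9pm": "lose",
    },
}


def classify(tree, example):
    """Follow the tests from the root until a leaf (class label) is reached."""
    node = tree
    while isinstance(node, dict):
        value = example[node["attribute"]]  # outcome of the test at this node
        node = node["branches"][value]      # follow the matching edge
    return node


new_game = {"where": "away", "when": "9pm", "joe": "center"}
prediction = classify(toy_tree, new_game)
print(prediction)
```

Attributes that never appear on the path (here "joe") are simply ignored for that example.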

Illustration

(1) Which attribute to start with? (root)

(2) Which node to proceed to?

(3) When to stop and come to a conclusion?

Random split

The tree can grow huge.
These trees are hard to understand.
Larger trees are typically less accurate than smaller trees.

Principled Criterion

Selection of an attribute to test at each node: choose the most useful attribute for classifying examples.

Information gain measures how well a given attribute separates the training examples according to their target classification.
This measure is used to select among the candidate attributes at each step while growing the tree.

Entropy

A measure of the homogeneity (purity) of a set of examples: the lower the entropy, the more homogeneous the set. Given a set S of positive and negative examples of some target concept (a 2-class problem), the entropy of S relative to this binary classification is

$E(S) = -p(P) \log_2 p(P) - p(N) \log_2 p(N)$

Suppose S has 25 examples, 15 positive and 10 negative [15+, 10-]. Then the entropy of S relative to this classification is

$E(S) = -\frac{15}{25} \log_2 \frac{15}{25} - \frac{10}{25} \log_2 \frac{10}{25} \approx 0.971$
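The [15+, 10-] calculation can be checked with a short function. This is a minimal sketch; the function name and the choice to pass class counts as a list are assumptions made for illustration.

```python
import math


def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:  # a class with zero examples contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


e = entropy([15, 10])
print(round(e, 3))  # 0.971
```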

Some Intuitions

The entropy is 0 if the outcome is "certain".
The entropy is maximum if we have no knowledge of the system (or any outcome is equally possible).

[Figure: entropy of a 2-class problem as a function of the proportion of one of the two classes]
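The shape of that curve can be reproduced numerically. The sketch below tabulates the 2-class entropy at a few proportions; the function name and the sampled proportions are arbitrary choices for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a 2-class set where one class has proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # the outcome is certain
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


proportions = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]
curve = [binary_entropy(p) for p in proportions]
for p, h in zip(proportions, curve):
    print(f"p = {p:4.2f}  entropy = {h:.3f}")
```

The values are symmetric around p = 0.5, where the entropy reaches its maximum of 1 bit, and fall to 0 at both ends.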

Information Gain

Information gain measures the expected reduction in entropy, or uncertainty.

$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$

Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v: S_v = {s in S | A(s) = v}.
The first term in the equation for Gain is just the entropy of the original collection S.
The second term is the expected value of the entropy after S is partitioned using attribute A.

It is simply the expected reduction in entropy caused by partitioning the examples according to this attribute. It is the number of bits saved when encoding the target value of an arbitrary member of S, by knowing the value of attribute A.
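The definition of Gain(S, A) translates almost directly into code. In this sketch each example is a dictionary of attribute values; that representation, the attribute names and the four toy examples are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Gain(S, A): entropy of S minus the weighted entropy of each subset S_v."""
    labels = [ex[target] for ex in examples]
    subsets = defaultdict(list)  # value v -> target labels of S_v
    for ex in examples:
        subsets[ex[attribute]].append(ex[target])
    expected = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - expected


# Hypothetical toy data: "where" separates the classes perfectly, "coin" not at all.
examples = [
    {"where": "home", "coin": "heads", "outcome": "win"},
    {"where": "home", "coin": "tails", "outcome": "win"},
    {"where": "away", "coin": "heads", "outcome": "lose"},
    {"where": "away", "coin": "tails", "outcome": "lose"},
]
gain_where = information_gain(examples, "where", "outcome")
gain_coin = information_gain(examples, "coin", "outcome")
print(gain_where, gain_coin)
```

An attribute that splits the set into pure subsets saves the full 1 bit; an attribute whose subsets are as mixed as the original set saves nothing.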

Examples

Before partitioning, the entropy is

H(10/20, 10/20) = - 10/20 log2(10/20) - 10/20 log2(10/20) = 1

Using the "where" attribute, divide into 2 subsets

Entropy of the first set: H(home) = - 6/12 log2(6/12) - 6/12 log2(6/12) = 1
Entropy of the second set: H(away) = - 4/8 log2(4/8) - 4/8 log2(4/8) = 1

Expected entropy after partitioning

12/20 * H(home) + 8/20 * H(away) = 1
Information gain: 1 - 1 = 0

Using the "when" attribute, divide into 3 subsets

Entropy of the first set: H(5pm) = - 1/4 log2(1/4) - 3/4 log2(3/4) ≈ 0.81
Entropy of the second set: H(7pm) = - 9/12 log2(9/12) - 3/12 log2(3/12) ≈ 0.81
Entropy of the third set: H(9pm) = - 0/4 log2(0/4) - 4/4 log2(4/4) = 0 (taking 0 log2 0 = 0)

Expected entropy after partitioning

4/20 * H(1/4, 3/4) + 12/20 * H(9/12, 3/12) + 4/20 * H(0/4, 4/4) = 0.65
Information gain: 1 - 0.65 = 0.35
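The arithmetic for both attributes can be re-checked from the class counts alone. The sketch below uses the counts that appear in the fractions above (6/6 and 4/4 for home and away; 1/3, 9/3 and 0/4 for 5pm, 7pm and 9pm); the helper names are assumptions made for illustration.

```python
import math


def entropy(counts):
    """Entropy in bits of a class distribution given as counts (0 log 0 = 0)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_from_counts(partition):
    """partition: one list of per-class counts for each subset S_v."""
    class_totals = [sum(column) for column in zip(*partition)]
    total = sum(class_totals)
    expected = sum(sum(subset) / total * entropy(subset) for subset in partition)
    return entropy(class_totals) - expected


where_gain = gain_from_counts([[6, 6], [4, 4]])          # home, away
when_gain = gain_from_counts([[1, 3], [9, 3], [0, 4]])   # 5pm, 7pm, 9pm
print(f"where: {where_gain:.2f}, when: {when_gain:.2f}")
```

Splitting on location leaves both subsets as mixed as the full set, so nothing is gained, while splitting on starting time gains about 0.35 bits.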

Decision

Knowing the "when" attribute values provides larger information gain than "where". Therefore the "when" attribute should be chosen for testing prior to the "where" attribute.
Similarly, we can compute the information gain for other attributes.
At each node, choose the attribute with the largest information gain.

Stopping rule

Stop growing a branch when either
every attribute has already been included along this path through the tree, or
the training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).
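Attribute selection and the two stopping rules combine into a short recursive procedure. The following is a minimal ID3-style sketch, not the code behind the demo; the dictionary representation, the toy games and the majority-class fallback when attributes run out are assumptions of this example.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def gain(examples, attribute, target):
    """Information gain of splitting `examples` on `attribute`."""
    labels = [ex[target] for ex in examples]
    expected = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        expected += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - expected


def build_tree(examples, attributes, target):
    labels = [ex[target] for ex in examples]
    # Stopping rule: all examples share one target value (entropy is zero).
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping rule: every attribute has already been used on this path.
    # Predicting the majority class here is an assumption of this sketch.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the largest information gain.
    best = max(attributes, key=lambda a: gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({ex[best] for ex in examples}):
        subset = [ex for ex in examples if ex[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": best, "branches": branches}


# Hypothetical toy games, not the tutorial's basketball table.
games = [
    {"where": "home", "when": "7pm", "outcome": "win"},
    {"where": "home", "when": "7pm", "outcome": "win"},
    {"where": "away", "when": "7pm", "outcome": "win"},
    {"where": "home", "when": "9pm", "outcome": "lose"},
    {"where": "away", "when": "9pm", "outcome": "lose"},
    {"where": "away", "when": "5pm", "outcome": "lose"},
]
tree = build_tree(games, ["where", "when"], "outcome")
print(tree)
```

On this toy data "when" separates the outcomes perfectly, so it becomes the root and every branch ends in a leaf immediately.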

Demo

Continuous Attribute?

Each non-leaf node is a test, and its edges partition the attribute values into subsets (easy for a discrete attribute).

For a continuous attribute:
Partition the continuous values of attribute A into a discrete set of intervals.
Create a new boolean attribute A_c by looking for a threshold c:

$A_c = \text{true if } A < c, \text{ false otherwise}$

How to choose c?
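One common answer, shown here as a sketch rather than as the tutorial's own method, is to try the midpoints between adjacent distinct values of A and keep the threshold with the largest information gain. The function name, the test A < c and the temperature data below are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (c, gain) for the boolean test A < c with the largest gain."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best_c, best_gain = None, -1.0
    # Candidate thresholds: midpoints between adjacent distinct values.
    for low, high in zip(distinct, distinct[1:]):
        c = (low + high) / 2
        below = [lab for v, lab in zip(values, labels) if v < c]
        above = [lab for v, lab in zip(values, labels) if v >= c]
        expected = (len(below) * entropy(below) + len(above) * entropy(above)) / len(labels)
        if base - expected > best_gain:
            best_c, best_gain = c, base - expected
    return best_c, best_gain


# Hypothetical continuous attribute (temperature) with a boolean target.
temperature = [40, 48, 60, 72, 80, 90]
play = ["no", "no", "yes", "yes", "yes", "no"]
c, g = best_threshold(temperature, play)
print(c, round(g, 3))
```

Once c is fixed, the boolean attribute A_c competes with the discrete attributes in the usual information-gain comparison.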

Evaluation

Training accuracy

How many training instances can be correctly classified based on the available data?
It is high when the tree is deep/large, or when there are few conflicts among the training instances.
However, higher training accuracy does not mean good generalization.

Testing accuracy

Given a number of new instances, how many of them can we correctly classify?
Cross-validation
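Testing accuracy by cross-validation can be sketched without any library. Everything named below is hypothetical: `fit` stands for any learner that returns a prediction function, and a majority-class learner is used as a stand-in so the example stays short; a tree learner could be plugged in the same way.

```python
from collections import Counter


def accuracy(predict, examples, target):
    """Fraction of examples whose target value is predicted correctly."""
    correct = sum(1 for ex in examples if predict(ex) == ex[target])
    return correct / len(examples)


def k_fold_splits(examples, k):
    """Yield (train, test) pairs; fold i holds every k-th example starting at i."""
    for i in range(k):
        test = examples[i::k]
        train = [ex for j, ex in enumerate(examples) if j % k != i]
        yield train, test


def cross_validate(fit, examples, target, k):
    """Average testing accuracy over k folds; each fold is held out once."""
    scores = []
    for train, test in k_fold_splits(examples, k):
        predict = fit(train, target)           # learn on the training part only
        scores.append(accuracy(predict, test, target))
    return sum(scores) / k


def fit_majority(train, target):
    """Stand-in learner: always predicts the majority class of the training set."""
    majority = Counter(ex[target] for ex in train).most_common(1)[0][0]
    return lambda ex: majority


outcomes = ["win", "win", "lose", "win", "win", "win", "lose", "win", "win"]
data = [{"outcome": o} for o in outcomes]
cv_accuracy = cross_validate(fit_majority, data, "outcome", k=3)
print(round(cv_accuracy, 3))
```

The key point is that each fold is scored only on examples the learner did not see during training, which is what distinguishes testing accuracy from training accuracy.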

Strengths

Can generate understandable rules.
Perform classification without much computation.
Can handle continuous and categorical variables.
Provide a clear indication of which fields are most important for prediction or classification.

Weaknesses

Not suitable for predicting the value of a continuous attribute.
Perform poorly with many classes and little data.
Computationally expensive to train.

At each node, each candidate splitting field must be sorted before its best split can be found. In some algorithms, combinations of fields are used and a search must be made for optimal combining weights. Pruning algorithms can also be expensive since many candidate sub-trees must be formed and compared.

Do not handle non-rectangular regions well.
