Decision Tree Part 1


Classification

BY

Mohd Vaseem
M.Tech, IIT Delhi
Pursuing PhD, IIT Kanpur
(Assistant Professor, NIFT Panchkula)
Classification
• A form of data analysis that extracts a model (classifier) to
predict class labels
– class labels are categorical (discrete or nominal)
– builds the classifier from a training set and the values of a
class-label attribute, then uses it to classify new data
• Numeric Prediction
– models continuous-valued functions, i.e., predicts
unknown or missing values
• Typical applications
– Credit/loan approval: loan application is “safe” or “risky”
– Medical diagnosis: tumor is “cancerous” or “benign”
– Fraud detection: transaction is “fraudulent”
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: Training data is accompanied by labels
indicating the class of the observations
– New data is classified based on the training set

• Unsupervised learning (clustering)
– Class labels of training data are unknown
– Given a set of observations, the aim is to establish existence
of classes or clusters in the data
Classification— Two-Step Process
• Model construction: Describe a set of predetermined classes
– Each tuple is assumed to belong to a predefined class, as determined by
the class label attribute
– The model is represented as classification rules, decision trees, or
mathematical formulae

• Model usage: Classify future or unknown objects
– Estimate accuracy of the model
• The known label of each test sample is compared with the
classified result from the model
• Accuracy = percentage of test set samples that are correctly
classified by the model
• Test set is independent of training set (otherwise overfitting)
– If the accuracy is acceptable, use the model to classify new data
Phase 1: Model Construction
Training data is fed to a classification algorithm, which outputs the
classifier (model).

Training Data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned classifier (model):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
Phase 2: Model Usage
The learned classifier

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

is first evaluated on testing data, then applied to unseen data.

Testing Data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
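The two phases can be sketched in a few lines of Python. This is an illustrative toy, not the lecture's own code: the rule and the test tuples are transcribed from the slides above, and accuracy is computed as the percentage of test samples the rule classifies correctly.

```python
# Toy sketch of Phase 2: apply the learned rule to the testing data
# and estimate accuracy. Rule and data are taken from the slides.

def tenured_rule(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "professor" or years > 6 else "no"

# Testing data: (rank, years, known true label)
test_set = [
    ("assistant prof", 2, "no"),
    ("associate prof", 7, "no"),   # the rule misclassifies Merlisa
    ("professor", 5, "yes"),
    ("assistant prof", 7, "yes"),
]

correct = sum(tenured_rule(r, y) == label for r, y, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy = {accuracy:.0%}")   # prints "accuracy = 75%"

# Once accuracy is acceptable, classify unseen data:
print(tenured_rule("professor", 4))   # Jeff -> "yes"
```

Note that the rule, built only from the training table, gets Merlisa wrong (7 years but not tenured), which is exactly why accuracy must be measured on a test set independent of the training set.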
Classification
• Data may be linearly separable or not linearly separable
Decision Trees
• A decision tree is a flowchart-like tree structure
• It divides the feature space by axis-aligned decision boundaries;
each rectangular region is labeled with one label
• Internal nodes test an attribute; branches are outcomes of the
test; leaf nodes hold class labels
• Decision trees can handle data that is not linearly separable

Example tree:

1. Width > 6.5 cm?
   Yes → 2. Height > 9.5 cm?
   No  → 3. Height > 6.0 cm?
Decision Trees
• Each root-to-leaf path is an If-Then rule:
– If Width > 6.5 cm AND Height > 9.5 cm THEN Lemon
– If Width > 6.5 cm AND Height ≤ 9.5 cm THEN Orange
– If Width ≤ 6.5 cm AND Height > 6.0 cm THEN Lemon
– If Width ≤ 6.5 cm AND Height ≤ 6.0 cm THEN Orange
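The four rules above transcribe directly into code. A minimal sketch (thresholds taken from the slide; the function name is our own):

```python
def classify_fruit(width, height):
    """Lemon-vs-orange decision tree with the slide's thresholds (cm)."""
    if width > 6.5:                    # root test: Width > 6.5 cm?
        return "Lemon" if height > 9.5 else "Orange"
    else:                              # left subtree: Width <= 6.5 cm
        return "Lemon" if height > 6.0 else "Orange"

print(classify_fruit(7.0, 10.0))  # Lemon
print(classify_fruit(7.0, 8.0))   # Orange
print(classify_fruit(6.0, 7.0))   # Lemon
print(classify_fruit(6.0, 5.0))   # Orange
```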
Example
• Whether a customer will wait for a table at a restaurant?
• Attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. Wait Estimate: estimated waiting time (0-10 min, 10-30, 30-60, >60)
Example
[Figure: candidate decision trees for the restaurant data, deciding
whether to wait (T) or not (F)]
Which Tree is Better?
What Makes a Good Tree?
• Not too big:
– computational efficiency (avoid redundant, spurious attributes)
– avoid overfitting training examples
– generalise well to new/unseen observations
– easy to understand and interpret
• Not too small:
– need to handle important but possibly subtle distinctions in data
• Occam's Razor: "the simplest explanation is most likely
the right one"
– find the simplest hypothesis (smallest tree) that fits the observations
Learning Decision Trees
• Learning the simplest (smallest) decision tree is an
NP-complete problem (Hyafil & Rivest, 1976)
• Resort to a greedy heuristic:
– Start from an empty decision tree
– Split on next best attribute
– Recurse
• What is best attribute?
• We use information theory to guide us
– ID3 (Iterative Dichotomiser) – Information Gain
– C4.5 – Gain Ratio
– Classification and Regression Trees (CART) – Gini index
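The three impurity measures listed above can be sketched in a few lines. This is an illustrative implementation from the standard formulas, not code from the lecture; the example labels are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list (basis of ID3's information gain)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a label list (used by CART)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, groups):
    """Parent entropy minus size-weighted entropy of the child groups."""
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)

# A 50/50 parent node split into two children of mixed purity:
parent = ["yes"] * 6 + ["no"] * 6
split = [["yes"] * 4 + ["no"], ["yes"] * 2 + ["no"] * 5]
print(round(information_gain(parent, split), 3))  # 0.196
```

C4.5's gain ratio additionally divides the gain by the entropy of the split itself, which penalizes attributes with many values.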
Decision Tree Learning Algorithm
• Simple, greedy, recursive approach, builds up tree
node-by-node
1. pick an attribute to split at a non-terminal node
2. split examples into groups based on attribute
value
3. for each group:
– if no examples - return majority from parent
– else if all examples in same class - return class
– else loop to Step 1
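Steps 1-3 above can be sketched as a short recursive function. A hedged toy version: for brevity it "picks" the first remaining attribute rather than scoring candidates, whereas a real learner would choose the best attribute by information gain or the Gini index; the data is invented.

```python
from collections import Counter

def majority(labels):
    """Most common class label in a list."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, attrs, parent_labels=None):
    """Greedy recursive tree builder. rows: list of (features_dict, label)."""
    labels = [lab for _, lab in rows]
    if not rows:                  # no examples: return majority from parent
        return majority(parent_labels)
    if len(set(labels)) == 1:     # all examples in same class: return it
        return labels[0]
    if not attrs:                 # no attributes left: majority vote
        return majority(labels)
    attr = attrs[0]               # step 1: pick an attribute to split on
    tree = {attr: {}}
    for v in {feats[attr] for feats, _ in rows}:   # step 2: split by value
        subset = [(f, l) for f, l in rows if f[attr] == v]
        tree[attr][v] = build_tree(subset, attrs[1:], labels)  # step 3: recurse
    return tree

data = [({"patrons": "some"}, "wait"),
        ({"patrons": "none"}, "leave"),
        ({"patrons": "some"}, "wait")]
print(build_tree(data, ["patrons"]))
```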
Choosing a Good Attribute
• Which attribute is better to split on, X1 or X2?

[Figure: candidate splits on X1 and X2; a split that yields a pure
node is preferred]

Idea:
1. use counts at leaves to define probability distributions, so we
can measure uncertainty
2. a good attribute splits the examples into subsets that are
(ideally) pure
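The idea of turning leaf counts into a probability distribution, and of a pure node, can be shown concretely. A small sketch with invented labels:

```python
from collections import Counter

def leaf_distribution(labels):
    """Turn the label counts at a leaf into a probability distribution."""
    n = len(labels)
    return {cls: cnt / n for cls, cnt in Counter(labels).items()}

def is_pure(labels):
    """A node is pure when every example carries the same class label."""
    return len(set(labels)) == 1

mixed = ["orange", "lemon", "orange", "orange"]
print(leaf_distribution(mixed))       # {'orange': 0.75, 'lemon': 0.25}
print(is_pure(["lemon", "lemon"]))    # True: zero uncertainty
```

A pure node has all probability mass on one class, so its uncertainty (entropy) is zero; mixed leaves like the one above still leave the class in doubt.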
