Model Answers For Chapter 7: CLASSIFICATION AND REGRESSION TREES
Note 1: the variable DEP_TIME (actual departure time) cannot be used for predicting
new flights, unless we are classifying them after their departure. For this reason we
omit it from the model.
Note 2: Once binned dummies are used for the scheduled departure time, we
remove the original variable (CRS_DEP_TIME) from the analysis.
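The binning described in Note 2 can be sketched as follows. This is a hypothetical illustration (not the XLMiner procedure): the column name CRS_DEP_TIME and the sample hhmm values are assumptions for the sake of the example.

```python
import pandas as pd

# Hypothetical data: scheduled departure times in hhmm format
df = pd.DataFrame({"CRS_DEP_TIME": [615, 830, 915, 1455, 2110]})

# Bin into hours of the day, then create one dummy per hour
df["DEP_HOUR"] = df["CRS_DEP_TIME"] // 100
dummies = pd.get_dummies(df["DEP_HOUR"], prefix="DEP_HOUR")

# Once the dummies are in place, drop the original variable
df = pd.concat([df.drop(columns=["CRS_DEP_TIME", "DEP_HOUR"]), dummies], axis=1)
```

Keeping both the original variable and its dummies would duplicate the same information, which is why CRS_DEP_TIME is removed.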
Answer to 7.2.a:
Note: If variable names are too long, they might be truncated when they appear on
the tree. To see their full name, examine the “Best Pruned Tree Rules” table at the
bottom of worksheet “CT_Output1”.
Answer to 7.2.b:
We cannot use this tree, because it requires the day of month and the distance
traveled (although the latter can be inferred from the route DCA-EWR). The
redundant information is the day of week (Monday) and the arrival airport (EWR).
The tree requires knowing whether the departure airport is DCA, the day of
month, whether the distance traveled exceeds 220.5, and whether the departure
time falls between 8AM and 10AM.
Answer to 7.2.c.i:
In the best-pruned tree we get a single terminal node labeled “ontime.” Therefore
any new flight will be classified as being “on time”.
Answer to 7.2.c.ii:
This is equivalent to the naïve rule, which is the majority rule. In this dataset most of
the flights arrived on time, and therefore the naïve rule is to classify a new flight as
arriving on time.
Answer to 7.2.c.iii:
Answer to 7.2.c.iv:
The pruned tree results in a single node because adding splits does not reduce the
classification error on the validation set (see sheet “CT_PruneLog2”). With one node
the validation error rate is 19.5%; adding nodes increases the error to 20.8%.
Answer to 7.2.c.v:
A fully grown tree overfits the training data, which leads to poor
performance on new data. In contrast, the best-pruned tree is obtained by
assessing the classification accuracy of the tree on the validation set, and therefore
avoids overfitting.
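The idea of growing a full tree and then selecting the pruned subtree with the lowest validation error can be sketched with scikit-learn's cost-complexity pruning path. This is not the XLMiner procedure, and the synthetic dataset is an assumption for illustration only.

```python
# Sketch: pick the pruned tree that maximizes validation accuracy
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flight data
X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, random_state=1)

# Candidate pruning levels, from the full tree (alpha=0) up to a single node
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_train, y_train)

# Evaluate each candidate subtree on the validation set
scores = []
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    scores.append(tree.score(X_valid, y_valid))

# Refit at the alpha with the best validation accuracy
best_alpha = path.ccp_alphas[scores.index(max(scores))]
best_tree = DecisionTreeClassifier(random_state=1, ccp_alpha=best_alpha)
best_tree.fit(X_train, y_train)
```

Because the full tree (alpha = 0) is one of the candidates, the selected tree's validation accuracy is never worse than the full tree's, and it is usually smaller — the same logic that can shrink the flight-delay tree all the way to a single node.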
Answer to 7.2.c.vi:
Our second classification tree coincides with the naïve rule (it has a single “ontime”
node). Considering that 80.55% of the flights are on time in the full dataset, the
error rate for this tree's rule of "every flight is on time" is only 19.45%. The logistic
regression has only a slightly lower overall error rate on the validation
data (18%). So it could be that there is little predictive power in the predictor
variables (regardless of method). In addition, logistic regression's improvement
might be due to the different pre-processing of the data. For example, in the logistic
regression the days of week are grouped into two categories (“Sunday or Monday” vs.
“Other”), whereas in the tree we have six dummies. Also, the departure time in the
logistic regression is broken into 16 bins, whereas the classification tree uses 8
bins. Finally, because the dataset is not very large, a model-based method such as
logistic regression (which imposes more structure) is likely to be more accurate
than a data-driven method such as the classification tree.