Chapter1 PDF

This document introduces a course on machine learning with tree-based models in R. The course covers classification and regression trees, bagged trees, random forests, and boosted trees. Students learn to interpret and explain model decisions, explore use cases, build and evaluate models, and tune parameters for optimal performance. Key concepts include decision tree terminology, training decision trees in R with the rpart package, and evaluating model performance on test data with metrics such as accuracy and the confusion matrix. The use of splitting criteria such as the Gini index to determine the best splits in a tree is also discussed.

Welcome to the course!
MACHINE LEARNING WITH TREE-BASED MODELS IN R

Erin LeDell & Gabriela de Queiroz
Machine Learning Scientist & Data Scientist
Tree-based models
Interpretability + Ease-of-Use + Accuracy

Make Decisions + Numeric Predictions

What you'll learn:
Interpret and explain decisions

Explore different use cases

Build and evaluate classification and regression models

Tune model parameters for optimal performance

We will cover:
Classification & Regression Trees

Bagged Trees

Random Forests

Boosted Trees (GBM)

Decision tree terminology: nodes

Training Decision Trees in R
library("rpart")

help(package = "rpart")

rpart(response ~ ., data = dataset)
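
As a minimal sketch of this call on real data, the following fits a tree to the `kyphosis` data frame that ships with rpart. The choice of dataset is an assumption made for illustration; the slides use their own data.

```r
library(rpart)

# kyphosis ships with rpart: a factor response (Kyphosis) and
# three numeric predictors (Age, Number, Start)
kyphosis_model <- rpart(Kyphosis ~ ., data = kyphosis)

# Print the fitted splits as text
print(kyphosis_model)
```

Because the response is a factor, rpart fits a classification tree here without `method` being set explicitly.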

Let's practice!
Introduction to classification trees

Gabriela de Queiroz
Instructor
Advantages
✔ Simple to understand, interpret, visualize

✔ Can handle both numerical and categorical features (inputs) natively

✔ Can handle missing data elegantly

✔ Robust to outliers

✔ Requires little data preparation

✔ Can model non-linearity in the data

✔ Can be trained quickly on large datasets

Disadvantages
✖ Large trees can be hard to interpret

✖ Trees have high variance, which causes model performance to be poor

✖ Trees overfit easily

Will you wait for a table or go elsewhere?

Restaurant Example

Decision Tree in R

Prediction example
The wait estimate is 20 minutes, no reservation was made, and it is Wednesday

Example

Let's practice!
Overview of the modeling process

Gabriela de Queiroz
Instructor
Train/Test Split

Train/test split in R
# Total number of rows in the restaurant data frame
n <- nrow(restaurant)

# Number of rows for the training set (80% of the dataset)
n_train <- round(0.80 * n)

# Set a random seed for reproducibility
set.seed(123)

# Create a vector of indices which is an 80% random sample
train_indices <- sample(1:n, n_train)

# Subset the data frame to training indices only
restaurant_train <- restaurant[train_indices, ]

# Exclude the training indices to create the test set
restaurant_test <- restaurant[-train_indices, ]
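
The restaurant data frame is not included with the slides, so here is a sketch of the same split on R's built-in `iris` data (150 rows), with a check that the two subsets partition the rows. Using `iris` as a stand-in is an assumption made for illustration.

```r
# Assumption: iris stands in for the restaurant data frame
n <- nrow(iris)
n_train <- round(0.80 * n)

set.seed(123)
train_indices <- sample(seq_len(n), n_train)

iris_train <- iris[train_indices, ]
iris_test  <- iris[-train_indices, ]

# The two subsets should partition the original rows
nrow(iris_train)  # 120
nrow(iris_test)   # 30
length(intersect(rownames(iris_train), rownames(iris_test)))  # 0
```

Negative indexing excludes rows only when `train_indices` is non-empty; an empty index vector would select zero rows rather than all of them.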

Train a Classification Tree
# Train the model to predict the binary response, "will_wait"
restaurant_model <- rpart(formula = will_wait ~ .,
                          data = restaurant_train,
                          method = "class")

formula: response variable ~ predictor variables

Let's practice!
Evaluate Model Performance

Gabriela de Queiroz
Instructor
Predicting class labels for test data
predict(model, test_dataset)

predict(model, test_dataset, type = ___)

class_pred <- predict(object = restaurant_model,
                      newdata = restaurant_test,
                      type = "class")
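
To make the role of `type` concrete, the sketch below fits a classification tree on the built-in `iris` data (an assumed stand-in for the restaurant data) and compares two documented values of `type` in `predict()` for rpart models: `"class"` and `"prob"`.

```r
library(rpart)

# Assumption: iris stands in for the restaurant data
set.seed(123)
train_indices <- sample(seq_len(nrow(iris)), round(0.80 * nrow(iris)))
iris_train <- iris[train_indices, ]
iris_test  <- iris[-train_indices, ]

iris_model <- rpart(Species ~ ., data = iris_train, method = "class")

# type = "class" returns a factor of predicted labels
class_pred <- predict(iris_model, newdata = iris_test, type = "class")

# type = "prob" returns a matrix with one column of probabilities per class
prob_pred <- predict(iris_model, newdata = iris_test, type = "prob")

head(class_pred)
head(prob_pred)
```

For a classification tree, calling `predict()` without `type` returns the probability matrix, so `type = "class"` is needed whenever labels are wanted.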

Evaluation Metrics for Binary Classification
Accuracy

Confusion Matrix

Log-loss

AUC

Accuracy
$$\text{accuracy} = \frac{\text{number of correct predictions}}{\text{total number of data points}}$$
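
In R, this ratio can be computed as the mean of an element-wise comparison. The label vectors below are made up for illustration and do not come from the course data.

```r
# Hypothetical labels, made up for illustration
actual    <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no",  "yes", "no"), levels = c("no", "yes"))

# Share of predictions that match the true label
accuracy <- mean(predicted == actual)
accuracy  # 0.8
```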

Confusion Matrix
library(caret)

# Calculate the confusion matrix for the test set
confusionMatrix(data = class_pred,
                reference = restaurant_test$will_wait)
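
If caret is not installed, the underlying cross-tabulation can be sketched in base R with `table()`. The label vectors are hypothetical; this shows only the counts, not the extra statistics that `confusionMatrix()` reports.

```r
# Hypothetical labels, made up for illustration
actual    <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no",  "yes", "no"), levels = c("no", "yes"))

# Rows are predictions, columns are true labels
conf_mat <- table(Prediction = predicted, Reference = actual)
print(conf_mat)

# Accuracy is the diagonal (correct predictions) over the total
sum(diag(conf_mat)) / sum(conf_mat)  # 0.8
```

Giving both factors the same `levels` keeps the table square even when one class is never predicted.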

Let's practice!
Use of splitting criterion in trees

Gabriela de Queiroz
Instructor
Split the data into "pure" regions

How to determine the best split?

Impurity Measure - Gini Index
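
The slide names the Gini index without stating it. Under the standard definition, a node whose classes occur with proportions p_1, ..., p_K has impurity 1 - (p_1^2 + ... + p_K^2), which is 0 for a pure node. A candidate split can then be scored by the size-weighted impurity of its two child nodes, with lower being better. The sketch below illustrates that idea; `gini_impurity` and `split_impurity` are hypothetical helper names, not rpart functions, and rpart's internal computation may differ in detail.

```r
# Gini impurity of a single node, given its class labels
gini_impurity <- function(labels) {
  p <- prop.table(table(labels))
  1 - sum(p^2)
}

# Size-weighted impurity of the two child nodes produced by a split
split_impurity <- function(left, right) {
  n <- length(left) + length(right)
  (length(left) / n) * gini_impurity(left) +
    (length(right) / n) * gini_impurity(right)
}

gini_impurity(c("yes", "yes", "yes", "yes"))  # 0: a pure node
gini_impurity(c("yes", "yes", "no", "no"))    # 0.5: an even two-class mix

# A split that isolates three "yes" cases on the left
split_impurity(c("yes", "yes", "yes"), c("no", "no", "yes"))  # about 0.222
```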

Let's practice!