Week14 - LAQs - SWR

Decision trees are supervised machine learning models used for classification and regression, characterized by their tree-like structure and interpretability. In R, decision trees can be created using the 'rpart' package, with steps including data preparation, model building, visualization, and evaluation. While they have advantages like ease of interpretation and minimal preprocessing, they are also prone to overfitting and sensitive to data variations.


Decision Trees in R

Decision trees are supervised machine learning models used for classification and
regression tasks. They work by splitting data into branches based on feature values,
forming a tree-like structure. Decision trees are simple, interpretable, and widely used in
data analysis.

1. Installing Required Packages

To work with decision trees in R, we commonly use the following packages:

install.packages("rpart")       # For creating decision trees
install.packages("rpart.plot")  # For visualizing trees
install.packages("caret")      # For data splitting and model evaluation

Load the required libraries:

library(rpart)
library(rpart.plot)
library(caret)

2. Creating a Decision Tree in R

Dataset: iris

The built-in iris dataset is often used for classification.

data(iris)  # Load the dataset
str(iris)   # View the structure of the data

Splitting Data into Training and Testing Sets


set.seed(123) # For reproducibility
index <- createDataPartition(iris$Species, p=0.7, list=FALSE)
train_data <- iris[index, ]
test_data <- iris[-index, ]

Building the Decision Tree Model

Using rpart(), we create a classification tree.

tree_model <- rpart(Species ~ ., data=train_data, method="class")


3. Visualizing the Decision Tree
rpart.plot(tree_model, type=2, extra=104, tweak=1.2)

 type=2: Draws the split labels below each node.
 extra=104: Displays the predicted class, the class probabilities, and the percentage of observations at each node.
 tweak=1.2: Enlarges the node text by 20%.

4. Making Predictions
predictions <- predict(tree_model, test_data, type="class")
head(predictions)

5. Evaluating Model Performance

Confusion Matrix
conf_matrix <- confusionMatrix(predictions, test_data$Species)
print(conf_matrix)

The confusion matrix output reports overall accuracy along with per-class statistics such as sensitivity (recall), positive predictive value (precision), and F1-score.

6. Pruning the Decision Tree

Pruning helps prevent overfitting by trimming unnecessary branches.

printcp(tree_model)  # Inspect the complexity parameter (cp) table

pruned_tree <- prune(tree_model, cp=0.01)  # Choose cp based on the printcp() output
rpart.plot(pruned_tree, type=2, extra=104)
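Rather than hard-coding cp=0.01, a common convention is to pick the cp value with the lowest cross-validated error (the xerror column of the cp table). A minimal sketch, fitted on the full iris data so it runs on its own (with the split from step 2 you would fit on train_data instead):

```r
library(rpart)

data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")

# Select the cp with the smallest cross-validated error from the cp table
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(tree_model, cp = best_cp)
```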

7. Decision Trees for Regression

Decision trees can also be used for predicting continuous values.

Example: predicting mpg from the mtcars dataset.

data(mtcars)
reg_tree <- rpart(mpg ~ ., data=mtcars, method="anova")  # Regression tree
rpart.plot(reg_tree)

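For a regression tree, predict() returns numeric values rather than class labels. A short self-contained sketch, refitting the tree above and scoring it on the training data (training error only, for illustration):

```r
library(rpart)

data(mtcars)
reg_tree <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Regression trees return numeric predictions
reg_predictions <- predict(reg_tree, mtcars)
head(reg_predictions)

# Mean squared error on the training data
mean((mtcars$mpg - reg_predictions)^2)
```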
8. Advantages & Disadvantages

✅ Advantages

 Easy to interpret
 Requires little data preprocessing
 Works with both numerical and categorical data

❌ Disadvantages

 Prone to overfitting
 Sensitive to small data variations
 Can create biased splits with imbalanced datasets

9. Alternative Tree-Based Models

 Random Forest (randomForest package)
 Gradient Boosting Trees (gbm package)
 XGBoost (xgboost package)
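As an illustration of the first alternative, a minimal Random Forest sketch on the same iris split used in step 2 (this assumes the randomForest package is installed; the split is recreated here so the snippet runs on its own):

```r
library(randomForest)
library(caret)

data(iris)
set.seed(123)
index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_data <- iris[index, ]
test_data  <- iris[-index, ]

# An ensemble of 500 trees, each grown on a bootstrap sample of train_data
rf_model <- randomForest(Species ~ ., data = train_data, ntree = 500)

rf_predictions <- predict(rf_model, test_data)
confusionMatrix(rf_predictions, test_data$Species)
```

Averaging over many trees reduces the variance and overfitting that single decision trees are prone to, at the cost of the tree-level interpretability highlighted above.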
