
Tree Building – Regression, Classification, Overfitting, Pruning and Complexity, Multiple Decision Trees
Introduction to Tree Building

Decision trees are a versatile model used for both classification and regression tasks.
They work by splitting the data into subsets based on feature values.
This presentation explores key concepts in tree building, including overfitting and pruning.
What is a Decision Tree?

A decision tree is a flowchart-like structure used for decision-making.
It consists of internal nodes (tests on features) and leaves (outcomes).
Decision trees are easy to visualize, which makes them intuitive to interpret.
Types of Decision Trees

Decision trees fall into two categories: classification trees and regression trees.
Classification trees predict categorical outcomes.
Regression trees predict continuous numeric values from the input features.
Decision Tree for Classification

In classification, the goal is to assign class labels to instances based on their features.
Each internal node tests an attribute, with a branch for each outcome of the test.
The leaves hold the predicted classes for the inputs that reach them (see the sketch below).
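A minimal sketch of a classification tree, assuming scikit-learn; the Iris dataset and the max_depth setting are illustrative choices, not part of the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # each internal node tests one feature
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))                            # accuracy on held-out data
print(export_text(clf, feature_names=data.feature_names))   # text view of the learned splits
```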
Decision Tree for Regression

Regression trees predict a continuous value rather than a class label.
Splits are chosen to minimize the variance of the target variable within the resulting nodes.
Each leaf predicts the average of the target values of the training instances that fall in it (sketched below).
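A minimal sketch of a regression tree, again assuming scikit-learn; the diabetes dataset and the depth limit are illustrative.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = DecisionTreeRegressor(max_depth=4, random_state=0)  # splits chosen to reduce squared error
reg.fit(X_train, y_train)

print(reg.predict(X_test[:5]))  # each prediction is the mean target of a leaf, so values repeat
```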
How Trees Split Data

Trees are grown with algorithms such as CART (Classification and Regression Trees).
The splitting criterion is typically Gini impurity (or entropy) for classification and Mean Squared Error for regression.
The objective at each split is to maximize information gain, i.e. to minimize the impurity of the child nodes (a worked example follows).
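A hand-rolled illustration of the two impurity measures named above; the small arrays are made up purely for demonstration.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def mse(values):
    """Mean squared error around the node mean (the node's variance)."""
    return np.mean((values - values.mean()) ** 2)

labels = np.array([0, 0, 1, 1, 1, 1])
print(gini(labels))                         # impurity of a mixed classification node
print(gini(labels[:2]), gini(labels[2:]))   # a perfect split yields pure children -> 0.0

values = np.array([1.0, 1.2, 5.0, 5.3])
print(mse(values), mse(values[:2]), mse(values[2:]))  # the split reduces variance in each child
```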
Overfitting in Decision Trees

Overfitting occurs when a tree learns the noise in the training data rather than the underlying pattern.
A tree that is too deep can memorize outliers, leading to poor generalization.
The key is to balance model complexity against predictive performance on unseen data.
Signs of Overfitting

High accuracy on the training data but poor performance on validation data is the clearest sign of overfitting.
A very complex tree with a large number of splits is another warning sign.
Inspecting the tree's depth and structure can help identify it (compare the train and validation scores in the sketch below).
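A sketch that surfaces the classic overfitting signature, assuming scikit-learn: training accuracy keeps climbing with depth while validation accuracy stalls or drops. The synthetic dataset is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for depth in (2, 4, 8, 16, None):  # None lets the tree grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_val, y_val))
```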
Pruning Techniques

Pruning reduces the size of the tree to improve generalization.
There are two main types: pre-pruning (stopping growth early) and post-pruning (removing nodes after the tree is built).
Pruning eliminates branches that contribute little, thereby reducing complexity.
Pre-Pruning Strategies

Pre-pruning sets conditions that halt tree growth while the tree is being built.
Common conditions include limiting the maximum depth or the minimum number of samples per leaf.
This yields a simpler model before overfitting can occur (see the sketch below).
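A sketch of pre-pruning with scikit-learn: growth is halted by constructor-level constraints. The specific limits and the dataset are illustrative, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

pre_pruned = DecisionTreeClassifier(
    max_depth=5,                 # cap the tree depth
    min_samples_leaf=10,         # every leaf keeps at least 10 samples
    min_impurity_decrease=1e-3,  # require a minimum impurity reduction per split
    random_state=0,
).fit(X, y)

print(pre_pruned.get_depth(), pre_pruned.get_n_leaves())
```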
Post-Pruning Strategies

Post-pruning trims the tree after it has been fully grown.
Techniques such as cost-complexity pruning weigh each subtree's contribution against its size.
This can significantly improve the model's ability to generalize to unseen data (sketched below).
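A sketch of post-pruning via cost-complexity pruning, assuming scikit-learn's ccp_alpha parameter; the dataset and the simple validation-based choice of alpha are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Effective alphas at which successive subtrees of the fully grown tree are pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and keep the one that scores best on the validation split.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))
```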
Complexity of Decision Trees

The complexity of a decision tree is determined by its depth and its number of leaves.
More complex trees can capture more patterns but are more prone to overfitting.
Understanding the trade-off between complexity and performance is essential for effective modeling (illustrated below).
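A small illustration of how tree size grows with the allowed depth, assuming scikit-learn; the synthetic data is only there to make the sketch runnable.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

for depth in (2, 4, 8, None):  # None = grow until the leaves are pure
    t = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(depth, t.get_depth(), t.get_n_leaves())  # deeper trees mean more leaves, more complexity
```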
Evaluating Tree Performance

For classification trees, common metrics are accuracy, precision, recall, and the F1 score.
For regression trees, metrics such as Mean Absolute Error (MAE) and R-squared are used.
Performance should be evaluated on both the training and validation datasets (see the sketch below).
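A sketch of the metrics named above, assuming scikit-learn; the two bundled datasets are illustrative.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.metrics import classification_report, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification metrics: accuracy, precision, recall, F1.
Xc, yc = load_breast_cancer(return_X_y=True)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xc_tr, yc_tr)
print(classification_report(yc_te, clf.predict(Xc_te)))

# Regression metrics: MAE and R-squared.
Xr, yr = load_diabetes(return_X_y=True)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(Xr_tr, yr_tr)
print(mean_absolute_error(yr_te, reg.predict(Xr_te)), r2_score(yr_te, reg.predict(Xr_te)))
```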
Ensemble Methods

Ensemble methods combine multiple decision trees to improve performance.
Random Forests and Gradient Boosted Trees are popular examples.
These methods reduce the variance of the predictions and enhance generalization.
Random Forests Overview

A random forest builds many decision trees on random subsets of the data and of the features.
The trees vote (or are averaged) to produce a prediction that is more robust than any single tree's.
This aggregation mitigates the overfitting that individual deep trees are prone to (sketched below).
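A sketch of a random forest, assuming scikit-learn; the number of trees, the feature-subsampling setting, and the dataset are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 200 trees, each grown on a bootstrap sample and a random subset of features per split.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())  # averaged cross-validated accuracy
```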
Gradient Boosting Trees

Gradient boosting builds trees sequentially, with each new tree correcting the errors of the ensemble so far.
The method directly optimizes a loss function, which often leads to high accuracy.
Boosted trees can be more susceptible to overfitting than random forests, so careful tuning is required (sketched below).
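A sketch of gradient boosting with scikit-learn: shallow trees fitted one after another to the residual errors of the current ensemble. The learning rate, tree count, and dataset shown are illustrative tuning choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Shallow trees added sequentially; learning_rate and n_estimators are the usual knobs to tune.
gbt = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0)
print(cross_val_score(gbt, X, y, cv=5).mean())
```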
Handling Missing Values

Some decision-tree implementations can handle missing values directly during splitting.
Classical CART does this with surrogate splits; otherwise, imputation is applied before training.
This makes tree-based models a practical choice for datasets with incomplete records (one common approach is sketched below).
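A sketch of one common way to cope with missing values before fitting a tree: imputation inside a pipeline. Note that scikit-learn's trees do not expose CART-style surrogate splits, though recent versions can route NaN inputs natively; the tiny array here is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0], [3.0, 5.0]])
y = np.array([0, 0, 1, 1])

# Fill missing entries with the column median, then fit the tree on the completed data.
model = make_pipeline(SimpleImputer(strategy="median"), DecisionTreeClassifier(random_state=0))
model.fit(X, y)
print(model.predict(np.array([[2.5, np.nan]])))
```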
Practical Applications

Decision trees are widely used in domains such as finance, healthcare, and marketing.
Typical applications include customer segmentation, credit scoring, and predictive analytics.
Their interpretability makes them a favored choice for applications requiring clear decision paths.
Summary of Key Concepts

Decision trees are powerful tools for both classification and regression tasks.
Managing complexity through pruning and using ensemble methods can enhance performance.
Understanding the balance between bias and variance is crucial in tree building.