Machine Learning Lecture 2,3,4

Decision Trees and Ensemble Methods in Machine Learning
Lecture #2
Dr. Sadaqat Ali
Introduction to Decision Trees

● Decision trees are used for both classification and regression.
● They split data into subsets based on feature values.
● The splits form a tree-like structure for making decisions.
● A decision tree is a flowchart-like structure used to make decisions or predictions.
Structure of a Decision Tree

1. Root Node: Represents the entire dataset and the initial decision to be made.
2. Internal Nodes: Represent decisions or tests on attributes. Each internal node has one or more branches.
3. Branches: Represent the outcome of a decision or test, leading to another node.
4. Leaf Nodes: Represent the final decision or prediction. No further splits occur at these nodes.
How Decision Trees Work

The process of creating a decision tree involves:

1. Selecting the Best Attribute: Using a metric such as Gini impurity, entropy, or information gain, the best attribute to split the data on is selected (a short sketch of these measures follows below).
2. Splitting the Dataset: The dataset is split into subsets based on the selected attribute.
3. Repeating the Process: The process is repeated recursively for each subset, creating a new internal node or leaf node until a stopping criterion is met (e.g., all instances in a node belong to the same class).
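
A minimal sketch of the two impurity measures named above, assuming NumPy is available; the function names gini_impurity and entropy are illustrative, not taken from any particular library:

```python
import numpy as np

def gini_impurity(labels):
    # Gini impurity: 1 minus the sum of squared class probabilities.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum of p * log2(p) over the classes present in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array(["yes", "yes", "yes", "no", "no"])
print(gini_impurity(labels))  # 0.48
print(entropy(labels))        # ~0.971
```

A pure node (all samples in one class) scores 0 on both measures; the split that most reduces impurity (i.e., gives the largest information gain) is chosen.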
Classification Trees

● Goal: Classify data into categories.
● Use measures such as Gini impurity or entropy.
● Output: A class label (see the sketch below).
● Can you think of a situation where you'd want to classify something?
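
A minimal classification-tree sketch using scikit-learn's DecisionTreeClassifier; the Iris dataset and the hyperparameter choices are illustrative assumptions, not part of the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Gini impurity is the default splitting criterion; "entropy" is also available.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))  # output is a class label per sample
```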
Regression Trees

● Goal: Predict continuous numeric values.
● Use measures such as Mean Squared Error (MSE).
● Output: A predicted value (see the sketch below).
● When might predicting a number be useful in real life?
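
A minimal regression-tree sketch with scikit-learn's DecisionTreeRegressor; the Diabetes dataset and depth limit are illustrative assumptions, and the "squared_error" criterion name assumes a recent scikit-learn release:

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# "squared_error" (MSE) is the default split criterion for regression trees.
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=4, random_state=42)
reg.fit(X_train, y_train)
preds = reg.predict(X_test)  # continuous numeric predictions
print("Test MSE:", mean_squared_error(y_test, preds))
```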
Advantages of Decision Trees

● Simplicity and Interpretability: Decision trees are easy to understand and interpret. The visual representation closely mirrors human decision-making processes.
● Versatility: Can be used for both classification and regression tasks.
● No Need for Feature Scaling: Decision trees do not require normalization or scaling of the data.
● Handles Non-linear Relationships: Capable of capturing non-linear relationships between features and target variables.
Disadvantages of Decision Trees

● Prone to overfitting.
● Sensitive to small data changes: small variations in the data can result in a completely different tree being generated.
Introduction to Ensemble Methods

● Combine multiple models to improve predictions.
● Often use "weak learners" as base models.
● Aim to increase accuracy and robustness.

"Ensemble means 'a collection of things'. In machine learning terminology, ensemble learning refers to the approach of combining multiple ML models to produce a more accurate and robust prediction than any individual model."
Introduction to Ensemble Methods

What is ensemble learning?

● Ensemble learning is a machine learning technique that combines the predictions from multiple individual models to obtain better predictive performance than any single model.
Introduction to Ensemble Methods

● Ensemble learning combines multiple models (weak or strong learners) to improve overall predictive performance, making it more accurate and robust than individual models. The main idea is to reduce errors by leveraging the strengths of diverse models.
Types of Ensemble Methods

● Bagging: Uses bootstrap samples of the data.
● Boosting: Builds models sequentially.
● Stacking: Combines predictions with a meta-model.
● Which method sounds most interesting to you, and why?
Types of Ensemble Methods
1. Bagging (Bootstrap Aggregating):
○ Trains models independently on random subsets of data.
○ Reduces variance and prevents overfitting.
○ Example: Random Forest (majority vote for classification or averaging for regression).
2. Boosting:
○ Builds models sequentially, where each corrects the errors of the previous one.
○ Reduces bias and variance.
○ Examples: AdaBoost, XGBoost (popular in predictive tasks).
3. Stacking:
○ Combines predictions from multiple base models using a meta-model.
○ Uses diverse models to optimize performance.
○ Example: Logistic Regression as a meta-model over Decision Trees and SVM.
4. Voting:
○ Aggregates predictions from independent models (a short sketch follows below).
○ Hard Voting: Majority vote.
○ Soft Voting: Weighted average of predicted probabilities.
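
A minimal voting-ensemble sketch with scikit-learn's VotingClassifier; the three base models and the Iris dataset are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hard voting: each model casts one vote and the majority class wins.
# Soft voting: averages predicted probabilities, so every base model
# must support predict_proba (hence probability=True for the SVC).
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("svm", SVC(probability=True)),
    ],
    voting="soft",
)
voter.fit(X, y)
print(voter.predict(X[:5]))
```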
Random Forests

● Based on bagging with decision trees.
● Uses random sampling of both data and features.
● Combines predictions from multiple trees (see the sketch below).
● How might this reduce overfitting compared to a single tree?
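
A minimal random-forest sketch with scikit-learn's RandomForestClassifier; the dataset and hyperparameter values are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each tree is trained on a bootstrap sample of the rows, and each split
# considers only a random subset of the features (max_features).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```

Averaging many decorrelated trees smooths out the quirks any single tree picks up from its particular sample, which is why the ensemble usually overfits less than one deep tree.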
Advantages of Random Forests

● Reduces overfitting compared to single trees.
● Handles both numerical and categorical data well.
● Scalable to large datasets.
● Which advantage do you think is most important?
Disadvantages of Random Forests

● Less interpretable than a single decision tree.
● Can be computationally expensive.
● May require more memory for large datasets.
● Why might interpretability be important in some cases?
Introduction to Boosting

● Iterative technique to improve weak models.
● Adjusts weights of data points and models.
● Aims to minimize errors over time.
● How is this different from random forests?
AdaBoost (Adaptive Boosting)

● Combines weak learners (e.g., shallow trees).
● Increases the weight of misclassified samples.
● Final prediction is a weighted vote (see the sketch below).
● Why might focusing on misclassified samples be helpful?
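
A minimal AdaBoost sketch with scikit-learn's AdaBoostClassifier; the Breast Cancer dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# By default each weak learner is a decision stump (a depth-1 tree).
# After every round, misclassified samples receive larger weights, so the
# next stump concentrates on the hard cases; the final prediction is a
# weighted vote over all stumps.
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```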
XGBoost (Extreme Gradient Boosting)

● Advanced, scalable version of boosting.
● Uses gradient boosting with optimizations.
● Handles missing data and uses parallel computation (see the sketch below).
● How might these features be useful for big datasets?
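
A minimal XGBoost sketch using the library's scikit-learn-style wrapper; it assumes the separate xgboost package is installed (pip install xgboost), and the dataset and hyperparameters are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # external package, not part of scikit-learn

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Gradient boosting with built-in regularization; at each split, samples
# with missing feature values are routed to a learned default direction.
model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```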
Advantages of Boosting

● Improves weak models iteratively.
● Works well with smaller datasets (AdaBoost).
● Highly efficient and scalable (XGBoost).
● Which advantage stands out to you most?
Disadvantages of Boosting

● Can be sensitive to noisy data.
● Prone to overfitting if not regularized.
● May require parameter tuning (XGBoost).
● How might these disadvantages affect real-world use?
Introduction to Stacking

● Combines predictions from multiple base models.
● Uses a meta-model trained on the base models' outputs.
● Allows blending of diverse models.
● How is this different from boosting and bagging?
How Stacking Works

● Train multiple base models (e.g., trees, SVMs).
● Use their predictions as features for a meta-model.
● Final prediction comes from the meta-model (see the sketch below).
● Can you think of a real-world analogy for this process?
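
A minimal stacking sketch with scikit-learn's StackingClassifier; the choice of base models, meta-model, and dataset is an illustrative assumption:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Base models are trained with cross-validation; their out-of-fold
# predictions become the input features for the logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("svm", SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)
print(stack.predict(X[:5]))
```

This mirrors the echoed example from the earlier list: logistic regression acting as a meta-model over a decision tree and an SVM.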
Advantages and Disadvantages of Stacking

● Advantage: Can outperform individual models.
● Advantage: Allows blending diverse models.
● Disadvantage: Complex and time-consuming.
● Disadvantage: Risk of overfitting the meta-model.
● Which do you think is more significant: the advantages or the disadvantages?
Comparing Ensemble Methods

● Random Forests: Robust but less interpretable.
● AdaBoost: Improves weak models but is sensitive to noise.
● XGBoost: Scalable but requires tuning.
● Stacking: Blends models but is complex.
● Based on this comparison, which method interests you most?
